Communication and Intelligent Systems: Proceedings of ICCIS 2022, Volume 1 (Lecture Notes in Networks and Systems, 686) 981992099X, 9789819920990

This book gathers selected research papers presented at the Fourth International Conference on Communication and Intelli

112 3 17MB

English Pages 751 [722] Year 2023

Recommended stories

Communication and Intelligent Systems: Proceedings of ICCIS 2022, Volume 2 (Lecture Notes in Networks and Systems, 689) 9819923212, 9789819923212

This book gathers selected research papers presented at the Fourth International Conference on Communication and Intelli

116 66 18MB Read more

Communication and Intelligent Systems: Proceedings of ICCIS 2019 (Lecture Notes in Networks and Systems, 120) 9811533245, 9789811533242

This book gathers selected research papers presented at the International Conference on Communication and Intelligent Sy

116 55 20MB Read more

Communication and Intelligent Systems: Proceedings of ICCIS 2021 (Lecture Notes in Networks and Systems, 461) 9811921296, 9789811921292

This book gathers selected research papers presented at the Third International Conference on Communication and Intellig

128 71 33MB Read more

Intelligent Systems and Applications: Proceedings of the 2021 Intelligent Systems Conference (IntelliSys) Volume 1: 294 (Lecture Notes in Networks and Systems) [1st ed. 2022] 3030821927, 9783030821920

This book presents Proceedings of the 2021 Intelligent Systems Conference which is a remarkable collection of chapters c

1,324 114 98MB Read more

Intelligent Sustainable Systems: Proceedings of ICISS 2021 (Lecture Notes in Networks and Systems, 213) [1st ed. 2022] 9811624216, 9789811624216

This book features research papers presented at the 4th International Conference on Intelligent Sustainable Systems (ICI

2,415 153 25MB Read more

Intelligent Systems and Applications: Proceedings of the 2023 Intelligent Systems Conference (IntelliSys) Volume 3 (Lecture Notes in Networks and Systems, 824) [1st ed. 2024] 3031477146, 9783031477140

The book is a unique collection of studies involving intelligent systems and applications of artificial intelligence in

119 103 85MB Read more

International Conference on Advanced Intelligent Systems for Sustainable Development: Volume 1 - Advanced Intelligent Systems on Artificial ... (Lecture Notes in Networks and Systems, 637) 9783031263842, 9783031263835, 3031263839

This book describes the potential contributions of emerging technologies in different fields as well as the opportunitie

236 116 118MB Read more

Intelligent Computing: Proceedings of the 2021 Computing Conference, Volume 1 (Lecture Notes in Networks and Systems, 283) [1st ed. 2022] 3030801187, 9783030801182

This book is a comprehensive collection of chapters focusing on the core areas of computing and their further applicatio

1,043 149 26MB Read more

Data Analytics and Learning: Proceedings of DAL 2022 (Lecture Notes in Networks and Systems, 779) 9819963451, 9789819963454

This book presents new theories and working models in the areas of data analytics and learning. The papers included in t

111 78 9MB Read more

Data Science and Security: Proceedings of IDSCS 2022 (Lecture Notes in Networks and Systems, 462) 9811922101, 9789811922107

This book presents best selected papers presented at the International Conference on Data Science for Computational Secu

107 70 13MB Read more

Communication and Intelligent Systems: Proceedings of ICCIS 2022, Volume 1 (Lecture Notes in Networks and Systems, 686)
981992099X, 9789819920990

Author / Uploaded
Harish Sharma (editor)
Vivek Shrivastava (editor)
Kusum Kumari Bharti (editor)
Lipo Wang (editor)

Table of contents :
Preface
Contents
Editors and Contributors
Network Coverage and Event Detection in Mobile Sensor Networks
1 Introduction
2 System Models
2.1 Sensing Coverage
2.2 Probabilistic Sensing Models
2.3 Event Detection Probability
3 Results and Discussion
4 Conclusion
References
Attention Guided Human Fall Detection for Elderly Patient Monitoring
1 Introduction
2 Related Work
3 Proposed Method
3.1 Conv2D: Spatial Feature Extraction
3.2 Attention Mechanism
3.3 ConvLSTM2D: Temporal Feature Extraction
4 Experiments
4.1 Dataset
4.2 Preprocessing
4.3 Implementation Details
4.4 Results
5 Conclusion
References
SDN-Enabled IoT to Combat the DDoS Attacks
1 Introduction
1.1 Problem Definition
1.2 Paper Organization
2 Background
3 Proposed Approach
3.1 Result Analysis
4 Conclusion
References
Analysis of Existing Datasets of Household Objects for AI-Enabled Techniques
1 Introduction
2 Datasets for Household Objects Classified on the Basis of Application
2.1 Annotated Image Dataset of Household Objects from the RoboFEI@Home Team
2.2 My Nursing Home
2.3 The Open Images Dataset V4
2.4 Office-Home Dataset
2.5 ORBIT: A Real-World Few-Shot Dataset for Teachable Object Recognition
2.6 Household Objects for Pose Estimation (HOPE)
2.7 CMU Kitchen Occlusion Dataset (CMU_KO8)
2.8 CMU Grocery Dataset (CMU10_3D)
2.9 RoboCup@Home-OBJECTS Benchmark
2.10 ADE20K Dataset
2.11 OSLD Dataset
2.12 Bottles and Cups Dataset
2.13 PhoCaL Dataset
3 Summarized Analysis
4 Conclusion
References
In Silico Molecular Docking Study by Using Bio-informatics Database to Fabricate M-Cell Targeting Nanocarrier System for Oral Delivery of Macromolecules
1 Introduction
2 Material and Methods
2.1 Preparation of Ligands
2.2 Preparation of Protein
2.3 In Silico Docking Study
2.4 In Silico Physicochemical and Pharmacokinetic Property Evaluation
2.5 Scheme for the Synthesis of Mannosylated Chitosan and the Docking Interaction Study
3 Results and Discussion
3.1 Pharmacokinetic and Drug-Likeness Screening of Carbohydrate Ligands
3.2 Bioavailability Radar and In Silico Physiochemical Parameter Evaluation
3.3 Synthesis Scheme for the Mannosylation of Chitosan and Their Docking Interaction
4 Conclusion
References
Open-Source Datasets for Colonoscopy Polyps and Its AI-Enabled Techniques
1 Introduction
2 Open-Source Colonoscopy Polyp Datasets
2.1 Polypgen Dataset
2.2 Kvasir
2.3 Kvasir SEG
2.4 HyperKvasir
2.5 CVC-ClinicDB
2.6 CVC-ColonDB
2.7 CVC-EndoScene Still
2.8 CVC-PolypHD
2.9 ETIS-Larib
2.10 ASU-Mayo Clinic Colonoscopy Video
2.11 CVC-ClinicVideoDB
2.12 Piccolo
2.13 CP-CHILD-A and CP-CHILD-B
2.14 Colorectal Polyp Image Cohort (PIBAdb)
2.15 CVC-300
2.16 SUN Colonoscopy Dataset
2.17 LDPolypVideo Database
2.18 BKAI-IGH NeoPolyp Database
2.19 Endotest
2.20 GLRC
2.21 Colonoscopic Dataset
3 Brief Comparative Analysis and Research Gaps
3.1 Raw Colonoscopy Data Available on Internet
4 Conclusion
References
Cloud-Based House Price Predictor App Using Machine Learning
1 Introduction
2 Literature Review
3 Dataset and Its Exploratory Data Analysis
3.1 Correlation Matrix
4 Methodologies
4.1 Linear Regression (LR)
4.2 Decision Tree (DT)
4.3 Random Forest (RF)
4.4 Extra Tree Regressor (ETR)
4.5 Extreme Gradient Boosting (XGBoost)
5 Results and Accuracy Comparison of Models
5.1 Outlook of the App
6 Conclusion
References
Compendium of Qubit Technologies in Quantum Computing
1 Introduction
2 Deep Dive into Qubits
2.1 Defining Qubits: Information Processing Perspective
3 Qubit Candidates
3.1 Trapped Ions
3.2 Superconducting Qubits
3.3 Photonic Quantum Computing
3.4 Semiconductor Qubits
3.5 Diamond-Based Quantum Computing
3.6 Topological Quantum Computing
3.7 NMR Quantum Computing
4 Conclusion
References
A Novel Post-quantum Piekert's Reconciliation-Based Forward Secure Authentication Key Agreement for Mobile Devices
1 Introduction
2 Research Contributions
3 Preliminaries
3.1 Rounding and Reconciliation
3.2 Error Distribution
3.3 Ring Learning with Errors
4 Proposed Post-quantum Secure Authenticated Key Exchange
4.1 Setup Phase
4.2 Registration Phase
4.3 Login and Authentication Phase
5 Security Analysis
6 Concluding Remarks
References
An Overview of IoT and Smart Application Environments: Research and Challenges
1 Introduction
2 Literature Review on the Critical Review of IoT
3 Smart Environments
3.1 Smart Homes
3.2 Smart Health
3.3 Industry Automation and Agriculture
3.4 Smart City
3.5 Smart Vehicle
3.6 Smart Grid
4 Challenges and Key Issues in IoT Implementation
5 Conclusion and Future Scope
References
Human Detection and Tracking Based on YOLOv3 and DeepSORT
1 Introduction
2 Related Works
3 Proposed Architecture
3.1 Phase 1—Object Detection
3.2 Phase 2—Object Tracking
4 Experiments and Results
5 Conclusions and Future Scope
References
Smart City: Road Traffic Monitoring System Based on the Integration of IoT and ML
1 Introduction
2 Methodology
3 IoT
3.1 Role of IoT in Smart Road Traffic Monitoring
4 Machine Learning
4.1 Role of ML in Smart Road Traffic Monitoring
5 IoT and ML in Smart Road Traffic Monitoring
6 Conclusion
References
Weed Detection in Crops Using Lightweight EfficientNets
1 Introduction
1.1 Background
1.2 Motivation and Contribution
2 Related Work
3 Proposed Method
3.1 Data Preprocessing
3.2 Models Selection and Training
3.3 Domain Specific Learning
4 Experimental Results and Analysis
4.1 Implementation Setup
4.2 Results and Discussion
4.3 Observations and Remarks
4.4 Comparison with Existing Approaches
5 Conclusion
References
Meta-heuristics for the Single-Channel PMU Placement Problem Considering Zero-Injection-Buses
1 Introduction
2 Related Work
2.1 Observability Rules Considering ZIBs
3 Heuristics
3.1 Greedy Heuristic 1
3.2 Greedy Heuristic 2
3.3 Particle Swarm Optimization
3.4 Genetic Algorithm
4 Results
5 Conclusions and Future Work
References
Comparative Analysis of Different Machine Learning Approaches for Sentiment Analysis
1 Introduction
1.1 Sentiment
1.2 Machine Learning
2 Methodology
2.1 Machine Learning-Based Sentiment Analysis
2.2 Lexicon-Based Sentiment Analysis
2.3 Hybrid Approach
3 Related Work
4 Comparative Analysis
5 Discussion and Analysis
6 Conclusion and Future Work
References
A Comprehensive Investigation of Machine Learning Algorithms with SMOTE Integration to Maximize F1 Score
1 Introduction
2 Related Work
3 Proposed Work
3.1 Dataset Description
3.2 Exploratory Analysis of Data
3.3 Classification Algorithms
3.4 SMOTE Integrated Classification
4 Results
5 Conclusion
References
Detection of Pathological Myopia from Fundus Images
1 Introduction
2 Related Work
3 Proposed Method
4 Results and Conclusion
References
Load Frequency Control of Single and Multi-area Power Systems Based on ADRC
1 Introduction
2 Active Disturbance Rejection Control (ADRC)
3 Power System Model for Load Frequency Control
4 Simulation Results
4.1 Single Area Power System
4.2 Single Area Power System with Nonlinearities
4.3 IEEE 39 Bus System
5 Conclusion
References
Pest Detection and Identification in Infested Plants Using Digital Images in Agriculture
1 Introduction
2 Related Works
3 Proposed System
4 Conclusion
References
Narrative Paragraph Generation for Photo Stream Using Neural Networks
1 Introduction
2 Related Work
3 Proposed Architecture
3.1 Phase1: CNN Layer
3.2 Phase2: Recurrent Network with Attention
3.3 Phase3: BLEU Score and Beam Search
4 Datasets Used
4.1 Caption Preparation
4.2 Model Parameters
5 Experimentation and Result
5.1 Performance Analysis of Deep CNNs
6 Conclusion and Future Work
References
Stroke Disease Prediction Using Adaboost Ensemble Learning Technique
1 Introduction
2 Literature Survey
3 Proposed Methodology
3.1 Proposed System
3.2 Dataset
3.3 Pre-processing
3.4 Classification
3.5 Adaptive Ensemble Learning
4 Result Analysis
4.1 Data Visualization
4.2 Algorithm Results
4.3 Result Comparison
5 Conclusion and Future Work
References
Real-time Multi-module Student Engagement Detection System
1 Introduction
2 Literature Review
3 Student Engagement Detection Framework
3.1 Overview
3.2 Emotion Detection
3.3 Micro-sleep Tracking
3.4 Yawns Detection
3.5 Iris Distraction Tracking
4 Results and Analysis
4.1 Dataset
4.2 Experimental Setup
4.3 Metrics
4.4 Results and Discussion
5 Conclusion and Future Works
References
A Machine Learning-Based Vulnerability Detection Approach for the Imbalanced Dataset UNSW-NB15
1 Introduction
2 Related Work
3 Proposed Methodology
4 Implementation
4.1 Dataset Description
4.2 Pre-processing
4.3 Feature Selection
4.4 Balanced Dataset
4.5 Implementation of ML Algorithms
5 Results and Discussion
5.1 Analysis of ML Algorithms
5.2 Comparative Analysis of the Algorithms with Both Datasets for Binary and Multi-class Classification
5.3 Comparison with Related Research
6 Conclusion
References
The Metaverse for Enterprises
1 Introduction
2 Literature Review
3 Discussion
4 Conclusion
References
Statistical Sales Forecasting Using Machine Learning Forecasting Methods for Automotive Industry
1 Introduction
2 Proposed Method
3 Implementation
4 Predictive Models
4.1 Linear Regression
4.2 Support Vector Machine
4.3 Random Forest Regression
5 Results and Discussion
6 Conclusion
References
Deep Learning Techniques for Detecting COVID-19
1 Introduction
2 Literature Review
3 Methodology
3.1 Design and Framework
3.2 Data Analysis and Modeling
4 Result and Analysis
4.1 Evaluation
4.2 Comparing Performance of CNN, VGG-16, VGG-19
5 Conclusion
References
Prediction for Bullish and Bearish Trend in the Price of Stocks Using PCA and LSTM
1 Introduction
2 Literature Survey
3 Proposed Methodology
3.1 Input Data set
3.2 Proposed Algorithms
4 Implementation
5 Results and Analysis
6 Conclusion and Future Works
References
Vector Control of PMSM Drive in Electric Vehicles Using SVM Regression Approach
1 Introduction
2 PMSM Model
3 Vector Control of PMSM Drive
4 Proposed Vector Control
4.1 Control Approach Using SVR
4.2 Support Vector Machine Regression (SVR)
4.3 Training SVM Network
4.4 Evaluation Metrics
5 Controller Performance Evaluation
5.1 Constant Speed and Load Conditions
5.2 Variable Speed Test
5.3 Impact of Load Disturbance
5.4 Evaluation Metrics
6 Conclusion
References
Key Frame Extraction from Videos Based on SIFT and Structural Similarity
1 Introduction
2 Proposed Method
2.1 Structural Similarity Between Frames
2.2 Scale-Invariant Feature Transform of Frames
3 Experimental Results
3.1 Datasets
3.2 Evaluation Metrics
3.3 Results
4 Conclusion
References
An Automated System for Rice Plant Diagnosis Using Deep Learning
1 Introduction
2 Rice Plant Disease
2.1 Bacterial Diseases
2.2 Viral Diseases
2.3 Fungus Diseases
3 Literature Review
4 Steps to Identification of Plant Diseases
4.1 Input Image and Set of Image
4.2 Image Pre-processing
4.3 Image Segmentation
4.4 Feature Extraction
4.5 Identification and Bracket of Conditions
5 Proposed GoogleNet with CNN Architecture
6 Results
7 Conclusion
References
Machine Learning Techniques in Data Fusion: A Review
1 Introduction
2 Literature Review
3 Categorization of Data Fusion Techniques
3.1 Based on Relation Between Data Sources
3.2 Dasarathy Categorization
3.3 Approaches for Fusion of Data
3.4 Based on Abstraction Level
3.5 Architectural-Based Classification
3.6 Traits of Fusion Data
3.7 Spatiotemporal Vector of Fusion Data
3.8 JDL Method
4 Common Machine Learning Techniques Used in Paper
5 Conclusion
References
Hyperparameter Tuning for Edge-IIoT Intrusion Detection Using SMOTE
1 Introduction
2 Related Work
3 Proposed Work
3.1 Step 1 (Dataset and Pre-processing)
3.2 Step 2 (SMOTE)
3.3 Step 3 (Feature Selection)
3.4 Step 4 (Machine learning classifiers)
4 Results and Discussion
4.1 Evaluation Metrics
4.2 Results
4.3 Discussion
5 Conclusion
References
Hindi Fake News Fact Checker Using Machine Learning and Deep Learning
1 Introduction
2 Related Work
3 Methodology
3.1 Dataset Preparation
3.2 Dataset Description
3.3 Data Pre-processing
4 Experimental Design
4.1 Exploring Machine Learning Logics
4.2 Deep Learning Model
5 Results and Observations
5.1 Machine Learning Models
5.2 Deep Learning Models
6 Future Work and Conclusions
References
A Comprehensive Review on the Identification of Blood-Based Biomarkers for Alzheimer's Disease Detection Through Computational Approaches
1 Introduction
2 Scope of This Review
3 Pathogenesis of AD
4 Importance of Blood-Based Biomarker for AD
5 Datasets
6 Computational Approaches
6.1 Role of Differential Gene Expression Analysis in Identification of AD Blood Biomarkers
6.2 Identification of Biomarkers via Weighted Gene Co-Expression Network Analysis (WGCNA) in AD
6.3 Gene Enrichment Analysis for Identification of Candidate Biomarker and Pathways Associated with AD
6.4 Protein-Protein Interaction Analysis
6.5 Machine Learning Applications for Blood-Based AD Diagnostics Solution
7 Conclusion
References
Application of Ensemble Machine Learning Techniques in Yield Predictions of Major and Commercial Crops
1 Introduction
2 Literature Review
3 Methodology
3.1 Study Areas and Dataset Acquisition
3.2 Yield Prediction Using Ensemble Methods
3.3 Performance Assessment of the Models
4 Results and Discussion
5 Conclusion
References
A Review on Texture Feature Analysis of Chest Computed Tomography Images for Detection and Classification of Pulmonary Diseases
1 Introduction
2 Texture Analysis
3 Types of Feature Extraction Techniques
4 Role of Radiomics in Lung Diseases
5 Role of Computer Aided Diagnosis (CAD) Systems for Detection and Classification of Lung Diseases
6 Literature Survey on Texture Feature Extraction
7 Datasets
8 Classification
9 Discussion and Challenges
10 Limitations and Future Research Directions
11 Conclusion
References
A Review of Machine Learning Techniques for Tuberculosis Meningitis Diagnosis
1 Introduction
2 ML Techniques Used in Meningitis Diagnosis
3 Literature Review
4 Discussion
5 Conclusion
References
Embeddings-Based Parallel Corpus Creation for English-Manipuri
1 Introduction
2 Motivation
3 Related Works
4 Contribution
5 Methodology
5.1 Collection of Parallel Corpus
5.2 Text Processing
5.3 Sentence Alignment
6 Results and Evaluations
7 Conclusion and Future Work
References
Collision Avoidance System Using Reinforcement Learning
1 Introduction
2 Previous Works on Collision Avoidance System
3 Methodology
3.1 Reinforcement Learning
4 Collision Avoidance System with Reinforcement Learning
5 Experiments
6 Conclusion
References
Relationship Management in SIoT: A Survey
1 Introduction
1.1 Introduction to SIoT
1.2 Introduction to Relationship Management in SIoT
2 Architecture
3 Related Work
4 Conclusion
References
A Technique for Finding an Approximate Solution to an Ill-Posed Inverse Problem Using Tikhonov's Regularization Method
1 Introduction
2 Statement of the Inverse Problem
3 A Technique for Solving the Inverse Problem
4 Estimated Example
5 Conclusion
References
Optimal Prediction of Heart Disease by Identifying the Type of Chest Pain Using Machine Learning Techniques
1 Introduction
2 Related Work
3 Methodology
4 Evaluation Matrices
5 Data Description
6 Exploratory Data Analysis
6.1 Matrices Features Correlation
7 Result Analysis
8 Comparison with the Previous Research
9 Conclusion
References
VANET Hybrid Routing Protocol Featuring Perpetual Hopfield Network and Enhanced K-Means Clustering Algorithm
1 Introduction
2 Related Work
3 K-Means Clustering Algorithm
4 Proposed System
5 Experiment and Result
6 Conclusion
References
Geo Science-Based Optimization Algorithms: A New Paradigm
1 Introduction
2 NIOAs Study
2.1 With Respect to Time Complexity
2.2 With Respect to the Application
2.3 With Respect to Exploration and Exploitation
3 Proposed Optimization Techniques Under the Geoscience Paradigm
3.1 Governing Forces and Equations
3.2 Stability Point
4 Conclusion
References
Identifying Incorrect Postures While Performing Sun Salutation Using MoveNet
1 Introduction
2 Related Work
3 Methodology
3.1 Preprocessing Module
3.2 Pose Classification Module
3.3 Assessment Module
4 Human Pose Estimation and Classification
4.1 MoveNet Model
4.2 Pose Classification Using ANN
5 Experiment
5.1 Dataset Preparation
5.2 Experimental Setup
6 Conclusion
References
Output Feedback Scheme-Based Network Synchronization of a Class of Discrete Time Systems in Chain and Ring Topology
1 Introduction
2 Contraction Theory for Discrete Time Systems
3 Problem Formulation
4 Chain Network of N-Systems
5 Ring Network of N-Systems
6 Numerical Simulations
6.1 Numerical Simulations for Chain Network
6.2 Numerical Simulations for Ring Network
7 Conclusions
References
Contrast Enhancement of Medical Images Using Otsu Thresholding
1 Introduction
1.1 Organization of the Paper
2 Technical Background
2.1 Otsu Thresholding [11]
2.2 CLAHE [13]
2.3 Evaluation Metrics for Image Quality
3 Proposed Method
4 Result Analysis
4.1 Cervical Cancer [16]
4.2 Breast Digital X-ray Image [17]
4.3 Kidney Image [17]
5 Conclusion
References
Color Image Encryption Using Hybrid Three-Scroll Unified Chaotic Attractor and 6D Hyperchaotic System
1 Introduction
2 Chaotic and the Hyperchaotic Systems
2.1 Three-Scroll Unified Chaotic Attractor (TSUCA)
2.2 6D Hyperchaotic System
3 Proposed Encryption Scheme
4 Result and Security Analysis
4.1 Key Space and Key Sensitivity
4.2 Differential Attack Analysis
4.3 Information Entropy Analysis
4.4 Histogram and Chi-Square Test
4.5 Correlation Analysis
4.6 Effect of Noise
5 Conclusion
References
IoT Adoption in Agriculture: Awareness and Challenges Faced by Rural Farmers in Delta Districts of Tamil Nadu
1 Introduction
1.1 The Application of IoT in Farming
1.2 Agriculture’s IoT Issues
2 Literature Review
3 Research Methodology
3.1 Sample Design
3.2 Hypothesis
3.3 Framework of Analysis
4 Findings
5 Recommendations
6 Conclusion
References
Optimized Reversible Arithmetic and Logic Unit
1 Introduction
2 Previous Work
3 Background
3.1 Reversible Logic Gates
4 Design and Implementation
4.1 Control Unit of ALU
5 Simulation Results
6 Conclusion
References
A Review on Diagnosis of Breast Cancer Using Mammography Techniques
1 Introduction
2 Breast Cancer Diagnosis Using Mammography
3 Mammogram Database
4 Computer-Assisted Breast Cancer Diagnosis
5 Critical Techniques for CAD Systems
6 Phases of Cancer Diagnosis
7 Geometric Properties
8 Result
References
Pragmatic Way of Analyzing Malware Attacks Detection in IoT Devices Using Deep Learning
1 Introduction
2 Background
2.1 IoT Architecture and Layer Vulnerability
3 Related Work
4 Analysis of Different Malware Attacks
5 Malware Security Issues in IoT
6 Conclusion
References
Network Security Risks, Challenges, and Solutions for Underwater Wireless Sensor Network’s Trusted Node-to-Node Communication: A Survey
1 Introduction
2 UWSN Architecture
3 UWSN Security Threats and Requirements
3.1 Security Threats
3.2 Security Requirements
4 UWSN Characteristics and Challenges
4.1 Unsecure UWSN Environment
4.2 Imperfect Bandwidth
4.3 Unreliable Data Communication
4.4 Communication Channel and Transmission Range
4.5 Cooperation of Heterogeneous Nodes
5 Primary Solutions for UWSN Challenges
5.1 Key Management
5.2 Trust Management
5.3 Routing Security
5.4 AI-Inspired Cybersecurity Solutions for Securing UWSNs
5.5 Big Data Integration
6 Conclusion and Future Work
References
False Data Injection Attack Detection in VANET Using Upgraded Grey Wolf Optimization Algorithm Using LSTM Classifier
1 Introduction
2 Related Works
3 Proposed Model
3.1 Deep Learning Technique
3.2 FDIA-GWO-Based Intrusion Detection System
4 Proposed Algorithm
4.1 Grey Wolf Optimization Algorithm
4.2 Upgraded Grey Wolf Optimization Algorithm
5 Experimental Design and Results
6 Conclusions and Future Enhancements
References
Investigation of the Role of Test Size, Random State, and Dataset in the Accuracy of Classification Algorithms
1 Introduction
2 Methodology
3 Analysis
4 Conclusion
References
Author Index

Citation preview

Lecture Notes in Networks and Systems 686

Harish Sharma Vivek Shrivastava Kusum Kumari Bharti Lipo Wang Editors

Communication and Intelligent Systems Proceedings of ICCIS 2022, Volume 1

Lecture Notes in Networks and Systems Volume 686

Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

Advisory Editors
Fernando Gomide, Department of Computer Engineering and Automation—DCA, School of Electrical and Computer Engineering—FEEC, University of Campinas—UNICAMP, São Paulo, Brazil
Okyay Kaynak, Department of Electrical and Electronic Engineering, Bogazici University, Istanbul, Türkiye
Derong Liu, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, USA; Institute of Automation, Chinese Academy of Sciences, Beijing, China
Witold Pedrycz, Department of Electrical and Computer Engineering, University of Alberta, Alberta, Canada; Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Marios M. Polycarpou, Department of Electrical and Computer Engineering, KIOS Research Center for Intelligent Systems and Networks, University of Cyprus, Nicosia, Cyprus
Imre J. Rudas, Óbuda University, Budapest, Hungary
Jun Wang, Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong

The series “Lecture Notes in Networks and Systems” publishes the latest developments in Networks and Systems—quickly, informally and with high quality. Original research reported in proceedings and post-proceedings represents the core of LNNS. Volumes published in LNNS embrace all aspects and subfields of, as well as new challenges in, Networks and Systems. The series contains proceedings and edited volumes in systems and networks, spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor Networks, Control Systems, Energy Systems, Automotive Systems, Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems, Robotics, Social Systems, Economic Systems and other. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution and exposure which enable both a wide and rapid dissemination of research output. The series covers the theory, applications, and perspectives on the state of the art and future developments relevant to systems and networks, decision making, control, complex processes and related areas, as embedded in the fields of interdisciplinary and applied sciences, engineering, computer science, physics, economics, social, and life sciences, as well as the paradigms and methodologies behind them. Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science. For proposals from Asia please contact Aninda Bose ([email protected]).

Editors
Harish Sharma, Department of Computer Science and Engineering, Rajasthan Technical University, Kota, India
Vivek Shrivastava, National Institute of Technology Delhi, New Delhi, Delhi, India
Kusum Kumari Bharti, Indian Institute of Information Technology, Design and Manufacturing, Jabalpur, Madhya Pradesh, India
Lipo Wang, School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, Singapore

ISSN 2367-3370 ISSN 2367-3389 (electronic)
Lecture Notes in Networks and Systems
ISBN 978-981-99-2099-0 ISBN 978-981-99-2100-3 (eBook)
https://doi.org/10.1007/978-981-99-2100-3

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Preface

This book contains outstanding research papers as the proceedings of the 4th International Conference on Communication and Intelligent Systems (ICCIS 2022), which was held on December 19–20, 2022, at the National Institute of Technology Delhi, India, under the technical sponsorship of the Soft Computing Research Society, India. The conference is conceived as a platform for disseminating and exchanging ideas, concepts, and results among researchers from academia and industry, with the aim of developing a comprehensive understanding of the challenges of advancing intelligence from a computational viewpoint. This book will help in strengthening congenial networking between academia and industry.

This book presents novel contributions in the areas of communication and intelligent systems, and it serves as reference material for advanced research. The topics covered are: intelligent systems: algorithms and applications, intelligent data analytics and computing, informatics and applications, and communication and control systems.

ICCIS 2022 received a significant number of technical contributions from distinguished participants from home and abroad: 410 research submissions in total. After a very stringent peer-reviewing process, only 108 high-quality papers were finally accepted for presentation and the final proceedings. This book presents the first volume of 55 research papers related to Communication and Intelligent Systems.

Harish Sharma, Kota, India
Vivek Shrivastava, New Delhi, India
Lipo Wang, Singapore
Kusum Kumari Bharti, Jabalpur, India


Contents

Network Coverage and Event Detection in Mobile Sensor Networks
  Sunandita Debnath

Attention Guided Human Fall Detection for Elderly Patient Monitoring
  Nazia Aslam, Priyesh D. Hemrom, and Maheshkumar H. Kolekar

SDN-Enabled IoT to Combat the DDoS Attacks
  Pooja Kumari and Ankit Kumar Jain

Analysis of Existing Datasets of Household Objects for AI-Enabled Techniques
  Divya Arora Bhayana and Om Prakash Verma

In Silico Molecular Docking Study by Using Bio-informatics Database to Fabricate M-Cell Targeting Nanocarrier System for Oral Delivery of Macromolecules
  Rahul Maurya, Suman Ramteke, and Narendra Kumar Jain

Open-Source Datasets for Colonoscopy Polyps and Its AI-Enabled Techniques
  Harshita Mangotra, Palak Handa, and Nidhi Gooel

Cloud-Based House Price Predictor App Using Machine Learning
  Amit Kumar

Compendium of Qubit Technologies in Quantum Computing
  Eby Sebastian and Ramesh Chandra Poonia

A Novel Post-quantum Piekert’s Reconciliation-Based Forward Secure Authentication Key Agreement for Mobile Devices
  Chaudhary Dharminder, S. S. Anushaa, S. Naundhini, and M. S. P. Durgarao

An Overview of IoT and Smart Application Environments: Research and Challenges
  Chander Prabha, Sukhwinder Kaur, Jaspreet Singh, and Meena Malik

Human Detection and Tracking Based on YOLOv3 and DeepSORT
  Bhawana Tyagi, Swati Nigam, and Rajiv Singh

Smart City: Road Traffic Monitoring System Based on the Integration of IoT and ML
  Komal Saini and Sandeep Sharma

Weed Detection in Crops Using Lightweight EfficientNets
  Atishek Kumar, Rishabh Jain, and Rudresh Dwivedi

Meta-heuristics for the Single-Channel PMU Placement Problem Considering Zero-Injection-Buses
  K. R. S. V. V. P. P. Narasa Reddy and Anjeneya Swami Kare

Comparative Analysis of Different Machine Learning Approaches for Sentiment Analysis
  Tanvi Desai and Divyakant Meva

A Comprehensive Investigation of Machine Learning Algorithms with SMOTE Integration to Maximize F1 Score
  Surbhi Sharma and Alka Singhal

Detection of Pathological Myopia from Fundus Images
  Sarvat Ali and Shital Raut

Load Frequency Control of Single and Multi-area Power Systems Based on ADRC
  Ovais Farooq, Suhail Ahmad Suhail, and M. A. Bazaz

Pest Detection and Identification in Infested Plants Using Digital Images in Agriculture
  Monica Shinde, Kavita Suryavanshi, and Dhiraj Kumar Kadam

Narrative Paragraph Generation for Photo Stream Using Neural Networks
  M. N. Anjali, Tejash More, Kumari Misa, and Keshab Nath

Stroke Disease Prediction Using Adaboost Ensemble Learning Technique
  Sreenidhi Ganachari and Srinivasa Rao Battula

Real-time Multi-module Student Engagement Detection System
  Pooja Ravi and M. Ali Akber Dewan

A Machine Learning-Based Vulnerability Detection Approach for the Imbalanced Dataset UNSW-NB15
  Koppula Manasa and L. M. I. Leo Joseph

The Metaverse for Enterprises
  Silvia Angeloni

Statistical Sales Forecasting Using Machine Learning Forecasting Methods for Automotive Industry
  S. Sivabalan and R. I. Minu

Deep Learning Techniques for Detecting COVID-19
  Harsha Gaikwad, Manjushree Laddha, Arvind Kiwelekar, Sayali Bhongade, and Akshit Karande

Prediction for Bullish and Bearish Trend in the Price of Stocks Using PCA and LSTM
  Adithya Mohanavel, M. I. Asmath Haseena, and N. Sabiyath Fatima

Vector Control of PMSM Drive in Electric Vehicles Using SVM Regression Approach
  Ashly Mary Tom and J. L. Febin Daya

Key Frame Extraction from Videos Based on SIFT and Structural Similarity
  Paramita De

An Automated System for Rice Plant Diagnosis Using Deep Learning
  Rakesh Meena, Sunil Joshi, and Sandeep Raghuwanshi

Machine Learning Techniques in Data Fusion: A Review
  Muskan Sharma, Priyanka Kushwaha, Pragati Kumari, Pushpanjali Kumari, and Richa Yadav

Hyperparameter Tuning for Edge-IIoT Intrusion Detection Using SMOTE
  Bidyapati Thiyam and Shouvik Dey

Hindi Fake News Fact Checker Using Machine Learning and Deep Learning
  Sohali Baisla, Mehak Aggarwal, Poonam Bansal, and Kiran Malik

A Comprehensive Review on the Identification of Blood-Based Biomarkers for Alzheimer’s Disease Detection Through Computational Approaches
  Ankita Maitra, Pushpendra Kumar, and Manoj Jha

Application of Ensemble Machine Learning Techniques in Yield Predictions of Major and Commercial Crops
  T. R. Jayashree, N. V. Subba Reddy, and U. Dinesh Acharya

A Review on Texture Feature Analysis of Chest Computed Tomography Images for Detection and Classification of Pulmonary Diseases
  Priya Sawant and R. Sreemathy

A Review of Machine Learning Techniques for Tuberculosis Meningitis Diagnosis
  Monali Ramteke, Shital Raut, and Tejal Kadam

Embeddings-Based Parallel Corpus Creation for English-Manipuri
  Gourashyam Moirangthem, Lavinia Nongbri, Ningthoujam Johny Singh, and Kishorjit Nongmeikapam

Collision Avoidance System Using Reinforcement Learning
  Aravindhan Thaninayagam, S. V. Raswanth Prasath, R. S. Hiruthick Roshan, and Darshana Othayoth

Relationship Management in SIoT: A Survey
  M. Shruthi, D. Sendil Vadivu, and Narendran Rajagopalan

A Technique for Finding an Approximate Solution to an Ill-Posed Inverse Problem Using Tikhonov’s Regularization Method
  Van Huyen Le and Liudmila V. Chernenkaya

Optimal Prediction of Heart Disease by Identifying the Type of Chest Pain Using Machine Learning Techniques
  Ghulab Nabi Ahmad, Hira Fatima, Shafiullah, and Arshil Noor

VANET Hybrid Routing Protocol Featuring Perpetual Hopfield Network and Enhanced K-Means Clustering Algorithm
  Anuranj Pullanatt and A. Anitha

Geo Science-Based Optimization Algorithms: A New Paradigm
  Aishwarya Mishra and Lavika Goel

Identifying Incorrect Postures While Performing Sun Salutation Using MoveNet
  Sheetal Girase, Omkar Dutta, Adwait Mahadar, Atharva Ghodmare, and Mangesh Bedekar

Output Feedback Scheme-Based Network Synchronization of a Class of Discrete Time Systems in Chain and Ring Topology
  Ravi Kumar Ranjan, Bharat Bhushan Sharma, and Dipak J. Prajapati

Contrast Enhancement of Medical Images Using Otsu Thresholding
  Kurman Sangeeta, Modalavalasa Divya, and Bammidi Divyajyothi

Color Image Encryption Using Hybrid Three-Scroll Unified Chaotic Attractor and 6D Hyperchaotic System
  Subhashish Pal, Arghya Pathak, Ansuman Mahanty, Hrishikesh Mondal, and Mrinal Kanti Mandal

IoT Adoption in Agriculture: Awareness and Challenges Faced by Rural Farmers in Delta Districts of Tamil Nadu
  S. Arjune and V. Srinivasa Kumar

Optimized Reversible Arithmetic and Logic Unit
  Saroja S. Bhusare, Veeramma Yatnalli, E. Shreyas, Shreeram Aithal, Gayana A. Jain, and O. Sreekaar

A Review on Diagnosis of Breast Cancer Using Mammography Techniques
  Bahareh Nazar Hosseini Saber and Reyhaneh Nazar Hosseini Saber

Pragmatic Way of Analyzing Malware Attacks Detection in IoT Devices Using Deep Learning
  Moushumi Barman and Bobby Sharma

Network Security Risks, Challenges, and Solutions for Underwater Wireless Sensor Network’s Trusted Node-to-Node Communication: A Survey
  D. Jocil and R. Vadivel

False Data Injection Attack Detection in VANET Using Upgraded Grey Wolf Optimization Algorithm Using LSTM Classifier
  M. S. Bennet Praba and R. Rathna

Investigation of the Role of Test Size, Random State, and Dataset in the Accuracy of Classification Algorithms
  Raj Kishor Bisht and Ila Pant Bisht

Author Index

Editors and Contributors

About the Editors

Harish Sharma is an Associate Professor in the Department of Computer Science and Engineering at Rajasthan Technical University, Kota. He has worked at Vardhaman Mahaveer Open University, Kota, and Government Engineering College, Jhalawar. He received his B.Tech. and M.Tech. degrees in Computer Engineering from Government Engineering College, Kota, and Rajasthan Technical University, Kota, in 2003 and 2009, respectively. He obtained his Ph.D. from ABV-Indian Institute of Information Technology and Management, Gwalior, India. He is secretary and one of the founding members of the Soft Computing Research Society of India. He is a lifetime member of the Cryptology Research Society of India, ISI, Kolkata. He is an Associate Editor of the International Journal of Swarm Intelligence (IJSI), published by Inderscience. He has also edited special issues of many reputed journals, such as Memetic Computing, Journal of Experimental and Theoretical Artificial Intelligence, and Evolutionary Intelligence. His primary area of interest is nature-inspired optimization techniques. He has contributed to more than 105 papers published in various international journals and conferences.

Dr. Vivek Shrivastava has approximately 20 years of diversified experience in the scholarship of teaching and learning, accreditation, research, and industrial and academic leadership in India, China, and the USA. He presently holds the position of Dean (Research and Consultancy) at National Institute of Technology Delhi. Prior to his academic assignments, he worked as a System Reliability Engineer at SanDisk Semiconductors in Shanghai, China, and the USA. Dr. Shrivastava has significant experience of collaborating with industry and government organizations; at SanDisk Semiconductors, he made significant contributions to the design and development of memory products. He has contributed to the development and delivery of the Five-Year Integrated B.Tech.-M.Tech. Programme (Electrical Engineering) and the Master's programme (Power Systems) at Gautam Buddha University, Greater Noida. He has extensive experience in academic administration in various capacities, including Dean (Research

and Consultancy), Dean (Student Welfare), Faculty In-charge (Training and Placement), Faculty In-charge (Library), Nodal Officer (Academics, TEQIP-III), Nodal Officer (RUSA), and expert on various committees of AICTE, UGC, etc. Dr. Shrivastava has carried out research and consultancy and attracted significant project funding from the Ministry of Human Resources and Development, Government of India, and the Board of Research in Nuclear Science (BRNS), a subsidiary organization of Bhabha Atomic Research Organization. Dr. Shrivastava has published over 80 journal articles, presented papers at conferences, and published several book chapters. He has supervised five Ph.D. and 16 Master's students and is currently supervising several Ph.D. students. His diversified research interests are in the areas of reliability engineering, renewable energy, and conventional power systems, including wind, photovoltaic (PV), hybrid power systems, distributed generation, grid integration of renewable energy, power systems analysis, and smart grid. Dr. Shrivastava is an Editor/Associate Editor of the International Journal of Swarm Intelligence (IJSI) and the International Journal of System Assurance Engineering and Management. He is a Fellow of the Institution of Engineers (India) and a Senior Member of the Institute of Electrical and Electronics Engineers (IEEE).

Dr. Kusum Kumari Bharti is an Assistant Professor at PDPM IIITDM Jabalpur. She obtained her Ph.D. in Computer Science and Engineering from ABV-IIITM Gwalior. She has guided six M.Tech. students and is presently guiding two Ph.D. students and five M.Tech. students. She has published more than 12 journal and conference papers in the areas of text clustering, data mining, online social networks, and soft computing. She has been an active member of the organizing committees of various conferences, workshops, and faculty development programs. Her research areas include machine learning, data mining, machine translation, online social networks, and soft computing.

Dr. Lipo Wang received his Bachelor's degree from the National University of Defense Technology (China) and his Ph.D. from Louisiana State University (USA). He is presently on the faculty of the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore. His research interest is artificial intelligence with applications to image/video processing, biomedical engineering, and data mining. He has 330+ publications, a U.S. patent in neural networks, and a patent in systems. He has co-authored two monographs and (co-)edited 15 books. He has 8,000+ Google Scholar citations, with an H-index of 43. He was keynote speaker at 36 international conferences. He is/was Associate Editor/Editorial Board Member of 30 international journals, including four IEEE Transactions, and guest editor for 10 journal special issues. He was a member of the Board of Governors of the International Neural Network Society, the IEEE Computational Intelligence Society (CIS), and the IEEE Biometrics Council. He served as CIS Vice President for Technical Activities and Chair of the Emergent Technologies Technical Committee, as well as

Chair of the Education Committee of the IEEE Engineering in Medicine and Biology Society (EMBS). He was President of the Asia-Pacific Neural Network Assembly (APNNA) and received the APNNA Excellent Service Award. He was founding Chair of both the EMBS Singapore Chapter and the CIS Singapore Chapter. He serves/served as chair or committee member of over 200 international conferences.

Contributors Mehak Aggarwal Artificial Intelligence and Data Sciences, IGDTUW, New Delhi, India Ghulab Nabi Ahmad Institute of Applied Sciences, Mangalayatan University, Aligarh, Uttar Pradesh, India Suhail Ahmad Suhail Department of Electrical Engineering, NIT Srinagar, Srinagar, India Shreeram Aithal ECE Department, JSS Academy of Technical Education, Bangalore, Karnataka, India M. Ali Akber Dewan School of Computing and Information Systems, Athabasca University, Alberta, Canada Sarvat Ali Visvesvaraya National Institute of Technology, Nagpur, India Silvia Angeloni Università degli Studi di Milano, Milan, Italy A. Anitha Noorul Islam Centre for Higher Education, Noorul Islam University, Thucklay, India M. N. Anjali Department of Computer Science and Engineering, Indian Institute of Information Technology Kottayam, Kottayam, Kerala, India S. S. Anushaa Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Chennai, India S. Arjune School of Management, SASTRA Deemed to be University, Thanjavur, India Nazia Aslam Video Surveillance Lab, Department of Electrical Engineering, Indian Institute of Technology Patna, Bihta, India M. I. Asmath Haseena Department of Computer Science and Engineering, B.S. Abdur Rahman Crescent Institute of Science and Technology, Chennai, India Sohali Baisla Artificial Intelligence and Data Sciences, IGDTUW, New Delhi, India Poonam Bansal Artificial Intelligence and Data Sciences, IGDTUW, New Delhi, India

Moushumi Barman Department of CSE, Assam Don Bosco University, Guwahati, Assam, India Srinivasa Rao Battula School of Computer Science and Engineering, Vellore Institute of Technology, Andhra Pradesh, Amaravati, India M. A. Bazaz Department of Electrical Engineering, NIT Srinagar, Srinagar, India Mangesh Bedekar Dr. Vishwanath Karad MIT World Peace University, Pune, Maharashtra, India M. S. Bennet Praba Department of CSE, SRMIST, Chennai, India Divya Arora Bhayana Department of Electronics and Communication Engineering, Delhi Technological University, Delhi, India Sayali Bhongade Department of Computer Engineering, Dr. Babasaheb Ambedkar Technological University, Lonere, India Saroja S. Bhusare ECE Department, JSS Academy of Technical Education, Bangalore, Karnataka, India Ila Pant Bisht Department of Economics and Statistics, Government of Uttarakhand, Dehradun, India Raj Kishor Bisht School of Computing, Graphic Era Hill University, Dehradun, India Liudmila V. Chernenkaya Peter the Great St. Petersburg Polytechnic University, St. Petersburg, Russia Paramita De G.L. Bajaj Institute of Technology and Management, Greater Noida, India Sunandita Debnath Department of Electronics and Communication Engineering, Indian Institute of Information Technology Vadodara (IIIT Vadodara), Gandhinagar, India Tanvi Desai Anand Institute of Management and Information Science, Anand, India Shouvik Dey National Institute of Technology Nagaland, Dimapur, India Chaudhary Dharminder Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Chennai, India U. Dinesh Acharya Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India Modalavalasa Divya CSM Department, AITAM, Tekkali, India Bammidi Divyajyothi CSM Department, AITAM, Tekkali, India M. S. P. 
Durgarao Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Chennai, India

Omkar Dutta Dr. Vishwanath Karad MIT World Peace University, Pune, Maharashtra, India Rudresh Dwivedi Netaji Subhas University of Technology, New Delhi, India Ovais Farooq Department of Electrical Engineering, NIT Srinagar, Srinagar, India Hira Fatima Institute of Applied Sciences, Mangalayatan University, Aligarh, Uttar Pradesh, India J. L. Febin Daya Electric Vehicles Incubation, Testing and Research Center, Vellore Institute of Technology, Chennai, India Harsha Gaikwad Department of Computer Engineering, Dr. Babasaheb Ambedkar Technological University, Lonere, India Sreenidhi Ganachari School of Computer Science and Engineering, Vellore Institute of Technology, Andhra Pradesh, Amaravati, India Atharva Ghodmare Dr. Vishwanath Karad MIT World Peace University, Pune, Maharashtra, India Sheetal Girase Dr. Vishwanath Karad MIT World Peace University, Pune, Maharashtra, India Lavika Goel Malaviya National Institute of Technology, Jaipur, India Nidhi Gooel Department of ECE, IGDTUW, Delhi, India Palak Handa Department of ECE, DTU, Delhi, India Priyesh D. Hemrom Department of Mechanical Engineering, Indian Institute of Technology Patna, Bihta, India R. S. Hiruthick Roshan Department of Civil Engineering, National Institute of Technology, Tiruchirappalli, Tamil Nadu, India Ankit Kumar Jain Department of Computer Engineering, National Institute of Technology Kurukshetra, Kurukshetra, India Gayana A. Jain ECE Department, JSS Academy of Technical Education, Bangalore, Karnataka, India Narendra Kumar Jain Rajiv Gandhi Proudyogiki Vishwavidyalaya, Bhopal, Madhya Pradesh, India Rishabh Jain Netaji Subhas University of Technology, New Delhi, India T. R. Jayashree Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India Manoj Jha Department of Mathematics, Bioinformatics and Computer Applications, Maulana Azad National Institute of Technology, Bhopal, India

D. Jocil Department of Information Technology, Bharathiar University, Coimbatore, Tamil Nadu, India Ningthoujam Johny Singh Indian Institute of Information Technology, Manipur, India Sunil Joshi Department of Computer Science and Engineering, Samrat Ashok Technological Institute Vidisha, Vidisha, M.P, India Dhiraj Kumar Kadam Entomology Department, Vasantrao Naik Marathwada Krishi Vidyapeeth, Parbhani, Maharashtra, India Tejal Kadam Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology, Nagpur, India Akshit Karande Department of Computer Engineering, Dr. Babasaheb Ambedkar Technological University, Lonere, India Anjeneya Swami Kare School of Computer and Information Sciences, University of Hyderabad, Hyderabad, India Sukhwinder Kaur SDDIET, Barwala, Panchkula, India Arvind Kiwelekar Department of Computer Engineering, Dr. Babasaheb Ambedkar Technological University, Lonere, India Maheshkumar H. Kolekar Department of Electrical Engineering, Indian Institute of Technology Patna, Bihta, India Amit Kumar Government of Andhra Pradesh, Visakhapatnam, India Atishek Kumar Netaji Subhas University of Technology, New Delhi, India Pushpendra Kumar Department of Mathematics, Bioinformatics and Computer Applications, Maulana Azad National Institute of Technology, Bhopal, India Pooja Kumari Department of Computer Engineering, National Institute of Technology Kurukshetra, Kurukshetra, India Pragati Kumari Indira Gandhi Delhi Technical University for Women, New Delhi, India Pushpanjali Kumari Indira Gandhi Delhi Technical University for Women, New Delhi, India Priyanka Kushwaha Indira Gandhi Delhi Technical University for Women, New Delhi, India Manjushree Laddha Department of Computer Engineering, Dr. Babasaheb Ambedkar Technological University, Lonere, India Van Huyen Le Peter the Great St. Petersburg Polytechnic University, St. Petersburg, Russia

L. M. I. Leo Joseph SR University, Warangal, India Adwait Mahadar Dr. Vishwanath Karad MIT World Peace University, Pune, Maharashtra, India Ansuman Mahanty Department of Physics, Dr. B. C. Roy Engineering College, Durgapur, India Ankita Maitra Department of Mathematics, Bioinformatics and Computer Applications, Maulana Azad National Institute of Technology, Bhopal, India Kiran Malik Artificial Intelligence and Data Sciences, IGDTUW, New Delhi, India Meena Malik Chandigarh University, Mohali, Punjab, India Koppula Manasa SR University, Warangal, India Mrinal Kanti Mandal Department of Physics, National Institute of Technology, Durgapur, India Harshita Mangotra Department of ECE, IGDTUW, Delhi, India Rahul Maurya Rajiv Gandhi Proudyogiki Vishwavidyalaya, Bhopal, Madhya Pradesh, India; National Ayurveda Research Institute for Panchakarma, CCRAS, Cheruthuruthy, Thrissur, Kerala, India Rakesh Meena Department of Computer Science and Engineering, Samrat Ashok Technological Institute Vidisha, Vidisha, M.P, India Divyakant Meva Marwadi University, Rajkot, India R. I. Minu Department of Computing Technology, SRM Institute of Science and Technology, Chennai, India Kumari Misa Department of Computer Science and Engineering, Indian Institute of Information Technology Kottayam, Kottayam, Kerala, India Aishwarya Mishra Malaviya National Institute of Technology, Jaipur, India Adithya Mohanavel Department of Computer Science and Engineering, B.S. Abdur Rahman Crescent Institute of Science and Technology, Chennai, India Gourashyam Moirangthem Indian Institute of Information Technology, Manipur, India Hrishikesh Mondal Department of Physics, Durgapur Government College, Durgapur, India Tejash More Department of Computer Science and Engineering, Indian Institute of Information Technology Kottayam, Kottayam, Kerala, India K. R. S. V. V. P. P. Narasa Reddy School of Computer and Information Sciences, University of Hyderabad, Hyderabad, India

Keshab Nath Department of Computer Science and Engineering, Indian Institute of Information Technology Kottayam, Kottayam, Kerala, India S. Naundhini Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Chennai, India Swati Nigam Department of Computer Science, Banasthali Vidyapith, Banasthali, Rajasthan, India Lavinia Nongbri Indian Institute of Information Technology, Manipur, India Kishorjit Nongmeikapam Indian Institute of Information Technology, Manipur, India Arshil Noor Department of Computer Science Institute of Technology and Management, Aligarh, Uttar Pradesh, India Darshana Othayoth Department of Civil Engineering, National Institute of Technology, Tiruchirappalli, Tamil Nadu, India Subhashish Pal Department of Physics, National Institute of Technology, Durgapur, India; Department of Physics, Dr. B. C. Roy Engineering College, Durgapur, India Arghya Pathak Department of Physics, National Institute of Technology, Durgapur, India Ramesh Chandra Poonia Department of Computer Science, CHRIST (Deemed to be University), Bangalore, Karnataka, India Chander Prabha Chitkara University Institute of Engineering and Technology, Chitkara University, Rajpura, Punjab, India Dipak J. 
Prajapati Government Engineering College, Modasa, Gujarat, India Anuranj Pullanatt Noorul Islam Centre for Higher Education, Noorul Islam University, Thucklay, India Sandeep Raghuwanshi Department of Computer Science and Engineering, Samrat Ashok Technological Institute Vidisha, Vidisha, M.P, India Narendran Rajagopalan Department of Computer Science and Engineering, National Institute of Technology Puducherry, Karaikal, India Monali Ramteke Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology, Nagpur, India Suman Ramteke Rajiv Gandhi Proudyogiki Vishwavidyalaya, Bhopal, Madhya Pradesh, India Ravi Kumar Ranjan Department of Electrical Engineering, National Institute of Technology Hamirpur, Hamirpur, India

S. V. Raswanth Prasath Department of Civil Engineering, National Institute of Technology, Tiruchirappalli, Tamil Nadu, India R. Rathna Department of IT, SRMIST, Chennai, India Shital Raut Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology, Nagpur, India Pooja Ravi Department of Computing Technologies, SRM IST, Kattankulathur, India Bahareh Nazar Hosseini Saber Department of Biomedical Engineering, Islamic Azad University, Tehran, Iran Reyhaneh Nazar Hosseini Saber Department of Biomedical Engineering, Islamic Azad University, Tehran, Iran N. Sabiyath Fatima Department of Computer Science and Engineering, B.S. Abdur Rahman Crescent Institute of Science and Technology, Chennai, India Komal Saini Guru Nanak Dev University, Amritsar, India Kurman Sangeeta CSM Department, AITAM, Tekkali, India Priya Sawant SCTR’s Pune Institute of Computer Technology, Pune, India Eby Sebastian Department of Computer Science, Assumption College, Changanassery, Kerala, India

D. Sendil Vadivu Department of Computer Science and Engineering, National Institute of Technology Puducherry, Karaikal, India Shafiullah Department of Mathematics, K.C.T.C College, BRA Bihar University Muzaffarpur, Raxual, India Bharat Bhushan Sharma Department of Electrical Engineering, National Institute of Technology Hamirpur, Hamirpur, India Bobby Sharma Department of CSE, Assam Don Bosco University, Guwahati, Assam, India Muskan Sharma Indira Gandhi Delhi Technical University for Women, New Delhi, India Sandeep Sharma Guru Nanak Dev University, Amritsar, India Surbhi Sharma Department of CSE and IT, JIIT, Noida, India Monica Shinde D. Y. Patil Institute of MCA and Management, Savitribai Phule Pune University, Akurdi, Pune, Maharashtra, India E. Shreyas ECE Department, JSS Academy of Technical Education, Bangalore, Karnataka, India

M. Shruthi Department of Computer Science and Engineering, National Institute of Technology Puducherry, Karaikal, India Jaspreet Singh Chandigarh University, Mohali, Punjab, India Rajiv Singh Department of Computer Science, Banasthali Vidyapith, Banasthali, Rajasthan, India Alka Singhal Department of CSE and IT, JIIT, Noida, India S. Sivabalan Department of Computing Technology, SRM Institute of Science and Technology, Chennai, India O. Sreekaar ECE Department, JSS Academy of Technical Education, Bangalore, Karnataka, India R. Sreemathy SCTR’s Pune Institute of Computer Technology, Pune, India V. Srinivasa Kumar School of Management, SASTRA Deemed to be University, Thanjavur, India N. V. Subba Reddy Department of Information Technology, Manipal Institute of Technology Bengaluru, Manipal Academy of Higher Education, Manipal, India Kavita Suryavanshi MCA Department, D. Y. Patil Institute of MCA and Management, Savitribai Phule Pune University, Akurdi, Pune, Maharashtra, India Aravindhan Thaninayagam Department of Civil Engineering, National Institute of Technology, Tiruchirappalli, Tamil Nadu, India Bidyapati Thiyam National Institute of Technology Nagaland, Dimapur, India Ashly Mary Tom School of Electrical Engineering, Vellore Institute of Technology, Chennai, India Bhawana Tyagi Department of Computer Science, Banasthali Vidyapith, Banasthali, Rajasthan, India R. Vadivel Department of Information Technology, Bharathiar University, Coimbatore, Tamil Nadu, India Om Prakash Verma Department of Electronics and Communication Engineering, Delhi Technological University, Delhi, India Richa Yadav Indira Gandhi Delhi Technical University for Women, New Delhi, India Veeramma Yatnalli ECE Department, JSS Academy of Technical Education, Bangalore, Karnataka, India

Network Coverage and Event Detection in Mobile Sensor Networks

Sunandita Debnath

Abstract Sensor mobility has recently attracted growing interest as a means of improving network coverage and event/intrusion detection in a deployed wireless sensor network. Because the sensors move continuously, more area is covered within a shorter interval of time, while areas that were covered eventually become uncovered as the sensors move on. The intruder detection probability of the sensor network also improves because the covered and uncovered areas change probabilistically. On the other hand, excessive sensor mobility may have a negative impact on overall network performance. This paper shows the influence of sensor velocity on network coverage and event detection probability. Furthermore, this work accounts for the probabilistic nature of a sensor’s sensing range, which arises from the unpredictable surrounding environment and obstructions in the propagation path. This provides insight into the optimal sensor velocity for achieving effective throughput under unfavorable surrounding environments.

Keywords Coverage fraction · Event detection · Mobile sensor networks (MSN) · Probabilistic sensing model

1 Introduction

In recent years, mobile sensor networks (MSNs) have become a major topic of study because of their advantages over conventional wireless sensor networks (WSNs). MSNs overcome many of the challenges of stationary sensor networks. The performance of WSNs has been widely explored, whereas the performance measures of MSNs have received far less attention. A sensor network whose deployed sensors are equipped with mobilizers can be considered a mobile sensor network.

S. Debnath (B) Department of Electronics and Communication Engineering, Indian Institute of Information Technology Vadodara (IIIT Vadodara), Gandhinagar 382028, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 686, https://doi.org/10.1007/978-981-99-2100-3_1


The sensor nodes can be mounted on mobile robots, vehicles, boats, or animals, etc. Recent research in this field is facing lots of challenges: mobility, battery draining, dynamic topology, unreliable connectivity, node failure, event detection, and many more [1–4]. Coverage and connectivity are the two most typical measures of the services offered by a wireless sensor network. Coverage defines how well every point of the monitored area is under the sensing range of any of the deployed sensor nodes and connectivity represents how accurately the sensed information is transmitted to the sink or destination for further processing. The coverage performance of sensor networks depends on several factors such as the sensing range of the nodes, characteristics of target events, sensing sensitivity of sensors, and nature of the propagation environment. Therefore, the sensing signals from a source sensor or simply a sensor may suffer from path loss, multipath effects, shadowing, interference, and noise from the propagation environment [5]. As a result, the sensing range of a sensor is variable in different directions. Hence, the sensing area of a sensor is also no longer deterministic and depends on the nature of the monitored area of interest (AoI). On the other hand, connectivity is another fundamental aspect of wireless sensor networks. It plays a significant role in the effective functioning of a wireless communication network. For the proper functioning of wireless networks, the sensed data must reach the sink node/base station or the gateway through a single-hop or multi-hop mode of communication [6]. If the connectivity between a deployed sensor node/mote and all other nodes/motes in the network is lost, then this node becomes isolated. The sensed data from an isolated node cannot be reported to the desired destination or the fusion center, and thus, the information is lost [7]. Therefore, coverage without connectivity is inefficacious. 
This work is restricted to coverage issues in MSNs; connectivity concerns are not ruled out, but they are not directly considered. Another crucial aspect of any sensor network is the network lifetime. Lifetime is defined in the literature in several ways, all of which depend on the number of alive nodes: the network is considered nonfunctional or impaired when a single node dies, when a given percentage of nodes die, or when coverage is lost due to node failure or unrestricted mobility of the sensor nodes [8]. The lifetime of the sensors affects the coverage and connectivity of the overall network. Thus, coverage, connectivity, and lifetime are trade-off parameters for designers of sensor networks. With the evolution of technology, the research goal for MSNs is to cope with unpredictable and dynamic environments without human intervention. Introducing mobility to the sensor nodes enhances the coverage and event detection performance of the deployed network. Mobile nodes can also fill coverage holes and thus increase network lifetime. To the best of our knowledge, the related state-of-the-art work on MSNs has considered sensor mobility as a tool for enhancing network performance, while the consequences of extreme sensor velocity have received little attention. Besides this, in the literature,


most of the previous studies on MSN consider the conventional Boolean sensing model of constant sensing range for estimating network performance measures. In this paper, the effect of sensor speed on sensing coverage, event detection probability, and detection time has been discussed considering the non-idealistic probabilistic sensing model. The rest of the article is organized as follows: the network coverage and related definitions are presented in Sect. 2. The network performance metrics: coverage fraction, event detection probability, and detection time using probabilistic sensing model are also presented in Sect. 2. The results and performance analysis are discussed in Sect. 3. Finally, Sect. 4 concludes the paper.

2 System Models

2.1 Sensing Coverage

The terms network coverage, sensing coverage, and coverage fraction serve the same purpose and are used interchangeably in this article. Coverage fraction describes how well each point of the geographic area of interest (AoI) is under the surveillance of the deployed sensor network. An AoI of area $A$ is considered, in which $N$ nodes are deployed according to a Poisson point process. An event is sensed by an arbitrary sensor node if it falls within the sensing region of that sensor. Neglecting the boundary effect, the coverage probability of a sensor with fixed sensing radius $R_s$ is $P_{dec} = \pi R_s^2 / A$. Therefore, the probability that an event or intruder is not detected by any of the $N$ deployed sensors is $P_{ns} = (1 - P_{dec})^N$. The coverage fraction, or coverage probability, equals the probability that the event is detected by at least one of the $N$ deployed sensor nodes:

$$P_c = 1 - (1 - P_{dec})^N \tag{1}$$

As $N$ is very large ($N \gg 1$), (1) can also be approximated as

$$P_c = 1 - e^{-N P_{dec}} = 1 - e^{-N \pi R_s^2 / A} = 1 - e^{-\lambda \pi R_s^2} \tag{2}$$

Here, sensor node density (N /A) is depicted by λ. Therefore, the sensing coverage attained by a sensor network depends on the number of deployed sensors and also on the sensing range of the nodes.
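As a quick numerical check of Eqs. (1) and (2), the following minimal Python sketch compares the exact expression with its exponential approximation. The function names are hypothetical, and the values (a circular AoI of radius 1000 m and $R_s = 50$ m) are borrowed from the simulation setup of Sect. 3 purely for illustration.

```python
import math


def static_coverage_exact(n_sensors, sensing_radius, area):
    """Eq. (1): P_c = 1 - (1 - P_dec)^N with P_dec = pi * R_s^2 / A."""
    p_dec = math.pi * sensing_radius ** 2 / area
    return 1.0 - (1.0 - p_dec) ** n_sensors


def static_coverage_poisson(n_sensors, sensing_radius, area):
    """Eq. (2): P_c = 1 - exp(-lambda * pi * R_s^2) with lambda = N / A."""
    density = n_sensors / area
    return 1.0 - math.exp(-density * math.pi * sensing_radius ** 2)


# Illustrative values only: circular AoI of radius 1000 m, R_s = 50 m.
AREA = math.pi * 1000.0 ** 2
for n in (100, 500, 1000, 2500):
    exact = static_coverage_exact(n, 50.0, AREA)
    approx = static_coverage_poisson(n, 50.0, AREA)
    print(f"N={n:5d}  Eq.(1)={exact:.4f}  Eq.(2)={approx:.4f}")
```

For these values the two expressions differ only in the fourth decimal place, which is why the exponential form is convenient for the remaining derivations.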


In MSNs, the coverage probability of the deployed network depends on the sensors' speed and on the fraction of time an area is under the sensing coverage of a mobile sensor. Because the sensors move continuously, each portion of the area is covered only for an interval of time and alternates between being covered and uncovered; this type of coverage is termed sweep coverage. It is required in applications where a large region of interest (RoI) needs to be monitored within a specific interval of time rather than through continuous monitoring of each portion of the RoI. The coverage probability, or coverage fraction, during the time interval $[t_0, t_1)$ can be expressed as in [9]:

$$P_c = 1 - e^{-\lambda\,\mathrm{Area}(t_1 - t_0)} \tag{3}$$

In Eq. (3), $\mathrm{Area}(t_1 - t_0)$ denotes the area covered during the time interval $[t_0, t_1)$. In a MANET, the mobility of the nodes is envisioned as random or uncontrolled, whereas in mobile sensor networks the mobility of the sensors is supervised. In this work, the mobile nodes are assumed to move only in straight lines and in an entirely uncoordinated manner. A plethora of optimal sensor mobility patterns based on real application scenarios has been studied in the literature [10, 11]. The network coverage during the time interval $[t_0, t_1)$ for MSNs can therefore be expressed as

$$P_c = 1 - e^{-\lambda\{\pi R_s^2 + 2 R_s v_s (t_1 - t_0)\}} \tag{4}$$

where $\mathrm{Area}(t_1 - t_0) = \pi R_s^2 + 2 R_s v_s (t_1 - t_0)$ is the area traversed by a mobile node during the interval $[t_0, t_1)$ [12]. The portion of the area traversed by the mobile nodes at least once during $[t_0, t_1)$ is illustrated in Fig. 1. For mobile sensor networks, the network coverage thus depends not only on the sensor density and the sensing radius but also on the velocity of the mobile nodes. A desired coverage can be achieved by exploiting the controlled mobility of the sensors. On the other hand, mobility can harm the overall network performance: a sensor moving at high velocity is prone to miss an intruding or malicious event, because an intruder must stay within the sensing range of a mobile node for a minimum sensing time in order to be detected. Most existing work on MSNs has used the conventional deterministic sensing model, viz. the Boolean model, for the performance analysis of the deployed mobile nodes. The Boolean sensing model assumes a constant sensing range in all directions and ignores attenuation due to the surrounding propagation environment. A second category, the probabilistic sensing models [13, 14], accounts for the attenuation of the sensing range caused by a dynamic propagation environment and by obstructions in the direction of propagation.
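The dependence of Eq. (4) on sensor speed can be illustrated with a short Python sketch. The function name `mobile_coverage` and the node count of 200 are hypothetical choices made for this example; the AoI radius, sensing radius, and 30 s observation window follow the values quoted in Sect. 3.

```python
import math


def mobile_coverage(density, sensing_radius, speed, duration):
    """Eq. (4): fraction of the AoI covered at least once during [t0, t1).

    density        -- sensor density lambda = N / A (nodes per square metre)
    sensing_radius -- R_s in metres
    speed          -- v_s in metres per second
    duration       -- t1 - t0 in seconds
    """
    swept_area = (math.pi * sensing_radius ** 2
                  + 2.0 * sensing_radius * speed * duration)
    return 1.0 - math.exp(-density * swept_area)


AREA = math.pi * 1000.0 ** 2   # circular AoI of radius 1000 m
density = 200 / AREA           # hypothetical deployment of 200 nodes
for v in (0.0, 2.0, 5.0, 10.0):
    p_c = mobile_coverage(density, 50.0, v, 30.0)
    print(f"v_s={v:4.1f} m/s  P_c={p_c:.4f}")
```

At $v_s = 0$ the expression reduces to the static result of Eq. (2), and the covered fraction grows monotonically with speed because the swept strip $2 R_s v_s (t_1 - t_0)$ grows linearly in $v_s$.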


Fig. 1 Coverage fraction attained by a mobile node within the time interval $[t_0, t_1)$ while moving at a speed of $v_s$

2.2 Probabilistic Sensing Models

The deterministic sensing model, namely the Boolean model, assumes a constant sensing range in all directions [15]; the sensing coverage of a sensor is treated as uniform, like the pattern of an isotropic antenna. Two widely known probabilistic sensing models are the Elfes [16] and Shadow fading [14] sensing models. These models take into account the decay in sensing signal strength due to the propagation environment as well as shadowing caused by obstacles in the propagation path. As a consequence, the sensing capability of a sensor is not uniform in all directions. In our previous work [12, 15], an effective sensing radius (ESR) was derived for the probabilistic sensing models. The ESR accounts for losses due to the adverse propagation environment and fading of the received signal strength, and the ESR proposed in [12] provides a more practical estimate of the sensing radius of a sensor. The network coverage during the time interval $[t_0, t_1)$ achieved by mobile sensors with a probabilistic sensing range can therefore be expressed as

$$P_c^{Elfes} = 1 - e^{-\lambda\{\pi (R_{eff}^{Elfes})^2 + 2 R_{eff}^{Elfes} v_s (t_1 - t_0)\}} \tag{5}$$

and

$$P_c^{Shad} = 1 - e^{-\lambda\{\pi (R_{eff}^{Shad})^2 + 2 R_{eff}^{Shad} v_s (t_1 - t_0)\}} \tag{6}$$

where $R_{eff}^{Elfes}$ and $R_{eff}^{Shad}$ are the effective sensing radii for the Elfes and Shadow fading sensing models, respectively:

$$R_{eff}^{Elfes} = \sqrt{\frac{2}{\gamma_1^2}\left[1 + \gamma_1 R_1 + \frac{\gamma_1^2 R_1^2}{2} - (1 + \gamma_1 R_{max})\, e^{-\gamma_1 (R_{max} - R_1)}\right]} \tag{7}$$

$$R_{eff}^{Shad} = \sqrt{\int_{r=0}^{R_{max}} r\, \mathrm{erfc}\!\left(\frac{10\eta \log_{10}(r/R)}{\sigma\sqrt{2}}\right) dr} \tag{8}$$
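The two effective sensing radii can be evaluated numerically as in the following Python sketch. It assumes that Eq. (7) is the closed form for the Elfes model with $\beta = 1$ and that both radii carry an outer square root, so that $\pi R_{eff}^2$ equals the effective sensing area. The function names, the argument name `r_ref` (standing for the radius $R$ that appears in Eq. (8)), and the midpoint-rule integration are choices made for this illustration only.

```python
import math


def esr_elfes(r1, r_max, gamma):
    """Effective sensing radius of Eq. (7) (Elfes model, beta = 1 assumed).

    r1    -- radius R_1 within which sensing is certain
    r_max -- maximum attainable sensing radius R_max
    gamma -- decay parameter gamma_1 (per metre)
    """
    bracket = (1.0 + gamma * r1 + (gamma * r1) ** 2 / 2.0
               - (1.0 + gamma * r_max) * math.exp(-gamma * (r_max - r1)))
    return math.sqrt(2.0 / gamma ** 2 * bracket)


def esr_shadow(r_ref, r_max, eta, sigma_db, steps=20000):
    """Effective sensing radius of Eq. (8), integrated by the midpoint rule.

    r_ref    -- the radius R appearing inside log10(r / R)
    eta      -- path-loss exponent
    sigma_db -- shadow fading standard deviation in dB (must be > 0)
    """
    h = r_max / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * h
        arg = 10.0 * eta * math.log10(r / r_ref) / (sigma_db * math.sqrt(2.0))
        total += r * math.erfc(arg) * h
    return math.sqrt(total)


print(round(esr_elfes(30.0, 50.0, 0.01), 2))
print(round(esr_shadow(40.0, 50.0, 3.0, 6.0), 2))
```

Two limiting cases make useful sanity checks: with $R_1 = R_{max}$ the Elfes radius collapses to $R_{max}$, and with a very small $\sigma$ the Shadow fading radius approaches the reference radius, i.e., both models fall back to a Boolean disc.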

2.3 Event Detection Probability

The performance of a deployed sensor network also depends on its event or intruder detection capability and on the detection time requirement. In a static sensor network, an event that does not initially fall within the range of any sensor will never be detected. This is not the case in mobile sensor networks: because the mobile nodes continuously move and monitor the area of interest, an event initially outside the sensing coverage of every node will eventually be detected. The probability of an event or intruder being sensed/detected within an interval $t_d$ can be represented as in [12]:

$$P(T \le t_d) = 1 - e^{-\frac{N}{A}\,\mathrm{Area}_{cross}} \tag{9}$$

where $\mathrm{Area}_{cross}$ is the cross-sectional area covered by a moving sensor within the time interval $[t_0, t_1)$. Equation (9) can be represented as

$$P_{det} = P(T \le t_d) = 1 - e^{-2\lambda R v_s t_d} \tag{10}$$

Here, $t_d = (t_1 - t_0)$ represents the sensing time interval and $\lambda = N/A$ represents the sensor density. Thus, the expected time for a static event to be detected by the mobile nodes can be derived from (10) as

$$T_{det} = \int_{t_d=0}^{\infty} e^{-2\lambda R v_s t_d}\, dt_d = \frac{1}{2\lambda R v_s} \tag{11}$$

Similarly, the expected event detection times for the probabilistic models can be expressed as

$$T_{det}^{Elfes} = \frac{1}{2\lambda v_s \sqrt{\frac{2}{\gamma_1^2}\left[1 + \gamma_1 R_1 + \frac{\gamma_1^2 R_1^2}{2} - (1 + \gamma_1 R_{max})\, e^{-\gamma_1 (R_{max} - R_1)}\right]}} \tag{12}$$

$$T_{det}^{Shadow} = \frac{1}{2\lambda v_s \sqrt{\int_{r=0}^{R_{max}} r\, \mathrm{erfc}\!\left(\frac{10\eta \log_{10}(r/R)}{\sigma\sqrt{2}}\right) dr}} \tag{13}$$
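The step from the detection probability to the expected detection time can be checked numerically. The sketch below, with hypothetical function names and an illustrative deployment of 500 nodes, integrates the survival probability $P(T > t_d) = e^{-2\lambda R v_s t_d}$ and compares the result with the closed form $1/(2\lambda R v_s)$. For the probabilistic models, the same functions can be called with an effective sensing radius in place of `radius`.

```python
import math


def detection_probability(density, radius, speed, t_d):
    """P_det = 1 - exp(-2 * lambda * R * v_s * t_d)."""
    return 1.0 - math.exp(-2.0 * density * radius * speed * t_d)


def expected_detection_time(density, radius, speed):
    """Closed form T_det = 1 / (2 * lambda * R * v_s)."""
    return 1.0 / (2.0 * density * radius * speed)


def expected_detection_time_numeric(density, radius, speed, steps=200000):
    """Midpoint-rule integral of the survival probability 1 - P_det.

    The upper limit is truncated at 20 mean detection times, where the
    remaining tail of the exponential is negligible.
    """
    mean = expected_detection_time(density, radius, speed)
    h = 20.0 * mean / steps
    return sum(
        (1.0 - detection_probability(density, radius, speed, (i + 0.5) * h)) * h
        for i in range(steps)
    )


AREA = math.pi * 1000.0 ** 2   # circular AoI of radius 1000 m
density = 500 / AREA           # hypothetical deployment of 500 nodes
closed = expected_detection_time(density, 50.0, 2.0)
numeric = expected_detection_time_numeric(density, 50.0, 2.0)
print(f"closed form: {closed:.3f} s, numerical: {numeric:.3f} s")
```

Because the detection time is exponentially distributed with rate $2\lambda R v_s$, doubling either the density or the speed halves the expected detection time.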


Equations (12) and (13) give the detection time of a stationary event detected by a mobile node under the Elfes and Shadow fading sensing models, respectively. The event detection probability depends on the sensing range, the sensor density, and the velocity of the mobile sensors. Increased sensor velocity can therefore raise the probability of detecting an intruding event and reduce the overall detection time. However, increased velocity can also have an adverse effect on the event detection probability. Almost all real events to be sensed, e.g., chemical radiation, electromagnetic radiation, volcanic eruptions, and biological hazards, require a minimum sensing time ($t_s$) at the sensor nodes. If, because of high sensor mobility, the event stays within the sensing range of the nodes for less than $t_s$, it may slip away undetected. Extreme mobility of the sensors is thus unfavorable for the overall network performance. Moreover, the sensors are low-powered nodes whose batteries are, most of the time, non-rechargeable and non-replaceable. High sensor velocity depletes the battery quickly and thereby reduces the network lifetime. In this work, a new term, effective throughput, is defined to capture the trade-off between coverage performance and detection time as the velocity varies. Effective throughput can be expressed numerically as

$$P^{*} = P_{det} \times P_c \tag{14}$$

$P^{*}$ provides a new performance metric for mobile sensor networks. It reflects that sensor mobility can be exploited to enhance network coverage and event detection probability, while extreme mobility can be unfavorable for the overall detection performance of the network against malicious events.
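A minimal Python sketch of Eq. (14) is given below. It assumes that $P_c$ is taken from Eq. (4) and $P_{det}$ from Eq. (10), evaluated over the same interval $t_d = t_1 - t_0$ and with the same radius; the function name and the deployment of 200 nodes are hypothetical. The minimum sensing time $t_s$ discussed above is not modeled in this sketch.

```python
import math


def effective_throughput(density, radius, speed, duration):
    """Eq. (14): P* = P_det * P_c for one sensing radius and one time window.

    P_c   -- sweep coverage of Eq. (4)
    P_det -- detection probability of Eq. (10), with t_d = duration
    """
    swept_area = math.pi * radius ** 2 + 2.0 * radius * speed * duration
    p_c = 1.0 - math.exp(-density * swept_area)
    p_det = 1.0 - math.exp(-2.0 * density * radius * speed * duration)
    return p_det * p_c


AREA = math.pi * 1000.0 ** 2   # circular AoI of radius 1000 m
density = 200 / AREA           # hypothetical deployment of 200 nodes
for v in (0.0, 1.0, 2.0, 5.0, 10.0):
    p_star = effective_throughput(density, 50.0, v, 30.0)
    print(f"v_s={v:4.1f} m/s  P*={p_star:.4f}")
```

Since $P^{*}$ is a product of two probabilities, it never exceeds $P_c$, and it is zero for a stationary network under this formulation because $P_{det}$ vanishes at $v_s = 0$.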

3 Results and Discussion

This section presents simulation results showing the effect of sensor mobility on the sensing coverage and intruder detection time of the deployed mobile sensor network. MATLAB is used as the simulation tool. The parameters used are listed in Table 1. For the numerical analysis and simulation, the monitored area of interest is assumed to be a circular region of radius 1000 m. The maximum attainable sensing radius $R_{max}$ is taken as 50 m. The number of randomly deployed mobile sensor nodes ($N$) ranges from 0 to 2500. The network coverage ($P_c$) is obtained for a 30 s observation duration.

Table 1 Simulation parameters used for analysis

Parameter                                          Value
Sensing radius of Boolean model                    $R_s$
Maximum attainable sensing range of node           $R_{max}$
Effective sensing range of a node                  $R_{eff}$
Number of sensor nodes                             $N$
Node/sensor density ($N/A$)                        $\lambda$
Pathloss exponent                                  $2 \le \eta \le 4$
Standard deviation (sd) of shadow fading model     $0\ \mathrm{dB} \le \sigma \le 12\ \mathrm{dB}$
Decay parameter for Elfes sensing model            $\gamma$ and $\beta$

Figure 2 shows the variation of both sensing coverage and event/intruder detection time with sensor density ($\lambda$) for the different sensing models at an average sensor speed of $\bar{v}_s = 2$ m/s. In this figure, the solid lines represent the variation of the coverage fraction ($P_c$) with sensor density, and the dashed lines represent the variation of the intruder detection time ($T_{det}$) with sensor density for the Boolean, Shadow fading, and Elfes sensing models, respectively. The coverage fraction increases with sensor density, whereas the time needed to detect an intruder decreases. This is expected: as sensor density increases, more area is covered and intruders are detected in a shorter interval of time. The plots also show that the idealistic Boolean model attains the maximum coverage fraction and the minimum detection time of all the models. For the probabilistic sensing models, the coverage fraction is lower and the detection time is higher than for the idealistic Boolean sensing model. This is due to the variable channel parameters, and the influence becomes more pronounced as the surroundings become more adverse. The plots in Fig. 2 are generated with the nodes moving at a constant average velocity of $\bar{v}_s = 2$ m/s.

Figure 3 shows that the coverage fraction ($P_c$) increases with increasing sensor speed, because more area is covered in a smaller interval of time as a result of the high mobility of the sensors. Each portion of the area, however, is covered only for a shorter interval of time. With increasing sensor speed, the probability of an intruder being detected also increases, i.e., the detection probability of the deployed sensor network increases; put another way, the probability of an event being missed decreases. The plots in Fig. 3 are generated for a 30 s observation period.

Figure 4 shows the variation of effective throughput with sensor velocity for the different sensing models. Effective throughput, as defined in Sect. 2.3, captures the trade-off between coverage performance and detection time as the velocity varies. The mobile sensors are considered to move at speeds of up to $v_s = 10$ m/s. The plots show that the effective throughput increases with sensor speed, but not as gradually as $P_c$. This provides an insight into the optimal velocity of the mobile sensors for achieving effective throughput.


Fig. 2 Sensing coverage with respect to sensor density ($\lambda$) for: a Boolean model with $R_s = 50$ m and sensor velocity $\bar{v}_s = 2$ m/s; b shadow fading parameter $\sigma = 6$ dB and sensor velocity $\bar{v}_s = 2$ m/s; c Elfes decay parameter $\gamma = 0.01$/m, $\beta = 1$ and sensor velocity $\bar{v}_s = 2$ m/s. Detection time with respect to sensor density ($\lambda$) for: d Boolean model with $R_s = 50$ m and sensor velocity $\bar{v}_s = 2$ m/s; e shadow fading parameter $\sigma = 6$ dB and sensor speed $\bar{v}_s = 2$ m/s; f Elfes decay parameter $\gamma = 0.01$/m, $\beta = 1$ and sensor velocity $\bar{v}_s = 2$ m/s

Fig. 3 Variation of both sensing coverage and event detection time with sensor speed ($\bar{v}_s$) for different sensing models

4 Conclusion

In this article, the impact of sensor speed on the overall performance of a mobile sensor network has been analyzed. The sensing coverage and the intruder detection time have been evaluated for the idealistic Boolean model and for the probabilistic models, namely the Elfes and Shadow fading sensing models. The results confirm that with the increase in sensors' speed, more areas will be covered, and also,


Fig. 4 Variation of effective throughput with sensor speed (vs ) for different sensing models

the event/intruder will be detected in a shorter time interval. However, extreme mobility also has adverse effects on the event detection probability. This work defines a new term, effective throughput, which considers the impact of increasing sensor speed on overall network performance. Furthermore, this article provides a glimpse of the coverage performance of mobile sensor networks under a probabilistic sensing range, which gives a more realistic estimate of the network coverage and of the number of sensors required in unknown hazardous environments.

References

1. Etancelin JM, Fabbri A, Guinand F, Rosalie M (2019) DACYCLEM: a decentralized algorithm for maximizing coverage and lifetime in a mobile wireless sensor network. Ad Hoc Netw 87:174–187
2. Basagni S, Carosi A, Petrioli C, Phillips CA (2011) Coordinated and controlled mobility of multiple sinks for maximizing the lifetime of wireless sensor networks. Wirel Netw 17(3):759–778
3. Long H, Liu Y, Wang Y, Dick R, Yang H (2009) Battery allocation for wireless sensor network lifetime maximization under cost constraints. In: Proceedings of the 2009 IEEE/ACM international conference on computer-aided design: digest of technical papers. Association for Computing Machinery
4. Gobel J, Krzesinski AE (2008) A model of autonomous motion in ad hoc networks to maximise area coverage. In: 2008 Australasian telecommunication networks and applications conference. IEEE
5. Debnath S, Hossain A (2019) Network coverage in interference limited wireless sensor networks. Wirel Pers Commun 109(1):139–153
6. Wang X, Xing G, Zhang Y, Lu C, Pless R, Gill C (2003) Integrated coverage and connectivity configuration in wireless sensor networks. In: Proceedings of the first international conference on embedded networked sensor systems (Sensys), Los Angeles, CA
7. Liu X (2006) Coverage with connectivity in wireless sensor networks. In: 3rd international conference on broadband communications, networks and systems, pp 1–8
8. Chen Y, Zhao Q (2005) On the lifetime of wireless sensor networks. IEEE Commun Lett 9(11):976–978
9. Liu B, Dousse O, Nain P, Towsley D (2013) Dynamic coverage of mobile sensor networks. IEEE Trans Parallel Distrib Syst 24(2):301–311
10. Liu M, Cao J, Lou W, Chen L-J, Li X (2005) Coverage analysis for wireless sensor networks. Lecture notes in computer science (LNCS), 3794. Springer, Berlin, pp 711–720
11. Singh A, Sharma TP (2014) A survey on area coverage in wireless sensor networks. In: Proceedings of international conference on control, instrumentation, communication and computational technologies (ICCICCT), Kanyakumari, India, 10–11 July 2014, pp 829–836
12. Debnath S, Hossain A, Chowdhury SM, Singh AK (2018) Effective sensing radius (ESR) and performance analysis of static and mobile sensor networks. Telecommun Syst 68(1):115–127
13. Hossain A, Chakrabarti S, Biswas PK (2012) Impact of sensing model on wireless sensor network coverage. IET Wirel Sens Syst 2(3):272–281
14. Tsai YR (2008) Sensing coverage for randomly distributed wireless sensor networks in shadowed environments. IEEE Trans Veh Technol 57(1):556–564
15. Debnath S, Hossain A (2016) Impact of boundary effect on coverage fraction in wireless sensor network. In: Proceedings of international conference on electrical, electronics, and optimization techniques (IEEE ICEEOT-2016), Tamilnadu, India, 3–5 March, pp 2133–2137
16. Elfes A (1991) Occupancy grids: a stochastic spatial representation for active robot perception. In: Iyenger SS, Elfes A (eds) Autonomous mobile robots: perception, mapping and navigation, vol 1. IEEE Computer Society Press, pp 60–70

Attention Guided Human Fall Detection for Elderly Patient Monitoring Nazia Aslam, Priyesh D. Hemrom, and Maheshkumar H. Kolekar

Abstract Fall has been a significant threat to life if not attended to immediately. Elderly patients have problems with their motor activities, and patients with Alzheimer’s, Dementia, and Autism often suffer from falls. This paper presents a novel video-based fall detection approach using a deep neural network. We have designed an attention-guided CNN-LSTM neural network that detects falls in a video sequence. The convolutional neural network (CNN) is used to extract the spatial features of input data. In contrast, long short-term memory (LSTM) is used to learn the temporal relations between the frames. A multiplicative attention mechanism has been employed after the CNN layer that will extract the enhanced and focused features of input data. After that, an attention map is created based on the output of the context vector and fed to the LSTM layer for temporal feature learning. The experiments are carried out on the UR Fall detection dataset to validate the efficacy of the proposed algorithm. Keywords CNN · LSTM · Attention mechanism · Fall detection · Elderly patient monitoring

N. Aslam (B) Video Surveillance Lab, Department of Electrical Engineering, Indian Institute of Technology Patna, Bihta, India
P. D. Hemrom Department of Mechanical Engineering, Indian Institute of Technology Patna, Bihta, India
M. H. Kolekar Department of Electrical Engineering, Indian Institute of Technology Patna, Bihta, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 686, https://doi.org/10.1007/978-981-99-2100-3_2

1 Introduction

As new-age technologies have brought a massive increase in computational speed, artificial intelligence (AI) has found its way into applications that improve fields beyond human capabilities. The involvement of deep learning in health care, especially in detecting unintentional falls, could help a person receive medical help without delay. As per WHO [1], falls are the second leading cause of unintentional injury deaths worldwide. Each year, an estimated 684,000 (6.84 lakh) individuals die from falls globally, of which over 80% are in developing and underdeveloped countries. In addition, 37 million falls each year are severe enough to require medical attention. Falls can lead to head injuries (the most critical and common) and to hip, shoulder, hand, and wrist fractures. Other major contributors to deaths from falls are circulatory system (47.4%) and respiratory system (17.4%) diseases [2]. Therefore, effective fall detection systems have a significant role in today's health care and surveillance systems. Severe falls are very common among older people who stay alone at home, and they lead to significant injuries. Wearable devices such as accelerometers and gyroscopes can track the movement of a person by sensing position, location, and speed. However, older people or patients may be unfamiliar and uncomfortable with wearing one or more devices, so surveillance cameras play a vital role in monitoring them. The video footage from surveillance cameras can provide rich information about persons and their surroundings [3]. Surveillance cameras are installed at airports, railway stations, bus stands, shopping malls, traffic signals, and streets for the security and safety of human beings. To monitor patients and older people, cameras can be installed at old-age care centers, isolation centers, hospitals, special wards, nursing homes, etc. Therefore, a vision-based system is the need of the hour. Researchers have developed various techniques to detect falls. Hadjadji et al. [4] presented an automatic human fall detection method that generates static and dynamic features representing falls with complementary information. The generated features are fused through the Choquet fuzzy integral technique. Liu et al.
[5] use the human body silhouette to improve privacy protection. The vertical projection histogram of the silhouette image and statistical scheme is used to reduce the movement of the upper body of the human. Then, the K-nearest neighbor algorithm is used to classify between fall and not fall. Muheidat et al. [6] designed a real-time fall detection approach in which several sensors are placed beneath the carpet, and walking behaviors are monitored to identify falls. Wang et al. [7] created a camera-based fall detection approach in which the fall is primarily recognized by computing the widthto-height ratio, then speed and displacement in the vertical direction are estimated to validate the possibility of a fall. The problem with the handcrafted features-based technique is that it cannot model the dynamics of video data. On the other hand, deep learning-based techniques have proven their effectiveness for modeling spatial and temporal features of video data. CNN is the best spatial feature extractor, whereas LSTM models the temporal information [8]. In anomaly detection [9, 10], the combination of CNN-LSTM works well. Therefore, we have utilized the essence of the CNN and LSTM network that allows the model to learn the spatial features of video data and classify them into two categories (fall and not fall). We trained our model to learn the prominent features of video data and applied an attention mechanism that boosts the potential of the proposed framework and detects falls.


2 Related Work

Fall detection methods can be classified into three categories based on the input data: (i) vision-based, (ii) sensor-based, and (iii) multimodal data-based. Vision-based approaches rely on ambient sensors, primarily cameras. Multiple cameras provide views of the same subject from various angles as well as depth information between the subject and the background. Vision-based approaches focus on extracting frames from videos and using silhouettes or bounding boxes to detect falls. Doulamis [11] presents a paradigm for fall detection based on the combined prediction of target object and motion information. To detect falls in various directions from the camera position, the moving vectors at specially chosen locations on the focal plane and the vertical motion of the top limit of the foreground item are computed. Furthermore, [12] proposed a new technique for fall detection based on studying the dynamic shape and motion of human body regions on Riemannian manifolds. Human activities are represented by dynamic shape points and motion points that move on two simple Riemannian manifolds. Min et al. [13] present a method for detecting falls against furniture, such as sofas and chairs. This approach is based on movement features of the identified people, such as motion speed and human shape aspect ratio. A deep learning algorithm known as R-CNN is used to acquire information on the locations and objects in the scene.

Sensor-based fall detection methods rely on wearable devices, which primarily use accelerometers and gyroscopes. The general idea behind a wearable fall detection system is that the sensing unit is embedded in wearable accessories or gadgets that sense the person's significant physical parameters. The raw data is then processed by an algorithm to determine whether a fall has occurred. Wearable devices are mobile and have a simple user interface, but they can be inconvenient for sick and older adults. The system is also prone to failure if the user mishandles the device or forgets to wear it, or if the device stops operating in a specific situation. Musci et al. [14] developed a system for a wearable device that uses an LSTM-based RNN to classify falls online. The model is trained on the "Sisfall" dataset of 38 subjects and 34 activities from their custom measuring belt-type device.

The multimodal approach combines the above techniques (vision-based and sensor-based). Combining both techniques increases accuracy at the cost of higher computational complexity. Systems built with accelerometers, gyroscopes, infrared cameras, ECG, and various temperature and humidity sensors detect motion, vibration, and other parameters, and predict falls using set thresholds for those parameters. Martínez-Villaseñor et al. [15] proposed a multimodal framework using wearable gadgets and vision devices. The model uses LSTM and CNN to extract spatiotemporal features from the raw data and is applied to the UPFall detection dataset. The dataset comprises 17 actors, 12 activities suitable for versatility, an EEG headset, six infrared sensors, dual cameras, and five wearable sensors.

Among these methods, we focus on the vision-based (camera) approach because older adults and patients may feel uneasy while wearing sensors; monitoring them using a camera may therefore be a better choice. We have developed an attention-guided deep neural network for human fall detection. The proposed model is a combination of CNN and LSTM layers. The CNN layers are responsible for spatial feature extraction from the input video data, whereas the LSTM learns the temporal relations among the extracted spatial features. For robust and efficient spatiotemporal feature extraction, a multiplicative attention mechanism is employed between the spatial and temporal features of the video data. Afterward, a context vector is created to provide efficient spatiotemporal features of the video data for fall detection. The UR Fall detection dataset [16] has been used to validate the effectiveness of the proposed algorithm.

3 Proposed Method

This section explains the proposed framework for fall detection, shown in Fig. 1. We have designed an attention-guided CNN-LSTM neural network that classifies between falls and non-falls. An input image of size 224 × 224 × 3 is fed to two Conv2D + pooling layers for spatial feature extraction. A multiplicative attention mechanism is applied to the spatial features to obtain the focused features of the input data. After that, an attention map is created from the context vector and given as input to the LSTM layers for temporal feature learning. A dense layer (128) is used to make a 1D vector, which is then given to the classifier to detect falls.

[Figure 1: block diagram. Input image (224 × 224 × 3) → CNN + ReLU, Pooling → Attention → Attention map → ConvLSTM2D → ConvLSTM2D → Dense (128) → Fall / Not Fall classifier]

Fig. 1 Illustrative diagram of the proposed framework for detecting falls

3.1 Conv2D: Spatial Feature Extraction

The convolutional neural network (CNN) has proven itself in image classification, image segmentation, and other computer vision tasks, and it is among the strongest spatial feature extractors for images. Owing to its sparsity, invariance, and weight-sharing properties, it has become one of the most in-demand neural networks for image processing. We also use CNN layers for spatial feature extraction. Two Conv2D layers of size 128 and 56 with a kernel size of (3 × 3) have been used. Two max-pooling layers with a kernel size of (2 × 2) have also been used after the Conv2D layers to downsample the preceding output. The convolution between the input image and the kernel is computed by the following equation:

$$y(m, n) = \sum_{i=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} h(i, j)\, X(m - i, n - j) \tag{1}$$

Here, $X$ is the input image, $h$ is the kernel, and $y$ is the convolution of $X$ and $h$. The size of the feature maps of a convolutional layer can be computed by the following equation:

$$f_{map}(\text{size}) = \frac{X_{m,n} - h_{i,j} + 2p}{s} + 1 \tag{2}$$

Here, $f_{map}(\text{size})$ indicates the size of the feature maps, $X_{m,n}$ is the input image of height $m$ and width $n$, $h_{i,j}$ is the kernel of height $i$ and width $j$, and $p$ and $s$ stand for padding and stride, respectively.
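As a rough illustration of Eqs. (1) and (2), the following pure-Python sketch computes the output size of a convolution along one axis and a naive "valid" 2D convolution. The function names are hypothetical and are not taken from the authors' implementation. The size formula is written in its standard form, with p read as padding and a trailing +1 term, and the sums in Eq. (1) are restricted to the finite kernel support.

```python
def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    """Feature-map size along one axis: floor((X - h + 2p) / s) + 1."""
    return (input_size - kernel_size + 2 * padding) // stride + 1


def conv2d_valid(image, kernel):
    """Naive 'valid' 2D convolution of a list-of-lists image with a kernel.

    The kernel is flipped, as in Eq. (1). Deep learning libraries usually
    skip the flip and compute cross-correlation instead.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = conv_output_size(len(image), kh)
    out_w = conv_output_size(len(image[0]), kw)
    out = [[0.0] * out_w for _ in range(out_h)]
    for m in range(out_h):
        for n in range(out_w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    # Reading the kernel back to front implements the flip.
                    acc += kernel[kh - 1 - i][kw - 1 - j] * image[m + i][n + j]
            out[m][n] = acc
    return out
```

Assuming no padding and a stride of 1, `conv_output_size(224, 3)` gives 222 for a 3 × 3 kernel on a 224-pixel axis; a following 2 × 2 pooling with stride 2 (an assumption, since the source does not state the pooling stride) gives `conv_output_size(222, 2, stride=2)`, i.e. 111.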

3.2 Attention Mechanism

The attention mechanism is an interface that binds the different feature maps of the CNN layers and creates an attentive output using the context vector. With this mechanism, the model can selectively focus on valuable sections of the input signal and learn the relationships between them, which allows it to handle long input sequences more efficiently. To classify between falls and non-falls, we employ the multiplicative attention mechanism, which is obtained as the weighted sum of the feature vectors of the data, with the attention weights as coefficients. The output vector is referred to as the context vector and is given by the following equation:

$$c(t) = \sum_{j=1}^{N} \alpha_{i,j}\, x_j \tag{3}$$

Here, $x$ is the input feature map, $N$ is the total number of feature maps, and $\alpha_{i,j}$ is the attention weight, computed by the following equation:

$$\alpha_{i,j} = \frac{\exp(e_{ij})}{\sum_{p=1}^{N} \exp(e_{ip})} \tag{4}$$

Here, $e_{ij}$ is the alignment score, computed as the matrix multiplication of the current feature map and the previous feature maps.
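Equations (3) and (4) can be sketched in a few lines of pure Python. This is a minimal sketch under stated assumptions: feature maps are flattened to plain vectors, the alignment score is a dot product between a query vector (standing in for the current feature map) and each stored feature vector, and the names `softmax` and `multiplicative_attention` are hypothetical.

```python
import math


def softmax(scores):
    """Eq. (4): normalize alignment scores into attention weights."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multiplicative_attention(query, features):
    """Return (weights, context) for one query over N feature vectors."""
    # Alignment score e_ij: dot product of the query with each feature vector.
    scores = [sum(q * x for q, x in zip(query, feat)) for feat in features]
    weights = softmax(scores)
    # Eq. (3): context vector as the attention-weighted sum of the features.
    dim = len(features[0])
    context = [
        sum(w * feat[d] for w, feat in zip(weights, features))
        for d in range(dim)
    ]
    return weights, context
```

With a zero query, all scores are equal and the context vector reduces to the plain average of the feature vectors; a query strongly aligned with one feature vector pushes nearly all of the weight onto it.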


3.3 ConvLSTM2D: Temporal Feature Extraction

A ConvLSTM layer is an LSTM layer in which the convolution operation replaces the internal matrix multiplication. Since we are working on video data, extracting the temporal relation between the frames is necessary. Convolution is well established in image processing; therefore, ConvLSTM is well suited to computing the temporal relation between image data. In our proposed framework, we employ two ConvLSTM layers of sizes 56 and 32 after the attention layer to compute the temporal relation between the focused spatial features. The following equations compute the temporal relation between the frames:

$$f_t = \sigma(W_f * [h_{t-1}, I_t] + b_f) \tag{5}$$

$$i_t = \sigma(W_i * [h_{t-1}, I_t] + b_i) \tag{6}$$

$$\hat{C}_t = \tanh(W_c * [h_{t-1}, I_t] + b_c) \tag{7}$$

$$C_t = f_t \odot C_{t-1} + i_t \odot \hat{C}_t \tag{8}$$

$$O_t = \sigma(W_o * [h_{t-1}, I_t] + b_o) \tag{9}$$

$$h_t = O_t \odot \tanh(C_t) \tag{10}$$

Here, $I_t$ is the input vector (the context vector in this case); $C_{t-1}$, $h_{t-1}$, $C_t$, and $h_t$ are the memory of the previous cell, the previous output, the current memory cell, and the current cell output, respectively; and $f_t$, $i_t$, and $O_t$ denote the forget gate, input gate, and output gate, respectively. The symbols $\odot$ and $*$ denote the element-wise (Hadamard) product and the convolution operation, respectively. After computing the spatiotemporal features of the input video data, a dense layer of size 128 is added to make a 1D vector. This vector is then passed through a softmax layer responsible for classifying between falls and non-falls.
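The gate equations (5) to (10) can be traced with a scalar toy version of one recurrent step. This sketch is an illustration only: it replaces the convolution with ordinary multiplication on scalar inputs and states, and the `params` layout (one `(w_h, w_x, b)` tuple per gate) is a hypothetical choice, not the authors' code.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One step of Eqs. (5)-(10) for scalar input and scalar state.

    params maps a gate name ("f", "i", "c", "o") to (w_h, w_x, b).
    """
    def pre_activation(gate):
        w_h, w_x, b = params[gate]
        # Scalar stand-in for W * [h_{t-1}, I_t] + b.
        return w_h * h_prev + w_x * x_t + b

    f_t = sigmoid(pre_activation("f"))        # forget gate, Eq. (5)
    i_t = sigmoid(pre_activation("i"))        # input gate, Eq. (6)
    c_hat = math.tanh(pre_activation("c"))    # candidate memory, Eq. (7)
    c_t = f_t * c_prev + i_t * c_hat          # new cell memory, Eq. (8)
    o_t = sigmoid(pre_activation("o"))        # output gate, Eq. (9)
    h_t = o_t * math.tanh(c_t)                # new output, Eq. (10)
    return h_t, c_t
```

With all weights and biases set to zero, every gate evaluates to 0.5 and the candidate memory to 0, so the step simply halves the previous cell memory, which is a convenient sanity check.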

4 Experiments

4.1 Dataset

We have used the UR Fall detection dataset [16] to train and evaluate the proposed framework. It consists of 70 videos in total, of which 30 are falls and the remaining 40 are activities of daily living (ADL). The dataset subjects are two young men and an older adult. In the 30 fall videos, each subject performs six actions: walking, sitting, lying down, sleeping, bending, and falling, whereas in the remaining 40 videos, the subjects perform all the actions except falling.


Fig. 2 Sample images from UR Fall detection dataset

Samples of all six actions are shown in Fig. 2. Fall events are captured using two Microsoft Kinect cameras and an accelerometer. ADL events are captured using a single device (camera 0) and an accelerometer. PS Move (60 Hz) and x-IMU (256 Hz) devices were used to obtain the sensor data. For our experiments, we have used the RGB data from camera 0, consisting of 320 × 240-pixel images captured at 30 fps.

4.2 Preprocessing

This step prepares the data in the form required by the model. The dataset is provided as videos, from which frames are extracted. Our model does not use the synchronization data or the raw accelerometer data.


The images are resized to 224 × 224 pixels, converted to arrays (with one-hot encoded labels), and split into 75% for training and 25% for validation. Further, data augmentation is performed on the training set using the ImageDataGenerator class from the Keras preprocessing library. For every training epoch, the generator produces a distinct, altered version of each input image. We randomly rotate the images by 45°, 90°, 135°, and 180°, and the horizontal and vertical shifts are set to 0.1. For the validation data, only mean subtraction is performed.
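The 75/25 split and the one-hot label encoding can be sketched without any framework. This is a minimal sketch with hypothetical helper names; the source does not state how the split was drawn, so the seeded random shuffle here is an assumption.

```python
import random


def one_hot(label, classes):
    """Encode a class label as a one-hot list, e.g. 'fall' -> [1, 0]."""
    return [1 if label == c else 0 for c in classes]


def train_val_split(samples, train_fraction=0.75, seed=0):
    """Shuffle the samples reproducibly and split them into two lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Splitting by video rather than by individual frame is a common way to keep near-identical consecutive frames from landing in both sets; the source does not say which of the two was used.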

4.3 Implementation Details

To optimize the cost function of the proposed framework, we have used Adam, a stochastic gradient descent (SGD) method with adaptive moment estimation. During training, we set the epsilon, decay, and learning rate to 1e−5, 1e−4, and 1e−3, respectively. The training is carried out on the UR Fall detection dataset for 50 epochs. The ReLU activation function is used after each convolutional layer, whereas the sigmoid function is used for the classifier. The experiments were run on an HP Z640 workstation with an Intel® Xeon® processor (2.10 GHz × 32), an Nvidia Quadro K1200 GPU, and 64 GB of RAM, running Ubuntu 20.04.3 LTS. 75% of the data is used for training and 25% for validation. The code is written in Python with TensorFlow and Keras.
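To make the role of the three hyperparameters concrete, here is a scalar sketch of the Adam update using the reported epsilon, decay, and learning rate. Several details are assumptions rather than facts from the source: the moment coefficients `beta1` and `beta2` use the commonly cited defaults, and `decay` is interpreted as inverse-time learning-rate decay per update, as in older Keras optimizers. The function name `adam_minimize` is hypothetical.

```python
import math


def adam_minimize(grad_fn, theta, steps, lr=1e-3, decay=1e-4, eps=1e-5,
                  beta1=0.9, beta2=0.999):
    """Run `steps` Adam updates on a scalar parameter and return it."""
    m = 0.0  # running mean of gradients (first moment)
    v = 0.0  # running mean of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction
        v_hat = v / (1 - beta2 ** t)
        lr_t = lr / (1 + decay * (t - 1))  # assumed inverse-time decay
        theta -= lr_t * m_hat / (math.sqrt(v_hat) + eps)
    return theta
```

For example, minimizing f(θ) = θ² from θ = 1 with `adam_minimize(lambda th: 2 * th, 1.0, 100)` moves θ by roughly the learning rate per step, which is the characteristic step-size behaviour of Adam when gradients keep a consistent sign.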

4.4 Results

The performance of the proposed fall detection method is evaluated in terms of accuracy, precision, sensitivity, and specificity. The following equations are used to compute these measures:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{11}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{12}$$

$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{13}$$

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{14}$$


Table 1 Comparative analysis of different state-of-the-art methods on UR Fall dataset

Method                     Accuracy (%)   Precision (%)   Sensitivity (%)   Specificity (%)
Kepski and Kwolek [17]     90.00          83.30           100               80.00
Lotfi et al. [18]          99.24          99.60           99.52             97.38
Rougier et al. [19]        –              –               95.40             95.80
Yun and Gu [20]            –              –               96.77             89.74
Ma et al. [21]             –              –               99.93             91.67
Soni and Choudhary [22]    –              –               98.15             97.10
Proposed method            99.81          99.70           99.65             98.20

Fields marked “–” indicate values not reported by the authors.

Here, TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively. The proposed framework is compared with various state-of-the-art methods in terms of accuracy, precision, sensitivity, and specificity, as shown in Table 1. Some works, such as [19–22], do not report accuracy and precision. We have used accuracy, precision, sensitivity, and specificity as our performance evaluation parameters. Table 1 shows that the proposed framework attains the highest accuracy (99.81%) and precision (99.70%) among the compared methods, which suggests that it handles video data effectively and that the attention mechanism combined with spatiotemporal features works well.
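Equations (11) to (14) translate directly into code. The helper below is a small sketch with a hypothetical name, and the confusion-matrix counts in the usage note are illustrative values, not results reported in the paper.

```python
def fall_metrics(tp, tn, fp, fn):
    """Accuracy, precision, sensitivity, and specificity from raw counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),   # Eq. (11)
        "precision": tp / (tp + fp),                   # Eq. (12)
        "sensitivity": tp / (tp + fn),                 # Eq. (13)
        "specificity": tn / (tn + fp),                 # Eq. (14)
    }
```

For instance, with 30 true positives, 24 true negatives, 6 false positives, and no false negatives, `fall_metrics(30, 24, 6, 0)` yields an accuracy of 0.90, a precision of about 0.833, a sensitivity of 1.0, and a specificity of 0.80.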

5 Conclusion

This paper presents a novel attention-guided deep neural network for vision-based human fall detection. The network combines CNN and LSTM layers. The CNN layers are responsible for spatial feature extraction, whereas the LSTM layers capture the temporal relation between the frames. A multiplicative attention layer is employed after the CNN layers to focus on only the attentive spatial features of the input data. An experiment is conducted on the UR Fall detection dataset to validate the effectiveness of the proposed model. In future studies, we will employ different datasets with the multimodal method.

References

1. World Health Organization. Falls, Oct 2021. https://www.who.int/news-room/fact-sheets/detail/falls
2. Stevens JA, Rudd RA (2014) Circumstances and contributing causes of fall deaths among persons aged 65 and older: United States, 2010. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4707652/
3. Aslam N, Sharma V (2017) Foreground detection of moving object using Gaussian mixture model. In: 2017 international conference on communication and signal processing (ICCSP). IEEE, pp 1071–1074
4. Hadjadji B, Saumard M, Aron M (2022) Multi-oriented run length based static and dynamic features fused with Choquet fuzzy integral for human fall detection in videos. J Vis Commun Image Represent 82:103375
5. Liu C-L, Lee C-H, Lin P-M (2010) A fall detection system using k-nearest neighbor classifier. Expert Syst Appl 37(10):7174–7181
6. Muheidat F, Tawalbeh L, Tyrer H (2018) Context-aware, accurate, and real time fall detection system for elderly people. In: 2018 IEEE 12th international conference on semantic computing (ICSC). IEEE, pp 329–333
7. Wang X, Liu H, Liu M (2016) A novel multi-cue integration system for efficient human fall detection. In: 2016 IEEE international conference on robotics and biomimetics (ROBIO). IEEE, pp 1319–1324
8. Ray A, Kolekar MH (2023) Image segmentation and classification using deep learning
9. Aslam N, Rai PK, Kolekar MH (2022) A3N: attention-based adversarial autoencoder network for detecting anomalies in video sequence. J Vis Commun Image Represent 87:103598
10. Aslam N, Kolekar MH (2022) Unsupervised anomalous event detection in videos using spatiotemporal inter-fused autoencoder. Multimed Tools Appl 1–26
11. Doulamis N (2010) Iterative motion estimation constrained by time and shape for detecting persons’ falls. In: Proceedings of the 3rd international conference on pervasive technologies related to assistive environments, pp 1–8
12. Yun Y, Gu IY-H (2016) Human fall detection in videos by fusing statistical features of shape and motion dynamics on Riemannian manifolds. Neurocomputing 207:726–734
13. Min W, Cui H, Rao H, Li Z, Yao L (2018) Detection of human falls on furniture using scene analysis based on deep learning and activity characteristics. IEEE Access 6:9324–9335
14. Musci M, De Martini D, Blago N, Facchinetti T, Piastra M (2018) Online fall detection using recurrent neural networks. arXiv preprint arXiv:1804.04976
15. Martínez-Villaseñor L, Ponce H, Perez-Daniel K (2019) Deep learning for multimodal fall detection. In: 2019 IEEE international conference on systems, man and cybernetics (SMC). IEEE, pp 3422–3429
16. Kwolek B, Kepski M (2014) Human fall detection on embedded platform using depth maps and wireless accelerometer. Comput Methods Programs Biomed 117(3):489–501
17. Kepski M, Kwolek B (2018) Event-driven system for fall detection using body-worn accelerometer and depth sensor. IET Comput Vis 12(1):48–58
18. Lotfi A, Albawendi S, Powell H, Appiah K, Langensiepen C (2018) Supporting independent living for older adults; employing a visual based fall detection through analysing the motion and shape of the human body. IEEE Access 6:70272–70282
19. Rougier C, Meunier J, St-Arnaud A, Rousseau J (2011) Robust video surveillance for fall detection based on human shape deformation. IEEE Trans Circuits Syst Video Technol 21(5):611–622
20. Yun Y, Gu IY-H (2015) Human fall detection via shape analysis on Riemannian manifolds with applications to elderly care. In: 2015 IEEE international conference on image processing (ICIP). IEEE, pp 3280–3284
21. Ma X, Wang H, Xue B, Zhou M, Ji B, Li Y (2014) Depth-based human fall detection via shape features and improved extreme learning machine. IEEE J Biomed Health Inform 18(6):1915–1922
22. Soni PK, Choudhary A (2019) Automated fall detection from a camera using support vector machine. In: 2019 second international conference on advanced computational and communication paradigms (ICACCP). IEEE, pp 1–6

SDN-Enabled IoT to Combat the DDoS Attacks

Pooja Kumari and Ankit Kumar Jain

Abstract The ubiquitous growth of the Internet of Things has made network security a main concern. The Internet of Things faces various network attacks that disrupt network resources. The network technologies used to mitigate such attacks need to be improved to combat recent attacks. Thus, we propose an approach for detecting and mitigating DDoS network attacks using software-defined networking (SDN) and machine learning. SDN is a network paradigm used to monitor and control network traffic, which helps to manage network resources. In the proposed approach, we use three machine learning models, namely Naive Bayes, logistic regression, and random forest, to classify the network traffic, after which the SDN controller updates the network settings accordingly to mitigate the attack. The traffic generated by the applications is classified by the machine learning models into benign and malicious categories. The classification results are then sent to the SDN controller, which either blocks the attacking traffic or updates the network settings to combat it.

Keywords SDN · Machine learning · DDoS · IoT

P. Kumari (B) · A. K. Jain
Department of Computer Engineering, National Institute of Technology Kurukshetra, Kurukshetra, India
e-mail: [email protected]
A. K. Jain
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 686, https://doi.org/10.1007/978-981-99-2100-3_3

1 Introduction

Network security is becoming an increasingly crucial issue in deciding whether a network technology or application is trustworthy. During a network attack, system resources may be depleted, preventing authorized users from accessing critical services, or private data may be stolen. Software-defined networking (SDN) brings sophisticated features that enhance network administration through an innovative design [1, 2]. It is a new network paradigm that allows network engineers to more easily monitor network traffic, diagnose issues, and integrate and alter security policies. SDN provides an efficient approach for managing and controlling rapidly expanding networks, and license costs are no longer an issue because it allows administrators to write their own programs to customize network administration over general-purpose hardware [3, 4]. SDN has several advantages, including centralized control, public and standard interfaces, and dynamic configuration. It simplifies network construction, implementation, and maintenance by allowing access to the network without disclosing the underlying layer’s specifics [5]. SDN has also been considered in various investigations, such as security, for the Internet of Things (IoT), which is composed of massive and dense networks that provide paths for real-time transmission from anywhere [5, 6].

The IoT is defined as a collection of smart objects with the primary goal of “Connecting the Unconnected.” The embedded smart devices in the IoT observe their current situation, execute common tasks, communicate directly, and coordinate decisions autonomously without human intervention [7]. Because the IoT provides excellent connectivity and simple communication, the number of organizations adopting this technology is rapidly increasing. The rapid spread of IoT applications makes the technology increasingly insecure and vulnerable to attacks. Even though IoT service domains are continuously emerging, security issues remain critical. These security concerns lead to various attacks on the system, and the distributed denial of service (DDoS) attack is expanding rapidly [8]. Attackers seek to exploit these vulnerabilities in order to inflict harm, steal intellectual property, and compromise private information [9, 10]. DDoS attacks are growing tremendously because they are launched from distinct locations and are hence hard to detect.
The attackers compromise IoT devices and turn them into zombies, also known as bots, which they then use via handlers to perform DDoS attacks. The attacks interrupt network services for genuine users who attempt to access them. DDoS attacks can be volumetric, protocol exploitation-based, or amplification attacks [11]. The IoT operates on a variety of networks that integrate both large and small devices. Because tiny devices have minimal processing power and storage capacity, security measures and cryptographic algorithms are difficult to implement on them [12]. Thus, we use the SDN paradigm and machine learning models with the IoT to manage traffic and combat DDoS attacks.

SDN is recognized as one of the most important network architectures in terms of simplicity and progress of network administration and network communication technology [13, 14]. Figure 1 shows the market growth of SDN [15]. SDN provides a centralized controller to manage the network traffic and decouples network control from the data transport plane. SDN-based devices, in addition to being cost-effective, have essentially no compatibility difficulties when employing a vendor-neutral protocol on the southbound interface, such as OpenFlow [16]. OpenFlow creates a channel through which the controller can gather network information and deliver commands to the switch. SDN has several uses, including software-defined data centers. Because a data center must deal with large amounts of data and serve various users, SDN helps to properly manage bandwidth utilization by enabling dynamic routing based on real-time input. With SDN’s centralized control, security policies can be distributed to devices more efficiently.

Fig. 1 SDN market size [15]

SDN has a three-plane architecture with an application plane, a control plane, and a data plane [17]. SDN applications for network administration, policy, and security resources reside in the application plane. The control layer is a unified control framework that runs on the network operating system (NOS) and has a global view of the network. Finally, the data plane forwards traffic flows according to the flow control rules set by the control plane. The SDN architecture consists of three layers and two interfaces, the southbound and northbound interfaces [8, 18]. The control plane is a platform that translates commands from the application plane into terms that the data plane can understand and vice versa. The data plane, also known as the resource layer, is in charge of processing and transmitting data in accordance with the instructions provided by the control layer. The application plane is the topmost layer of SDN; it defines network policies and strategies, reflecting how the entire network is managed and controlled. The application plane communicates with the control plane via the northbound interface (NBI), also called the application-control interface. The NBI enables applications to acquire abstractions of the data plane’s network resources via SDN controllers and to provide policies to controllers to help them make network management decisions. The resource-control interface, also known as the southbound interface (SBI), connects the control plane with the data plane. The SBI enables the SDN controller to dynamically reconfigure the data layer devices and receive data plane signals.
The SDN controller is the brain of this architecture, and several controllers are available, such as RYU, POX, NOX, Floodlight, and OpenDaylight. When interacting with SDN applications, most of these controllers offer a REST [19] application programming interface (API) and use OpenFlow to set up SDN switches [20].

1.1 Problem Definition

Since the IoT is heterogeneous and consists of numerous devices, it has no standard architecture, which makes DDoS attack detection problematic. The proposed approach is a DDoS detection and mitigation model in which there is no need to implement heavy security algorithms on devices with low computational power. SDN provides more flexible setups than conventional networks, and rules can be updated in response to network changes or user requirements more quickly than with manual provisioning in legacy networks. Thus, we propose a model that uses SDN to manage the network traffic and machine learning models to classify it. Cyber-attacks constantly become more severe and complex, which makes them more difficult to counter. The approach makes the use of IoT applications more flexible, as the machine learning models separate the incoming traffic and the SDN controller then controls the whole network traffic.

1.2 Paper Organization

The paper is organized into four sections. The first section provides an introduction to SDN, the IoT, DDoS attacks, and the need for a DDoS detection approach for the IoT. The second section describes related work on defending IoT networks from cyber-attacks. The third section describes the proposed work, which comprises the SDN-enabled IoT architecture and the processing steps of the approach. The fourth section concludes the work and discusses future extensions in this field.

2 Background

Several researchers have employed SDN with the IoT to detect DDoS attacks over IoT networks. This section describes some of the most recent approaches to defending against DDoS attacks.

Ravi and Shalinie [21] proposed the Learning-Driven Detection Mechanism (LEDEM), a novel approach for DDoS detection. It detects DDoS attacks using a semi-supervised machine learning algorithm and uses a decentralized cloud-SDN architecture. The authors divide the approach into three phases: in the first phase, the data is captured; in the second, the attack is detected; and in the third, the detected attack is mitigated. In the first phase, the local controller collects the data packets and discards them after extracting the important features. These features are then provided to the detection model. The researchers designed the Semi-supervised Deep Extreme Learning Machine (SDELM) for DDoS attack detection; SDELM combines ELM with deep layers. After successful detection of the attack, the local controller updates the network topology using the mitigation module, because the local controller has a global view of the whole network.

Bhayo et al. [22] suggested a security architecture based on SDN for identifying IoT vulnerabilities and suspicious traffic. For DDoS attack detection, they employed the Software-Defined Network Wireless Sensor Network (SDNWISE). The approach detects DDoS flooding attempts over IoT networks. The researchers use a session IP counter and IP payload analysis to detect IoT vulnerabilities and maliciously generated traffic. The approach is implemented by simulating the network with an SDN controller. They validated it by generating malicious traffic from a compromised node, then detecting the node and reporting it. The researchers first generated malicious as well as benign traffic and fed it to the detection module to filter the traffic. Once the detection module detects the attack, the attack-generating node is identified and reported to the controller.

To identify DDoS attacks through SDN, Ali et al. [23] developed a hierarchical control plane design for SDN controllers. The authors employed SVM classifiers to detect and categorize the attacking traffic. Along with the classification approach, they used PCA to reduce the dimensionality of the dataset and improve the classifier’s performance. They used the DARPA 2000 dataset for the attack data and the DARPA 1998 dataset for the benign traffic.

Wani and Khaliq [24] proposed an SDN-based intrusion detection system for heterogeneous IoT devices entitled “IDSIoT-SDL.” The authors used deep learning classifiers to detect anomalies in the IoT and tested the approach in a simulated environment using the testing data. For testing purposes, they used one gateway switch or network edge of the IoT. The approach is implemented using Mininet WiFi, with Floodlight as the customized controller. The authors used the CSE-CIC-IDS2018 dataset to train the deep learning classifiers. They trained the classifiers before supplying the data to them and used signature analysis to understand the possible nature of known IoT attacks. The approach is able to detect anomalies in IoT networks as well as in any other network.

Wang et al. [25] developed a multi-task learning-based approach for predicting the traffic of SDN-enabled industrial IoT networks. The authors designed a deep learning architecture for traffic prediction that extracts the spatio-temporal features of the traffic matrix to improve the prediction rate. They used CNN and LSTM models to build the architecture, together with a multi-task learning (MTL) model for accurate traffic prediction. The traffic prediction relies on the linear relation among the network traffic, the routing matrix, and the load on a link. The approach is implemented on a real-time industrial IoT network dataset and predicts network traffic anomalies effectively with a lower deviation.

Revathi et al. [19] developed an approach for DDoS attack prediction and mitigation. To predict the attack, the authors used a discrete scalable memory-based support vector machine (DSM-SVM), together with an SDN mitigation architecture. The pre-processed data is provided to the DSM-SVM algorithm to predict the attack, and the mitigation server then identifies the threat. The authors validated the model using the KDD dataset and used the RYU controller with Mininet to establish the SDN. The proposed approach effectively identifies DDoS attacks on online web services.


To detect DDoS attacks in the IoT, Yang et al. [26] proposed an approach that uses the edge of the software-defined network. The researchers considered the IoT traffic features and edge computing to provide the local services. They implemented the detection and mitigation techniques at the OpenFlow (OF) switches of the IoT network to achieve distributed anomaly detection, which in turn reduces the controller overload. For attack detection, the authors used machine learning models in the OF switches.

Brajones et al. [20] proposed an experimental evaluation model for detecting and mitigating DoS and DDoS attacks in IoT networks using statistical methods. The authors used a stateful SDN data plane to implement the approach, which works in three steps: traffic monitoring, attack detection, and attack mitigation. In the traffic monitoring phase, the authors used OpenFlow and OpenState SDN monitoring methods to get the network traffic information required for DDoS attack detection. In the detection phase, the gathered traffic information is fed to an entropy calculation algorithm that detects the malicious flows in the network. After detecting the attack, the SDN tries to mitigate it by changing the flow tables and applying new flow rules to the switches.

We have also reviewed other existing research to gain a thorough knowledge of the area. Shetu et al. [28] surveyed botnets used for different cyber-attacks and provided an extensive study of the botnet lifecycle and various botnet detection methodologies. Saifuzzaman et al. [29] proposed an intelligent IoT-based streetlight and traffic management automation system using deep learning models. Moreover, Table 1 gives a comparative analysis of various research articles in terms of SDN parameters such as the SDN controller, the SDN architecture, the forwarding devices used by the SDN in the data plane, and the simulator used to emulate the SDN.
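The entropy-based detection step described above for Brajones et al. [20] can be illustrated with a short sketch. The source only states that an entropy calculation flags malicious flows; the choice of destination IP as the monitored feature, the fixed threshold, and the function names below are all assumptions made for illustration.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def is_suspicious(dst_ips, threshold):
    """Flag a traffic window whose destination-IP entropy is below threshold.

    During a flood aimed at one victim, most packets share a destination,
    so the entropy of the destination field drops.
    """
    return shannon_entropy(dst_ips) < threshold
```

A window in which packets are spread evenly over four destinations has an entropy of 2 bits, whereas a window in which nine of ten packets target the same host falls below 0.5 bits and would be flagged with a threshold of 1.0.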

3 Proposed Approach The proposed approach implies a DDoS detection and mitigation model by combining the SDN and the machine learning classifier models. The proposed approach is divided into three major parts, namely: the collection of the generated traffic, applying machine learning models to classify the network traffic, and the implementation of the SDN network paradigm to mitigate the DDoS attack. Figure 2 depicts the SDN-enabled IoT architecture, and Fig. 3 describes the processing steps of the proposed approach with the help of a flowchart. The architecture shows that the IoT devices are connected to a network gateway which will connect the IoT network to the SDN data plane. In the SDN data plane, some forwarding devices are present which will forward the network traffic after the decision of the controller. The SDN controller will update the flow table entries according to the traffic received. In the proposed approach, we will first collect the data from the host of the simulated network as shown in Fig. 4 and extract the features. We have simulated the network using mininet simulator consisting of sixteen hosts, six oven virtual switches

SDN-Enabled IoT to Combat the DDoS Attacks


Table 1 Comparison of the existing work based on the use of SDN controller, architecture, forwarding devices, and simulator

Research work           | SDN controller  | SDN architecture | SDN forwarding devices                  | Simulator
Ravi and Shalinie [21]  | POX             | Decentralized    | Open flow switches                      | Mininet WiFi
Bhayo et al. [22]       | SDNWISE         | Centralized      | Open flow switches                      | COOJA
Wani and Khaliq [24]    | Floodlight      | Centralized      | Open flow switches                      | Mininet WiFi
Revathi et al. [19]     | RYU             | Centralized      | Open flow switches                      | Mininet
Yang et al. [26]        | POX             | Centralized      | Open flow switches, open virtual switch | Mininet
Brajones et al. [20]    | RYU             | Component-based  | Open virtual switch                     | Mininet
Aslam et al. [27]       | POX, Floodlight | Centralized      | Open flow switches                      | Mininet

Fig. 2 SDN-enabled IoT architecture

(OVS), and a Ryu controller. Mininet creates a realistic virtual network, running a real kernel, switches, and application code on a single machine. We then generated the traffic using the hping command, which can create TCP, UDP, ICMP, and IP protocol packets. After generating and collecting the real-time traffic, we divided the data into training and testing datasets to train and test the machine learning models, respectively. We used three machine learning models to detect the attack: Naïve Bayes, logistic regression, and random forest. The machine learning model classifies the network traffic into normal and attack classes and forwards the result to the SDN controller. The SDN controller blocks the malicious traffic and forwards the benign traffic to the network with the help of the forwarding devices.
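The data flow described above (split the labelled traffic, classify each flow, let the controller block or forward) can be sketched in a few lines. This is only an illustration under stated assumptions: the 70/30 split ratio, the packets-per-second feature, the host names, and the fixed threshold that stands in for the Naïve Bayes, logistic regression, and random forest models are all hypothetical.

```python
import random

def split_dataset(records, test_fraction=0.3, seed=0):
    """Shuffle labelled records and split them into training and testing sets."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def controller_actions(flows, classify):
    """Map each flow id to 'drop' (predicted attack) or 'forward' (predicted benign)."""
    return {flow_id: ("drop" if classify(features) else "forward")
            for flow_id, features in flows.items()}

# Hypothetical labelled records: (packets_per_second, label), where label 1 = attack.
records = [(rate, int(rate > 500)) for rate in range(0, 1000, 10)]
train, test = split_dataset(records)

# Stand-in classifier: a fixed rate threshold, used only to keep the sketch short.
def classify(features):
    return features["pps"] > 500

flows = {"h1->h16": {"pps": 40}, "h7->h16": {"pps": 9000}}
actions = controller_actions(flows, classify)
print(actions)
```

In a real deployment the `classify` callable would wrap a trained model, and the resulting actions would be translated into flow rules on the switches.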


Fig. 3 Flowchart of the proposed approach

Fig. 4 Network simulated using mininet

P. Kumari and A. K. Jain


Table 2 Comparison between different machine learning classifiers

Machine learning classifiers | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%)
Naïve Bayes                  | 96.14        | 99.23         | 98.89      | 99.06
Logistic regression          | 99.21        | 99.40         | 99.80      | 99.60
Random forest                | 98.33        | 99.16         | 99.36      | 99.66

During initialization, the SDN controller learns policies from applications and alerts data plane devices if a policy specifies a default action. SDN switches operate in two modes, depending on how flow entries are introduced into the data plane: pro-active and reactive. In pro-active mode, the instructions are distributed to the switch before any packet arrives. This mode resembles a classical network in that there is no contact between the control and data planes during packet transmission, because all conceivable paths have been anticipated and planned. In reactive mode, new packets prompt flow entry insertion: the switch contacts the controller when a packet does not match any current rule, and the controller instructs it on how to handle the packet. Because the controller can automatically update the routing strategy after topology changes, a device can be added to or removed from the network without disrupting it. Pro-active mode covers commonly used paths, whereas reactive mode focuses on responding to network conditions, as a firewall does.
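The difference between the two modes can be made concrete with a toy model. The `Controller` and `Switch` classes below are hypothetical and do not use any real SDN controller API; they only mimic a flow table keyed by destination address.

```python
class Controller:
    """Toy controller that decides an action for a previously unseen destination."""

    def __init__(self, blocked):
        self.blocked = set(blocked)
        self.requests = 0  # number of times a switch had to ask the controller

    def decide(self, dst):
        self.requests += 1
        return "drop" if dst in self.blocked else "forward"


class Switch:
    """Toy switch with a flow table keyed by destination address."""

    def __init__(self, controller, proactive_rules=None):
        self.controller = controller
        # Pro-active mode: rules are installed before any packet arrives.
        self.flow_table = dict(proactive_rules or {})

    def handle(self, dst):
        if dst not in self.flow_table:
            # Reactive mode: a table miss triggers a request to the controller,
            # and the returned action is cached as a new flow entry.
            self.flow_table[dst] = self.controller.decide(dst)
        return self.flow_table[dst]


ctrl = Controller(blocked={"10.0.0.9"})
sw = Switch(ctrl, proactive_rules={"10.0.0.1": "forward"})
results = [sw.handle("10.0.0.1"), sw.handle("10.0.0.9"), sw.handle("10.0.0.9")]
print(results, ctrl.requests)
```

Only the first packet to the unknown destination reaches the controller; the pre-installed rule and the cached entry are both served from the flow table.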

3.1 Result Analysis

This section analyzes the performance of the proposed approach for each machine learning model. Table 2 lists the results obtained from the different classifiers, and Fig. 5 compares them graphically. We can observe from the graph that logistic regression provides the best results, with 99.21% accuracy.
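For reference, the four metrics in Table 2 follow the standard definitions based on true/false positives and negatives, and the F1-score is the harmonic mean of precision and recall. The sketch below recomputes F1 from the precision and recall values reported in Table 2; the confusion-matrix counts in the second part are hypothetical and serve only to exercise the formulas.

```python
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f1_score(p, r):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * p * r / (p + r)

# Precision and recall (%) as reported in Table 2.
table2 = {
    "Naive Bayes": (99.23, 98.89),
    "Logistic regression": (99.40, 99.80),
    "Random forest": (99.16, 99.36),
}
recomputed = {name: round(f1_score(p, r), 2) for name, (p, r) in table2.items()}
print(recomputed)

# Hypothetical counts, not taken from the experiments.
example_accuracy = accuracy(tp=98, tn=97, fp=3, fn=2)
print(example_accuracy)
```

The recomputed F1-scores agree with Table 2 for Naïve Bayes and logistic regression. For random forest the formula gives about 99.26 from the listed precision and recall, whereas the table lists 99.66, so that entry may be worth checking against the original results.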

4 Conclusion

Detecting DDoS attacks on IoT devices is a significant problem because DDoS attacks are carried out by flooding data packets from many places. As a result, the defense against such an attack must act in as little time as possible. In this paper, we have proposed an approach that combines software-defined networking and machine learning classification models with IoT to combat DDoS attacks over IoT networks. Many layered models exist for IoT, but no single standardized framework is available. This creates a loophole, because the defense methods are developed using different frameworks and some of them are not platform-independent. By combining the SDN with IoT, this drawback can be avoided as

[Bar chart for Fig. 5: Accuracy (%), Precision (%), Recall (%), and F1-score (%) for Naïve Bayes, logistic regression, and random forest; vertical axis from 94 to 100]

Fig. 5 Comparison between different machine learning classifiers

SDN works on the traffic flow tables irrespective of the framework and the combination of machine learning and SDN will provide a faster detection and mitigation model for DDoS attacks. The work can be further extended to detect and block the specific nodes from which the attacking traffic is coming.

References

1. Yang L, Zhao H (2018) DDoS attack identification and defense using SDN based on machine learning method. In: 2018 15th international symposium on pervasive systems, algorithms and networks (I-SPAN), Yichang, China
2. Douha NY-R, Bhuyan M, Kashihara S, Fall D, Taenaka Y, Kadobayashi Y (2022) A survey on blockchain, SDN and NFV for the smart-home security. Internet of Things 100588
3. Cai W, Song X, Liu C, Jiang D, Huo L (2021) An adaptive and efficient network traffic measurement method based on SDN in IoT. In: International conference on simulation tools and techniques (SIMUtools 2021)
4. Tsai P-W, Tsai C-W, Hsu C-W, Yang C-S (2018) Network monitoring in software-defined networking: a review. IEEE Syst J 12(4):3958–3969
5. Tayyaba SK, Shah MA, Khan OA, Ahmed AW (2017) Software defined network (SDN) based internet of things (IoT): a road ahead. In: ICFNDS '17: international conference on future networks and distributed systems
6. Tao H, Zain JM, Band SB, Sundaravadivazhagan B, Mohamed A, Marhoon HA, Ogbonnia OO, Young P (2022) SDN-assisted technique for traffic control and information execution in vehicular adhoc networks. Comput Electr Eng 102
7. Taylor S (2013) The next generation of the Internet revolutionizing the way we work, live, play, and learn. CISCO point of view, vol 12, no 6
8. Xu Y, Liu Y (2016) DDoS attack detection under SDN context. In: IEEE INFOCOM 2016: the 35th annual IEEE international conference on computer communications, San Francisco, CA, USA
9. Wani S, Imthiyas M, Almohamedh H, Alhamed KM, Almotairi S, Gulzar Y (2021) Distributed denial of service (DDoS) mitigation using blockchain: a comprehensive insight. Symmetry 13(227):1–21
10. Ye J, Cheng X, Zhu J, Feng L, Song L (2018) A DDoS attack detection method based on SVM in software defined network. Secur Commun Netw 2018(9804061):1–8


11. Tuan NN, Hung PH, Nghia ND, Tho NV, Phan TV (2020) A DDoS attack mitigation scheme in ISP networks using machine learning based on SDN. Electronics 9(413):1–19
12. Sellami B, Hakiri A, Yahia SB (2022) Deep reinforcement learning for energy-aware task offloading in join SDN-blockchain 5G massive IoT edge network. Futur Gener Comput Syst 137:363–379
13. Bawany NZ, Shamsi JA, Salah K (2017) DDoS attack detection and mitigation using SDN: methods, practices, and solutions. Arab J Sci Eng 42:425–441
14. Bull P, Austin R, Popov E, Sharma M, Watson R (2016) Flow based security for IoT devices using an SDN gateway. In: 2016 IEEE 4th international conference on future internet of things and cloud
15. Bhutani A, Wadhwani P (2019) Software defined networking (SDN) market [Online]. Available: https://www.gminsights.com/industry-analysis/software-defined-networking-sdnmarket. Accessed 30 Sept 2022
16. Buragohain C, Medhi N (2016) FlowTrApp: an SDN based architecture for DDoS attack detection and mitigation in data centers. In: 2016 3rd international conference on signal processing and integrated networks (SPIN), Noida, India
17. Yin D, Zhang L, Yang K (2018) A DDoS attack detection and mitigation with software-defined internet of things framework. IEEE Access 6:24694–24705
18. Nam TM, Phong PH, Khoa TD, Huong TT, Nam PN, Thanh NH, Thang LX, Tuan PA, Dung LQ, Loi VD (2018) Self-organizing map-based approaches in DDoS flooding detection using SDN. In: 2018 international conference on information networking (ICOIN), Chiang Mai, Thailand
19. Revathi M, Ramalingam VV, Amutha B (2021) A machine learning based detection and mitigation of the DDOS attack by using SDN controller framework. Wirel Pers Commun 1–25
20. Brajones JG, Murillo JC, Valenzuela-Valdés JF, Valero FL (2020) Detection and mitigation of DoS and DDoS attacks in IoT-based stateful SDN: an experimental approach. Sensors 20(3):816
21. Ravi N, Shalinie SM (2020) Learning-driven detection and mitigation of DDoS attack in IoT via SDN-cloud architecture. IEEE Internet Things J 7(4):3559–3570
22. Bhayo J, Jafaq R, Ahmed A, Hameed S, Shah SA (2021) A time-efficient approach towards DDoS attack detection in IoT network using SDN. IEEE Internet Things J 1–20
23. Ali J, Roh B-H, Lee B, Oh J, Adil M (2020) A machine learning framework for prevention of software-defined networking controller from DDoS attacks and dimensionality reduction of big data. In: 2020 international conference on information and communication technology convergence (ICTC), Jeju, Korea
24. Wani A, Khaliq R (2021) SDN-based intrusion detection system for IoT using deep learning classifier (IDSIoT-SDL). CAAI Trans Intell Technol 6(3):281–290
25. Wang S, Nie L, Li G, Wu Y, Ning Z (2022) A multi-task learning-based network traffic prediction approach for SDN-enabled industrial internet of things. IEEE Trans Ind Inf 1–9
26. Yang Y, Wang J, Zhai B, Liu J (2019) IoT-based DDoS attack detection and mitigation using the edge of SDN. In: International symposium on cyberspace safety and security, pp 3–17
27. Aslam M, Ye D, Tariq A, Asad M, Hanif M, Ndzi D, Chelloug SA, Elaziz MA, Al-Qaness MAA, Ji SF (2022) Adaptive machine learning based distributed denial-of-services attacks detection and mitigation system for SDN-enabled IoT. Sensors 22(7):2697
28. Shetu SF, Saifuzzaman M, Moon NN, Nur FN (2019) A survey of botnet in cyber security. In: 2019 2nd international conference on intelligent communication and computational techniques (ICCT), Jaipur, India
29. Saifuzzaman M, Shetu SF, Moon NN, Nur FN, Ali MH (2020) IoT based street lighting using dual axis solar tracker and effective traffic management system using deep learning: Bangladesh context. In: 2020 11th international conference on computing, communication and networking technologies (ICCCNT), Kharagpur, India

Analysis of Existing Datasets of Household Objects for AI-Enabled Techniques

Divya Arora Bhayana and Om Prakash Verma

Abstract Artificial intelligence is defined as computer-based programs having self-learning capability. Robotics is one of the important application areas of artificial intelligence. Training robots to perform day-to-day activities is a major goal in the field of automation. It is therefore imperative for robotic machines to recognize the numerous objects readily available in a household. With the rapidly increasing demand for automation, the need to train machines has risen sharply. This requires large datasets on which models can be trained and tested for complete validation. Robots that are trained to recognize household items can also be employed to automate many day-to-day chores, which benefits society by helping people with special needs. This study presents the available datasets for training machines to classify, recognize, and detect particular household objects. This work aims to give researchers a clear view of what is available for use. A list of datasets helps researchers use readily available resources when applying novel methods for household automation. The analysis reports the number of classes, the number of images, and the size of each dataset. The applications that have been applied to each dataset are mentioned alongside it. The paper also shows a small snippet of the images included in the datasets.

Keywords Open-source datasets · Image classification · Robotic applications · Evaluation · Household objects

D. A. Bhayana (B) · O. P. Verma Department of Electronics and Communication Engineering, Delhi Technological University, Delhi, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 686, https://doi.org/10.1007/978-981-99-2100-3_4


D. A. Bhayana and O. P. Verma

1 Introduction

There has been enormous development in the automation of household tasks, and many related fields are developing at a rapid pace [1]. Locomotion and balancing form the mechanical aspect of a household robot. It is equally important for the robot to recognize various objects and classify them into predefined categories. This raises the need for a list of image datasets available for the specific purpose of domestic robotics. Whether the task is as trivial as holding a glass of water or as heavy as filling an overhead water tank, the machine needs pre-learned knowledge of the items in its vicinity. For accurate recognition, knowledge of the shape, color, and size of the objects has to be imparted to the robot. In machine learning, this knowledge is transferred to the robot in the form of datasets. There has been strong progress on machines that, in collaboration with humans, accomplish the most difficult household tasks quickly. The next step in household robotics is a completely autonomous robot that can function without human intervention. Such a robot should come as close as possible to a human in recognizing objects and interacting with them [2]. This study lists the available datasets that can be used to make a robot capable of detecting and recognizing various household objects. The listed datasets help programmers make their architectures aware of commonly available domestic items. They range from large datasets of 9 million images [3] to a small dataset with only 64 images [4]. The format of the images in each dataset is also discussed, and the paper makes it convenient to select a dataset on the basis of the intended application. The links and availability of these datasets are given in Table 1. The datasets were found using keywords such as ‘open-source household objects datasets’, ‘household object datasets’, ‘AI datasets of household objects’, and ‘available household objects AI’ on popular search engines such as Mendeley, Zenodo, and Google Scholar. The datasets have been used for various AI-enabled techniques such as object detection, segmentation, and image classification. The paper is organized as follows: the datasets are discussed in Sect. 2, followed by a summarized analysis in Sect. 3. Finally, concluding remarks are presented in Sect. 4.

2 Datasets for Household Objects Classified on the Basis of Application

This section discusses thirteen datasets of household objects for various AI-enabled techniques. All the datasets, their applications, their image snippets, and Web source links are presented.

Analysis of Existing Datasets of Household Objects …


2.1 Annotated Image Dataset of Household Objects from the RoboFEI@Home Team [5]

This dataset is arranged as two sets of household items. Both sets also include a few videos for testing purposes. The images are in JPG format and the videos in MP4 format.

Set 1 Set 1 contains 1028 objects in 13 classes. Of the JPG images, 166 are labeled for training. The images were made using objects taken from local grocery stores, and the brands captured are mostly Brazilian. Apart from the object images, Set 1 also includes 28 MP4 videos for testing the architecture’s learning.

Set 2 The second set consists of 1737 objects distributed over 20 classes, of which 388 images are labeled. The dataset was presented at the RoboCup@Home 2018 OPL competition. For testing purposes, the dataset also provides one long video and 398 unlabeled images.

The RoboFEI@Home team developed a robot named Home Environment Robot Assistant (HERA), which is capable of performing domestic task challenges autonomously. In [6], the authors describe the working of the robot, which has an omnidirectional base, and state that a few improvements are possible to make the robot more versatile in terms of robotic tasks.

2.2 My Nursing Home [7]

This dataset was collected from senior home centers in Malaysia. It contains 37,500 images belonging to 25 classes. The classes consist of objects commonly found in elderly care homes, ranging from small items like a call bell to large objects like refrigerators and beds. Figure 1 shows a small snippet from the dataset with various augmentations of a bin in the care home.

Fig. 1 Small snippet from My Nursing Home dataset


2.3 The Open Images Dataset V4 [3]

The dataset consists of 9.2 million images. It targets three main applications: image classification, object detection, and finding visual relationships between the different objects in an image. The dataset was prepared by taking images from Flickr without prior knowledge of the class names, which led to an unbiased collection of images. This dataset consists of 57 classes. It also incorporates object bounding boxes and image-level labels. The dataset contains complex, cluttered scenes; on average, each image has 8 labeled objects. Image classification on the dataset is done using Inception-ResNetV2. Object detection is evaluated using two robust models. The first is Faster R-CNN with Inception-ResNetV2 as the feature extractor. The second is SSD with MobileNetV2 as the feature extractor, using a depth multiplier and an input image size of 300 × 300 pixels. Kuen et al. [8] propose an auto-encoder weight transfer network (WTN) that uses a reconstruction loss to retain the information of the classification network for all target classes, which improves the generalization of the technique. The technique also normalizes the inputs as well as the features, which helps to avoid under-fitting. This method is reported to give a 6% performance gain on the Open Images dataset. Narayan et al. [9] propose an approach that maintains region-wise characteristics and uses a bi-level attention module to enhance the features by taking both region and holistic context into account. These enhanced region-level features are then aligned with class semantics, and the alignments are spatially pooled to obtain the class predictions.

2.4 Office-Home Dataset [10]

The dataset is organized into four genres: artistic images, clip art, object images, and real-world images. Figure 2 shows a few images from the four genres. Each genre has 65 object classes, and the dataset has 15,500 images across these classes. The genres correspond to 4 domains: art (paintings and hand-made sketches), clipart (digital computerized images), product (objects without a background), and real-world (casual pictures taken by a camera). The authors of [10] introduce the dataset, called ‘office-home’, and propose a method in which the last layer of the classifier outputs hash codes instead of a probability vector. Venkateswara et al. claim two major advantages for this method: (1) it yields a distinct loss function for the target data; (2) during the final stage of prediction, the hash values of the test set can be compared with those of the training set, which gives a robust model. Additionally, they put forward domain adaptive hashing, which generates hash codes for unsupervised domain adaptation.
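The idea of comparing test hash codes with training hash codes can be illustrated with a nearest-neighbour lookup under Hamming distance. This is a minimal sketch and not the method of [10]: the 8-bit codes and labels are hypothetical, and in practice the codes would be produced by the trained network.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("hash codes must have the same length")
    return sum(x != y for x, y in zip(a, b))

def predict_by_hash(test_code, train_codes):
    """Return the label of the training hash code closest to `test_code`."""
    label, _ = min(train_codes, key=lambda item: hamming(test_code, item[1]))
    return label

# Hypothetical (label, code) pairs standing in for hashed training images.
train_codes = [("chair", "11110000"), ("mug", "00001111"), ("lamp", "10101010")]
prediction = predict_by_hash("00011111", train_codes)
print(prediction)
```

Because binary codes are compared bit by bit, such lookups are cheap, which is one practical appeal of hashing-based prediction.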


Fig. 2 Small snippet from office-home dataset

Wen et al. [11] introduced a method to extract domain-invariant local feature patterns, exploiting local features for unsupervised domain adaptation. They also studied the performance of different convolutional layers experimentally and found that conv5_3 performs best. In [12], Liang et al. propose a generic method for unsupervised domain adaptation called source hypothesis transfer (SHOT). This method keeps the classifier module intact and learns the feature extraction module for the target domain. To align the features of the target and the source, the method uses both information maximization and self-supervised pseudo-labeling. Long et al. [13] aim to increase the transferability of a classifier by proposing the conditional adversarial domain adaptation network (CDAN). This technique uses two strategies: multi-linear conditioning, which captures the cross-covariance between feature representations and classifier predictions, and entropy conditioning. The variant with entropy conditioning is called CDAN + E, where E denotes entropy conditioning. When applied to the office-home dataset, this method is reported to achieve higher accuracy on the transfer tasks than the other methods compared.
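The two conditioning strategies mentioned for CDAN can be sketched numerically. The sketch assumes the commonly cited formulation in which multi-linear conditioning is the outer product of the feature vector and the prediction vector, and entropy conditioning weights each example by 1 + exp(-H(g)), where H(g) is the entropy of the prediction g. The vectors below are hypothetical.

```python
import math

def entropy(probs):
    """Entropy (in nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def entropy_weight(probs):
    """Weight 1 + exp(-H(g)): confident predictions receive larger weights."""
    return 1.0 + math.exp(-entropy(probs))

def multilinear_map(features, probs):
    """Flattened outer product of a feature vector and a prediction vector."""
    return [f * g for f in features for g in probs]

confident = [0.98, 0.01, 0.01]     # hypothetical confident prediction
uncertain = [1 / 3, 1 / 3, 1 / 3]  # hypothetical maximally uncertain prediction
joint = multilinear_map([0.5, -1.0], confident)

print(entropy_weight(confident), entropy_weight(uncertain))
print(len(joint))
```

The weighting lets uncertain predictions contribute less to the adversarial alignment, which is the intuition behind the "+ E" variant.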

2.5 ORBIT: A Real-World Few-Shot Dataset for Teachable Object Recognition [14]

This dataset is a collection of videos recorded by 77 blind or low-vision people on their mobile phones. It is one of the most realistic datasets, with 3822 videos covering 486 everyday objects. It differs from similar datasets in the natural way in which it was accumulated, and it contains all sorts of anomalies in how the object to be detected is portrayed. For example, it has many blurred and partially covered images, with backgrounds ranging from cluttered to clean and plain. Figure 3 shows an example of a watering can in a plain as well as a cluttered environment. The authors of [15] introduce FiLM Transfer (FiT), which addresses a few drawbacks of previous models related to training with small datasets and communication-efficient distributed training protocols. It uses a Naïve Bayes classifier on top of a backbone pre-trained on a large dataset. It is reported to give better results for image classification when applied to the ORBIT dataset.


Fig. 3 Watering can in plain background and cluttered environment from the dataset

In [16], ‘Bongard-HOI: Benchmark for Human Object Interaction (HOI)’ is introduced, which focuses on few-shot learning and visual relationship reasoning. It is an effort toward a holistic perception-reasoning system and better representation learning. Bronskill et al. [17] propose the large image and task episodic training scheme (LITE), a general and memory-efficient training scheme that enables meta-training on large tasks composed of large images on a single GPU. This scheme is applied to the ORBIT dataset, where it produces high accuracy.

2.6 Household Objects for Pose Estimation (HOPE) [4]

This dataset, provided by NVIDIA, was collected using 28 toy grocery items. Toy items were used so that the objects have a standard size and weight for the robotic arm to operate on. The dataset comprises RGB images and videos. The images are taken by mounting the camera on the robotic arm. Various lighting conditions are taken into account, with as many as 5 different lighting variations in a given scene. The angle at which light is projected onto the objects is also varied, so that one object can cast a shadow on other objects. Ten household or office environments have been used to capture objects in 50 scenes. Tyree et al. [18] proposed the dataset, called ‘HOPE’, for 6-DoF pose estimation of household objects. The dataset aims to bridge the gap between machine learning and robotics. On this dataset, pose estimation is done using the deep object pose estimation (DOPE) and CosyPose techniques. This paper opens the avenue to estimate 2D and 3D pose for hand-held objects by encoding the image and applying a cascade of two graph convolutional neural networks. Figure 4 shows an example of the images present in the dataset.
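A 6-DoF pose combines three rotational and three translational degrees of freedom. The sketch below shows how such a pose maps a point from the object frame into the camera frame. To stay short it uses a rotation about the z-axis only, which is a simplifying assumption; all numeric values are hypothetical and unrelated to the HOPE annotations.

```python
import math

def pose_from_yaw(yaw_deg, translation):
    """Pose (R, t) built from a rotation about the z-axis and a translation."""
    c = math.cos(math.radians(yaw_deg))
    s = math.sin(math.radians(yaw_deg))
    rotation = [[c, -s, 0.0],
                [s, c, 0.0],
                [0.0, 0.0, 1.0]]
    return rotation, list(translation)

def transform(point, pose):
    """Map a 3D point from the object frame into the camera frame: R @ p + t."""
    rotation, t = pose
    return [sum(rotation[i][j] * point[j] for j in range(3)) + t[i]
            for i in range(3)]

# Hypothetical object pose: rotated 90 degrees, 0.5 m in front of the camera.
pose = pose_from_yaw(90.0, (0.1, 0.0, 0.5))
corner_in_camera = transform([0.02, 0.0, 0.0], pose)
print(corner_in_camera)
```

A pose estimator such as those named above would output the rotation and translation; the transform itself is the same regardless of how they were obtained.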


Fig. 4 Image from NVIDIA HOPE dataset

Fig. 5 Snippet of CMU kitchen occlusion dataset

2.7 CMU Kitchen Occlusion Dataset (CMU_KO8) [19]

This dataset has 1600 images of 8 household items, most of which are texture-less. The images are taken in a cluttered kitchen environment and include occluded views of the objects in question. Various object detection models have been developed using this dataset and are claimed to provide high accuracy in real-world scenarios as well. The 1600 images are divided equally into two categories of 800 images each. One part contains the images of the objects that define the ground truth classes. The remaining 800 images are cluttered images in which the objects occlude various other objects. This dataset can be used for occlusion detection. Figure 5 shows examples of the images present in the dataset. Yu et al. [20] proposed an approach to detect texture-less objects. The approach was applied to the CMU kitchen occlusion dataset because of the highly cluttered images it contains, which also allows the robustness of the algorithm to be checked. This research proposes a fast algorithm that uses a binarized orientation compressing map and discriminative regional weights. It works in two stages. First, the quantized gradient orientation is compressed using a circular window with strides, which generates a set of possible object locations. Second, discriminative regional weights are applied to these locations to differentiate between objects that have similar-looking parts. The experiments indicate


that the proposed approach is suitable for real-time texture-less object detection and gives results comparable to the existing state-of-the-art algorithms under similar conditions. The authors of [21] propose an algorithm called ‘bounding oriented-rectangle descriptors for enclosed regions’, abbreviated as BORDER, for texture-less object recognition. It aims to handle texture-less object detection within a descriptor-based pipeline. Detection starts by dividing long lines into small fragments called ‘linelets’, which helps in the case of occlusion and stabilizes interest-point shifts. The next step is to plot rectangles enclosing the objects to be detected, by incorporating descriptor-based local patches into larger rectangles. The rectangles are then rotated at unique angles to fit the viewer’s angle. BORDER is applied to multi-sized rectangles to account for sub-sectional enclosure, after which the cut regions go through linear sampling to form the descriptors of their class. A randomized KD-forest technique is then used to match the BORDER descriptors. In [22], Hsiao and Hebert propose an algorithm for shape matching, especially in occluded environments, by treating it as traversing a path in a low-level gradient network. For every pixel, the algorithm calculates the probability that it matches the template. This has been shown to give better accuracy in shape matching and object detection when applied to the CMU kitchen occlusion dataset. Hsiao and Hebert [23] propose to express occlusion reasoning as an efficient search over occlusion blocks that best explain the probabilistic matching pattern. It has been shown to give better results on the CMU kitchen occlusion dataset. In [24], Tombari et al. propose a feature that can be incorporated into a scale-invariant feature transform (SIFT)-like object detection pipeline. Because of the limitations encountered with texture-less objects, the detection and description stages are modified accordingly.
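Several of the texture-less detection methods discussed in this section build on quantized gradient orientations. The sketch below shows only that generic building block, not the binarized orientation compressing map of Yu et al. or any other specific pipeline. The number of bins, the magnitude threshold, and the test patch are assumptions.

```python
import math

def quantized_orientations(image, bins=8, min_magnitude=1.0):
    """Quantize gradient orientation at interior pixels into `bins` levels.

    Orientation is taken modulo 180 degrees so that edge polarity is ignored.
    Border pixels and pixels with weak gradients are marked with None.
    """
    h, w = len(image), len(image[0])
    out = [[None] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = image[y][x + 1] - image[y][x - 1]  # central differences
            gy = image[y + 1][x] - image[y - 1][x]
            if math.hypot(gx, gy) < min_magnitude:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            out[y][x] = int(angle / (180.0 / bins)) % bins
    return out

# Hypothetical 4x4 grayscale patch containing a vertical edge.
patch = [[0, 0, 9, 9] for _ in range(4)]
q = quantized_orientations(patch)
print(q[1][1], q[1][2], q[0][0])
```

Quantized orientations are attractive for texture-less objects because they depend on object contours rather than on surface texture.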

2.8 CMU Grocery Dataset (CMU10_3D) [25]

This dataset covers 10 grocery items and consists of 620 images, taken in natural environments, kitchen areas, and highly cluttered areas. There are 62 images per object. An instance refers to an occurrence of the object of interest in the complete image. For each object, 25 images contain one instance of the object, the next 25 contain two instances, and the remaining 12 images provide the full 6D poses of the object. The background is also marked in each image. Figure 6 shows the organization of images in the dataset.
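The composition described above is easy to check arithmetically. The dictionary keys in this short sketch are labels chosen here for illustration; only the counts come from the description.

```python
objects = 10
images_per_object = {"one_instance": 25, "two_instances": 25, "full_6d_pose": 12}

per_object_total = sum(images_per_object.values())  # images for a single object
dataset_total = objects * per_object_total          # images in the whole dataset

print(per_object_total, dataset_total)
```

The totals match the 62 images per object and 620 images overall stated for the dataset.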


Fig. 6 Snippet of CMU grocery dataset

Fig. 7 Snippet of RoboCup dataset

2.9 RoboCup@Home-OBJECTS Benchmark [26]

This dataset is hierarchical in nature. It comprises 8 parent classes, which are further divided into 180 child classes. It consists of 198 K images of various grocery items found in the kitchen, including objects of comparable shape. RoboCup@Home collects datasets that are generally published in online competitions and assembles a collection of objects of similar genre. For a given application, the data can be augmented and then split into training and testing sets. Figure 7 shows the different instances of objects captured in the dataset. In [27], the authors list the robotic challenges fulfilled by the participants in RoboCup, where various tasks are assigned for domestic robotics. The participants used the dataset to make the tasks feasible. The work [28] studies benchmark advancements in the field of domestic robotics.
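A hierarchical label set and a train/test split of the kind mentioned above can be sketched as follows. The class names, file names, and the 80/20 ratio are hypothetical; the real benchmark has 8 parent classes and 180 child classes.

```python
import random
from collections import defaultdict

# Hypothetical parent -> child classes, used only for illustration.
hierarchy = {"drinks": ["cola_can", "juice_box"], "snacks": ["chips", "cookies"]}
child_to_parent = {c: p for p, children in hierarchy.items() for c in children}

def stratified_split(samples, test_fraction=0.2, seed=0):
    """Split (image_id, child_class) pairs so every class appears in both parts."""
    by_class = defaultdict(list)
    for image_id, label in samples:
        by_class[label].append(image_id)
    rng = random.Random(seed)
    train, test = [], []
    for label, ids in sorted(by_class.items()):
        rng.shuffle(ids)
        n_test = max(1, int(len(ids) * test_fraction))
        test += [(i, label) for i in ids[:n_test]]
        train += [(i, label) for i in ids[n_test:]]
    return train, test

samples = [(f"{c}_{k}.jpg", c) for c in child_to_parent for k in range(10)]
train, test = stratified_split(samples)
print(len(train), len(test), child_to_parent["chips"])
```

Splitting per child class keeps rare classes represented in both parts, and the parent lookup allows evaluation at either level of the hierarchy.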

2.10 ADE20K Dataset [29]

The database is aimed at scene understanding and places. It can be used for image segmentation, classification, part segmentation, and similar tasks. Various objects can be recognized from the images in the database, and many household objects appear as part of cluttered scenes. This dataset comprises 150 object classes, with 20,210 images for training, 2000 images for validation, and 3000 images for testing.


The major focus of this dataset is on applications like scene parsing, that is, recognizing and segmenting the objects of interest in an image. For the experiments on this dataset, 150 object classes are selected. Of these, 35 are stuff classes like wall, street, and floor, and 115 are object classes like table and car. Zhou et al. [29] apply scene parsing and part segmentation to the ADE20K dataset. For scene parsing, three segmentation networks are trained on the SceneParse150 benchmark: SegNet [30] (semantic segmentation), FCN-8s [31] (fully convolutional network), and DilatedNet [32, 33]. A cascade segmentation module is proposed that integrates SegNet and DilatedNet. The results are reported using four metrics: pixel accuracy (the proportion of correctly classified pixels) is 74.52%, mean accuracy (the proportion of correctly classified pixels averaged over all classes) is 45.38%, mean IoU (intersection over union averaged over all classes) is 0.3490, and weighted IoU (IoU weighted by the total pixel ratio of each class) is 0.6108. Part segmentation is carried out by selecting 8 object categories whose parts occur frequently in the dataset and are explicitly labeled. The part categories of these 8 object categories are then screened from the dataset, and a part stream is trained on Cascade-DilatedNet. The metric used here is average accuracy, which is reported to be 55.47%.
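The four scene parsing metrics defined above can all be derived from a confusion matrix. The sketch below follows the definitions given in the text; the two-class confusion matrix is hypothetical, and the code assumes every class has at least one ground-truth pixel.

```python
def segmentation_metrics(conf):
    """Compute the four metrics from a confusion matrix.

    conf[i][j] is the number of pixels whose true class is i and predicted class is j.
    """
    n = len(conf)
    total = sum(sum(row) for row in conf)
    true_counts = [sum(conf[i]) for i in range(n)]
    pred_counts = [sum(conf[i][j] for i in range(n)) for j in range(n)]
    correct = [conf[i][i] for i in range(n)]

    per_class_acc = [correct[i] / true_counts[i] for i in range(n)]
    iou = [correct[i] / (true_counts[i] + pred_counts[i] - correct[i])
           for i in range(n)]
    return {
        "pixel_accuracy": sum(correct) / total,
        "mean_accuracy": sum(per_class_acc) / n,
        "mean_iou": sum(iou) / n,
        "weighted_iou": sum(true_counts[i] / total * iou[i] for i in range(n)),
    }

# Hypothetical confusion matrix for two classes (for example "wall" and "table").
conf = [[80, 20],
        [10, 90]]
metrics = segmentation_metrics(conf)
print(metrics)
```

Pixel accuracy is dominated by frequent classes, while mean accuracy and mean IoU give every class equal weight, which helps explain why the reported mean values are much lower than the pixel accuracy.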

2.11 OSLD Dataset [34]
As the name suggests, the open set logo detection (OSLD) dataset is aimed at systems that can recognize the brands of products available in the market. It highlights the need for such a system, which can be used to protect the authenticity of a logo. The dataset consists of product images with brand logos taken from e-commerce websites. The logos are labeled to train a recognition architecture, and bounding boxes are provided for the logos. The product images cover a wide range of items found in a day-to-day household environment. Figure 8 depicts images present in the dataset.

2.12 Bottles and Cups Dataset
This dataset consists of images of bottles and cups made of different materials. The images were captured by more than 3000 different people on their mobile phones and combined into a real-world dataset. The images were collected under different lighting conditions and are close to actual real-world scenarios. The dataset has HD-resolution images of highly diverse types of items. Figure 9 shows a snippet of the dataset.

Analysis of Existing Datasets of Household Objects …


Fig. 8 Snippet of OSLD dataset

Fig. 9 Snippet of bottles and cups dataset

2.13 PhoCaL Dataset [35]
This dataset presents a collection of photometrically challenging objects, hence the name PhoCaL. Photometrically challenging refers to objects that are highly reflective, transparent, or symmetric in shape. The dataset comprises 60 high-quality three-dimensional models of objects found in household environments. It also takes into account occlusions that block the complete view of an object, as well as the clutter found in day-to-day scenarios. The objects are grouped into 8 categories. The dataset consists of RGBD + RGBP images, that is, images that include depth as well as polarization information. The dataset aims at pose estimation of objects present in the household. Figure 10 gives examples of images in the dataset.


Fig. 10 Snippet of PhoCaL dataset

3 Summarized Analysis
The analysis of all datasets is presented in Table 1. For each dataset, it lists the number of classes, the number of images, and the image format. The year of release and the applications for which the dataset can be used are also listed. The PhoCaL dataset is the most recent. Datasets with a few hundred to a few thousand images are considered small, whereas datasets comprising hundreds of thousands of images are bucketed as large. Among the listed datasets, the Open Images V4 dataset is the largest, containing 9 million images; the RoboCup@Home-OBJECTS dataset and the open set logo detection dataset can also be considered large. All the datasets have been subjected to various techniques, yielding results that benefit the field of domestic robotics. We observe that the Open Images V4 dataset, which has the largest number of images, also spans the largest number of object classes in the household domain. The Office-Home and ORBIT datasets have a wide set of applications, which can be explained by the versatility of the classes they contain. The table also shows that the annotated image dataset for household objects takes 3.85 GB of space but consists of only 13 classes. In contrast, the Office-Home dataset has 65 classes but accounts for only 1.1 GB. The resolution of the images accounts for the storage space a dataset occupies. The listed datasets should prove helpful for training models for household robotics.
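The storage comparison above can be made concrete by dividing each dataset's size by its number of images, using the values listed in Table 1. This is only a rough sketch: the conversion 1 GB = 1000 MB is an assumption, and the figure is an average that ignores annotation files and compression differences.

```python
# Size and image count as listed in Table 1.
datasets = {
    "RoboFEI@Home annotated dataset": {"size_gb": 3.85, "images": 1028, "classes": 13},
    "Office-Home": {"size_gb": 1.1, "images": 15500, "classes": 65},
}


def mb_per_image(size_gb, images, mb_per_gb=1000):
    """Average storage per image in MB (assumes 1 GB = 1000 MB)."""
    return size_gb * mb_per_gb / images


per_image = {
    name: round(mb_per_image(d["size_gb"], d["images"]), 2)
    for name, d in datasets.items()
}
for name, value in per_image.items():
    print(f"{name}: about {value} MB per image")
```

Under these assumptions the first dataset averages several megabytes per image while Office-Home averages well under a tenth of a megabyte, which supports the point that image resolution, rather than the number of classes, drives the storage footprint.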

4 Conclusion
This paper lists the available datasets required to train a robust model for household automation. The technology has many application domains: it can help in ordinary households, it can be applied to elderly care, and it may also assist differently abled people with day-to-day tasks. The datasets discussed cover a large number of categories of domestic items and also account for varying lighting conditions, image angles, and intra-category variations of a product.

Table 1 Comparison of existing datasets of household objects for AI-enabled techniques

S. No. | Name | Applications | Image format | No. of images | No. of classes | Size | Link | Year | References
1 | Annotated image dataset for household objects from RoboFEI@Home team | Object detection, object recognition, image segmentation | .JPG, .MP4 | 1028 | 13 | 3.85 GB | https://ieee-dataport.org/open-access/annotated-image-dataset-householdobjects-robofeihome-team | 2020 | [5]
2 | My Nursing Home | Image classification, object detection | .JPG | 37,500 | 25 | 3 GB | https://data.mendeley.com/datasets/fpctx3svzd/1 | 2020 | [7]
3 | The open images dataset V4 | Image classification | .JPEG | 9M | 600 | 565.11 GB | https://www.tensorflow.org/datasets/catalog/open_images_v4 | 2020 | [3]
4 | Office-home dataset | Object detection, image segmentation, image classification | .JPG | 15,500 | 65 | 1.1 GB | https://www.hemanthdv.org/officeHomeDataset.html | 2017 | [10]
5 | ORBIT | Object detection, image segmentation, image classification | .PNG | 3822 (videos) | 486 | 76.6 GB | https://github.com/microsoft/ORBITDataset | 2021 | [14]
6 | Household objects for pose estimation | Image classification, pose estimation | .PNG | 64 | 28 | 208 MB | https://github.com/swtyree/hope-dataset | 2021 | [4]
7 | CMU kitchen occlusion | Object detection in occluded environment, texture-less object detection | .JPG | 1600 | 8 | 213 MB | https://www.cs.cmu.edu/~ehsiao/datasets.html | 2014 | [19]
8 | CMU grocery dataset | Object detection with multiple instances | .JPG | 620 | 8 | 83.9 MB | https://www.cs.cmu.edu/~ehsiao/datasets.html | 2014 | [19]
9 | Robocup@Home-OBJECTS benchmark | Image classification, usage in real-time robotics | .JPG | 196 K | 143 | 1.7 GB | https://sites.google.com/diag.uniroma1.it/robocupathome-objects | 2010 | [25]
10 | ADE20K | Scene parsing, semantic understanding | .JPG | 25,210 | 262 | 1.6 GB | https://groups.csail.mit.edu/vision/datasets/ADE20K/ | | [29]
11 | OSLD | Object recognition, image classification, object detection | .PNG | 20.8 K | 12.1 K | 2.3 GB | https://github.com/mubastan/osld | 2020 | [34]
12 | Bottles and cups | Image classification, object recognition | .JPG | 100 | 2 | 219.5 MB | https://www.kaggle.com/datasets/dataclusterlabs/bottles-and-cups-dataset | 2021 | [36]
13 | PhoCaL dataset | Depth analysis, object detection, image classification, pose estimation | RGBP + RGBD | 60 (3D models) | 8 | 1 GB | https://paperswithcode.com/paper/phocal-a-multi-modal-dataset-forcategory | 2022 | [35]


References
1. Srivastava S, Li C, Lingelbach M, Martín-Martín R, Xia F, Vainio KE, Lian Z, Gokmen C, Buch S, Liu K et al (2022) Behavior: benchmark for everyday household activities in virtual, interactive, and ecological environments. In: Conference on robot learning, PMLR, pp 477–490
2. Ribeiro T, Gonçalves F, Garcia IS, Lopes G, Ribeiro AF (2021) CHARMIE: a collaborative healthcare and home service and assistant robot for elderly care. Appl Sci 11(16):7248
3. Kuznetsova A, Rom H, Alldrin N, Uijlings J, Krasin I, Pont-Tuset J, Kamali S, Popov S, Malloci M, Kolesnikov A et al (2020) The open images dataset v4. Int J Comput Vis 128(7):1956–1981
4. Lin Y, Tremblay J, Tyree S, Vela PA, Birchfield S (2021) Multi-view fusion for multi-level robotic scene understanding. In: 2021 IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 6817–6824
5. Techi RDC, Gonbata MY, Meyer TSB, Neves LC, Perez BDFV, Yaguiu WY, Gazignato LI, Marostica G, Amorim JGR, de Souza RM et al (2019) RoboFEI@home team 2019: team description paper for the @home league at LARC/CBR
6. Junior PTA, Perez BDFV, Meneghetti R, Pimentel FDAM, Marostica GN (2019) Hera: home environment robot assistant. In: II BRAHUR and III Brazilian workshop on service robotics
7. Ismail A, Ahmad SA, Soh AC, Hassan MK, Harith HH (2020) MYNursingHome: a fully-labelled image dataset for indoor object classification. Data Brief 32:106268
8. Kuen J, Perazzi F, Lin Z, Zhang J, Tan Y-P (2019) Scaling object detection by transferring classification weights. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 6044–6053
9. Narayan S, Gupta A, Khan S, Khan FS, Shao L, Shah M (2021) Discriminative region-based multi-label zero-shot learning. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 8731–8740
10. Venkateswara H, Eusebio J, Chakraborty S, Panchanathan S (2017) Deep hashing network for unsupervised domain adaptation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5018–5027
11. Wen J, Liu R, Zheng N, Zheng Q, Gong Z, Yuan J (2019) Exploiting local feature patterns for unsupervised domain adaptation. In: Proceedings of the AAAI conference on artificial intelligence, vol 33, pp 5401–5408
12. Liang J, Hu D, Feng J (2020) Do we really need to access the source data? Source hypothesis transfer for unsupervised domain adaptation. In: International conference on machine learning, PMLR, pp 6028–6039
13. Long M, Cao Z, Wang J, Jordan MI (2018) Conditional adversarial domain adaptation. In: Advances in neural information processing systems, vol 31
14. Massiceti D, Zintgraf L, Bronskill J, Theodorou L, Harris MT, Cutrell E, Morrison C, Hofmann K, Stumpf S (2021) ORBIT: a real-world few-shot dataset for teachable object recognition. In: Proceedings of the IEEE/CVF international conference on computer vision (ICCV)
15. Shysheya A, Bronskill J, Patacchiola M, Nowozin S, Turner RE (2022) Fit: parameter efficient few-shot transfer learning for personalized and federated image classification. arXiv preprint arXiv:2206.08671
16. Jiang H, Ma X, Nie W, Yu Z, Zhu Y, Anandkumar A (2022) Bongard-HOI: benchmarking few-shot visual reasoning for human-object interactions. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 19056–19065
17. Bronskill J, Massiceti D, Patacchiola M, Hofmann K, Nowozin S, Turner R (2021) Memory efficient meta-learning with large images. In: Advances in neural information processing systems, vol 34, pp 24327–24339
18. Tyree S, Tremblay J, To T, Cheng J, Mosier T, Smith J, Birchfield S (2022) 6-DoF pose estimation of household objects for robotic manipulation: an accessible dataset and benchmark. arXiv preprint arXiv:2203.05701
19. Hsiao E, Hebert M (2014) Occlusion reasoning for object detection under arbitrary viewpoint. IEEE Trans Pattern Anal Mach Intell 36(9):1803–1815


20. Yu H, Qin H, Peng M (2018) A fast approach to texture-less object detection based on orientation compressing map and discriminative regional weight. Algorithms 11(12):201
21. Chan J, Lee JA, Kemao Q (2016) Border: an oriented rectangles approach to texture-less object recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2855–2863
22. Hsiao E, Hebert M (2013) Gradient networks: explicit shape matching without extracting edges. In: Twenty-seventh AAAI conference on artificial intelligence
23. Hsiao E, Hebert M (2013) Coherent occlusion reasoning for instance recognition. In: 2013 2nd IAPR Asian conference on pattern recognition. IEEE, pp 1–5
24. Tombari F, Franchi A, Di Stefano L (2013) Bold features to detect texture-less objects. In: Proceedings of the IEEE international conference on computer vision, pp 1265–1272
25. Hsiao E, Collet A, Hebert M (2010) Making specific features less discriminative to improve point-based 3D object recognition. In: 2010 IEEE computer society conference on computer vision and pattern recognition. IEEE, pp 2653–2660
26. Massouh N, Brigato L, Iocchi L (2019) RoboCup@Home-objects: benchmarking object recognition for home robots. In: Robot world cup. Springer, pp 397–407
27. Iocchi L, Holz D, Ruiz-del Solar J, Sugiura K, Van Der Zant T (2015) RoboCup@Home: analysis and results of evolving competitions for domestic and service robots. Artif Intell 229:258–281
28. Holz D, Iocchi L, Van Der Zant T (2013) Benchmarking intelligent service robots through scientific competitions: the RoboCup@Home approach. In: 2013 AAAI spring symposium series
29. Zhou B, Zhao H, Puig X, Fidler S, Barriuso A, Torralba A (2017) Scene parsing through ADE20K dataset. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 633–641
30. Badrinarayanan V, Kendall A, Cipolla R (2017) SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans Pattern Anal Mach Intell 39(12):2481–2495
31. Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), June 2015
32. Chen L-C, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2017) DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans Pattern Anal Mach Intell 40(4):834–848
33. Yu F, Koltun V (2015) Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122
34. Bastan M, Wu H-Y, Cao T, Kota B, Tek M (2019) Large scale open-set deep logo detection. arXiv preprint arXiv:1911.07440
35. Wang P, Jung HJ, Li Y, Shen S, Srikanth RP, Garattoni L, Meier S, Navab N, Busam B (2022) PhoCaL: a multi-modal dataset for category-level object pose estimation with photometrically challenging objects. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 21222–21231
36. datacluster.ai (2017) Bottles and cups. In: Bottles and cups, pp 5018–5027

In Silico Molecular Docking Study by Using Bio-informatics Database to Fabricate M-Cell Targeting Nanocarrier System for Oral Delivery of Macromolecules

Rahul Maurya, Suman Ramteke, and Narendra Kumar Jain

Abstract Safe and effective drug delivery is a prime objective of the formulation and drug development process. Carbohydrate-binding receptors, such as lectins, have a carbohydrate recognition site and multiple domains for binding carbohydrates such as mannose, fucose, maltose, glucose, and mannopyranosyl phenyl isothiocyanate. These receptors are a target for carbohydrate-conjugated systems designed to improve bioavailability. The in silico study explores different bioactive carbohydrate molecules that selectively interact with carbohydrate-binding receptors. Carbohydrate-binding receptors are generally expressed in gut-associated lymphoid tissue (GALT), which favors the cellular uptake of peptides and other bioactive molecules, such as nucleic acids, proteins, and phytoconstituents. Furthermore, this route increases the bioavailability of proteins, peptides, and herbal drugs by avoiding first-pass metabolism. The ligand–receptor interaction offers a new approach to designing ligand-conjugated nanocarrier systems. The proposed work is the in silico identification of a bioactive ligand that selectively binds to carbohydrate-binding receptors (CBRs). In this work, ligands are selected using the drug-likeness parameters of Lipinski's rule. The selected carbohydrate ligands are docked with the CBRs, and the pharmacokinetic parameters of the ligands with minimum binding energy are determined. The study also includes ADME and bioavailability radar analysis to predict the bioactive molecule for designing a novel carrier system for receptor-mediated drug targeting. The polymer surface was modified using the selected ligand, and the surface-modified polymer was docked with the protein to characterize the ligand–receptor interaction. The proposed nanocarrier will be used to design a targeted oral delivery system.

Keywords Bio-informatics · In silico molecular docking · Ligand–receptor interaction

R. Maurya (B) · S. Ramteke · N. K. Jain
Rajiv Gandhi Proudyogiki Vishwavidyalaya, Bhopal, Madhya Pradesh 462033, India
R. Maurya
National Ayurveda Research Institute for Panchakarma, CCRAS, Cheruthuruthy, Thrissur, Kerala 679531, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 686, https://doi.org/10.1007/978-981-99-2100-3_5


1 Introduction
In silico studies are a main pillar of formulation and drug development. A drug binds to selective targets such as proteins, nucleic acids, and lipid layers; thus, in silico molecular docking is essential to the drug design process. Molecular docking is based on specific algorithms and provides a score that reveals the ligand-target interaction. Different algorithms, such as Monte Carlo simulated annealing (SA) and the Lamarckian genetic algorithm (LGA), have been developed and are most commonly used in molecular docking studies to find interactions between drugs and targets [1]. A non-modified nanocarrier improves the chemical and therapeutic activity of a drug by protecting it from gastric and enzymatic degradation and prolonging its systemic retention time. However, carrier non-selectivity limits its applications. Carbohydrate-binding receptors (CBRs) expressed on macrophage cells (M-cells) of gut-associated lymphoid tissue (GALT) are the most desirable site for receptor-mediated endocytosis and lymphatic drug delivery [2]. Mannosylated nanocarriers (MCs) enable the targeted oral delivery of bioactive molecules via CBRs [3]. The types of ligands present on the surface of nanocarriers play an essential role in drug delivery. Mannose receptors (MRs) are CBRs belonging to the C-type lectin receptor family found in humans and expressed in GALT, including M-cells of the spleen and lymph nodes (subcapsular sinus) [3]. M-cells have transmembrane glycoprotein C-type lectin receptors. These receptors have multiple domains for binding carbohydrates, such as mannose, fucose [4], glucose, and galactose [5]. Therefore, mannose-conjugated nanocarriers are a promising approach for targeting M-cell CBRs. Currently, numerous mannose-conjugated nanocarrier systems, such as liposomes [6], micelles [7], nanostructured lipid carriers [8], and nanoparticles [9], have been formulated for M-cell targeted drug delivery [10].

2 Material and Methods
2.1 Preparation of Ligands
Different carbohydrate compounds were downloaded in SDF format from the PubChem (nih.gov) [11] database (Fig. 1). α-Mannopyranosyl phenyl isothiocyanate (MPPIT) is used in the design of mannosylated nanocarriers for mannose receptor targeting in various targeted drug delivery systems [12, 13]. MPPIT was drawn using ChemDraw [14], converted into a 3D structure, and saved in SDF format. For the docking study, the ligands were converted from SDF format into pdbqt format with the help of Open Babel software [15]. Energy minimization was performed using the universal force field (UFF) in PyRx software. The conjugate gradient algorithm was selected to optimize the ligands in 200 steps.


Fig. 1 a Protein 1HUP, different ligands, b α-d mannose, c α-d maltose, d α-d fucose, e α-d glucose, f α-d mannopyranosyl phenyl isothiocyanate, g chitosan, h mannosylated chitosan

2.2 Preparation of Protein
The carbohydrate-binding protein, classified as a C-type lectin protein (PDB ID 1HUP), was obtained from the Protein Data Bank (http://www.rcsb.org) [16] in PDB format [17, 18]. The protein structure was prepared using Discovery Studio software [19]. The bound ligand, water molecules, and heteroatoms were removed from the protein, and polar hydrogens were added; the protein was then saved again in pdb format for docking analysis.

2.3 In Silico Docking Study
The prepared protein and ligands were selected for molecular docking using PyRx software [20]. The protein molecule was assigned as the macromolecule and converted into pdbqt format. After macromolecule selection, the ligands were imported and processed for energy minimization. The energy minimization used the UFF force field and the conjugate gradient optimization algorithm with 200 steps. After that, all the ligands were converted into pdbqt format and were ready for the docking study. The AutoDock Vina tool in PyRx software was run, with the macromolecule and multiple ligands selected for docking [21]. For ligand binding, the x, y, and z dimensions of the grid box were each set to 60, and a centered grid box was used for the docking. The algorithm gave different docking models for the ligand–protein interactions. The model with the minimum energy was selected and saved in pdb format. This file was further analyzed in Discovery Studio to find the drug-receptor interactions and the types of amino acids involved in bond formation.
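The grid-box setup described above can be sketched outside the PyRx interface as follows. This is an illustrative assumption rather than the authors' workflow: it assumes the box is centered on the geometric center of the receptor atoms and that the dimensions are in ångströms, neither of which is stated in the text. The helper names, the file names, and the three toy atoms are hypothetical; the configuration keys (`receptor`, `ligand`, `center_*`, `size_*`) are those of an AutoDock Vina configuration file.

```python
def pdb_atom_line(serial, name, res_name, chain, res_seq, x, y, z):
    """Format one fixed-column PDB ATOM record (coordinates in columns 31-54)."""
    return "ATOM  {:5d} {:<4s} {:3s} {:1s}{:4d}    {:8.3f}{:8.3f}{:8.3f}".format(
        serial, name, res_name, chain, res_seq, x, y, z
    )


def geometric_center(pdb_lines):
    """Average the x, y, z coordinates of all ATOM/HETATM records."""
    coords = [
        (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        for line in pdb_lines
        if line.startswith(("ATOM", "HETATM"))
    ]
    count = len(coords)
    return tuple(sum(c[axis] for c in coords) / count for axis in range(3))


def vina_config(receptor, ligand, center, box_size=60.0):
    """Build the text of a Vina configuration file with a cubic search box."""
    lines = [f"receptor = {receptor}", f"ligand = {ligand}"]
    for axis, value in zip("xyz", center):
        lines.append(f"center_{axis} = {value:.3f}")
    for axis in "xyz":
        lines.append(f"size_{axis} = {box_size:.1f}")
    return "\n".join(lines)


# Hypothetical three-atom "receptor", used only to exercise the functions.
toy_pdb = [
    pdb_atom_line(1, "N", "ALA", "A", 1, 0.0, 0.0, 0.0),
    pdb_atom_line(2, "CA", "ALA", "A", 1, 10.0, 20.0, 30.0),
    pdb_atom_line(3, "C", "ALA", "A", 1, 20.0, 40.0, 60.0),
]
toy_center = geometric_center(toy_pdb)
config_text = vina_config("1hup_prepared.pdbqt", "ligand.pdbqt", toy_center)
print(config_text)
```

A 60-unit cube is a generous search space for small sugar ligands; a box that large amounts to blind docking over a substantial part of the protein surface rather than docking into a predefined pocket.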


2.4 In Silico Physicochemical and Pharmacokinetic Property Evaluation
Rule of Five
Lipinski's rule of five is applied to determine the drug-likeness of a compound, and it indicates whether the selected compound is likely to be orally active. For oral bioavailability, a molecule should have no more than five hydrogen-bond donors, no more than ten hydrogen-bond acceptors, a molecular weight below 500 Da, and a calculated log P (CLog P) not exceeding five [22]. Bioavailability is the rate and extent to which a drug reaches the systemic circulation from the site of administration. These physicochemical parameters define aqueous solubility and intestinal permeability, both of which influence oral bioavailability. The drug-likeness rule of five is a qualitative approach in drug design for screening drug and nondrug molecules for oral bioavailability [23]. In the proposed work, the ligands were evaluated against Lipinski's rule of five using a bio-informatics and computational biology tool (http://www.swissadme.ch) [24]. Compounds that obeyed all of Lipinski's rules were selected for further analysis.
In Silico ADME Analysis
Pharmacokinetic ADME (absorption, distribution, metabolism, and excretion) studies of the ligands were performed with the help of SwissADME (http://www.swissadme.ch). The data and information obtained from this study were helpful in the formulation and development process.
Bioactivity Score and Bioavailability Radar
The canonical SMILES of all ligands were obtained from PubChem. The bioavailability radar of each ligand, generated with SwissADME (http://www.swissadme.ch), was used to assess the oral bioactivity of the compound.
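The rule-of-five screen described in this section reduces to four threshold checks. The sketch below is a hypothetical helper, not the SwissADME implementation, and it uses the commonly stated thresholds (donors ≤ 5, acceptors ≤ 10, molecular weight < 500 Da, log P ≤ 5). The descriptor values for glucose and maltose are approximate, illustrative inputs rather than values taken from this study.

```python
def lipinski_violations(mol_weight, h_donors, h_acceptors, log_p):
    """Return the list of rule-of-five criteria that a compound violates."""
    violations = []
    if mol_weight >= 500:
        violations.append("molecular weight")
    if h_donors > 5:
        violations.append("hydrogen-bond donors")
    if h_acceptors > 10:
        violations.append("hydrogen-bond acceptors")
    if log_p > 5:
        violations.append("log P")
    return violations


# Illustrative, approximate descriptors (assumptions, not data from this study):
# (molecular weight in g/mol, H-bond donors, H-bond acceptors, log P)
ligands = {
    "glucose": (180.16, 5, 6, -2.6),
    "maltose": (342.30, 8, 11, -4.7),
}

report = {name: lipinski_violations(*values) for name, values in ligands.items()}
for name, violations in report.items():
    status = "passes" if not violations else "violates " + ", ".join(violations)
    print(f"{name}: {status}")
```

With these inputs the disaccharide fails on both hydrogen-bond counts while the monosaccharide passes, which mirrors the kind of screening decision the rule is used for here.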

2.5 Scheme for the Synthesis of Mannosylated Chitosan and the Docking Interaction Study
Based on the binding energies (kcal/mol) given in Table 1, the ligand with the minimum energy was selected for the conjugation reaction with chitosan. α-MPPIT, the chitosan 3D structure, and the reaction scheme were drawn using ChemDraw software [12, 25] and converted into sdf files for docking analysis using PyRx software. The 1HUP protein was selected as the macromolecule and converted into pdbqt format. Chitosan and mannosylated chitosan, shown in Fig. 4, were selected as ligands. All the ligands were converted into pdbqt format. Docking was performed using the AutoDock Vina tool of PyRx software. For ligand binding, the x, y, and z dimensions of the grid box were each set to 60, and a centered grid box was


Table 1 Binding interaction between ligands and protein 1HUP

Ligand molecule/polymer | Binding energy (kcal/mol) | Hydrogen bond | Amino acid involved in the H-bond
α-d mannose | −4.9 | 4 | ASP 177, VAL 176, THR 167
α-d fucose | −4.7 | 2 | ARG 182, PHE 175
α-d glucose | −4.7 | 4 | LEU 178, GLU 169, THR 167, LYS 170
Maltose | −5.2 | 5 | GLY 152, ASN 156, GLN 155, ASN 187
Mannopyranosyl phenyl isothiocyanate | −5.4 | 4 | ALN 148, ASN 208, ASN 187, TYR 185
Chitosan polymer | −5.0 | 4 | GLU 192, THR 186, TRP 188, PHE 175
Mannosylated chitosan polymer | −5.6 | 6 | GLU 173, LEU 183, ARG 182, GLY 191, GLU 192, PRO 193

set for docking. After the algorithm completed, the binding energies of both ligands were analyzed. The ligand with the minimum binding energy was selected for drug-receptor interaction analysis in Discovery Studio.

3 Results and Discussion
Carbohydrate receptors are most commonly found in gut-associated lymphoid tissue, where they take part in the internalization of polysaccharides. The affinity of the ligand–receptor interaction is estimated by the binding energy (kcal/mol); a lower binding energy indicates a stronger binding affinity. Different carbohydrate mono- and disaccharide compounds (Fig. 1) were screened against the 1HUP C-type lectin carbohydrate-binding protein. The ligand with the minimum energy was selected for surface modification of the nanocarrier. The binding energy (kcal/mol), the amino acids involved in hydrogen bonding, and the number of hydrogen bonds were noted. Among the carbohydrate molecules, MPPIT has the minimum binding energy (−5.4 kcal/mol) (Table 1). The amino acids involved in hydrogen bonding are ALN 148, ASN 208, ASN 187, and TYR 185. This molecule was selected for surface modification of nanocarriers, which are used for receptor-mediated drug targeting. Ranked from the lowest (most favorable) to the highest binding energy, the carbohydrate molecules are α-mannopyranosyl phenyl isothiocyanate (−5.4 kcal/mol), maltose (−5.2 kcal/mol), α-d mannose (−4.9 kcal/mol), and α-d glucose and α-d fucose (both −4.7 kcal/mol). α-Mannopyranosyl phenyl isothiocyanate forms four hydrogen bonds with amino acids ALN 148, ASN 208, ASN 187, and TYR 185 of the 1HUP protein. Maltose


shows a binding energy of −5.2 kcal/mol and forms five hydrogen bonds with amino acids GLY 152, ASN 156, GLN 155, and ASN 187 of the 1HUP protein. α-d fucose and α-d glucose show the same binding energy of −4.7 kcal/mol; however, glucose forms four hydrogen bonds with LEU 178, GLU 169, THR 167, and LYS 170, whereas α-d fucose forms two hydrogen bonds with ARG 182 and PHE 175 of the 1HUP protein. The α-d mannose ligand shows a binding energy of −4.9 kcal/mol, lower than that of glucose and fucose, and forms four hydrogen bonds with amino acids ASP 177, VAL 176, and THR 167, as shown in Fig. 2.
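The ranking discussed above follows directly from sorting the binding energies in Table 1 in ascending order, since the most negative value corresponds to the most favorable predicted binding. The short sketch below uses only the values reported in Table 1; the variable names are illustrative.

```python
# Binding energies (kcal/mol) against 1HUP, as reported in Table 1.
binding_energy = {
    "alpha-D-mannose": -4.9,
    "alpha-D-fucose": -4.7,
    "alpha-D-glucose": -4.7,
    "maltose": -5.2,
    "MPPIT": -5.4,
}

# Lowest (most negative) energy first; ties keep their listed order.
ranked = sorted(binding_energy.items(), key=lambda item: item[1])
best_ligand, best_energy = ranked[0]

# Effect of mannosylation on the polymer, again from Table 1.
polymers = {"chitosan": -5.0, "mannosylated chitosan": -5.6}
gain = round(polymers["mannosylated chitosan"] - polymers["chitosan"], 1)

for name, energy in ranked:
    print(f"{name}: {energy} kcal/mol")
print(f"selected ligand: {best_ligand}")
print(f"change after mannosylation: {gain} kcal/mol")
```

The differences between ligands are only a few tenths of a kcal/mol, so the ranking is best read as a qualitative guide to ligand selection rather than a precise measure of affinity.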

Fig. 2 Interaction of different ligands with receptors (1HUP) a α-d mannose, b α-d maltose, c α-d fucose, d α-d glucose, e α-d mannopyranosyl phenyl isothiocyanate, f chitosan polymer (Cs), g mannosylated chitosan polymer (MCs)


3.1 Pharmacokinetic and Drug-Likeness Screening of Carbohydrate Ligands
The bioactive properties of the ligands were screened via SwissADME to obtain the bioavailability radar and the physicochemical properties (molecular weight, hydrogen-bond donors, hydrogen-bond acceptors, lipophilicity, and molar refractivity). α-Mannopyranosyl phenyl isothiocyanate, α-d mannose, α-d fucose, and α-d glucose satisfy all of Lipinski's criteria. Maltose did not satisfy all the criteria; it violated two of Lipinski's measures, as given in Table 2. Intestinal absorption and blood–brain barrier permeation are estimated by the BOILED-Egg method, which is based on the polarity and lipophilicity of the compounds. GI absorption is low for all ligands except α-d fucose. The bioavailability score is the same (0.55) for all the ligands except maltose (0.17). All the ligands except α-mannopyranosyl phenyl isothiocyanate are substrates of P-glycoprotein, which generally reduces the absorption, bioavailability, and retention time of drugs [26, 27]. Hence, ligands that are not P-glycoprotein substrates are most suitable for M-cell targeting. All the ligands are non-inhibitors of CYP1A2, CYP2C19, CYP2C9, CYP2D6, and CYP3A4, members of the membrane-bound cytochrome P450 enzyme family that play a significant role in drug metabolism (Table 3). Both the induction and the inhibition of metabolic enzymes are undesirable: induction causes rapid metabolism of the substrate, and inhibition causes its accumulation [28]. The in silico study predicts the interaction of ligands with cytochrome family enzymes, which play a prominent role in formulation and development. The ligand molecules were screened through the SwissADME web tool to determine their bioavailability scores. The in silico drug-likeness characteristics and bioavailability scores are given in Table 2.
Lipophilicity, denoted as iLogP, provides information on the absorption of the molecule; the higher the iLogP value is, the lower the absorption [29]. Table 2 shows that the iLogP of α-d mannose is lower than

Table 2 Lipinski's rule of 5, lipophilicity, and bioavailability score

Parameters / Ligand molecule | Mass (g/mol) | No. of rotatable bonds | Hydrogen bond donor | Hydrogen bond acceptor | Lipophilicity (iLog Po/w) | TPSA | Bioavailability score
Rule | < 500 | | ≤ 5 | < 10 | | | –

and |1⟩. The advantage of qubits is that they can be in the 0 state, in the 1 state, or in a linear combination of these states, called a superposition. In a quantum system, combining two or more state vectors gives rise to another state vector in the same Hilbert space. The main aim of this paper is to interpret qubits from an information processing perspective (Sect. 2) and to enumerate and briefly analyse the core qubit technologies (Sect. 3).

2 Deep Dive into Qubits
A qubit is a quantum system that can encode a piece of information. All quantum systems exhibit specific properties, or states, in which information can be embedded using clever engineering. These are called quantum states. The energy levels of atoms, the spin of electrons, magnetic fields, and the vertical or horizontal polarisation of light are examples of such states on which quantum information processing is based. Quantum computers have many hardware parts, such as quantum memory, quantum registers, quantum circuits, a quantum networking interface, and a quantum readout section. Registers are where qubits are transformed by unitary, that is, reversible, operations. In a qubit transformation, the next state of a qubit depends on its previous state, and because reversible gates are used, the initial state is always retrievable.

2.1 Defining Qubits: Information Processing Perspective
A qubit is a quantum state in which information can be encoded, processed, and read out. It is "quantum" because the state exhibits quantum mechanical properties such as entanglement and superposition. Quantum computing is the engineering of quantum mechanical effects for information processing. Particles, sub-particles, and quasi-particles are all qubit candidates. In the realm of anyons and Majorana fermions, we deal with quasi-particle excitations. Although most qubit candidates, such as ions, electrons, and photons, are particles, a hole (the absence of an electron) is a quasi-particle. Whether the qubit candidates are particles, sub-particles, or quasi-particles, we rely on exploiting their quantum states (an excitation state, a polarisation, a spin state, or a charge state) for information processing. A qubit should undergo three activities: encoding, processing, and decoding (measurement) of the data. This is the core of any information processing. Encoding is usually done in a fiducial eigenstate, primarily a pure state, because the evolution under the Hamiltonian can be predicted as a matrix function if the initial state is known. Some quantum information processing also accepts mixed states, in the form of spin-1/2 nuclei as in NMR. Processing or manipulating a qubit state is done through unitary transformations or gate operations by which the spin or polarisation undergoes

Compendium of Qubit Technologies in Quantum Computing


Fig. 1 Qubit processing stages. The advantage of qubits is that they can be in the 0 state, in the 1 state, or in a linear combination of these states, called a superposition, represented as |ψ⟩ = α|0⟩ + β|1⟩

a desired flux or change in quanta. The quantisation of the qubits' energy, spin, and momentum is achievable via controllable electromagnetic fields. After the qubit states have been processed correctly, they must be read out before decoherence sets in. Reading out, or measurement, means mapping the internal state of the qubit to a classical outcome. Quantum measurements are probabilistic. Qubits are addressable individually and collectively, and measurement, especially projective measurement, varies according to the technology used. Lasers, microwave pulses, and interferometers are some of the tools employed. Figure 1 illustrates five aspects of qubits: quantum, state, encoding of the data, processing of the data, and reading out of the data. With proper exploitation of the quantum states, information processing occurs.
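The superposition in Fig. 1 and the probabilistic nature of measurement can be illustrated numerically. The following is a minimal Python sketch that is not part of the original chapter: the function names are hypothetical, and it assumes an ideal, noise-free single qubit measured in the computational basis.

```python
import math
import random


def normalise(alpha, beta):
    """Rescale (alpha, beta) so that |alpha|^2 + |beta|^2 = 1."""
    norm = math.sqrt(abs(alpha) ** 2 + abs(beta) ** 2)
    if norm == 0:
        raise ValueError("the zero vector is not a valid qubit state")
    return alpha / norm, beta / norm


def measurement_probabilities(alpha, beta):
    """Born rule: probabilities of reading 0 and 1 for alpha|0> + beta|1>."""
    alpha, beta = normalise(alpha, beta)
    return abs(alpha) ** 2, abs(beta) ** 2


def measure(alpha, beta, rng=random):
    """Simulate one projective measurement; returns the classical bit 0 or 1."""
    p0, _ = measurement_probabilities(alpha, beta)
    return 0 if rng.random() < p0 else 1


# Equal superposition with a relative phase: both outcomes are equally likely.
p0, p1 = measurement_probabilities(1, 1j)
```

Repeating `measure` many times on the same prepared state approximates `p0` and `p1`, which is why readout is described as probabilistic.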

3 Qubit Candidates The various qubit candidates differ drastically in their physical implementation; hence, multiple architectural designs exist for quantum hardware. Some candidates are trapped ions, superconducting qubits, linear optics, donor systems, quantum dots, topological qubits, and diamond-based qubits.


E. Sebastian and R. C. Poonia

3.1 Trapped Ions An ion is an atom or molecule with a net electric charge. When the number of electrons (negative charge) does not match the number of protons (positive charge), the atom or molecule carries a net charge. A particle with more protons than electrons is positively charged and is known as a cation, whilst its counterpart with more electrons is negatively charged and is called an anion. The basic idea behind the ion-trap qubit is to confine a charged particle using electromagnetic fields and to manipulate its quantum state through an applied force (the Coulomb force). Trapped ions were among the first qubit technologies and were early candidates for implementing quantum bits for Shor's factoring algorithm. Ion trapping has been a known technique since the 1930s. In 1995, Cirac and Zoller proposed that trapped ions could operate as qubits in gate-based (controlled-NOT gate) quantum information processing [1] and later showed that, as the number of qubits increases, the control mechanism does not need to grow exponentially. Since 1995, significant improvements have been made in the field, and the number of ion qubits in quantum computing has increased drastically.

3.1.1 Trapping Technologies

There are two technologies for trapping ions: the Penning trap and the Paul trap. In Penning traps (developed by Frans Michel Penning and furthered by J. R. Pierce and Hans Georg Dehmelt), charged particles are trapped using a combination of magnetic and electric fields. Penning traps are 2D traps where coherence can be prolonged if the electrical field frequency is higher than the magnetic frequency, and the rotation issue can be tackled using proper stroboscopic measures. Wolfgang Paul proposed a mass spectrometer without a magnetic field. This radio frequency-based trap is called the Paul trap. In a Paul trap, an oscillating electric field confines ions in a ponderomotive potential. This potential depends on quantities such as the ion's charge-to-mass ratio and the frequency and strength of the RF signal, and the ion motion obeys the Mathieu equation. There are two types of Paul traps, called point traps and linear traps. Point traps are fabricated using a ring and endcap mechanism. The RF potential is applied between the ring and cylindrical electrodes, forming a quadrupolar field with an RF zero field at the ring centre. In a point trap, the RF field vanishes at a single point, so when more than one ion is held there, the ions experience some micromotion that may adversely affect the fidelity of the state.
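For reference, the Mathieu equation mentioned above is usually written in the following canonical form. This is the standard textbook form rather than a formula taken from the chapter, and the symbols $u$, $\xi$, $\Omega$, $a_u$ and $q_u$ are notation assumed here: $u$ is one ion coordinate, $\Omega$ is the angular frequency of the RF drive, $a_u$ scales with the static (DC) voltage, and $q_u$ scales with the RF amplitude. Both parameters are proportional to the charge-to-mass ratio and inversely proportional to $\Omega^2$.

```latex
\frac{d^{2}u}{d\xi^{2}} + \bigl(a_u - 2\,q_u \cos 2\xi\bigr)\,u = 0,
\qquad \xi = \frac{\Omega t}{2}
```

Confinement corresponds to bounded solutions of this equation. For $a_u = 0$, the lowest stability region is approximately $0 < q_u < 0.908$, which is why the RF amplitude and frequency must be chosen together with the ion species.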

3.1.2 Zeeman and Hyperfine Qubits

The most popular ion candidates are $^{9}\mathrm{Be}^{+}$, $^{25}\mathrm{Mg}^{+}$, $^{40}\mathrm{Ca}^{+}$, $^{43}\mathrm{Ca}^{+}$, $^{88}\mathrm{Sr}^{+}$, $^{133}\mathrm{Ba}^{+}$, $^{138}\mathrm{Ba}^{+}$, and $^{171}\mathrm{Yb}^{+}$. Depending on the presence of nuclear spin, ion qubits can be classified into Zeeman qubits and hyperfine qubits. In a Zeeman qubit, the ion has no nuclear spin; the qubit states are obtained from the ground-state valence electron in an external magnetic field, with a splitting set by the gyromagnetic ratio. For ions with nonzero nuclear spin, the interaction between the valence electron and the nuclear spin gives rise to two hyperfine states [2]. Zeeman qubits are extremely sensitive to magnetic fluctuations, whilst hyperfine qubits are largely insensitive to them. In trapped-ion technology, internal states are initialised by optical pumping, and the motional ground state is reached by laser cooling.

3.1.3 Analysis

Promising companies promoting ion traps include Quantinuum (Honeywell), IonQ, Quantum Factory, and Alpine Quantum Technologies. From the 64-qubit Honeywell system, companies aim towards 512 qubits in the NISQ era. Ion trapping offers long coherence times, exceptional fidelity, and uncomplicated measurement. However, state preparation and gate operations in trapped-ion qubits are comparatively slow.

3.2 Superconducting Qubits Superconducting qubit systems (quantum processors) are engineered solid-state circuits that behave like designed atoms for storing and processing quantum information, aided by developments in semiconductor technology. When operated at low temperatures, the superconducting circuits exhibit quantum mechanical properties, and their energy levels and coupling strengths can be designed using capacitors, inductors, Cooper pairs, and Josephson junctions. It is possible to integrate such systems to construct large-scale quantum computers [3]. Because the energy levels of a superconducting qubit are set by the configuration of circuit elements and are therefore designable, such qubits are called artificial atoms. The first implementation of a superconducting qubit, using a simple Cooper pair box, was demonstrated [4] in 1999, and the subsequent years witnessed steady improvements and scale-ups in the technology that culminated in Google's quantum supremacy claim in 2019 with the 53-qubit Sycamore processor [5]. Superconducting qubits combined with the surface code architecture have been used by IBM and Google to build large-scale quantum computers. IBM led this race in qubit count in 2022 with the 433-qubit Osprey processor and advances to the 1121-qubit Condor processor in 2023.

3.2.1 Charge, Phase, and Flux Qubits

The two energy scales whose ratio characterises a superconducting qubit are the Josephson energy and the Coulomb charging energy. Whilst the Josephson energy represents the strength of the coupling across the junction, the charging energy is the energy cost of adding charge to the junction. Qubits in superconducting technology can differ in their phase, flux state, and junction charge [6]. Depending on the degree of freedom used, the three core types of superconducting qubits are charge qubits, phase qubits, and flux qubits. Combining and adapting these three core types paved the way for novel superconducting qubits such as transmon qubits, fluxonium, 3-JJ and C-shunt flux qubits, and hybrid qubits. Of the various types of superconducting qubits, the most promising is the transmon qubit. The transmon was introduced as a new design for superconducting qubits in 2007 [7], and there are variants such as the xmon, the gmon, and the 3D transmon. Transmon qubits are improved versions of charge qubits that are far less sensitive to charge noise.
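The role of the ratio between the Josephson energy and the charging energy can be made concrete with the usual large-ratio approximations for the transmon from [7]: the qubit transition energy is roughly $\sqrt{8 E_J E_C} - E_C$ and the anharmonicity is roughly $-E_C$. The Python sketch below is illustrative only; the function names, the example values, and the threshold of 50 used to label a device "transmon-like" are assumptions made here, not values stated in the chapter.

```python
import math


def transition_energy(e_j, e_c):
    """Approximate 0 -> 1 transition energy, valid when e_j / e_c is large.

    Inputs and output share the same unit (for example GHz, using E = h*f).
    """
    return math.sqrt(8.0 * e_j * e_c) - e_c


def anharmonicity(e_c):
    """Leading-order anharmonicity in the same regime, approximately -E_C."""
    return -e_c


def regime(e_j, e_c, transmon_threshold=50.0):
    """Rough label based on E_J / E_C; the threshold is an illustrative assumption."""
    ratio = e_j / e_c
    if ratio < 1.0:
        return "charge-like"
    if ratio >= transmon_threshold:
        return "transmon-like"
    return "intermediate"


# Hypothetical device: E_J = 12.5 GHz and E_C = 0.25 GHz, so E_J / E_C = 50.
f01 = transition_energy(12.5, 0.25)   # about 4.75 GHz
label = regime(12.5, 0.25)
```

Raising the ratio flattens the dependence of the energy levels on offset charge, at the price of a smaller anharmonicity, which is the trade-off behind the transmon design.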

3.2.2 Analysis

Superconducting quantum computers with scalable architecture are a promising technology at present, and superconductivity is already a mature technology. However, the more we scale up individual arrays of artificial atoms into large monolithic systems, the more complexity we need to incorporate in the design to avoid decoherence and to add interconnectivity.

3.3 Photonic Quantum Computing Photonic quantum computing is a broad term that covers linear and non-linear methods. Linear optical quantum computing (LOQC) is the technology in which photons are the qubit candidates, and beam splitters, waveplates, and phase shifters are used to manipulate them. A photon has a fixed frequency, but its wavelength varies with the medium through which it propagates. Another essential fact about photons is that they can be created or destroyed (energy absorption) [8]. Vertical and horizontal polarisation provide the two basis states for superposition.
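The action of beam splitters and phase shifters on a single photon can be sketched with 2x2 matrices. The Python example below is an illustration added here, under two assumptions that the chapter does not state: the qubit is encoded in which of two optical modes the photon occupies (dual-rail encoding), and the 50:50 beam splitter uses the symmetric convention with a factor of $i$ on reflection. All function names are hypothetical.

```python
import math

S = 1 / math.sqrt(2)

# Symmetric 50:50 beam splitter acting on the amplitudes of two optical modes.
BEAM_SPLITTER = [[S, 1j * S],
                 [1j * S, S]]


def apply(matrix, state):
    """Multiply a 2x2 matrix by a two-component amplitude vector."""
    return [sum(matrix[r][c] * state[c] for c in range(2)) for r in range(2)]


def phase_shifter(phi):
    """Phase shift of phi radians applied to the second mode only."""
    return [[1, 0], [0, complex(math.cos(phi), math.sin(phi))]]


def probabilities(state):
    """Detection probability in each mode."""
    return [abs(amplitude) ** 2 for amplitude in state]


def mach_zehnder(phi):
    """Beam splitter, phase shift, beam splitter, for a photon entering mode 0."""
    state = apply(BEAM_SPLITTER, [1, 0])
    state = apply(phase_shifter(phi), state)
    return probabilities(apply(BEAM_SPLITTER, state))


# One beam splitter puts the photon in an equal superposition of both modes.
after_one_splitter = probabilities(apply(BEAM_SPLITTER, [1, 0]))
```

With this convention, `mach_zehnder(0)` sends the photon to mode 1 and `mach_zehnder(math.pi)` sends it to mode 0, so the phase shifter alone selects the output port.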

3.3.1 Photon Creation

There are two ways to create photons: heralded single-photon generation and deterministic on-demand photon generation. The core idea behind heralded photon generation is a spatially nondegenerate parametric down-conversion mechanism. Parametric amplification helps measure the phase and wave quanta with high fidelity. The heralded mechanism, popularly known as SPDC (spontaneous parametric down-conversion), can generate single photons or pairs of photons, as in entanglement. On the other hand, deterministic photon generation uses a photon emitter as the source. Increasing the efficiency of photon emission and achieving efficient coupling to quantum dots remain challenges. A photonic quantum processor based on silicon, called a photonic chip, is another innovation in this field. It is more efficient in terms of complexity and cost and can be adapted to CMOS technology [9].

3.3.2 Analysis

Xanadu, a Toronto-based quantum computing company; Jiuzhang 2.0, a Chinese photonic quantum computer that uses Gaussian boson sampling; and QuiX Quantum are notable in this field. The ability to create photons, operation at room temperature, a lower chance of decoherence, and adaptability to teleportation-based networking make photons attractive for quantum computing. Building a scalable system with photonic interaction protocols is a significant challenge.

3.4 Semiconductor Qubits Semiconductor technology provides myriad ways to carry out quantum information processing, especially with electron-based spin qubits, which differ in architecture and features. The best examples are quantum dots, donor systems, spin and charge qubits, and valley qubits. Semiconductor-based qubits are not limited to quantum information processing alone. They are also used in other prominent fields such as quantum sensing, quantum simulation, quantum cryptography, and communication.

3.4.1 Dots, Donors, Spin-Charge, and Valley Qubits

Quantum dots are nanoparticles with conductive and quantum (optoelectronic) properties, realised in semiconductors, that act as a host and a transit for electrons having spin-up and spin-down states. Doping is a well-known technique in semiconductor technology whereby natural or artificial impurities in the material contribute beneficial properties to it. Semiconductor dopants in valence bonding provide an extra mobile electron. When these freely moving electrons are cooled to millikelvin temperatures, the electron becomes bound to a dopant atom, resulting in a spin qubit [10]. Since electrons have both spin and charge, spin qubits and charge qubits are both possible. Whilst spin-orbit interaction provides a way to control spins electrically and opens up spintronics, the relative position (such as right or left) of the electron between two adjacent dots determines the charge state of the qubit. Some semiconductors have multiple valleys in the conduction band, and manipulating these multiple extrema is a feasible qubit property that gives rise to valleytronic qubits.

3.4.2 Analysis

Silicon enjoys charge, spin and valley properties, and electrons are ideal spin qubits. Solid-state technology has matured and is comparatively inexpensive. However, semiconductor qubit technology is a generic mechanism that drastically differs in its material and technology. Some systems need to cool down immensely to process electrons, and highly scalable systems are still challenging.

3.5 Diamond-Based Quantum Computing The nitrogen-vacancy centre in diamond, known as the NV centre, provides a suitable setting for spin-based quantum information processing. Diamond is a semiconductor, and NV centre processing with photonic manipulation is possible at room temperature. A nitrogen-vacancy centre occurs in a diamond lattice when a carbon atom next to a nitrogen atom is missing. This vacancy is a desirable defect for trapping electrons, and it is possible to manipulate such electrons using magnetic, electrical, or photonic means.

3.5.1 Analysis

The benefit of the NV centre is that it provides a robust spin qubit with long coherence and combines it with nuclear magnetic resonance and photonic transferability, all of which are largely possible at room temperature. However, NV centre behaviour is still not fully predictable, and coupling the electronic spin with magnetic resonance is difficult.

3.6 Topological Quantum Computing A simple way to understand topology is to think of a pufferfish inflating into a spiky ball: the deformation does not change what the object is. In quantum information processing, the deformation emanates from decoherence and noise. Topology provides a resilient qubit model based on quantum quasi-particles.

3.6.1 Anyons and Majorana Fermions

Anyons are subtle quasi-particles that exist only in two dimensions under specific conditions, such as temperatures near absolute zero and the presence of intense magnetic fields. Anyons exhibit strange properties such as fractional charge and fractional statistics, which preserve their connectedness under quantum phase shifts. Anyons cannot intersect the same worldline, and their evolution in space-time occurs through a braiding phenomenon. The Majorana quasi-particle, which is related to fermions, exhibits the particle-hole symmetry of superconductors. A special mode called the Majorana zero mode is advantageous in quantum information processing because it possesses non-abelian braiding properties. Various materials exhibit topological properties, such as insulators, superconductors, Weyl or Dirac semimetals, and spin liquids [11].

3.6.2 Analysis

Topological quantum computing relies on the non-abelian braiding of quasi-particles. Topology is a rapidly developing field and holds promise for fault-tolerant quantum information processing. However, anyons and Majorana particles, the latter being their own antiparticles, remain elusive.

3.7 NMR Quantum Computing Nuclear magnetic resonance (NMR) quantum processing is a spin-based approach to quantum computing in which nuclei serve as qubits. The nucleus of an atom has an inherent spin state, and it is possible to quantise it using magnetic fields. The ability to control nuclear spin is the core factor behind NMR qubits. It is possible to distinguish two levels of the quantum state of a nuclear spin, for which the internal angular momentum and the various operations are mathematically describable.

3.7.1 Liquid and Solid NMR Technologies

Nuclear spin dynamics for quantum information processing are available in two flavours: (a) NMR in liquids and (b) NMR in solids. Of the two, liquid-state NMR spectroscopy is already well established in various quantum information processing experiments.

3.7.2 Analysis

Current NMR research covers liquids as well as crystals and other solid-state materials, and it has come under semiconductor quantum computing. A part of it is moving towards topology too. Though NMR qubits have a long coherence time, slow gate operations and difficult scalability are drawbacks.


4 Conclusion These seven technologies are different approaches to the same goal, i.e. information processing using quantum mechanics. Each one has its advantages and disadvantages. We cannot judge that one is better than another or single out one as the best. Some technologies have advanced further, whilst others are just at the beginning stage. Superconductor and semiconductor technologies have a longer history of research and experiments than quasi-particles and spin liquids. Hence, the superior technology will be the one that brings about quantum advantage the earliest.

References
1. Cirac JI, Zoller P (1995) Quantum computations with cold trapped ions. Phys Rev Lett 74:4091–4094
2. Bardin JC, Slichter DH, Reilly DJ (2021) Microwaves in quantum computing. IEEE J Microwaves 1:403–427
3. Makhlin Y, Schön G, Shnirman A (2001) Quantum-state engineering with Josephson-junction devices. Rev Mod Phys 73:357–400
4. Nakamura Y, Pashkin YA, Tsai JS (1999) Coherent control of macroscopic quantum states in a single-Cooper-pair box. Nature 398:786–788
5. Arute F, Arya K et al (2019) Quantum supremacy using a programmable superconducting processor. Nature 574:505–510
6. Berggren KK (2004) Quantum computing with superconductors. Proc IEEE 92:1630–1638
7. Koch J, Yu TM, Gambetta J, Houck AA, Schuster DI, Majer J, Blais A, Devoret MH, Girvin SM, Schoelkopf RJ (2007) Charge-insensitive qubit design derived from the Cooper pair box. Phys Rev A 76:042319
8. Pearsall TP (2020) Photons. Springer, Cham
9. Silverstone JW, Bonneau D, O'Brien JL, Thompson MG (2016) Silicon quantum photonics. IEEE J Sel Top Quant Electron 22:390–402
10. Hill CD, Peretz E, Hile SJ, House MG, Fuechsle M, Rogge S, Simmons MY, Hollenberg LC (2015) A surface code quantum computer in silicon. Sci Adv 1(9):e1500707
11. Lahtinen V, Pachos J (2017) A short introduction to topological quantum computation. SciPost Phys 3(3):021

A Novel Post-quantum Piekert’s Reconciliation-Based Forward Secure Authentication Key Agreement for Mobile Devices
Chaudhary Dharminder, S. S. Anushaa, S. Naundhini, and M. S. P. Durgarao

Abstract Lattice-based cryptography plays a very important role in authentication and key exchange schemes that protect against the threat of quantum attacks. However, it is not easy to design a quantum-resistant password-based protocol because of the demanding security requirements and the limited resources of mobile devices. In this article, we propose a novel post-quantum key exchange based on a variant of the lattice assumption, ring learning with errors. This protocol uses Peikert's unbiased reconciliation with respect to even q, whereas the reconciliation in Ding's protocol is biased. The protocol ensures both authentication and key agreement, and it needs just two exchanged messages to achieve them. It provides both security against quantum attacks and efficiency, because it relies on simple algebraic operations, namely polynomial addition and multiplication.
Keywords Authentication · Key agreement · Reconciliation · Ring learning with errors · Lattices

1 Introduction In the past, if two parties wanted to establish secure encryption, they had to exchange the keys by some secure physical means. In 1976, Diffie and Hellman (DH) proposed the first key exchange algorithm, but it suffers from man-in-the-middle attacks. The security of the Diffie and Hellman key exchange protocol is based on the difficulty of the discrete logarithm problem over a finite cyclic multiplicative group. The core idea of most key exchange protocols lies in the hardness of assumptions based on number theory. With quantum computing, one can solve number theory-based problems such as integer factorization and discrete logarithm by using advanced algorithms due to Shor and its variants [12, 15, 17].

C. Dharminder (B) · S. S. Anushaa · S. Naundhini · M. S. P. Durgarao Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Chennai 601103, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 686, https://doi.org/10.1007/978-981-99-2100-3_9


Learning with errors [16] is a well-known lattice-based assumption that is widely believed to be post-quantum secure. This assumption is shown to be as hard as worst-case lattice problems [1–3], and it has recently become a milestone for cryptographic applications. A newer variant of this assumption, called ring learning with errors [13], is much more efficient, and it too enjoys very strong hardness guarantees: the ring-based learning with errors distribution is pseudorandom, assuming that worst-case problems on ideal lattices are hard for polynomial-time quantum algorithms. A post-quantum key exchange mechanism was put forward in 2014 by Peikert [14], who suggested a new, unbiased reconciliation method. For use in practical post-quantum cryptography, Peikert offers a collection of lower-level primitives that are both efficient and provably secure. An authenticated key exchange protocol was suggested by Zhang et al. [19], but it was found vulnerable to signal leakage. Kirkwood et al. [11] observed that key exchange failures in ring learning with errors make the reuse of a secret key risky. This finding offered researchers a new direction; however, it was incompletely described. In 2016, Fluhrer [10] introduced a key mismatch attack on ring learning with errors-based protocols that reuse public/private keys. The private key of the honest party is retrieved using this concept: whether the shared keys of the two parties match or mismatch is the crucial information. The concept of a signal leakage attack on ring learning with errors-based key exchange with reused public/private keys was first observed by Fluhrer [10] in 2017, and it was then developed by Ding et al. [7], who also proposed a provably secure password-authenticated key exchange under the ring learning with errors assumption. In this attack, an adversary examines the output of the signal function and obtains the secret key by setting up numerous sessions with the honest party. The advantage of this approach is that fewer steps are needed to recover the private key of an honest party [7]. The improved signal leakage attack [8] involves "q + c" steps, where "0 < c < q" is any constant. In 2018, Feng et al. [9] introduced an authenticated key exchange for mobile communication; however, it lacks authentication. Dabra et al. [4] examined the protocol [9] in 2020 and analysed signal leakage attacks on it. However, the protocol of Dabra et al. [4] does not support a secure login and authentication stage, which opens the door to a denial-of-service attack. A three-factor authenticated key exchange mechanism was put forth by Dharminder and Prabhu Chandran [5] in 2020. However, a signal leakage attack can be applied to it. For mobile devices, Wang et al. [18] suggested an efficient two-factor authentication technique that is robust against quantum attacks. It incurs extra communication overhead because it uses three exchanged messages. This protocol also uses a greater number of operations, which adds computational cost.

2 Research Contributions We have studied the literature on the protocols [4–6, 8, 9, 18, 19] and found that most of them either suffer from the signal leakage attack or are computationally inefficient. The proposed protocol is secure against the signal leakage attack. The proposed protocol avoids the use of the long-term private key by the server. Moreover, the proposed protocol uses Peikert's [14] reconciliation mechanism, which makes the protocol unbiased by nature.

3 Preliminaries Let $\mathbb{Z}[X]$ be the ring of polynomials with integer coefficients, and let $Q = \mathbb{Z}[X]/(\Phi_m(X))$ be the ring of integers of the $m$th cyclotomic number field. In this article, we have taken $m = 2n = 2^{\ell+1}$ for a positive integer $\ell$, and $\Phi_m(X) = X^n + 1$ is the corresponding polynomial. Let $q$ be a sufficiently large prime, and define $Q_q = Q/qQ \cong \mathbb{Z}_q[X]/(X^n + 1)$, whose coefficients are taken from $\mathbb{Z}_q = \mathbb{Z}/q\mathbb{Z}$. If $D$ denotes a probability distribution on $Q$, then $x \leftarrow D$ denotes a random sample $x \in Q$ following the distribution $D$. If $S$ denotes a set, then $U(S)$ denotes the uniform distribution on $S$, and $x \leftarrow U(S)$ denotes a random element picked uniformly from the set $S$. If $A$ is a probabilistic adversary, then $z \leftarrow A(y)$ denotes that $A$ takes the input $y$, tosses coins, and outputs $z$.

Definition 1 (Ring learning with errors problem [13]) Let $Q_q$ be a ring, $D$ be a distribution over $Q$, and $t \in Q$ be taken randomly. Let $A_{t,D}$ denote the distribution of the pair $(a, at + e) \in Q_q \times Q_q$, where $a \leftarrow U(Q_q)$ is chosen uniformly at random and $e \leftarrow D$. The assumption states that no efficient adversary $A$ can find $t$ and $e$ given such a pair.

Definition 2 (Decision ring learning with errors problem [13]) Let $Q_q$ be a ring, $D$ be a distribution over $Q$, and $t \in Q$ be taken randomly. Let $\mathrm{DRLWE}_{t,D}$ denote the distribution of the pair $(a, at + e) \in Q_q \times Q_q$, where $a \leftarrow U(Q_q)$ is chosen uniformly at random and $e \leftarrow D$. The decision ring learning with errors assumption states that no efficient adversary $A$ can distinguish $\mathrm{DRLWE}_{t,D}$ from the uniform distribution on $Q_q \times Q_q$.

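3.1 Rounding and Reconciliation This subsection collects the notation and terminology used by the key exchange mechanism of [14]. The rounding function $\lfloor\cdot\rceil : \mathbb{R} \to \mathbb{Z}$ maps $x$ to the integer $y = \lfloor x \rceil$ with $x \in [y - \tfrac{1}{2}, y + \tfrac{1}{2})$. Let $q \ge 2$ be an integer. Let $\lfloor\cdot\rceil_2 : \mathbb{Z}_q \to \mathbb{Z}_2$ be the modular rounding function $\lfloor x \rceil_2 = \lfloor \tfrac{2}{q}\cdot x \rceil \bmod 2$, which is well defined because $\tfrac{2}{q}\cdot q\mathbb{Z} = 2\mathbb{Z}$. Let $\langle\cdot\rangle_2 : \mathbb{Z}_q \to \mathbb{Z}_2$ be the cross-rounding function $\langle x \rangle_2 = \lfloor \tfrac{4}{q}\cdot x \rfloor \bmod 2$. In Peikert's reconciliation [14], there are two disjoint intervals $I_0 = \{0, 1, \ldots, \lfloor \tfrac{q}{4} \rceil - 1\}$ and $I_1 = \{-\lfloor \tfrac{q}{4} \rfloor, \ldots, -2, -1\}$ modulo $q$, consisting of $\lfloor \tfrac{q}{4} \rceil$ and $\lfloor \tfrac{q}{4} \rfloor$ cosets in $\mathbb{Z}_q$, respectively. These two intervals together contain exactly the $\vartheta \in \mathbb{Z}_q$ satisfying $\lfloor\vartheta\rceil_2 = 0$, and $\tfrac{q}{2} + I_0$ and $\tfrac{q}{2} + I_1$ together contain exactly the $\vartheta$ satisfying $\lfloor\vartheta\rceil_2 = 1$.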


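The new rounding map is $\langle\cdot\rangle_2 : \mathbb{Z}_q \to \mathbb{Z}_2$ such that $\langle\vartheta\rangle_2 := \lfloor \tfrac{4}{q}\cdot\vartheta \rfloor \bmod 2$. In other words, $\langle\vartheta\rangle_2$ is the bit $b \in \{0, 1\}$ satisfying $\vartheta \in I_b \cup (\tfrac{q}{2} + I_b)$. If $\vartheta$ follows a uniform distribution and $q$ is even, then $\lfloor\vartheta\rceil_2$ is uniform given $\langle\vartheta\rangle_2$; otherwise it is biased. We can observe that if $\vartheta, \omega \in \mathbb{Z}_q$ are close enough, then one recovers $\lfloor\vartheta\rceil_2$ given $\langle\vartheta\rangle_2$ and $\omega$. Let $E = [-\tfrac{q}{8}, \tfrac{q}{8}) \cap \mathbb{Z}$. The reconciliation function $\mathrm{rec} : \mathbb{Z}_q \times \mathbb{Z}_2 \to \mathbb{Z}_2$ is defined by $\mathrm{rec}(\omega, b) = 0$ if $\omega \in I_b + E \bmod q$, and $\mathrm{rec}(\omega, b) = 1$ otherwise.

Lemma 1 If $\vartheta \in \mathbb{Z}_q$ is uniformly random and $q \ge 2$ is even, then $\lfloor\vartheta\rceil_2$ is uniformly random given $\langle\vartheta\rangle_2$ [14].

Proof If $\langle\vartheta\rangle_2 = b$ for a bit $b \in \{0, 1\}$, then $\vartheta$ is uniform on $I_b \cup (\tfrac{q}{2} + I_b)$. Moreover, if $\vartheta \in I_b$, then $\lfloor\vartheta\rceil_2 = 0$, and if $\vartheta \in \tfrac{q}{2} + I_b$, then $\lfloor\vartheta\rceil_2 = 1$. Since the two sets have the same size, $\lfloor\vartheta\rceil_2$ is a uniform bit.

Lemma 2 If $\omega = \vartheta + e \bmod q$ for $e \in E$, $\vartheta \in \mathbb{Z}_q$, and $q \ge 2$ even, then $\mathrm{rec}(\omega, \langle\vartheta\rangle_2) = \lfloor\vartheta\rceil_2$ by [14].

Proof If $b = \langle\vartheta\rangle_2 \in \{0, 1\}$, then $\vartheta \in I_b \cup (\tfrac{q}{2} + I_b)$, and $\lfloor\vartheta\rceil_2 = 0$ if and only if $\vartheta \in I_b$. This holds if and only if $\omega \in I_b + E$, because $(I_b + E) - E \subseteq I_b + (-\tfrac{q}{4}, \tfrac{q}{4})$ is disjoint from $\tfrac{q}{2} + I_b$ modulo $q$.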
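The rounding, cross-rounding and reconciliation maps above act on single elements of $\mathbb{Z}_q$, so Lemma 2 can be checked exhaustively for small even moduli. The following Python sketch is an illustration added here and not the authors' implementation: the function names are hypothetical, the interval endpoints follow [14], and the brute-force membership test in `rec` is chosen for clarity rather than speed.

```python
def round_half_up(num, den):
    """Nearest integer to num/den (den > 0), computed as floor(num/den + 1/2)."""
    return (2 * num + den) // (2 * den)


def mod_round(v, q):
    """Modular rounding: round((2/q) * v) mod 2."""
    return round_half_up(2 * (v % q), q) % 2


def cross_round(v, q):
    """Cross-rounding: floor((4/q) * v) mod 2."""
    return (4 * (v % q)) // q % 2


def interval(b, q):
    """I_0 = {0, ..., round(q/4) - 1} and I_1 = {-floor(q/4), ..., -1}, as residues mod q."""
    if b == 0:
        return {x % q for x in range(0, round_half_up(q, 4))}
    return {x % q for x in range(-(q // 4), 0)}


def error_set(q):
    """E = [-q/8, q/8) intersected with the integers."""
    return [e for e in range(-q, q) if -q <= 8 * e < q]


def rec(w, b, q):
    """Reconciliation: 0 if w lies in I_b + E (mod q), otherwise 1."""
    reachable = {(x + e) % q for x in interval(b, q) for e in error_set(q)}
    return 0 if w % q in reachable else 1


def lemma2_holds(q):
    """Check rec(v + e, <v>_2) == round_2(v) for every v in Z_q and every e in E."""
    return all(
        rec((v + e) % q, cross_round(v, q), q) == mod_round(v, q)
        for v in range(q)
        for e in error_set(q)
    )
```

Calling `lemma2_holds(q)` for a small even `q` such as 16 or 26 confirms that the receiver, who only knows a nearby value and the cross-rounding bit, always recovers the sender's rounded bit.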

3.2 Error Distribution [14] We discuss a Gaussian-type error distribution over the number field $K$, discretized to $Q$. If $r > 0$, then the Gaussian $D_r$ with parameter $r$ has the probability density $\tfrac{1}{r}\exp(-\pi x^2 / r^2)$. In this article, we consider the error distribution $\psi = (\hat{m}/g)\cdot D_r$ over the number field $K$. The extra multiplicative factor $\hat{m}/g$ translates $\hat{Q}$ to $Q$. We discretize this distribution to $Q$ and denote the result by $D = \lfloor\psi\rceil$.

Lemma 3 If $e \leftarrow D = \lfloor\psi\rceil$ for $\psi = (\hat{m}/g)\cdot D_r$, then $g\cdot e$ is $\delta$-subgaussian with parameter $\hat{m}\cdot\sqrt{r^2 + 2\pi\,\mathrm{rad}(m)/m}$ for some $\delta \le 2^{-n}$, and $\|g\cdot e\|_2 \le \hat{m}\cdot(r + \sqrt{\mathrm{rad}(m)/m})\cdot\sqrt{n}$ except with probability at most $2^{-n}$ [14].

3.3 Ring Learning with Errors [14] For simplicity of application, we consider the problem in its discretized normal form: all elements are chosen from $Q$ or $Q_q = Q/qQ$, and the secret is drawn from the standard discrete error distribution. We write $\beta \leftarrow D$ for sampling $\beta$ at random from the distribution $D$.


4 Proposed Post-quantum Secure Authenticated Key Exchange This section describes the proposed scheme in four phases: (1) setup, (2) registration, (3) login and authentication, and (4) password change.

4.1 Setup Phase The setup algorithm generates the public parameters for the server $S_j$.
1. $S_j$ chooses an integer $m > 0$ and defines the $m$th cyclotomic ring $Q$ of degree $n = \varphi(m)$, where $\varphi(\cdot)$ is Euler's totient function.
2. $S_j$ chooses a modulus $q > 0$ that is coprime with each prime $p > 2$ dividing $m$, so that $g \in Q$ satisfies $\gcd(g, q) = 1$. For simplicity and efficiency, we take $q$ to be a prime with $q \equiv 1 \bmod m$. Further, it takes the discrete error distribution $D = \lfloor\psi\rceil$ over $Q$, with $\psi = (\hat{m}/g)\cdot D_\delta$ over the number field $K$ and standard deviation $\delta > 0$.
3. $S_j$ fixes the cipher space $C = Q_q \times Q_2$ and the key space $K = Q_2$. It also uses a randomized function $\eta : \mathbb{Z}_q \to \mathbb{Z}_{2q}$, for prime $q > 2$, that lifts an element modulo the odd $q$ to the even modulus $2q$, and the reconciliation function $\mathrm{rec} : \mathbb{Z}_q \times \mathbb{Z}_2 \to \mathbb{Z}_2$.
4. $S_j$ chooses random samples $w, s_0, s_1 \leftarrow D$ and computes the public key $b_j = w\cdot s_1 + s_0 \in Q_q$.
5. $S_j$ chooses a collision-resistant hash function $h : \{0, 1\}^* \to \{0, 1\}^t$.
6. $S_j$ publishes the parameters $\{n, q, w, g, D, \eta, b_j, h(\cdot)\}$ and keeps $s_1$ secret.
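Step 4 of the setup can be sketched for the power-of-two case $\Phi_m(X) = X^n + 1$ used in the preliminaries. The Python code below is a toy illustration under stated assumptions, not the authors' implementation: the parameters $n = 8$ and $q = 257$ are far too small for real security, the sampler `sample_small` is a hypothetical stand-in for the discretized Gaussian $D$, and all function names are invented here.

```python
import random


def poly_mul(a, b, n, q):
    """Multiply two polynomials in Z_q[X]/(X^n + 1); X^n wraps around to -1."""
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < n:
                out[k] = (out[k] + ai * bj) % q
            else:
                out[k - n] = (out[k - n] - ai * bj) % q
    return out


def poly_add(a, b, q):
    """Coefficient-wise addition modulo q."""
    return [(x + y) % q for x, y in zip(a, b)]


def sample_small(n, rng, bound=1):
    """Stand-in for the error distribution D: coefficients uniform in [-bound, bound]."""
    return [rng.randint(-bound, bound) for _ in range(n)]


def setup(n=8, q=257, seed=0):
    """Toy version of setup step 4: b_j = w*s1 + s0 in Z_q[X]/(X^n + 1)."""
    rng = random.Random(seed)
    w = [c % q for c in sample_small(n, rng)]   # the source draws w from D as well
    s0 = sample_small(n, rng)
    s1 = sample_small(n, rng)
    b_j = poly_add(poly_mul(w, s1, n, q), s0, q)
    public = {"n": n, "q": q, "w": w, "b_j": b_j}
    secret = {"s1": s1}                          # only s1 is kept by the server
    return public, secret


public, secret = setup()
```

The pair $(w, b_j)$ has the same shape as the ring learning with errors sample $(a, at + e)$ of Definition 1, with $s_1$ in the role of the secret and $s_0$ in the role of the error.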

4.2 Registration Phase In this phase, the user $U_i$ registers through a secure channel.
1. The user $U_i$ chooses an identity $id_i$ and a password $pw_i \in \mathbb{Z}_q^*$, computes $x_i = h(id_i \| pw_i)$, and sends $\{id_i, x_i\}$ to the server.
2. The server $S_j$ receives $\{id_i, x_i\}$, computes $sid_i = h(id_i \| s_1)$ and $y_i = sid_i \oplus x_i$, and sends $\{y_i, h(\cdot)\}$ to the user.
3. The user $U_i$ receives $\{y_i, h(\cdot)\}$ from the server, computes $sid_i = y_i \oplus x_i$ and $v_i = h(id_i \| pw_i \| sid_i)$, and stores $\{y_i, v_i, h(\cdot)\}$ in the mobile device.
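The three registration steps use only hashing and XOR, so they can be sketched directly. The Python code below is a minimal illustration under assumptions that the paper does not fix: $h(\cdot)$ is modelled by SHA-256 over length-prefixed byte strings, the server secret $s_1$ is represented by an arbitrary byte encoding, and every function name is hypothetical. The last function shows how the stored values $\{y_i, v_i\}$ allow the device to check a password locally.

```python
import hashlib
import hmac


def h(*parts):
    """Model of h(.): SHA-256 over length-prefixed parts (an assumed instantiation)."""
    digest = hashlib.sha256()
    for part in parts:
        digest.update(len(part).to_bytes(4, "big"))
        digest.update(part)
    return digest.digest()


def xor(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def user_request(identity, password):
    """Step 1: the user computes x_i = h(id_i || pw_i)."""
    return h(identity, password)


def server_register(identity, x_i, s1_bytes):
    """Step 2: the server computes sid_i = h(id_i || s1) and y_i = sid_i XOR x_i."""
    sid_i = h(identity, s1_bytes)
    return xor(sid_i, x_i)


def user_store(identity, password, y_i):
    """Step 3: the user recovers sid_i and stores {y_i, v_i} on the device."""
    sid_i = xor(y_i, h(identity, password))
    return {"y_i": y_i, "v_i": h(identity, password, sid_i)}


def device_check(identity, password, stored):
    """Local check: recompute v_i from the typed credentials and compare."""
    sid_i = xor(h(identity, password), stored["y_i"])
    return hmac.compare_digest(h(identity, password, sid_i), stored["v_i"])


x_i = user_request(b"alice", b"correct horse")
y_i = server_register(b"alice", x_i, b"encoded server secret s1")
stored = user_store(b"alice", b"correct horse", y_i)
```

A wrong password changes $x_i$, hence the recovered $sid_i$ and the recomputed $v_i$, so `device_check` rejects it without contacting the server.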


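User ($U_i$), login phase: inputs $id_i$, $pw_i$; the mobile device computes $x_i = h(id_i \| pw_i)$ and $sid_i = x_i \oplus y_i$, and verifies $v_i \stackrel{?}{=} h(id_i \| pw_i \| sid_i)$.

User ($U_i$), authentication and key agreement phase: chooses $s_i, e_1, e_2 \leftarrow D$; computes $u_i = w s_i + e_1 \in Q_q$, $v_i = g\cdot s_i\cdot b_j + e_2 \in Q_q$, $z_i = \eta(v_i)$, $c_i = \langle z_i \rangle_2 \in Q_2$, $m_i = \lfloor z_i \rceil_2 \in Q_2$, $aid_i = id_i \oplus h(m_i \| u_i)$, and $g_i = h(sid_i \| m_i \| u_i \| id_i)$.

User ($U_i$) to server ($S_j$), public channel: $\langle aid_i, g_i, u_i, c_i \rangle$.

Server ($S_j$): computes $w_j = g\cdot u_i\cdot s_1 \in Q_q$, $m_i = \mathrm{rec}(w_j, c_i) \in Q_2$, $id_i = aid_i \oplus h(m_i \| u_i)$, and $sid_i = h(id_i \| s_1)$; verifies $g_i = h(sid_i \| m_i \| u_i \| id_i)$; chooses $s_j, e_3, e_4 \leftarrow D$; computes $v_j = g\cdot s_j\cdot u_i + e_4$, $u_j = w\cdot s_j + e_3$, $z_j = \eta(v_j)$, $c_j = \langle z_j \rangle_2 \in Q_2$, $m_j = \lfloor z_j \rceil_2 \in Q_2$, $sk = h(sid_i \| m_i \| m_j \| u_i \| u_j \| id_i)$, and $g_j = h(sk \| sid_i \| m_j \| u_j \| id_i)$.

Server ($S_j$) to user ($U_i$), public channel: $\langle u_j, g_j \rangle$.

User ($U_i$): computes $w_i = g\cdot u_j\cdot s_i \in Q_q$, $m_j = \mathrm{rec}(w_i, u_j) \in Q_2$, and $sk = h(sid_i \| m_i \| m_j \| u_i \| u_j \| id_i)$; verifies $g_j = h(sk \| sid_i \| m_j \| u_j \| id_i)$; establishes the session key $sk$.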

Fig. 1 Illustration of login and authentication phase

4.3 Login and Authentication Phase We can extend the idea of [14] to cyclotomic rings. This means that both the rounding functions and the reconciliation mechanism can be implemented on cyclotomic rings $Q$ using the decoding basis. For an even number $2q$, the rounding maps $\lfloor\cdot\rceil_2 : Q_q \to Q_2$ and $\langle\cdot\rangle_2 : Q_q \to Q_2$ can be applied with respect to the decoding basis. In general, we consider $2q$ being an even number and the two quotient rings $Q = \mathbb{Z}[X]/(\Phi_m(X))$ and $Q_q = Q/qQ$. The login and authentication phases are illustrated in Fig. 1. A detailed description of the proposed protocol is given below.
1. In the login phase, a user enters $id_i$ and $pw_i$ into the mobile device, which computes $x_i = h(id_i \| pw_i)$ and $sid_i = x_i \oplus y_i$ and verifies the equation $v_i \stackrel{?}{=} h(id_i \| pw_i \| sid_i)$. If the verification is successful, the mobile user is logged into the device.
2. In the next step, the mobile device chooses random values $s_i, e_1, e_2 \leftarrow D$ from the chosen distribution. Further, it computes $u_i = w s_i + e_1 \in Q_q$, $v_i = g\cdot s_i\cdot b_j + e_2 \in Q_q$, $z_i = \eta(v_i)$, $c_i = \langle z_i \rangle_2 \in Q_2$, $m_i = \lfloor z_i \rceil_2 \in Q_2$, $aid_i = id_i \oplus h(m_i \| u_i)$, and a verification factor $g_i = h(sid_i \| m_i \| u_i \| id_i)$. Finally, it sends the message $\langle aid_i, g_i, u_i, c_i \rangle$ to the server.

A Novel Post-quantum Piekert’s Reconciliation-Based Forward . . .


3. In this phase, the server receives the message $\langle aid_i, g_i, u_i, c_i \rangle$ and computes $w_j = g \cdot u_i \cdot s_1 \in Q_q$, $m_i = rec(w_j, c_i) \in Q_2$, $id_i = aid_i \oplus h(m_i \| u_i)$, and $sid_i = h(id_i \| s_1)$, and it verifies $g_i = h(sid_i \| m_i \| u_i \| id_i)$. If the verification holds, the server samples $s_j, e_3, e_4 \leftarrow D$ from the chosen distribution. It then computes $v_j = g \cdot s_j \cdot u_i + e_4$, $u_j = w \cdot s_j + e_3$, $z_j = \eta(v_j)$, $c_j = \langle z_j \rangle_2 \in Q_2$, $m_j = \lfloor z_j \rceil_2 \in Q_2$, and $sk = h(sid_i \| m_i \| m_j \| u_i \| u_j \| id_i)$. Finally, it computes $g_j = h(sk \| sid_i \| m_j \| u_j \| id_i)$ and sends $\langle u_j, g_j \rangle$ to the corresponding user.
4. The user's device receives the message $\langle u_j, g_j \rangle$ and computes $w_i = g \cdot u_j \cdot s_i \in Q_q$, $m_j = rec(w_i, u_j) \in Q_2$, and $sk = h(sid_i \| m_i \| m_j \| u_i \| u_j \| id_i)$. Finally, the user verifies the validity of the key using the equation $g_j = h(sk \| sid_i \| m_j \| u_j \| id_i)$, and the process ends with the session key $sk$ established.
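The rounding and reconciliation maps used in the steps above can be illustrated on a single coefficient. The following is a minimal sketch, assuming a scalar modulus q divisible by 8 and the integer definitions of modular rounding, cross-rounding, and rec from [14]; the protocol itself applies these maps over $Q_q$ in the decoding basis after the randomized doubling $\eta$, which this toy omits. The function names are illustrative and do not come from the paper.

```python
# Toy, single-coefficient version of Peikert-style reconciliation [14].
# Assumption: q is divisible by 8; the real protocol works over the ring Q_q.

def round2(v: int, q: int) -> int:
    """Modular rounding: round (2/q)*v to the nearest integer, then reduce mod 2."""
    return ((2 * v + q // 2) // q) % 2

def cross2(v: int, q: int) -> int:
    """Cross-rounding: floor((4/q)*v) mod 2, used as the public hint c."""
    return ((4 * v) // q) % 2

def rec(w: int, c: int, q: int) -> int:
    """Return 0 if w lies in I_c + E (mod q), otherwise 1."""
    start = 0 if c == 0 else -(q // 4)   # I_0 = [0, q/4), I_1 = [-q/4, 0)
    shifted = (w - start + q // 8) % q   # E = [-q/8, q/8)
    return 0 if shifted < q // 2 - 1 else 1

def all_errors_reconcile(q: int) -> bool:
    """Check rec(v + e, cross2(v)) == round2(v) for every v in Z_q and e in E."""
    return all(
        rec((v + e) % q, cross2(v, q), q) == round2(v, q)
        for v in range(q)
        for e in range(-(q // 8), q // 8)
    )

if __name__ == "__main__":
    print(all_errors_reconcile(64))
```

The exhaustive check mirrors the correctness condition of the protocol: the party holding only an approximation w of v, together with the hint c, recovers the same bit as the party holding v, provided the approximation error stays inside E.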

5 Security Analysis

In this section, we analyze the security of the protocol with respect to well-known attacks. We first show that the proposed framework is mathematically correct. The protocol is also designed to resist attacks by quantum computers.

Theorem 1 The proposed authenticated key agreement protocol based on the ring learning with errors assumption is mathematically correct.

Proof The server checks $g_i \stackrel{?}{=} g_i'$ before it proceeds with user authentication, where $g_i = h(sid_i \| m_i \| u_i \| id_i)$ is the value sent by the user and $g_i' = h(sid_i \| m_i' \| u_i \| id_i)$ is the value recomputed by the server. The equality of $g_i$ and $g_i'$ relies on the agreement of $m_i = \lfloor z_i \rceil_2$ and $m_i' = rec(w_j, c_i)$, respectively.

Lemma 4 If $\| g \cdot s_i \| \le \ell$ for $i \in \{0, 1\}$, where $s_i$ denotes the secrets, and $(q/8)^2 \ge (r'^2 \cdot (2\ell^2 + n) + \pi/2) \cdot \omega^2$ for some $\omega > 0$, where $r'^2 = r^2 + 2\pi \cdot \mathrm{rad}(m)/m$, then the receiver recovers correctly except with probability at most $2n \cdot \exp(3\delta - \pi\omega^2)$, for some $\delta \le 2^{-n}$.

Proof A user computes $u_i = w \cdot s_i + e_1$ and $v_i = g \cdot s_i \cdot b_j + e_2$, where the public key is $w$, $b_j = w \cdot s_1 + s_0$. Expanding gives $v_i = g \cdot w \cdot s_i \cdot s_1 + g \cdot s_i \cdot s_0 + e_2$. The server computes $w_j = g \cdot u_i \cdot s_1 = g \cdot (w \cdot s_i + e_1) \cdot s_1 = v_i + g \cdot (e_1 \cdot s_1 - s_i \cdot s_0) - e_2$ in $Q_q$. If $e = g \cdot (e_1 \cdot s_1 - s_i \cdot s_0) - e_2 \in Q$, and $\hat{e} \in Q$ is the random term chosen by $z_i = \eta(v_i)$, then $z_i = 2 v_i - \hat{e} \in Q_{2q}$. Lemmas 3.4 and 3.5 are then enough to show that the coefficients of $2e + \hat{e}$ lie in $[-2q/8, 2q/8)$ with the claimed probability [14].

1. Anonymity: In the authentication phase, the user samples $s_i, e_1, e_2 \leftarrow D$ and computes $u_i = w s_i + e_1 \in Q_q$, $v_i = g \cdot s_i \cdot b_j + e_2 \in Q_q$, $z_i = \eta(v_i)$, $c_i = \langle z_i \rangle_2 \in Q_2$, $m_i = \lfloor z_i \rceil_2 \in Q_2$, and $aid_i = id_i \oplus h(m_i \| u_i)$. He then sends the masked identity $aid_i$ in place of the real identity. Therefore, the proposed protocol maintains anonymity.
2. Stolen Device Attack: A user selects an identity $id_i$ and a password $pw_i$. He chooses a random $r_i \in \mathbb{Z}_p^*$, computes $x_i = h(id_i \| pw_i)$, and sends the information $\{id_i, x_i\}$


C. Dharminder et al.

to the corresponding server. The server computes $sid_i = h(id_i \| s_1)$ and $y_i = sid_i \oplus x_i$ using its master secret, and returns $\{y_i, h(\cdot)\}$ to the user. The user computes $sid_i = y_i \oplus x_i$ and a verification factor $v_i = h(id_i \| pw_i \| sid_i)$, and then stores $(y_i, v_i, h(\cdot))$ in the mobile device. Even if the adversary obtains the device and steals the stored information $(y_i, v_i, r_i, h(\cdot))$, he cannot recover the password and identity, because the identity and password are bound to the master secret of the server through collision-resistant hashing.
3. Freshness of Key: The user samples $s_i, e_1, e_2 \leftarrow D$ and computes $u_i = w s_i + e_1 \in Q_q$, $v_i = g \cdot s_i \cdot b_j + e_2 \in Q_q$, $z_i = \eta(v_i)$, $c_i = \langle z_i \rangle_2 \in Q_2$, and $m_i = \lfloor z_i \rceil_2 \in Q_2$. He then computes $aid_i = id_i \oplus h(m_i \| u_i)$ and $g_i = h(sid_i \| m_i \| u_i \| id_i)$, and sends the message $\langle aid_i, g_i, u_i, c_i \rangle$ to the server. The server computes $w_j = g \cdot u_i \cdot s_1 \in Q_q$, $m_i = rec(w_j, c_i) \in Q_2$, and the real identity $id_i = aid_i \oplus h(m_i \| u_i)$. The server then verifies $g_i = h(sid_i \| m_i \| u_i \| id_i)$, samples $s_j, e_3, e_4 \leftarrow D$, and computes $v_j = g \cdot s_j \cdot u_i + e_4$, $u_j = w \cdot s_j + e_3$, $z_j = \eta(v_j)$, $c_j = \langle z_j \rangle_2 \in Q_2$, $m_j = \lfloor z_j \rceil_2 \in Q_2$, and the session key $sk = h(sid_i \| m_i \| m_j \| u_i \| u_j \| id_i)$. The same $sk$ is computed by the user. Because $sk$ depends on values derived from fresh random samples in every run, the freshness of the session key is ensured.
4. Mutual Authentication: The user samples $s_i, e_1, e_2 \leftarrow D$ and computes $u_i = w s_i + e_1 \in Q_q$, $v_i = g \cdot s_i \cdot b_j + e_2 \in Q_q$, $z_i = \eta(v_i)$, $c_i = \langle z_i \rangle_2 \in Q_2$, and $m_i = \lfloor z_i \rceil_2 \in Q_2$. He then computes the masked identity $aid_i = id_i \oplus h(m_i \| u_i)$ using the random $m_i$ and $u_i$, and a verification factor $g_i = h(sid_i \| m_i \| u_i \| id_i)$. The adversary cannot compute the verification factor because it includes the hidden identity of the user. Moreover, it depends on the secret $s_i$ generated by the user, and recovering $s_i$ is hard under the ring learning with errors assumption. Hence the adversary cannot forge a valid verification factor $g_i = h(sid_i \| m_i \| u_i \| id_i)$. The server and the user authenticate each other with the help of the equations $g_i = h(sid_i \| m_i \| u_i \| id_i)$ and $g_j = h(sk \| sid_i \| m_j \| u_j \| id_i)$, and these two equations confirm mutual authentication.
5. Replay Attack: The user samples $s_i, e_1, e_2 \leftarrow D$ and computes $u_i = w s_i + e_1 \in Q_q$, $v_i = g \cdot s_i \cdot b_j + e_2 \in Q_q$, $z_i = \eta(v_i)$, $c_i = \langle z_i \rangle_2 \in Q_2$, and $m_i = \lfloor z_i \rceil_2 \in Q_2$. He then computes $aid_i = id_i \oplus h(m_i \| u_i)$ and $g_i = h(sid_i \| m_i \| u_i \| id_i)$, and sends $\langle aid_i, g_i, u_i, c_i \rangle$ to the server. The server computes $w_j = g \cdot u_i \cdot s_1 \in Q_q$, $m_i = rec(w_j, c_i) \in Q_2$, and $id_i = aid_i \oplus h(m_i \| u_i)$. The server then verifies $g_i = h(sid_i \| m_i \| u_i \| id_i)$, samples $s_j, e_3, e_4 \leftarrow D$, and computes $v_j = g \cdot s_j \cdot u_i + e_4$, $u_j = w \cdot s_j + e_3$, $z_j = \eta(v_j)$, $c_j = \langle z_j \rangle_2 \in Q_2$, $m_j = \lfloor z_j \rceil_2 \in Q_2$, and $sk = h(sid_i \| m_i \| m_j \| u_i \| u_j \| id_i)$. The key $sk$ computed by the user is the same as the one computed by the server. Because $sk$ depends on fresh random samples, the adversary cannot replay older messages in a newly established session between the user and the server.
6. Impersonation Attack: In the proposed design, the adversary is not capable of generating the legitimate messages $g_i$ and $g_j$. The user samples $s_i, e_1, e_2 \leftarrow D$ and computes $u_i = w s_i + e_1 \in Q_q$, $v_i = g \cdot s_i \cdot b_j + e_2 \in Q_q$, $z_i = \eta(v_i)$, $c_i = \langle z_i \rangle_2 \in Q_2$, and $m_i = \lfloor z_i \rceil_2 \in Q_2$. He then computes $aid_i = id_i \oplus h(m_i \| u_i)$ and $g_i = h(sid_i \| m_i \| u_i \| id_i)$, and sends $\langle aid_i, g_i, u_i, c_i \rangle$ to the server. The uniqueness of the random samples ensures a different key for every new session. Since the adversary knows neither the random $s_i$ nor the master secret $s_1$, he cannot impersonate the user or the server. This prevents impersonation attacks on messages traveling through the public channel.
7. Forward Secrecy: Forward secrecy means that even if the secret key of the server is compromised, an adversary cannot recover older session keys. In the proposed protocol, the user samples $s_i, e_1, e_2 \leftarrow D$ and computes $u_i = w s_i + e_1 \in Q_q$, $v_i = g \cdot s_i \cdot b_j + e_2 \in Q_q$, $z_i = \eta(v_i)$, $c_i = \langle z_i \rangle_2 \in Q_2$, $m_i = \lfloor z_i \rceil_2 \in Q_2$, $aid_i = id_i \oplus h(m_i \| u_i)$, and $g_i = h(sid_i \| m_i \| u_i \| id_i)$, and sends $\langle aid_i, g_i, u_i, c_i \rangle$ to the server. The server computes $v_j = g \cdot s_j \cdot u_i + e_4$, $u_j = w \cdot s_j + e_3$, $z_j = \eta(v_j)$, $c_j = \langle z_j \rangle_2 \in Q_2$, $m_j = \lfloor z_j \rceil_2 \in Q_2$, and $sk = h(sid_i \| m_i \| m_j \| u_i \| u_j \| id_i)$. The same $sk$ is computed by the user. The session key depends on fresh random samples, and their uniqueness and randomness ensure a different key for every new session. Therefore, an adversary cannot read older messages even if the secret key of the server is compromised.
8. Offline Dictionary Attack: A user selects $id_i$ and $pw_i$, computes $x_i = h(id_i \| pw_i)$, and sends the information $\{id_i, x_i\}$ to the server. The server computes $sid_i = h(id_i \| s_1)$ using its master secret and $y_i = sid_i \oplus x_i$, and returns the message $\{y_i, h(\cdot)\}$ to the user. The user computes $sid_i = y_i \oplus x_i$ and $v_i = h(id_i \| pw_i \| sid_i)$, and stores $(y_i, v_i, h(\cdot))$ in the mobile device. If an adversary obtains the stored information $(y_i, v_i, r_i)$ from the smart device, he still needs the values $sid_i$ and $id_i$ in order to test a guess of the password $pw_i$, and $sid_i = h(id_i \| s_1)$ depends on the master secret of the server.
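The registration and login computations referred to in the stolen device and offline dictionary items can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the paper does not fix a concrete hash function, so SHA-256 stands in for $h(\cdot)$, the length-prefixed encoding of $\|$ is an assumption, and the function names `register` and `login` are hypothetical.

```python
import hashlib
import hmac

def h(*parts: bytes) -> bytes:
    """Stand-in for h(.): SHA-256 over a length-prefixed concatenation (assumption)."""
    digest = hashlib.sha256()
    for part in parts:
        digest.update(len(part).to_bytes(4, "big") + part)
    return digest.digest()

def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def register(identity: bytes, password: bytes, master_secret: bytes):
    """Return (y_i, v_i), the values stored on the mobile device."""
    x = h(identity, password)           # x_i, computed on the device
    sid = h(identity, master_secret)    # sid_i, computed by the server
    y = xor(sid, x)                     # y_i, returned to the device
    v = h(identity, password, sid)      # v_i, verification factor
    return y, v

def login(identity: bytes, password: bytes, y: bytes, v: bytes) -> bool:
    """Recompute x_i and sid_i from the inputs and check v_i."""
    x = h(identity, password)
    sid = xor(x, y)
    return hmac.compare_digest(v, h(identity, password, sid))

if __name__ == "__main__":
    y_i, v_i = register(b"alice", b"correct horse", b"server-master-secret")
    print(login(b"alice", b"correct horse", y_i, v_i))
    print(login(b"alice", b"wrong guess", y_i, v_i))
```

In this sketch the device never stores the password or $sid_i$ directly; both are recomputed from the user's inputs and the stored $y_i$ at every login.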

6 Concluding Remarks

Lattice-based cryptography plays a very important role in authenticated key exchange that protects against the threat of quantum attacks. In this article, we have proposed a new post-quantum key exchange based on a variant of the lattice assumption, ring learning with errors. In the last decade, many reconciliation-based key exchanges have been proposed, but they generally suffer from signal leakage attacks. Moreover, most of the proposed designs use Ding's reconciliation, which is biased by nature. The proposed scheme uses Peikert's reconciliation, which makes it free from this bias. The proposed protocol achieves most of the security attributes required for authenticated key exchange protocols.


References

1. Ajtai M (1996) Generating hard instances of lattice problems. In: Proceedings of the twenty-eighth annual ACM symposium on theory of computing. ACM, pp 99–108
2. Ajtai M (1999) Generating hard instances of the short basis problem. In: International colloquium on automata, languages, and programming. Springer, Berlin, pp 1–9
3. Ajtai M, Dwork C (1997) A public-key cryptosystem with worst-case/average-case equivalence. In: Proceedings of the twenty-ninth annual ACM symposium on theory of computing. ACM, pp 284–293
4. Dabra V, Bala A, Kumari S (2020) LBA-PAKE: lattice-based anonymous password authenticated key exchange for mobile devices. IEEE Syst J 15(4):5067–5077
5. Dharminder D, Prabhu Chandran K (2020) LWESM: learning with error based secure communication in mobile devices using fuzzy extractor. J Ambient Intell Humanized Comput 11(10):4089–4100
6. Ding J, Alsayigh S, Lancrenon J, Saraswathy RV, Snook M (2017) Provably secure password authenticated key exchange based on RLWE for the post-quantum world. In: Cryptographers' track at the RSA conference. Springer, Berlin, pp 183–204
7. Ding J, Alsayigh S, Saraswathy RV, Fluhrer S, Lin X (2017) Leakage of signal function with reused keys in RLWE key exchange. In: 2017 IEEE international conference on communications (ICC). IEEE, pp 1–6
8. Ding J, Fluhrer S, Rv S (2018) Complete attack on RLWE key exchange with reused keys, without signal leakage. In: Australasian conference on information security and privacy. Springer, Berlin, pp 467–486
9. Feng Q, He D, Zeadally S, Kumar N, Liang K (2018) Ideal lattice-based anonymous authentication protocol for mobile devices. IEEE Syst J 13(3):2775–2785
10. Fluhrer S (2016) Cryptanalysis of ring-LWE based key exchange with key share reuse. Cryptology ePrint Archive
11. Kirkwood D, Lackey BC, McVey J, Motley M, Solinas JA, Tuller D (2015) Failure is not an option: standardization issues for post-quantum key agreement. In: Workshop on cybersecurity in a post-quantum world, p 21
12. Kitaev AY (1995) Quantum measurements and the abelian stabilizer problem. arXiv preprint quant-ph/9511026
13. Lyubashevsky V, Peikert C, Regev O (2010) On ideal lattices and learning with errors over rings. In: Annual international conference on the theory and applications of cryptographic techniques. Springer, Berlin, pp 1–23
14. Peikert C (2014) Lattice cryptography for the internet. In: International workshop on post-quantum cryptography. Springer, Berlin, pp 197–219
15. Proos J, Zalka C (2003) Shor's discrete logarithm quantum algorithm for elliptic curves. arXiv preprint quant-ph/0301141
16. Regev O (2006) Lattice-based cryptography. In: Annual international cryptology conference. Springer, Berlin, pp 131–141
17. Shor PW (1999) Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM Rev 41(2):303–332
18. Wang Q, Wang D, Cheng C, He D (2021) Quantum2fa: efficient quantum-resistant two-factor authentication scheme for mobile devices. IEEE Trans Dependable Secure Comput
19. Zhang J, Zhang Z, Ding J, Snook M, Dagdelen Ö (2015) Authenticated key exchange from ideal lattices. In: Annual international conference on the theory and applications of cryptographic techniques. Springer, Berlin, pp 719–751

An Overview of IoT and Smart Application Environments: Research and Challenges

Chander Prabha, Sukhwinder Kaur, Jaspreet Singh, and Meena Malik

Abstract Everyday life is surrounded by objects (devices) with wireless and computational capabilities. These devices, such as actuators and sensors, are resource-constrained and are connected to the Internet. The Internet of things (IoT) is a computing model that serves as a key element in the development of smart applications in distinct areas. The present ecosystem of IoT devices provides many use cases and communication solutions, and the devices have distinct performance traits. The major challenge at present is identifying suitable IoT communication solution(s) for a given smart application. This paper focuses on the requirements of smart application environments such as smart cities, smart homes, smart health, smart factories, smart grids, smart retail, smart vehicles, and smart agriculture, and relates them to IoT solutions. The major characteristics of these smart applications (environments) are described along with their relevant IoT communication technologies. The challenges of implementing smart solutions using IoT that remain open for further research are also described. The paper should help researchers understand the applicability of IoT in real scenarios.

Keywords IoT technologies · Sensors · IoT applications · Smart IoT · Cloud

C. Prabha (B)
Chitkara University Institute of Engineering and Technology, Chitkara University, Rajpura, Punjab, India
e-mail: [email protected]

S. Kaur
SDDIET, Barwala, Panchkula, India

J. Singh · M. Malik
Chandigarh University, Mohali, Punjab, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 686, https://doi.org/10.1007/978-981-99-2100-3_10


C. Prabha et al.

1 Introduction

The term IoT was first introduced in 1999 by Kevin Ashton [1]. It is an emerging communication paradigm that simplifies our lives through the use of IoT devices and smart sensors connected via the Internet. IoT has grown and gained popularity by providing original solutions to numerous concerns of governments, trades, and industries (private and public) throughout the world, and it makes lives easier through smart applications. It is rapidly becoming a valuable aspect of our lives. Over the last few years, a lot of technological development and research effort has been devoted to IoT, for several reasons. The primary reason is the exponential growth of smart objects/devices that form part of the IoT infrastructure. The number of wearable devices with data processing and inherent sensing capabilities is expected to grow, potentially reaching trillions of interconnected things on the Internet. The second reason is the urbanization trend. By 2027, the market size of IoT in smart cities is expected to reach $347.6 billion [2]. Smart city solutions use Internet technologies and big data to improve citizens' quality of life and address their problems. Today's smart environments show a surge in demand for energy-efficient resources, efficient traffic management, smart waste management, and improved public safety and security. The IoT is therefore a significant facilitator of smart environments [3]. Sensor nodes connected to various smart city applications create massive volumes of data that are currently underutilized. Using the current ICT infrastructure, diverse information may be brought together. To achieve information aggregation, wireless communication technologies like 3G, Wi-Fi, and LTE are used. The IoT includes PCs and other electronic devices in the context of embedded device usage and the existing Internet infrastructure. The smart city idea is predicated on the ability to control billions of IoT devices from a central location [4]. Recent developments in low-power wireless network protocols have made it possible for administrators to remotely monitor and operate a variety of sensor networks and actuators. Using this platform, sensor applications can be connected and used by many web applications for intelligent operation. Semantic web algorithms and probabilistic reasoning methods help exploit very large volumes of data and information. A reasoning technique can extract useful information by combining several smart city domains, such as the home, car, health care, and environment domains [5].

The contributions of this study are organized as follows. The critical aspects of IoT are discussed in Sect. 2 along with the related literature. Section 3 discusses various smart environments enabled by IoT. Section 4 presents the key issues, challenges, and various IoT solutions. Finally, Sect. 5 concludes the paper with future scope.

An Overview of IoT and Smart Application Environments: Research …


2 Literature Review on the Critical Review of IoT

Many researchers have presented interdisciplinary work on IoT that addresses specific applications and aspects [6]. IoT has proven its ability in various application domains. In a smart city, wireless sensor networks (WSNs) are the elementary sources for gathering heterogeneous information. The information supplied by distinct sensors often overlaps and is limited in nature. Addressing the issues associated with incomplete data fusion is a research challenge [7]. The power and potential of IoT can be seen in numerous real-life application fields (Fig. 1). One of the prevalent IoT areas is the smart city, a major part of which is the smart home. An IoT-based central control unit handles all communication between the devices used in the smart home via the Internet. Smart cities have gained prominence in recent years and have attracted a lot of research activity. Smart homes enhance in-house convenience. They also help households save money, for example through decreased energy use that results in a cheaper power bill. Smart cars [8] fitted with sensors and smart gadgets are also a part of smart cities. The IoT serves as a tool for developing new smart driving systems. Users have a safe driving experience, with all controls managed automatically in one place. Khajenasiri et al. [9] and Backhouse [10] reviewed IoT technologies with the goal of benefiting smart city applications for smart energy control. The IoT is currently used in only a few application scenarios to benefit both technology and humans. IoT's scope is quite vast, and it is expected to expand into all application areas in the near future. The conservation of energy is an integral aspect

Fig. 1 Application fields of IoT


and with IoT, smart energy control systems can be developed that help save money and energy. These authors also discussed IoT architecture in relation to the smart city. They noted that the most difficult obstacle to realizing an IoT architecture is the immaturity of IoT software and hardware, and they advised addressing these challenges in order to create a dependable, effective, and consumer-friendly IoT system. Alavi [11] examined the subject of urbanization in cities. The migration of people from rural to urban regions has increased the number of city inhabitants. Thus, smart solutions are required for commuting, power, infrastructure, and health care. Smart cities cover a variety of topics, including traffic management, environmental quality management, public health and safety, smart lighting, and smart trash collection. With increasing urbanization, the infrastructure requirements of an enhanced smart city have created opportunities for innovators to provide smart city solutions. Weber [12] and Lin [13] addressed the security and privacy aspects of IoT. Private corporations using IoT should account for permissions, authentication methods, susceptibility to threats, and consumer confidentiality in their commercial activities. To meet worldwide demands, a generic framework for privacy and security must be established, and in parallel, the problems and hurdles in security and privacy need to be examined and understood. Heer et al. [14] and Ponnalagarsamy et al. [15] identified a flaw in the security of the IP-based IoT ecosystem. The web serves as the backbone for device connectivity in such systems, so security vulnerabilities are a major concern. Furthermore, the framework should define the roles and responsibilities of each object in such systems, and it must be scalable so that it supports integration from small-scale to large-scale IoT. Authorization and access control are other challenges in IoT that require a solution, and the work of Liu et al. [16] addresses them. Both are critical for verifying the interacting parties and thus for preventing the loss of sensitive information. Liu et al. [16] proposed an authentication system and tested it against snooping, key control, active attacks, and man-in-the-middle attacks. Kothmayr et al. [17] suggested a two-way authentication mechanism for the IoT, since attackers on the web are constantly looking for protected content. They evaluated their solution with respect to message security, authenticity, integrity, secrecy, storage consumption, and end-to-end delay. Li et al. [18] offered a new approach to cloud infrastructures for data-centric IoT applications. Efficient solutions are required to serve a large number of IoT applications on cloud infrastructure. According to Yan et al. [19], another critical aspect is trust management in IoT. It enables consumers to understand and trust IoT services and applications without having to be concerned about unpredictability and threats. They studied several challenges in trust management and discussed their significance for IoT developers and consumers. According to Noura et al. [20], interoperability is important in IoT because it facilitates the integration of services and devices from numerous dissimilar systems to create an efficient solution. Several additional studies [21, 22] emphasized interoperability and highlighted the various obstacles that it faces in IoT.


Wang et al. [23] expressed their concerns about the treatment of household wastewater. They identified many flaws in the existing wastewater treatment process and its dynamic surveillance system and proposed efficient IoT-based alternatives. The IoT can be extremely useful in wastewater treatment and performance monitoring. Another important domain in IoT is smart agriculture. Qiu et al. [24] developed an IoT-based four-layer intelligent surveillance platform architecture for facility farm ecosystems to manage the agricultural ecosystem. Every layer is in charge of a particular task, and the architecture as a whole may produce a better ecological system with less human involvement. There are several methods and approaches for measuring and controlling air quality. AirCloud is a cloud-based air quality surveillance system presented by Cheng et al. [25]. Using five months' data, they implemented AirCloud and examined its effectiveness over a continuous period of two months. Another critical component of IoT is its compatibility with agricultural and environmental norms. In a survey, Talavera et al. [26] concentrated on this topic and described the essential IoT initiatives for environmental and agro-industrial problems. They stated that IoT efforts in these sectors are already visible. IoT is enhancing present technologies while also helping farmers and the community. Jara et al. [27] emphasized the significance of IoT-based patient health monitoring. They proposed that IoT sensors and devices, in conjunction with the Internet, can aid in patient health monitoring, and they suggested a structure and procedure to accomplish this goal.

3 Smart Environments

This section discusses various smart environments pertaining to a smart city, such as smart homes, smart health care, smart grid systems, and smart factories. People are demanding more leisure in their lives as a result of new technologies. To provide connected solutions for the public, the smart city makes use of numerous software systems, communication networks, and user interfaces. To make sure that only the most pertinent and significant data is sent over the communication network, these cities rely largely on technologies like edge computing. The machine learning techniques used in IoT implementations are [28]:

• Supervised learning: Learning is based on supervision. The technique learns a mapping from the input variable (x) to the output variable (y). The machines are trained on a "labeled" dataset, and the trained machines then predict the output. Real-world applications of supervised learning include fraud detection, spam filtering, and risk assessment.
• Unsupervised learning: No supervision is required. A dataset that is neither classified nor labeled is used to train machines. This algorithm's


primary goal is to classify or group the unsorted dataset based on commonalities, patterns, and differences. Tourists who exhibit similar behavior, for instance, can be grouped together and addressed with targeted advertising.
• Reinforcement learning: Here the machine trains itself from its own behavior in its surroundings, continually using trial and error to achieve the best results. The objective is to maximize the cumulative reward, because a better result yields a bigger reward. This type of machine learning is used to take specific decisions, such as controlling traffic signals. Reinforcement learning problems are commonly formalized as Markov decision processes.
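The difference between the first two learning styles in the list above can be shown on a handful of sensor readings. The sketch below is illustrative only: the temperature values, the labels, and the function names are made up for this example, a nearest-centroid rule stands in for a supervised learner, and a two-cluster variant of k-means stands in for an unsupervised one. Reinforcement learning is not covered here.

```python
from statistics import mean

def fit_centroids(labeled):
    """Supervised step: one mean value per label, learned from labeled readings."""
    return {label: mean(values) for label, values in labeled.items()}

def predict(centroids, reading):
    """Assign the label whose centroid is closest to the new reading."""
    return min(centroids, key=lambda label: abs(centroids[label] - reading))

def two_means(values, iterations=10):
    """Unsupervised step: split unlabeled readings into two groups (1-D k-means, k=2)."""
    low, high = min(values), max(values)
    for _ in range(iterations):
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        if not near_high:          # all readings collapsed into one group
            break
        low, high = mean(near_low), mean(near_high)
    return sorted(near_low), sorted(near_high)

if __name__ == "__main__":
    # Hypothetical labeled temperature readings in degrees Celsius.
    labeled = {"normal": [20.1, 20.5, 19.8], "overheating": [35.2, 36.0, 34.7]}
    centroids = fit_centroids(labeled)
    print(predict(centroids, 21.0))
    # The same readings without labels: the grouping has to be discovered.
    print(two_means([20.1, 35.2, 20.5, 36.0, 19.8, 34.7]))
```

The supervised function needs the labels to build its model, whereas `two_means` only sees raw values and groups them by similarity.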

3.1 Smart Homes

In this era of automated technologies such as automatic vehicles, bots, and dishwashers, there is a demand for automated houses where people can accomplish tasks with the least amount of effort. Lighting, ventilation, and security are all controlled and automated in smart houses [29]. Smart homes use home automation so that a variety of chores are carried out automatically. The fundamental architecture detects house conditions, processes the instrumented data, and uses microcontroller-enabled sensors to track home conditions and actuators to manage home-embedded devices. The smart home paradigm is on the rise, driven by cost-cutting trends and modernization. Cost savings are achieved by keeping a consolidated record of events and by running machine learning procedures that deliver key cost factors, cost-cutting recommendations, and other relevant information. A smart home system consists of sensors, processors, actuators, software components, and a database that stores the processed data collected through cloud services and sensors. Figure 2 shows the elements used in a smart home implementation with optional cloud connectivity. The potential and vast resources of the cloud can help IoT-based systems overcome their limitations in communication, storage, processing, backup, and recovery. Data processing involves gathering data, analyzing it, and transferring it to the cloud for subsequent processing. To address security concerns, a private cloud may be used for highly sensitive data and a public cloud for the rest. Rapid advancements are being made with the assistance of enabling technologies like natural language processing and machine learning to better understand the demand for and usage of technologies at home.
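The sense, process, and actuate loop described above can be sketched as a small rule-based controller. This is a hypothetical illustration rather than a design taken from the cited works: the `Reading` fields, the target temperature, and the command names are all assumptions, and a real system would add the machine learning and cloud components mentioned in the text.

```python
from dataclasses import dataclass

@dataclass
class Reading:
    """One sample from a room's sensors (hypothetical fields)."""
    room: str
    temperature_c: float
    occupied: bool

def decide(reading, target_c=22.0, band_c=1.0):
    """Map one sensor reading to actuator commands for lighting and HVAC."""
    commands = {"light": "on" if reading.occupied else "off"}
    if not reading.occupied:
        commands["hvac"] = "off"       # save energy in empty rooms
    elif reading.temperature_c > target_c + band_c:
        commands["hvac"] = "cool"
    elif reading.temperature_c < target_c - band_c:
        commands["hvac"] = "heat"
    else:
        commands["hvac"] = "idle"
    return commands

def process(readings, event_log):
    """Apply the rule to each reading and keep a consolidated record of events."""
    for reading in readings:
        event_log.append((reading.room, decide(reading)))
    return event_log

if __name__ == "__main__":
    log = process(
        [Reading("kitchen", 25.0, True), Reading("hall", 25.0, False)],
        [],
    )
    for room, commands in log:
        print(room, commands)
```

The event log is the local counterpart of the consolidated record mentioned above; in a cloud-connected deployment it would be forwarded for storage and further analysis.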

3.2 Smart Health

The use of smart environments and IoT for health has grown significantly in the last twenty years as a result of considerable drops in sensor costs and advancements in


Fig. 2 Smart home environment with IoT and cloud connectivity

signal processing methods. A variety of sensors worn by a person or incorporated into the person's environment continuously gather and process data in order to first provide feedback or information to the person and then notify the family, medical staff, or other authorized individuals of the person's state. Such IoT devices can be used for remote patient health monitoring in certain circumstances, such as after surgery, or to help individuals live longer and healthier lives [27]. Close observation of a person using physiological data or other relevant information can reveal the individual's physical status and makes it possible to raise alarms in situations of distress or other emergencies. Improving and/or monitoring the state of people with severe illnesses or recurring ailments is one of the most critical health challenges. Some smart gadgets and software, for example, can aid in the management of chronic illnesses like diabetes. The purpose of such services is to help the individual manage and control the effects of the disease, relying on measuring equipment, smartphones, and other smart devices. Other data, like heart rate, can be collected in order to track its change over time and, potentially, to trigger alerts in emergency situations such as detected distress [30]. Engineers and researchers are working on the development of efficient IoT devices to monitor a variety of health concerns such as obesity, diabetes, and depression.
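As a concrete picture of tracking heart rate over time and triggering alerts, the following sketch raises an alert when a moving average leaves a configured range. It is only an assumption-laden illustration: the window size and the bounds of 50 and 120 beats per minute are placeholders, not clinical thresholds, and the function name is hypothetical.

```python
from collections import deque

def heart_rate_alerts(samples_bpm, window=5, low=50.0, high=120.0):
    """Yield (index, average) whenever the moving average leaves [low, high].

    The bounds are illustrative placeholders, not medical guidance.
    """
    recent = deque(maxlen=window)
    for index, bpm in enumerate(samples_bpm):
        recent.append(bpm)
        if len(recent) == window:              # wait until the window is full
            average = sum(recent) / window
            if average < low or average > high:
                yield index, average

if __name__ == "__main__":
    resting = [70, 72, 71, 73, 70]
    episode = [70, 70, 70, 70, 70, 150, 150, 150, 150, 150]
    print(list(heart_rate_alerts(resting)))
    print(list(heart_rate_alerts(episode)))
```

Averaging over a window rather than reacting to single samples reduces false alarms from momentary sensor noise, at the cost of a short delay before an alert is raised.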

3.3 Industry Automation and Agriculture

The world's population is expected to be around 10 billion by 2050. Agriculture plays an important role in our lives, and present agricultural techniques must improve to feed such a large population. So, in order to boost agricultural output, there is


Fig. 3 IoT working in agriculture

a need to integrate agriculture and technology. One example of such a technique is greenhouse technology [31], in which the environmental conditions are adjusted to enhance output. However, manual control is less effective, resulting in energy loss and decreased output. Figure 3 shows how IoT works in agricultural production. Another advantage of IoT is the automation of industries. IoT has revolutionized industry by supporting quality control, inventory management, logistics, and supply chain optimization.

3.4 Smart City

Numerous technologies, such as cloud computing and wireless sensors, can be combined with IoT servers to build a successful smart city. Environmental impact is another critical consideration in the smart city [32]. The concept of green technologies can therefore be incorporated when designing and developing efficient and sustainable smart city infrastructure. IoT provides societies with various financial benefits and advantages for the well-being of citizens, such as economic growth, industrialization, and water quality management. IoT developers must be attentive to reducing the negative environmental impact of IoT devices and systems [26]. High energy consumption is one of the issues associated with the environmental effects of IoT devices. Research in this field is needed to produce high-quality IoT devices with low energy consumption rates [33, 34].

An Overview of IoT and Smart Application Environments: Research …


3.5 Smart Vehicle Smart technologies integrated into newly introduced automobiles can identify traffic congestion on the route. This helps the driver find an alternative optimal path and further reduces the city’s traffic congestion. Furthermore, low-cost smart IoT devices can be integrated into cars of all ranges to monitor their operation, which helps in managing the health of the car. With the use of sophisticated sensors, interaction among self-driving automobiles is possible. This would help traffic move more smoothly than with human-driven automobiles. However, it will take time for this technique to be deployed globally. Until then, IoT devices can take relevant action by sensing traffic congestion ahead. As a result, vehicle manufacturers should consider embedding IoT devices into their vehicles for the benefit of society [35].

3.6 Smart Grid The smart grid is another important application of IoT. Smart grids are very complex and are used to deliver energy efficiently and reliably. Electricity consumption is monitored automatically. Furthermore, monitoring the performance of smart grids aids in detecting faults early and responding more effectively in the event of a crisis [36].

4 Challenges and Key Issues in IoT Implementation This section discusses the challenges and key issues in smart application implementation. The corresponding threats, attacks, and security requirements [37, 38] are also presented. A summary is presented in Table 1.
• A major portion of smart home systems is built using sensors. These sensors raise the possibility of security risks like physical capture, in which an attacker might gain control of a smart sensor, damage it, inject fake data, or even seize the system. As a result, smart home devices require proper security measures to ensure user data privacy and confidentiality [39].
• It is difficult to secure devices in health care since they exchange information and are prone to malicious attacks. Furthermore, any security breach might have serious ramifications for the life of the patient. Another risk is a privacy breach if the information is compromised by hackers or other malicious actors [40].
• Smart agriculture and industry automation sensors need surveillance according to their usage [41]. Only authorized persons should have access to machinery and equipment, so that confidentiality is maintained.


Table 1 IoT security requirements w.r.t. attacks and threats

| IoT application | Attacks and threats | Security requirements |
|---|---|---|
| Smart home | False data injection/physical capture | Privacy, confidentiality, integrity |
| Smart health | Denial of service/physical capture | Privacy, confidentiality, integrity |
| Smart agriculture | Denial of service/physical capture | Availability, integrity |
| Industry automation | False data injection/physical capture | Integrity, confidentiality |
| Smart grid | Denial of service/false data injection | Availability, integrity |
| Smart vehicles transport | Denial of service/physical capture | Privacy, confidentiality, integrity |
| Smart supply chain | Denial of service/access control | Privacy, integrity |

• Smart grids are high-value targets for future threats due to their mission-critical nature, and any disruption in energy supply can cause financial harm to the region. Furthermore, false data injection undermines the data’s integrity [42, 43].
• Smart transportation devices need physical security and surveillance. Data integrity, confidentiality, and privacy are also jeopardized by denial of service (DoS) and man-in-the-middle attacks [44].
• A smart supply chain needs robust security mechanisms to ensure that only authorized persons have permission to access data (integrity and privacy), as well as the physical security of storage media [45].

The Internet of things paradigm offers several benefits to the IT field and to individuals. The growth of IoT is enormous, and there are several security concerns and risks that must be dealt with [46]. Any technology faces numerous problems, such as security, difficulty in real-world deployment, and other factors to consider during installation. Emerging technologies like data science and machine learning are changing the landscape of possible threats, and attackers are employing novel approaches to circumvent existing security procedures. As a result, security concerns must be addressed in every area of IoT [47]. The challenges IoT faces include:
• Outdated equipment
• Identity authentication management
• Protection of privacy
• Security of data
• Trust management
• Hardware and software architectures.

The heterogeneous and diverse IoT environment demands particular security attention, and there is a requirement for innovative and dynamic countermeasures and


frameworks that provide end-to-end security inside IoT [48]. IoT applications require services and support to store, manage, and interpret data acquired from devices like sensors. Many developments have occurred in this area, with cloud and fog computing appearing to be the most significant approaches. A variety of technologies, such as IEEE 802.15.4, IEEE 802.11ah, ITU-T G.9959, LoRaWAN, Sigfox, MS/TP, NB-IoT, BLE, PLC, DECT-ULE, and NFC, are used in IoT [49]. Each of these technologies has its own range, frequency bands, bit rate, and topology support. Table 2 shows various IoT solution technologies [50–52] and their features.

Table 2 Main features of IoT solutions’ technologies in smart environments

| Example smart environment | Sensor type | Topologies | Technologies | Computing approach | Applications |
|---|---|---|---|---|---|
| Smart home | Dedicated | Star and mesh | IEEE 802.15.4, Wi-Fi, Bluetooth LE, ITU-T G.9959, RFID, PLC, DECT-ULE | Cloud | Amazon Alexa, Google assistant, lawn irrigation system, etc. |
| | Smartphone | N/A | Wi-Fi | | |
| Smart city | Dedicated | Star | LoRaWAN, NB-IoT, Sigfox | Cloud, fog, and edge | Fire sensors, smart meters, HVAC, video surveillance, etc. |
| | Dedicated | Mesh | IEEE 802.15.4 | | |
| | Smartphone | N/A | Wi-Fi | | |
| Smart factory | Dedicated | Mesh | IEEE 802.15.4 TSCH | Fog | Reduced downtime, improved incident responses, etc. |
| Smart health | Dedicated | Star | Bluetooth LE, RFID, Wi-Fi | Cloud and edge | Smartwatches, fitness trackers, ECG monitors, etc. |
| Smart grid | Dedicated | Mesh | IEEE 802.15.4, Sigfox, NB-IoT | Cloud and fog | Wind generating units, rooftop solar photovoltaic units, etc. |


5 Conclusion and Future Scope Recent breakthroughs in IoT around the world have drawn the interest of developers and researchers, who are collaborating to expand the technology on a large scale and serve society. However, improvements can be achieved only if the challenges and inadequacies of existing techniques are taken into consideration. This paper has discussed various smart applications of IoT, the key issues and challenges, and numerous IoT technologies. Due to the huge variety of alternatives, selecting the IoT technology that best matches the needs of a smart application system can be a difficult process. Creating smart environments involves incorporating information from the surroundings, for example through computer vision systems, as well as the human factor. As humans are directly or indirectly involved, consideration of their requirements should also be a major focus of smart environments.

References
1. Samaila MG, Neto M, Fernandes DA, Freire MM, Inácio RR (2018) Challenges of securing internet of things devices: a survey. Secur Priv 2(1):e20
2. Global IoT in smart cities market size, share and industry trends analysis report by component, by solution type, by services type, by application, by regional outlook and forecast, 2021–27. https://www.reportlinker.com/p06249506/?utm_source=GNW, last accessed 12th Oct 2022
3. Chin J, Callaghan V, Ben Allouch S (2019) The internet of things: reflections on the past, present, and future from a user-centered and smart environments perspective. J Ambient Intell Smart Environ 11(1)
4. Al-Fuqaha A, Guizani M, Mohammadi M, Aledhari M, Ayyash M (2015) Internet of things: a survey on enabling technologies, protocols, and applications. IEEE Commun Surv Tutorials 17(4):2347–2376
5. Tripathy AK, Tripathy PK, Ray NK, Mohanty SP (2018) iTour: the future of smart tourism: an IoT framework for the independent mobility of tourists in smart cities. IEEE Consum Electron Mag 7(3):32–37
6. Salam A, Hoang AD, Meghna A, Martin DR, Guzman G, Yoon YH (2019) The future of emerging IoT paradigms: architectures and technologies (Preprints)
7. Ahad MA, Paiva S, Tripathi G, Feroz N (2020) Enabling technologies and sustainable smart cities. Sustain Cities Soc 61
8. Badii C, Bellini P, Difino A, Nesi P (2018) Sii-mobility: an IoT/IoE architecture to enhance smart city mobility and transportation services. Sensors 19(1)
9. Khajenasiri I, Estebsari A, Verhelst M, Gielen G (2017) A review on internet of things solutions for intelligent energy control in buildings for smart city applications. Energy Procedia 111:770–779
10. Backhouse J (2020) A taxonomy of measures for smart cities. In: Proceedings of the 13th international conference on theory and practice of electronic governance, Athens, Greece, 23–25 Sept 2020
11. Alavi AH, Jiao P, Buttlar WG, Lajnef N (2018) Internet of things-enabled smart cities: state-of-the-art and future trends. Measurement 129:589–606
12. Weber RH (2010) Internet of things-new security and privacy challenges. Comput Law Secur Rev 26(1):23–30
13. Lin J, Yu W, Zhang N, Yang X, Zhang H, Zhao W (2017) A survey on internet of things: architecture, enabling technologies, security and privacy, and applications. IEEE Internet Things J 4(5):1125–1142
14. Heer T, Garcia-Morchon O et al (2011) Security challenges in the IP based internet of things. Wirel Pers Commun 61(3):527–542
15. Pommalagarsamy S et al (2020) IoT ecosystem—a survey on classification of IoT. In: Proceedings of the first international conference on advanced scientific innovation in science, engineering and technology, ICASISET 2020, 16–17 May 2020, Chennai, India
16. Liu L, Xiao Y, Philip-Chen CL (2012) Authentication and access control in the internet of things. In: 32nd international conference on distributed computing systems workshops, Macau, China. IEEE Xplore
17. Kothmayr T, Schmitt C, Hu W, Brunig M, Carle G (2013) DTLS based security and two-way authentication for the internet of things. Ad Hoc Netw 11:2710–2723
18. Li Y et al (2019) IoT-CANE: a unified knowledge management system for data centric internet of things application systems. J Parallel Distrib Comput 131:161–172
19. Yan Z, Zhang P, Vasilakos AV (2014) A survey on trust management for internet of things. J Netw Comput Appl 42:120–134
20. Noura M, Atiquzzaman M, Gaedke M (2019) Interoperability in internet of things: taxonomies and open challenges. Mob Netw Appl 24(3):796–809
21. Al-Fuqaha A, Guizani M et al (2015) Internet of things: a survey on enabling technologies, protocols, and applications. IEEE Commun Surv Tutorials 17:2347–2376
22. Palattella MR, Dohler M et al (2016) Internet of things in the 5G era: enablers, architecture and business models. IEEE J Sel Areas Commun 34(3):510–527
23. Wang JY, Cao Y, Yu GP, Yuan M (2014) Research on applications of IoT in domestic waste treatment and disposal. In: IEEE proceedings of 11th World Congress on intelligent control and automation, Shenyang, China
24. Qiu T, Xiao H, Zhou P (2013) Framework and case studies of intelligent monitoring platform in facility agriculture ecosystem. In: IEEE proceedings 2013 second international conference on agro-geoinformatics (agro-geoinformatics), Fairfax, VA, USA, 12–16 Aug 2013
25. Cheng Y et al (2016) AirCloud: a cloud-based air-quality monitoring system for everyone. In: Proceedings of the 12th ACM conference on embedded network sensor systems, ACM, Memphis, Tennessee, 03–06, pp 251–265
26. Talavera JM et al (2017) Review of IoT applications in agro-industrial and environmental fields. Comput Electron Agric 142(7):283–297
27. Jara AJ, Zamora-Izquierdo MA, Skarmeta AF (2013) Interconnection framework for mHealth and remote monitoring based in the internet of things. IEEE J Sel Areas Commun 31(9):47–65
28. Ghazal TM, Hasan MK, Alshurideh MT et al (2021) IoT for smart cities: machine learning approaches in smart healthcare—a review. Future Internet 13:218
29. Umair M, Cheema MA et al (2021) Impact of COVID-19 on IoT adoption in healthcare, smart homes, smart buildings, smart cities, transportation and industrial IoT. Sensors 21:3838
30. Sebastian S, Ray PP (2015) Development of IoT invasive architecture for complying with health of home. In: Proceedings I3CS, Shillong, pp 79–83
31. Sisinni E, Saifullah A, Han S, Jennehag U, Gidlund M (2018) Industrial internet of things: challenges, opportunities, and directions. IEEE Trans Ind Inf 14(11):4724–4734
32. Montori F, Bedogni L, Bononi L (2018) A collaborative internet of things architecture for smart cities and environmental monitoring. IEEE Internet Things J 5(2):592–605
33. Lim Y, Edelenbos J, Gianoli A (2019) Identifying the results of smart city development: findings from the systematic literature review. Cities 95
34. Shehab MJ, Kassem I, Kutty AA, Kucukvar M, Onat N, Khattab T (2022) 5G networks towards smart and sustainable cities: a review of recent developments, applications and future perspectives. IEEE Access 10:2987–3006
35. Luo Y, Zhu X, Long J (2019) Data collection through mobile vehicles in edge network of smart city. IEEE Access 7:168467–168483
36. Faheem M, Butt RA, Raza B, Ashraf MW, Ngadi MA, Gungor VC (2019) Energy efficient and reliable data gathering using internet of software-defined mobile sinks for WSNs-based smart grid applications. Comput Stand Interfaces 66(Article 103341)
37. Chanal PM, Kakkasageri MS (2020) Security and privacy in IoT: a survey. Wirel Pers Commun 115(2):1667–1693
38. Abdulghani HA, Nijdam NA, Collen A, Konstantas D (2019) A study on security and privacy guidelines, countermeasures, threats: IoT data at rest perspective. Symmetry 11(6)
39. Chin J, Callaghan V, Ben Allouch S (2019) The internet of things: reflections on the past, present and future from a user-centered and smart environments perspective. J Ambient Intell Smart Environ 11(1)
40. Cook DG, Duncan G, Sprint G, Fritz RL (2018) Using smart city technology to make healthcare smarter. Proc IEEE 106(4):708–722
41. Gavrilovic N, Mishra A (2020) Software architecture of the internet of things (IoT) for smart city, healthcare and agriculture: analysis and improvement directions. J Ambient Intell Humaniz Comput 12:1315–1336
42. Kabalci Y, Kabalci E, Padmanaban S, Holm-Nielsen JB, Blaabjerg F (2019) Internet of things applications as energy internet in smart grids and smart environments. Electronics 8(9):972
43. Masera M, Bompard EF, Profumo F, Hadjsaid N (2018) Smart (electricity) grids for smart cities: assessing roles and societal impacts. Proc IEEE 106(4):613–625
44. Xhafa F, Aly A, Juan AA (2021) Allocation of applications to fog resources via semantic clustering techniques: with scenarios from intelligent transportation systems. Computing 103(3):361–378
45. Kshetri N (2017) The evolution of the internet of things industry and market in China: an interplay of institutions, demands and supply. Telecommun Policy 41(1):49–67
46. Paiva S, Ahad MA et al (2021) Enabling technologies for urban smart mobility: recent trends, opportunities and challenges. Sensors 21:2143
47. Syed AS, Sierra-Sosa D et al (2021) IoT in smart cities: a survey of technologies, practices and challenges. Smart Cities 4:24
48. Makhdoom I, Zhou I, Abolhasan M et al (2020) A blockchain-based framework for privacy-preserving and secure data sharing in smart cities. Comput Secur 88
49. Ejaz W, Anpalagan A (2019) Internet of things for smart cities: technologies, big data and security. Springer, Berlin
50. Bellini P, Nesi P, Pantaleo G (2022) IoT-enabled smart cities: a review of concepts, frameworks and key technologies. Appl Sci 12(3)
51. Ilyas M (2019) Internet of things (IoT) and emerging applications. J Systemics Cybern Inform 17(5):27–31
52. Alam T (2021) Cloud-based IoT applications and their roles in smart cities. Smart Cities 64(4)

Human Detection and Tracking Based on YOLOv3 and DeepSORT Bhawana Tyagi, Swati Nigam, and Rajiv Singh

Abstract This paper proposes an enhanced object tracking architecture based on YOLOv3 and DeepSORT. YOLOv3 is used for object detection, restricted here to the human class. For object tracking, DeepSORT is used together with the Kalman filter and the Hungarian algorithm. The occlusion issue is addressed using the Kalman filter. A parallel approach is used in the proposed architecture to increase the speed of the base method: the input frame is fed simultaneously to the object detector and the object tracker, enabling the tracker to begin following a person as soon as the object detector recognizes them. Occlusion and speed are the two issues on which we concentrate. The experimental results on the selected data are positive, with faster operation. Keywords Human detection and tracking · YOLO · Deep learning · Transfer learning · DeepSORT

B. Tyagi · S. Nigam · R. Singh (B)
Department of Computer Science, Banasthali Vidyapith, Banasthali, Rajasthan 304022, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 686, https://doi.org/10.1007/978-981-99-2100-3_11

1 Introduction Computer vision is a branch of science that aims to simulate human analysis of frames, feature extraction, and interpretation of results. Multiple object tracking is one of the most popular research topics in computer vision because of the development of deep learning techniques [1]. Multiple target tracking (MTT) is another term for multiple object tracking (MOT). Typically, MOT includes the following steps for a particular video: the trajectories of each individual object are produced after the initial identification and location of the multiple objects


B. Tyagi et al.

and maintenance of their identification numbers [2]. The purpose of MOT is to process the input video to recognize and track the objects. Numerous applications of computer vision rely on MOT systems [3]. These systems are designed in two steps. The first step, object detection [4], finds the targeted objects in the video. The more accurate the object detection step, the better the overall system performance. The second step, tracking, compares the objects identified in the current frame with those in earlier frames to determine each object’s trajectory. When the object detection system is more accurate, there are fewer missed detections and, as a result, the trajectories of the identified objects are more precise and smoother. MOT is also referred to as a technique for tracking multiple moving objects [5]. MOT has a wide range of real-world applications, including robotics, traffic management, video surveillance, and self-driving vehicles. Recently, the need for human detection and tracking models has been increasing rapidly. Such models have many applications, such as autonomous driving cars [6–8], face recognition [9, 10], crowd analysis [11, 12], human–computer interaction [13, 14], and security analysis [15, 16]. Human detection and tracking is one of the most challenging tasks in computer vision, and it becomes more challenging when multiple objects must be tracked [17]. Many challenges are associated with human detection and tracking models, such as occlusion, abrupt changes in speed, and variations in the appearance of an object (color, size, body articulation, pose, shape, and so on), all of which can affect the performance of tracking algorithms. Many methods have been proposed to overcome these challenges [18–20], but it is very difficult to cope with them in a real-life scenario. A real-time MOT system needs substantial computing resources and has to deal with real-time challenges like switching of identification numbers and various detection failures [21, 22]. To perform object tracking, it is first necessary to identify the target object. Here, a human detection and tracking system is proposed. We employ YOLOv3 [23] for human detection and the DeepSORT framework for tracking human movement in recorded or real-time videos. Initially, YOLOv3 detects the humans in the frames and generates bounding boxes, which are then tracked by DeepSORT. This research can also be used to track humans for purposes of video surveillance and crowd security. The key contributions of this research work are as follows:
• A customized YOLOv3 is used to perform object detection, where the model is trained on the single class person using transfer learning.
• A parallel approach is used in which the input frame is fed to the object detector and the object tracker in parallel, so that the tracker starts tracking an individual as soon as the object detector identifies them.

2 Related Works Recently, the MOT problem has drawn the attention of many researchers working in the field of computer vision. The authors of [3, 24–26] gave a thorough explanation


of the available MOT methods. An MTT algorithm based on YOLO and long short-term memory (LSTM) was proposed by Tan et al. [27]. YOLO was used to find multiple objects, and LSTM was used to determine the temporal relationship between the frames. The degree of similarity between the identified objects was determined using Euclidean distance. Sun et al. [28] proposed the deep affinity network (DAN) to track multiple objects by jointly learning the appearances of target objects and their affinities in frame sequences. The feature pyramid Siamese network (FPSN), which combines the feature pyramid network and the Siamese architecture, was proposed by Lee and Kim [29] to address the issue of structural simplicity. To increase MOT accuracy, they also used data on the motion of the objects. To gather long-term data about several objects, Saleh et al. [30] put forth a probabilistic auto-regressive motion model that calculates the likelihood of multiple objects in sequences of frames. To deal with the problem of occlusion, Peng et al. [31] proposed the box plane matching method so that tracking can work in dense crowds. First, noisy detections were removed using a layer-wise aggregation discriminative model. Then the Global Attention Feature Model was used to extract appearance features of the detected objects, which were used to compute the similarity in appearance between the current detections and the previous traces. A deep neural network (DNN) named the flow fuse tracker was proposed by Zhang et al. [32]. It focused on target flowing and fusing: the flow tracker DNN was used for target flowing, and the fuse tracker DNN was used for target fusing. A method for tracking the heads of people in a crowd was proposed by Sundararaman et al. [33]. They proposed a head detector called HeadHunter to detect tiny heads in crowded scenes. To track heads, a particle filter and color histogram were used with HeadHunter, and to measure how well the unique identity of an individual is maintained in the crowd, they proposed a new metric, IDEucl. Stadler and Beyerer [34] used clustering for multiple object tracking to deal with the problem of occlusion. They proposed cluster-aware non-max suppression to minimize missed detections, and past track information was used to correct incorrect identification numbers.

3 Proposed Architecture This section describes the proposed methodology and workflow in detail. Figure 1 shows the basic flow of standard object tracking. Initially, the input video is divided into frames and fed to the object detector. An object detection algorithm then detects the objects, and a tracking algorithm tracks the objects identified by the detector. If n objects are identified, those n objects are tracked serially by the tracking algorithm. Because this approach is serial, it takes more time. In the proposed approach, YOLOv3 is used for object detection and the DeepSORT algorithm [35], along with the Kalman filter and Hungarian algorithm, for person tracking. As the efficiency of the tracker is based on the result of the detection


Fig. 1 Flow of standard object tracking

Fig. 2 Proposed approach

process, here we consider a single class to detect, i.e., human. In the base (serial) method, the tracker starts its work after the completion of the detection phase. The proposed approach is depicted in Fig. 2. Here, the input frame is fed to the detector and tracker simultaneously, so that whenever YOLOv3 detects a human, the tracker starts tracking that person at the same time. In the proposed architecture, the detector and the tracker work in parallel. A queue is inserted between the detector and the tracker for synchronization. This process broadly consists of two phases, i.e., object detection and object tracking.
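The queue-based synchronization described above can be sketched as a producer-consumer pipeline. This is only one possible reading of the architecture, not the authors' implementation: `detect`, `detector_worker`, `tracker_worker`, and `run_pipeline` are hypothetical names, the detector is a stand-in that returns a dummy box, and a real system would call YOLOv3 and update DeepSORT state at the marked places.

```python
import queue
import threading

SENTINEL = None  # placed on the queue to signal the end of the frame stream


def detect(frame):
    """Hypothetical stand-in for the YOLOv3 person detector."""
    return [f"person@{frame}"]


def detector_worker(frames, detections):
    """Producer: push per-frame detections onto the shared queue."""
    for frame_id, frame in enumerate(frames):
        detections.put((frame_id, detect(frame)))
    detections.put(SENTINEL)


def tracker_worker(detections, tracks):
    """Consumer: start processing each frame as soon as its detections arrive."""
    while True:
        item = detections.get()
        if item is SENTINEL:
            break
        frame_id, boxes = item
        # A real tracker would update its track states here (assumption).
        tracks.append((frame_id, boxes))


def run_pipeline(frames, max_pending=8):
    """Run the detector and the tracker concurrently, linked by a bounded queue."""
    detections = queue.Queue(maxsize=max_pending)
    tracks = []
    producer = threading.Thread(target=detector_worker, args=(frames, detections))
    consumer = threading.Thread(target=tracker_worker, args=(detections, tracks))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return tracks


if __name__ == "__main__":
    print(run_pipeline(["f0", "f1", "f2"]))
```

The bounded queue keeps the detector from running arbitrarily far ahead of the tracker, and the FIFO order preserves the frame sequence the tracker depends on.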

3.1 Phase 1—Object Detection This phase consists of the detection of objects. Here we have used YOLOv3 for object detection. In YOLO, there are 24 convolutional layers and 2 fully connected


Fig. 3 Steps for object detection via YOLOv3

layers. The basic flow of object detection is shown in Fig. 3. First, the input frame is divided into N grids of M × M size. This helps with the detection and localization of an object. Second, for each object, a bounding box is drawn based on the grids, with its label and confidence score. As there are multiple bounding boxes corresponding to an object, in the third step non-max suppression is performed, and bounding boxes with a confidence score below 0.5 are eliminated. The steps for human detection are depicted in Fig. 4. Initially, the customized dataset is created using the Open Images Dataset (OIDv4). The COCO-person dataset [36] is also downloaded. The dataset consists of three parts: train, test, and validation. LabelTool was used to annotate the dataset. For training, three files (.cfg, .data, and .names) were created for YOLOv3. The dataset and annotations were copied into the corresponding directory. Here we take only one class, i.e., human. Pretrained weights [37] are used to train YOLOv3. The neural network is trained with Darknet [38]. All weights are stored in a backup file after completing 400 iterations, and the best weights are used to detect humans. For testing, the following arguments are required: the input image, the .cfg file, the .data file, the trained weights, and the .names file. Non-max suppression is used to eliminate redundant bounding boxes corresponding to an object; it drops a bounding box if its confidence score is less than 0.5.
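The filtering step above can be illustrated with a minimal sketch of confidence thresholding followed by greedy non-max suppression. The 0.5 confidence threshold comes from the text; the IoU threshold of 0.45, the box format `(x1, y1, x2, y2)`, and the function names are assumptions made for this example and are not taken from the paper or from Darknet.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, conf_threshold=0.5, iou_threshold=0.45):
    """Keep the best-scoring box of each overlapping group.

    detections: list of (box, score) pairs.
    conf_threshold: 0.5 as stated in the text.
    iou_threshold: assumed value for illustration only.
    """
    # Step 1: drop low-confidence boxes, then sort the rest best-first.
    candidates = sorted(
        (d for d in detections if d[1] >= conf_threshold),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    # Step 2: greedily keep a box only if it does not overlap a kept box too much.
    for box, score in candidates:
        if all(iou(box, kept_box) <= iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


if __name__ == "__main__":
    boxes = [
        ((0, 0, 10, 10), 0.9),
        ((1, 1, 11, 11), 0.8),    # overlaps the first box heavily
        ((50, 50, 60, 60), 0.7),
        ((0, 0, 10, 10), 0.3),    # below the confidence threshold
    ]
    print(non_max_suppression(boxes))
```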

Fig. 4 Steps for customized object detection


3.2 Phase 2—Object Tracking In this phase, the input video is processed and converted into an image sequence. YOLOv3 is used to identify the object, here a single class, i.e., human. For person tracking, we use the DeepSORT algorithm along with the Kalman filter and the Hungarian algorithm. The Kalman filter takes the state vector of the target and is used to predict the target’s state. It can deal with the problem of occlusion because it predicts the likely movement path of an object from the series of measurements observed over time. In the tracking process, each detected person is assigned a unique identification number, and the movement trajectory of that person is then tracked. This process tracks the movement trajectories of all individuals while maintaining their unique identification numbers. Finally, the output is saved as a video file in audio video interleave (AVI) format.
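To show how a Kalman filter can bridge an occlusion, the sketch below tracks a single coordinate with a constant-velocity model and keeps predicting when a measurement is missing. It is a deliberately reduced illustration: DeepSORT filters a higher-dimensional bounding-box state, whereas the class `ConstantVelocityKalman1D`, the helper `track`, and the noise values `p`, `q`, and `r` here are assumptions made for this example.

```python
class ConstantVelocityKalman1D:
    """Kalman filter for one coordinate with state (position, velocity)."""

    def __init__(self, pos, vel=0.0, p=1.0, q=0.01, r=1.0):
        self.pos, self.vel = pos, vel
        self.P = [[p, 0.0], [0.0, p]]  # state covariance
        self.q, self.r = q, r          # process and measurement noise (assumed)

    def predict(self, dt=1.0):
        """Propagate the state: pos += dt * vel, P = F P F^T + Q."""
        self.pos += dt * self.vel
        (p00, p01), (p10, p11) = self.P
        a = p00 + dt * p10
        b = p01 + dt * p11
        self.P = [[a + dt * b + self.q, b],
                  [p10 + dt * p11, p11 + self.q]]
        return self.pos

    def update(self, z):
        """Correct the state with a position measurement z."""
        (p00, p01), (p10, p11) = self.P
        s = p00 + self.r               # innovation covariance
        k0, k1 = p00 / s, p10 / s      # Kalman gain
        y = z - self.pos               # innovation
        self.pos += k0 * y
        self.vel += k1 * y
        self.P = [[(1 - k0) * p00, (1 - k0) * p01],
                  [p10 - k1 * p00, p11 - k1 * p01]]


def track(measurements):
    """Filter a sequence of positions; None marks an occluded frame."""
    kf = ConstantVelocityKalman1D(pos=measurements[0])
    estimates = []
    for z in measurements:
        kf.predict()
        if z is not None:              # during occlusion, rely on the prediction
            kf.update(z)
        estimates.append(kf.pos)
    return estimates


if __name__ == "__main__":
    # The object moves one unit per frame and is occluded in frames 5 and 6.
    print(track([0, 1, 2, 3, 4, None, None, 7]))
```

During the two occluded frames the estimate keeps advancing along the learned velocity, which is what lets a track survive until the detector sees the person again.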

4 Experiments and Results YOLOv3 is used to detect objects, and here we use a single class, i.e., human. As a result, the accuracy of detection is improved, as depicted in Fig. 5. The values shown in Fig. 5 are listed in Table 1. For image 1, detection accuracy increases by 8 percentage points for person 1, 5 for person 2, and 35 for person 3. For sample image 2, the detection accuracy for persons 1, 2, 3, and 4 increases by 8, 10, 7, and 3 percentage points, respectively.

Fig. 5 Object detection accuracy for sample images 1 and 2 (first column represents object detection accuracy by YOLOv3 and second column represents object detection accuracy by custom YOLOv3)


Table 1 Quantitative results comparison for object detection

| S. No. | YOLOv3 | Custom YOLOv3 | Observation |
|---|---|---|---|
| 1 | Person1-92%, Person2-95%, Person3-65% | Person1-100%, Person2-100%, Person3-100% | Better detection accuracy of proposed model |
| 2 | Person1-91%, Person2-90%, Person3-91%, Person4-96% | Person1-99%, Person2-100%, Person3-98%, Person4-99% | Better detection accuracy of proposed model |

The performance of the detector is evaluated in terms of mean average precision (mAP). The formula for mAP is given in Eq. 1, where N is the number of classes and $\mathrm{AP}_i$ is the average precision of class i; here, we have considered only one class.

$$\mathrm{mAP} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{AP}_i \tag{1}$$
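As a small worked example of Eq. 1, the sketch below averages per-class AP values. The text does not state how each AP is computed, so `average_precision` uses one common non-interpolated variant (mean of the precision at each true positive in the ranked detections) purely as an assumption; Darknet's own evaluation may differ. All function names and the sample data are hypothetical.

```python
def average_precision(ranked_hits, num_ground_truth):
    """Non-interpolated AP for one class (an assumed variant).

    ranked_hits: booleans for detections sorted by descending confidence,
                 True where the detection matches a ground-truth object.
    num_ground_truth: number of ground-truth objects of this class.
    """
    true_positives = 0
    precision_sum = 0.0
    for rank, hit in enumerate(ranked_hits, start=1):
        if hit:
            true_positives += 1
            precision_sum += true_positives / rank  # precision at this recall step
    return precision_sum / num_ground_truth if num_ground_truth else 0.0


def mean_average_precision(ap_per_class):
    """Eq. 1: the mean of AP_i over the N classes."""
    if not ap_per_class:
        raise ValueError("at least one class is required")
    return sum(ap_per_class) / len(ap_per_class)


if __name__ == "__main__":
    # Single 'person' class, so N = 1 and mAP equals that class's AP.
    ap_person = average_precision([True, False, True], num_ground_truth=2)
    print(mean_average_precision([ap_person]))
```

With a single class, as in this work, the sum in Eq. 1 has one term and mAP reduces to the AP of the person class.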

Table 2 shows the comparison of mAP on the customized and COCO-person dataset. It shows that the customized YOLOv3 reports 4.7% better mAP than the second-highest mAP. The trends of the loss function are depicted in Fig. 6. To evaluate our proposed approach, we have used multiple object tracking accuracy (MOTA). It is defined in Eq. 2.

$$\mathrm{MOTA} = 1 - \frac{\sum_{t=1}^{n} \left(\mathrm{FN}_t + \mathrm{FP}_t + \mathrm{IDS}_t\right)}{\sum_{t=1}^{n} \mathrm{GT}_t} \tag{2}$$

where $\mathrm{FN}_t$ represents the false negatives, $\mathrm{FP}_t$ the false positives, $\mathrm{IDS}_t$ the identity switches at time t, and $\mathrm{GT}_t$ the ground truth. Table 3 shows the tracking performance on the MOT17 [45] dataset. The proposed approach reports the best MOTA among the compared approaches. IDSt is lowest in [48] and FP is lowest in [46], whereas FN is lowest in our approach.

Table 2 Comparison of mAP on COCO-person and customized dataset

| Model | mAP |
|---|---|
| YOLOv2 [39] | 54.3 |
| YOLOv2 tiny [40] | 39.7 |
| YOLOv3 | 58.4 |
| YOLOv3 tiny [41] | 57.2 |
| YOLOX [42] | 51.2 |
| YOLOX tiny | 32.8 |
| PP-YOLOE-M [43] | 49.1 |
| PP-YOLOv2 [44] | 50.3 |
| YOLOv3 (customized) | 61.3 |

Fig. 6 Performance of loss function and mAP while training the model

Table 3 Tracking performance on the MOT17 [45] test dataset

| Tracker | MOTA | IDSt | FP | FN |
|---|---|---|---|---|
| MHT_DAM [46] | 50.7 | 2314 | 22,875 | 252,889 |
| FWT [47] | 51.3 | 2648 | 24,101 | 247,921 |
| jCC [48] | 51.2 | 1802 | 25,937 | 247,822 |
| MOTDT17 [49] | 50.9 | 2474 | 24,069 | 250,768 |
| Proposed method | 51.8 | 2301 | 23,112 | 246,221 |

The visualization of the tracker’s test results is depicted in Fig. 7. The method assigns a unique ID to each person when they first appear in the frame and then identifies and tracks the same person with the same ID in subsequent frames. The green color represents the bounding box, the red color represents the unique identification number assigned to the individual when they first appear in the video, and the blue color represents the tracking path.
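The MOTA definition in Eq. 2 can be sketched in a few lines. The function name `mota` and the per-frame counts in the example are hypothetical toy values chosen so the arithmetic is easy to check; they are not taken from the MOT17 results reported here.

```python
def mota(per_frame):
    """Multiple object tracking accuracy as in Eq. 2.

    per_frame: list of (fn, fp, ids, gt) tuples, one per frame, holding the
    false negatives, false positives, identity switches, and ground-truth
    object count for that frame.
    """
    errors = sum(fn + fp + ids for fn, fp, ids, _ in per_frame)
    ground_truth = sum(gt for _, _, _, gt in per_frame)
    if ground_truth == 0:
        raise ValueError("MOTA is undefined without ground-truth objects")
    return 1.0 - errors / ground_truth


if __name__ == "__main__":
    # Toy sequence of three frames with 10 ground-truth objects each.
    frames = [(1, 0, 0, 10), (2, 1, 1, 10), (0, 1, 0, 10)]
    print(f"MOTA = {mota(frames):.2f}")  # 6 errors over 30 objects
```

Because the errors are summed over all frames before dividing, MOTA can become negative when a tracker makes more errors than there are ground-truth objects.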

5 Conclusions and Future Scope In this paper, an enhanced method for object tracking based on YOLOv3 and DeepSORT is proposed. Object detection plays the most important role in ensuring accurate object tracking. Here, YOLOv3 is used for object detection, and it considers only one class, human. DeepSORT, along with the Kalman filter and Hungarian algorithm, is utilized for object tracking. The Kalman filter is used to address the occlusion problem. The novelty of this architecture is that it runs object detection and object tracking in parallel. Due to the concurrent operation of the detector and tracker, object tracking is accelerated. This work can be extended to analyze the behavior of individuals in crowds for security, crowd management, and

Human Detection and Tracking Based on YOLOv3 and DeepSORT

Fig. 7 Testing result on sample videos of MOT-17 dataset (panels a and b)

other purposes. As the precision of tracking is directly proportional to the ability of the detector to detect the object, future research can further enhance the detection capability.

References

1. Voigtlaender P, Krause M, Osep A, Luiten J, Sekar BBG, Geiger A, Leibe B (2019) Mots: multi-object tracking and segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 7942–7951
2. Luo W, Xing J, Milan A, Zhang X, Liu W, Kim TK (2021) Multiple object tracking: a literature review. Artif Intell 293:103448
3. Ciaparrone G, Sánchez FL, Tabik S, Troiano L, Tagliaferri R, Herrera F (2020) Deep learning in video multi-object tracking: a survey. Neurocomputing 381:61–88
4. Emami P, Pardalos PM, Elefteriadou L, Ranka S (2020) Machine learning methods for data association in multi-object tracking. ACM Comput Surv (CSUR) 53(4):1–34
5. Jainul Rinosha SM, Augasta G (2021) Review of recent advances in visual tracking techniques. Multimedia Tools Appl 80(16):24185–24203
6. Jeong Y, Son S, Jeong E, Lee B (2018) An integrated self-diagnosis system for an autonomous vehicle based on an IoT gateway and deep learning. Appl Sci 8(7):1164
7. Bansal P, Kockelman KM (2018) Are we ready to embrace connected and self-driving vehicles? A case study of Texans. Transportation 45(2):641–675
8. Lotfi F, Taghirad HD (2021) A framework for 3d tracking of frontal dynamic objects in autonomous cars. Expert Syst Appl 183:115343
9. Shi X, Chai X, Xie J, Sun T (2022) Mc-gcn: a multi-scale contrastive graph convolutional network for unconstrained face recognition with image sets. IEEE Trans Image Process 31:3046–3055
10. Neto JBC, Ferrari C, Marana AN, Berretti S, Bimbo AD (2022) Learning streamed attention network from descriptor images for cross-resolution 3d face recognition. ACM Trans Multimedia Comput Commun Appl (TOMM) 19(1):1–20
11. Rezaei F, Yazdi M (2021) Real-time crowd behavior recognition in surveillance videos based on deep learning methods. J Real-Time Image Proc 18(5):1669–1679
12. Swathi HY, Shivakumar G (2021) Hybrid feature-assisted neural model for crowd behavior analysis. SN Comput Sci 2(4):1–11
13. Mekler ED, Hornbæk K (2019) A framework for the experience of meaning in human-computer interaction. In: Proceedings of the 2019 CHI conference on human factors in computing systems, pp 1–15
14. Kashef M, Visvizi A, Troisi O (2021) Smart city as a smart service system: human-computer interaction and smart city surveillance systems. Comput Hum Behav 124:106923
15. Siddique A, Medeiros H (2020) Tracking passengers and baggage items using multi-camera systems at security checkpoints. arXiv preprint arXiv:2007.07924
16. Xu R, Nikouei SY, Chen Y, Polunchenko A, Song S, Deng C, Faughnan TR (2018) Real-time human objects tracking for smart surveillance at the edge. In: 2018 IEEE international conference on communications (ICC). IEEE, pp 1–6
17. Wang Z, Zheng L, Liu Y, Li Y, Wang S (2020) Towards real-time multi-object tracking. In: European conference on computer vision. Springer, Berlin, pp 107–122
18. Brunetti A, Buongiorno D, Trotta GF, Bevilacqua V (2018) Computer vision and deep learning techniques for pedestrian detection and tracking: a survey. Neurocomputing 300:17–33
19. Zuo M, Zhu X, Chen Y, Yu J (2022) Survey of object tracking algorithm based on Siamese network. J Phys Conf Ser 2203:012035
20. Tong K, Wu Y, Zhou F (2020) Recent advances in small object detection based on deep learning: a review. Image Vis Comput 97:103910
21. Nguyen HQ, Nguyen TB, Le TA, Le TL, Vu TH, Noe A (2019) Comparative evaluation of human detection and tracking approaches for online tracking applications. In: 2019 international conference on advanced technologies for communications (ATC). IEEE, pp 348–353
22. Llamazares Á, Molinos EJ, Ocaña M (2018) Detection and tracking of moving obstacles (datmo): a review. Robotica 38(5):761–774
23. Redmon J, Farhadi A (2018) Yolov3: an incremental improvement. arXiv preprint arXiv:1804.02767
24. Soleimanitaleb Z, Keyvanrad MA, Jafari A (2019) Object tracking methods: a review. In: 2019 9th international conference on computer and knowledge engineering (ICCKE). IEEE, pp 282–288
25. Rakai L, Song H, Sun S, Zhang W, Yang Y (2021) Data association in multiple object tracking: a survey of recent techniques. Expert Syst Appl 116300
26. Sugirtha T, Sridevi MA (2022) Survey on object detection and tracking in a video sequence. In: Proceedings of international conference on computational intelligence. Springer, Berlin, pp 15–29
27. Tan L, Dong X, Ma Y, Yu C (2018) A multiple object tracking algorithm based on yolo detection. In: 11th international congress on image and signal processing, biomedical engineering and informatics (CISP-BMEI). IEEE, pp 1–5
28. Sun S, Akhtar N, Song H, Mian A, Shah M (2019) Deep affinity network for multiple object tracking. IEEE Trans Pattern Anal Mach Intell 43(1):104–119
29. Lee S, Kim E (2018) Multiple object tracking via feature pyramid Siamese networks. IEEE Access 7:8181–8194
30. Saleh F, Aliakbarian S, Rezatofighi H, Salzmann M, Gould S (2021) Probabilistic tracklet scoring and inpainting for multiple object tracking. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 14329–14339
31. Peng J, Gu Y, Wang Y, Wang C, Li J, Huang F (2020) Dense scene multiple object tracking with box-plane matching. In: Proceedings of the 28th ACM international conference on multimedia, pp 4615–4619
32. Zhang J, Zhou S, Chang X, Wan F, Wang J, Wu Y, Huang D (2020) Multiple object tracking by flowing and fusing. arXiv preprint arXiv:2001.11180
33. Sundararaman R, De Almeida Braga C, Marchand E, Pettre J (2021) Tracking pedestrian heads in dense crowd. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3865–3875
34. Stadler D, Beyerer J (2021) Multi-pedestrian tracking with clusters. In: 2021 17th IEEE international conference on advanced video and signal based surveillance (AVSS). IEEE, pp 1–10
35. Wojke N, Bewley A, Paulus D (2017) Simple online and realtime tracking with a deep association metric. In: 2017 IEEE international conference on image processing, pp 3645–3649
36. https://cocodataset.org/#explore
37. https://github.com/AlexeyAB/darknet
38. https://pjreddie.com/darknet/yolov1. Last accessed 2022/08/13
39. Redmon J, Farhadi A (2017) YOLO9000: better, faster, stronger. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7263–7271
40. Wai YJ, Bin Mohd Yussof Z, Bin Salim SI, Chuan LK (2018) Fixed point implementation of Tiny-Yolo-v2 using OpenCL on FPGA. Int J Adv Comput Sci Appl 9(10):506–512
41. Adarsh P, Rathi P, Kumar M (2020) YOLO v3-tiny: object detection and recognition using one stage improved model. In: 2020 6th international conference on advanced computing and communication systems (ICACCS), Mar 6. IEEE, pp 687–694
42. Ge Z, Liu S, Wang F, Li Z, Sun J (2021) Yolox: exceeding yolo series in 2021. arXiv preprint arXiv:2107.08430
43. Xu S, Wang X, Lv W, Chang Q, Cui C, Deng K, Lai B (2022) PP-YOLOE: an evolved version of YOLO. arXiv preprint arXiv:2203.16250
44. Huang X, Wang X, Lv W, Bai X, Long X, Deng K, Yoshie O (2021) PP-YOLOv2: a practical object detector. arXiv preprint arXiv:2104.10419
45. Multiple object tracking benchmark, https://motchallenge.net/data/MOT17/. Last accessed 2022/08/13
46. Kim C, Li F, Ciptadi A, Rehg JM (2015) Multiple hypothesis tracking revisited. In: Proceedings of the IEEE international conference on computer vision, pp 4696–4704
47. Henschel R, Leal-Taixé L, Cremers D, Rosenhahn B (2018) Fusion of head and full-body detectors for multi-object tracking. In: Computer vision and pattern recognition workshops (CVPRW)
48. Keuper M, Tang S, Andres B, Brox T, Schiele B (2018) Motion segmentation and multiple object tracking by correlation co-clustering. IEEE Trans Pattern Anal Mach Intell 42(1):140–153
49. Chen L, Ai H, Zhuang Z, Shang C (2018) Real-time multiple people tracking with deeply learned candidate selection and person re-identification. ICME

Smart City: Road Traffic Monitoring System Based on the Integration of IoT and ML

Komal Saini and Sandeep Sharma

Abstract One of the critical challenges generated by globalization is managing traffic on the roads. The mishandling of road traffic because of its varying nature hinders efficient traffic flow, consumes time, and poses a risk to road safety. All these problems can be addressed with the aid of an efficient and trustworthy smart road traffic monitoring (SRTM) system. Even though a substantial amount of work has been done on road traffic management, it remains an active topic of research. Emerging techniques like the Internet of things (IoT) and machine learning (ML) may help in the development of an efficient and robust system for monitoring traffic. Moreover, by integrating these techniques, decision-making mechanisms can be improved, and even urban evolution can be promoted. Therefore, the primary purpose of this paper is to study the role of IoT and ML in SRTM scenarios, both independently and in combination. Further, to gain a deeper understanding of the system, several IoT and ML frameworks for road traffic management are examined, including the techniques used, their outcomes, and their future work. The comparative analysis of the frameworks shows that IoT and ML prove much more efficient when used together in traffic management. This paper gives only a theoretical review of the state of traffic monitoring systems and no practical implementation. In future, an efficient IoT- and ML-aided framework for road traffic monitoring will be designed and implemented.

Keywords Internet of things (IoT) · Machine learning (ML) · Traffic monitoring · Edge computing · Cloud computing

K. Saini (B) · S. Sharma Guru Nanak Dev University, Amritsar 143005, India e-mail: [email protected] S. Sharma e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 686, https://doi.org/10.1007/978-981-99-2100-3_12


1 Introduction

The intelligence of a smart city depends on the technologies used to enhance the standard of living in the modern era. A city can be made smart in various ways, such as smart living, home automation, smart society, smart environment, and smart transportation, and the benefits these provide produce high social value [1]. Smart devices, which regularly generate huge amounts of data, are installed within the infrastructure of the smart city, and the data are further processed to achieve the smartness of the infrastructure [2]. Smart traffic management is one of the key areas for smart city development because overcrowding of vehicles on the roads is a serious problem that worsens as cities grow and is currently becoming a great obstacle for many nations worldwide [3]. Traffic congestion has a major impact on the economy and health because it delays the completion of domestic activities, resulting in a massive loss of time and attention as well as various extra expenditures [4]. Moreover, every year a huge number of deaths are caused by traffic accidents, which also inflict numerous impairments and damages on people’s lives [5]. Traditional traffic management systems have many problems, such as failing to identify the correct vehicle and giving traffic lights equal time, which wastes time when the route is clear. Because of these problems, they are incapable of handling dynamic traffic on the roads and managing traffic bottlenecks. As a result, a better solution is required, which “smart road traffic monitoring” (SRTM) systems can provide [6]. Automatic traffic signal updating, highway traffic monitoring, and emergency handling are some examples of the traffic management functions that make up the SRTM system [7]. In addition, managing real-time traffic is a significant problem. Recent developments in information technologies can help resolve these issues.

Smart things or devices, wearable sensors, and many other components are connected in a network that forms the Internet of things (IoT). The capabilities of these devices have improved significantly over recent years, enabling device communication and the execution of various other tasks [6] and producing a significant quantity of data, which needs a lot of processing, space, and energy to analyze [8]. Consequently, advanced analytics and other technologies, including deep learning, reinforcement learning, and machine learning, may also be used for the SRTM system [9]. Smart transportation can help smart city development, as future smart cities are predicted to rely heavily on intelligent transportation systems (ITS), which will improve mobility, transport management, and driving safety while also increasing energy efficiency and reducing environmental pollution [10]. In this paper, the role of the Internet of things (IoT) with machine learning (ML) in the smart road traffic monitoring (SRTM) system is explored in detail. The rest of the paper is organized as follows. Section 2 gives the review methodology. Sections 3 and 4 give an overview of IoT and its role in the SRTM system and an overview of ML and its role in the SRTM system, respectively. Then, Sect. 5 describes the collaborative use of IoT and ML in smart road traffic monitoring systems, followed by a comparison of the existing frameworks. Section 6 gives the conclusion of the study.

Fig. 1 Articles processed for the study (legend: Total, Relevant)

2 Methodology

This paper examines the state of road traffic monitoring using the Internet of things (IoT) and machine learning (ML). The literature was drawn from databases of published articles in popular journals, conferences, and proceedings. Search terms such as “road traffic monitoring”, “intelligent transportation systems”, “IoT and ML techniques for traffic management”, and “frameworks for traffic monitoring” were used to find the existing literature. The papers were downloaded, reviewed, synthesized, evaluated, critiqued, and compiled. The sources searched included Elsevier, Springer, IEEE, ACM, Google Scholar, and others. In total, around 47 publications were used in the research. Figure 1 shows the articles that were processed for this study.

3 IoT

The Internet of things (IoT) is a network of interconnected things consisting of components such as sensing devices, actuators, Web applications, and other technologies that communicate via the Internet [11]. An IoT system allows these things to interact and share information while completing a substantial task. Data analytics techniques are applied to the data produced by the devices to extract useful facts about the system. Most IoT data are processed and kept on a highly scalable cloud server for analysis that supports future decision-making [12]. However, it is becoming more challenging for the cloud to satisfy all needs because of the rise in IoT devices and the plethora of information collected [13]. Such massive data transfers need an excessive amount of network bandwidth, which raises the price of the services provided by the cloud [14]. Moreover, one drawback of traditional cloud services is that the response time is too long [15]. Edge computing therefore plays an important role in resolving these problems. Edge computing is a technology that moves processing capability from an existing cloud center toward the edges of the network, near the client side. Performing the calculations at the edge can lower latency, reduce network bandwidth usage, enhance cyber security, etc. [16].

3.1 Role of IoT in Smart Road Traffic Monitoring

The development of the Internet of things (IoT) has advanced the idea of smart cities. Recent research in the field of transportation shows that IoT may play a significant role in smart road traffic management by integrating things/devices via the Internet for communication, as well as by handling traffic movements on roads, for example by tracking and identifying vehicles [17]. One of the crucial factors affecting the dangers related to road traffic incidents is the increased speed of vehicles. To identify motorists who break traffic rules, a method based on IoT and image processing is presented in [18]. The most widely used low-cost sensors for vehicle detection systems are magnetic, acoustic, and infrared sensors [19]. The growing usage of automobiles and smart vehicle technologies produces a significant quantity of data. To provide actionable insights from such data, powerful big data analytics is needed [20]. IoT and cloud computing might offer the computing infrastructure needed to gather and handle such large data [21]. Moreover, vehicles communicate in real time, and response delays that arise during real-time message transmission might reduce the system’s performance. To resolve the problems of the traditional cloud, edge computing may be helpful in the SRTM system [22]. Hence, IoT in smart transportation will form a strong and vital basis for managing the hybrid and dynamic nature of road traffic and for handling traffic jams. Figure 2 shows the IoT-based edge-cloud architecture for a smart road traffic monitoring (SRTM) system. Level 1 includes devices, sensors, cameras, etc., from which the real-time traffic data are gathered and transmitted to level 2, i.e., the edge. Here, edge computing plays an important role by pre-processing the data and extracting only the essential data that must be transferred to the cloud for additional analysis [23]. Finally, level 3 is the main cloud, where the data are stored and processed to forecast and control traffic congestion. Alongside its advantages, IoT also has disadvantages in transportation systems. If the devices or the communication network malfunction or stop working, this might cause property loss as well as casualties among passengers and drivers. These issues must be kept in mind before using IoT in any application [24].


Fig. 2 IoT-based edge-cloud architecture
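The three-level flow of Fig. 2 can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Reading` schema, the 20 km/h congestion threshold, and the in-memory list standing in for cloud storage are hypothetical and do not come from any surveyed framework. The point is only that the edge discards faulty samples and forwards compact summaries instead of every raw reading.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Reading:
    """One raw level-1 measurement (hypothetical schema)."""
    sensor_id: str
    speed_kmh: float
    vehicle_count: int


def edge_preprocess(readings: List[Reading],
                    congestion_speed_kmh: float = 20.0) -> List[Dict]:
    """Level 2: drop invalid readings, then aggregate per sensor."""
    by_sensor: Dict[str, List[Reading]] = {}
    for r in readings:
        if r.speed_kmh < 0 or r.vehicle_count < 0:  # discard malformed samples
            continue
        by_sensor.setdefault(r.sensor_id, []).append(r)

    summaries = []
    for sensor_id, group in sorted(by_sensor.items()):
        avg_speed = mean(r.speed_kmh for r in group)
        summaries.append({
            "sensor_id": sensor_id,
            "avg_speed_kmh": round(avg_speed, 1),
            "vehicles": sum(r.vehicle_count for r in group),
            "congested": avg_speed < congestion_speed_kmh,
        })
    return summaries


def cloud_store(summaries: List[Dict], storage: List[Dict]) -> None:
    """Level 3: stand-in for cloud storage; a real system would call a cloud API."""
    storage.extend(summaries)


# Level 1: hypothetical raw readings from two roadside cameras.
raw = [
    Reading("cam-1", 12.0, 30),
    Reading("cam-1", 18.0, 25),
    Reading("cam-2", 55.0, 5),
    Reading("cam-2", -1.0, 0),  # faulty sample, filtered at the edge
]
cloud_db: List[Dict] = []
cloud_store(edge_preprocess(raw), cloud_db)
print(cloud_db)
```

Here four raw readings are reduced to two summaries before anything leaves the edge, which is the bandwidth saving the edge level is meant to provide.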

4 Machine Learning

Along with IoT, the field of machine learning (ML) is also expanding rapidly. ML has evolved into one of the biggest game-changing technical innovations in recent times [25]. It is a key component of artificial intelligence (AI), giving a system the capability to learn and decide on its own without being given explicit instructions. Data analyses are frequently performed using machine learning. According to the figures in [26], the market for ML and AI-based technologies grew by $1.4 billion in 2016 and is projected to expand to $59.8 billion by 2025. The plethora of raw data generated by IoT devices is of no use until it is processed and analyzed. The collected data are stored in the cloud, where AI methods like machine learning (ML), deep learning (DL), supervised learning, and reinforcement learning are used to process and analyze the data for pattern recognition and insights, and even to forecast for future use [23, 27].

4.1 Role of ML in Smart Road Traffic Monitoring

Huge traffic volume in densely populated areas causes significant energy and time loss, as well as potential damage to roads and other transportation infrastructure [28]. Machine learning, when used in smart road traffic monitoring (SRTM) systems, has the potential to improve passenger safety, lower the number of road accidents, and lessen traffic congestion. ML can be used in various ways in smart transportation systems, for example to predict the traffic flow on the roads. For short-term traffic speed prediction, the work in [29] examines the effectiveness of deep belief networks (DBN) and LSTM while also considering rainfall conditions. The study in [30] first learns the relationship between weather conditions and traffic flow and then improves the prediction of traffic flow using a deep learning approach. Moreover, ML also helps in predicting travel time, which is beneficial for controlling traffic, planning routes, and so on. Duan et al. [31] explored the deep learning LSTM neural network model to predict travel time. Other applications of ML include vehicle detection [32], vehicle classification [33], and recognition of driving behavior [34]. ML technologies also help the motor industry throughout the automobile production process, and they help reduce the number of road accidents by predicting accidents based on road conditions [35]. The flowchart in Fig. 3 illustrates how machine learning is used to regulate traffic on roads. The raw vehicular data collected from various sources like sensors, cameras, RFID tags, etc., are pre-processed to extract the useful data in a structured format. Then, the processed data go through the training and testing phase, in which various ML algorithms are used to make the final decision regarding traffic management.

Fig. 3 Machine learning process in SRTM
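The stages of the process in Fig. 3 (pre-processing, training and testing, decision) can be sketched as follows. This is an illustration under stated assumptions, not the method of any surveyed work: the records are invented, a plain k-nearest-neighbor regressor stands in for "various ML algorithms", the features are left unscaled for brevity, and the 25 km/h decision threshold and the action strings are hypothetical.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float, float]  # (hour_of_day, vehicles_per_min, speed_kmh)
RawRecord = Tuple[Optional[float], Optional[float], Optional[float]]


def preprocess(raw: List[RawRecord]) -> List[Sample]:
    """Pre-processing: keep only complete records in a structured format."""
    return [(h, v, s) for h, v, s in raw if None not in (h, v, s)]


def knn_predict(train: List[Sample], hour: float, flow: float, k: int = 2) -> float:
    """Predict speed as the mean speed of the k nearest training samples."""
    nearest = sorted(train, key=lambda t: (t[0] - hour) ** 2 + (t[1] - flow) ** 2)[:k]
    return sum(s for _, _, s in nearest) / len(nearest)


def decide(predicted_speed: float, threshold_kmh: float = 25.0) -> str:
    """Decision step: map a predicted speed to a hypothetical signal action."""
    return "extend green phase" if predicted_speed < threshold_kmh else "keep default timing"


# Hypothetical raw vehicular data; one record has a missing field.
raw = [
    (8.0, 40.0, 15.0),
    (8.5, 38.0, 18.0),
    (13.0, 12.0, 50.0),
    (None, 20.0, 35.0),
    (14.0, 10.0, 55.0),
    (9.0, 42.0, 14.0),
    (13.5, 11.0, 52.0),
]
clean = preprocess(raw)
train, test = clean[:4], clean[4:]  # simple hold-out split

# Testing phase: mean absolute error on the held-out records.
errors = [abs(knn_predict(train, h, v) - s) for h, v, s in test]
mae = sum(errors) / len(errors)
print(f"test MAE = {mae:.2f} km/h")
for h, v, _ in test:
    print(h, v, decide(knn_predict(train, h, v)))
```

In practice the features would be normalized and the split randomized; both are omitted here to keep the sketch short.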


5 IoT and ML in Smart Road Traffic Monitoring

In smart road traffic management, IoT and ML technologies are applied in a variety of ways. The number of both personal and commercial automobiles increases with the growth of any nation. This has led to traffic jams, logistical delays, and an increase in the frequency of car crashes and pollution. As a result, the need for SRTM systems has risen [36]. Manual traffic control methods, on the other hand, need more labor, enforce traffic regulations weakly, and do not allow the administration to control traffic successfully in all regions [6]. Therefore, smart road traffic monitoring (SRTM) systems are often used to deal with traffic problems. They can help minimize traffic congestion while also enhancing transportation quality for both businesses and people. Digital devices like cameras, sensors, routers, smart wearables, and smartphones are widely used in intelligent transportation systems to monitor and control traffic automatically and to reduce congestion. In [37], a smartphone is used to track the location of public buses in real time so that the waiting time of daily passengers at bus stands can be reduced. Hence, integrated Internet of things (IoT) and machine learning (ML) are being used to regulate road traffic. The advantages of both IoT and ML are combined to build real-time smart systems that not only collect data but also produce statistics and recognize patterns in traffic data. Many researchers have presented frameworks for smart road traffic monitoring (SRTM). Table 1 presents the comparative analysis of existing ML and IoT frameworks for smart road traffic monitoring.

6 Conclusion

The study examines the significance of the Internet of things (IoT) and machine learning (ML) in smart road traffic monitoring (SRTM) systems. In recent years, IoT and ML have received a lot of attention owing to the advantages they provide in the development of smart systems. In this paper, a survey of SRTM systems using IoT and ML was conducted. It was found that research studies based on IoT and ML for intelligent traffic management have increased in the last 2–3 years because of their capability to resolve the challenging issues that have emerged in traditional methods, such as handling the dynamic nature of traffic, delay concerns, and data analysis. Furthermore, many researchers have explored hybrid IoT–ML systems to improve the efficiency of real-time traffic monitoring. A comparative analysis of the existing work is given in tabular form. Beyond IoT and ML, more advanced technologies need to be explored. The goal for future work is to present a real-time IoT- and ML-aided smart road traffic monitoring (SRTM) framework and to address the security issues faced.

Based on weather data, to predict the traffic flow

To propose an accident detection system using smart systems

[47]

1. To efficiently manage traffic jams at intersections, an algorithm for traffic lights is proposed 2. To detect traffic congestion in a specific area and to evaluate its accuracy, an algorithm for tweet classification is proposed

IoT + Data analytics

[40]

[41]

Traffic congestion is forecasted on the highways which define different traffic states

[39]

Method

[38]

LSTM AR-LSTM CNN

K-nearest neighbor, regression tree, feedforward neural networks

IoT + Machine learning (ML)

–

Hidden Markov model (HMM) + Contrast measure

Using real-time monitoring sensors on European highways

Collected using parking sensors and radar data

Using RFID, WSNs, and cameras at traffic lights

Online

Data set

GPS data

Algorithm used

CNN + LSTM

IoT + ML

ML

ML

Objective

1. Traffic forecasting is done using both long- and short-range time scales using various predictions 2. A data reduction approach is proposed to determine the most suitable road

Refs.

Table 1 Comparative analysis of existing ML and IoT frameworks for smart road traffic monitoring

Outcomes

All algorithms performed well in detecting accidents and achieved an accuracy of over 99%

In terms of error rate and execution time, CNN produced better results

When two routes are overcrowded, the avg. waiting time of automobiles can be reduced using smart traffic lights

The proposed model has achieved less error % = 8.8% as compared to other models like ARIMA, ANFIS, and random walk

With the increase in the time scale, the difference between the proposed method and the base models also increases

Future scope

(continued)

In future, other algorithms can be explored using different parameters like meteorological data, road conditions, etc.

In future, real-time traffic state estimation can be developed, and the outcomes could be contrasted with other XGBoost methods

1. ML algorithms are to be used to evaluate the data 2. Other issues like cost benefits, security, and privacy of user data can be resolved

This approach can enable critical long-term solutions such as ramp metering in future

The time scale can be increased up to 48 h and forecasting can be increased from road to a wide area level


Objective

To make a way for an ambulance to save the life of a person

Traffic flow prediction

To build a short-term traffic flow prediction framework using the CNN approach

1. To balance the traffic on the roads 2. To adapt the traffic signal timing according to the intensity of traffic

Predicting the short-term traffic

Refs.

[45]

[46]

[42]

[43]

[44]

Table 1 (continued)

J48 decision tree NB SVM KNR

IoT + ML

Long short-term memory (LSTM)

CNN + STFSA

IoT + ML

Machine learning (ML)

XGBoost algorithm

Machine learning (ML)

Algorithm used

Image processing algorithms

Method

IoT + Cloud computing

Collected from Beijing Traffic Management Bureau

An algorithm is proposed to generate the data

Real data collected using loop detectors

Data were collected by an automatic number plate recognition system using cameras

CCTV cameras collect videos and images

Data set

Outcomes

Future scope

VANETs can be explored in future for traffic management systems

To improve the prediction efficiency, analysis is needed of the Spatio information

The performance of XGBoost can be improved by exploring the other parameters The framework should be implemented on large real roads

In future, this proposed model must be installed in a real environment

LSTM performed better in terms of accuracy as compared to the other models like SVM, ARIMA, etc., when the forecast time is long

Various input/output traffic prediction systems can be proposed in future

J48 outperformed the other algorithms for vehicle reachability prediction

CNN + STFSA resulted in the lowest error rate

XGBoost performed better as compared to the other models like SARIMA, CNN, and LSTM

After identifying the ambulance, the alerts are given to the other traffic to clear its way


References

1. Qian Y, Wu D, Bao W, Lorenz P (2019) The internet of things for smart cities: technologies and applications. IEEE Netw 33(2):4–5. https://doi.org/10.1109/MNET.2019.8675165
2. Sarrab M, Pulparambil S, Awadalla M (2020) Development of an IoT based real-time traffic monitoring system for city governance. Glob Transitions 2:230–245. https://doi.org/10.1016/j.glt.2020.09.004
3. Feltrin G, Popovic N, Wojtera M (2019) A sentinel node for event-driven structural monitoring of road bridges using wireless sensor networks. J Sens 2019. http://doi.org/10.1155/2019/8652527
4. Zadobrischi E, Cosovanu LM, Dimian M (2020) Traffic flow density model and dynamic traffic congestion model simulation based on practice case with vehicle network and system traffic intelligent communication. Symmetry (Basel) 12(7). http://doi.org/10.3390/sym12071172
5. Nuruddeen MI, Siyan P (2016) Analyzing factors responsible for road traffic accidents along Kano-Kaduna-Abuja dual carriageway Nigeria. J Econ Sustain Dev 7(12):156–163
6. Lilhore UK et al (2022) Design and implementation of an ML and IoT based adaptive traffic-management system for smart cities. Sensors 22(8). http://doi.org/10.3390/s22082908
7. Lee WH, Chiu CY (2020) Design and implementation of a smart traffic signal control system for smart city applications. Sensors (Switzerland) 20(2). http://doi.org/10.3390/s20020508
8. Bhat WA (2018) Is a data-capacity gap inevitable in big data storage? Computer (Long Beach Calif) 51(9):54–62. http://doi.org/10.1109/MC.2018.3620975
9. Bhat WA (2018) Bridging data-capacity gap in big data storage. Futur Gener Comput Syst 87(2018):538–548. https://doi.org/10.1016/j.future.2017.12.066
10. Cader A, Nafrees M, Mohamed A, Sujah A, Mansoor C (2021) Smart cities: emerging technologies and potential solutions to the cyber security threads. http://doi.org/10.1109/ICEECCOT52851.2021.9707994
11. Kebbeh PS, Jain M, Gueye B (2020) SenseNet: IoT temperature measurement in railway networks for intelligent transport. In: IBASE-BF 2020 - 1st IEEE international conference on natural and engineering sciences for Sahel’s sustainable development - impact of big data application on society and environment, pp 1–8. http://doi.org/10.1109/IBASE-BF48578.2020.9069596
12. Dhingra S, Madda RB, Patan R, Jiao P, Barri K, Alavi AH (2021) Internet of things-based fog and cloud computing technology for smart traffic monitoring. Internet of Things (Netherlands) 14. http://doi.org/10.1016/j.iot.2020.100175
13. Yu W et al (2017) A survey on the edge computing for the internet of things. IEEE Access 6:6900–6919. http://doi.org/10.1109/ACCESS.2017.2778504
14. Sittón-Candanedo I, Alonso RS, Rodríguez-González S, García Coria JA, De La Prieta F (2020) Edge computing architectures in Industry 4.0: a general survey and comparison. Adv Intell Syst Comput 950:121–131. http://doi.org/10.1007/978-3-030-20055-8_12
15. Premsankar G, Di Francesco M, Taleb T (2018) Edge computing for the internet of things: a case study. IEEE Internet Things J 5(2):1275–1284. https://doi.org/10.1109/JIOT.2018.2805263
16. Wan S, Ding S, Chen C (2022) Edge computing enabled video segmentation for real-time traffic monitoring in internet of vehicles. Pattern Recogn 121. http://doi.org/10.1016/j.patcog.2021.108146
17. Li Z, Al Hassan R, Shahidehpour M, Bahramirad S, Khodaei A (2019) A hierarchical framework for intelligent traffic management in smart cities. IEEE Trans Smart Grid 10(1):691–701. http://doi.org/10.1109/TSG.2017.2750542
18. Nizzad ARM et al (2021) Internet of things based automatic system for the traffic violation. http://doi.org/10.1109/ICEECCOT52851.2021.9708060
19. Qiu J, Du L, Zhang D, Su S, Tian Z (2020) Nei-TTE: intelligent traffic time estimation based on fine-grained time derivation of road segments for smart city. IEEE Trans Ind Inf 16(4):2659–2666. https://doi.org/10.1109/TII.2019.2943906
20. Verma P, Sood SK (2018) Cloud-centric IoT based disease diagnosis healthcare framework. J Parallel Distrib Comput 116:27–38. https://doi.org/10.1016/j.jpdc.2017.11.018
21. Sood SK, Mahajan I (2018) A fog-based healthcare framework for Chikungunya. IEEE Internet Things J 5(2):794–801. https://doi.org/10.1109/JIOT.2017.2768407
22. Suresh Kumar K, Radha Mani AS, Sundaresan S, Ananth Kumar T (2021) Modeling of VANET for future generation transportation system through edge/fog/cloud computing powered by 6G. Cloud IoT-Based Veh Ad Hoc Netw 105–124. http://doi.org/10.1002/9781119761846.ch6
23. Luhach AK, Jat DS, Hawari KB, Gao XZ, Lingras P (eds) (2020) Advanced informatics for computing research part 1. http://doi.org/10.1007/978-3-031-09469-9
24. 15584-Article Text-55369-1-10-20210312.pdf
25. Zantalis F, Koulouras G, Karabetsos S, Kandris D (2019) A review of machine learning and IoT in smart transportation. Future Internet 11(4):1–23. https://doi.org/10.3390/FI11040094
26. Rahman W, Islam R, Hasan A, Bithi NI, Hasan M (2020) Computer and intelligent waste management system using deep learning with IoT. J King Saud Univ Comput Inf Sci [Online]. Available: http://doi.org/10.1016/j.jksuci.2020.08.016
27. Sarker IH (2021) Machine learning: algorithms, real-world applications and research directions. SN Comput Sci 2(3):1–21. https://doi.org/10.1007/s42979-021-00592-x
28. Ben Atitallah S, Driss M, Boulila W, Ben Ghezala H (2020) Leveraging deep learning and IoT big data analytics to support the smart cities development: review and future directions. Comput Sci Rev 38. http://doi.org/10.1016/j.cosrev.2020.100303
29. Jia Y, Wu J, Ben-Akiva M, Seshadri R, Du Y (2017) Rainfall-integrated traffic speed prediction using deep learning method. IET Intel Transport Syst 11(9):531–536. https://doi.org/10.1049/iet-its.2016.0257
30. Koesdwiady A, Soua R, Karray F (2016) Weather information in connected cars: a deep learning approach. IEEE Trans Veh Technol 65(12):9508–9517
31.
Duan Y, Lv Y, Wang FY (2016) Travel time prediction with LSTM neural network. In: IEEE conference on intelligent transportation systems proceedings, ITSC, pp 1053–1058. http://doi. org/10.1109/ITSC.2016.7795686 32. Tang Y, Zhang C, Gu R, Li P, Yang B (2017) Vehicle detection and recognition for intelligent traffic surveillance system. Multimed Tools Appl 76(4):5817–5832. https://doi.org/10.1007/ s11042-015-2520-x 33. Zhao D, Member S, Chen Y, Lv L (2017) Attention for vehicle classification. IEEE 9(4):356– 367 34. Ouyang Z, Niu J, Guizani M (2018) Improved vehicle steering pattern recognition by using selected sensor data. IEEE Trans Mob Comput 17(6):1383–1396 35. Iyer LS (2021) AI enabled applications towards intelligent transportation. Transp Eng 5:100083. https://doi.org/10.1016/j.treng.2021.100083 ˇ 36. Jelínek J, Cejka J, Šedivý J (2022) Importance of the static infrastructure for dissemination of information within intelligent transportation systems. Commun Sci Lett Univ Žilina 24(2):E63– E73. https://doi.org/10.26552/COM.C.2022.2.E63-E73 37. Cader A, Nafrees M (2022) Intelligent transportation system using smartphone. http://doi.org/ 10.1109/ICEECCOT52851.2021.9708053 38. Bogaerts T, Masegosa AD, Angarita-Zapata JS, Onieva E, Hellinckx P (2020) A graph CNNLSTM neural network for short and long-term traffic forecasting based on trajectory data. Transp Res Part C Emerg Technol 112:62–77. https://doi.org/10.1016/j.trc.2020.01.010 39. Zaki JF, Ali-Eldin A, Hussein SE, Saraya SF, Areed FF (2020) Traffic congestion prediction based on hidden Markov models and contrast measure. Ain Shams Eng J 11(3):535–551. https:// doi.org/10.1016/j.asej.2019.10.006 40. Alsaawy Y, Alkhodre A, Sen AA, Alshanqiti A, Bhat WA, Bahbouh NM (2022) A comprehensive and effective framework for traffic congestion problem based on the integration of IoT and data analytics. Appl Sci 12(4). http://doi.org/10.3390/app12042043 41. 
Braz FJ et al (2022) Road traffic forecast based on meteorological information through deep learning methods. Sensors 22(12):1–19. https://doi.org/10.3390/s22124485

148

K. Saini and S. Sharma

42. Zhang W, Yu Y, Qi Y, Shu F, Wang Y (2019) Short-term traffic flow prediction based on spatiotemporal analysis and CNN deep learning. Transp A Transp Sci 15(2):1688–1711. https://doi. org/10.1080/23249935.2019.1637966 43. Sahil, Sood SK (2021) Smart vehicular traffic management: an edge cloud centric IoT based framework. Internet of Things (Netherlands) 14. http://doi.org/10.1016/j.iot.2019.100140 44. Zhao Z, Chen W, Wu X, Chen PCY, Liu J (2017) LSTM network: a deep learning approach for short-term traffic forecast. IET Intell Transp Syst 11(2):68–75. https://doi.org/10.1049/ietits.2016.0208 45. Vardhana M, Arunkumar N, Abdulhay E, Vishnuprasad PV (2019) Iot based real time traffic control using cloud computing. Cluster Comput 22(s1):2495–2504. https://doi.org/10.1007/ s10586-018-2152-9 46. Sun B, Sun T, Jiao P (2021) Spatio-temporal segmented traffic flow prediction with ANPRS data based on improved XGBoost. J Adv Transp 2021. http://doi.org/10.1155/2021/5559562 47. Ozbayoglu M, Kucukayan G, Dogdu E (2016) A real-time autonomous highway accident detection model based on big data processing and computational intelligence. In: Proceedings— 2016 IEEE international conference on big data, Big data 2016, pp 1807–1813. http://doi.org/ 10.1109/BigData.2016.7840798

Weed Detection in Crops Using Lightweight EfficientNets

Atishek Kumar, Rishabh Jain, and Rudresh Dwivedi

Abstract Crop cultivation depends predominantly on how well the emergence and growth of weeds can be controlled. Weed management using chemicals such as herbicides across a large crop field is an inefficient and resource-intensive task, and it does not yield promising results. Prioritising the timely application of herbicides at specific spots in a crop field is imperative for successful weed management and, in turn, results in better and healthier crops. This work offers a deep learning-based solution to detect and classify different categories of weeds in a crop field so that appropriate herbicide treatment can be applied to curb weed growth at weed-prone spots. We utilise five convolutional neural network (CNN)-based architectures, namely MobileNet, MobileNetV2, and EfficientNets (B0–B2), all suited to low-powered devices, to tackle this problem efficiently and at a low cost. Experiments with the Plant Seedling dataset, Soybean dataset, and DeepWeeds dataset show that EfficientNets outperformed MobileNets on all three datasets, achieving accuracy values of 98.92%, 99.97%, and 96.40%, respectively.

Keywords Weed detection · Weed classification · Convolutional neural networks · Deep learning

A. Kumar (B) · R. Jain · R. Dwivedi
Netaji Subhas University of Technology, New Delhi 110078, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 686, https://doi.org/10.1007/978-981-99-2100-3_13


1 Introduction

1.1 Background

As the world's population grows, so does the demand for food. It is essential to concentrate on the issues that the agricultural sector is experiencing, such as soil management, water management, species recognition, crop quality, and weed detection. New advances in this last category could help combat one of the most significant biological threats to crop productivity. The presence of weeds can lead to serious problems such as increased agricultural costs, increased irrigation requirements, displacement of native species, and land and water degradation. Moreover, weeds compete with crop plants for nutrients, soil moisture, available space, and sunlight. Weed infestations are also a major contributor to annual crop yield losses. Hence, effective weed management is crucial for healthy crop production.

Manual weeding is the oldest method still in use to keep weeds out of crops. However, due to its high labour costs and time requirements, it is impractical for larger-scale crop fields. Apart from manual weeding, chemical (herbicide treatment) and mechanical weeding systems are also used for weed management. These techniques have drawbacks of their own: they are labour-intensive, take a long time, and depend primarily on the type of crop, the type of weed, and environmental factors such as weather and soil conditions. Hence, accurate and reliable weed detection and classification are crucial to mitigate crop losses. Effective weed detection allows the least possible herbicide treatment to be applied to the fields, thus lowering production costs and delivering quality products to consumers.

CNNs have strongly influenced the classification and identification of image-based patterns. CNN architectures widely used for weed detection in the literature include AlexNet, MobileNet, EfficientNet, ResNet, region-based convolutional neural network (R-CNN), VGG, and GoogleNet. The primary strength of these techniques lies in manual labelling of datasets whilst retaining classification. In this work, we use MobileNets and EfficientNets to build an efficient and robust model for the detection and classification of weed seedlings and plants.

1.2 Motivation and Contribution

To overcome the limitations of manual, chemical, and mechanical weeding and to avoid crop damage from weeds, it is important to automate the precise and timely detection and classification of weeds in crop fields. Many statistical feature-based [1–7] and deep learning-based [8–11] approaches for weed detection and classification have been proposed in the literature. However, limitations such as fluctuating illumination, occlusion from other plants, noisy images, colour and texture similarity amongst weeds and crops, difficulty in segmenting the weed-inclusion area out of the entire image, and performance decline due to incorrect labelling make these approaches inefficient for weed mitigation. Therefore, the objective of our work is to build an efficient and robust CNN-based approach for automated detection and classification of weeds in crop fields, so that chemical treatment (herbicides) can be limited to specific locations rather than applied to the whole cropland. The key contributions of this work are summarised below:

• Our work aims to provide a deep learning-based method for weed detection and classification in crop fields that is both resource- and cost-efficient. Experiments have been performed using five popular CNN-based architectures, namely MobileNet, MobileNetV2, and EfficientNets (B0–B2), all suited to low-powered devices, to tackle this problem efficiently and at a low cost.
• Experiments on three publicly available datasets, i.e. the Plant Seedling dataset [12], the Soybean dataset [13], and the DeepWeeds dataset [14], show the efficacy of the proposed work. The datasets were chosen to compare the models' performance in a number of environmental settings, namely laboratory setting, aerial imagery, and ground imagery.
• We build and deploy application prototypes that employ the lightweight quantised models (TFLite) for inference on edge and mobile devices, eliminating the need for specialised server architecture and improving latency.
• Comparison with a few state-of-the-art weed detection and classification algorithms shows that the proposed solution gives better performance with efficient resource utilisation.

The remainder of the paper is structured as follows: The related work is summarised in Sect. 2. Section 3 presents the proposed work. The details of the implementation setup, results, and comparison with existing approaches are described in Sect. 4. Conclusions and future directions are described in Sect. 5.

2 Related Work

Traditional machine learning-based image classification pipelines consist of image acquisition, image pre-processing, feature extraction, a machine learning classifier, and classifier evaluation metrics [15, 16]. Following this approach, researchers introduced techniques such as the Hough transform and clustering [1], shape features [17], discriminant analysis [2], texture features [3], and colour analysis [4]. Kodagoda et al. [5] used near-infrared image cues with texture and colour features to distinguish wheat from weed species. Sabzi et al. [6] introduced texture features based on the grey-level co-occurrence matrix (GLCM), spectral descriptors for texture, moment invariant features, and colour and shape features. Different classification algorithms such as support vector machines (SVMs) and random forest classifiers were utilised to distinguish between crops and weeds. However, these approaches to weed classification are challenged by factors such as variable illumination, occlusion from other plants, image noise, and similar colour and texture features of crops and weed plants. Hence, building a robust feature extractor and descriptor requires extensive domain knowledge.

In recent years, CNNs have emerged as efficient and effective algorithms for analysing and classifying data. The key to their success is feature learning, i.e. the process of extracting from the raw data the discriminatory features that are most appropriate for the task. Training deep learning models from scratch, however, is a very resource- and time-intensive task. Transfer learning methodologies allow knowledge transfer from models trained on benchmark datasets such as ImageNet [18] or MS COCO [19] to more specific tasks. They also allow the model's pre-trained parameters to be fine-tuned for the required task with minimal training data and resources [20].

Suh et al. [8] trained multiple CNNs such as ResNet-50 and VGG-19 with pre-trained ImageNet weights for weed detection and achieved an accuracy of 97% with VGG-19. Partel et al. [9] used deep learning vision models such as Faster R-CNN, YOLO-v3, and ResNets to develop an intelligent sprayer for weed control. Le et al. [7] proposed a novel model, named filtered local binary patterns with contour mask (k-FLBPCM), that outperformed CNN models with an accuracy of approximately 99% for classifying barley, canola, and wild radish. Trong et al. [10] proposed a multi-model ensemble with transfer learning. The authors used five pre-trained models trained independently, i.e. NASNet, ResNet, InceptionResNet, MobileNet, and VGGNet. Bayesian conditional probability and priority weight scoring were used to assign a score to the models. The limitations observed in their proposed work were either in segmenting the weed-inclusion area out of the entire image or in performance decline due to incorrect labelling.

3 Proposed Method

The proposed work aims to develop a robust and efficient CNN-based solution for the detection and classification of weeds in a crop field, so as to make it easier and more cost-friendly to apply weed control treatment only at specific weed-prone spots instead of the whole crop field.

3.1 Data Preprocessing

The images in all three datasets are variable-sized and have RGB channels with 8 bits per pixel per channel. The images are pre-processed by applying model-specific normalisation before augmentation. Deep learning models require a large amount of data to generalise well and avoid overfitting. To this end, real-time image augmentation is applied so that the model is fed a large amount of varied input data. Each batch of images is randomly augmented before it is fed into the model in every iteration, which increases the total amount of data that the model observes during training. Simple transformations such as horizontal flipping, colour space augmentations, and random cropping were applied to simulate different variations [21].
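The per-batch augmentation described above can be sketched as follows. This is a minimal, library-free illustration and not the pipeline used in the experiments: images are plain nested lists, the helper names (`horizontal_flip`, `random_crop`, `augment_batch`) and the flip probability of 0.5 are assumptions, and colour space augmentation is omitted for brevity.

```python
import random


def horizontal_flip(image):
    """Mirror an image stored as a list of pixel rows (H x W)."""
    return [row[::-1] for row in image]


def random_crop(image, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w window at a random position inside the image."""
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - crop_h)    # randint is inclusive at both ends
    left = rng.randint(0, width - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def augment_batch(batch, crop_h, crop_w, rng, flip_prob=0.5):
    """Draw a fresh random flip and crop for every image in the batch."""
    augmented = []
    for image in batch:
        if rng.random() < flip_prob:
            image = horizontal_flip(image)
        augmented.append(random_crop(image, crop_h, crop_w, rng))
    return augmented


# Toy 4 x 4 single-channel "image"; real inputs would be RGB arrays.
toy_image = [[r * 4 + c for c in range(4)] for r in range(4)]
rng = random.Random(0)
batch_out = augment_batch([toy_image, toy_image], crop_h=3, crop_w=3, rng=rng)
```

Because the random draws are repeated for every batch in every epoch, the model rarely sees exactly the same pixels twice, which is the effect the text attributes to real-time augmentation.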

3.2 Models Selection and Training

In our proposed work, we have used five well-known CNN architectures, namely MobileNet, MobileNetV2, and EfficientNets (B0–B2), which are well suited to devices with limited power and memory.

MobileNets MobileNet [22] is an efficient CNN model for mobile and embedded image processing applications. It is based on a streamlined architecture that uses depthwise separable convolutions to build lightweight and memory-efficient deep neural networks. Depthwise separable convolutions factorise a convolution into two parts: a depthwise convolution and a 1 × 1 pointwise convolution. This drastically reduces the number of model parameters compared to a conventional convolutional network of the same depth.

MobileNetV2 MobileNetV2 [23] is an improved architecture for image processing on mobile devices. This model extends MobileNet's depthwise separable convolutions with an inverted residual structure and linear bottlenecks. Residual blocks use expanded representations at the input and output layers and squeeze the representation in between using 1 × 1 convolutions. The inverted residual block, by contrast, uses thin bottleneck layers at the input and the output. Linear bottleneck refers to a linear activation in the last convolution of the residual block, which counters the loss of information caused by the bottlenecked channels.

EfficientNets EfficientNet [24] is another popular CNN architecture and scaling method that provides better accuracy and efficiency by uniformly scaling all dimensions of depth, width, and resolution using a compound coefficient. According to the authors, when $2^N$ times more computational resources are available, the network's depth can be scaled by $\alpha^N$, the width by $\beta^N$, and the image size by $\gamma^N$. The values of the constants $\alpha$, $\beta$, and $\gamma$ are obtained by a small grid search on the baseline model.
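The compound scaling rule can be illustrated with a few lines of arithmetic. The sketch below assumes the constants reported in the EfficientNet paper [24] for the B0 baseline (α = 1.2, β = 1.1, γ = 1.15, chosen so that α·β²·γ² ≈ 2); the function names are hypothetical, and the released B1 and B2 variants use rounded, hand-set coefficients, so the multipliers should be read as guidance rather than exact model definitions.

```python
# Constants quoted from the EfficientNet paper [24]; treated here as an assumption.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scale(n):
    """Return (depth, width, resolution) multipliers for compound coefficient n."""
    return ALPHA ** n, BETA ** n, GAMMA ** n


def flops_factor(n):
    """Approximate growth in compute: depth * width^2 * resolution^2."""
    depth, width, resolution = compound_scale(n)
    return depth * width ** 2 * resolution ** 2


for n in range(3):
    d, w, r = compound_scale(n)
    print(f"N={n}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}, "
          f"compute x{flops_factor(n):.2f} (target 2^N = {2 ** n})")
```

With these constants, each unit increase of N roughly doubles the compute budget, which matches the $2^N$ statement in the text.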
EfficientNet also introduced a family of uniformly scaled CNNs (B0–B7), with EfficientNet-B7 achieving state-of-the-art accuracy on multiple benchmark datasets whilst being 8 times smaller and 6 times faster at inference than the best existing CNN at the time. The baseline model EfficientNet-B0 extends MobileNetV2's depthwise separable convolutions and inverted residual blocks with squeeze and excitation blocks. The block diagram of EfficientNet-B0 is shown in Fig. 1.
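The parameter saving from depthwise separable convolutions, on which all five models rely, can be checked with a short calculation. The layer dimensions below (a 3 × 3 kernel with 256 input and 256 output channels) are hypothetical and chosen only for illustration; biases are ignored.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard kernel x kernel convolution (bias omitted)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = kernel * kernel * c_in   # one kernel x kernel filter per input channel
    pointwise = c_in * c_out             # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


kernel, c_in, c_out = 3, 256, 256        # hypothetical layer
standard = standard_conv_params(kernel, c_in, c_out)
separable = depthwise_separable_params(kernel, c_in, c_out)
ratio = separable / standard             # equals 1 / c_out + 1 / kernel**2
print(standard, separable, round(ratio, 3))
```

For this layer the separable form needs 67,840 weights against 589,824 for the standard convolution, i.e. about 11.5% of the parameters.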


Fig. 1 EfficientNet-B0 encoder architecture

3.3 Domain Specific Learning

Initially, the base model (encoder layers) is loaded with pre-trained ImageNet weights and frozen. The input image size for both MobileNet and MobileNetV2 is (224, 224, 3). The image sizes for the EfficientNets are scaled up incrementally with increasing model size: the input image sizes of EfficientNet-B0, EfficientNet-B1, and EfficientNet-B2 are (224, 224, 3), (240, 240, 3), and (260, 260, 3), respectively. Global average pooling is applied to the last convolution layer of the encoder. The output from the encoder layers is a 2D tensor of size (1, 1024) for the MobileNets and (1, 1280) for the EfficientNets. The classifier consists of two fully-connected layers of 256 and 128 neurons, respectively, with ReLU activations. A dropout layer with a rate of 0.4 is also employed before the final softmax output layer.
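To give a feel for how small the trainable classifier is while the encoder stays frozen, the sketch below counts the weights of the head described above. The feature sizes (1024 and 1280) and hidden layer sizes (256 and 128) are taken from the text; the choice of 12 output classes corresponds to the Plant Seedling dataset, and the function names are hypothetical. Dropout adds no parameters.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully-connected layer."""
    return n_in * n_out + n_out


def head_params(feature_dim, num_classes, hidden=(256, 128)):
    """Trainable parameters of the head: feature_dim -> 256 -> 128 -> num_classes."""
    sizes = [feature_dim, *hidden, num_classes]
    return sum(dense_params(a, b) for a, b in zip(sizes, sizes[1:]))


NUM_CLASSES = 12                                  # Plant Seedling species count
mobilenet_head = head_params(1024, NUM_CLASSES)   # encoder output size stated in the text
efficientnet_head = head_params(1280, NUM_CLASSES)
print(mobilenet_head, efficientnet_head)
```

Under these assumptions the head has 296,844 parameters on a 1024-dimensional feature vector and 362,380 on a 1280-dimensional one, so training only the head is inexpensive compared with updating the full encoder.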

4 Experimental Results and Analysis

In this section, we present the details of the experimental design and results with respect to different parameters, as well as a comparison with the existing approaches.

4.1 Implementation Setup

The MobileNets and EfficientNets (B0–B2) were fine-tuned for the problem statement. The complete block diagram for the training setup is illustrated in Fig. 2. The models were run many times with different settings; the best results were achieved using categorical cross-entropy as the loss function and the Adam optimizer with the following hyperparameters:

$\eta = 0.001, \quad \beta_1 = 0.9, \quad \beta_2 = 0.999, \quad \epsilon = 10^{-7}$

The learning rate is reduced by a factor of 0.1 if the validation loss plateaus. A dropout rate of 0.4 and early stopping are used to regularise the model. The encoder layers are incrementally unfrozen for training when the validation categorical accuracy falls below the threshold value of 95%. For the proposed work, the above-stated datasets were split into training and testing sets with a ratio of 9:1. The training set is further divided into training and validation sets using the same 9:1 ratio. Stratified sampling is used so that the test set is more representative of the actual distribution of classes in the original dataset.

Fig. 2 Block diagram for training setup
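The nested 9:1 split can be sketched as below. This is an illustrative, standard-library implementation rather than the code used in the experiments: the function name `stratified_split`, the toy class sizes, and the use of stratification for the validation split as well as the test split are assumptions.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, holdout_fraction, rng):
    """Split sample positions so every class gives ~holdout_fraction to the hold-out set."""
    by_class = defaultdict(list)
    for position, label in enumerate(labels):
        by_class[label].append(position)
    kept, held_out = [], []
    for positions in by_class.values():
        rng.shuffle(positions)
        n_holdout = round(len(positions) * holdout_fraction)
        held_out.extend(positions[:n_holdout])
        kept.extend(positions[n_holdout:])
    return kept, held_out


# Toy, imbalanced label list standing in for a real dataset.
labels = ["soil"] * 300 + ["soybean"] * 700
rng = random.Random(0)

# First split: 90% train+validation, 10% test.
train_val, test = stratified_split(labels, 0.1, rng)

# Second split: 90% train, 10% validation, taken from the train+validation part.
train_val_labels = [labels[i] for i in train_val]
train_pos, val_pos = stratified_split(train_val_labels, 0.1, rng)
train = [train_val[i] for i in train_pos]
val = [train_val[i] for i in val_pos]

print(len(train), len(val), len(test))
print(Counter(labels[i] for i in test))
```

Applying the 9:1 ratio twice leaves 81% of the samples for training, 9% for validation, and 10% for testing, and each subset keeps the 3:7 class ratio of the toy data.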

4.2 Results and Discussion

To evaluate the performance of the proposed work, we tested it on all three datasets. This section presents the results for our five CNN models trained and evaluated on the three datasets (Fig. 3).

Plant Seedling Dataset: A Public Image Database for Benchmark of Plant Seedling Classification Algorithms [12] The dataset comprises 5539 images of approximately 960 unique plant seedlings from 12 species such as black grass, common chickweed, maize, and common wheat. The images were captured at several different growth phases. The RGB images are variable-sized and are in PNG format with an approximate physical resolution of 10 pixels per mm.


Fig. 3 Sample instances from the 3 datasets: (a) Plant Seedling dataset, (b) Soybean dataset, (c) DeepWeeds dataset

Table 1 Plant Seedling dataset: classification report (precision, recall, and F1-score are macro averages)

Model name         Precision   Recall   F1-score   Accuracy
MobileNet          0.9823      0.9819   0.9820     0.9819
MobileNetV2        0.9848      0.9847   0.9847     0.9847
EfficientNet-B0    0.9749      0.9747   0.9744     0.9747
EfficientNet-B1    0.9884      0.9883   0.9881     0.9883
EfficientNet-B2*   0.9917      0.9861   0.9885     0.9892

Note: The best performing model is marked with an asterisk (*)
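The macro-averaged precision, recall, and F1-score reported in the classification reports can be computed from a confusion matrix as sketched below. The function name and the toy two-class matrix are hypothetical, and the sketch assumes the common convention in which macro F1 is the unweighted mean of the per-class F1-scores, which is why it need not equal the harmonic mean of macro precision and macro recall.

```python
def macro_metrics(confusion):
    """confusion[i][j] = number of samples of true class i predicted as class j."""
    n = len(confusion)
    precisions, recalls, f1_scores = [], [], []
    for k in range(n):
        true_pos = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))   # column sum
        actual_k = sum(confusion[k])                           # row sum
        p = true_pos / predicted_k if predicted_k else 0.0
        r = true_pos / actual_k if actual_k else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1_scores.append(f1)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n))
    return {
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1_scores) / n,
        "accuracy": correct / total,
    }


toy_confusion = [[8, 2],   # true class 0: 8 correct, 2 predicted as class 1
                 [1, 9]]   # true class 1: 1 predicted as class 0, 9 correct
metrics = macro_metrics(toy_confusion)
print({name: round(value, 4) for name, value in metrics.items()})
```

Macro averaging gives every class the same weight regardless of its size, which matters for imbalanced datasets such as DeepWeeds, where the negative class accounts for about half of the images.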

MobileNet and MobileNetV2 performed comparably, with MobileNetV2 having a slightly higher accuracy of 98.47% compared with 98.19% for MobileNet, as well as better recall and precision values in some classes. All the EfficientNet models (B0–B2) had similar accuracy levels to the MobileNet models, with EfficientNet-B2 being the most accurate at 98.92%. EfficientNets had slightly better precision, recall, and F1-score with increasing model capacity, model size, and input features (image size). Hence, for the three EfficientNet models, there is a size-performance trade-off to be made. It was also observed that the EfficientNet models converged more quickly than the MobileNets. The classification report comprising the results of all models evaluated on the Plant Seedling dataset is given in Table 1.

Soybean Dataset: Weed Detection in Soybean Crops Using ConvNets [13] A total of 400 images that showed the presence of weeds were chosen from the set of images captured by unmanned aerial vehicles (UAVs). These images were segmented using the simple linear iterative clustering (SLIC) algorithm in the Pynovisão software. There are 15,336 segments manually annotated into 4 classes: 3249 segments of soil, 7376 segments of soybean, 3520 segments of grass, and 1191 segments of broadleaf weeds. All models performed equally well, with test accuracy greater than 99% and EfficientNet-B0 achieving the highest accuracy of 99.97%. This can be attributed to the large number of training instances available (> 15,000), higher quality images, and fewer target classes.

Table 2 Soybean dataset: classification report (precision, recall, and F1-score are macro averages)

Model name         Precision   Recall   F1-score   Accuracy
MobileNet          0.9988      0.9993   0.9990     0.9993
MobileNetV2        0.9987      0.9965   0.9976     0.9987
EfficientNet-B0*   0.9998      0.9989   0.9994     0.9997
EfficientNet-B1    0.9983      0.9968   0.9976     0.9984
EfficientNet-B2    0.9993      0.9968   0.9981     0.9990

Note: The best performing model is marked with an asterisk (*)

All the models converged sufficiently within 10 epochs. There was no need to fine-tune the models, as the pre-trained encoder layers performed well for the task. The classification report given in Table 2 shows the results of all models evaluated on the Soybean dataset.

DeepWeeds: A Multiclass Weed Species Image Dataset for Deep Learning [14] The dataset consists of 17,509 images; 8403 are from 8 different weed species, and the remaining 9106 images are negatives. The dataset aims to mimic images captured under realistic environmental conditions so as to reflect actual model performance. Images usually contain multiple plants, with the target weed plant often occluded by grass or leaves. The images from the negative category consist of a collection of plants, bushes, and soil/ground.

For the DeepWeeds dataset, MobileNets struggled to generalise compared to EfficientNets, with lower precision, recall, and F1-scores. EfficientNets generalised well, with higher accuracy, precision, recall, and F1-scores than both MobileNets. This can be attributed to the squeeze and excitation blocks [25] used in addition to the MBConv blocks, which allowed the model to select appropriate image feature maps adaptively. Because the images contain multiple plants with occlusions, selective attention is necessary for good generalisation. MobileNet and MobileNetV2 performed similarly in terms of accuracy, and MobileNet had slightly better precision than MobileNetV2 for some target classes. There was a marginal difference between EfficientNet-B0, EfficientNet-B1, and EfficientNet-B2, with EfficientNet-B2 performing slightly better in terms of accuracy and precision due to its larger model capacity. The classification report with the results of all the models evaluated on the DeepWeeds dataset is shown in Table 3.

Table 3 DeepWeeds: classification report (precision, recall, and F1-score are macro averages)

Model name         Precision   Recall   F1-score   Accuracy
MobileNet          0.9369      0.9233   0.9290     0.9440
MobileNetV2        0.9283      0.9310   0.9296     0.9452
EfficientNet-B0    0.9512      0.9426   0.9463     0.9583
EfficientNet-B1    0.9439      0.9391   0.9412     0.9555
EfficientNet-B2*   0.9569      0.9487   0.9525     0.9640

Note: The best performing model is marked with an asterisk (*)

The training and validation accuracy (a) and the training and validation loss (b) for the best performing EfficientNet-B2 on the Plant Seedling dataset are shown in Fig. 4. Figures 5 and 6 show the corresponding accuracy (a) and loss (b) plots for the best performing EfficientNet-B0 on the Soybean dataset and EfficientNet-B2 on the DeepWeeds dataset, respectively.

Fig. 4 Plant Seedling dataset, EfficientNet-B2: (a) training and validation accuracy, (b) training and validation loss

Fig. 5 Soybean dataset, EfficientNet-B0: (a) training and validation accuracy, (b) training and validation loss

Fig. 6 DeepWeeds dataset, EfficientNet-B2: (a) training and validation accuracy, (b) training and validation loss


4.3 Observations and Remarks

Through rigorous experimentation with all five models on all three datasets, the following conclusions are drawn. EfficientNets performed well on all three datasets, particularly on the DeepWeeds dataset, where they matched and even outperformed the ResNet-50 trained by the dataset creators, which achieved an average classification accuracy of 95.7%. The squeeze and excitation blocks, with minimal computational overhead, allowed the model to select appropriate feature maps for better generalisation. MobileNets had lower average precision and recall than EfficientNets, though they have the edge in model size and inference time on CPU. MobileNetV2 was the smallest model, and MobileNet had the fastest inference time. If smaller model size and faster inference are important, then MobileNet works well, at the cost of slightly lower precision and recall. Given the minimal difference in performance metrics between the various EfficientNets, EfficientNet-B0 seems to offer the right trade-off between performance, model size, and inference time.
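The channel-selection behaviour attributed to the squeeze and excitation blocks can be sketched in plain Python. The example follows the general squeeze, excite, and scale steps of [25] with a ReLU bottleneck and sigmoid gates; the function name, the toy weights, and the omission of biases are assumptions, and the EfficientNet implementation may differ in details such as the activation function.

```python
import math


def squeeze_excite(feature_maps, w_reduce, w_expand):
    """Re-weight channels of a feature map.

    feature_maps: list of C channels, each an H x W list of rows.
    w_reduce:     R x C weight matrix of the bottleneck layer (bias omitted).
    w_expand:     C x R weight matrix of the gating layer (bias omitted).
    """
    # Squeeze: global average pooling gives one number per channel.
    squeezed = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_maps]
    # Excite: bottleneck layer with ReLU, then one sigmoid gate per channel.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w_reduce]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w_expand]
    # Scale: multiply every value of a channel by that channel's gate.
    scaled = [[[gate * value for value in row] for row in ch]
              for gate, ch in zip(gates, feature_maps)]
    return scaled, gates


# Two toy 2 x 2 channels and hand-picked weights (hypothetical values).
channels = [[[1.0, 1.0], [1.0, 1.0]],
            [[3.0, 3.0], [3.0, 3.0]]]
w_reduce = [[0.5, 0.5]]          # 2 channels -> 1 hidden unit
w_expand = [[1.0], [-1.0]]       # 1 hidden unit -> 2 gates
scaled, gates = squeeze_excite(channels, w_reduce, w_expand)
print([round(g, 4) for g in gates])
```

With these toy weights the first channel is largely kept and the second is strongly attenuated, which illustrates how learned gates let the network emphasise informative feature maps in cluttered scenes at the cost of only two small dense layers per block.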

4.4 Comparison with Existing Approaches

Table 4 compares the performance metrics of the proposed work with those of existing approaches [10, 14, 26, 27] for weed detection and classification. It shows that the proposed EfficientNet-based [24] models perform better than many of them, with efficient resource utilisation, small model size, and the least amount of overfitting. We therefore consider our approach to be superior in this domain.

5 Conclusion

In any crop production system, weed control is crucial for a high yield. Practical and sustainable weed control in crops requires efficient and robust weed detection and classification frameworks. Our proposed work performed better than other existing approaches in terms of statistical performance metrics and computational resources. EfficientNets (B0–B2) generalise effectively across the three highly diverse datasets, and transfer learning allowed for faster training and iteration. Additionally, the proposed models outperform other existing techniques in terms of computation and memory efficiency, thus allowing their deployment on low-powered devices and in remote locations with no access to fast networking or powerful computational systems.

Table 4 Performance comparison with existing approaches

Author(s)                   Proposed technique            Dataset          Accuracy   Precision   Recall   F1-score   Model size (MB)
Chavan and Nandedkar [26]   AgroAVNET (VGGNet, AlexNet)   Plant Seedling   0.9821     –           –        –          > 100
Nkemelu et al. [27]         CNN background segmentation   Plant Seedling   0.9260     –           –        –          –
Trong et al. [10]           Late fusion DNNs              Plant Seedling   0.9731     –           –        –          > 100
Olsen et al. [14]           ResNet-50                     DeepWeeds        0.9570     –           –        –          > 100
Proposed method             EfficientNet-B2               Plant Seedling   0.9892     0.9917      0.9861   0.9885     ∼ 40
Proposed method             EfficientNet-B0               Soybean          0.9997     0.9998      0.9989   0.9994     ∼ 30
Proposed method             EfficientNet-B2               DeepWeeds        0.9640     0.9569      0.9487   0.9525     ∼ 40



Meta-heuristics for the Single-Channel PMU Placement Problem Considering Zero-Injection-Buses

K. R. S. V. V. P. P. Narasa Reddy and Anjeneya Swami Kare

Abstract Phasor measurement units (PMUs) are time-synchronized devices that provide high-speed, real-time measurements of the power stations in a power grid. These measurements are used to monitor the grid: if supply and demand are not matched, power outages can follow, so power grids are monitored with PMUs in real time with high speed and precision. PMU placement matters because PMUs are expensive and because a PMU can measure the adjacent power stations as well as the station where it is placed, so PMUs need not be installed at every power station. The presence of zero-injection-buses (ZIBs) in a power grid further reduces the number of PMUs to be placed. Hence, optimizing the number of PMUs required to completely observe a power grid is a well-known problem. The common PMU placement problems ensure complete network (power grid) observability with a minimum number of PMU installations using multi-channel PMUs. Most of them disregard the existence of ZIBs, the enhancement of measurement redundancy and low-cost single-channel PMUs. Single-channel PMUs are more reliable than multi-channel PMUs. An inherent problem with multi-channel PMUs is that a PMU cannot be built with unlimited channel capacity; the number of channels a PMU can have is limited. In the case of a PMU failure or line outage, a grid with multi-channel PMUs is affected much more than a grid with single-channel PMUs. In this paper, we consider the single-channel PMU placement problem considering ZIBs. We propose multiple heuristics and meta-heuristics for the problem: two greedy heuristics and heuristics based on particle swarm optimization and a genetic algorithm. The proposed heuristics are implemented and tested on real-world power grid data sets.

Keywords Phasor measurement unit · Power dominating set · Power edge set · Heuristics

K. R. S. V. V. P. P. Narasa Reddy · A. S. Kare (B)
School of Computer and Information Sciences, University of Hyderabad, Hyderabad, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 686, https://doi.org/10.1007/978-981-99-2100-3_14


1 Introduction

In our day-to-day life, we depend heavily on electricity, which we get from power stations. A network of such power stations is called a power grid. The recent increase in consumption demand and the mix of different power sources in the power grid make it operate at its limits, which exposes the grid to various uncertainties. Uncertainties in the demand and supply of electricity can also put considerable strain on the power grid and lead to power outages. Hence, it is very important to monitor the power grid in real time so that demand and supply can be matched and the other uncertainties in the grid can be handled.

Power grids are monitored with devices called phasor measurement units (PMUs), which generate real-time values for the power stations in a power grid. A PMU is a time-synchronized device with high speed and precision that can measure the magnitude and phase angle of the current and voltage at a power station. These measurements are used for monitoring a power grid in real time. A PMU can calculate the current and voltage phasors of the power station where it is placed and, depending on its channel capacity, of the adjacent power stations. This ability, together with the high cost of PMUs, makes the placement of PMUs a very important optimization problem.

The power stations and the transmission lines between them can be viewed as a connected undirected graph in which the power stations are the vertices and the transmission lines are the edges. For example, in Fig. 1, the power stations are the nodes (v0, v1, v2, v3, v4, v5) and the transmission lines between them are the edges (v0 ↔ v2, v2 ↔ v1, v2 ↔ v3, v2 ↔ v4, v1 ↔ v3, v4 ↔ v3, v3 ↔ v5). In this paper, vertex, node, bus and power station are used interchangeably.

There are two types of PMUs based on their channel capacity:

• Multi-channel PMU: A multi-channel PMU can measure the station where it is placed and multiple adjacent power stations, depending on its channel capacity. That is, an n-channel PMU can observe n + 1 power stations.

Fig. 1 Example power grid with six power stations

• Single-channel PMU: A single-channel PMU can measure the station where it is placed and only one adjacent power station. That is, a single-channel PMU can observe two power stations.

There are two prominent PMU placement problems:

• PDS problem [10]: The POWER DOMINATING SET (PDS) problem is a PMU placement problem that uses multi-channel PMUs.
• PES problem [20]: The POWER EDGE SET (PES) problem is a PMU placement problem that uses single-channel PMUs.

Both the PDS and PES problems are NP-hard [10, 20]. Both problems follow basic observation rules. In the PDS problem, the rules are defined in terms of nodes (power stations) because multi-channel PMUs are placed on nodes. The rules for the PDS problem are [27]:

1. If a PMU is placed on a power station, then that station and all its adjacent power stations are observed.
2. If exactly one power station in the set formed by a ZIB and its adjacent stations is unobserved, then that station is also observed.

In the PES problem, the rules are defined in terms of edges (transmission lines) because single-channel PMUs are placed on edges. The rules for the PES problem are [20]:

1. If a PMU is placed on an edge, then both nodes (power stations) of the edge are observed.
2. If a node is a ZIB and exactly one node in the set formed by the ZIB and its adjacent nodes is unobserved, then that node is also observed.

The second observation rule in both problems can be further divided into several more specific rules, as shown by Abiri et al. [1]. Real-world power grids contain both ZIB and non-ZIB power stations. The second observation rule applies to the ZIB power stations and their neighborhoods, and it can significantly reduce the number of PMUs required to observe a power grid with ZIBs.
For example, consider the power grid shown in Fig. 2 and place some single-channel PMUs on it.

• If all nodes in Fig. 2 are non-ZIBs, then a minimum of three single-channel PMUs is needed to completely observe the grid. The PMUs are placed on edges v0 ↔ v1, v2 ↔ v3 and v4 ↔ v5. By rule 1, all the nodes are observed because each is an end node of an edge carrying a PMU.
• If the nodes v1 and v3 in Fig. 2 are ZIBs, then a minimum of two single-channel PMUs is needed to completely observe the grid. The PMUs are placed on edges v1 ↔ v2 and v4 ↔ v5. The nodes v1, v2, v4 and v5 are observed by rule 1. Applying rule 2 at node v1 observes node v0. Applying rule 2 at node v3 then observes node v3 itself. Hence, all the nodes are observed.

Fig. 2 Example power grid with some ZIB power stations

• If the nodes v0, v1, v2 and v5 in Fig. 2 are ZIBs, then a minimum of one single-channel PMU is needed to completely observe the grid. The PMU is placed on edge v0 ↔ v1. The nodes v0 and v1 are observed by rule 1. Applying rule 2 at nodes v1 and v0 observes nodes v2 and v3, respectively. Applying rule 2 at node v2 then observes node v4, and applying rule 2 at node v5 observes node v5 itself. Hence, all the nodes are observed.

From these examples, we can conclude that the number of PMUs required for complete observability of a power grid can be reduced as the number of ZIBs increases. In this paper, we study the single-channel PMU placement problem on power grids containing both ZIBs and non-ZIBs using multiple heuristics: two greedy approaches, a particle swarm optimization and a genetic algorithm. All the implemented heuristics are tested on real-world IEEE power grid data sets.
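The two PES observation rules can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function names are hypothetical, the graph is the example grid of Fig. 1 (whose edge list is given in the text), and the choice of v2 and v3 as ZIBs is an assumption made purely for the demonstration.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[int, int]

# Edges of the example grid in Fig. 1 (node i stands for v_i).
FIG1_EDGES = [(0, 2), (2, 1), (2, 3), (2, 4), (1, 3), (4, 3), (3, 5)]


def build_adjacency(edges: Iterable[Edge]) -> Dict[int, Set[int]]:
    """Return the neighbour set of every node of an undirected graph."""
    adj: Dict[int, Set[int]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def observed_nodes(edges: Iterable[Edge], pmu_edges: Iterable[Edge],
                   zibs: Set[int]) -> Set[int]:
    """Apply PES rule 1 once, then PES rule 2 until nothing changes."""
    adj = build_adjacency(edges)
    observed: Set[int] = set()
    # Rule 1: both end nodes of an edge carrying a PMU are observed.
    for u, v in pmu_edges:
        observed.update((u, v))
    # Rule 2: if exactly one node among a ZIB and its neighbours is
    # unobserved, that node becomes observed as well.
    changed = True
    while changed:
        changed = False
        for z in zibs:
            missing = (adj.get(z, set()) | {z}) - observed
            if len(missing) == 1:
                observed |= missing
                changed = True
    return observed


# Hypothetical placement: PMUs on v0-v2 and v1-v3.
pmus = [(0, 2), (1, 3)]
print(sorted(observed_nodes(FIG1_EDGES, pmus, zibs={2, 3})))  # [0, 1, 2, 3, 4, 5]
print(sorted(observed_nodes(FIG1_EDGES, pmus, zibs=set())))   # [0, 1, 2, 3]
```

With the assumed ZIBs, two PMUs observe all six stations; without ZIBs the same two PMUs leave v4 and v5 unobserved, which mirrors the effect described in the Fig. 2 examples.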

2 Related Work

The PMU placement problem has been studied for more than thirty years. Most solutions focus on multi-channel PMUs and do not consider low-cost single-channel PMUs, and those that do consider single-channel PMUs assume that all power stations are ZIBs. In general, the PDS problem is the PMU placement problem that considers both ZIB and non-ZIB power stations with n-channel PMUs. The variants of the PDS problem include the PES problem [13, 20], the multi-stage PMU placement problem [8, 23] and the multi-channel PMU placement problem. These problems have been solved with both mathematical and evolutionary algorithms.

In 1998, Haynes et al. [11] first proposed the PDS problem as a variation of the dominating set problem in graph theory. Sun and Fan [22] proposed an integer programming formulation for the probabilistic power dominating set problem considering ZIBs, line outages and PMU failures. Dean et al. [7] suggested a method for determining the minimum cardinality of a power dominating set for hypercubes. Liao and Lee [15] proposed a linear-time algorithm for finding the minimum cardinality of a power dominating set for proper circular-arc graphs. Many other methods have been proposed for finding the power domination number of various graphs.

The PES problem is the least studied of the variants of the PDS problem. Poirion et al. [20] introduced the approach of placing PMUs on edges, called it the PES problem and formulated it using integer linear programming. Kare and Valluru [13] proposed multiple heuristics for the PES problem. Darties et al. [6] analyzed the complexity of and lower bounds for the PES problem and proposed a polynomial-time approach for power grids with bounded tree-width, such as series-parallel or outerplanar power grids. The computational complexity of the PES problem is studied in [5, 24].

Of all the variants of the PDS problem, the multi-channel PMU placement problem is the most studied. Baldwin et al. [2] proposed a spanning tree-based heuristic and a simulated annealing approach for the multi-channel PMU placement problem. Marin et al. [17] proposed genetic algorithm-based heuristics. Xu and Abur [25] proposed an integer linear programming based approach using the TOMLAB optimization toolkit. Hurtgen and Maun [12] proposed iterated local search heuristics and other heuristics. Gou [9] proposed an integer linear programming based approach considering ZIBs. Peng et al. [19] proposed a priority list-based heuristic and a tabu search-based heuristic. Chakrabarti and Kyriakides [3] proposed an exhaustive search approach considering single branch outages. Manousakis et al. [16] proposed several meta-heuristics and mathematical programming methods for the multi-channel PMU placement problem. Mazhari et al. [18] proposed a multi-objective PMU placement method for electric grids considering various contingencies. Abiri et al. [1] formulated the topological observation rules considering channel limitation and ZIBs and also proposed a solution for complete observability considering PMU loss and line outage cases. Shafiullah et al. [21] proposed an optimal particle swarm optimization algorithm considering the maximum measurement redundancy. Yang et al. [26] formulated outage detection and proposed a genetic algorithm. The problem has thus developed into different variants with different types of solutions.

The two most common approaches are:

• Numerical approaches: In numerical methods, the grid is observable if the rank of the associated measurement gain or Jacobian matrix equals the number of unknown states [4].
• Topological approaches: Topological methods are graph-based, and the grid is observed by the construction of a spanning tree.

Calculating the rank of the Jacobian matrix in numerical methods is computationally expensive [14] and error-prone in cases such as the occurrence of a singular matrix, whereas topological methods are faster and do not suffer from such errors. So, in this paper, we use topological approaches to implement our heuristics. The formulation defined by Shafiullah et al. [21] for the optimization of the PMU placement problem with N power stations is used by the various heuristics in this paper.


2.1 Observability Rules Considering ZIBs

The observability rules for topological methods in the presence of ZIBs, given by Abiri et al. [1], are summarized below.

1. If a PMU is placed on a power station v, then the power station v and all its adjacent power stations are observed.
2. The observability of a set consisting of a single ZIB and its adjacent power stations is described as follows:

\[ \sum_{i=1}^{n} f_i \ge n - 1 \]

where n is the number of power stations in the set and f_i is a binary value indicating the observability of power station i.
3. The observability of a set formed by m ZIBs and their adjacent power stations, where the ZIBs are connected directly to each other or through a single non-ZIB, is described as follows:

\[ \sum_{i=1}^{n} f_i \ge n - m \]

where n is the number of power stations in the set and f_i is a binary value indicating the observability of power station i.
4. If a power station is present in more than one set of ZIBs, then the observability f_i of that power station remains unchanged for one set of ZIBs and is taken as zero for the other sets of ZIBs.
5. The observability f_i remains unchanged if a power station is not adjacent to any ZIB.

Rules (2) to (4) are applied iteratively until no new power station can be observed.
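As a small worked instance of rules 2 and 3, assume a hypothetical ZIB with three adjacent power stations, so that n = 4, and suppose that three of the four stations are already observed. The numbers are illustrative assumptions and are not taken from the paper's data sets.

```latex
\[
n = 4,\qquad f_1 = f_2 = f_3 = 1,\quad f_4 = 0
\;\Longrightarrow\;
\sum_{i=1}^{4} f_i = 3 \;\ge\; n - 1 = 3 .
\]
% Rule 3 with two directly connected ZIBs (m = 2) whose set has n = 6 stations:
\[
\sum_{i=1}^{6} f_i \;\ge\; n - m = 4 .
\]
```

Read this way, the bound in rule 2 holds when at most one station of the set is unobserved, and the bound in rule 3 holds when at most m stations are unobserved.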

3 Heuristics

We propose four heuristics for the single-channel PMU placement problem considering ZIBs: two greedy approaches, particle swarm optimization and a genetic algorithm. A fitness function is developed to improve the heuristics over their iterations. (Due to space limitations, the pseudo-code of the algorithms is omitted.) The fitness function determines the cost of a solution. It uses the rules in Sect. 2.1 to find out how many power stations can be observed directly or indirectly through ZIBs. Its input is the power grid G and a binary vector S over the edges indicating whether a PMU is placed on each edge: 1 indicates the presence of a PMU and 0 indicates its absence.

The heuristics in this paper assume that single-channel PMUs are used to observe the power grid, and therefore they provide solutions in terms of the edges of the grid, i.e., transmission lines. Because a single-channel PMU can observe only two stations and each transmission line has two end power stations, the single-channel PMUs are placed on the transmission lines of the power grid.
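Since the paper omits its pseudo-code, the following is only one possible shape for such a fitness function. It takes the edge list of G, the ZIB set and the binary vector S, and returns a cost. The cost formula (number of PMUs plus a large penalty per unobserved station), the penalty value, and the use of the simpler single-ZIB rule for propagation are all assumptions made for this sketch.

```python
from typing import Dict, Sequence, Set, Tuple

Edge = Tuple[int, int]


def fitness(edges: Sequence[Edge], zibs: Set[int], s: Sequence[int],
            penalty: int = 1000) -> int:
    """Cost of placement s: PMU count plus a penalty per unobserved station.

    s[k] == 1 means a single-channel PMU sits on edges[k]. The penalty
    weight is a hypothetical choice, not a value from the paper.
    """
    nodes = {v for edge in edges for v in edge}
    adj: Dict[int, Set[int]] = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    # Direct observation: end stations of every edge that carries a PMU.
    observed: Set[int] = set()
    for bit, (u, v) in zip(s, edges):
        if bit == 1:
            observed.update((u, v))

    # Indirect observation through ZIBs, repeated until a fixed point.
    changed = True
    while changed:
        changed = False
        for z in zibs:
            missing = (adj[z] | {z}) - observed
            if len(missing) == 1:
                observed |= missing
                changed = True

    unobserved = len(nodes - observed)
    return sum(s) + penalty * unobserved


# Example on the Fig. 1 grid with hypothetical ZIBs v2 and v3.
EDGES = [(0, 2), (2, 1), (2, 3), (2, 4), (1, 3), (4, 3), (3, 5)]
S = [1, 0, 0, 0, 1, 0, 0]            # PMUs on v0-v2 and v1-v3
print(fitness(EDGES, {2, 3}, S))     # 2
print(fitness(EDGES, set(), S))      # 2002
```

A large penalty makes any placement that leaves a station unobserved cost more than any fully observing placement, so a minimizer is pushed toward complete observability first and few PMUs second.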

3.1 Greedy Heuristic 1

We propose a simple greedy heuristic for the single-channel PMU placement problem. The heuristic iterates over the edges to find an edge whose two power stations are both unobserved. A PMU is placed on that edge, thereby observing both of its power stations, and the topological rules are then applied through the fitness function to mark any further stations that become observed. After iterating over all the edges, the heuristic iterates over them again to find edges where one or both power stations are still unobserved and places a PMU on each such edge, thereby observing those power stations.
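The two passes described above might look as follows in Python. This is a sketch under assumptions: edges are visited in the order given, the ZIB propagation uses the simple single-ZIB rule, and the helper names are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

Edge = Tuple[int, int]


def _propagate(adj: Dict[int, Set[int]], zibs: Set[int],
               observed: Set[int]) -> None:
    """Extend `observed` in place with the ZIB rule until nothing changes."""
    changed = True
    while changed:
        changed = False
        for z in zibs:
            missing = (adj[z] | {z}) - observed
            if len(missing) == 1:
                observed |= missing
                changed = True


def greedy_1(edges: Sequence[Edge], zibs: Set[int]) -> List[Edge]:
    """Return the edges chosen for PMUs by the two-pass greedy heuristic."""
    adj: Dict[int, Set[int]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    observed: Set[int] = set()
    placed: List[Edge] = []

    # Pass 1: only edges whose two end stations are both unobserved.
    for u, v in edges:
        if u not in observed and v not in observed:
            placed.append((u, v))
            observed.update((u, v))
            _propagate(adj, zibs, observed)

    # Pass 2: edges that still have at least one unobserved end station.
    for u, v in edges:
        if u not in observed or v not in observed:
            placed.append((u, v))
            observed.update((u, v))
            _propagate(adj, zibs, observed)

    return placed


EDGES = [(0, 2), (2, 1), (2, 3), (2, 4), (1, 3), (4, 3), (3, 5)]
print(greedy_1(EDGES, set()))    # [(0, 2), (1, 3), (2, 4), (3, 5)]
print(greedy_1(EDGES, {2, 3}))   # [(0, 2), (1, 3)]
```

On the Fig. 1 grid the sketch places four PMUs without ZIBs and two when v2 and v3 are assumed to be ZIBs. The result depends on the edge order, which the paper does not specify.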

3.2 Greedy Heuristic 2

The second greedy heuristic for the single-channel PMU placement problem is based on the degree of the power stations. It iterates over the power stations in ascending order of degree to find a power station that is unobserved. It then selects an unobserved adjacent power station of minimum degree; if there is no unobserved adjacent power station, it selects an adjacent power station of minimum degree. A PMU is placed on the edge between the two power stations, thereby observing both, and the topological rules are then applied through the fitness function to mark any further stations that become observed.
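A degree-ordered variant can be sketched in the same style. Breaking ties between stations of equal degree by the smaller node id is an assumption of this sketch, as is the simple single-ZIB propagation; the paper does not state either detail.

```python
from typing import Dict, List, Sequence, Set, Tuple

Edge = Tuple[int, int]


def greedy_2(edges: Sequence[Edge], zibs: Set[int]) -> List[Edge]:
    """Degree-based greedy placement of single-channel PMUs on edges."""
    adj: Dict[int, Set[int]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    def by_degree(node: int) -> Tuple[int, int]:
        # Sort key: degree first, node id as a deterministic tie-break.
        return (len(adj[node]), node)

    observed: Set[int] = set()
    placed: List[Edge] = []

    for v in sorted(adj, key=by_degree):
        if v in observed:
            continue
        # Prefer unobserved neighbours; fall back to all neighbours.
        unobserved_nbrs = [w for w in adj[v] if w not in observed]
        candidates = unobserved_nbrs if unobserved_nbrs else list(adj[v])
        w = min(candidates, key=by_degree)
        placed.append((v, w))
        observed.update((v, w))

        # Apply the ZIB rule until no further station becomes observed.
        changed = True
        while changed:
            changed = False
            for z in zibs:
                missing = (adj[z] | {z}) - observed
                if len(missing) == 1:
                    observed |= missing
                    changed = True

    return placed


EDGES = [(0, 2), (2, 1), (2, 3), (2, 4), (1, 3), (4, 3), (3, 5)]
print(greedy_2(EDGES, set()))    # [(0, 2), (5, 3), (1, 2), (4, 2)]
print(greedy_2(EDGES, {2, 3}))   # [(0, 2), (5, 3), (1, 2)]
```

Starting from low-degree stations handles leaves such as v0 and v5 first, since each of them can only be observed directly through its single incident edge.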

3.3 Particle Swarm Optimization

Particle swarm optimization (PSO) is a population-based optimization algorithm. We propose a constraint-factored PSO. The approach starts by initializing the population, where each unit of the population is called a particle and has a position and a velocity. Each particle has as many dimensions as there are edges. The particles are updated for a required number of iterations. In each iteration, every particle's velocity is updated based on its own best position and the best position in the population, along with some inertia of the particle, and the position of the particle is then updated with the new velocity. Best positions are determined by the fitness function. The velocity of a particle in a general PSO is the sum of its inertia, cognitive and social parts, as given below:

\[
\begin{aligned}
\mathrm{Vel}_i^{\,iter+1} &= \mathrm{Inertia} + \mathrm{Cognitive} + \mathrm{Social} \\
\mathrm{Inertia} &= W \, \mathrm{Vel}_i^{\,iter} \\
\mathrm{Cognitive} &= c_1 r_1 \left( P_{best(i)} - P_i^{\,iter} \right) \\
\mathrm{Social} &= c_2 r_2 \left( P_{bestGlobal} - P_i^{\,iter} \right) \\
P_i^{\,iter+1} &= P_i^{\,iter} + \mathrm{Vel}_i^{\,iter+1}
\end{aligned}
\]

Here,

iter: the iteration that PSO is executing.
Vel_i: the velocity of particle i.
P_i: the position of particle i.
P_best(i): the best position of particle i.
P_bestGlobal: the best position of the swarm.
r_1 and r_2: random values between 0 and 1.
W: the inertia weight of the particle.
W_max and W_min: the maximum and minimum inertia weight of the particle, respectively.
c_1 and c_2: the cognitive and social constants, respectively.

In the single-channel PMU placement problem, the dimensions of a particle correspond to the edges of the power grid, and the position in each dimension is either 0 or 1, where 1 indicates that a PMU is placed on that edge and 0 indicates that no PMU is placed on it. Based on the literature [21], the following parameters are chosen: W_max = 1.2, W_min = 0.1, n_iter = 500, vel ∈ [−4, 4], 4.05 ≤ c_1 + c_2 ≤ 4.15 and 2.00 ≤ c_1 ≤ 2.05.
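One update step for a single particle can be sketched as below, using the parameter values quoted in the text. Two details are assumptions because the paper does not spell them out: the inertia weight is decreased linearly from W_max to W_min over the run, and the 0/1 position is obtained by passing the clamped velocity through a sigmoid, as in the common binary PSO variant, instead of the additive position update of the general PSO.

```python
import math
import random
from typing import List, Sequence, Tuple

W_MAX, W_MIN = 1.2, 0.1      # inertia weight bounds from the text
V_MIN, V_MAX = -4.0, 4.0     # velocity bounds from the text


def inertia_weight(iteration: int, n_iter: int) -> float:
    """Linearly decreasing inertia weight (an assumed schedule)."""
    return W_MAX - (W_MAX - W_MIN) * iteration / max(1, n_iter - 1)


def update_particle(pos: Sequence[int], vel: Sequence[float],
                    pbest: Sequence[int], gbest: Sequence[int],
                    w: float, c1: float = 2.05, c2: float = 2.05,
                    rng=random) -> Tuple[List[int], List[float]]:
    """Return the new binary position and clamped velocity of one particle."""
    new_pos: List[int] = []
    new_vel: List[float] = []
    for p, v, pb, gb in zip(pos, vel, pbest, gbest):
        r1, r2 = rng.random(), rng.random()
        # Inertia + cognitive + social parts of the velocity.
        v_next = w * v + c1 * r1 * (pb - p) + c2 * r2 * (gb - p)
        v_next = max(V_MIN, min(V_MAX, v_next))
        # Assumed binarisation: sigmoid of the velocity gives P(bit = 1).
        prob_one = 1.0 / (1.0 + math.exp(-v_next))
        new_vel.append(v_next)
        new_pos.append(1 if rng.random() < prob_one else 0)
    return new_pos, new_vel


rng = random.Random(0)
position, velocity = update_particle(
    pos=[0, 1, 0, 1, 1], vel=[0.0] * 5,
    pbest=[1, 1, 0, 0, 1], gbest=[1, 0, 0, 0, 1],
    w=inertia_weight(0, 500), rng=rng)
print(position, velocity)
```

The default constants c1 = c2 = 2.05 sum to 4.10, which lies inside the quoted range 4.05 ≤ c_1 + c_2 ≤ 4.15.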


3.4 Genetic Algorithm

The genetic algorithm (GA) is a simple population-based optimization algorithm. It starts with the initialization of the population, where each unit of the population is called an individual. Each iteration of the GA is called a generation. In each generation, new individuals are formed and the best three individuals of the previous generation are retained, so the whole generation is not replaced. The new individuals are called children because they are formed from two individuals of the previous generation by applying crossover and mutation. In the single-channel PMU placement problem, the size of an individual equals the number of edges of the power grid, and each value in an individual is either 0 or 1, where 1 indicates that a PMU is placed on that edge and 0 indicates that no PMU is placed on it. The two parent individuals are chosen by roulette wheel selection, which selects individuals in proportion to their fitness values and thereby favors good children. The crossover and mutation operators used are single-point crossover and bit-flip mutation. Based on observation, the probabilities of crossover and mutation are set to 0.8 and 0.008, respectively, and the population size is 100.
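The operators named above can be sketched as one generation step. The probabilities and the elitism of three come from the text. The mapping from a cost (lower is better) to a roulette weight, here 1 / (1 + cost), is an assumption, as are all function names; `cost_fn` stands in for the paper's fitness function.

```python
import random
from typing import Callable, List, Sequence

P_CROSSOVER = 0.8     # crossover probability from the text
P_MUTATION = 0.008    # per-bit mutation probability from the text
ELITE = 3             # best individuals carried over unchanged

Individual = List[int]


def roulette_select(population: Sequence[Individual], costs: Sequence[float],
                    rng: random.Random) -> List[Individual]:
    """Pick two parents; lower cost gets a larger (assumed) weight."""
    weights = [1.0 / (1.0 + c) for c in costs]
    return rng.choices(population, weights=weights, k=2)


def single_point_crossover(a: Individual, b: Individual, rng: random.Random):
    """Swap the tails of two parents at one random cut point."""
    if len(a) < 2 or rng.random() >= P_CROSSOVER:
        return a[:], b[:]
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def bit_flip(ind: Individual, rng: random.Random) -> Individual:
    """Flip each bit independently with probability P_MUTATION."""
    return [1 - g if rng.random() < P_MUTATION else g for g in ind]


def next_generation(population: Sequence[Individual],
                    cost_fn: Callable[[Individual], float],
                    rng: random.Random) -> List[Individual]:
    """Build a new population of the same size with elitism."""
    costs = [cost_fn(ind) for ind in population]
    ranked = [ind for _, ind in sorted(zip(costs, population),
                                       key=lambda pair: pair[0])]
    new_pop = [ind[:] for ind in ranked[:ELITE]]
    while len(new_pop) < len(population):
        p1, p2 = roulette_select(population, costs, rng)
        c1, c2 = single_point_crossover(p1, p2, rng)
        new_pop.append(bit_flip(c1, rng))
        if len(new_pop) < len(population):
            new_pop.append(bit_flip(c2, rng))
    return new_pop


rng = random.Random(1)
pop = [[rng.randint(0, 1) for _ in range(7)] for _ in range(10)]
new_pop = next_generation(pop, cost_fn=sum, rng=rng)   # toy cost: PMU count
print(len(new_pop), min(map(sum, new_pop)) <= min(map(sum, pop)))
```

Because the elite individuals are copied without mutation, the best cost in the population can never get worse from one generation to the next.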

4 Results

All the approaches are implemented in Python 3.10 and run on a machine with 16 GB RAM and an AMD Ryzen 7 4800H processor with 8 cores and a base speed of 2.90 GHz. The standard IEEE power system test cases are used. The numbers of PMUs computed for single-channel PMU placement considering ZIBs by the two greedy approaches, PSO and GA are shown in Table 1. The time in seconds taken by these approaches is listed in Table 2, rounded to one decimal place.

Table 1 Experimental results: the number of PMUs computed by our heuristics for single-channel PMU placement considering ZIBs

Data set   (|V|, |E|)   |ZIB|   Greedy 1   Greedy 2   PSO   GA
IEEE-14    (14, 20)     1       7          7          7     7
IEEE-30    (30, 41)     6       13         12         13    13
IEEE-39    (39, 46)     10      17         16         16    15
IEEE-57    (57, 78)     15      22         21         21    21
IEEE-69    (69, 68)     20      28         28         30    27
IEEE-118   (118, 179)   10      56         61         85    57
IEEE-300   (300, 409)   66      137        130        228   124

Table 2 Running times: CPU time (in seconds) consumed by our approaches for single-channel PMU placement considering ZIBs

Data set   (|V|, |E|)   Greedy 1   Greedy 2   PSO    GA
IEEE-14    (14, 20)     0.1        0.1        1.6    1.1
IEEE-30    (30, 41)     0.1        0.1        4.0    1.7
IEEE-39    (39, 46)     0.1        0.1        3.9    4.2
IEEE-57    (57, 78)     0.1        0.1        10.4   2.9
IEEE-69    (69, 68)     0.1        0.1        5.1    2.1
IEEE-118   (118, 179)   0.1        0.1        12.6   3.3
IEEE-300   (300, 409)   0.2        0.1        32.4   9.2

To test solution quality, we took the best-known algorithm available in the literature and compared the total channel capacity used by that algorithm with that of our algorithms on each available data set. The results are shown in Table 3. As the comparison shows, our GA uses fewer channels for monitoring or observing the grid. Hence, the solution generated by our GA is less affected by a line outage than that of any other algorithm.

Table 3 Comparison of total channel capacity with the best-known heuristic [21]

Data set                          IEEE-14   IEEE-30   IEEE-39   IEEE-57   IEEE-118
Best known heuristic, PSO [21]    15        31        31        48        140
Ours single-channel PSO           14        26        32        42        170
Ours single-channel GA            14        26        30        40        122

One of the solutions of our GA for the IEEE-14 grid is (v1 ↔ v2, v3 ↔ v4, v5 ↔ v6, v9 ↔ v14, v10 ↔ v11, v12 ↔ v13, v7 ↔ v8). Placing single-channel PMUs on these edges observes all power stations. A single-channel PMU is placed on each of seven edges, and each single-channel PMU uses a channel capacity of two, so for the standard IEEE-14 power system our GA uses only 14 channels to observe the system. In the paper by Shafiullah et al. [21], multi-channel PMUs are placed on nodes v2, v6 and v9 for the IEEE-14 power system. Each multi-channel PMU there observes five power stations along with the node it is placed on. So the total channel capacity taken by the multi-channel PMUs is 15, which is more than the channel capacity of the single-channel PMUs in our GA solution. The standard IEEE-14 power system is shown in Fig. 3.

If a single-channel PMU fails in the above IEEE-14 grid, only two power stations become unobserved, whereas if a multi-channel PMU placed on this grid fails, the chance that more than two power stations become unobserved is very high. So the failure of a multi-channel PMU affects the grid more than the failure of a single-channel PMU. Hence, the single-channel PMU placement problem is addressed in this project using multiple heuristics.
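The channel-capacity arithmetic for the IEEE-14 example can be checked with a short sketch. The edge list is the GA solution quoted in the text; the function names are hypothetical, and the value of five channels per multi-channel PMU is taken from the text's total of 15 for three PMUs. The check covers only direct observation of the end stations and does not verify that each pair is an actual line of the IEEE-14 system.

```python
from typing import List, Sequence, Set, Tuple

Edge = Tuple[int, int]

# GA solution for IEEE-14 as quoted in the text (bus i stands for v_i).
GA_EDGES: List[Edge] = [(1, 2), (3, 4), (5, 6), (9, 14),
                        (10, 11), (12, 13), (7, 8)]


def covered_buses(pmu_edges: Sequence[Edge]) -> Set[int]:
    """Buses observed directly as end stations of an edge with a PMU."""
    return {bus for edge in pmu_edges for bus in edge}


def single_channel_capacity(pmu_edges: Sequence[Edge]) -> int:
    """Each single-channel PMU accounts for a channel capacity of two."""
    return 2 * len(pmu_edges)


def multi_channel_capacity(num_pmus: int, channels_per_pmu: int) -> int:
    """Total capacity of identical multi-channel PMUs."""
    return num_pmus * channels_per_pmu


print(covered_buses(GA_EDGES) == set(range(1, 15)))   # True
print(single_channel_capacity(GA_EDGES))              # 14
print(multi_channel_capacity(3, 5))                   # 15
```

The seven edges form a perfect matching of the 14 buses, so every bus is observed directly and no ZIB rule is needed for this particular solution.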

Fig. 3 Standard IEEE-14 power system

5 Conclusions and Future Work

In this project, we have implemented multiple heuristics for the single-channel PMU placement problem considering ZIBs: two greedy approaches, a particle swarm optimization and a genetic algorithm. The total channel capacity used by these heuristics is smaller than that of other heuristics implemented with multi-channel PMUs. Single-channel PMUs are reliable, and the effect of a PMU failure or line outage is much smaller with single-channel PMUs than with multi-channel PMUs on the same power grid. However, many more single-channel PMUs than multi-channel PMUs are needed for the same power grid, which increases the maintenance cost. In future work, we want to identify the best possible n for n-channel PMUs on each available data set, considering the lowest possible n and the cost (both the cost of the PMUs and of maintenance). We also want to analyze the results of placing a fixed set of PMUs with different channel capacities at different power stations of a single power grid.

References

1. Abiri E, Rashidi F, Niknam T, Salehi MR (2014) Optimal PMU placement method for complete topological observability of power system under various contingencies. Int J Electr Power Energy Syst 61:585–593
2. Baldwin TL, Mili L, Boisen MB, Adapa R (1993) Power system observability with minimal phasor measurement placement. IEEE Trans Power Syst 8(2):707–715
3. Chakrabarti S, Kyriakides E (2008) Optimal placement of phasor measurement units for power system observability. IEEE Trans Power Syst 23(3):1433–1440
4. Dalali M, Kazemi Karegar H (2016) Optimal PMU placement for full observability of the power network with maximum redundancy using modified binary cuckoo optimisation algorithm. IET Gener Transm Distrib 10(11):2817–2824
5. Darties B, Chateau A, Giroudeau R, Weller M (2018) Improved complexity for power edge set problem. In: Brankovic L, Ryan J, Smyth WF (eds) Combinatorial algorithms. Springer International Publishing, Cham, pp 128–141
6. Darties B, Champseix N, Chateau A, Giroudeau R, Weller M (2018) Complexity and lowers bounds for power edge set problem. J Discrete Algorithms 52–53:70–91 (Combinatorial algorithms: special issue devoted to life and work of Mirka Miller)
7. Dean N, Ilic A, Ramirez I, Shen J, Tian K (2011) On the power dominating sets of hypercubes. In: 2011 14th IEEE international conference on computational science and engineering, pp 488–491
8. Dua D, Dambhare S, Gajbhiye RK, Soman SA (2008) Optimal multistage scheduling of PMU placement: an ILP approach. IEEE Trans Power Deliv 23(4):1812–1820
9. Gou B (2008) Generalized integer linear programming formulation for optimal PMU placement. IEEE Trans Power Syst 23(3):1099–1104
10. Guo J, Niedermeier R, Binkele-Raible D (2008) Improved algorithms and complexity results for power domination in graphs. Algorithmica 52(2):172–184
11. Haynes T, Hedetniemi S, Hedetniemi S, Henning M (2002) Domination in graphs applied to electric power networks. SIAM J Discrete Math 15:519–529
12. Hurtgen M, Maun JC (2010) Optimal PMU placement using iterated local search. Int J Electr Power Energy Syst 32(8):857–860
13. Kare AS, Valluru S (2020) Heuristics for the power edge set problem. In: 2020 11th international conference on computing, communication and networking technologies (ICCCNT), pp 1–7
14. Korres GN, Manousakis NM, Xygkis TC, Löfberg J (2015) Optimal phasor measurement unit placement for numerical observability in the presence of conventional measurements using semi-definite programming. IET Gener Transm Distrib 9(15):2427–2436
15. Liao CS, Lee DT (2005) Power domination problem in graphs. In: Wang L (ed) Computing and combinatorics. Springer, Berlin, pp 818–828
16. Manousakis NM, Korres GN, Georgilakis PS (2012) Taxonomy of PMU placement methodologies. IEEE Trans Power Syst 27(2):1070–1077
17. Marin FJ, Garcia-Lagos F, Joya G, Sandoval F (2003) Genetic algorithms for optimal placement of phasor measurement units in electrical networks. Electron Lett 39(19):1403–1405
18. Mazhari SM, Monsef H, Lesani H, Fereidunian A (2013) A multi-objective PMU placement method considering measurement redundancy and observability value under contingencies. IEEE Trans Power Syst 28(3):2136–2146
19. Peng J, Sun Y, Wang H (2006) Optimal PMU placement for full network observability using Tabu search algorithm. Int J Electr Power Energy Syst 28(4):223–231
20. Poirion PL, Toubaline S, D’Ambrosio C, Liberti L (2016) The power edge set problem. Networks 68(2):104–120
21. Shafiullah M, Hossain MI, Abido M, Abdel-Fattah T, Mantawy A (2019) A modified optimal PMU placement problem formulation considering channel limits under various contingencies. Measurement 135:875–885
22. Sun O, Fan N (2019) The probabilistic and reliable connected power dominating set problems. Optim Lett 13
23. Sun O, Fan N (2019) Solving the multistage PMU placement problem by integer programming and equivalent network design model. J Glob Optim 74(3):477–493
24. Toubaline S, D’Ambrosio C, Liberti L, Poirion PL, Schieber B, Shachnai H (2018) Complexity and inapproximability results for the power edge set problem. J Comb Optim 35(3):895–905
25. Xu B, Abur A (2004) Observability analysis and measurement placement for systems with PMUs. In: IEEE PES power systems conference and exposition, vol 2, pp 943–946
26. Yang X, Chen N, Zhai C (2019) Optimal placement of limited PMUs for transmission line outage detection and identification
27. Zhao M, Kang L, Chang GJ (2006) Power domination in graphs. Discrete Math 306(15):1812–1816

Comparative Analysis of Different Machine Learning Approaches for Sentiment Analysis

Tanvi Desai and Divyakant Meva

Abstract As a result of the transition from Web 2.0 to Web 3.0, people have become accustomed to learning and connecting socially online, and are increasingly empowered to deliver and obtain services based on their individual thoughts and views. Sentiment analysis is a natural language processing approach that identifies the emotional tone behind written text and determines whether it is positive, negative, or neutral. It encompasses text extraction for sentiment and qualitative information using data mining, machine learning, and artificial intelligence. This paper reviews the literature on different machine learning techniques for sentiment analysis and compares the accuracy, benefits, and limitations of each method.

Keywords Machine learning · Web · Individual thoughts · Sentiments · Learning

1 Introduction

1.1 Sentiment

Sentiment is an attitude towards something. People nowadays are used to giving reviews, feedback, comments, or opinions on the Internet. Sentiment analysis refers to the process of determining the attitude behind a text. Sentiment modelling uses natural language processing and machine learning to assign measurable sentiment values to sentences and phrases. Reference dictionaries have been used to identify positive and negative words and estimate the emotion of a text. Machine learning is a popular choice for sentiment classification.

T. Desai (B), Anand Institute of Management and Information Science, Anand, India
D. Meva, Marwadi University, Rajkot, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 686, https://doi.org/10.1007/978-981-99-2100-3_15


Sentiment Analysis

Text categorization, extraction of structured data from unstructured data, and aspect-based sentiment analysis are the three main tasks of sentiment analysis. Text categorization has three levels: document, sentence, and feature. Document-level classification classifies the entire document as subjective or objective and then determines whether the opinion is positive or negative. Sentence-level classification determines whether a sentence is subjective or objective and then whether it is positive, negative, or neutral; it is suited to short reviews and comments. Feature-level categorization classifies sentiments by the features (aspects) they refer to [1]. Following text classification, the next step is to extract structured data from unstructured data, which the five-tuple method addresses. One of the important elements of the tuple is the aspect on which an opinion is expressed. Finally, aspect-based sentiment analysis is performed, which involves recognizing the features of the target object and evaluating the emotional polarity of each feature. This task consists of two subtasks: aspect extraction and aspect categorization.
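The five-tuple representation can be sketched as a small data structure. The paper does not list the five elements, so the field names below are an assumption that follows the commonly cited opinion quintuple (entity, aspect, sentiment, holder, time); the sample opinions are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class OpinionTuple:
    """One opinion as a five-tuple (field names are an illustrative assumption)."""
    entity: str     # object being reviewed, e.g. a phone
    aspect: str     # feature of the entity that the opinion targets
    sentiment: str  # "positive", "negative", or "neutral"
    holder: str     # who expressed the opinion
    time: str       # when the opinion was expressed


def polarity_by_aspect(opinions):
    """Group sentiment labels by (entity, aspect) pair."""
    grouped = defaultdict(list)
    for op in opinions:
        grouped[(op.entity, op.aspect)].append(op.sentiment)
    return dict(grouped)


# Hypothetical structured records extracted from unstructured reviews.
opinions = [
    OpinionTuple("phone", "battery", "positive", "user_1", "2022-01-05"),
    OpinionTuple("phone", "camera", "negative", "user_1", "2022-01-05"),
    OpinionTuple("phone", "battery", "negative", "user_2", "2022-02-11"),
]
summary = polarity_by_aspect(opinions)
```

Once opinions are stored in this form, aspect-based analysis reduces to grouping and aggregating the sentiment field per aspect.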

1.2 Machine Learning

Machine learning is an area of artificial intelligence that focuses on the development and study of strategies for extracting knowledge from data. For classification problems, both supervised and unsupervised algorithms are effective at extracting sentiment from ordinary text.

Supervised Learning

Supervised learning is a branch of machine learning that depends on labelled data for training. Labelled datasets are used to train algorithms that predict outcomes accurately. A training dataset, consisting of inputs paired with the correct outputs, is used to build a model that produces the desired results, and the model improves as it sees more examples. In sentiment analysis, decision trees, support vector machines, Naive Bayes, and maximum entropy are often used.

Unsupervised Learning

Unlike supervised learning, unsupervised learning methods do not depend on labelled data to train the algorithm. They are used to analyse and cluster unlabelled datasets, identifying underlying patterns or clusters without labelled training examples. The most common techniques for unsupervised learning are the K-means and apriori algorithms.

2 Methodology

There are two distinct methods for performing sentiment analysis: lexicon-based analysis and machine learning. A lexicon-based technique classifies the sentiment of the entire content using the polarities recorded in a lexicon. The lexicon can be created manually or automatically using the dictionary-based method or the corpus-based method. Machine learning algorithms classify sentiment using supervised or unsupervised learning.

2.1 Machine Learning-Based Sentiment Analysis

In the machine learning method, a classifier is used to determine the sentiment of unseen texts. The model is built from tagged examples of sentences and documents; this type of training is called "supervised learning". From these examples, the model learns which belong to the positive class and which to the negative class, and identifies the characteristics that separate a positive sentence from a negative one. Those characteristics are called learning parameters (features), and they are usually unigrams, that is, single words or tokens from the training dataset. The classification may be binary (positive or negative) or may include a neutral category [2].
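As a minimal sketch of this idea, the following pure-Python Naive Bayes classifier learns from tagged sentences using unigram counts with Laplace smoothing. The function names and the four training sentences are hypothetical and only illustrate the mechanism; the studies reviewed here relied on established toolkits and much larger datasets.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labelled_docs):
    """Count unigrams per class from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labelled_docs:
        tokens = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior (Laplace smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # ignore words never seen in training
            likelihood = (word_counts[label][token] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical tagged examples (toy data, not from any dataset in this review).
train = [
    ("great phone love it", "positive"),
    ("excellent camera great battery", "positive"),
    ("terrible screen hate it", "negative"),
    ("poor battery terrible service", "negative"),
]
model = train_naive_bayes(train)
```

Calling predict(model, "great camera") returns "positive" on this toy data, because both unigrams occur more often in the positive examples.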

2.2 Lexicon-Based Sentiment Analysis

The use of dictionaries and lexicons is a popular alternative method of sentiment analysis. Because the dictionaries are applied according to specified criteria, they are also known as rule-based dictionaries. The dictionary lists words together with their polarity, and the sentiment value of a text is derived from the lexicon entries for the words it contains. When a new text is encountered, its words are compared with the terms in the dictionary, and the values of the matching words are combined using one of a variety of aggregation methods. In the simplest case, the polarities of the words in the text are added together to determine the overall sentiment of the entire text.
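The lookup-and-aggregate procedure can be sketched as follows. The lexicon here is a hypothetical toy dictionary, not a real resource such as SentiWordNet, and summation is only one of the possible aggregation methods.

```python
# Hypothetical toy lexicon: word -> polarity score (illustrative values only).
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "poor": -1, "terrible": -2}


def lexicon_score(text):
    """Sum the polarities of all words that appear in the lexicon."""
    score = 0
    for token in text.lower().split():
        word = token.strip(".,!?;:")  # drop surrounding punctuation
        score += LEXICON.get(word, 0)  # unknown words contribute nothing
    return score


def lexicon_label(text):
    """Map the aggregated score to a sentiment class."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A text with no lexicon words scores zero and is labelled neutral, which shows why lexicon coverage matters for languages with few resources.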

2.3 Hybrid Approach

The hybrid approach to sentiment analysis combines machine learning with lexicon-based classification. Several studies have combined lexicon-based and automatic learning techniques to improve sentiment classification. The main advantage of the hybrid approach is that it offers the benefits of both methods, and the combination of lexicon and machine learning has demonstrated increased accuracy [3]. Figure 1 presents the working mechanisms of the machine learning and lexicon-based approaches to sentiment analysis.


Fig. 1 Working mechanism of machine learning and lexicon-based analysis [2]

3 Related Work

Sentiment analysis has become a central focus of natural language processing research in recent years. Its main purpose is to recognize the feelings and views of individuals. Several studies have been conducted using various models, but it is still regarded as a difficult problem because many open issues remain unresolved. According to our review, the majority of researchers have performed sentiment analysis on movie reviews, product reviews, tourism destinations, and social media.

Wassan et al. [4] and Sunagar et al. [5] developed approaches that focus on the sentiment expressed towards an item's individual characteristics. They used an Amazon dataset of product reviews. To classify data as positive or negative, the system performs preprocessing steps on the dataset, such as filtering, tagging, and stop-word removal.

According to the survey by Umarani et al. [6], mixed approaches that combine lexicon-based and machine learning techniques, or that combine several neural network techniques, outperform the same strategies used independently.

Soumya and Pramod [7] applied machine learning techniques such as Naive Bayes, support vector machine, and random forest to analyse the sentiment of Malayalam tweets, which were labelled as positive or negative depending on their content. When constructing feature vectors, they considered a wide variety of features, including unigrams with SentiWordNet, bag of words, term frequency, and inverse document frequency. According to their research, the random forest classifier performs best with unigram features combined with SentiWordNet, which incorporates negation words.

Mahmood et al. [8] used lexicons and machine learning to analyse Facebook posts from Universiti Utara Malaysia (UUM). According to their findings, a hybrid approach may be superior to a machine learning approach alone. The lexicon-based approach is commonly used to label data and generate meaningful datasets, which improves sentiment classification accuracy.

Parveen et al. [9] used SVM, Naive Bayes, and KNN to analyse data collected from online movie reviews. They found that well-trained machine learning algorithms can perform very well in analysing sentiment polarities, and concluded that the SVM algorithm could classify reviews correctly more than 80% of the time. When the training dataset was small, the difference between Naive Bayes and KNN was significant, but with a large training dataset all three algorithms performed well.

Nezhad and Deihimi [10] worked on Persian sentiment analysis with CNN-LSTM, a deep learning methodology. In this architecture, the CNN is used for feature extraction and the LSTM is used to learn long-term dependencies. As a result, the model can use the capabilities of both CNN and LSTM, and it achieves good results with a precision of 85%.

Alsaeedi and Khan [11] presented a survey of various methods for Twitter sentiment analysis, covering machine learning, ensemble, and dictionary-based approaches. Their findings show that when multiple features were involved, SVM and MNB produced the highest accuracy. SVMs are considered the best machine learning algorithms, but a lexicon-based approach is sometimes useful. Naive Bayes, maximum entropy, and SVM achieved approximately 80% accuracy with n-gram and bigram models. With a classification accuracy of 85%, the ensemble approach outperforms supervised machine learning techniques.

Saranya and Jayanthy [12], Jagdale and Emmanuel [13], and Tripathy et al. [14] worked on topic-based sentiment classification using machine learning. They experimented with Naive Bayes, maximum entropy classification, and the SVM algorithm, and found that Naive Bayes performs the worst and SVM performs the best.

Saritha and Nayak [15] used machine learning to categorize user reviews, with the encoding based on the reviews themselves. They first extracted the aspects from the dataset. The retrieved data is then preprocessed to remove noise and undesirable data. Aspect sentiments are categorized as positive, negative, or neutral according to their polarity. The experiment takes both the favourable and unfavourable aspects of the review datasets into account. The experimental results show that the SVM approach is far more effective than Naive Bayes.

Singh et al. [16] used the WEKA software for sentiment classification on three datasets: reviews of the Woodland Wallet, the 7465 digital camera, and an IMDB movie. They applied the J48, OneR, Naive Bayes, and BFTree algorithms to each dataset. The accuracy of Naive Bayes on the first dataset is 100% due to its small size. They also found that Naive Bayes learns the most quickly of the four classifiers, while J48 learns the least quickly. Their research also shows that the OneR classifier performs very well in terms of correctly classified instances, and that the J48 algorithm works well at identifying instances as true positives or false positives.

Zou et al. [17] used SVM and Naive Bayes to classify review sentiment. When a large number of training reviews is considered, the Naive Bayes approach achieves high precision compared with the others. The findings suggest that the Naive Bayes approach outperforms the SVM.

Wahyudi and Kristiyanti [18] analysed smartphone product reviews with SVM and SVM-based particle swarm optimization. For the product opinion, the data is evaluated and labelled as premium, good, bad, or fail. The proposed models were evaluated using the confusion matrix. The accuracy of the SVM and of the SVM-based particle swarm optimization is 82.00% and 94.50%, respectively, so SVM-based particle swarm optimization outperforms the support vector machine alone.

Gautam and Yadav [19] applied machine learning algorithms to classify product review sentences and analyse various labelled recommendations. They report that Naive Bayes worked better than maximum entropy, and that SVM worked better when limited to a unigram model. The results show that combining Naive Bayes and SVM with WordNet-based semantic analysis improves accuracy. According to the study, the trained model can be extended to improve sentence identification based on feature vectors, and WordNet can be extended for summarizing reviews.
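Several of the studies above compare classifiers through the confusion matrix and the accuracy derived from it. A minimal sketch of that calculation for a binary task is given below; the function names and the eight labelled predictions are hypothetical.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, positive="positive"):
    """Count true/false positives and negatives for a binary task."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def accuracy(cm):
    """Share of all instances that were classified correctly."""
    total = cm["tp"] + cm["fp"] + cm["fn"] + cm["tn"]
    return (cm["tp"] + cm["tn"]) / total


# Hypothetical gold labels and classifier output for eight reviews.
y_true = ["positive", "positive", "positive", "negative",
          "negative", "negative", "negative", "positive"]
y_pred = ["positive", "positive", "negative", "negative",
          "negative", "positive", "negative", "positive"]
cm = confusion_matrix(y_true, y_pred)
```

Here six of the eight reviews are classified correctly, so accuracy(cm) evaluates to 0.75. Accuracy figures such as those reported for SVM and Naive Bayes are computed in the same way on held-out test data.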
Joshi and Itkat [20] presented a survey of machine learning methods for feature-level sentiment analysis, including supervised, unsupervised, and hybrid methods. The authors reported that supervised machine learning methods outperformed unsupervised ones. They also stated that unsupervised methods remain essential, because supervised methods require large labelled training datasets, which are expensive to obtain, whereas unlabelled data is simple to gather.

Govindarajan [21] introduced a hybrid of Naive Bayes and genetic algorithm models that was tested on movie review data. The individual Naive Bayes and genetic algorithm classifiers were first applied to the data; a hybrid Naive Bayes–genetic algorithm model was then created and its accuracy tested. The hybrid model outperforms the base classifiers in terms of classification accuracy and testing time because of the reduced data dimensions.

Abdulla et al. [22] discuss sentiment analysis of Arabic text using both corpus-based and lexicon-based approaches. They found that the corpus-based approach using SVM achieves the highest accuracy when classifying light-stemmed data.

Zhang et al. [23] conducted experiments on restaurant reviews written in Cantonese in order to classify their sentiment polarities. They also examined various classification methods and feature representations. According to the authors, the performance of the Naive Bayes classifier is comparable to or even superior to that of the SVM. Their research also indicates that bigram frequency is a useful feature for capturing sentiment in Cantonese text.

Ye et al. [24] tested supervised machine learning methods on reviews of travel destinations from around the world. They reported that supervised machine learning algorithms classify reviews well enough to derive sentiment, with accuracies above 80%. They also concluded that the SVM and the character-based N-gram model performed particularly well. The difference between the algorithms was considerable with a small training dataset but became less so as the training data grew larger.

Mulkalwar and Kelkar [25] proposed a combined approach to classifying text reviews by sentiment using two classifiers, the hidden Markov model and the support vector machine. Based on their findings, combining the hidden Markov model and the support vector machine improves the performance of the resulting classifier.
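The unigram, bigram, and character-based N-gram features mentioned in these studies can be extracted with a few lines of code. The sketch below uses hypothetical helper names and a made-up English review; the cited papers worked with their own tokenizers and languages.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the list of word-level n-grams in a text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Return the list of character-level n-grams in a text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


review = "the food was not good"
unigram_counts = Counter(word_ngrams(review, 1))  # bag-of-words features
bigram_counts = Counter(word_ngrams(review, 2))   # bigram frequency features
```

The bigram "not good" keeps the negation attached to the word it modifies, which is one reason bigram frequencies can carry sentiment that unigrams miss.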

4 Comparative Analysis

Table 1 summarizes the findings of various approaches to machine learning for sentiment analysis by different researchers.

5 Discussion and Analysis

In this paper, we reviewed existing research on sentiment analysis using various machine learning approaches. We found that most of the experiments were carried out on well-known datasets from Amazon and IMDB, and that researchers have also experimented with social media data from Facebook and Twitter. Recently, sentiment analysis of non-English languages has drawn the attention of scholars. Because resources such as lexica, corpora, and dictionaries are readily available for it, English is the most widely studied language. This presents scholars with a new task: developing lexica, corpora, and dictionary resources for additional languages. We also observed that machine learning is used for sentiment classification, while the lexicon-based approach is used for sentiment analysis. Several different classification strategies are used in the experiments, including Naive Bayes, SVM, and maximum entropy. Some researchers have used more advanced strategies, such as a hybrid with a genetic algorithm (GA), a corpus-based lexicon strategy, convolutional neural networks (CNN), and long short-term memory (LSTM). In terms of accuracy, the support vector machine outperforms many other classifiers. We concluded that Naive Bayes performs best with a small feature set, whereas SVM performs best with a large feature set. Maximum entropy also performs well, but it suffers from overfitting.

Table 1 Comparative analysis

S. No. | Author | Year | Approach | Methodology | Dataset | Size | Tool | Accuracy (%)
1 | Soumya and Pramod | 2020 | Machine learning | NB, SVM, RF | Malayalam tweets using Twitter API | 3184 | – | NB 94.4; SVM 94.8; RF 95.6
2 | Mahmood et al. | 2020 | Hybrid lexicon and machine learning | NB, SVM | FB posts, Universiti Utara Malaysia (UUM) | – | – | NB 86; SVM 90
3 | Nezhad and Deihimi | 2019 | Hybrid deep learning | CNN, LSTM | Persian dataset | 2550 | Pycharm | CNN 67; LSTM 86
4 | Saritha and Nayak | 2019 | Machine learning | SVM, NB | Mobile reviews collected from Amazon | – | Python with Scikit learn library | SVM 88.43; NB 84.71
5 | Wahyudi et al. | 2019 | Machine learning | SVM, SVM_PSO | Smartphone product review | 200 | – | SVM 82.0; SVM_PSO 94.50
6 | Saranya and Jayanthy | 2018 | Machine learning | NB, ME, SVM | Movie review | 1400 | – | NB 81.0; ME 80.4; SVM 82.9
7 | Singh et al. | 2017 | Machine learning | NB, J48, BFTree, OneR | Woodland product reviews | 88 | Weka | NB 85.12; J48 87.62; BFTree 84.98; OneR 87.65
  |  |  |  |  | Sony digital camera reviews | 1465 |  | NB 73.51; J48 76.86; BFTree 69.45; OneR 76.56
  |  |  |  |  | Movie review | 1000 |  | NB 76.78; J48 74.23; BFTree 61.51; OneR 63.52
8 | Zou et al. | 2016 | Machine learning | NB, SVM | Movie reviews | 1400 | – | NB 65.57; SVM 45.71
9 | Gautam and Yadav | 2014 | Machine learning | NB, SVM, ME | Customers' review | 19,340 | Python with NLTK library | NB 88.2; ME 83.8; SVM 85.5; SA 89.9
10 | Abdulla and Ahmed | 2013 | Corpus-based, lexicon-based | SVM, NB, DT, KNN | Arabic dataset | 2000 | Rapid miner | SVM 84.7; NB 80.4; KNN 51.3; DT 50
11 | Govindarajan | 2013 | Hybrid | NB, GA | Movie review | 2000 | – | NB 91.15; GA 91.25
12 | Zhang et al. | 2011 | Machine learning | NB, SVM | Cantonese-written restaurant reviews | 3000 | – | NB 93.17; SVM 90.67
13 | Ye et al. | 2009 | Machine learning | NB, SVM, DLM | Tourists' reviews | 1191 | – | NB 80.71; SVM 85.14; DLM 84.05

Note NB Naive Bayes; SVM support vector machine; SVM_PSO support vector machine-based particle swarm optimization; ME maximum entropy; DT decision tree; KNN K nearest neighbour; DLM dynamic language model

6 Conclusion and Future Work

This paper provides an overview of the different machine learning approaches used in sentiment analysis. After analysing the existing work, it is clear that SVM and NB are the most frequently used algorithms for text classification. A lot of work has already been done, but there is still progress to be made in improving prediction accuracy. To that end, various aspects of the system, such as algorithms, feature extraction, and datasets, must be improved. Researchers are becoming more interested in languages other than English, but there are still not enough resources and studies for these languages. Most researchers use WordNet, which is also available in languages other than English. Many natural languages still lack the resources needed for sentiment analysis. Advanced machine learning concepts such as ensemble learning offer good scope for future work in sentiment analysis.

References

1. Godara N, Kumar S (2019) Opinion mining using machine learning techniques. Int J Eng Adv Technol 9(2):4287–4292
2. Taboada M (2016) Sentiment analysis: an overview from linguistics
3. Pandey S, Tekchandani H, Verma S (2020) A literature review on application of machine learning techniques in pancreas segmentation. In: 2020 first international conference on power, control and computing technologies (ICPC2T), Jan 2020. IEEE, pp 401–405
4. Wassan S, Chen X, Shen T, Waqar M, Jhanjhi NZ (2021) Amazon product sentiment analysis using machine learning techniques. Rev Argentina Clín Psicol 30(1):695
5. Sunagar P, Naik DA, Sangeetha V, Kanavalli A, Seema S (2021) Feedback collection and sentiment analysis on the product reviews. In: 2021 IEEE international conference on mobile networks and wireless communications (ICMNWC), Dec 2021. IEEE, pp 1–6
6. Umarani V, Julian A, Deepa J (2021) Sentiment analysis using various machine learning and deep learning techniques. J Nig Soc Phys Sci 385–394
7. Soumya S, Pramod KV (2020) Sentiment analysis of Malayalam tweets using machine learning techniques. ICT Express 6(4):300–305
8. Mahmood A, Kamaruddin S, Naser R, Nadzir M (2020) A combination of lexicon and machine learning approaches for sentiment analysis on Facebook. J Syst Manag Sci 10(3):140–150
9. Parveen R, Shrivastava N, Tripathi P (2020) Sentiment classification of movie reviews by supervised machine learning approaches using ensemble learning & voted algorithm. In: 2nd international conference on data, engineering and applications (IDEA), Feb 2020. IEEE, pp 1–6
10. Nezhad ZB, Deihimi MA (2019) A combined deep learning model for Persian sentiment analysis. IIUM Eng J 20(1):129–139
11. Alsaeedi A, Khan MZ (2019) A study on sentiment analysis techniques of Twitter data. Int J Adv Comput Sci Appl 10(2)
12. Saranya K, Jayanthy S (2017) Onto-based sentiment classification using machine learning techniques. In: 2017 international conference on innovations in information, embedded and communication systems (ICIIECS), Mar 2017. IEEE, pp 1–5
13. Jagdale J, Emmanuel M (2019) Hybrid corrective critic neural network for sentiment classification in community media. In: 2019 3rd international conference on electronics, communication and aerospace technology (ICECA), June 2019. IEEE, pp 1236–1241
14. Tripathy A, Anand A, Rath SK (2017) Document-level sentiment classification using hybrid machine learning approach. Knowl Inf Syst 53(3):805–831
15. Saritha B, Nayak J (2019) Aspect based sentiment analysis using Naïve Bayes and support vector classifiers
16. Singh J, Singh G, Singh R (2017) Optimization of sentiment analysis using machine learning classifiers. HCIS 7(1):1–12
17. Zou H, Tang X, Xie B, Liu B (2015) Sentiment classification using machine learning techniques with syntax features. In: 2015 international conference on computational science and computational intelligence (CSCI), Dec 2015. IEEE, pp 175–179
18. Wahyudi M, Kristiyanti DA (2016) Sentiment analysis of smartphone product review using support vector machine algorithm-based particle swarm optimization. J Theoret Appl Inf Technol 91(1)
19. Gautam G, Yadav D (2014) Sentiment analysis of Twitter data using machine learning approaches and semantic analysis. In: 2014 seventh international conference on contemporary computing (IC3), Aug 2014. IEEE, pp 437–442
20. Joshi NS, Itkat SA (2014) A survey on feature level sentiment analysis. Int J Comput Sci Inf Technol 5(4):5422–5425
21. Govindarajan M (2013) Sentiment analysis of movie reviews using hybrid method of Naive Bayes and genetic algorithm. Int J Adv Comput Res 3(4):139
22. Abdulla NA, Ahmed NA, Shehab MA, Al-Ayyoub M (2013) Arabic sentiment analysis: lexicon-based and corpus-based. In: 2013 IEEE Jordan conference on applied electrical engineering and computing technologies (AEECT), Dec 2013. IEEE, pp 1–6
23. Zhang Z, Ye Q, Zhang Z, Li Y (2011) Sentiment classification of Internet restaurant reviews written in Cantonese. Expert Syst Appl 38(6):7674–7682
24. Ye Q, Zhang Z, Law R (2009) Sentiment classification of online reviews to travel destinations by supervised machine learning approaches. Expert Syst Appl 36(3):6527–6535
25. Mulkalwar A, Kelkar K (2012) Sentiment analysis on movie reviews based on combined approach. Int J Sci Res 3:1739–1742

A Comprehensive Investigation of Machine Learning Algorithms with SMOTE Integration to Maximize F1 Score

Surbhi Sharma and Alka Singhal

Abstract Classification algorithms are very helpful in the healthcare sector because they help detect a disease at an early stage, so that patients receive the required treatment in a timely manner. Machine learning techniques can be used to develop a classification model that predicts a disease. In this paper, we explore classification algorithms on the Framingham Heart Disease Dataset. The dataset contains 15 features that help predict the risk of CHD in the next ten years. We found that the dataset is imbalanced, i.e., the number of instances of one class is higher than the number of instances of the other class, so we used the Synthetic Minority Oversampling Technique (SMOTE) to balance it. Balancing the dataset with SMOTE drastically improved the F1 Score of the positive class for all classifiers. In our experimental analysis, we found that the accuracy of SVM increases after applying SMOTE, and that the AUC value of SVM is the highest among all the classifiers.

Keywords Heart disease prediction · SMOTE technique · Classification algorithms on imbalanced dataset

1 Introduction

According to the World Health Statistics report [1], non-communicable diseases are now the major cause of mortality. Among these, heart disease is one of the leading causes, accounting for nearly 17.9 million deaths worldwide. According to a recent study, early deaths due to heart disease in India have increased to 59%. People face various heart-related problems, such as abnormal heart rhythm, genetic heart problems, coronary heart disease, cardiomyopathy (disease of the heart muscle), and cardiac arrest. The main reason for heart disease is an unhealthy lifestyle, as discussed by Khandoker et al. [2]. Factors contributing to the increase of this deadly disease include obesity due to unhealthy diet plans, high cholesterol, high blood pressure, hypertension, smoking, and inadequate health and physical activity. Advances in technology have produced various smart devices that can be used in daily life to monitor glucose levels, blood pressure, SPO2, etc. [3], but the number of heart patients globally has still not decreased. Coronary Heart Disease (CHD), a variant of heart disease, is creating havoc in people's lives. The American Heart Association has registered certain indications of the disease, including sleep deprivation, fluctuation in heart rate, swelling in the legs, and constant weight gain. These indications can be due to heart disease and should not be ignored. Because of the fatal consequences of CHD, researchers have developed various prediction models to predict the disease from risk factors such as age, diabetes, cholesterol, BMI, glucose level, heart rate, and blood pressure.

In this paper, we apply various classification algorithms to a dataset based on the population of Framingham. It records risk factors that can gradually lead to a person developing CHD in the next ten years. The results of the classification algorithms are then compared. During exploratory data analysis, we found that the dataset is imbalanced. To balance the dataset, a balancing technique, the Synthetic Minority Oversampling Technique (SMOTE), is used.

The remaining sections are organized as follows: Sect. 2 presents related work, reviewing some of the work done in the field. Section 3 describes the proposed work on the dataset. Results are presented and compared in Sect. 4. Section 5 concludes our analysis and suggests future work.

S. Sharma (B) · A. Singhal, Department of CSE and IT, JIIT, Noida, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 686, https://doi.org/10.1007/978-981-99-2100-3_16

2 Related Work

Cardiovascular diseases (CVD) have an adverse effect on the heart as well as the blood vessels. Medical experts have reported various risk factors associated with CVD, including age, cholesterol, heart rate, BMI, and previous history of strokes. These risk factors can be used in classification models to predict the risk of heart disease. Various researchers have suggested methodologies for designing a classification model that accurately identifies the people who are more likely to suffer from heart disease. These models work on a variety of datasets that involve the above-mentioned risk factors directly or indirectly. Most datasets share common attributes but differ in experimental setup and location.

Uddin and Kumar [4] developed a model that uses three classifiers: Naïve Bayes, Gradient Boost, and Random Forest. The model is a multilayer dynamic system that uses majority voting as an ensemble method. Gao et al. [5] developed a model that uses K-nearest neighbor (KNN), support vector machine (SVM), decision tree (DT), Naive Bayes (NB), and random forest (RF) classifiers together with two ensemble methods, boosting and bagging. Bagging combines the base models by averaging their predictions. Kohli and Arora [6] proposed a solution that applies the backward selection feature selection method to the heart disease dataset available online in the University of California Irvine (UCI) repository. Eleven features were used with three classification algorithms, namely logistic regression, SVM, and decision tree. Among the three, logistic regression gave the best accuracy. Gonsalves et al. [7] used three algorithms, Naïve Bayes, SVM, and decision tree, on the South African heart disease dataset, evaluated them using the confusion matrix, and concluded that Naïve Bayes produced the best results. They used 10 features for training the model. Enriko et al. [8] collected a dataset from Harapan Kita Hospital and used KNN with and without parameter weighting for classification. On the Framingham dataset, Rubini et al. [9] suggested an approach in which the random forest algorithm was used and its results were compared with other algorithms. The accuracy of the proposed technique is 84.81%.

Various classification algorithms can be used in combination to obtain more accurate results. Such models are used by Mienye et al. [10] and Latha and Jeeva [11]. Combining classifiers makes it possible to exploit the strengths of each of them. Gárate-Escamila et al. [12] described a model that uses six classifiers: decision tree, gradient boost tree, logistic regression, multilayer perceptron classifier, Naïve Bayes, and random forest. This work proposed the chi-square test of independence and Principal Component Analysis (PCA) for feature selection and feature extraction, respectively. Haq et al. [13] use logistic regression, KNN, ANN, SVM, DT, and NB classifiers and three feature selection techniques, namely Relief, mRMR, and LASSO, along with k-fold cross validation.

Mienye et al. [14] developed a classification model using sparse autoencoders. An autoencoder has two functions, an encoder and a decoder. The encoder maps the original d-dimensional input data to a hidden representation, and the decoder maps in the opposite direction, i.e., from the hidden representation to a d-dimensional vector that should match the original input. The model was trained against negative instances so that it can accurately identify them. Ali et al. [15] developed a model that makes use of deep learning together with feature fusion techniques. The model takes input from sensors and medical records and combines them using fusion techniques. An ensemble deep learning classifier is then trained to make predictions.

Dudjak et al. [16] presented a study of oversampling algorithms that are based on the SMOTE technique. Skryjomski et al. [17] discussed methods of balancing classes. They suggested using oversampling instead of undersampling, because undersampling can lose valuable instances from the dataset. SMOTE is a type of oversampling. Magesh and Swarnalatha [18] proposed a model based on CDTL for prediction of heart disease.

S. Sharma and A. Singhal

3 Proposed Work

Many researchers have reported prediction results on imbalanced data. In that case the accuracy obtained is biased toward the majority class and the F1 score is very low. To resolve this, we have used SMOTE to balance the dataset and improve the F1 score. This section discusses the classification algorithms that have been applied to the dataset and the use of SMOTE coupled with these algorithms.

3.1 Dataset Description

For the experiments we have used the Framingham Heart Study (FHS) dataset, which consists of 4240 records and 16 features. The dataset records prevailing risk factors related to demography, behavior, past records, and current medical conditions that can gradually lead to coronary heart disease (CHD). The dataset is publicly available on Kaggle. The data were collected in three rounds from 1948 to 2002 in a study of the population of Framingham. Of the samples, 3596 belong to the negative class and the rest belong to the positive class. Table 1 describes the significance of the attributes of the dataset.

Table 1 Description of attributes of dataset

Attribute | Significance
Age | Age of person
Education | Values 1–4: 1 is for high school, 2 is for high school or GED, 3 is for college or vocational study, 4 is for college
Current smoker | 1 = yes, 0 = no
Diabetes | 1 = diabetic, 0 = non-diabetic
Totchol | Cholesterol value in mg/dl
SysBP | Systolic BP value in mmHg
DiasBP | Diastolic BP value in mmHg
CigsPerDay | Number of cigarettes smoked by the person in a day
BPmeds | Is the person taking BP medicine? (1 = yes, 0 = no)
prevalentStroke | Has the person previously had a stroke? (1 = yes, 0 = no)
PrevalentHyp | Is the person hypertensive? (1 = yes, 0 = no)
BMI | Value of body mass index
Glucose | Glucose level in mg/dl
Heartrate | Heart rate in beats per minute
TenYearCHD | Risk of CHD in next ten years (1 = yes, 0 = no)

3.2 Exploratory Analysis of Data

Preprocessing is done on a dataset to make it more suitable for a classification model. Real data contain noise, missing values, and unsuitable formats. To improve the accuracy of the classification model, the data are cleaned and made suitable for applying classification algorithms. Figure 1 depicts the steps we have followed to conduct the experiments. The implementation uses various Python libraries. The ‘Numpy’ library is used for applying mathematical functions. Classification algorithms have been implemented with the help of the ‘sklearn’ library. Graph plotting and data visualization are done with the ‘seaborn’ and ‘matplotlib’ libraries, as discussed by Oberoi and Chauhan [19]. Data analysis has been carried out with the help of ‘pandas’. The following preprocessing steps have been carried out.

1. Missing values can exist in the dataset, as discussed by Kang [20], for reasons such as human error, defective measurement instruments, and inconsistency of measuring units. These missing values have to be handled beforehand as they affect the accuracy of a model. Figure 2 depicts the missing values found in the dataset.
2. All missing values are replaced by the mean value of the corresponding feature. Replacing missing values instead of deleting the affected rows ensures that the number of available samples does not decrease. The feature ‘Education’ is manually removed from the dataset as it does not contribute to the prediction. Figure 3 depicts the missing values in terms of percentage. Boxplots have been used to find outliers, as outliers act as noise and hinder prediction.
3. Feature selection is required to select relevant features for training the model. Training the model on all features increases the cost of computation. The dataset contains 15 features, out of which the most relevant ones are selected. We have used a feature importance technique that assigns a score to every feature and then selected the top 7 features. We have also computed the correlation between features with the help of a heatmap. Figure 4 illustrates the heatmap showing the correlation of features with each other.
4. Feature scaling is a technique used to standardize the values of the independent variables of a dataset into a specified range. Putting the variables in a common range prevents any particular variable from dominating. After scaling, the data are split in an 80:20 ratio for training and testing.

Fig. 1 Flowchart of process

Fig. 2 Count of missing data of every attribute

Fig. 3 Feature wise percentage of missing data

Fig. 4 Heatmap for correlation of features
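The preprocessing steps of Sect. 3.2 can be illustrated with a minimal sketch. It uses only the Python standard library so that it runs without dependencies, whereas the study itself reports using pandas and sklearn. The function names, the toy values, and the choice of min-max scaling as the "specified range" method are assumptions made for illustration; feature selection and outlier handling are left out.

```python
import random
from statistics import mean

def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]

def min_max_scale(column):
    """Rescale values linearly to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def train_test_split_80_20(rows, seed=0):
    """Shuffle a copy of the rows and split them 80:20."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

# Toy columns (made-up values, not taken from the Framingham data).
glucose = [77.0, None, 70.0, 103.0, None, 85.0, 99.0, 78.0, 79.0, 88.0]
sys_bp = [106.0, 121.0, 127.5, 150.0, 130.0, 180.0, 138.0, 100.0, 141.5, 162.0]

glucose = impute_mean(glucose)          # step 2: mean substitution
rows = list(zip(min_max_scale(glucose), min_max_scale(sys_bp)))  # step 4: scaling
train, test = train_test_split_80_20(rows)                       # 80:20 split
print(len(train), len(test))
```

In this sketch the scaling constants are computed on the full columns before splitting, which mirrors the order described in the text (scaling first, then the split); computing them on the training part only is a common alternative.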

3.3 Classification Algorithms

We have evaluated the K-nearest neighbor (KNN), SVM, decision tree (DT), and logistic regression (LR) classification algorithms on the dataset. We first performed classification on the imbalanced dataset, then balanced the dataset, applied classification again, and compared the results of both scenarios.

KNN is a supervised classification algorithm. It predicts the target class of a new data instance by examining its k nearest neighbors in the training data. The distance between data points is calculated using measures such as the Euclidean distance or the Manhattan distance. The class of the new instance is the majority class among its neighbors.

SVM is a supervised learning algorithm that works by constructing hyperplanes. It divides the n-dimensional feature space into classes so that the target class of a new data instance can be predicted.

DT is a classification algorithm that uses a tree structure. Starting from the root, each node is split into branches based on the probabilistic evaluation of the features. Edges represent the decision rules applied at the nodes. Splitting criteria used for building trees include the Gini index and entropy.

LR predicts the category of a dependent variable from the independent variables. The likelihood of a class is computed with the logistic (sigmoid) function.
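To make the KNN description concrete, the following is a minimal standard-library sketch of the rule described above: compute Euclidean distances, keep the k nearest training points, and take a majority vote. It is an illustration only; the study reports using the sklearn implementations, and the function names, the value k = 3, and the toy points are assumptions.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    nearest = sorted(
        zip(train_x, train_y), key=lambda pair: euclidean(pair[0], query)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy scaled features (made-up): two well-separated groups labelled 0 and 1.
train_x = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (0.8, 0.9), (0.9, 0.8), (0.85, 0.75)]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_x, train_y, (0.2, 0.2)))  # query near the first group
print(knn_predict(train_x, train_y, (0.8, 0.8)))  # query near the second group
```

Because distances are compared across features, a sketch like this only behaves sensibly after the feature scaling step of Sect. 3.2; otherwise features with large numeric ranges dominate the distance.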

3.4 SMOTE Integrated Classification

On applying the classification algorithms directly, we found that the F1 score is very low. The reason is that the dataset is imbalanced. Imbalance occurs when the samples of the different classes are not present in equal proportion in the dataset, so that one class is in the majority compared with the other. In the given dataset, the number of samples of the positive class (the person is at risk of CHD in the next ten years) is 644 and that of the negative class (the person is not at risk of CHD in the next ten years) is 3596. The challenge with an imbalanced dataset is that machine learning techniques can give biased classification results, because during training the samples of one class far outnumber those of the other. An imbalanced dataset can be treated by resampling, using either oversampling (adding more instances of the minority class) or undersampling (deleting some instances of the majority class). To balance the given dataset, the Synthetic Minority Oversampling Technique (SMOTE) is used.
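A short calculation illustrates why accuracy alone is misleading for these class counts. The sketch below evaluates a hypothetical classifier that always predicts the negative class on the 644 positive and 3596 negative samples quoted above; the function name and the always-negative classifier are illustrative assumptions, not part of the study.

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall (0.0 if undefined)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

positives, negatives = 644, 3596  # class counts quoted in the text

# Hypothetical classifier that always predicts "negative".
tp, fp = 0, 0
fn, tn = positives, negatives

accuracy = (tp + tn) / (positives + negatives)
print(f"accuracy    = {accuracy:.3f}")
print(f"positive F1 = {f1_score(tp, fp, fn):.3f}")
# For the negative class the roles are swapped: tn acts as tp, fn as fp, fp as fn.
print(f"negative F1 = {f1_score(tn, fn, fp):.3f}")
```

With these counts the trivial classifier reaches an accuracy of about 0.848 and a negative-class F1 of about 0.918 while its positive-class F1 is 0, which is why the per-class F1 score is the more informative measure on imbalanced data.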

SMOTE works by creating synthetic examples of the minority class. The following steps describe the working of SMOTE:

• Random selection of an instance x belonging to the minority class.
• Evaluation of the K-nearest neighbors of instance x.
• Random selection of one neighbor y.
• Joining x and y by a line segment in feature space.
• Generation of a synthetic sample as a convex combination of the two instances x and y.
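The steps above can be sketched in a few lines of standard-library Python. This is a simplified illustration of the listed procedure, not the implementation used in the study; the function name, the default k = 5, the seed, and the toy points are assumptions. A synthetic sample is computed as x + λ(y − x) with λ drawn uniformly from [0, 1), which is one way of writing the convex combination of x and y.

```python
import math
import random

def smote_samples(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points from minority-class feature vectors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))           # random minority instance x
        x = minority[i]
        others = minority[:i] + minority[i + 1:]
        # k nearest minority neighbours of x (Euclidean distance)
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        y = rng.choice(neighbours)                 # one neighbour y at random
        lam = rng.random()                         # position on the segment x-y
        # convex combination of x and y: x + lam * (y - x)
        synthetic.append(tuple(a + lam * (b - a) for a, b in zip(x, y)))
    return synthetic

# Toy minority class (made-up): the four corners of the unit square.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
for point in smote_samples(minority, n_new=3, k=2):
    print(point)
```

If the classes were balanced to a 1:1 ratio on the full dataset, 3596 − 644 = 2952 synthetic positive samples would be needed; whether resampling was applied before or after the train/test split is not stated in the text, so that figure is only indicative.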

4 Results

In our study we have compared the results obtained after using SMOTE with the results obtained without it. Figure 5 depicts the change in the number of minority-class samples after resampling the data using SMOTE. Table 2 compares the F1 score of the predictions made using SMOTE with the F1 score obtained previously. For imbalanced data the F1 score gives a better evaluation, as it is the harmonic mean of precision and recall. We have also compared the accuracy of the LR, SVM, DT, and KNN classifiers when coupled with SMOTE with the previously achieved accuracies. Table 3 shows the comparison on the basis of accuracy. We can infer from the obtained values that the accuracy of the SVM classifier has increased on using SMOTE. Figures 6, 7, 8, and 9 depict the Receiver Operating Characteristic (ROC) curves for KNN, LR, DT, and SVM, respectively. The highest AUC value is obtained by SVM.
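As a reminder of what the AUC values summarize, the area under the ROC curve equals the fraction of (positive, negative) pairs in which the positive sample receives the higher score. The sketch below computes it directly from that definition; the function name and the toy labels and scores are assumptions for illustration and are unrelated to the reported results.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]   # made-up classifier scores
print(roc_auc(labels, scores))   # 3 of the 4 pairs are ordered correctly: 0.75
```

The pairwise form is quadratic in the number of samples, so it is meant for explanation rather than for large datasets.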

Fig. 5 Imbalanced dataset versus balanced dataset

Table 2 Comparison of F1 score

Classifier | Class | F1 score without SMOTE | F1 score with SMOTE
LR | Negative | 0.91 | 0.68
LR | Positive | 0.09 | 0.63
SVM | Negative | 0.91 | 0.87
SVM | Positive | 0.08 | 0.84
DT | Negative | 0.86 | 0.77
DT | Positive | 0.19 | 0.71
KNN | Negative | 0.91 | 0.84
KNN | Positive | 0.02 | 0.84

Table 3 Comparison of accuracy

Classifier | Accuracy before SMOTE (%) | Accuracy after SMOTE (%)
LR | 84 | 66
SVM | 84 | 86
DT | 76 | 75
KNN | 84 | 84

Fig. 6 ROC curve for KNN

Fig. 7 ROC curve for LR

Fig. 8 ROC curve for DT

Fig. 9 ROC curve for SVM

5 Conclusion

We have studied the Framingham Heart Disease dataset to predict the risk of developing coronary heart disease in the next ten years. The dataset is imbalanced, as it has very few instances of the positive class. We have used the SMOTE technique to balance the data. For an imbalanced dataset the F1 score is an important measure, as it is the harmonic mean of precision and recall. With SMOTE, the F1 score for the positive class increased drastically. The accuracy of SVM increased on integrating it with SMOTE. SVM also obtained the highest Area Under the ROC Curve (AUC) value, 0.931. In future work, SMOTE can be evaluated on a larger dataset that contains more realistic features, and the accuracy of the classifiers can be compared on that dataset. A combination of various feature selection techniques can also be used and their results compared.

References

1. Health Stats 2017 (2021) World Health Organization (WHO). Accessed https://www.who.int/news-room/factsheets/detail/cardiovascular-diseases
2. Khandoker A, Al Zaabi Y, Jelinek H (2019) What can tone and entropy tell us about risk of cardiovascular diseases? In: Proceedings of computing in cardiology conference (CinC), pp 1–4
3. Martin-Isla C, Campello VM, Izquierdo C, Raisi-Estabragh Z, Baeßler B, Petersen SE, Lekadir K (2020) Image-based cardiac diagnosis with machine learning: a review. Front Cardiovasc Med 7
4. Uddin MN, Kumar R (2021) An ensemble method based multilayer dynamic system to predict cardiovascular disease using machine learning approach. Inform Med Unlocked
5. Gao XY, Amin Ali A, Shaban Hassan H, Anwar EM (2021) Improving the accuracy for analyzing heart diseases prediction based on the ensemble method. Complexity 2021. https://doi.org/10.1155/2021/6663455
6. Kohli PS, Arora S (2018) Application of machine learning in disease prediction. In: Proceedings of 4th international conference on computing communication and automation (ICCCA). IEEE, Greater Noida, pp 1–4
7. Gonsalves AH, Thabtah F, Mohammad RMA, Singh G (2019) Prediction of coronary heart disease using machine learning: an experimental analysis. In: Proceedings of 3rd international conference on deep learning technologies, pp 51–56
8. Enriko IKA, Suryanegara M, Gunawan D (2018) Heart disease diagnosis system with K-nearest neighbors method using real clinical medical records. In: Proceedings of the 4th international conference on frontiers of educational technologies, pp 127–131
9. Rubini PE, Subasini CA, Katharine AV, Kumaresan V, Kumar SG, Nithya TM (2021) A cardiovascular disease prediction using machine learning algorithms. Ann Rom Soc Cell Biol 25(2):904–912. https://www.annalsofrscb.ro/index.php/journal/article/view/1040
10. Mienye ID, Sun Y, Wang Z (2020) An improved ensemble learning approach for the prediction of heart disease risk. Inform Med Unlocked 20
11. Latha CBC, Jeeva SC (2019) Improving the accuracy of prediction of heart disease risk based on ensemble classification techniques. Inform Med Unlocked 16:100203
12. Gárate-Escamila AK, El Hassani AH, Andrès E (2020) Classification models for heart disease prediction using feature selection and PCA. Inform Med Unlocked 19:100330
13. Haq AU, Li JP, Memon MH, Nazir S, Sun R (2018) A hybrid intelligent system framework for the prediction of heart disease using machine learning algorithms. Mob Inf Syst
14. Mienye ID, Sun Y, Wang Z (2020) Improved sparse autoencoder based artificial neural network approach for prediction of heart disease. Inform Med Unlocked 18. https://doi.org/10.1016/j.imu.2020.100307
15. Ali F, El-Sappagh S, Islam SR, Kwak D, Ali A, Imran M, Kwak KS (2020) A smart healthcare monitoring system for heart disease prediction based on ensemble deep learning and feature fusion. Inf Fusion 63:208–222
16. Dudjak M, Martinović G (2021) An empirical study of data intrinsic characteristics that make learning from imbalanced data difficult. Expert Syst Appl 182:115297
17. Skryjomski P, Krawczyk B, Cano A (2019) Speeding up k-nearest neighbors classifier for large-scale multi-label learning on GPUs. Neurocomputing 354:10–19
18. Magesh G, Swarnalatha P (2020) Optimal feature selection through a cluster-based DT learning (CDTL) in heart disease prediction. Evol Intell 1–11
19. Oberoi A, Chauhan R (2019) Visualizing data using Matplotlib and Seaborn libraries in Python for data science. Int J Sci Res Publ 9(3):8733
20. Kang H (2013) The prevention and handling of the missing data. Korean J Anesthesiol 64(5):402

Detection of Pathological Myopia from Fundus Images

Sarvat Ali and Shital Raut

Abstract Pathological myopia (PM), which results from degenerative changes in the sclera, choroid, and retinal pigment epithelium (RPE), is associated with irreversible vision loss. This study proposes a method for automatically detecting PM or normal vision from an input retina fundus image. We have experimented with various transfer learning models and with pre-processing steps selected using reinforcement learning (RL). The best results were achieved with our custom ResNet50 as the baseline model. It achieved an AUC score of 0.9984 on the validation dataset provided by the PALM challenge, a Satellite Event of the IEEE International Symposium on Biomedical Imaging in Venice, Italy. This AUC score is among the top 3 performers in this challenge. As more accurate results are always in demand in the medical domain, this score indicates that the model can be set up for clinical application in the future as a second opinion for ophthalmologists.

Keywords Pathological myopia · Transfer learning · Fundus images · Clinical application

1 Introduction

High myopia, or nearsightedness, results when light reflected from an object is focused in front of the retina. As a result of high myopia, some people can develop macular lesions known as myopic maculopathy [1], the presence of which defines PM, which causes uncorrected and irreversible visual impairment [2]. High myopia can lead to other eye diseases such as retinal detachment [3] and glaucoma [4]. The fundus of the eye is the interior surface of the eye opposite the lens and includes the retina, optic disk, macula, fovea, and posterior pole. Transfer learning (TL)-based models have been great performers in the field of medical image analysis, as they solve data scarcity problems and are efficient in terms of time and computational complexity [5]. Hence, we have experimented with various augmentation steps and TL-based models that attempt to provide automatic, accurate, and faster classification results. RL [6] helped us to find the best set of augmentation steps to apply to the input fundus image for optimal learning. Our TL-based custom ResNet50 aims to classify PM and non-PM images from the dataset of fundus images given in the PALM Challenge [7]. The best model can be deployed as a clinical application to detect PM, so that a precise treatment can be given to patients based on their retina fundus images.

S. Ali (B) Visvesvaraya National Institute of Technology, Nagpur, India e-mail: [email protected]
S. Raut Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology, Nagpur, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 686, https://doi.org/10.1007/978-981-99-2100-3_17

2 Related Work

The authors of [8] developed a CNN model with an AUC of 0.9845 after pre-processing the fundus images from the dataset [9]. The authors of [10] developed a ResNet encoder [11] model to generate features from processed images of the same dataset and achieved an AUC of 0.993 on the validation dataset. Freire et al. [12] used the additional datasets REFUGE [13], RIGA [14], and IDRiD [15] along with [9] and fine-tuned the pretrained Xception model [16] to predict the probability of pathological myopia, achieving an AUC score of 0.9987 on the validation dataset and 0.995 on the test dataset. Yu et al. [17] developed a deep learning system for PM detection using 17,330 color fundus images and used the Xception, VGG16, ResNet, and DenseNet201 models to classify PM patients, achieving the best AUC score of 0.993 with the Xception model. Devda and Eswari [18] used the dataset [9] and deployed a CNN model consisting of three convolutional layers with ReLU activation and three max pooling layers for image feature extraction, followed by two dense hidden layers and a final output layer with softmax activation, to detect PM with an accuracy of 0.97 on test data. Zhuo et al. [19] used retinal study data obtained from the Singapore Cohort Study of the Risk Factors for Myopia (SCORM) [20] and applied the mRMR [21] method, which selects a maximally relevant and minimally redundant set of features for PM detection; they concluded that mRMR-based classifiers perform better than SVM and generate a clear set of features. The method of Ding et al. [21] uses a learning phase in which a bag of features is learned. Local image features learned from fundus images are then matched against the first phase to generate optimal features, which are input to a classification model for PM detection. Kong et al. [22] used the additional datasets REFUGE [13], RIGA [14], IDRiD [15], and Messidor [23] along with the dataset [9] and then trained a CNN model to achieve an AUC of 0.993 on the validation dataset. Briskilal et al. [24] used the ResNet50 and DenseNet121 models to achieve accuracies of 95.34 and 98.08, respectively, on the dataset [9]. Pathan et al. [25] designed an ensemble classifier and achieved an accuracy score of 0.95 on the PALM dataset. Our custom ResNet50 outperforms these studies in terms of AUC score for PM detection.

3 Proposed Method

We have used the dataset given in the PALM challenge [9] to develop a binary classification model that classifies whether a given input retina fundus image belongs to PM or normal vision, as shown in Fig. 1. The dataset consists of 400 training images of size 1444 × 1444 × 3, with 239 pathological myopia images and 161 normal images. The transfer learning (TL)-based model ResNet50 [11], also known as a deep residual network, addresses the vanishing gradient problem through skip connections from the previous layer, shown in Fig. 2, which keep the training error in check as more layers are stacked into the model for better accuracy. We have experimented with different optimizers, activation functions, and image augmentations. For image augmentation, the state (S) refers to the baseline ResNet50 model without any augmentation applied, the reward (R) is derived from the validation loss, and the set of actions (A) is pre-defined and includes different combinations of grayscale conversion, CLAHE optimization, rotation by different degrees, horizontal or vertical flip, scale, and blur. Our RL agent learns the augmentation steps that minimize the validation loss, and the Q values are updated as shown in (1). The augmentation action with the maximum Q value is used in our custom ResNet50. We evaluated the AUC when the model was trained from scratch as well as when the model used pretrained ImageNet weights. We have also reported the AUC scores of different fine-tuned architectures.

Fig. 1 PM versus NPM: (a) pathological myopia eye, (b) normal eye

Fig. 2 Residual connection: upper or lower path is followed based on dimensions (the input x passes through Layer-1, ReLU, and Layer-2 to give F(x), which is added to x arriving over skip connection 1 (Conv + Batch-Norm) or skip connection 2 (identity), followed by ReLU)

The best results, which constitute the novelty of this study, were achieved with the following pre-processing steps of the input images and fine-tuning of ResNet50. The input images of size 1444 × 1444 × 3 are converted to grayscale, and contrast limited adaptive histogram equalization (CLAHE) is applied to improve the contrast of the image so that finer details are prominently visible, as shown in Fig. 3. The image is then resized to 224 × 224 × 3 for input to the custom ResNet50 model. Data augmentation techniques selected by RL are applied, such as rotation by 45°, random horizontal flip, in which the pixels of a randomly chosen image are mirrored horizontally, and Gaussian blur, which perturbs the image by smoothing it for better feature learning by the model. Images are normalized to pixel values from 0 to 1 to reduce skewness in the data for optimal model performance. With this data augmentation, the best set of random transformations is applied to the batch of images at every epoch, so that the model sees non-identical images at every epoch and generalizes better. The preprocessed and transformed images are input to the model architecture shown in Fig. 4, which outputs the probability of PM and normal vision for the input image. Between Adam and stochastic gradient descent (SGD), SGD turned out to be the better optimizer, and we used it with a momentum of 0.9 for fast convergence. The learning rate is kept low initially and increases exponentially by a factor of 0.1 every third epoch, because the model initially learns general features and later learns detailed features, for which the learning rate is kept higher so that they are learned fast. Cross entropy loss (CE), which increases when the predicted probability diverges from the actual label, is used as the loss function to train the model. We have also experimented with other models such as DenseNet121, EfficientNetB4, Inception, and VGG19, and their AUC scores are summarized in Table 1. The flow from the input image to the final PM probability prediction in our fine-tuned ResNet50 is listed below. Equation 2 calculates the output image size.

\[
Q(S, A) = Q(S, A) + \alpha R, \qquad \alpha = 0.4, \qquad
R = \begin{cases}
+1, & \text{loss} < \text{min-loss} \\
-1, & \text{loss} > \text{min-loss} \\
0, & \text{loss} = \text{min-loss}
\end{cases}
\tag{1}
\]
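The update in (1) can be sketched as follows. Since there is a single state (the baseline model), the sketch keeps one Q value per augmentation action. How often each action is tried, in which order, and how min-loss is tracked are not specified in the text, so the loop structure, the `validation_loss` callback, the action names, and the loss values below are all hypothetical.

```python
ALPHA = 0.4  # step size alpha from Eq. (1)

def reward(loss, min_loss):
    """+1 if the loss improves on the best so far, -1 if worse, 0 if equal."""
    if loss < min_loss:
        return 1
    if loss > min_loss:
        return -1
    return 0

def select_augmentation(actions, validation_loss, rounds=3):
    """Update one Q value per action with Eq. (1) and return the best action.

    `validation_loss(action)` is a hypothetical callback that evaluates the
    model with the given augmentation and returns its validation loss.
    """
    q = {a: 0.0 for a in actions}
    min_loss = float("inf")
    for _ in range(rounds):
        for a in actions:
            loss = validation_loss(a)
            q[a] += ALPHA * reward(loss, min_loss)   # Q(S, A) = Q(S, A) + alpha * R
            min_loss = min(min_loss, loss)
    return max(q, key=q.get), q

# Made-up validation losses per augmentation combination (illustration only).
fake_losses = {
    "none": 0.30,
    "gray+clahe": 0.22,
    "gray+clahe+rot45+hflip+blur": 0.15,
    "vflip+scale": 0.27,
}
best, q_values = select_augmentation(list(fake_losses), fake_losses.get)
print(best)
```

In this toy run the action with the lowest loss keeps the largest Q value because every other action is penalized once a lower min-loss has been observed; with noisy losses from real training runs the outcome would depend on the evaluation order, which is one reason the loop above should be read as an assumption rather than as the procedure used in the study.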

Fig. 3 Pre-processed PM versus NPM images: (a) pathological myopia eye, (b) normal eye

Fig. 4 Model architecture: input image (224 × 224 × 1) → Conv1 (1 input channel, 3 output channels) → Conv2 (64 kernels of 7 × 7, stride = 2) → max pool (3 × 3, stride = 2) → Conv2 block × 3 (1 × 1 × 64, 3 × 3 × 64, 1 × 1 × 256) → Conv3 block × 4 (1 × 1 × 128, 3 × 3 × 128, 1 × 1 × 512) → Conv4 block × 6 (1 × 1 × 256, 3 × 3 × 256, 1 × 1 × 1024) → Conv5 block × 3 (1 × 1 × 512, 3 × 3 × 512, 1 × 1 × 2048) → Linear1 (2048 in, 1024 out, ReLU, dropout) → Linear2 (1024 in, 2 out)

1. The grayscale, CLAHE-optimized input images of size 224 × 224 × 1 are converted to size 224 × 224 × 3 using a convolution layer, as ResNet50 works on 3-channel input images.
2. The input image of size 224 × 224 × 3 passes through the second convolution layer with padding = 3, stride = 2, and kernel_size = 7:

\[
\text{Output Size} = \left\lfloor \frac{224 + 2 \cdot 3 - 7}{2} \right\rfloor + 1 = 112
\]

3. Batch normalization is applied to standardize the input and avoid internal covariate shift, which ensures that the input to each layer is distributed around the same mean and standard deviation.
4. Max pooling is performed with kernel_size = 3 × 3, stride = 2, and padding = 1:

\[
\text{Output Size} = \left\lfloor \frac{112 + 2 \cdot 1 - 3}{2} \right\rfloor + 1 = 56
\]

5. The three convolution operations in a bottleneck block are repeated 3 times, as shown in Fig. 4, and output an image of size 56 × 56. The other bottleneck blocks output image sizes of 28 × 28, 14 × 14, and 7 × 7, respectively. An adaptive average pooling layer then calculates its kernel_size and stride on its own to give an output image size of 1 × 1.

We replaced the fully connected layer with a custom linear layer that takes 2048 input features and gives an output of 1024 features with ReLU activation, followed by a dropout of 0.5 and another linear layer that takes the 1024 features as input and outputs 2 features with LogSoftmax activation, which corresponds to the PM probability. Using the sigmoid activation function in place of LogSoftmax gave a lower AUC score. We also attempted replacing the fully connected layer of ResNet50 with a single linear layer and LogSoftmax activation, which resulted in a lower AUC score than stacking 2 linear layers. The architecture is trained for 25 epochs and reports a validation error of 0.14. The AUC score with this architecture is 0.9984 with pretrained weights and 0.9910 with all the parameters trained from scratch. The evaluation criteria of the challenge require the probability of PM for each of the 400 images in the evaluation dataset. We attempted to freeze some layers of our fine-tuned ResNet50 model, but it underperformed on our validation data. Our combination of image pre-processing steps and custom ResNet50 performed better than all the experimented models and the models proposed in the literature, making it one of the best choices for PM classification and detection.

\[
\text{Output Size} = \left\lfloor \frac{\text{Input} + 2 \cdot \text{padding} - \text{kernel size}}{\text{stride}} \right\rfloor + 1
\tag{2}
\]
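Equation (2) can be checked numerically with a small helper. The division is an integer (floor) division, which is what makes the two worked examples in the list evaluate to 112 and 56. The function name is an assumption, and the last loop models the later stages as stride-2 3 × 3 convolutions with padding 1, which is how a standard ResNet50 halves the resolution and is assumed here rather than taken from the text.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer, as in Eq. (2)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 224
size = conv_output_size(size, kernel=7, stride=2, padding=3)   # 7x7 convolution
print(size)
size = conv_output_size(size, kernel=3, stride=2, padding=1)   # 3x3 max pool
print(size)

# Later stages: each one halves the resolution (assumed stride-2 3x3 convolution,
# padding 1), giving the 28, 14 and 7 quoted for the remaining bottleneck blocks.
for _ in range(3):
    size = conv_output_size(size, kernel=3, stride=2, padding=1)
    print(size)
```

The printed sequence follows the sizes stated in the list above (112, 56, then 28, 14, and 7), after which the adaptive average pooling reduces the 7 × 7 map to 1 × 1.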

4 Results and Conclusion

We have achieved an AUC score of 0.9984 on the evaluation dataset of the PALM challenge, which has 400 images of size 1444 × 1444 × 3. This is the best AUC score among the state-of-the-art studies listed in Table 1. It results from the optimal combination of image pre-processing and augmentation steps selected using RL, a suitable optimizer, suitable loss and activation functions, and fine-tuning of the last layer of ResNet50. This ResNet50 model can be deployed as a promising clinical application to detect PM from input retina fundus images. The model architecture can further be customized to classify glaucoma, diabetic retinopathy, and many more retinal diseases. The accuracy and scores of the present model can be further enhanced by using a GAN [26] to generate images for data augmentation and then training the model on the combined dataset. More models can be fine-tuned to check whether any of them performs better than our proposed fine-tuned ResNet50 model. In the future, we want to decode the black-box nature of our custom ResNet50 using explainable AI [27]. Further research could deploy the model on Web or mobile devices for screening purposes and detect lesions in PM images to assist ophthalmologists.

Table 1 AUC score comparison

Author | Method | AUC score | Accuracy
N. Rauf et al. | CNN | 0.9847 | –
R. Hemelings et al. | Transfer learning | 0.9934 | –
Cefas Freire et al. | Xception | 0.9957 | –
L. Lu et al. | Transfer learning | 0.993 | –
J. Cui et al. | CNN | – | 0.97
J. Son et al. | CNN | 0.993 | –
K. Ananth et al. | Transfer learning | – | 0.98
S. Pathan et al. | Ensemble of TL models | – | 0.95
Proposed method | ResNet50 w/o image pre-processing | 0.9961 | –
Proposed method | ResNet50 trained from scratch | 0.9910 | –
Proposed method | ResNet50 with Adam optimizer | 0.9967 | –
Proposed method | ResNet50 with sigmoid activation | 0.9970 | –
Proposed method | ResNet50 with 1 linear layer | 0.9972 | –
Proposed method | VGG19 | 0.9845 | –
Proposed method | Inception | 0.9943 | –
Proposed method | EfficientNetB4 | 0.9960 | –
Proposed method | DenseNet121 | 0.9961 | –
Proposed method | Custom ResNet50 | 0.9984 | –

References

1. Wang YX, Wang S, You QS, Jonas JB, Liu HH, Xu L (2010) Prevalence and progression of myopic retinopathy in Chinese adults: the Beijing eye study. Ophthalmology 117
2. Ohno-Matsui K (2017) What is the fundamental nature of pathologic myopia? Retina 37
3. Flitcroft DI (2012) The complex interactions of retinal, optical and environmental factors in myopia aetiology. Prog Retinal Eye Res 31
4. Montolio FGJ, Jansonius NM, Marcus MW, de Vries MM (2011) Myopia as a risk factor for open-angle glaucoma: a systematic review and meta-analysis. Ophthalmology 118
5. Santhanam N, Kim HE, Cosa-Linan A et al (2022) Transfer learning for medical image classification: a literature review. BMC Med Imaging 69
6. Sutton RS, Barto AG (2018) Reinforcement learning: an introduction. MIT Press, Cambridge
7. Sun X, Liao J, Xu Y, Zhang S, Zhang X, Fu H, José FL, Orlando I, Bogunović H (2019) Palm: pathologic myopia challenge
8. Gilani SO, Waris A, Rauf N (2021) Automatic detection of pathological myopia using machine learning. Sci Rep 11:16570
9. Orlando JI, Bogunovic H, Sun X, Liao J, Xu Y, Zhang S, Zhang X, Fu H, Li F (2019) Palm: pathologic myopia challenge. IEEE Dataport
10. Blaschko MB, Jacob J, Stalmans I, De Boever P, Hemelings R, Elen B (2021) Pathological myopia classification with simultaneous lesion segmentation using deep learning. Comput Methods Programs Biomed 199:105920
11. He K, Zhang X, Ren S, Sun J (2015) Deep residual learning for image recognition
12. Freire CR, Moura JCC, Daniele MS et al (2020) Automatic lesion segmentation and pathological myopia classification in fundus images. arXiv: abs/2002.06382
13. Breda JB, van Keer K, Bathula DR, Diaz-Pinto A, Orlando JI, Fu F et al (2020) Refuge challenge: a unified framework for evaluating automated methods for glaucoma assessment from fundus photographs. Med Image Anal 59:101570
14. Almazroa A (2018) Retinal fundus images for glaucoma analysis: the Riga dataset. Deep blue data. University of Michigan
15. Kamble R, Kokare M, Porwal P, Pachade S et al (2018) Indian diabetic retinopathy image dataset (IDRID): a database for diabetic retinopathy screening research. Data 3(3)
16. Chollet F (2017) Xception: deep learning with depthwise separable convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition
17. Yu W, Lu L, Zhou E et al (2021) Development of deep learning-based detecting systems for pathologic myopia using retinal fundus images. Commun Biol 4
18. Devda J, Eswari R (2019) Pathological myopia image analysis using deep learning. Procedia Comput Sci 165
19. Zhuo Z, Cheng J, Jiang L et al (2012) Pathological myopia detection from selective fundus image features. In: 2012 7th IEEE conference on industrial electronics and applications (ICIEA), pp 1742–1745
20. Saw SM, Chua WH et al (2002) Nearwork in early onset myopia. Invest Ophthalmol Vis Sci 43
21. Ding C, Peng HC, Long FH (2005) Feature selection based on mutual information: criteria of max-dependency, max-relevance, and minredundancy. IEEE Trans Pattern Anal Mach Intell 27
22. Kong ST, Jung K-H, Son J, Kim J (2021) Leveraging the generalization ability of deep convolutional neural networks for improving classifiers for color fundus photographs. Appl Sci 591
23. Decencière E, Zhang X et al (2014) Feedback on a publicly distributed image database: the Messidor database. Image Anal Stereology 33(3):231–234
24. Briskilal J, Kalyanasundaram A, Prabhakaran S, Senthil Kumar D (2020) Detection of pathological myopia using convolutional neural network. Int J Psychosoc Rehabil 24
25. Pathan S, Siddalingaswamy PC et al (2020) Automated detection of pathological and nonpathological myopia using retinal features and dynamic ensemble of classifiers. Telecommun Radio Eng 79
26. Hung S-K, Gan JQ (2021) Augmentation of small training data using GANs for enhancing the performance of image classification. In: 2020 25th international conference on pattern recognition (ICPR), pp 3350–3356
27. Amina A, Mohammed B (2018) Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE Access 6:52138–52160

Load Frequency Control of Single and Multi-area Power Systems Based on ADRC

Ovais Farooq, Suhail Ahmad Suhail, and M. A. Bazaz

Abstract The objectives of load frequency control (LFC) are to hold the system frequency at its nominal value and to regulate the tie line power exchanged between control areas. PID controllers tuned by optimization algorithms have been used for LFC to improve dynamic performance, but the PID structure has inherent drawbacks. Active disturbance rejection control (ADRC) addresses these drawbacks: it requires significantly less information about the plant model and is robust to external perturbations and modelling uncertainties. This work presents ADRC-based load frequency control of single and multi-area power systems, with power system nonlinearities taken into account. For multi-area power systems, a genetic algorithm (GA) is used to tune the ADRC parameters. The GA-based ADRC is applied to the IEEE 39 bus system, divided into three areas for LFC studies, to which electric vehicle fleets are added. In the simulations, ADRC proves to be an efficient control strategy.

Keywords Active disturbance rejection control (ADRC) · Load frequency control (LFC) · Vehicle to grid (V2G) · Genetic algorithm (GA) · Generation rate constraint (GRC) · Governor dead band (GDB)

O. Farooq · S. Ahmad Suhail (B) · M. A. Bazaz
Department of Electrical Engineering, NIT Srinagar, Srinagar, India
e-mail: [email protected]
M. A. Bazaz
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 686, https://doi.org/10.1007/978-981-99-2100-3_18

1 Introduction

Large-scale power systems usually comprise regions, or control areas, each of which constitutes a coherent cluster of generators. Tie lines link the control areas; they are used for energy exchange between areas as per contract conditions and provide support to adjacent areas during abnormal conditions. Changes in area load cause deviations in frequency and in the planned power interchanges between areas [1]. Load frequency control (LFC) refers to the control of active power and frequency in power systems. The fundamental goal of LFC is to maintain constant frequency and planned power interchange between areas in the face of randomly fluctuating active power loads.

In the traditional approach to the LFC problem, the integral of the control error is used as the control signal. Although an integral controller gives zero steady-state frequency deviation, it leads to a long settling time and large frequency deviations [2]. Conventional PI and PID controllers have likewise been used for the LFC problem. In the literature, these controllers have been tuned both by trial and error and by optimisation algorithms. The authors of [3] used a tilted integral derivative (TID) controller tuned by the jellyfish search algorithm for the LFC problem. In [4], a modified TID controller structure is presented and fine-tuned using the Archimedes optimization algorithm (AOA). The application of optimisation algorithms has led to better dynamic performance. Further, electric vehicles are being added to power systems at an increasing pace, and vehicle to grid (V2G) operation has been instrumental in improving LFC performance, as shown by the authors of [5].

However, a conventional PID controller suffers from several drawbacks: (1) the way the error is computed; (2) noise amplification in the derivative part; (3) a control law, in the form of a linear weighted sum, that is oversimplified; and (4) problems caused by the integral control action [8]. ADRC was introduced by Han in 1998. It is largely model independent and shows very good robustness. The controller was modified by Gao, who used linear gains in place of their nonlinear counterparts, giving rise to the linear active disturbance rejection controller (LADRC) [9]. The authors of [10] have shown, through simulation studies on generic first and second order systems with and without nonlinearities, that ADRC outperforms PID. In [11], ADRC is used as a substitute for both PID and model-based control techniques for position control of a DC motor.

In the present study, a second order LADRC is used for load frequency control of several power system configurations, starting with a single area power system with and without nonlinearities. The controller is tuned using the bandwidth parameterization proposed by Gao in [9]. For the multi-area power system, the controllers are tuned using a genetic algorithm. The scheme is verified on the IEEE 39 bus system (New England 10 machine test system) to validate it on a more realistic power system application.

The rest of the paper is organised as follows: Sect. 2 gives a basic description of ADRC and presents a second order ADRC. The power system model for the LFC problem is provided in Sect. 3. The control scheme is tested and compared in Sect. 4. Final conclusions are drawn in Sect. 5.


2 Active Disturbance Rejection Control (ADRC)

ADRC lumps the internal dynamics and the external disturbances of a plant into a single total disturbance and rejects it, leaving a simplified plant that is easy to control. The states and the total disturbance used in the control action are estimated by an extended state observer (ESO). As depicted in Fig. 1, ADRC consists of four components: (1) a tracking differentiator (TD), which constructs a transient profile that the plant can follow; (2) a nonlinear combination of errors, used as the control law; (3) an extended state observer (ESO), which estimates uncertainties in real time; and (4) rejection of the total disturbance (comprising process dynamics and external disturbance) so that the plant is effectively controlled [8]. In its original form, ADRC had nonlinear gains and was challenging to analyse and tune. Gao transformed it into the LADRC, which uses linear gains and has fewer tuning variables.

Fig. 1 ADRC topology

(a) Second Order ADRC: In a second order ADRC, the ESO, as the name implies, estimates three states of the plant to be controlled. The first two estimates lead to a PD-type control action, and the third captures parameter uncertainty in the plant as well as any external disturbance, which is then compensated for. The topology of a second order ADRC is shown in Fig. 2, where $r_{sp}$ is the desired trajectory and $d$ is the external disturbance. The ESO estimates the parametric uncertainties in the plant along with the external disturbance $d$.

Fig. 2 Structure of second order ADRC

For this purpose, the ESO receives the control input $u$ and the plant output $y$, and is given by

\[
\begin{aligned}
\dot{\hat{x}}_1 &= \hat{x}_2 + \beta_1 (y - \hat{x}_1) \\
\dot{\hat{x}}_2 &= \hat{x}_3 + \beta_2 (y - \hat{x}_1) + B u \\
\dot{\hat{x}}_3 &= \beta_3 (y - \hat{x}_1)
\end{aligned}
\tag{1}
\]

Here, $\beta_1$, $\beta_2$, and $\beta_3$ are the observer parameters. Consider a plant that can be approximated by the second order differential equation

\[ \ddot{y} = f(y, \dot{y}, t, u) + B u \tag{2} \]

If the parameters $\beta_1$, $\beta_2$, and $\beta_3$ in Eq. (1) are well tuned, $\hat{x}_1$, $\hat{x}_2$, and $\hat{x}_3$ track $y$, $\dot{y}$, and $f$, respectively. The function $f$ represents the total disturbance, which includes internal parametric uncertainty and external disturbance. Because $f$ is estimated by the ESO and compensated by the control law, ADRC is largely model independent. The control law in a second order ADRC is

\[ u_0 = K_p (r_{sp} - \hat{x}_1) - K_d \hat{x}_2 \tag{3} \]

\[ u = \frac{u_0 - \hat{x}_3}{B} \tag{4} \]

If $\hat{x}_3$ tracks $f$, substituting Eq. (4) into Eq. (2) reduces the process to a double integrator:

\[ \ddot{y} = f + u_0 - \hat{x}_3 \approx u_0 \tag{5} \]

(b) Tuning Method: A second order ADRC has three control parameters ($B$, $K_p$, and $K_d$) and three observer parameters ($\beta_1$, $\beta_2$, and $\beta_3$). Gao proposed a tuning method for second order ADRC based on bandwidth parameterization in [9]. Some parameters in this method are tuned by trial and error.

(c) Genetic Algorithm: In the present study, GA is used to tune ADRC for complex power system applications, i.e. for multi-area power systems. The integral time absolute error (ITAE) of the area control error is taken as the fitness function. Tournament selection is used for selection, simulated binary crossover (SBX) for crossover, and polynomial mutation for mutation.
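The observer of Eq. (1) and the control law of Eqs. (3) and (4) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: the plant, the bandwidths omega_c and omega_o, the step disturbance, and the forward Euler integration are all hypothetical choices. The gains follow the commonly used form of bandwidth parameterization, in which the controller poles are placed at -omega_c ($K_p = \omega_c^2$, $K_d = 2\omega_c$) and the observer poles at -omega_o ($\beta_1 = 3\omega_o$, $\beta_2 = 3\omega_o^2$, $\beta_3 = \omega_o^3$).

```python
def simulate_ladrc(omega_c=5.0, omega_o=50.0, b=1.0, t_end=5.0, dt=1e-3):
    """Forward Euler simulation of a second order LADRC.

    Assumed plant (hypothetical, not from the paper):
        y'' = -2*y' - y + d + b*u
    so the total disturbance is f = -2*y' - y + d.
    Returns the final output y and the final estimate of f.
    """
    # Bandwidth parameterization: controller poles at -omega_c,
    # observer poles at -omega_o.
    kp, kd = omega_c ** 2, 2.0 * omega_c
    beta1, beta2, beta3 = 3.0 * omega_o, 3.0 * omega_o ** 2, omega_o ** 3

    r_sp = 1.0            # set point
    y = dy = 0.0          # true plant states
    x1 = x2 = x3 = 0.0    # ESO estimates of y, y' and f
    for k in range(int(round(t_end / dt))):
        d = 0.5 if k * dt >= 2.5 else 0.0   # step disturbance at t = 2.5 s
        u0 = kp * (r_sp - x1) - kd * x2     # Eq. (3)
        u = (u0 - x3) / b                   # Eq. (4)
        e = y - x1                          # observer output error
        # Eq. (1), all three estimates updated from the old values
        x1, x2, x3 = (x1 + dt * (x2 + beta1 * e),
                      x2 + dt * (x3 + beta2 * e + b * u),
                      x3 + dt * beta3 * e)
        ddy = -2.0 * dy - y + d + b * u     # plant dynamics, Eq. (2)
        y, dy = y + dt * dy, dy + dt * ddy
    return y, x3


y_final, f_hat = simulate_ladrc()
print(f"final output: {y_final:.4f}, estimated total disturbance: {f_hat:.4f}")
```

In this sketch the controller is given only the gain b and no other knowledge of the plant; the third observer state absorbs both the unmodelled terms and the step disturbance, which is the sense in which the scheme is largely model independent.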

Load Frequency Control of Single …

213

3 Power System Model for Load Frequency Control

Because the power system is subjected to small load disturbances during normal operation, a model linearized about the operating condition can be derived for the LFC problem. The LFC model has been widely studied in the literature, e.g. in [12]. The generators present in an area are represented by a lumped model. The components of a single area power system are the governor, turbine, generator, and load. The governor is modelled as

\[ C_G(s) = \frac{1}{T_G s + 1} \tag{6} \]

The non-reheat turbine is represented as

\[ C_T(s) = \frac{1}{T_T s + 1} \tag{7} \]

The following model is employed for the load and machine dynamics:

\[ C_P(s) = \frac{K_P}{T_P s + 1} \tag{8} \]
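As a quick check of Eqs. (6) to (8), the three first order blocks can be integrated directly. The sketch below is an illustration rather than the authors' model file: it assumes the usual single area loop in which the governor input is the control signal minus the droop feedback $\Delta f / R$, sets the supplementary control to zero, and uses the midpoints of the parameter ranges in Table 1 as nominal values. The function name and the Euler step are hypothetical. With droop alone, the frequency deviation should settle at $-K_P \Delta P_L / (1 + K_P / R)$ instead of zero, which is why a supplementary controller is needed.

```python
def droop_only_response(kp=120.0, tp=20.0, tt=0.3, tg=0.08, r=2.4,
                        dpl=0.01, t_end=30.0, dt=1e-3):
    """Frequency deviation of the linear single area model with u = 0.

    States: pv (governor valve), pm (turbine power), df (frequency deviation).
    A step load disturbance dpl is applied at t = 0.
    """
    pv = pm = df = 0.0
    for _ in range(int(round(t_end / dt))):
        u = 0.0                                  # no supplementary control
        dpv = (-pv + u - df / r) / tg            # governor, Eq. (6)
        dpm = (-pm + pv) / tt                    # non-reheat turbine, Eq. (7)
        ddf = (-df + kp * (pm - dpl)) / tp       # generator and load, Eq. (8)
        pv, pm, df = pv + dt * dpv, pm + dt * dpm, df + dt * ddf
    return df


df_final = droop_only_response()
df_expected = -120.0 * 0.01 / (1.0 + 120.0 / 2.4)
print(f"simulated: {df_final:.6f}, analytical: {df_expected:.6f}")
```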

The notation for the power system parameters used in this paper and their values are presented in Table 1. A range of values is considered in order to verify the robustness of the scheme.

Table 1 Denotation of power system parameters
Symbol   Description                             Value
ΔPL      Load disturbance                        0.01 p.u.
KP       Generator gain                          [60, 180]
TP       Generator and load time constant        [10, 30]
TT       Non-reheat turbine time constant        [0.15, 0.45]
TG       Governor time constant                  [0.04, 0.12]
R        Speed droop coefficient                 [1.2, 3.6]

(a) Power system model for single area: The linearized model of a single area power system is depicted in Fig. 3. The objective is to drive the frequency deviation to zero each time there is a load disturbance, by providing an appropriate control input u, which in our case is given by the ADRC.

Fig. 3 Block diagram of single area power system

(b) Power system model for multiple area: The LFC framework for a multi-area power system can be depicted in the same way as for a single area power system, where the ith control area is akin to a single area power system, as shown in Fig. 4. Here, apart from the frequency deviation, the tie line error must also be driven to zero. Since frequency and tie line power are the two quantities of importance, their linear combination, termed the area control error (ACE), is used as the feedback signal for load frequency control (in its usual form, $ACE_i = B_i \Delta f_i + \Delta P_{tie,i}$). The controller signal u is chosen so that ACE goes to zero with good dynamic characteristics.

Fig. 4 ith control area in a multi-area power system

(c) LFC with nonlinearities: Numerous components in a power system have nonlinear characteristics. The modelling of these components involves saturation, dead zones, and other nonlinear characteristics. Power generation in thermal power plants can only change at a certain maximum rate, termed the "generation rate constraint" (GRC). It is a critical constraint that affects the dynamics of the power system. When GRC is taken into account, the system may exhibit significant frequency deviation, protracted settling times, etc. GRC is modelled by a saturation block, as shown in Fig. 5. The magnitude of a sustained speed change within which there is no observable change in the position of the governor-regulated valves is defined as the governor dead band (GDB). Generally, the GDB results in a sustained sinusoidal oscillation. GDB is modelled as a dead-band block, as in Fig. 6.

Fig. 5 Turbine model with GRC

Fig. 6 Governor model with dead band
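The two nonlinear blocks can be sketched as follows. This is an illustrative reading of Figs. 5 and 6, not the authors' Simulink blocks: the GRC is taken as a saturation on the turbine's rate of change placed ahead of the integrator, and the dead band is taken as a symmetric dead zone whose half width is a free parameter (whether a quoted dead band value is a full or half width is an assumption left to the reader). Function names and default values are hypothetical.

```python
def saturate(x, limit):
    """Clamp x to the interval [-limit, +limit]."""
    return max(-limit, min(limit, x))


def dead_zone(x, half_width):
    """Symmetric dead zone: zero output inside the band, shifted outside it."""
    if x > half_width:
        return x - half_width
    if x < -half_width:
        return x + half_width
    return 0.0


def turbine_with_grc(pv, tt=0.3, rate_limit=0.1 / 60.0, t_end=1.0, dt=1e-3):
    """Turbine output after t_end seconds for a constant valve position pv.

    The unconstrained rate (pv - pm) / tt is clamped to +/- rate_limit
    (p.u./s) before it is integrated, as in the saturation block of Fig. 5.
    """
    pm = 0.0
    for _ in range(int(round(t_end / dt))):
        rate = saturate((pv - pm) / tt, rate_limit)
        pm += dt * rate
    return pm


# With a 0.1 p.u./min limit the turbine can move only about 0.0017 p.u. in 1 s,
# whereas the unconstrained first order lag would have almost reached pv.
print(turbine_with_grc(0.01), turbine_with_grc(0.01, rate_limit=1.0))
```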

4 Simulation Results

In this section, MATLAB simulations of various power system networks with the proposed control strategy are presented. GA is used to tune the ADRC for the multi-area power system. Each area has a rating of 2000 MW and operates at 60 Hz.
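The GA operators named in Sect. 2 (tournament selection, simulated binary crossover, and polynomial mutation) can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions and not the authors' tuning code: the population size, distribution indices, mutation probability, and the single-individual elitism are hypothetical choices, and a toy quadratic fitness stands in for the ITAE of the area control error, which would require a full power system simulation.

```python
import random


def sbx(p1, p2, eta, rng):
    """Simulated binary crossover for one pair of real-valued genes."""
    u = rng.random()
    if u <= 0.5:
        beta = (2.0 * u) ** (1.0 / (eta + 1.0))
    else:
        beta = (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (eta + 1.0))
    c1 = 0.5 * ((1.0 + beta) * p1 + (1.0 - beta) * p2)
    c2 = 0.5 * ((1.0 - beta) * p1 + (1.0 + beta) * p2)
    return c1, c2


def polynomial_mutation(x, lo, hi, eta, rng):
    """Polynomial mutation of one gene, clipped to its bounds."""
    u = rng.random()
    if u < 0.5:
        delta = (2.0 * u) ** (1.0 / (eta + 1.0)) - 1.0
    else:
        delta = 1.0 - (2.0 * (1.0 - u)) ** (1.0 / (eta + 1.0))
    return min(hi, max(lo, x + delta * (hi - lo)))


def tournament(pop, fit, rng, k=2):
    """Return the fittest (lowest fitness) of k randomly drawn individuals."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[min(contenders, key=lambda i: fit[i])]


def run_ga(fitness, bounds, pop_size=30, generations=60, seed=0):
    """Minimise fitness over box bounds; returns (best individual, history)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        fit = [fitness(ind) for ind in pop]
        best = pop[min(range(pop_size), key=lambda i: fit[i])]
        history.append(min(fit))
        children = [best[:]]                      # elitism (an added assumption)
        while len(children) < pop_size:
            a, b = tournament(pop, fit, rng), tournament(pop, fit, rng)
            c1, c2 = [], []
            for g1, g2, (lo, hi) in zip(a, b, bounds):
                x1, x2 = sbx(g1, g2, 15.0, rng)
                c1.append(min(hi, max(lo, x1)))
                c2.append(min(hi, max(lo, x2)))
            for child in (c1, c2):
                if rng.random() < 0.2:            # mutate one randomly chosen gene
                    j = rng.randrange(len(bounds))
                    lo, hi = bounds[j]
                    child[j] = polynomial_mutation(child[j], lo, hi, 20.0, rng)
                children.append(child)
        pop = children[:pop_size]
    fit = [fitness(ind) for ind in pop]
    i_best = min(range(pop_size), key=lambda i: fit[i])
    history.append(fit[i_best])
    return pop[i_best], history


def toy_fitness(params):
    """Stand-in for the ITAE of ACE: a quadratic with its minimum at (3, -1)."""
    return (params[0] - 3.0) ** 2 + (params[1] + 1.0) ** 2


best, history = run_ga(toy_fitness, [(-10.0, 10.0), (-10.0, 10.0)])
print(best, history[0], history[-1])
```

To tune an ADRC with this loop, `toy_fitness` would be replaced by a function that simulates the closed loop for a candidate parameter vector and returns the resulting ITAE.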

4.1 Single Area Power System

To verify the robustness of the scheme, three models that differ in their parameter values are considered. The model obtained when the lower values are selected from Table 1 is called the lower model, and the model obtained with the upper values is called the upper model. The nominal model uses the average of the upper and lower values of each parameter range [7]. A step load disturbance of magnitude 0.01 p.u. is applied at time t = 2 s. The response obtained with ADRC is compared with the internal model control (IMC)-based PID of [6] and the direct synthesis approach (DSA)-based PID of [13]. Figure 7 shows that ADRC gives a shorter settling time than the other schemes in the nominal as well as the lower and upper models. The scheme therefore gives good dynamic performance and shows good robustness. To compare the performance of the schemes, the integral time absolute error (ITAE) performance index is used, defined as

\[ \mathrm{ITAE} = \int_0^{\infty} t \, |\Delta f(t)| \, dt \]

The values of ITAE are given in Table 2. ITAE is lowest for the ADRC scheme in all three models.
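On sampled simulation data the ITAE integral is evaluated numerically over a finite window. The following sketch uses the trapezoidal rule; the function name and the test signal are illustrative assumptions, and the signal $\Delta f(t) = -e^{-t}$ is chosen only because its ITAE is known in closed form ($\int_0^\infty t e^{-t} dt = 1$).

```python
import math


def itae(times, errors):
    """Trapezoidal approximation of the integral of t * |e(t)| over the samples."""
    total = 0.0
    for k in range(1, len(times)):
        g0 = times[k - 1] * abs(errors[k - 1])
        g1 = times[k] * abs(errors[k])
        total += 0.5 * (g0 + g1) * (times[k] - times[k - 1])
    return total


# Hypothetical decaying frequency deviation sampled every 1 ms for 30 s.
dt = 1e-3
times = [k * dt for k in range(30001)]
errors = [-math.exp(-t) for t in times]
print(itae(times, errors))  # close to the analytical value of 1
```

Because of the weighting by t, errors that persist late in the response are penalised more heavily than the initial transient, which is why ITAE rewards short settling times.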


Fig. 7 Frequency deviation for a nominal, b upper, and c lower models of single area power system

Table 2 ITAE for single area power system
ITAE             Nominal   Upper    Lower
ADRC             0.0059    0.0129   0.0067
IMC-based PID    0.0460    0.0583   0.0593
DSA-based PID    0.0102    0.0387   0.0110


Fig. 8 Frequency deviation with ADRC considering nonlinearities in power system

4.2 Single Area Power System with Nonlinearities

A GRC of 0.1 p.u. per minute is considered, i.e. $\Delta \dot{P}_m \le 0.1$ p.u./min $\approx 0.0017$ p.u./s [6]. The GDB is taken as 0.036 Hz. The performance of ADRC in this system is shown in Fig. 8. ADRC performs well in the presence of both GDB and GRC.

4.3 IEEE 39 Bus System

A network with a structure equivalent to the standard IEEE 39 bus system is chosen as the test system in order to examine the performance of the ADRC approach on a more realistic power system topology [12]. This frequently studied system is made up of ten generators, nineteen loads, 34 transmission lines, and twelve transformers. To illustrate frequency control, the system is divided into three control areas. In each area, only one generator is responsible for the LFC task: G2 in Area 1, G6 in Area 2, and G8 in Area 3. Electric vehicles are used in all three areas to provide the vehicle to grid facility; the EV model is taken from [5]. Area 1 is given a step load disturbance of magnitude 0.038 p.u. at time t = 1 s, Area 2 a step load disturbance of magnitude 0.064 p.u. at time t = 50 s, and Area 3 a load disturbance of 0.043 p.u. at time t = 90 s. The frequency and tie line power error responses with and without electric vehicles are compared. The parameters of the system are given in [14]. Generator dynamics are represented as

\[ C_G(s) = \frac{1}{2Hs + D} \tag{9} \]

where H is the inertia constant (s) and D is the load damping constant (p.u./Hz). The main parameters of the system are given in Table 3. Nonlinearities and other factors are considered as given in [7].


Table 3 IEEE 39 bus system three area parameters
Parameters     H      D    TG     TT    R      Bi
G2 (Area 1)    35.8   10   0.15   0.3   0.05   30
G6 (Area 2)    26.4   8    0.1    0.3   0.05   28
G8 (Area 3)    34.5   14   0.1    0.3   0.05   34

Fig. 9 Comparison of frequency deviations of IEEE 39 bus system with and without EV: (a) Area 1, (b) Area 2, (c) Area 3

Fig. 10 Comparison of tie line power deviation of IEEE 39 bus system with and without EV: net tie line flow deviation of (a) Area 1, (b) Area 2, (c) Area 3

Frequency deviations of the three areas with and without EVs are shown in Fig. 9, and net tie line power deviations of the three areas with and without EVs are shown in Fig. 10. Both figures show that the GA-based ADRC brings the frequency and tie line power deviations to zero quickly and without large overshoot or undershoot. Thus, the proposed control method performs well in this system. The figures also show that the system performs better with EVs than without them. The two cases are compared using the integral time absolute error (ITAE) performance index in Table 4. The ITAE values indicate that system performance improves substantially with the use of electric vehicles.


Table 4 Performance comparison using ITAE: GA-based linear ADRC performance in IEEE 39 bus system
ITAE                               Without EV   With EV
Frequency deviation in Area 1      0.1180       0.0677
Frequency deviation in Area 2      0.1427       0.0786
Frequency deviation in Area 3      0.1272       0.0715
Area 1 tie line power deviation    3.6710       2.0710
Area 2 tie line power deviation    5.6400       2.9940
Area 3 tie line power deviation    4.7270       2.7270
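The size of the improvement can be read off Table 4 with a short calculation. The sketch below only re-expresses the tabulated ITAE values as relative reductions; the dictionary keys and the helper name are illustrative. For these values the reduction with EVs works out to roughly 42 to 47% in every row.

```python
# ITAE values copied from Table 4.
itae_without_ev = {
    "Frequency deviation, Area 1": 0.1180,
    "Frequency deviation, Area 2": 0.1427,
    "Frequency deviation, Area 3": 0.1272,
    "Tie line power deviation, Area 1": 3.6710,
    "Tie line power deviation, Area 2": 5.6400,
    "Tie line power deviation, Area 3": 4.7270,
}
itae_with_ev = {
    "Frequency deviation, Area 1": 0.0677,
    "Frequency deviation, Area 2": 0.0786,
    "Frequency deviation, Area 3": 0.0715,
    "Tie line power deviation, Area 1": 2.0710,
    "Tie line power deviation, Area 2": 2.9940,
    "Tie line power deviation, Area 3": 2.7270,
}


def reduction_percent(before, after):
    """Relative reduction of a performance index, in percent."""
    return 100.0 * (before - after) / before


reductions = {
    name: reduction_percent(itae_without_ev[name], itae_with_ev[name])
    for name in itae_without_ev
}
for name, value in reductions.items():
    print(f"{name}: {value:.1f}% lower ITAE with EV")
```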

5 Conclusion

This paper presents the performance of a second order linear ADRC in power system applications. The ADRC parameters are tuned manually for the single area power system and by a genetic algorithm for the multi-area case, i.e. the IEEE 39 bus system. Simulation results show that the ADRC scheme gives good performance. Further, EVs are included in the IEEE 39 bus system for the V2G facility, and the simulation results confirm the better performance with EVs in place.

References

1. Jaleeli N, VanSlyck LS, Ewart DN et al (1992) Understanding automatic generation control. IEEE Trans Power Syst 7:1106–1122. https://doi.org/10.1109/59.207324
2. Elgerd OI, Happ HH (1972) Electric energy systems theory: an introduction. IEEE Trans Syst Man Cybern SMC-2:296–297. https://doi.org/10.1109/tsmc.1972.4309116
3. Shubham, Roy SP, Mehta RK et al (2022) A novel application of jellyfish search optimisation tuned dual-stage (1+PI) TID controller for microgrid employing electric vehicle. Int J Ambient Energy 43:8408–8427. https://doi.org/10.1080/01430750.2022.2097952
4. Ahmed M, Magdy G, Khamies M, Kamel S (2022) Modified TID controller for load frequency control of a two-area interconnected diverse-unit power system. Int J Electr Power Energy Syst 135:107528. https://doi.org/10.1016/j.ijepes.2021.107528
5. Debbarma S, Dutta A (2017) Utilizing electric vehicles for LFC in restructured power systems using fractional order controller. IEEE Trans Smart Grid 8:2554–2564. https://doi.org/10.1109/tsg.2016.2527821
6. Tan W (2010) Unified tuning of PID load frequency controller for power systems via IMC. IEEE Trans Power Syst 25:341–350. https://doi.org/10.1109/tpwrs.2009.2036463
7. Saxena S, Hote YV (2017) Stabilization of perturbed system via IMC: an application to load frequency control. Control Eng Pract 64:61–73. https://doi.org/10.1016/j.conengprac.2017.04.002
8. Han J (2009) From PID to active disturbance rejection control. IEEE Trans Ind Electron 56:900–906. https://doi.org/10.1109/tie.2008.2011621
9. Gao Z (2003) Scaling and bandwidth-parameterization based controller tuning. In: Proceedings of the 2003 American control conference, 2003. https://doi.org/10.1109/acc.2003.1242516
10. Herbst G (2013) A simulative study on active disturbance rejection control (ADRC) as a control tool for practitioners. Electronics 2:246–279. https://doi.org/10.3390/electronics2030246
11. Suhail SA, Bazaz MA, Hussain S (2019) Active disturbance rejection control applied to a DC motor for position control. Proc ICETIT 2019:437–448. https://doi.org/10.1007/978-3-03030577-2-38
12. Bevrani H (2014) Robust power system frequency control, vol 4. Springer, New York
13. Anwar MN, Pan S (2015) A new PID load frequency controller design method in frequency domain through direct synthesis approach. Int J Electr Power Energy Syst 67:560–569. https://doi.org/10.1016/j.ijepes.2014.12.024
14. Fernando T, Emami K, Yu S et al (2016) A novel quasi-decentralized functional observer approach to LFC of interconnected power systems. In: 2016 IEEE power and energy society general meeting (PESGM). https://doi.org/10.1109/pesgm.2016.7741145

Pest Detection and Identification in Infested Plants Using Digital Images in Agriculture

Monica Shinde, Kavita Suryavanshi, and Dhiraj Kumar Kadam

Abstract This paper presents a survey of review and research articles that use digital image processing, DNA extraction, CNN, ANN, and related techniques to detect, quantify, and classify pest-infested plants. Although infestation symptoms can affect any part of the plant, only the identification of pests and of pest-infested plant leaves is explored. This is done for two main reasons: to limit the length of the paper, and because pest infestation affects the leaves, the fruits, and the overall productivity of a plot. The selected proposals focus on detection, identification, quantification, and classification. This paper is expected to be useful to researchers working on pest management, plant infestation, and pest detection, providing an overview of this important field of research.

Keywords Pest detection · Pest infestation · Pest management · Digital image processing

M. Shinde (B)
D. Y. Patil Institute of MCA and Management, Savitribai Phule Pune University, Akurdi, Pune, Maharashtra, India
e-mail: [email protected]
K. Suryavanshi
MCA Department, D. Y. Patil Institute of MCA and Management, Savitribai Phule Pune University, Akurdi, Pune, Maharashtra, India
D. K. Kadam
Entomology Department, Vasantrao Naik Marathwada Krishi Vidyapeeth, Parbhani, Maharashtra, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 686, https://doi.org/10.1007/978-981-99-2100-3_19

1 Introduction

Agriculture has become much more than simply a means to feed ever-growing populations. Several diseases affect plants, with the potential to cause devastating economic, social, and ecological losses. Insects and other pests play a major role in plant diseases and infestation, and insect pests are well known to be the greatest threat to crops and agricultural products. Insect identification is needed for early pest forecasting to prevent further crop damage. In this context, diagnosing diseases and identifying insect pests accurately and in a timely way are of the utmost importance. Some pests produce no visible symptoms, or the symptoms appear only when it is too late to act. In those cases, early detection and identification are the key to pest management. Most pests, however, generate some kind of manifestation in the visible spectrum. In the vast majority of cases, the diagnosis, or at least a first guess about the pest or infestation, is performed visually by humans. Trained raters may be efficient in recognizing and quantifying pest infestation; however, visual rating has disadvantages that may harm the effort in many cases [1]. Bock et al. list some of those disadvantages:

• Raters may tire and lose concentration, thus decreasing their accuracy.
• There can be substantial inter- and intra-rater variability (subjectivity).
• There is a need to develop standard area diagrams to aid assessment.
• Training may need to be repeated to maintain quality.
• Raters are expensive.
• Visual rating can be destructive if samples are collected in the field for assessment later in the laboratory.
• Raters are prone to various illusions (for example, lesion number/size and area infected).

Besides those disadvantages, it is important to consider that some crops extend over extremely large areas, making monitoring a challenging task. Manual identification of insect pests by human experts, on small as well as large farms, is time-consuming and expensive. With the popularity of high quality image capture devices and the achievements of machine learning in pattern recognition, automated image-based insect pest recognition promises to reduce labor cost and perform this task more efficiently. Extracting useful features from images for insect classification remains difficult: it is challenging to derive discriminative features from an insect image because there are many pest species, with many variants in size and shape. Depending on the application, many of those problems may be solved, or at least reduced, by the use of digital images combined with some kind of image processing and, in some cases, pattern recognition and automatic classification tools. Many systems have been proposed in recent years, and this paper tries to organize and present them in a meaningful and useful way, as will be seen in the next section.

This paper reviews several methods that address the problems mentioned above in the detection, identification, quantification, and classification of pests and infested plants. Computer-based techniques (computer vision, amplification techniques, IoT-based approaches, UAV-based technology, AI, smartphone-based technologies, machine learning, digital image processing, CNN models, neural networks, a multi-scale learning method, and many more) and field-based manual techniques (DNA extraction, the pheromone trap method, rapid identification, morphological and molecular identification, occurrence and molecular identification, etc.) are discussed briefly through the reviewed articles.


The rest of the paper is organized as follows. Section 2 briefly presents related work on the detection, identification, quantification, and classification of pests and infested plants, and Sect. 3 presents our proposed system. The paper ends with our conclusion.

2 Related Works

The literature reviewed covers two sides: (1) image-based insect identification and classification and (2) trap-based insect identification and classification.

Chiwamba et al. [2] proposed an automated application that identifies and captures pest moths in the field using machine learning techniques. They reviewed literature on pheromone traps, object detection, and recognition to show the importance of the proposed model, whose methodology uses both a CNN and motion sensors.

Ishengoma et al. [3] combined unmanned aerial vehicle (UAV) technology, used to capture plant leaves autonomously, with a hybrid CNN model to speed up the detection of pests on infested plant leaves. The hybrid CNN has a parallel structure specifically designed to take advantage of the benefits of two individual models, VGG16 and InceptionV3. The study compares the proposed model, in terms of accuracy and training time, with four existing CNN models: VGG16, InceptionV3, XceptionNet, and ResNet50. The results show that the hybrid model reduces training time by 16–44% relative to the existing models while achieving the highest accuracy, 96.98%. The authors thus proposed a framework for quicker detection of infested plants with UAV-based images and a CNN as key components.

Agarwal et al. [4] discussed a LAMP assay for rapid identification of the most invasive pests. These diagnostic tests identify pests so that farmers can be assisted in pest management. The authors discussed the tree of pest samples, DNA extraction, and sequencing at all stages of the pest. Large amounts of data were processed to achieve rapid identification.

Timilsena et al. [5] presented a paper on the potential distribution of pests in Africa and beyond, considering climate change and irrigation patterns. They used the existing CLIMEX model to assess the pest. Given the risk and threat the pest poses to plants, the effect of climate change and irrigation patterns was studied with a view to reducing pest infestation.

Congdon et al. [6] presented an in-field capable loop-mediated isothermal amplification (LAMP) assay for pest detection, intended to avoid losses for farmers and to ensure that correct management decisions are taken to control the pests. The in-field identification uses specimen collection, total DNA and crude extraction, and LAMP assay development as materials and methods, and results in accurate detection of pests.


Sattar et al. [7] presented a network architecture for deploying e-services-based smart agro systems. They discussed the effects of climate change and the food production rate. The proposed model includes a mobile application that lets the client manage watering. The application helps farmers schedule the farm’s irrigation system using parameters such as moisture content, pressure, temperature, and motion. The paper focuses on how IoT devices can be applied to agriculture. Yainna et al. [8] presented a paper on geographic monitoring of insecticide resistance mutations in pests. Fields where the pests damage a wide range of crops were detected and monitored. Their information will help in investigating the causes and consequences of insecticide resistance where the pests are damaging crops. Tessnow et al. [9] discussed a species composed of two morphologically identical but genetically distinct host strains, known as the corn and rice strains, which can complicate pest management. They used novel real-time PCR-based assays that differentiate the pest strains using four single nucleotide polymorphisms, so that pests can be identified at an early stage together with the information on how the infestation could affect the plant. Prabha et al. [10] presented an artificial intelligence powered expert system model, built on a convolutional neural network (CNN), for identifying pest infestation in maize. Modern object identification techniques were used, covering image acquisition, pre-processing, data augmentation, feature extraction, classification, and image analysis. Every layer of the CNN architecture was used to develop the system, which automatically identifies pest infestation on plants. Ishengoma et al. [11] presented an automatic recognition model to detect plant leaves that have been infected by pests. The study aims to increase yield and profit for farmers while reducing input cost and time. Four CNN models were investigated: VGG16, VGG19, InceptionV3, and MobileNetV2. Unmanned aerial vehicle (UAV) remote-sensing technologies were used to capture plant leaves, and Shi–Tomasi corner detection techniques were used for simulation. Both original and modified images were used. The accuracy of VGG16, VGG19, InceptionV3, and MobileNetV2 increased from 96%, 93.08%, 96.75%, and 98.25% to 99.92%, 99.67%, 100%, and 100%, respectively. Yousaf et al. [12] presented a paper on the occurrence and molecular identification of a pest, based on a survey in Sindh and Khyber Pakhtunkhwa, Pakistan. They observed the maximum impact of pest infestation on plant leaves, which results in loss of produce. They used COI gene sequences as a marker for the pest, with DNA extraction and PCR amplification for molecular identification. Gomes et al. [13] presented a workflow for detecting pest attack in cotton plants with the help of machine learning and spectral measurement. They carried out data acquisition and organization, and the resulting wavelength measurements were submitted to different machine learning models. Machine learning models were compared by measuring robustness and applying a

Pest Detection and Identification in Infested Plants Using Digital Images …

ranking approach with clustering methods. A band simulation process was used in a theoretical model application to indicate the expected spectral behavior. The outcome is a ranking and clustering approach for affected cotton plants. Pearson et al. [14] submitted a non-confidential technical report at Rothamsted Research. Their project was designed to test three technologies for monitoring the pest over a 12-month period, particularly in Kenya: preliminary radar data, a network of digital pheromone traps, and an image detection algorithm to identify the particular pest. The project underlines the importance of early pest detection and demonstrates the potential of digital monitoring to deliver major impacts for farmers. Chamara et al. [15] reviewed recent advances in computer science applied to agriculture, especially in the area of global food security. The paper identified uses of AI in the area of food availability and provided a table of applications directly aimed at improving food production. While focusing on the role of AI in global food security, they also reviewed papers in which ANN models capture other variables that affect agricultural production, such as plant infestation by pests. Sah et al. [16] reviewed the available information and literature, together with initiatives taken by governmental and non-governmental organizations, on identifying the threat of a single pest and its management in Nepal. The review stresses how severely a single pest can harm plants, and in turn farming communities, and therefore how important timely management of infestation is. Mrisho et al. [17] discuss the development of smartphone-based technologies for diagnosing disease and pest damage on plants, which requires input from experts who understand the phenotypes of the diseases and pests. Their study examined the expertise of the experts who generated the datasets used to develop an object detection model known as PlantVillage Nuru, which was developed as a diagnosis and training tool. The symptom recognition capability of PlantVillage Nuru was compared to that of the experts and of its intended users to determine the effectiveness of the app. The app is designed to support timely control of pest infestation through digital pest management. Mahat et al. [18] reported that some pests feed on more than 300 plant species. They used a reference dataset to analyze pest samples. Sample identification and DNA barcoding were used to detect and correctly identify the pest. The large losses caused by the infestation were discussed for different plants. Tsai et al. [19] presented a paper on rapid identification of an invasive pest using species-specific primers in multiplex PCR. The methods comprised sample collection, DNA extraction, amplification, sequencing, sequence analysis, primer design, and multiplex PCR. The study aimed to prevent severe economic damage across different regions of the world. Gharte Sneha and Bagal [20] discussed the importance of detecting plant disease using image processing techniques. Most plant diseases are caused by bacteria, fungi, and viruses, which are in some cases associated with pests. They concluded

that automatic detection of plant disease is necessary. Computer vision techniques can be used to uncover the affected spots in an image through image processing capable of recognizing plant lesion features. They proposed a vision-based algorithm for detecting plant defects. Chulu et al. [21] proposed a machine learning-based system that automatically identifies and monitors the pest. The study focused on the data collection process so that pests (moths) can be identified at an early stage, helping farmers prevent production losses in the field. They used pheromone traps to capture the pest, and the collected dataset was used for object detection. ANN layers were used to develop a Web-based model, and image identification and classification were used to identify the object, i.e., the pest. The authors expect accuracy to improve as more images are collected. Chiwamba et al. [22] presented machine learning algorithms for automated image capture and identification of the pest. They used transfer learning to retrain the InceptionV3 model in TensorFlow on the insect dataset, which reduces training time and improves the accuracy of insect identification. Two classes were used to identify the insects. Experiments on the dataset yielded a train accuracy of 45–60%, a cross entropy of 70–80%, and a validation accuracy of 34–50%. Jing et al. [23] studied the initial detection and spread of the pest in China and compared it with similar pests found in the same fields, using molecular techniques. They compared initial and subsequent detections of the pest, following a sequencing and identification method. The pest studied is highly harmful and can damage over 80 species. Chulu et al. [24] presented a paper on identification and classification of the pest using a convolutional neural network. They used the TensorFlow deep learning framework, and the model was built from different CNN layers and Google’s pre-trained InceptionV3 model. They showed that identification and classification of the pest can be automated to prevent harm. Chiwamba et al. [25] proposed an automated pest pheromone trap based on machine learning. They noted that in recent years pests have increasingly infested the leaves of some plants, so controlling them is essential for farmers to reduce yield loss. They used an IoT architecture and Google’s pre-trained InceptionV3 machine learning model for image classification. Adopting this technology reduces farmers’ monitoring labor, which the authors regard as a major contribution to the agriculture sector. Bhavani et al. [26] presented a paper on morphological and molecular identification of harmful pests on sugarcane in Andhra Pradesh, India. They surveyed different experimental plots, where the maximum incidence was recorded on young plants. The pest needs to be identified at an early point so that the cure can be

applied quickly. They used morphological identification, molecular identification, DNA extraction, and PCR amplification to characterize the particular pests. Ateya [27] submitted an IoT-based approach to identifying pests on crops, built around a prediction model whose components are IoT system components, an ANN model, and a server-side application. Given the importance of pest detection, the project focuses on issuing warning messages and recommendations to farmers on the best course of action. Chulu et al. [28] proposed a fully automated pest identification and early warning tool based on ANN techniques. The tool aims to improve the accuracy and efficiency of pest monitoring while reducing manual data collection and hence human intervention. They proposed an architecture that relies on cellular network connectivity to detect and identify pests at an early stage. Hetzroni et al. [29] presented an algorithm that uses neural networks to detect zinc, iron, and nitrogen deficits in plants and so determine their condition. Their methodology monitors plant health by observing lettuce leaves. The algorithm first converts the analog image captured by an analog camera to a digital image, then segments the image into background and leaf. After the required features are extracted, they are fed into the analysis phase, which consists of neural networks and statistical classifiers. Vibhute and Bodhe [30] surveyed applications of image processing in agriculture, such as imaging techniques, weed detection, and fruit grading. The survey covered 38 papers and discussed how different image processing techniques can be applied to plant images.
From the literature studied, it is concluded that image processing has proved to be an effective machine vision approach for the agriculture domain and can help in identifying pests on plants. Al Bashish et al. [31] proposed a method to detect five plant diseases by extracting a number of texture and color features. The K-means clustering algorithm was used to divide the image into four clusters. Lewter and Szalanski [32] discussed the most dangerous pest for plants in the US, which is economically harmful to productivity. They studied different patterns, fragments, and DNA of pests to identify and detect them and so avoid further infestation. Pheromone techniques were used for the same purpose. In their results and discussion, they listed six species of harmful pests in the US. Sena Jr et al. [33] presented a paper on identifying pest-infested plants using digital images. They proposed a two-stage algorithm consisting of image processing and image analysis. The first stage converts the image to grayscale and then to a binary image; at this stage, infested leaves are segmented from other leaves at the pixel level. In the second stage, the image is subdivided into blocks, which are classified as ‘damaged’ or ‘non-damaged’ depending on the number of objects found in each block.

The accuracy of this algorithm was 94.72%. The objective of this precision agriculture approach is to maximize profit and reduce environmental damage. Sena et al. [34] developed a method for detecting diseases on plant leaves that uses a preset threshold value. The method has two stages: image processing and image analysis. First, the image is converted to grayscale, filtered, and thresholded to remove noise. At the analysis stage, the image is divided into twelve blocks, and blocks in which leaves cover less than 5% of the total area are discarded. The number of connected objects (n) representing diseased areas is then counted for each remaining block. Levy et al. [35] developed a simple method to analyze the two morphologically indistinguishable host-associated strains of the pest. Their materials and methods comprised PCR primers, insect and cell line sources, DNA isolation, PCR-RFLP analysis, and DNA sequencing, and resulted in a set of primers to identify the particular pest. The paper shows the importance of identifying pests at the right time to avoid crop loss. GoMicro [36] is a company based in Australia that is bringing phone-based AI assessment to agriculture to detect and identify pests at an early stage. It reported the accuracy of its AI model for pest infestation using a confusion matrix, and shared the results of a validation exercise carried out with pests by researchers in England, India, and Kenya. The validation images gave an overall accuracy of 99.27% in England and 97.4% in India.
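The block-based analysis attributed above to Sena et al. [33, 34] can be illustrated with a small sketch. It is not the authors' implementation. It assumes the image has already been thresholded into a grid of pixel labels (background, healthy leaf, damaged leaf), that the twelve blocks form a 3 × 4 layout, that the 5% rule is measured against the block area, and that objects are 4-connected. The names `count_objects`, `classify_blocks`, and the `min_objects` cut-off are hypothetical.

```python
from collections import deque

# Assumed pixel labels after grayscale conversion and thresholding.
BACKGROUND, LEAF, DAMAGE = 0, 1, 2


def count_objects(block):
    """Count 4-connected groups of DAMAGE pixels in a 2D list."""
    rows, cols = len(block), len(block[0])
    seen = [[False] * cols for _ in range(rows)]
    objects = 0
    for r in range(rows):
        for c in range(cols):
            if block[r][c] != DAMAGE or seen[r][c]:
                continue
            objects += 1  # new, unvisited damaged region
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and not seen[ny][nx] and block[ny][nx] == DAMAGE:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return objects


def classify_blocks(image, grid=(3, 4), min_leaf_fraction=0.05, min_objects=2):
    """Split the image into grid blocks and label each retained block.

    Blocks with too little leaf area are skipped. Pixels beyond an even
    split of the image are ignored in this sketch.
    """
    block_h = len(image) // grid[0]
    block_w = len(image[0]) // grid[1]
    results = {}
    for by in range(grid[0]):
        for bx in range(grid[1]):
            block = [row[bx * block_w:(bx + 1) * block_w]
                     for row in image[by * block_h:(by + 1) * block_h]]
            leaf_pixels = sum(1 for row in block for p in row if p != BACKGROUND)
            if leaf_pixels < min_leaf_fraction * block_h * block_w:
                continue  # discarded: not enough leaf in this block
            n = count_objects(block)
            label = "damaged" if n >= min_objects else "non-damaged"
            results[(by, bx)] = (n, label)
    return results
```

Calling `classify_blocks(image)` returns a dictionary keyed by block position, so discarded blocks are simply absent from the result.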

3 Proposed System

The proposed study is designed for early detection and identification of pests using digital images. It reduces manual work, which is expected to improve accuracy. The study ranges from manual traps to the development of Web and mobile applications that stakeholders can use for data analysis and reporting. Figure 1 shows a simple model of the proposed data flow. First, data are collected at the trap site, in the form of images, by modified and automated pheromone traps. Depending on the cellular network connectivity at that time, the collected data are sent to the cloud server for processing. Once the images are received, the image identification system identifies and classifies them as pest and loads the data into the database. The relevant stakeholders can then access the data via the Web or mobile interfaces that we will provide.
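The data flow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the system's implementation: the class names `TrapNode` and `CloudServer`, the methods `sync` and `records_for`, and the stand-in classifier are all hypothetical, and a plain list stands in for the database.

```python
from collections import deque


class CloudServer:
    """Hypothetical server: classifies received images and stores records."""

    def __init__(self, classifier):
        self.classifier = classifier  # callable: image bytes -> label
        self.database = []            # stand-in for a real database table

    def receive(self, trap_id, image):
        record = {"trap_id": trap_id, "label": self.classifier(image)}
        self.database.append(record)
        return record

    def records_for(self, trap_id):
        """What a Web or mobile client might query for one trap."""
        return [r for r in self.database if r["trap_id"] == trap_id]


class TrapNode:
    """Hypothetical trap: buffers images until the network is available."""

    def __init__(self, trap_id):
        self.trap_id = trap_id
        self.pending = deque()

    def capture(self, image):
        self.pending.append(image)

    def sync(self, server, connected):
        """Upload buffered images if connected; return how many were sent."""
        sent = 0
        while connected and self.pending:
            server.receive(self.trap_id, self.pending.popleft())
            sent += 1
        return sent
```

Buffering at the trap is one way to handle the dependence on cellular connectivity: images captured while offline stay in `pending` and are uploaded on the next successful `sync`.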

Fig. 1 Proposed model

4 Conclusion

The impact of pest infestation on global food security should not be underestimated. It affects people everywhere, and if we do nothing about it, the world risks going hungry. This motivated our study of the literature. Insects are a real threat to food and nutrition security and, if not controlled, are likely to affect the livelihoods of small-scale and commercial farmers. The main problem every stakeholder faces is monitoring and early warning of pest occurrence. Monitoring is usually manual and involves weekly field visits, which delays the process. For this reason, we proposed a model that uses low-cost technology to collect data in the field and send it to a cloud server, where it is easily accessible via a Web interface or mobile application for analysis and reporting. Once successfully implemented, the system will give stakeholders a platform for obtaining the necessary information on pest occurrence or infestation in the field and for making informed, viable decisions. The wide variety of applications for counting objects in digital images makes it difficult to survey all the useful ideas in the literature, so potential solutions to problematic issues can be missed. In this context, this paper has tried to present a comprehensive survey of the subject as a starting point for those conducting research on the issue. Because of the large number of references, the descriptions are short and give a quick overview of the ideas underlying each solution. It is important to highlight that the work on the subject is not limited to what was shown here. Many papers on

the subject could not be included in order to keep the paper length under control; the papers were selected so as to cover as many different problems as possible. Readers who wish to attain a more complete understanding of a given application or problem can refer to the bibliographies of the respective articles.

References

1. Bock CH, Poole GH, Parker PE, Gottwald TR (2010) Plant disease severity estimated visually, by digital photography and image analysis, and by hyperspectral imaging. Crit Rev Plant Sci 29(2):59–107
2. Chiwamba SH, Phiri J, Nkunika POY, Nyirenda M, Kabemba MM (2018) An application of machine learning algorithms in automated identification and capturing of fall armyworm (FAW) moths in the field. In: ICICT 2018, Lusaka, Zambia
3. Ishengoma FS, Rai IA, Ngoga SR (2022) Hybrid convolution neural network model for a quicker detection of infested maize plants with fall armyworms using UAV-based images. Ecol Inform 67(2022):101502
4. Agarwal A, Rako L, Schutze MK, Starkie ML, Tay WT, Rodoni BC, Blacket MJ (2022) A diagnostic LAMP assay for rapid identification of an invasive plant pest, fall armyworm Spodoptera frugiperda (Lepidoptera: Noctuidae). Australia
5. Timilsena BP, Nissy S, Kimathi E, Abdel-Rahman EM, Sei-Admas I, Wamalwa M, Tonnang HEZ, Ekesi S, Hughes DP, Rajotte EG, Subramanian S (2022) Potential distribution of fall armyworm in Africa and beyond, considering climate change and irrigation patterns. www.nature.com/scientificreports
6. Congdon BS, Webster CG, Severtson D, Spafford H (2021) In-field capable loop-mediated isothermal amplification detection of fall armyworm (Spodoptera frugiperda; Lepidoptera: Noctuidae) larvae using a rapid and simple crude extraction technique. bioRxiv preprint. https://doi.org/10.1101/2021.08.09.455740
7. Sattar A, Shampod YA, Ahmed MT, Akter N, Mahmud A (2022) Deployment of e-services based contextual smart agro system using internet of things. Bull Electr Eng Inform 11(1):414–425. ISSN: 2302-9285. https://doi.org/10.11591/eei.v11i1.3255
8. Yainna S, Nègre N, Silvie PJ, Brévault T, Tay WT, Gordon K, d’Alençon E, Walsh T, Nam K (2021) Geographic monitoring of insecticide resistance mutations in native and invasive populations of the fall armyworm. Insects
9. Tessnow AE, Gilligan TM, Burkness E, De Bortoli CP, Jurat-Fuentes JL, Porter P, Sekula D, Sword GA (2021) Novel real-time PCR based assays for differentiating fall armyworm strains using four single nucleotide polymorphisms. PeerJ. https://doi.org/10.7717/peerj.12195
10. Prabha R, Kennedy JS, Vanitha G, Sathiah N, Banu Priya M (2021) Artificial intelligence powered expert system model for identifying fall armyworm infestation in maize (Zea mays L.). J Appl Neural Sci 13(4):1339–1349. ISSN: 0974-9411 (Print), 2231-5209 (Online)
11. Ishengoma FS, Rai IA, Said RN (2021) Identification of maize leaves infected by fall armyworm using UAV-based imagery and convolutional neural networks. Comput Electron Agric 184(2021):106124. www.elsevier.com/locate/compag
12. Yousaf S, Raheman A, Masood M, Ali K, Suleman N (2022) Occurrence and molecular identification of an invasive rice strain of fall armyworm Spodoptera frugiperda (Lepidoptera: Noctuidae) from Sindh, Pakistan, using mitochondrial cytochrome c oxidase I gene sequences. J Plant Dis Prot 129:71–78. https://doi.org/10.1007/s41348-021-00548-6, PAKISTAN
13. Gomes FDG, Pinheiro MMF, Ramos APM, Furuya DEG, Liesenberg V, de Castro Jorge LA, Alaumann RA, Gonçalves WN, Junior JM, Michereff MFF, Borges M, Blassioli-Moraes MC, Osco LP (2021) Detecting the attack of the fall armyworm (Spodoptera frugiperda) in cotton plants with machine learning and spectral measurements. https://doi.org/10.20944/preprints202102.0516.v1. Preprints www.preprints.org
14. Pearson AJ, Bell JR, Subramanian S, Ouma K (2020) Smart armyworm surveillance: project technical report. Rothamsted Research, Harpenden, Herts
15. Chamara RMSR, Senevirathne SMP, Samarasinghe SAILN, Premasiri MWRC, Sandaruwani KHC, Dissanayake DMNN, De Silva SHNP, Ariyaratne WMTP, Marambe B (2020) Role of artificial intelligence in achieving global food security: a promising technology for future. SL J Food Agric 6(2):43–70. https://doi.org/10.4038/sljfa.v6i2.88
16. Sah LP, Lamichhaney D, Kc HB, Acharya MC, Humagain SP, Bhandari G, Muniappan R (2020) Fall armyworm (Spodoptera frugiperda) in maize: current status and collaborative efforts for its management in Nepal. J Plant Prot Soc 6
17. Mrisho LM, Mbilinyi NA, Ndalahwa M, Ramcharan AM, Kehs AK, McCloskey PC, Murithi H, Hughes DP, Legg JP (2020) Accuracy of a smartphone-based object detection model, PlantVillage Nuru, in identifying the foliar symptoms of the viral diseases of cassava–CMD and CBSD. Front Plant Sci 11
18. Mahat K, Mitchell A, Zangpo T (2020) An updated global COI barcode reference data set for fall armyworm (Spodoptera frugiperda) and first record of this species in Bhutan. Elsevier. https://www.elsevier.com/open-access/userlicense/1.0/
19. Tsai C-L, Chu I-H, Chou M-H, Chareonviriyaphap T, Chiang M-Y, Lin P-A, Lu K-H, Yeh W-B (2020) Rapid identification of the invasive fall armyworm Spodoptera frugiperda (Lepidoptera, Noctuidae) using species-specific primers in multiplex PCR. Sci Rep. www.nature.com/scientificreports
20. Gharte Sneha H, Bagal SB (2021) Detection of plant leaf disease using image processing. bioRxiv preprint. https://doi.org/10.1101/2021.08.09.455740
21. Chulu F, Phiri J, Nyirenda M, Kabemba MM, Nkunika P, Chiwamba S (2019) Developing an automatic identification and early warning and monitoring web based system of fall armyworm based on machine learning in developing countries. Zambia Inf Commun Technol (ICT) J 3(1):13–20
22. Chiwamba SH, Phiri J, Nkunika POY, Nyirenda M, Kabemba MM, Sohati PH (2019) Machine learning algorithms for automated image capture and identification of fall armyworm (FAW) moths. Zambia Inf Commun Technol (ICT) J 3(1):1–4
23. Jing D-P, Guo J-F, Jiang Y-Y, Zhao J-Z, Sethi A, He K-L, Wang Z-Y (2019) Initial detection and spread of invasive Spodoptera frugiperda in China and comparisons with other noctuid larvae in cornfields using molecular techniques. Insect Sci 1–11. https://doi.org/10.1111/17447917.12700
24. Chulu F, Phiri J, Nkunika POY, Nyirenda M, Kabemba MM, Sohati PH (2019) A convolutional neural network for automatic identification and classification of fall armyworm moth. Int J Adv Comput Sci Appl (IJACSA) 10(7)
25. Chiwamba SH, Phiri J, Nkunika POY, Sikasote C, Kabemba MM, Moonga MN (2020) Automated fall armyworm (Spodoptera frugiperda, J.E. Smith) pheromone trap based on machine learning. Agric Ecosyst Environ 292
26. Bhavani B, Chandra Sekhar V, Kishore Varma P, Bharatha Lakshmi M, Jamuna P, Swapna B (2019) Morphological and molecular identification of an invasive insect pest, fall army worm, Spodoptera frugiperda occurring on sugarcane in Andhra Pradesh, India. J Entomol Zool Stud 7(4):12–18
27. Ateya SM (2018) Fall armyworm prediction model on the maize crop in Kenya: an internet of things based approach. Faculty of Information Technology (FIT), Strathmore University, Kenya
28. Chulu F, Phiri J, Nkunika POY, Nyirenda M, Kabemba MM, Moonga MN (2018) Developing an automated fall armyworm (FAW) identification and early warning and monitoring system based on ANN techniques. In: Proceedings of the ICTSZ international conference in ICTS (ICICT2018), Lusaka, Zambia, 12–13 Dec 2018
29. Hetzroni A, Miles GE, Engel BA, Hammer PA, Latin RX (1994) Machine vision monitoring of plant health. Adv Space Res 14(11):203–212
30. Vibhute A, Bodhe SK (2012) Applications of image processing in agriculture: a survey. Int J Comput Appl 52(2). ISSN: 0975-8887
31. Al Bashish D, Braik M, Bani-Ahmad S (2010) A framework for detection and classification of plant leaf and stem diseases. In: International conference on signal and image processing. IEEE, Chennai, pp 113–118
32. Lewter JA, Szalanski AL (2007) Molecular identification of the fall armyworm Spodoptera frugiperda (J.E. Smith) (Lepidoptera: Noctuidae) using PCR-RFLP. J Agric Urban Entomol 24(2):51–57
33. Sena Jr DG, Pinto FAC, Queiroz DM, Viana PA (2003) Fall armyworm damage to maize plant identification using digital images. Science Direct
34. Sena DJ, Pinto F, Queiroz D, Viana P (2003) Fall armyworm damaged maize plant identification using digital images. Biosyst Eng 85(4):449–454
35. Levy HC, Garcia-Maruniak A, Maruniak JE (2022) Strain identification of Spodoptera frugiperda (Lepidoptera: Noctuidae) insects and cell line: PCR-RFLP of cytochrome oxidase C subunit I gene. BioOne Complete Florida Entomological Society, Florida
36. Carter R, Deshmukh S, Anyanda G. FAW AI validation. GoMicro

Narrative Paragraph Generation for Photo Stream Using Neural Networks

M. N. Anjali, Tejash More, Kumari Misa, and Keshab Nath

Abstract Humans have the innate ability to perceive an image just by looking at it; for us, images are not just a collection of objects but a network of interconnected object relationships. The problem arises when a machine tries to inspect an image, so we try to convert image data to textual data. Despite major achievements in the image captioning field, there is a lack of models that provide concise captions of a given image; moreover, existing models are large, with a very high number of learnable parameters. The objective of this paper is to fill that gap: we propose a model that incorporates an advanced deep convolutional neural network to extract image features and an attention GRU with a local attention network to generate captions. We also identified a class imbalance problem in the Flickr dataset and rectified it by adding images of some specific classes. The model has been trained on our improved Flickr dataset.

Keywords Image captioning · Visual narration · Deep learning · Natural language generation · Generative adversarial network

M. N. Anjali · T. More · K. Misa · K. Nath (B)
Department of Computer Science and Engineering, Indian Institute of Information Technology Kottayam, Kottayam 686635, Kerala, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 686, https://doi.org/10.1007/978-981-99-2100-3_20

1 Introduction

A single image can convey many different ideas. Humans have an outstanding ability to summarize the ideas associated with different parts of an image and to link image regions with their descriptions. While humans perform this seamlessly, expressing an image in natural language is a difficult task for machines. Image caption generation has recently surged with advances in neural machine translation (NMT) [1] and larger datasets [2, 3]. Many image captioning models use an encoder-decoder pipeline. For sequence-to-sequence learning, many encoder-decoder frameworks based on recurrent neural networks have been introduced. Recurrent neural networks (RNNs) [4], as well as long short-term memory (LSTM) networks, can be sequence learners. Because of the vanishing gradient problem, RNNs can only remember earlier states for a few time steps [5]. The LSTM network is a special type of RNN architecture designed to overcome this problem by introducing a memory cell. Each memory cell consists of three gates, a neuron, and a self-recurrent connection. These gates allow memory cells to keep and access data for an extended period, making the LSTM network capable of learning long-term dependencies. Although the memory cells of LSTM models aim to retain old data for the long term, in practice they are still bounded to a limited number of time steps, because long-term information is gradually diluted at every time step. To enhance the hierarchical structure of traditional image captioning models, we add a deep convolutional neural network (CNN) as an image encoder to extract the features from sequences. We adopted the Xception [6] model for the encoder, which is a width-based multi-connection convolutional neural network. Unlike existing image captioning models, we use an advanced CNN model and an attention GRU instead of an LSTM. To summarize, our primary contribution lies in using a deep CNN architecture, Xception, which is relatively small and has fewer parameters, to extract features from the images, while caption generation is handled by an RNN incorporating an attention GRU. We aim to provide a more accurate and smaller image captioning model trained on the Flickr8k dataset. We also provide a custom dataset in which we attempted to balance the human class of the Flickr8k dataset by adding about 700 images.
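The gating described above can be written out explicitly. The following is the commonly used LSTM formulation, given here only as a reading aid. The symbols (input $x_t$, hidden state $h_t$, cell state $c_t$, weight matrices $W$ and $U$, biases $b$, logistic sigmoid $\sigma$, element-wise product $\odot$) are notation assumed for this sketch and are not taken from the paper.

```latex
\begin{aligned}
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{(input gate)}\\
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{(forget gate)}\\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{(output gate)}\\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) && \text{(candidate cell state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(self-recurrent cell update)}\\
h_t &= o_t \odot \tanh\left(c_t\right) && \text{(hidden state)}
\end{aligned}
```

Because $c_t$ is obtained by scaling $c_{t-1}$ with $f_t \in (0, 1)$ at every step, old information fades gradually, which is consistent with the dilution of long-term information mentioned in the text.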

2 Related Work

Creating natural language descriptions of images has become a central topic in computer vision. A traditional way to generate descriptions using neural networks is to formulate the task as a retrieval and ranking problem [7]. The main drawbacks of retrieval-based techniques are that they are time-consuming and fail to provide good descriptions for new sets of image objects [5]. Inspired by the success of deep neural networks, researchers have framed image captioning as a machine translation problem within the encoder-decoder framework, generating a description rather than translating; the purpose of image captioning is thus to understand an image and describe it in a sentence. Vinyals [5] introduced the first neural network approach to captioning images, in which an encoder-decoder system is trained to optimize the log-likelihood of the target image description. Similarly, Mao [8] and Donahue [9] use a multi-modal fusion layer to fuse image features and word representations at each time step. In both cases [8, 9], the captions are derived from whole images, whereas the image captioning model proposed by Karpathy [10] provides descriptions based on regions. Following this work, Johnson [10] developed a method to jointly locate regions and describe each with a caption.

Recent image captioning models incorporate a convolutional neural network (CNN) for image feature extraction and a recurrent neural network for caption generation. The model proposed by Grishma et al. [11] uses VGG16 (Visual Geometry Group), a 16-layer convolutional network, for object detection and LSTM and GRU for generating captions from images. According to [12], a CNN is a kind of feedforward neural network that extracts features from data with convolution structures; unlike traditional feature extraction methods, it does not require features to be extracted manually. That paper presents a comprehensive study of CNN architectures, both classic and advanced, and was substantial in identifying the best CNN model for our architecture. Some researchers have looked at the topology of networks to describe the link between visuals and descriptions directly or implicitly [13–15]. Xu et al. [14] use “hard” and “soft” attention algorithms to incorporate spatial attention on image convolutional features into the encoder-decoder architecture. The method of Yang et al. [16] uses a validation network to improve attentional mechanisms, while the approach of Liu et al. [15] aims to improve the accuracy of visual attention. Yang et al. [17] compared the performance of two deep learning models, long short-term memory (LSTM) and the gated recurrent unit (GRU), along two dimensions: dataset size and quantitative evaluation. They found that, in terms of training speed, GRU is 29.29% faster than LSTM when processing the same dataset. Hence, we use an attention GRU instead of an LSTM for image captioning in our architecture. Chollet [6] offers a new perspective on depth-wise separable convolutions in the Xception architecture. In this architecture, the data passes first through the entry flow, then eight times through the middle flow, and finally once through the exit flow. Batch normalization follows every convolution and separable convolution layer, and the depth multiplier for all separable convolution layers is 1.
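To illustrate why depth-wise separable convolutions keep a network such as Xception compact, the following minimal sketch counts the weights of a standard convolution and of a separable one with depth multiplier 1. The function names and the layer size (3 x 3 kernel, 728 input and output channels) are illustrative assumptions, not figures reported in this paper.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a k x k standard convolution (bias omitted)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out, depth_multiplier=1):
    """Weights in a depth-wise separable convolution (bias omitted).

    Depth-wise step: one k x k filter per input channel (times the multiplier).
    Point-wise step: a 1 x 1 convolution that mixes the channels.
    """
    depthwise = k * k * c_in * depth_multiplier
    pointwise = c_in * depth_multiplier * c_out
    return depthwise + pointwise


# Illustrative layer size (an assumption chosen for the example).
k, c_in, c_out = 3, 728, 728
std = standard_conv_params(k, c_in, c_out)
sep = separable_conv_params(k, c_in, c_out)
print("standard:", std, "separable:", sep, "ratio:", round(std / sep, 2))
```

For a 3 x 3 kernel the separable form needs roughly one ninth of the weights once the channel count is large, because the spatial filtering and the channel mixing are factored into two cheap steps.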

3 Proposed Architecture

The proposed architecture, “visual narration”, generates a caption for a given image by combining a deep CNN (Xception) with an attention GRU. Figure 1 shows the architecture framework of our model. The Xception convolutional network extracts and encodes features from the image. The encoded features are then passed to a sequence decoder, a GRU with an attention mechanism, which generates the caption. The stages of the proposed model are described next.

M. N. Anjali et al.

Fig. 1 Proposed architecture with one deep CNN layer (Xception) to extract image features and sequence decoder (GRU) with attention mechanism to generate captions

3.1 Phase 1: CNN Layer

The first step is to extract features from the images. For this we use the Xception network, which also encodes these features. As per Chollet [6], the feature extraction base of the Xception architecture is composed of 36 convolutional layers. These layers are structured into 14 modules, all of which have linear residual connections around them except for the first and last modules. The Xception architecture can thus be summed up as a linear stack of depth-wise separable convolution layers with residual connections, which makes defining and modifying the architecture relatively simple. For image classification, the convolutional base of Xception is followed by a logistic regression layer. Since our model generates captions rather than classifying images, we have dropped that block, as shown in Fig. 2.

3.2 Phase 2: Recurrent Network with Attention

An attention model allows the decoder, for each new word, to focus on a specific region of the image rather than the whole image. Hence, to generate captions, we have incorporated an attention GRU with the local attention mechanism of Luong et al. [18]. There are two main types of attention networks: local and global. A local attention mechanism attends to only a small subset of words. Because it involves choosing a window of input tokens over which the attention distribution is computed, it is also known as window-based attention. We chose local over global attention because the local attention network is simpler to build and train and is computationally cheaper than the global one.

Fig. 2 Model architecture of Xception [6]
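The window-based idea can be sketched in a few lines. This is a minimal illustration of local attention in the spirit of Luong et al. [18], not the implementation used in this paper: the function names are hypothetical, the alignment scores are assumed to be given, and the Gaussian with standard deviation D/2 follows the predictive variant described in [18].

```python
import math


def local_attention_weights(scores, center, half_width):
    """Attention weights restricted to a window around `center`.

    scores     : one alignment score per encoder position
    center     : position p_t the decoder focuses on at this step
    half_width : D > 0, so the window is [center - D, center + D]
    """
    lo = max(0, center - half_width)
    hi = min(len(scores) - 1, center + half_width)
    window = range(lo, hi + 1)

    # Softmax over the window only; positions outside keep zero weight.
    peak = max(scores[i] for i in window)
    exps = {i: math.exp(scores[i] - peak) for i in window}
    total = sum(exps.values())

    # A Gaussian centred on p_t (sigma = D / 2) favours nearby positions.
    sigma = half_width / 2.0
    weights = [0.0] * len(scores)
    for i in window:
        gauss = math.exp(-((i - center) ** 2) / (2 * sigma ** 2))
        weights[i] = exps[i] / total * gauss
    return weights


def context_vector(features, weights):
    """Weighted sum of the encoder feature vectors."""
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
```

Only the 2D + 1 positions inside the window are scored, which is where the lower cost relative to global attention comes from. Because of the Gaussian factor the weights need not sum to one.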

3.3 Phase 3: BLEU Score and Beam Search

The BLEU algorithm rates the quality of text machine-translated from one natural language to another. The BLEU score, which ranges from 0 to 1, represents how closely the machine-translated text resembles a set of high-quality reference translations. Many NLP and speech recognition models employ beam search as the final decision-making layer to select the best output given target variables such as maximum probability or the next output character. Based on conditional probability, beam search keeps several candidate tokens at each point in a sequence. Through a hyperparameter called beam width, the algorithm can keep any number N of best options. At each decoding step, beam search considers the N best output sequences so far, that is, their preceding words and their probabilities, together with the candidates for the current position. In greedy search, by contrast, we take the best word for each position in the sequence: every point in the output sequence is examined separately, the word with the highest likelihood is chosen, and decoding proceeds with the rest of the phrase without revisiting earlier choices.
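The difference between the two decoders can be seen on a toy example. The sketch below is illustrative only: `TOY_MODEL` is a hypothetical table of next-word probabilities standing in for the trained decoder, and the function names are assumptions.

```python
import math

# Hypothetical next-word distributions keyed by the previous word.
TOY_MODEL = {
    "beginseq": {"a": 0.6, "the": 0.4},
    "a": {"dog": 0.5, "cat": 0.5},
    "the": {"dog": 0.9, "cat": 0.1},
    "dog": {"endseq": 1.0},
    "cat": {"endseq": 1.0},
}


def next_word_probs(sequence):
    """Stand-in for the decoder: depends only on the last word here."""
    return TOY_MODEL[sequence[-1]]


def greedy_search(max_len=10):
    """Pick the single most likely word at every position."""
    seq = ["beginseq"]
    while seq[-1] != "endseq" and len(seq) < max_len:
        probs = next_word_probs(seq)
        seq.append(max(probs, key=probs.get))
    return seq


def beam_search(beam_width=2, max_len=10):
    """Keep the `beam_width` most probable partial sequences at each step."""
    beams = [(0.0, ["beginseq"])]  # (log-probability, sequence)
    for _ in range(max_len):
        candidates = []
        for logp, seq in beams:
            if seq[-1] == "endseq":
                candidates.append((logp, seq))  # finished, carry over unchanged
                continue
            for word, p in next_word_probs(seq).items():
                candidates.append((logp + math.log(p), seq + [word]))
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
        if all(seq[-1] == "endseq" for _, seq in beams):
            break
    return beams[0][1]


print(" ".join(greedy_search()))
print(" ".join(beam_search(beam_width=2)))
```

In this toy table the greedy decoder commits to the locally best first word and ends with a sequence of probability 0.30, while a beam of width 2 keeps the second-best first word alive and finds a sequence of probability 0.36.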

4 Datasets Used

In this experiment, we used the Flickr8k dataset, customized with additional images to a total of around 8776 images, to carry out the implementation. We trained the model with three settings for the number of units in the dense network and in the RNN: 64, 128, and 256 units. Flickr8k, which we used for both training and testing, is a subset of the larger Flickr30k dataset [2, 3]. Flickr30k, a common benchmark for sentence-based image description, has 31K images with 5 captions each. Flickr30k Entities augments the 158k captions from Flickr30k with 244k coreference chains, which link mentions of the same entities across the captions for an image, and associates them with 276k manually annotated bounding boxes. Such annotations are necessary to advance grounded language interpretation and automatic image description. Flickr8k has become a standard dataset for training models on smaller devices with fewer resources. However, Flickr8k has a class imbalance problem, especially across human gender classes (see Fig. 3 (left)). To correct this, we added around 700 images, with 5 captions each, from the boy, girl, and men classes, as illustrated in Fig. 3 (right). We focused on adding images that contain multiple objects so that the additions would not greatly increase the vocabulary of the Flickr8k dataset. GloVe (global vectors for word representation) [19] is used for node and edge feature labels. It is an unsupervised technique for obtaining word vector representations. Training is done on global word-word co-occurrence statistics from a corpus, and the resulting representations exhibit interesting linear substructures of the word vector space. Many versions and sizes of GloVe word embeddings are available. We used the Wikipedia 2014 version, which has 6B tokens, with 200d vectors.1

Fig. 3 (left) The human classes are not balanced in the Flickr8k images. (right) Customized Flickr8k data with 700 added images of girl, boy, and men

Fig. 4 Step-by-step process for caption preparation

1 https://nlp.stanford.edu/projects/glove/

4.1 Caption Preparation

The most important part of any data science project is data preprocessing; likewise, the most important part of any natural language processing project is text data preparation. Machines cannot understand words as humans do, so making a machine understand text is a complicated task. Removing unrelated content from the text is therefore a necessary step, since such content confuses the model about what is to be predicted. To this end, several filters were applied to the raw caption dataset. As shown in Fig. 4, contractions were expanded first. Contractions such as can’t, shouldn’t, and it’s are understandable to humans, but removing their punctuation turns them into cant, shouldnt, and its, which would be treated as different words even though the meaning is the same. Instead of removing these words, we expanded them: can’t becomes cannot, shouldn’t becomes should not, and it’s becomes it is. Second, the whole dataset was converted to lowercase to avoid duplicates that differ only in case. Hyperlinks, if any, were removed, as they are not needed in a caption. Punctuation was removed, as these special characters hinder learning. Digits were removed as unnecessary. Any remaining non-alphanumeric characters were removed along with extra whitespace. After processing, every caption receives two extra tokens, “beginseq” and “endseq”, at the start and the end, respectively. All captions are stored in a dictionary with the image name as key and a list of captions as value. The final prepared captions are shown in Fig. 5. A vocabulary was generated from the captions with threshold = 8, meaning that every word that occurs more than 8 times in the whole dataset is added to the vocabulary. The vocabulary contains 2169 words. Each word in the vocabulary is assigned a unique index, and that index, rather than the word, is sent to the model as input (Fig. 6).

There are 8776 images in all, each with five captions. The data was split into three parts: training (6776 images), validation (1000 images), and testing (1000 images). The data fed into the model consists of image features and captions. At the first step, we predict the second word from the image vector and the first word (i.e., Input = Image1 + “beginseq”; Output = “a”). Next, we predict the third word from the image vector and the first two words (i.e., Input = Image1 + “beginseq a”; Output = “dog”), and so on. Table 1 summarizes the data matrix for one image and its corresponding caption. The sliced caption is fed to the model not as a list of words but as a padded sequence in which each word is replaced by its index, as in Fig. 7. The maximum caption length is taken as 40.

Fig. 5 Preparation of caption dictionary

Fig. 6 Caption data. (left) Vocabulary of size 2169. (right) Dictionary with each word in the vocabulary assigned a unique index
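The cleaning filters and the vocabulary threshold described in this section can be sketched as follows. This is a minimal illustration, not the authors' code: the contraction map is a small hypothetical sample, the function names are assumptions, and lowercasing is applied before contraction expansion only so that the lowercase map matches.

```python
import re
from collections import Counter

# Small illustrative contraction map (an assumption; the paper does not list its own).
CONTRACTIONS = {"can't": "cannot", "shouldn't": "should not", "it's": "it is"}


def clean_caption(text):
    """Apply the caption filters and wrap the result in sequence tokens."""
    text = text.lower().replace("\u2019", "'")   # normalise curly apostrophes
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)         # expand contractions
    text = re.sub(r"https?://\S+", " ", text)    # drop hyperlinks
    text = re.sub(r"[^a-z\s]", " ", text)        # drop punctuation, digits, symbols
    text = re.sub(r"\s+", " ", text).strip()     # collapse extra whitespace
    return "beginseq " + text + " endseq"


def build_vocabulary(captions, threshold=8):
    """Keep every word that occurs more than `threshold` times."""
    counts = Counter(word for caption in captions for word in caption.split())
    return sorted(word for word, n in counts.items() if n > threshold)


def index_words(vocabulary):
    """Assign each word a unique index; 0 is left free for padding."""
    return {word: i for i, word in enumerate(vocabulary, start=1)}


print(clean_caption("It's a dog, can't run 2 miles! http://x.com/a"))
```

Reserving index 0 is a design choice made here so that the padding value used later cannot collide with a real word.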

Table 1 Input data points corresponding to one image and its caption

Image feature vector   Sliced caption               Target word
Image_1                Beginseq                     A
Image_1                Beginseq a                   Dog
Image_1                Beginseq a dog               Is
Image_1                Beginseq a dog is            Running
Image_1                Beginseq a dog is running    Endseq

Fig. 7 Padded caption: each word in the sliced caption is replaced by its index, and if the caption is shorter than the maximum length of 40, it is post-padded with 0
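The slicing in Table 1 and the post-padding in Fig. 7 can be sketched together. The function name and the toy word index are hypothetical, and words missing from the vocabulary are simply skipped here, which is an assumption.

```python
def make_training_pairs(caption, word_to_index, max_len=40):
    """Turn one caption into (padded input indices, target index) pairs."""
    indices = [word_to_index[w] for w in caption.split() if w in word_to_index]
    pairs = []
    for t in range(1, len(indices)):
        prefix = indices[:t][:max_len]                    # words seen so far
        padded = prefix + [0] * (max_len - len(prefix))   # post-pad with 0
        pairs.append((padded, indices[t]))                # next word is the target
    return pairs


# Toy word index for the caption of Table 1 (illustrative values).
word_to_index = {"beginseq": 1, "a": 2, "dog": 3, "is": 4, "running": 5, "endseq": 6}
for padded, target in make_training_pairs(
        "beginseq a dog is running endseq", word_to_index, max_len=8):
    print(padded, "->", target)
```

Each pair would be presented to the model together with the feature vector of the corresponding image, giving the five rows of Table 1 for this caption.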

4.2 Model Parameters

We experimented with several model parameters. Adam was chosen as the optimizer and sparse categorical cross-entropy as the loss. Training ran for 51 epochs with a batch size of 8. The number of neurons in the dense network and the number of RNN units were each chosen from 64, 128, and 256. The model was trained and compared both with and without the attention mechanism.

5 Experimentation and Result

The model was first trained without the attention mechanism for 51 epochs with batch size 8, using three settings: 64, 128, and 256 units, as shown in Table 2. Both sentence decoding strategies, beam search and greedy search, were used and compared. The model was evaluated using BLEU scores with four different weightings (BLEU-1 to BLEU-4). Similarly, the model was trained with the local attention mechanism for 51 epochs with batch size 8. In this experiment, we used two settings, 128 and 256 units, as shown in Table 3. The results produced by our proposed model are presented in Fig. 8.

Table 2 Non-attention-based visual narration model with captions generated by beam and greedy search

Algorithm      RNN units  BLEU-1  BLEU-2  BLEU-3  BLEU-4  Loss
Beam search    64         0.2668  0.1565  0.1064  0.0417  0.745
Beam search    128        0.2694  0.1537  0.1023  0.0360  0.618
Beam search    256        0.2309  0.1273  0.0843  0.0305  0.487
Greedy search  64         0.2596  0.1570  0.1118  0.0480  0.745
Greedy search  128        0.2610  0.1556  0.1079  0.0422  0.618
Greedy search  256        0.2271  0.1318  0.0924  0.0357  0.487

Table 3 Attention-based visual narration model with captions generated by beam and greedy search

Algorithm      RNN units  BLEU-1  BLEU-2  BLEU-3  BLEU-4  Loss
Beam search    128        0.2809  0.1288  0.0700  0.0206  0.419
Beam search    256        0.2594  0.1193  0.0724  0.0250  0.250
Greedy search  128        0.3315  0.1831  0.1253  0.0522  0.304
Greedy search  256        0.3301  0.1874  0.1324  0.0607  0.208

Fig. 8 Output generated by the proposed model with attention (shows both greedy and beam)
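To make the BLEU-1 to BLEU-4 columns concrete, the sketch below computes a sentence-level BLEU score from clipped n-gram precisions and a brevity penalty. The weight tuples are the cumulative uniform weights commonly used for BLEU-1 to BLEU-4; the paper does not state its exact weights, so they are an assumption, as are the function names. No smoothing is applied.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of the candidate against all references."""
    cand = ngrams(candidate, n)
    if not cand:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return clipped / sum(cand.values())


def bleu(candidate, references, weights):
    """Sentence-level BLEU without smoothing."""
    if not candidate:
        return 0.0
    log_sum = 0.0
    for n, w in enumerate(weights, start=1):
        if w == 0:
            continue
        p = modified_precision(candidate, references, n)
        if p == 0:
            return 0.0                      # one empty precision zeroes the score
        log_sum += w * math.log(p)
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_sum)


# Assumed weightings for the four reported scores.
BLEU_WEIGHTS = {
    "BLEU-1": (1.0, 0, 0, 0),
    "BLEU-2": (0.5, 0.5, 0, 0),
    "BLEU-3": (1 / 3, 1 / 3, 1 / 3, 0),
    "BLEU-4": (0.25, 0.25, 0.25, 0.25),
}

candidate = "a dog runs".split()
references = ["a dog is running".split()]
for name, weights in BLEU_WEIGHTS.items():
    print(name, round(bleu(candidate, references, weights), 4))
```

Because higher-order n-grams match less often, scores fall from BLEU-1 to BLEU-4, which is the pattern visible in Tables 2 and 3.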

Table 4 Performance analysis of Xception against other deep convolution networks

Model              Size (MB)  Top-1 accuracy  Top-5 accuracy  Parameters   Depth
VGG16              528        0.713           0.901           138,357,544  23
InceptionV3        92         0.779           0.937           23,851,784   159
ResNet50           98         0.749           0.921           25,636,712   –
InceptionResNetV2  215        0.803           0.953           55,873,736   572
ResNeXt50          96         0.777           0.938           25,097,128   –
Xception           88         0.790           0.945           22,910,980   126

5.1 Performance Analysis of Deep CNNs

To better understand deep CNN architectures and to pick the CNN for our model, we conducted a performance analysis of several deep convolutional neural networks, ranging from depth-wise convolutions to width-based multi-connection convolutions. We compared the size, accuracy, and number of parameters of each network. As Table 4 shows, Xception is the smallest of the compared networks at 88 MB while reaching a top-5 accuracy of 0.945, second only to InceptionResNetV2, which led us to choose Xception as the CNN layer of our model.

6 Conclusion and Future Work

We have presented an image captioning model called visual narration, which leverages the deep convolutional neural network Xception for extracting image features and an attention GRU with local attention for generating captions. This choice significantly reduces the size of the model to 88 MB compared to state-of-the-art architectures and enhances the model’s accuracy. We reduced the loss from 0.745 to 0.419. We also tried to balance the Flickr8k dataset by adding images of specific classes to reduce the class imbalance in the original dataset. Our future work includes replacing the attention GRU with transformers for sentence generation, as transformers such as BERT and GPT provide more humanlike captions.

References

1. Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks. Adv Neural Inf Process Syst 27
2. Plummer BA, Wang L, Cervantes CM, Caicedo JC, Hockenmaier J, Lazebnik S (2015) Flickr30k entities: collecting region-to-phrase correspondences for richer image-to-sentence models. In: Proceedings of the IEEE international conference on computer vision, pp 2641–2649
3. Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) European conference on computer vision. Springer, pp 740–755
4. Sherstinsky A (2020) Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Physica D Nonlinear Phenomena 404:132306
5. Vinyals O, Toshev A, Bengio S, Erhan D (2015) Show and tell: a neural image caption generator. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3156–3164
6. Chollet F (2017) Xception: deep learning with depthwise separable convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1251–1258
7. Hodosh M, Young P, Hockenmaier J (2013) Framing image description as a ranking task: data, models and evaluation metrics. J Artif Intell Res 47:853
8. Mao J, Xu W, Yang Y, Wang J, Huang Z, Yuille A (2014) arXiv preprint arXiv:1412.6632
9. Donahue J, Anne Hendricks L, Guadarrama S, Rohrbach M, Venugopalan S, Saenko K, Darrell T (2015) Long-term recurrent convolutional networks for visual recognition and description. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2625–2634
10. Johnson J, Karpathy A, Fei-Fei L (2016) Densecap: fully convolutional localization networks for dense captioning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4565–4574
11. Kalena P, Malde N, Nair A, Parkar S, Sharma G (2019) Visual image caption generator using deep learning. In: International conference on advances in science and technology
12. Li Z, Liu F, Yang W, Peng S, Zhou J (2021) A survey of convolutional neural networks: analysis, applications, and prospects. IEEE Trans Neural Netw Learn Syst
13. Lu J, Xiong C, Parikh D, Socher R (2017) Knowing when to look: adaptive attention via a visual sentinel for image captioning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 375–383
14. Xu K, Ba J, Kiros R, Cho K, Courville A, Salakhutdinov R, Zemel R, Bengio Y (2015) Show, attend and tell: neural image caption generation with visual attention. In: International conference on machine learning, PMLR, pp 2048–2057
15. Liu C, Mao J, Sha F, Yuille A (2017) Attention correctness in neural image captioning. In: Thirty-first AAAI conference on artificial intelligence
16. Yang Z, Yuan Y, Wu Y, Cohen WW, Salakhutdinov RR (2016) Review networks for caption generation. Adv Neural Inf Process Syst 29
17. Yang S, Yu X, Zhou Y (2020) In: 2020 international workshop on electronic communication and artificial intelligence (IWECAI). IEEE, pp 98–101
18. Luong MT, Pham H, Manning CD (2015) arXiv preprint arXiv:1508.04025
19. Pennington J, Socher R, Manning CD (2014) Glove: global vectors for word representation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 1532–1543

Stroke Disease Prediction Using Adaboost Ensemble Learning Technique

Sreenidhi Ganachari and Srinivasa Rao Battula

Abstract This paper presents an approach to predicting stroke using an improved model based on the AdaBoost algorithm. Suitably trained machine learning algorithms can be of significant use in fields such as medicine and data science to identify and solve problems that have no solution or whose current solutions are not effective. A stroke is a condition in which blood arteries in the brain burst, harming the brain. The severity of a stroke can be reduced by early recognition of its many warning symptoms. Previous works have used various machine learning algorithms to detect stroke. Although many research works in the literature report successful outcomes in preclinical stroke models, there has been no significant contribution to the medical field. Since one weak classifier cannot reliably predict an object’s class by itself, adaptive boosting, also known as AdaBoost, combines numerous weak classifiers through progressive learning to create a strong predictive classifier. Its use of ensemble learning makes it more accurate than the other algorithms discussed in the paper. The dataset comes from Kaggle, and several of its parameters are taken into consideration. Random forest, decision tree, SVM, logistic regression, KNN, Naïve Bayes, and AdaBoost are used to train different models, and the results are compared to find the best model. Among all the algorithms, AdaBoost gives the best precision, 98.8%. Thus, the main aim of this paper is to propose a classifier, the AdaBoost classifier, that medical practitioners can use to detect the ailment in its early stages.

Keywords Ensemble learning · Adaboost algorithm · Stroke prediction · Stroke disease analysis · Machine learning · Healthcare

S. Ganachari (B) · S. R. Battula School of Computer Science and Engineering, Vellore Institute of Technology, Andhra Pradesh, Amaravati, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 686, https://doi.org/10.1007/978-981-99-2100-3_21

S. Ganachari and S. R. Battula

1 Introduction

A stroke, also known as a brain attack, occurs when blood flow to the brain is disrupted, and it is a medical emergency. The brain needs a continuous supply of oxygen and nutrients to operate correctly, so even a brief interruption of blood flow can cause complications. When brain cells die, the functions controlled by the affected part of the brain are lost. For example, a stroke can affect the ability to walk, speak, eat, think, and retain information, to control bowel and bladder function, and to regulate emotions and other critical body functions [1]. Stroke has already become a global epidemic. One in four people over the age of 25 will have a stroke during their lifetime. 13.7 million people will experience their first stroke this year, and 5.5 million of them will die as a result. According to current data, 6.7 million deaths will occur annually if nothing is done. According to the World Stroke Organization, stroke claims 116 million years of healthy life each year, making it the leading cause of death and disability globally. People in countries with few resources are significantly affected: between 2000 and 2008, stroke incidence rates in low- and middle-income countries increased by 20% more than in high-income countries, and today two out of every three stroke victims live in a low- or middle-income nation. The risk of stroke increases with age. Tobacco use, physical inactivity, poor diet, hazardous alcohol use, hypertension, atrial fibrillation, elevated blood cholesterol levels, obesity, male gender, genetic disposition, and psychological variables are all risk factors [2–6]. Based on these factors, we choose an appropriate dataset that supports early detection.

2 Literature Survey

Many researchers have worked on predicting stroke using various machine learning models, but AdaBoost has not been applied to this problem. We propose a novel approach that uses the AdaBoost technique to predict stroke in advance from various factors and thereby help prevent deaths. Jeena and Kumar [7] performed stroke prediction using the SVM algorithm and obtained an accuracy of 91%. They evaluated several kernel functions and obtained the highest accuracy with the linear kernel. However, the dataset they used covers only a small set of parameters; applying the algorithm to a larger dataset could yield a better-performing model. Yu et al. [8] used the random forest algorithm and a deep learning LSTM to predict stroke from EMG bio-signals. Their system predicts stroke in real time and could in the future help to predict the probability of a disease. Although the system predicts the output in real time, it may not be effective for sudden strokes, such as those occurring while sleeping, driving, or walking. It also focuses only on EMG bio-signals and does not take into consideration biometric signal analysis or other aspects of medical knowledge. Nain et al. [9] used the Cardiovascular Health Study (CHS) dataset to predict stroke using five machine learning techniques. To find the most effective approach, the authors employed the decision tree with the C4.5 method, principal component analysis, artificial neural networks, and support vector machine. The CHS dataset, however, has fewer input parameters than the previous analysis. Nwosu et al. [10] systematically investigated risk variables for stroke prediction; in electronic health records, these risk factors are represented as patient characteristics. The authors use principal component analysis to examine the subspace representation of 10 variables in two main components and evaluate the effectiveness of multiple state-of-the-art classification techniques for predicting stroke, including decision trees, random forests, and neural networks. They find that the multi-layer perceptron model performs best, with an accuracy of 75.02%. Because this accuracy is poor, it could be improved, and the impact of using a subset of characteristics on classification accuracy could be investigated. Tazin et al. [11] used four machine learning algorithms to find the most accurate model: random forest, decision tree, voting classifier, and logistic regression gave accuracies of 96%, 94%, 91%, and 79%, respectively. These results can be improved, and other algorithms such as AdaBoost can be used to train a better model.

3 Proposed Methodology

3.1 Proposed System

Figure 1 gives an overview of the proposed system as a block diagram. Several datasets from Kaggle were considered, and after the most appropriate one was selected for model development, the data was pre-processed to obtain the target dataset, as shown in Fig. 1. Models were then trained with the selected machine learning algorithms: random forest, decision tree, SVM, logistic regression, KNN, Naïve Bayes, and AdaBoost. The models are compared using accuracy, precision, recall, and F1-score.
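The four comparison metrics follow directly from the counts of true and false positives and negatives. The sketch below uses the standard definitions; the function name and the toy labels are illustrative and are not results from this paper.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1-score for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels: 1 = stroke, 0 = no stroke.
print(classification_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))
```

On an unbalanced dataset accuracy alone can be misleading, since predicting the majority class for every record already scores highly; precision and recall on the stroke class expose that case.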

Fig. 1 Block diagram of the proposed system

3.2 Dataset

The dataset, from Kaggle, consists of 5110 data samples (rows) and 12 attributes: id, gender, age, hypertension, heart disease, ever married, work type, residence type, average glucose level, BMI, smoking status, and stroke [12]. The id attribute is a unique integer assigned to each row. The gender column has two values, male and female, while age is an integer giving the age of the person. Hypertension and heart disease are integers, where 1 indicates that the patient has the condition and 0 that the patient does not. Ever married is a string with the value yes or no. Work type is also a string, with the categories children, Govt job, never worked, private, and self-employed. The residence type has two values, rural and urban. The average glucose level and BMI attributes are floating-point numbers. The smoking status takes the values formerly smoked, never smoked, smokes, and unknown. The output is given by the stroke attribute, which takes the value 0 or 1: ‘1’ indicates a possibility of stroke, while ‘0’ indicates no stroke detected.

3.3 Pre-processing

Data pre-processing is the preparation of raw data for use in a machine learning model. A dataset may not be clean or balanced, so it must be formatted and cleaned before a machine learning algorithm is applied. Pre-processing includes importing libraries, finding missing data, removing redundant data, and splitting the dataset into training and testing sets. The dataset is unbalanced, with 249 rows indicating a possibility of stroke and 4861 rows indicating none. If the dataset is not balanced, accurate results will not be generated: with too few samples of the minority class, a model cannot properly learn the decision boundary. One approach to this issue is to oversample the minority class. The simplest way is to duplicate minority class examples in the training dataset before fitting a model; this balances the class distribution but gives the model no new information. We therefore resampled the dataset with the SMOTE approach. After selecting a minority class instance a at random, SMOTE finds its k nearest minority class neighbors. It then randomly chooses one of these neighbors, b, and connects a and b with a line segment in feature space. The synthetic instance is generated as a convex combination of the two chosen examples, a and b [3]. The ‘id’ column is redundant and is removed. The ‘BMI’ attribute has null values in 201 rows; these are filled with the mean of the non-null values [13]. Another part of cleaning the data is label encoding, which is applied to attributes whose values are strings. Label encoding converts string values into integer values, since machine learning models can be trained only on numeric inputs. The dataset is split into training and testing data in a ratio of 4:1 (80% and 20%, respectively). After splitting, machine learning algorithms are used to train the model.
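The interpolation step of SMOTE can be sketched as follows. This is a minimal illustration of the idea described above, not the library implementation the authors may have used; the function names, the squared Euclidean distance, and the toy points are assumptions.

```python
import random


def smote_sample(minority, k=5, rng=random):
    """Create one synthetic point from a list of minority-class feature vectors."""
    a = rng.choice(minority)
    # k nearest minority-class neighbours of a (squared Euclidean distance).
    others = [p for p in minority if p is not a]
    others.sort(key=lambda p: sum((x - y) ** 2 for x, y in zip(a, p)))
    b = rng.choice(others[:k])
    # The synthetic point lies on the line segment between a and b.
    gap = rng.random()
    return [x + gap * (y - x) for x, y in zip(a, b)]


def oversample(minority, n_new, k=5, seed=0):
    """Generate `n_new` synthetic minority points with a fixed seed."""
    rng = random.Random(seed)
    return [smote_sample(minority, k, rng) for _ in range(n_new)]


# Toy minority class: the four corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for point in oversample(minority, n_new=3, k=2):
    print(point)
```

Unlike plain duplication, each synthetic point is new, but it always stays inside the region spanned by existing minority examples. The sketch needs at least two minority points.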

3.4 Classification

3.4.1 Decision Tree Classification

Decision trees can be used to solve both regression and classification problems. The method is a non-parametric supervised learning technique that learns a mapping from input variables to their corresponding output variables. To determine the class of a given record, the procedure starts at the root node: it compares the value of the record's attribute with the value at the root, follows the matching branch, and moves to the next node. The non-linear structure of decision trees gives much more freedom to consider, plan for, and anticipate a range of potential outcomes.

3.4.2 K-Nearest Neighbor Algorithm

The K-Nearest Neighbor algorithm is a supervised learning technique that, like decision trees, is used for both classification and regression. It examines the K nearest neighbors of a new data point to estimate its class or continuous value. The technique uses instance-based learning: it predicts the output for unseen data directly from the stored training instances rather than from weights learned from the training data (as in model-based algorithms).
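Instance-based prediction fits in a few lines. The sketch below is a minimal K-Nearest Neighbor classifier with majority voting; the function name, the squared Euclidean distance, and the toy data are illustrative assumptions.

```python
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest points."""
    by_distance = sorted(
        zip(train_x, train_y),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], query)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Two well-separated toy clusters.
train_x = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = [0, 0, 0, 1, 1, 1]
print(knn_predict(train_x, train_y, [0.2, 0.1]))
print(knn_predict(train_x, train_y, [5.5, 5.5]))
```

There is no training step: the whole training set is kept and searched at prediction time, which is the trade-off of instance-based learning.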

3.4.3 Support Vector Machine Algorithm

SVM is another supervised learning algorithm used for classification and regression. Its goal is to determine the best decision boundary for dividing n-dimensional space into classes so that subsequent data points can be quickly assigned to the appropriate category. This best decision boundary is known as a hyperplane.

3.4.4 Naïve Bayes Algorithm

A Naïve Bayes classifier is a probabilistic, supervised machine learning model used for classification. Bayes' theorem is at the core of the classifier. Naïve Bayes is a family of algorithms that share a common premise: every pair of features being classified is independent of each other, given the class.

3.4.5 Random Forest Algorithm

The random forest classifier builds a collection of decision trees, each from a randomly selected subset of the training set [14], and then aggregates the votes of the individual trees to determine the final prediction.

Stroke Disease Prediction Using Adaboost Ensemble Learning Technique

3.4.6 Logistic Regression Algorithm

Logistic regression is a supervised learning technique. It is a straightforward and effective method for binary classification problems, and it performs well when the classes are linearly separable. For a given set of features, the target variable can take only discrete values. The logistic regression approach can be used with the stroke dataset because the output comprises binary values (0 and 1).
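A minimal sketch of how a logistic regression model maps features to a binary output (0 or 1) is given below. The weights, bias, and the 0.5 decision threshold are hypothetical placeholders, not values fitted to the stroke dataset.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Estimated probability of the positive class for one sample."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


def predict(features, weights, bias, threshold=0.5):
    """Binary decision: 1 if the probability reaches the threshold, else 0."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0
```

The linear term inside the sigmoid is what makes the decision boundary linear, which is why the method suits linearly separable classes.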

3.4.7 Adaptive Boosting (AdaBoost) Classifier Algorithm

Adaptive boosting (AdaBoost) is a common boosting technique that combines numerous weak classifiers into a single strong classifier. A first model is built from the training data, and a second model is then built to correct the errors of the first. Models are added until the training set is predicted correctly or the maximum number of models is reached.

3.5 Adaptive Ensemble Learning

The ensemble method combines predictions from several machine learning algorithms to produce predictions that are more accurate than those of any single model. Bagging is a learning strategy that builds an ensemble of models (classifiers or predictors). In the proposed algorithm, the estimator used in the bagging classifier is a support vector classifier with a linear kernel. Adaptive boosting is an ensemble technique that works on weak classifiers to turn them into a strong classifier. In adaptive boosting, n decision trees are created during the training phase. Once the first decision tree (model) is built, the records it classified incorrectly are given priority. Only these records are sent as input to the second model. The process is repeated until the specified number of base learners has been produced. Figure 2 shows the working of the AdaBoost algorithm in detail. The steps of the adaptive boosting algorithm are as follows:

Algorithm: AdaBoost
Input: A training sequence of N examples, $S = \{(x_1, y_1), \ldots, (x_N, y_N)\}$, $x_i \in X$, with labels $y_i \in \{-1, +1\}$; a base learning algorithm; the number of learning rounds $M$.
Step 1: Initialize equal weights $w_i = 1/N$, $i = 1, 2, \ldots, N$, for all observations.
Step 2: Classify random samples using stumps.
Step 3: For $m = 1$ to $M$: calculate the error $e_m = \sum_{i=1}^{N} w_i \, I(y_i \neq G_m(x_i)) \big/ \sum_{i=1}^{N} w_i$.
Step 4: Compute the base learner performance $\alpha_m = \log((1 - e_m)/e_m)$.


S. Ganachari and S. R. Battula

Fig. 2 Block diagram for AdaBoost

Step 5: Update the weights $w_i \leftarrow w_i \cdot \exp[\alpha_m \cdot I(y_i \neq G_m(x_i))]$.
Step 6: Update the final weights and make the final predictions.
Output: $G(x) = \operatorname{sign}\left(\sum_{m=1}^{M} \alpha_m G_m(x)\right)$.

AdaBoost has several benefits. It is fast and simple to program. Boosting has been shown to be relatively resistant to overfitting. It has also been extended to learning problems beyond binary classification and can be used with text or numeric data (Fig. 3).
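The steps of the algorithm can be sketched in plain Python as follows. This is an illustrative sketch rather than the authors' code: it assumes exhaustive threshold stumps as the base learner, clamps the error away from 0 and 1 to keep the logarithm finite, and renormalizes the weights after each round (a common convenience that is not stated in the steps above). All function names are hypothetical.

```python
import math


def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                preds = [polarity if row[j] <= t else -polarity for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, polarity)
    return best


def stump_predict(stump, row):
    """Predict +1 or -1 for one sample with a trained stump."""
    _, j, t, polarity = stump
    return polarity if row[j] <= t else -polarity


def adaboost_fit(X, y, rounds):
    """Fit `rounds` weighted stumps; labels in y must be -1 or +1."""
    n = len(X)
    w = [1.0 / n] * n                                  # Step 1: equal weights
    ensemble = []
    for _ in range(rounds):                            # Step 3: loop over rounds
        stump = train_stump(X, y, w)
        err = stump[0] / sum(w)                        # weighted error e_m
        err = min(max(err, 1e-10), 1 - 1e-10)          # assumption: avoid log(0)
        alpha = math.log((1 - err) / err)              # Step 4: alpha_m
        ensemble.append((alpha, stump))
        # Step 5: increase the weights of misclassified samples only.
        w = [wi * math.exp(alpha) if stump_predict(stump, row) != yi else wi
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]                   # assumption: renormalize
    return ensemble


def adaboost_predict(ensemble, row):
    """Output: sign of the alpha-weighted sum of stump predictions."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1
```

On a one-dimensional toy set whose positive class lies on both sides of the negative class, no single stump is correct everywhere, but a weighted combination of a few stumps can be, which is the sense in which weak classifiers are combined into a strong one.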

4 Result Analysis

4.1 Data Visualization

Figure 4 shows bar charts of some important attributes of the dataset. Gender has two values, male and female, with more females than males. Hypertension takes the values 0 and 1, where 0 means the individual has no hypertension and 1 means the individual has hypertension. Age varies, with the most common range being between 40 and 60. The boxplots in Fig. 5 show that all important features, such as age, hypertension, glucose level, BMI, and heart disease, are positively correlated with the stroke output, while gender is negatively correlated with it. Figure 6 further shows the relation between each pair of attributes in the form of a heatmap.


Fig. 3 Accuracy of the ML models used for stroke analysis

4.2 Algorithm Results

For the stroke dataset, the decision tree algorithm gives an accuracy of 91.52%, with an F1-score, precision, and recall of 92% each. The KNN algorithm achieves an accuracy of 95.13%, which is higher than that of the decision tree model. For SVM, both linear and non-linear (sigmoid) kernels were considered: an accuracy of 94.13% was obtained with the linear kernel and 92.43% with the sigmoid kernel. The Naïve Bayes algorithm achieves an accuracy of 87.28%, with a precision score of 93%, a recall score of 87%, and an F1-score of 90%. The random forest algorithm has an accuracy of 92.14%, close to that of SVM and higher than that of the Naïve Bayes algorithm. The logistic regression model achieves an accuracy of 92.89%, with a precision score of 86%, a recall score of 93%, and an F1-score of 90%. Table 2 summarizes these results. Among all the algorithms used to train the models, AdaBoost gives the highest accuracy of 98.8%. Figure 3 displays the accuracy of each algorithm used. The evaluation metrics are defined as follows:

1. Sensitivity: Percentage of stroke patients who have tested positive.
2. Specificity: Percentage of non-stroke patients who have tested negative.
3. False Positive Rate: Percentage of non-stroke patients who have tested positive.
4. False Negative Rate: Percentage of stroke patients who have tested negative.
5. Accuracy: Percentage of all individuals classified correctly, that is, stroke patients determined as positive and non-patients as negative.


Fig. 4 Bar plots of important attributes to determine stroke

6. Precision: Percentage of people who are actually stroke patients among those who have tested positive.
7. Recall: Percentage of stroke patients who have tested positive.
8. F1-Score: Harmonic mean of precision and recall.

$\text{False Positive Rate (FPR)} = 1 - \text{Specificity}$ (1)


Fig. 5 Boxplot of attributes that show positive correlation and negative correlation

Fig. 6 Correlation of each feature with the others in the form of a heatmap


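$\text{False Negative Rate (FNR)} = 1 - \text{Sensitivity}$ (2)

$\text{Accuracy} = \dfrac{TP + TN}{TP + FN + FP + TN}$ (3)

$\text{Precision} = \dfrac{TP}{TP + FP}$ (4)

$\text{F1-Score} = \dfrac{2 \times (\text{Precision} \times \text{Recall})}{\text{Precision} + \text{Recall}}$ (5)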
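As a worked illustration of Eqs. (1) to (5), the following sketch computes the metrics from the four confusion-matrix counts. The function name, the dictionary keys, and the zero-denominator convention are hypothetical choices made for this example; recall is taken to be the same quantity as sensitivity, as in the definitions above.

```python
def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when the denominator is zero (assumption)."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, tn, fp, fn):
    """Compute the evaluation metrics from confusion-matrix counts."""
    sensitivity = safe_div(tp, tp + fn)          # recall
    specificity = safe_div(tn, tn + fp)
    precision = safe_div(tp, tp + fp)            # Eq. (4)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "fpr": 1.0 - specificity,                # Eq. (1)
        "fnr": 1.0 - sensitivity,                # Eq. (2)
        "accuracy": safe_div(tp + tn, tp + fn + fp + tn),   # Eq. (3)
        "precision": precision,
        "f1": safe_div(2 * precision * sensitivity,
                       precision + sensitivity),  # Eq. (5)
    }
```

With 40 true positives, 50 true negatives, 10 false positives, and no false negatives, the recall is 1.0 and the precision is 0.8, so the F1-score falls between the two, closer to the smaller value.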

4.3 Result Comparison

Table 1 compares the accuracies reported by other researchers for the prediction of stroke with those of this work. Those authors have not made use of AdaBoost in their work so far. When the accuracies of the other algorithms are compared, this work achieves higher accuracy than most other works. This work thus proposes an AdaBoost-based model that might help the medical industry in the early detection of stroke.

Table 1 Accuracy comparison of other proposed works with this work's model

S. No.  Algorithm            This paper (%)  Tazin et al. [11] (%)  Sailasya and Kumari [15] (%)  Saleh et al. [16] (%)
1       Random forest        92.14           96                     73                            90
2       Decision tree        91.52           94                     66                            79
3       SVM (linear)         94.13           –                      80                            77
4       Logistic regression  92.88           79                     78                            77
5       KNN                  95.13           –                      80                            –
6       SVM (sigmoid)        92.43           –                      –                             –
7       Naive Bayes          87.28           –                      82                            –
8       AdaBoost             98.8            –                      –                             –

Table 2 F1-score, precision, and recall scores and accuracy of ML models

Scores/algorithm  Decision tree (%)  Logistic regression (%)  KNN (%)  SVM (%)  Random forest (%)  Naive Bayes (%)  Proposed algorithm, AdaBoost (%)
F1-score          92                 90                       94       92       93                 90               97.9
Precision score   92                 86                       93       92       93                 87               95.8
Recall score      92                 93                       95       92       95                 93               100
Accuracy          91.52              92.888                   95.1309  92.43    92.14              87.28            98.8


In [11], Tazin et al. report that their random forest model performs best, at 96%; the AdaBoost model proposed in this work gives an accuracy of 98.8%, which is higher than that of the model proposed in [11]. In [15], Sailasya and Kumari apply various machine learning algorithms to the same dataset, with Naive Bayes achieving the highest accuracy of 82%; this work achieves a higher accuracy of 87.28% with Naive Bayes, as well as with all the other proposed algorithms. In [16], Saleh et al. propose four algorithms, with their random forest classifier achieving the highest accuracy; the proposed AdaBoost model, with an accuracy of 98.8%, is more accurate than every model proposed in [16] (Table 2).

5 Conclusion and Future Work

Stroke is a potentially fatal medical condition that must be treated right away to prevent further complications. A machine learning model can aid in the early diagnosis of stroke and reduce the severity of its subsequent effects. This study examines how well different machine learning algorithms predict stroke based on various physiological parameters. With an accuracy of 98.8%, the AdaBoost classifier outperforms all of the other methods. Previous works have not used AdaBoost for stroke analysis, and the very high accuracy achieved by the proposed model shows that this algorithm works effectively and can be used to predict stroke in advance from these parameters. Future work will examine whether the models can be improved by using a larger dataset and machine learning models such as bagging. In exchange for some basic information, the machine learning architecture may help the general public assess the potential for a stroke to develop in an adult patient. Ideally, it would help patients get early treatment for stroke.

References

1. Govindarajan P, Soundarapandian R, Gandomi A, Patan R, Jayaraman P, Manikandan R (2020) Classification of stroke disease using machine learning algorithms. Neural Comput Appl 32. https://doi.org/10.1007/s00521-019-04041-y
2. Amini L, Azarpazhouh R, Farzadfar MT, Mousavi SA, Jazaieri F, Khorvash F, Norouzi R, Toghianfar N (2013) Prediction and control of stroke by data mining. Int J Prev Med 4:S245–S249
3. Cheon S, Kim J, Lim J (2019) The use of deep learning to predict stroke patient mortality. Int J Environ Res Public Health 16(11):1876. https://doi.org/10.3390/ijerph16111876. PMID: 31141892; PMCID: PMC6603534
4. Singh MS, Choudhary P (2017) Stroke prediction using artificial intelligence. In: 2017 8th annual industrial automation and electromechanical engineering conference (IEMECON), pp 158–161
5. Acharya UR, Meiburger K, Faust O, Koh JEW, Oh SL, Ciaccio E, Subudhi A, Vicnesh J, Sabut S (2019) Automatic detection of ischemic stroke using higher order spectra features in brain MRI images. Cogn Syst Res 58. https://doi.org/10.1016/j.cogsys.2019.05.005
6. Dev S, Wang H, Nwosu C, Jain N, Veeravalli B, John D (2022) A predictive analytics approach for stroke prediction using machine learning and neural networks
7. Jeena RS, Kumar S (2016) Stroke prediction using SVM. In: 2016 international conference on control, instrumentation, communication and computational technologies (ICCICCT), pp 600–602
8. Yu J, Park SJ, Kwon S-H, Ho C, Pyo C-S, Lee H (2020) AI-based stroke disease prediction system using real-time electromyography signals. Appl Sci 10:6791. https://doi.org/10.3390/app10196791
9. Nain N, Vipparthi SK, Raman B (2020) Computer vision and image processing. In: 4th international conference, CVIP 2019, Jaipur, India, 27–29 Sept 2019. Revised selected papers, part II. Internet resource
10. Nwosu C, Dev S, Bhardwaj P, Veeravalli B, John D (2019) Predicting stroke from electronic health records. In: Annual international conference of the IEEE engineering in medicine and biology society. Conference proceedings. Conference 2019. IEEE Engineering in Medicine and Biology Society, pp 5704–5707. https://doi.org/10.1109/EMBC.2019.8857234
11. Tazin T, Alam MN, Dola NN, Bari MS, Bourouis S, Monirujjaman Khan M (2021) Stroke disease detection and prediction using robust learning approaches. J Healthcare Eng 2021:7633381. https://doi.org/10.1155/2021/7633381
12. Emon MU, Keya MS, Meghla TI, Rahman MM, Mamun MSA, Kaiser MS (2020) Performance analysis of machine learning approaches in stroke prediction. In: 2020 4th international conference on electronics, communication and aerospace technology (ICECA), pp 1464–1469. https://doi.org/10.1109/ICECA49313.2020.9297525
13. Monteiro M, Fonseca AC, Freitas AT, Pinho E, Melo T, Francisco AP, Ferro JM, Oliveira AL (2018) Using machine learning to improve the prediction of functional outcome in ischemic stroke patients. IEEE/ACM Trans Comput Biol Bioinform 15(6):1953–1959. https://doi.org/10.1109/TCBB.2018.2811471. Epub 2018 Mar 1. PMID: 29994736
14. Liu T, Fan W, Wu C (2019) A hybrid machine learning approach to cerebral stroke prediction based on imbalanced medical dataset. Artif Intell Med 101:101723. ISSN 0933-3657
15. Sailasya G, Kumari G (2021) Analyzing the performance of stroke prediction using ML classification algorithms. Int J Adv Comput Sci Appl 12. https://doi.org/10.14569/IJACSA.2021.0120662
16. Saleh H, Abd-el Ghany SF, Younis E, Omran N, Ali A (2019) Stroke prediction using distributed machine learning based on Apache spark. https://doi.org/10.13140/RG.2.2.13478.68162

Real-time Multi-module Student Engagement Detection System

Pooja Ravi and M. Ali Akber Dewan

Abstract We present a method to aggregate four different facial cues to help identify distraction among online learners: facial emotion detection, micro-sleep tracking, yawn detection, and iris distraction detection. In our proposed method, the first module identifies facial emotions using both 2D and 3D convolutional neural networks (CNNs), which facilitates comparison between spatiotemporal and solely spatial features. The other three modules use a 3D facial mesh to localize the eye and lip coordinates in order to track a student's facial landmarks and identify iris positions as well as signs of micro-sleep like yawns or drowsiness. The results from each module are combined to form an all-encompassing label displayed on an integrated user interface that can further be used to provide real-time alerts to students and instructors when required. From our experiments, the emotion, micro-sleep, yawn, and iris monitoring modules individually achieved 72.5%, 95%, 97%, and 93% accuracy scores, respectively.

Keywords Spatiotemporal features · 2D and 3D CNNs · Facial landmark detection · Student engagement detection · Online learning

1 Introduction

Online learning has gained popularity in recent years as it offers relatively inexpensive lessons on various topics to interested students at a location and time of their convenience. This has certainly proven advantageous to learners, especially during the pandemic when all modes of physical classes were suspended. The online mode of learning has aided in developing new skills and brushing up on known syllabi, and has even progressed to the extent of awarding full-fledged degrees.

P. Ravi (B) Department of Computing Technologies, SRM IST, Kattankulathur, India e-mail: [email protected]
M. Ali Akber Dewan School of Computing and Information Systems, Athabasca University, Alberta, Canada e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 686, https://doi.org/10.1007/978-981-99-2100-3_22


Though such an innovative online learning platform exists, human supervision is a major missing element. The aspect of learning which keeps students engaged is the fact that an experienced individual playing the role of a professor or teacher is always available to help out the learners in need. The absence of such an authority figure in online learning calls into question the effectiveness of such a mode of imparting knowledge. To address this concern, we propose a comprehensive and elaborate four-step process to ensure that a student is fully engaged while attending online lectures. In our work, we propose a multi-module approach to tackling the problem of student engagement detection and monitoring. We have adopted four different techniques involving facial movements, namely emotion identification from images and videos, micro-sleep tracking, yawn detection, and iris distraction detection by analyzing the position of the student’s iris using a 3D facial mesh. Further, we combine all four modules for obtaining a final consolidated label that indicates the level of engagement displayed by the student: fully engaged, somewhat engaged, or not engaged. This final label can be used as a metric to judge the involvement displayed by the student and can also be employed to track the attentiveness of the concerned individual.

2 Literature Review

Several recent research works have attempted to tackle the problem of detecting students' engagement in online learning. The most common methods include facial expression identification, mouse usage activity, keyboard stroke activity, and head movement analysis. Some of these recent works are discussed below.

In [1], the authors proposed a multi-module engagement detection system that uses three important aspects of behavioral patterns in an online learning environment: facial expression, keyboard keystroke tracking, and mouse click-stream analysis. In [2], the authors make use of a mini-Xception CNN for detecting the facial expressions displayed in certain key frames. These key frames are identified using the cosine similarity metric. The results from the CNN and the mouse/keyboard click-stream data are combined and used with a Naive Bayes classifier to ascertain how engaged a learner is. The authors claim to achieve high accuracy by combining all these modules.

In [7], the authors built a learning management system in which they extracted frames from the DAiSEE dataset [4], processed them, and classified them according to behavioral and emotional engagement levels using CNNs. Once the images are fed into the CNN model, they are classified as highly engaged, normally engaged, or not engaged. The Viola and Jones [17] and AdaBoost [3] algorithms are used to detect faces in frames for engagement classification.

In [12], the authors detect facial expressions and engagement levels from images. Facial expression recognition was performed using transfer learning undertaken by the VGG network [16]. This classifies facial emotions and uses the result as input for the engagement detection module. Here, both behavioral and emotional dimensions are used while assigning labels. They combine the detected facial expressions and the engagement model by using some relevant features.

In [6], the authors make use of prerecorded behavioral data logs from virtual learning environments to construct a valid dataframe. They further adopt six machine learning algorithms for performing engagement classification. They employ three aspects of course/student data: scores, student qualification status, and how actively the student participates in the course. While training the algorithms, they make use of k-fold cross-validation and finally compare all the models; the J48 decision tree outperformed the other algorithms.

In [8], the authors extract important features and combine them with outputs from convolutional layers to perform multi-class classification for various engagement levels. They extract deep frame representations from a pre-trained VGG16 model [16], specifically from its fully connected layers. They use the DAiSEE dataset [4], from which images are processed and fed into the model. Further, they adopted the SVM classifier to evaluate the facial expression classifier. Additionally, they perform clustering to sort data into peak and neutral clusters, which classify data points depending on the level of engagement shown.

The authors of [15] perform distraction detection and emotion recognition to generate a concentration index between 0 and 100 (in percentage), which determines how engaged the student is. The distraction detection algorithm uses the Viola-Jones algorithm [17] for recognizing faces and singles out the eyes to further record eye movement. This helps classify a student as distracted or engaged.
When the student is focused, the next module is activated to analyze the student’s emotions. They employ the Xception Net [2] for this purpose to identify seven basic emotions. Some aforementioned methodologies rely on very generic and arbitrary student data pertaining to mouse/keyboard strokes or even tabular data which cannot be directly attributed to engagement levels as they are not fully idiosyncratic to the concerned student. The means of identifying levels of engagement is, in certain works, limited to only classifying singular frames and does not consider the finer aspects like real-time facial movements or continuous frame analysis. Some other works make use of very limited data samples and even smaller subsets of homogeneous data for testing purposes, thereby restricting variety and the model’s generalizability. Engagement detection also requires more than just numerical/tabular data to account for active physical involvement. In order to address such drawbacks, we propose an all-encompassing real-time system that focuses on the student’s facial movements/reactions to the learning environment while analyzing and processing live video feeds.


3 Student Engagement Detection Framework

3.1 Overview

We propose four different modules, as seen in Fig. 1, for detecting student engagement in real time. The modules capture the student's facial movements in the frames obtained from the camera source. If the student meets the criteria for drowsiness, the integrated pipeline is halted and an alert is sounded. Otherwise, the frames are passed to the facial emotion detection module, which uses 2D and 3D CNNs to identify the initial status of engagement. The frames are then sent to facial tracking algorithms in which yawns and iris movement are captured concurrently, as demonstrated in Fig. 2.

Fig. 1 Modules in engagement detection

Fig. 2 Integrated system—flowchart


3.2 Emotion Detection

To accurately identify the emotions displayed by the students, CNNs with certain modifications have been employed. We use three primary residual CNNs, ResNet-18, ResNet-34, and ResNet-50 [5], in both 2D and 3D configurations for 1, 3, 5, and 7 frames, where one-frame classification refers to frame-by-frame processing and multi-frame classification refers to temporal video analysis and classification. The 2D VGG-16-BN architecture [10] has also been employed for model performance comparison and evaluation. The aim is to compare the architectures and pick the one that strikes an optimal balance between accuracy and computational requirements for the task at hand.

We employed residual networks because they are compute-efficient and lightweight. A residual neural network model consists of two primary parts: the residual bottleneck and the main model with convolutional blocks. The residual connections ensure that the identity projections of the initial image are retained in every forward pass. This leads to better data representation and more accurate predictions. Residuals have been used alongside the downsampling of images since they provide an identity copy of the input to each layer in the form of skip connections. These skip connections also help bridge gaps between distant layers of very deep networks while retaining important information, as shown in Fig. 3.

We also include the VGG-16-BN network architecture [10] to compare its performance with that of the 2D residual networks of various depths. The VGG model consumes considerable computational resources, so residual networks are better alternatives in terms of conserving processing power. However, since the goal is to strike an optimal balance between performance and resource consumption, the performance of each model on a benchmark dataset is discussed in more detail in the results and analysis section (Sect. 4).

Fig. 3 Residual network—skip connections


Fig. 4 Facial mesh and 468 landmarks

3.3 Micro-sleep Tracking

Whenever students display signs of sleepiness or drowsiness, our algorithm for tracking micro-sleep patterns detects a deviation from the engaged state. The proposed algorithm uses the 3D facial mesh of the Google MediaPipe [11] framework, seen in Fig. 4, to locate facial landmarks. After the required landmarks around the eyes are located, the upper and lower eyelids are used to calculate a metric known as the eye aspect ratio from the eyelid distance given in Eq. (1). The eye aspect ratio helps predict whether the eye is closed or not. In our case, we use it to identify blinks and micro-sleep by using a number of consecutive frames (a 25-frame threshold) as the parameter that classifies closed eyelids as either blinks or drowsiness. The aspect ratio in Eq. (2) is the ratio of the horizontal to the vertical eyelid distance. When the eye is closed, the vertical distance approaches zero and the ratio tends toward infinity, indicating that the upper and lower eyelids are only an infinitesimal distance apart. This is registered as a blink, and when the ratio does not decrease for more than 25 frames, it is considered to indicate a micro-sleep state.

$\text{Eyelid Distance} = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$ (1)

where $(x_1, y_1)$ correspond to the midpoints and $(x_2, y_2)$ represent the endpoints of each of the eyelids.

$\text{Aspect Ratio} = \dfrac{\text{Horizontal eyelid distance}}{\text{Vertical eyelid distance}}$ (2)
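The blink versus micro-sleep rule built on Eqs. (1) and (2) can be sketched as follows. The 25-frame threshold comes from the text; the closed-eye ratio value, the function names, and the way a run of closed frames is turned into an event are assumptions made for illustration, and landmark extraction with MediaPipe is deliberately left out.

```python
import math

FRAME_THRESHOLD = 25   # consecutive-frame threshold stated in the text
CLOSED_RATIO = 5.0     # hypothetical ratio above which the eye counts as closed


def eyelid_distance(p1, p2):
    """Eq. (1): Euclidean distance between two (x, y) landmark points."""
    return math.hypot(p1[0] - p2[0], p1[1] - p2[1])


def aspect_ratio(horizontal, vertical):
    """Eq. (2): horizontal over vertical distance; infinite for a fully closed eye."""
    return math.inf if vertical == 0 else horizontal / vertical


def classify_closures(ratios, closed_ratio=CLOSED_RATIO,
                      frame_threshold=FRAME_THRESHOLD):
    """Label each run of closed-eye frames as a blink or a micro-sleep."""
    events, run = [], 0
    for ratio in ratios:
        if ratio > closed_ratio:
            run += 1                      # eye still closed in this frame
            continue
        if run:                           # eye reopened: close the current run
            events.append("micro-sleep" if run > frame_threshold else "blink")
        run = 0
    if run:                               # sequence ended while the eye was closed
        events.append("micro-sleep" if run > frame_threshold else "blink")
    return events
```

A short run of high ratios is reported as a blink, whereas a run longer than 25 frames is reported as a micro-sleep, matching the rule described above.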


Fig. 5 Yawn ratio estimation

Fig. 6 a Eye cropping and b iris localization

3.4 Yawn Detection

Yawns are indicative of a person's attention span and concentration levels. When a person yawns continuously while displaying signs of sleep, we can safely conclude that the person may not be actively attending to the lecture. To detect yawns, a curve consisting of disjointed line segments is drawn connecting all points across both the upper and lower lips, as shown in Fig. 5. These lines are the primary reference for the lips' position. Whenever the lines representing the two lips are far apart and the distance between them crosses a set threshold, the aspect ratio is recorded and a yawn is identified. The MediaPipe framework [11] is used to track yawns using 3D facial landmarks. The coordinates of the lips are passed along with the frame to determine the top-bottom and left-right distances between the extremities of the lips. After the necessary (x, y) coordinates are localized, the points are scaled according to the height and width of the frame. The ratio of the top-bottom and left-right distances is then compared with the yawn threshold to determine whether the individual is yawning.
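A minimal sketch of the yawn ratio check is shown below. It assumes normalized landmark coordinates in the range 0 to 1 that are scaled by the frame size, takes the ratio as vertical over horizontal distance so that it grows as the mouth opens, and uses a placeholder threshold; none of these specifics are confirmed by the text, and the function names are hypothetical.

```python
import math

YAWN_THRESHOLD = 0.6  # hypothetical value; the text does not state the threshold


def to_pixels(landmark, frame_width, frame_height):
    """Scale a normalized (x, y) landmark to pixel coordinates."""
    x, y = landmark
    return (x * frame_width, y * frame_height)


def mouth_ratio(top, bottom, left, right, frame_width, frame_height):
    """Ratio of the top-bottom to the left-right lip distance."""
    top, bottom, left, right = (to_pixels(p, frame_width, frame_height)
                                for p in (top, bottom, left, right))
    vertical = math.hypot(top[0] - bottom[0], top[1] - bottom[1])
    horizontal = math.hypot(left[0] - right[0], left[1] - right[1])
    return vertical / horizontal if horizontal else 0.0


def is_yawning(ratio, threshold=YAWN_THRESHOLD):
    """A yawn is flagged when the ratio crosses the threshold."""
    return ratio > threshold
```

Scaling by the frame size before measuring distances keeps the ratio meaningful for non-square frames, where normalized x and y units correspond to different pixel lengths.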

3.5 Iris Distraction Tracking

We use a real-time iris tracking algorithm to understand the more nuanced aspects of how well a student is engaged during a lecture. The iris is usually centered on the camera when a focused student is involved in the online session.


Fig. 7 Distraction—a Left b Centered and c Right

The proposed method detects and tracks the iris after localizing its position in the eye, as seen in Fig. 6. This places the emphasis solely on the movement of the iris from side to side or away from the camera in general. When such a deviation is observed, we consider it to be a distraction. The iris's position is by default considered to be center-focused, as in Fig. 6. The landmark points closest to the extremes of both eyelids are considered reference points. This baseline helps estimate the distance of the iris from the center of the eye. When the algorithm encounters a ratio greater than the set threshold, the range limit confirms that the iris is indeed off-center and, hence, the system displays an alert along with details about the direction in which the student is looking. As can be seen in Fig. 7, the most important parameter here is the ratio value, which indicates the extent to which the student looks away.
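One way to turn the reference points into a direction estimate is sketched below. The exact ratio and threshold are not specified in the text, so this example assumes a ratio of the iris-to-corner distance over the eye width, with a hypothetical centered range of 0.4 to 0.6; directions are expressed in image coordinates, and all names are illustrative.

```python
import math


def iris_position_ratio(iris_center, left_corner, right_corner):
    """Position of the iris between the two eye corners (0 = left, 1 = right)."""
    eye_width = math.hypot(right_corner[0] - left_corner[0],
                           right_corner[1] - left_corner[1])
    if eye_width == 0:
        raise ValueError("eye corners coincide; cannot compute a ratio")
    offset = math.hypot(iris_center[0] - left_corner[0],
                        iris_center[1] - left_corner[1])
    return offset / eye_width


def gaze_direction(ratio, lower=0.4, upper=0.6):
    """Map the ratio to 'left', 'center', or 'right' (hypothetical range limits)."""
    if ratio < lower:
        return "left"
    if ratio > upper:
        return "right"
    return "center"
```

Any result other than "center" would correspond to the distraction alert described above, together with the direction in which the student is looking.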

4 Results and Analysis

4.1 Dataset

4.1.1 DAiSEE Dataset

The dataset used to train all the networks seen in Tables 1 and 2 is the DAiSEE [4] data consisting of images pertaining to four labels and four different intensity values. The four classes present include boredom, engagement, confusion, and frustration. The four distinct intensity values are positive integers ranging from 0 to 3 wherein 0 corresponds to very low intensity while 3 depicts the highest possible intensity for said label. Although the dataset is more than appropriate for the task at hand, the distribution of labels within each sub-folder is very uneven, and thus, a class imbalance problem confounds the neural networks. To address this, the data has been preprocessed to correspond to a binary classification task with engaged and not engaged states as the two primary labels. The labels were filtered out in such a way that upon encountering a higher intensity value for engaged labels than the intensities of all other classes combined, the student will be

Table 1 2D model configurations

Models      Dimensional configuration  Images used  Parameters (Millions)
VGG-16-BN   2D-1 frame                 4271         138.37
ResNet-18   2D-1 frame                 4271         11.02
ResNet-34   2D-1 frame                 4271         63.5

Table 2 3D model configurations

Models      Dimensional configuration  Images used  Parameters (Millions)
ResNet-18   3D-3 frames                4593         33.16
ResNet-18   3D-5 frames                7655         33.16
ResNet-18   3D-7 frames                10717        33.16
ResNet-50   3D-3 frames                4593         46.20
ResNet-50   3D-5 frames                7655         46.20
ResNet-50   3D-7 frames                10717        46.20

Fig. 8 Dataset and preprocessing

considered as fully engaged and not engaged otherwise. By adopting such a method, we prevent the skewing of actual results due to a class imbalance problem. We also apply some augmentations and perform image processing to ensure that no unnecessary noise corrupts the dataset, as can be seen in Fig. 8. First, we pick out frames at regular intervals (the number of frames skipped depends on which dimensional configuration is chosen) and assign them to empty tensors, which are eventually moved to the GPU. Picking out frames at intervals in this way helps us conserve memory and also minimize unnecessary noise. We augment the


Fig. 9 Some sample images from manually labeled data; a iris distraction b micro-sleep c Yawn

images using: minor rotations, sporadic inversions, and blurring. Further, we also perform image normalization. Finally, the batch of images is loaded into the PyTorch DataLoader along with the labels so that image-label pairs can be passed to the model as and when needed.
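The label binarization and interval-based frame selection described in this subsection can be sketched as follows. The sketch assumes that "all other classes combined" means the sum of the other three intensities and that frames are taken at an even stride; the function names are hypothetical, and tensor creation, augmentation, and the PyTorch DataLoader are omitted.

```python
def binarize_label(boredom, engagement, confusion, frustration):
    """Return 1 (engaged) if the engagement intensity exceeds the combined
    intensity of the other three classes, else 0 (not engaged)."""
    return 1 if engagement > boredom + confusion + frustration else 0


def sample_frame_indices(total_frames, num_frames):
    """Pick `num_frames` frame indices at regular intervals from a video."""
    if total_frames <= 0 or num_frames <= 0:
        return []
    step = max(total_frames // num_frames, 1)   # frames skipped between picks
    return list(range(0, total_frames, step))[:num_frames]
```

For a 300-frame clip and a 3-frame configuration, the indices 0, 100, and 200 would be selected, so only a small fraction of the frames has to be held in memory.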

4.1.2 Manually Labeled Real-Time Dataset

Additionally, while evaluating each of the facial tracking modules: micro-sleep, yawn, and iris distraction detection, we manually labeled 100 frames for each category and made use of real-time live feed from the webcam source for obtaining our test data with appropriate ground truth labels as seen from Fig. 9. The labels generated by our algorithms were compared with manually produced ground truth labels.

4.2 Experimental Setup

4.2.1 Training

The models were trained on an Nvidia RTX 3080 Ti GPU with CUDA integration [13], using PyTorch [14] as the deep learning framework. The DataLoader was constructed with a batch size of 8 and 16 for 2D and 3D models to train them for 150 and 200 epochs, respectively. The loss graphs are depicted in Figs. 10 and 11. The Adam optimizer [9] was used to control the training process, with the learning rates listed in Tables 3 and 4. The training was performed in a mixed precision configuration. A total of 1386 video files were parsed to curate data tensors with a varying number of frames for the training process. During training, dataloaders with 3-, 5-, and 7-frame inputs were sent to the temporal (3D) models, and dataloaders with 1-frame inputs were fed to the 2D models for separate evaluation. As can be observed, the ResNet-18 3D configuration achieves the best results in terms of both loss achieved and computational complexity (number of


Fig. 10 2D loss graphs

parameters), as Tables 2 and 4 show, so this model is evaluated further and employed as the primary model to generate predictions.

4.2.2 Mixed Precision Training

The neural network model uses weight tensors that are updated during training in the float32 space. These tensors occupy a large share of memory, which can stall training on a large amount of data. While float32 tensors represent all the weight matrices learned by the model, mixed precision training is an effective technique for reducing the memory footprint and training our network more efficiently. It moves selected tensors and operations to the float16 space and scales the loss so that small gradient values remain representable in float16.
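The reason gradients are scaled in mixed precision training can be illustrated with a short sketch using only the Python standard library. The gradient value and the scale factor `LOSS_SCALE` are hypothetical and not taken from the experiments; the sketch only shows that a small value underflows to zero in float16 unless it is scaled first.

```python
import struct


def to_float16(value):
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("e", struct.pack("e", value))[0]


LOSS_SCALE = 1024.0   # hypothetical scale factor
tiny_gradient = 1e-8  # hypothetical gradient, below the float16 range

# Without scaling, the gradient is lost when stored in float16.
unscaled = to_float16(tiny_gradient)

# With scaling, the value survives the float16 round trip and can be
# divided by the same factor afterwards in higher precision.
scaled = to_float16(tiny_gradient * LOSS_SCALE)
recovered = scaled / LOSS_SCALE
```

In PyTorch this bookkeeping is normally delegated to the framework's automatic mixed precision utilities rather than written by hand.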


Fig. 11 3D loss graphs

Table 3 2D training results

Models       Frames/instance   Epochs   Learning rate   Mean training loss
VGG-16-BN    1                 200      0.0001          0.0509 ± 0.0102
ResNet-18    1                 200      0.0001          0.0672 ± 0.0100
ResNet-34    1                 200      0.0001          0.0680 ± 0.0103

Table 4 3D training results

Models       Frames/instance   Epochs   Learning rate   Mean training loss
ResNet-18    3                 150      0.00001         0.0635 ± 0.0115
ResNet-18    5                 150      0.00001         0.0785 ± 0.0146
ResNet-18    7                 150      0.0001          0.0851 ± 0.0134
ResNet-50    3                 150      0.0001          0.3579 ± 0.0234
ResNet-50    5                 150      0.0001          0.4861 ± 0.0185
ResNet-50    7                 150      0.0001          0.5523 ± 0.0147

4.3 Metrics

4.3.1 Binary Cross Entropy Loss

We employ this loss criterion to evaluate the performance of the various 2D and 3D convolutional networks for binary classification. The loss compares the model's predictions with the corresponding ground truth labels to arrive at an error value. The model must minimize this error, and whichever model minimizes it to the greatest extent is considered a good fit for the data.

$$\mathrm{BCE} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \cdot \log(p_i) + (1 - y_i) \cdot \log(1 - p_i) \right] \tag{3}$$

In Eq. (3), N is the number of training samples, $y_i$ is the actual label of sample i, and $p_i$ is the probability output by the model for that sample.
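As a small worked illustration of Eq. (3), the loss can be computed directly from a list of labels and predicted probabilities. The function name `binary_cross_entropy`, the clipping constant `eps`, and the sample values are assumptions added for the sketch; in training, a framework loss such as the one provided by PyTorch would be used instead.

```python
import math


def binary_cross_entropy(labels, probabilities, eps=1e-12):
    """Mean binary cross entropy over N samples, following Eq. (3).

    eps clips the probabilities away from 0 and 1 so log() stays finite
    (an assumption of this sketch, not part of the equation).
    """
    total = 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(labels)


# Hypothetical predictions: one engaged (1) and one not engaged (0) frame.
confident_loss = binary_cross_entropy([1, 0], [0.9, 0.1])
uncertain_loss = binary_cross_entropy([1, 0], [0.5, 0.5])
```

Confident correct predictions give a loss near zero, while predictions of 0.5 give a loss of ln 2 per sample.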

4.3.2 Precision and Recall

Precision is the fraction of frames classified as positive that actually belong to the positive class. This ratio indicates how reliable the algorithm's positive predictions are. It is given in Eq. (4).

$$\mathrm{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}} \tag{4}$$

Recall signifies how many of the positive labels present in the entire dataset were correctly assigned by our algorithm. It is given in Eq. (5).

$$\mathrm{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}} \tag{5}$$

4.3.3 F1 Score

This metric is a more balanced measure that combines the precision (P) and recall (R) values into a single comprehensive score for evaluating an algorithm's performance. It is the harmonic mean of precision and recall, as shown in Eq. (6).

$$\mathrm{F1\ Score} = \frac{2 \times P \times R}{P + R} \tag{6}$$

4.4 Results and Discussion

4.4.1 Facial Emotion Recognition

For behavioral analysis, the ResNet-18 3D model was employed due to its robust performance, and 400 test images were fed to the model. The accuracy of the algorithm and the model can be ascertained from the confusion matrix in Table 5. All confusion matrix tables have 2 columns and 2 rows that display the counts of actual and predicted labels, respectively.

Table 5 Confusion matrix: facial emotion detection

Labels                   Truly engaged   Truly not engaged
Predicted engaged        159             41
Predicted not engaged    69              131

Fig. 12 Time and space complexity

As can be observed from Table 3, the average training loss is lowest for a 2D network, VGG-16-BN, which fits the data well but is too heavy an architecture for our use case. On the other hand, the ResNet-50 3D architecture with spatial and temporal convolutions turns out to be overkill for the 3-, 5-, and 7-frame data sequences. The best convergence with a small number of parameters is achieved by the temporal yet lightweight ResNet-18 3D architecture for a 3-frame sequence, as can be seen in Fig. 12, which compares some of the best performing models in terms of inference time (milliseconds) and model parameters (millions). Testing results are included in Table 9 to further corroborate the efficiency of the ResNet-18 3D model. Thus, the ResNet-18 3D architecture offers the best trade-off between performance and use of computational resources. On validation, the mean loss was 0.8436 ± 0.046. Validation was performed on 400 images and was run on the GPU. The GPU inference time for validation was 0.00793 s, while on the CPU it was 0.755 s.

4.4.2 Micro-sleep, Yawns, and Iris Distraction Detection

To analyze the performance of the micro-sleep tracking, yawn identification, and iris distraction detection modules, we manually labeled 100 live frames obtained from the webcam at regular intervals and passed each one through all the aforementioned algorithms. The confusion matrices for all three algorithms can be found in Tables 6, 7, and 8 to ascertain the performance in real time. To further corroborate the efficacy of our proposed work, we calculate accuracy values and F1 scores using precision and recall values. The values in Table 9 show how accurate our algorithms are for various device configurations. The frames used to test the performance were curated to cover a wide range of settings and backgrounds.

Table 6 Confusion matrix: micro-sleep tracking

Labels                 Actually drowsy   Actually not drowsy
Predicted drowsy       77                3
Predicted not drowsy   2                 18

Table 7 Confusion matrix: yawn detection

Labels                  Actually yawning   Actually not yawning
Predicted yawning       78                 2
Predicted not yawning   1                  19

Table 8 Confusion matrix: iris distraction detection

Labels                 Actually distracted   Actually centered
Predicted distracted   66                    4
Predicted centered     3                     27

Table 9 Results

Module                Precision   Recall   F1 score   Accuracy
Micro-sleep tracker   0.962       0.974    0.968      0.950
Yawn detection        0.975       0.987    0.981      0.970
Iris distraction      0.943       0.956    0.950      0.930
Emotion detection     0.795       0.697    0.743      0.725
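The precision, recall, F1, and accuracy values reported in Table 9 can be re-derived from the confusion matrices with a few lines of code. The sketch below uses the emotion detection and yawn detection counts, reading the predicted labels as rows and the actual labels as columns; the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1, and accuracy from confusion matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Counts read from Table 5 (emotion detection) and Table 7 (yawn detection).
emotion = classification_metrics(tp=159, fp=41, fn=69, tn=131)
yawn = classification_metrics(tp=78, fp=2, fn=1, tn=19)
```

Rounded to three decimals, these reproduce the emotion detection and yawn detection rows of Table 9.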

4.4.3 Real-time Integrated System

Upon processing the results from each of the modalities, we have come up with a suitable truth table as seen in Table 10 which can help display an aggregated result that can be used to ascertain whether the individual is fully, somewhat (moderately),


Table 10 Truth table for integrated system Emotion engagement Yawns identified Engaged (1) Engaged (1) Engaged (1) Disengaged (0) Disengaged (0) Disengaged (0) Disengaged (0)

Count2 (1) Count T h1

Fig. 1 Flow diagram to extract key frame based on SIFT and structural similarity
[Flowchart boxes: Fref = F1; loop over frames Fi up to n; calculate frame difference; consider Fref and Fi; SSIM < Th2 (YES/NO); scale-invariant feature transform; update Fref; output key frames]

value is calculated from the mean and standard deviation of the absolute difference. The compression ratio and a fidelity measure are computed to show the accuracy of the method. In this paper, frames that contain salient features and have adequate video content are considered key frames. Low-level feature differences between frames and a similarity measurement are used to choose the frames. In addition, local features are identified by applying the SIFT algorithm. The approach can extract frames from the input video irrespective of the format and resolution of the video. The proposed approach is based on very basic low-level descriptors; hence, the complexity of the task is very low. The flow of the work is shown in Fig. 1. The remainder of the paper is arranged as follows. Details of the key frame extraction technique are explained in the proposed method section (Sect. 2). The results of the algorithm tested on various input videos are shown in Sect. 3. Finally, in Sect. 4, the discussion of the proposed algorithm and a brief conclusion are presented.

2 Proposed Method

The proposed key frame detection works by evaluating the structural and feature key point similarities among frames extracted from the video stream, as discussed below.


P. De

2.1 Structural Similarity Between Frames

First, the frames are extracted from the input video and maintained in chronological order F1, F2, ..., Fk according to the video scene. For the k extracted frames, the frame difference between each pair of consecutive frames Fj and Fj+1 is measured, where j < k. The difference between two consecutive frames is calculated using Algorithm 1.

Algorithm 1: Compute frame difference
Input: Input video stream
Output: Key frames
1 Extract all k frames $f_1, f_2, \ldots, f_k$ and find their respective RGB values $R_v$, $G_v$, $B_v$
2 Calculate the RGB means $R_m$, $G_m$, $B_m$
3 $R_m = \frac{1}{k}\sum_{j=1}^{k} R_j$, $G_m = \frac{1}{k}\sum_{j=1}^{k} G_j$, $B_m = \frac{1}{k}\sum_{j=1}^{k} B_j$
4 Compute the frame difference $Fr_{diff}$
5 $Fr_{diff} = |R_v - R_m| + |G_v - G_m| + |B_v - B_m|$
6 Find the maximum and minimum of $Fr_{diff}$
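Algorithm 1 can be sketched in a few lines. The sketch assumes that each frame has already been reduced to one (R, G, B) value, for example its per-channel mean intensity; how that value is obtained is not specified in the text, and the names `frame_differences` and `candidate_frames` as well as the sample data are hypothetical.

```python
def frame_differences(frame_rgb):
    """Frame difference Fr_diff of every frame from the RGB means (steps 2-5).

    frame_rgb: list of (R, G, B) values, one tuple per extracted frame.
    """
    k = len(frame_rgb)
    r_m = sum(f[0] for f in frame_rgb) / k
    g_m = sum(f[1] for f in frame_rgb) / k
    b_m = sum(f[2] for f in frame_rgb) / k
    return [abs(r - r_m) + abs(g - g_m) + abs(b - b_m)
            for r, g, b in frame_rgb]


def candidate_frames(frame_rgb, th1):
    """Indices of frames whose difference exceeds the empirical threshold Th1."""
    diffs = frame_differences(frame_rgb)
    return [j for j, d in enumerate(diffs) if d > th1]


# Three hypothetical frames; the last one differs strongly from the mean.
frames = [(10, 10, 10), (10, 10, 10), (40, 40, 40)]
diffs = frame_differences(frames)
candidates = candidate_frames(frames, th1=45)
```

The maximum and minimum of `diffs` (step 6) are available through the built-in `max` and `min`.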

A frame is a candidate key frame if the frame difference $Fr_{diff} > Th_1$, where the threshold $Th_1$ is selected empirically. For the candidate key frames, the structural similarity (SSIM) is checked between each pair of frames. The SSIM is computed using local image features such as luminance, contrast, and structure [14], as follows.
• Luminance: The mean of the intensity values represents the luminance and is computed as $\mu_x = \frac{1}{N}\sum_{i=1}^{N} x_i$. Similarly, $\mu_y$ can also be calculated. To measure the luminance comparison function lm(x, y), the following equation is used.

$$lm(x, y) = \frac{2\mu_x \mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}$$

The value of $C_1$ is taken as $C_1 = K_1 L^2$ such that $K_1$ is a small constant and $K_1$ …

… 2) for the problem of collision avoidance among agents. Woo and Kim [4] studied collision avoidance for an unmanned surface vehicle using deep reinforcement learning.

3 Methodology

Recent developments in reinforcement learning have made it possible for academics and scientists to devise unique ways to use complex computations quickly and easily in neural networks. The deep Q-network technique served as the catalyst for recent advances in scaling reinforcement learning (RL) to difficult sequential decision-making situations [5, 6]. Using convolutional neural networks, experience replay, and Q-learning, it reached a human level of competency at a variety of Atari games. Other extensions that boost its stability or speed have since been proposed. In recent years, researchers have proposed various strategies for reinforcement learning using neural network function approximators. These include deep Q-learning, vanilla policy gradient methods, and trust region natural policy gradient methods, which are currently the top contenders in the field. However, there is still potential for development in creating an approach that is data efficient, resilient, and scalable to big models and parallel implementations. The proximal policy optimization (PPO) algorithms belong to a new category of policy gradient methods that involve two alternating steps: data sampling through environmental interaction and maximizing a “surrogate” objective function using stochastic gradient ascent. The new PPO techniques share some of the benefits of trust region policy optimization (TRPO). However, they are considerably easier to implement, more general, and have better sample complexity [7]. PPO is put to the test in a number of experiments on a variety of benchmark tasks, such as playing Atari games and simulating robotic locomotion. Results from [8] found that “PPO outperforms other online policy gradient methods and overall strikes a good balance between sample complexity, simplicity, and wall-time.” However, traditional reinforcement learning techniques such as Q-learning or policy gradient are not well-suited for multi-agent systems. 
This is because the changing policies of each agent can make the environment non-stationary, which can cause learning instability and prevent the use of past experience replay. Policy gradient methods, which are commonly used for coordinating multiple agents, often suffer from high variance [9]. Model-based policy optimization methods, which rely on a differentiable model of the world and assumptions about agent interactions, can be applied to competitive environments, but they can be difficult to optimize. Adversarial training methods are also known to be unstable. Therefore, there is a need for new approaches that can effectively handle multi-agent reinforcement learning.


A. Thaninayagam et al.

3.1 Reinforcement Learning

Motivation: The use of reinforcement learning (RL) in the development of collision avoidance systems is motivated by the need for efficient and effective methods for dealing with complex decision-making problems in dynamic environments. Traditional techniques in computer vision and image processing are not well-suited to this task due to the challenges of acquiring and labeling large datasets, the computational complexity of the algorithms, and the difficulty of implementing these methods in real-world scenarios.

Dataset: Training a computer vision system typically requires large datasets of labeled images, which can be difficult and expensive to obtain. Extracting images from video and labeling each frame can be time-consuming, and labeling billions of frames is almost impossible.

Computational capacity: The complexity of the algorithms used in computer vision and image processing, along with the need to manipulate data in multidimensional matrices, can be computationally intensive. Achieving good accuracy often requires significant amounts of computational resources, such as TPUs with large memory units. Even with these resources, it is not always possible to achieve the desired level of accuracy.

The proposed reinforcement learning model has a compact implementation, requiring only 240 lines of code. It can be trained in less than 30 s on a standard laptop with a 10th-generation Intel Core i5 CPU. Despite its simplicity, the model is able to achieve superhuman performance, as demonstrated by its ability to avoid collisions at high speeds in a simulated environment. This capability is made possible by reinforcement learning, which allows the agent to learn the best actions to take in a given situation without explicit supervision. The proposed model could be implemented in real-time scenarios, such as in autonomous vehicles equipped with systems that can estimate the coordinates of other vehicles.

Background. In RL, the goal is to learn a policy for taking actions in an environment to maximize a scalar reward signal. One common approach is to estimate the action-value function, which represents the expected long-term reward for taking a specific action in each state. This is typically done using the Bellman equation, which provides an iterative update rule for estimating the action-value function. By using this equation to update the estimates of the action-value function, reinforcement learning algorithms can learn to take the best actions in each situation to maximize the long-term reward. In the following sections, we will provide a more detailed explanation of the Bellman equation and its role in reinforcement learning.

$$Q_{i+1}(s, a) = \mathbb{E}\left[ r + \gamma \max_{a'} Q_i(s', a') \,\middle|\, s, a \right] \tag{1}$$
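The iterative update in Eq. 1 can be illustrated on a very small problem. The two-state deterministic environment below is entirely hypothetical and unrelated to the collision avoidance task; because its transitions are deterministic, the expectation in Eq. 1 reduces to a single term.

```python
GAMMA = 0.9  # discount factor (hypothetical value)

# Hypothetical deterministic MDP: TRANSITIONS[state][action] = (next_state, reward)
TRANSITIONS = {
    0: {"stay": (0, 0.0), "go": (1, 1.0)},
    1: {"stay": (1, 2.0), "go": (0, 0.0)},
}


def bellman_update(q):
    """One sweep of Q_{i+1}(s, a) = r + gamma * max_a' Q_i(s', a')."""
    new_q = {}
    for state, actions in TRANSITIONS.items():
        for action, (next_state, reward) in actions.items():
            best_next = max(q[(next_state, a)] for a in TRANSITIONS[next_state])
            new_q[(state, action)] = reward + GAMMA * best_next
    return new_q


# Start from all-zero estimates and iterate until they stop changing.
q_values = {(s, a): 0.0 for s in TRANSITIONS for a in TRANSITIONS[s]}
for _ in range(500):
    q_values = bellman_update(q_values)
```

In this toy problem the estimates converge to Q*(1, stay) = 20 and Q*(0, go) = 19, so the greedy policy moves to state 1 and stays there.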

Value iteration algorithms, which use the Bellman equation to iteratively update the estimates of the action-value function, are known to converge to the optimal action-value function, Q*(s, a), as the number of iterations increases. However, this

Collision Avoidance System Using Reinforcement Learning


Fig. 1 Block diagram of reinforcement learning algorithm

basic approach is not practical in many situations, as it requires estimating the action-value function separately for each individual state-action pair without any generalization. Instead, it is common to use a function approximator, such as a neural network, to estimate the action-value function, Q(s, a; θ) ≈ Q*(s, a), where θ represents the parameters of the function approximator. This allows the action-value function to be estimated more efficiently and enables the agent to generalize its knowledge to new states and actions. The agent receives a reward for each action it takes in the environment, based on the current state. The goal of the agent is to learn a policy that maximizes the cumulative reward it receives over time, as shown in Fig. 1. When the function approximator is a deep neural network, it is known as a Q-network; it is parameterized by θ and is used to predict the Q values for each state-action pair. The Q-network is trained by minimizing a loss function $L_i(\theta_i)$ at each iteration i, as given by Eq. 2.

$$L_i(\theta_i) = \mathbb{E}_{s, a \sim \rho(\cdot)}\left[ \left( y_i - Q(s, a; \theta_i) \right)^2 \right] \tag{2}$$

where $y_i = \mathbb{E}_{s' \sim \mathcal{E}}\left[ r + \gamma \max_{a'} Q(s', a'; \theta_{i-1}) \mid s, a \right]$ is the target for iteration i and ρ(s, a) is a probability distribution over sequences s and actions a that is referred to as the behavior distribution. The parameters from the previous iteration $\theta_{i-1}$ are frozen when optimizing the loss function $L_i(\theta_i)$. Differentiating the loss function with respect to the weights gives the gradient in Eq. 3.

$$\nabla_{\theta_i} L_i(\theta_i) = \mathbb{E}_{s, a \sim \rho(\cdot);\, s' \sim \mathcal{E}}\left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta_{i-1}) - Q(s, a; \theta_i) \right) \nabla_{\theta_i} Q(s, a; \theta_i) \right] \tag{3}$$

The return $G_t$ is the discounted sum of future rewards, where the discount factor accounts for the importance of future steps. An agent maximizes its return $G_t$ by finding the optimal policy π over multiple iterations of exploration and exploitation. The policy can be learned in two ways: off-policy and on-policy. In on-policy methods, the agent tries to learn the optimal policy by directly changing the policy function, whereas in off-policy methods, the agent finds the optimal policy by using an estimate of the policy. When following a policy π that starts from a given state s, $v_\pi(s) = \mathbb{E}_\pi[G_t \mid S_t = s]$, or state-action pair, $q_\pi(s, a) = \mathbb{E}_\pi[G_t \mid S_t = s, A_t = a]$,


optimizing these two values with respect to the policy would in turn yield a better policy. Typically, the process of finding an improved policy from a state-action-value function involves acting greedily with respect to the action-values. In practice an ε-greedy rule is used: the action with the highest action-value is taken with probability 1 − ε (here 0.9), and a random action is taken with probability ε.

Environments and agent. In reinforcement learning, the environment and the agent interact at each time step. At any given time step t, the agent observes the environment's state $S_t$, selects an action $A_t$, and receives a reward $R_{t+1}$ from the environment. This setting is called a Markov decision process (MDP), which is described as a tuple ⟨S, A, T, r, γ⟩. The set S denotes a finite set of states, A denotes a finite set of actions, T(s, a, s′) is the transition function indicating the probability of transitioning from state s to state s′ after taking action a, r(s, a) is the reward function, which specifies the expected reward for taking action a in state s, and γ is the discount factor, which determines the relative importance of future rewards. In an MDP, the primary objective of the agent is to learn a policy that specifies the most suitable action to take in each state, ensuring that the long-term reward is maximized. In practice, reinforcement learning experiments are typically episodic, meaning that they are divided into a series of episodes. During each episode, the discount factor γ remains constant, except at the end of the episode, where γ is set to 0. This allows the agent to focus on maximizing the reward within each episode, while still considering the long-term consequences of its actions.

Policy and Reward. A policy is a function that specifies the best action to take in each state in order to maximize the long-term reward. One method for learning such a policy is Q-learning, which estimates the action-value function Q(s, a) and uses this to choose the best action in each state.
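Two of the ingredients described above, ε-greedy action selection and the discounted return, can be sketched with the standard library alone. The function names and the sample numbers are hypothetical; the random generator is passed in explicitly so the behavior is reproducible.

```python
import random


def epsilon_greedy(action_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(action_values))
    return max(range(len(action_values)), key=action_values.__getitem__)


def discounted_return(rewards, gamma):
    """G_t = r_1 + gamma * r_2 + gamma^2 * r_3 + ... for one episode."""
    g = 0.0
    for reward in reversed(rewards):
        g = reward + gamma * g
    return g


rng = random.Random(0)
# With epsilon = 0 the choice is purely greedy (index of the largest value).
greedy_action = epsilon_greedy([0.1, 0.7, 0.2], epsilon=0.0, rng=rng)
# With epsilon = 0.1 the greedy action is taken about 90% of the time.
exploring_action = epsilon_greedy([0.1, 0.7, 0.2], epsilon=0.1, rng=rng)
episode_return = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```

Setting the discount to 0 at the end of an episode, as described above, corresponds to stopping the sum at the last reward of that episode.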
At each step, the agent chooses an action ε-greedily based on the action-values and stores the transition $(S_t, A_t, R_{t+1}, \gamma_{t+1}, S_{t+1})$ in a replay memory buffer [10], which stores up to a million previous transitions. The neural network parameters are then optimized using stochastic gradient descent to minimize the loss $(R_{t+1} + \gamma_{t+1} \max_{a'} q_{\bar{\theta}}(S_{t+1}, a') - q_\theta(S_t, A_t))^2$, where t is a randomly selected time step from the replay memory. The loss gradient is backpropagated only to the parameters θ of the online network, which is also used for action selection. The term $\bar{\theta}$ represents the parameters of a target network, a periodic copy of the online network which is not directly optimized. Another popular approach for learning a policy in reinforcement learning is to use policy gradient methods. The basic concept of these methods is to modify the policy parameters directly to maximize the expected cumulative reward, $J(\theta) = \mathbb{E}_{s \sim p^\pi, a \sim \pi_\theta}[R]$. To achieve this, the method takes steps in the direction of the objective function's gradient, $\nabla_\theta J(\theta)$. The gradient of the policy can be expressed using the previously defined Q function, as depicted in Eq. 4.

$$\nabla_\theta J(\theta) = \mathbb{E}_{s \sim p^\pi, a \sim \pi_\theta}\left[ \nabla_\theta \log \pi_\theta(a \mid s) \, Q^\pi(s, a) \right] \tag{4}$$

By following this gradient, the agent is able to learn a policy that maximizes the expected cumulative reward. This can be done using a variety of optimization algorithms, such as gradient ascent or stochastic gradient ascent.
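A single policy gradient step can be sketched for the simplest possible case: a softmax policy over a few discrete actions in one state, with one logit per action as the parameter vector. This setup, the function names, and the numbers are assumptions chosen for illustration; the collision avoidance model itself uses a neural network policy. For a softmax policy, the gradient of log π(a) with respect to the logits is the one-hot vector of the chosen action minus the action probabilities.

```python
import math


def softmax(logits):
    """Action probabilities of a softmax policy."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_policy(logits, action):
    """Gradient of log pi(action) w.r.t. the logits: one-hot minus probabilities."""
    probs = softmax(logits)
    return [(1.0 if i == action else 0.0) - p for i, p in enumerate(probs)]


def policy_gradient_step(logits, action, q_value, learning_rate):
    """One stochastic gradient ascent step on grad log pi(a) * Q(s, a)."""
    grad = grad_log_policy(logits, action)
    return [theta + learning_rate * q_value * g
            for theta, g in zip(logits, grad)]


# Hypothetical sample: action 0 was taken and had a positive action-value.
old_logits = [0.0, 0.0]
new_logits = policy_gradient_step(old_logits, action=0, q_value=1.0,
                                  learning_rate=0.1)
```

After the step, the probability of the rewarded action rises above its initial value of 0.5.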

4 Collision Avoidance System with Reinforcement Learning

In this study, we investigate the use of reinforcement learning for collision avoidance in autonomous vehicles. We adopt decentralized training, in which each agent learns its own policy independently without sharing policies or other information with the other agents. Although [7] argues that decentralized training causes high variance and instability in training, there is substantial evidence from [11] that decentralized training works and is effective and stable in different scenarios. Our experimental results, shown in Figs. 2 and 3, provide further support for the effectiveness and stability of decentralized training in the context of collision avoidance.

In this research, a multi-agent setup of Markov decision processes is studied in a partially observable environment. The agents are defined by a set of possible states S, actions $A_1, \ldots, A_N$, and observations $O_1, \ldots, O_N$. Each agent i chooses its actions with a stochastic policy $\pi_{\theta_i}: O_i \times A_i \to [0, 1]$, and the next state is produced according to the state transition function $T: S \times A_1 \times \cdots \times A_N \to S$. Each agent i obtains rewards as a function of the state and the agent's action, $r_i: S \times A_i \to \mathbb{R}$, and receives an observation corresponding to the state, $o_i: S \to O_i$. The initial states and the discount factor γ determine each agent's total expected return, which is calculated as $R_i = \sum_{t=0}^{T} \gamma^t r_i^t$ over a time horizon T, and the goal of each agent is to maximize this return. The approach used in this research is called a partially observable Markov decision process, which is a well-known technique employed in multi-agent environments. The performance of different algorithms is evaluated in this setting, as shown in Fig. 3.

The above idea has been extended to work with deterministic policies. If we now consider N continuous policies $\mu_{\theta_i}$ w.r.t. parameters $\theta_i$ (abbreviated as $\mu_i$), the gradient can be written as Eq. 5.

$$\nabla_{\theta_i} J(\mu_i) = \mathbb{E}_{x, a \sim D}\left[ \nabla_{\theta_i} \mu_i(a_i \mid o_i) \, \nabla_{a_i} Q_i^{\mu}(x, a_1, \ldots, a_N) \big|_{a_i = \mu_i(o_i)} \right] \tag{5}$$


Fig. 2 Overview of our multi-agent decentralized approach

Fig. 3 Performance of different algorithms in the multi-agent particle world environments

Here, the experience replay buffer D contains the tuples $(x, x', a_1, \ldots, a_N, r_1, \ldots, r_N)$, recording experiences of all agents. The centralized action-value function $Q_i^{\mu}$ is updated as in Eq. 6.

$$L(\theta_i) = \mathbb{E}_{x, a, r, x'}\left[ \left( Q_i^{\mu}(x, a_1, \ldots, a_N) - y \right)^2 \right], \quad y = r_i + \gamma \, Q_i^{\mu'}(x', a'_1, \ldots, a'_N) \big|_{a'_j = \mu'_j(o_j)} \tag{6}$$


5 Experiments

The agents in the Markov decision process can be considered as ground vehicles (they could be aerial as well), and the observation space is formulated as the coordinates (x, y) of other vehicles, which are fed through radars present in the autonomous or semi-autonomous vehicle. The action space can be formulated as a continuous function of speed (v) and angular velocity (w), since the vehicles are free to move in any direction in the 2D plane. The reward function is formulated as a function of the distance from static and dynamic obstacles, discounted by a hyperparameter that can be fine-tuned to make the model more robust. The experiments are conducted keeping in mind the variance and challenging nature of the environment in which the vehicle operates. Both static and dynamic obstacles may be present in the environment, which could be a road, the air, or a water surface. The model is trained with real-world scenarios and dynamic applications in mind to make it robust and usable in any conditions. The experiments involve changing the initial positions of the vehicle and of the dynamic obstacle, which is instructed to move toward the vehicle to hit it as soon as possible. The vehicle dodges the approaching obstacle and moves in a direction that reaches its target destination as fast as possible. The vehicle also makes sure that it does not collide with the static obstacles present nearby, which could be inanimate objects like poles and lights, or pedestrians, as shown in Fig. 4. The effectiveness of the model is demonstrated by its ability to reach the target destination by the shortest path possible while keeping the vehicle safe from obstacles. The model is trained using a neural network (2 layers of 65 units) with 50,000 timesteps during training and 0.01 s as the step size of the vehicle. The starting coordinates of the vehicle and the obstacles were changed, and the performance was recorded for each case.
Table 1 lists the speed of the vehicle together with its distance from the dynamic obstacle as an easy way to understand the performance of the model. The proximal policy optimization (PPO) algorithm from [12] is used to train the model, and the corresponding output with evaluation metrics is recorded in Table 2.
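How an action of speed v and angular velocity w could move the vehicle over one 0.01 s step can be sketched as follows. The text does not give the motion model, so the unicycle kinematics used here are an assumption, as are the function names and the sample positions; the distance helper is the kind of quantity the reward function is described as depending on.

```python
import math

DT = 0.01  # step size in seconds, as stated in the text


def step_vehicle(x, y, heading, v, w, dt=DT):
    """Advance a unicycle-model vehicle by one step (assumed kinematics).

    v is the speed and w the angular velocity chosen by the agent.
    """
    x_new = x + v * math.cos(heading) * dt
    y_new = y + v * math.sin(heading) * dt
    heading_new = heading + w * dt
    return x_new, y_new, heading_new


def distance(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


# Hypothetical step: drive straight along the x axis at speed 1.0.
x1, y1, heading1 = step_vehicle(0.0, 0.0, 0.0, v=1.0, w=0.0)
gap_to_obstacle = distance((x1, y1), (3.0, 4.0))
```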

6 Conclusion

The present study demonstrates the potential of reinforcement learning for replacing complex control engineering and computer vision systems in collision avoidance research. This approach offers a practical and real-time solution for implementing safety systems in autonomous vehicles. The results of the study show that reinforcement learning algorithms provide an efficient and feasible way to achieve this goal. This study represents an important step toward developing state-of-the-art collision avoidance technology for use in autonomous vehicles. The comprehensive model


Fig. 4 Dotted blue line—vehicle pathway, dotted red line—dynamic obstacle approaching vehicle, red dots represent static obstacles. The axis represented is the Cartesian coordinate system with x and y axis. Used [12, 13] for reference

Table 1 Speed of vehicle, distance from dynamic obstacle measured against time step t

            t                     t+1                   t+2                   t+3
Instance    Distance   Speed      Distance   Speed      Distance   Speed      Distance   Speed
1           0.87       2.47       0.21       1.88       0.46       5.71       0.18       1.63
2           1.20       10.78      0.95       18.26      1.71       32.92      3.45       58.36
3           1.03       12.27      1.04       2.27       0.74       4.29       0.79       7.08
4           0.52       41.77      6.95       66.25      4.31       18.66      0.87       2.47

(Distance = distance from obstacle; Speed = vehicle speed)

developed in this study provides a valuable starting point for further research and development in this area.


Table 2 Metrics obtained from the model output during training for each iteration

Variables            Iteration 1   Iteration 2   Iteration 3   Iteration 4
approxkl             0.00012       0.0012        0.0003        0.0001
clipfrac             0.0           0.0019        0.0           0.0
explained_variance   −0.0006       −0.0004       −0.0004       −0.0002
fps                  811           826           872           723
n_updates            7             8             9             12
policy_entropy       1.4159        1.4147        1.4123        1.4096
policy_loss          0.0010        −0.0034       −0.0033       0.00028
serial_timesteps     896           1024          1024          1536
time_elapsed         1.21          1.37          1.37          2.01
total_timesteps      896           1024          1024          1536
value_loss           1984.712      1991.057      1991.057      1887.147

References

1. Samende C, Cao J, Fan Z (2022) Multi-agent deep deterministic policy gradient algorithm for peer-to-peer energy trading considering distribution network constraints. Appl Energy 317:119123
2. Hsu YH, Gau RH (2020) Reinforcement learning-based collision avoidance and optimal trajectory planning in UAV communication networks. IEEE Trans Mob Comput 21(1):306–320
3. Chen YF, Liu M, Everett M, How JP (2017) Decentralized non-communicating multiagent collision avoidance with deep reinforcement learning. In: 2017 IEEE international conference on robotics and automation (ICRA). IEEE, pp 285–292
4. Woo J, Kim N (2020) Collision avoidance for an unmanned surface vehicle using deep reinforcement learning. Ocean Eng 199:107001
5. Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D, Riedmiller M (2013) Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602
6. Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G, Petersen S (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533
7. Lowe R, Wu YI, Tamar A, Harb J, Pieter Abbeel O, Mordatch I (2017) Multi-agent actor-critic for mixed cooperative-competitive environments. Adv Neural Inf Process Syst 30
8. Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017) Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347
9. Hessel M, Modayil J, Van Hasselt H, Schaul T, Ostrovski G, Dabney W, Horgan D, Piot B, Azar M, Silver D (2018) Rainbow: combining improvements in deep reinforcement learning. In: Thirty-second AAAI conference on artificial intelligence
10. Lin LJ (1992) Self-improving reactive agents based on reinforcement learning, planning and teaching. Mach Learn 8(3):293–321
11. Yu C, Velu A, Vinitsky E, Wang Y, Bayen A, Wu Y (2021) The surprising effectiveness of PPO in cooperative, multi-agent games. arXiv preprint arXiv:2103.01955
12. Hill A, Raffin A, Ernestus M, Gleave A, Kanervisto A, Traore R, Dhariwal P, Hesse C, Klimov O, Nichol A, Plappert M, Radford A, Schulman J, Sidor S, Wu Y (2018) Stable baselines. GitHub repository, https://github.com/hill-a/stable-baselines
13. Sakai A, Ingram D, Dinius J, Chawla K, Raffin A, Paques A (2018) PythonRobotics: a Python code collection of robotics algorithms. arXiv preprint arXiv:1808.10703

Relationship Management in SIoT: A Survey M. Shruthi, D. Sendil Vadivu, and Narendran Rajagopalan

Abstract In the era of digitalization, the Internet of Things (IoT) is an emerging technology that is projected to connect 75.4 billion devices and to have an economic value of USD 3.9 trillion to 11.2 trillion by 2025. Combining social networks with IoT has led to a new paradigm, the Social Internet of Things (SIoT), in which things are considered to possess a social consciousness and are expected to mimic human social networking. One of the essential components of SIoT is relationship management, which handles the choice of devices, their communication decisions, and the duration of the resulting relationship. Relationship management is critical because it determines the number of objects that can be discovered and the duration of the communication. We analyze the relevant work on relationship management in SIoT and compile a comprehensive analysis of the methods used, together with their benefits and shortcomings.
Keywords Social Internet of Things · Social networks · Relationship management

M. Shruthi (B) · D. Sendil Vadivu · N. Rajagopalan, Department of Computer Science and Engineering, National Institute of Technology Puducherry, Karaikal, India. e-mail: [email protected]; D. Sendil Vadivu e-mail: [email protected]; N. Rajagopalan e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 686, https://doi.org/10.1007/978-981-99-2100-3_40

1 Introduction

1.1 Introduction to SIoT

Technological progress has made it possible to embed computational and communication features into devices initially programmed to do mundane tasks. This resulted in the birth of the Internet of Things (IoT) [1]. As we enter an era in which devices outnumber the human population, the scalability of IoT has emerged as a major concern. Atzori et al. [2] coined the term Social Internet of Things (SIoT) for a concept that enables devices to form social relationships among themselves, as humans do; it introduces social network concepts into IoT. The aim is to give these devices or objects a social consciousness and enable them to connect and communicate with other devices and perform related tasks with little or no human intervention [3]. SIoT addresses the scalability issue, in which the sheer abundance of devices prevents them from realizing the potential they could reach by working together seamlessly. SIoT enables devices to discover services easily. Devices construct their own social network in order to achieve common goals such as boosting effectiveness, performance, and functionality, as well as obtaining the services they require. Each device can communicate with other devices to improve human–object interaction based on a set of rules established by its owner. The owner defines the rules for how objects communicate, pick a companion, and provide services.

Many interesting use cases have also been identified for SIoT. Ericsson [4] envisioned a future where SIoT is part of our day-to-day activities. They consider a use case where a person can message his home automation system and all the smart devices at home communicate with each other based on their relationships. Another use case is cars on the road exchanging insurance details with each other automatically in the event of an accident, without human intervention. Malekshahi et al. [2] summarize the essential components of SIoT as service discovery, relationship management, service composition, trustworthiness management, and service APIs.
The goal of service discovery is to determine whether objects are capable of providing the needed service. It is comparable to how people use social media to find friends and information. The aim is to create intelligent objects that can make this discovery with minimal human participation. The SIoT relies on amicable interactions between objects to establish a social community that can request or respond to relevant services. Relationship management reveals the underlying intelligence of items, allowing them to decide whether to begin, update, or end a friendship. The object decides what kind of relationship it wants to have with the other objects and shares information or services accordingly. A recommendation system can offer recommendations for services based on the relationship between the devices [5]. Communities are identified by analyzing the relationships. They comprise devices that are related by one or more relationship types. Inside a community, users with common interests are identified. Devices of the same group of interest can share services and recommendations. At this point, it is worth noting that this ideology of imparting intelligence through cooperative relationships between devices is a generalized framework. Many other fields of engineering also develop and deploy this framework as a founding basis for its functionalities [6]. The interaction between the items is enabled through service composition. It combines many services to provide an acceptable response to the service request.


Most of the time, an object attempts to gather knowledge about the actual world or locate a certain service provided by another object. This component also features crowd information processing functionality, which processes information acquired from various objects. Trustworthiness management deals with how much confidence the devices connected in SIoT have in each other [7, 8]. It plays a crucial role in ensuring privacy and security, since being connected to untrusted objects may make the network vulnerable to attacks that could lead to loss of privacy or bring down the entire network. Various new methods are currently being developed to calculate trust between devices and to safeguard the system from attacks and malicious devices. All third-party services are included in service APIs so that the user can benefit from additional services. This component is contained within the application layer of the SIoT architecture and communicates with the device and human interfaces.

1.2 Introduction to Relationship Management in SIoT

Relationship management probes the intelligence of the devices in a SIoT environment with respect to how they connect with other devices. Just as humans form relationships based on certain factors, devices in SIoT are required to form relationships and add other devices as their friends automatically. The different types of relationships among objects in a SIoT and their real-life examples are shown in Fig. 1.

(i) Parent Object Relationship (POR): This relationship rarely changes over time; it links, for example, objects manufactured together or by the same manufacturer.
(ii) Owner Object Relationship (OOR): This relationship is established among devices that are owned by the same person.
(iii) Co-work Object Relationship (CWOR): This relationship is formed between objects that share tasks or work toward the same goal.
(iv) Co-locate Object Relationship (CLOR): This relationship is formed between objects that exist in close proximity to each other.
(v) Social Object Relationship (SOR): This relationship is established when owners have social interactions and friendships with each other, which are projected onto the objects.

Roopa et al. [9] have classified relationships between objects into two broad categories, namely user object (UO) relationships and object–object (OO) relationships. UO relationships are formed as a result of some connection between the user and the object. Apart from the owner object relationship and the social object relationship, this category also includes the following:

• Sibling Object Relationship (SIBOR): This relationship is established among objects that are owned by family members or the same group of friends.
• Guest Object Relationship (GSTOR): This relationship is formed when the owner of a device is a guest of the owner(s) of the other device(s).


Fig. 1 Examples for different types of relations in SIoT

OO relationships are those in which the objects are related to each other by certain properties. Apart from the parent object relationship, the co-location object relationship, and the co-work object relationship, this category also includes the following:

• Guardian Object Relationship (GOR): This relationship is formed as a result of some master–slave-like association between objects.
• Stranger Object Relationship (STGOR): This relationship is established when objects come into contact with each other in public spaces or in scenarios where they have never met before and will probably never meet again.
• Service Object Relationship (STOR): This relationship is formed when two or more objects are tasked with catering to and completing the same service.

Having outlined the basic concepts of the SIoT, we organize the remainder of this article as follows. Section 2 discusses the general architecture of SIoT and the architecture we propose with emphasis on relationship management. Section 3 discusses recent and relevant works related to relationship management in SIoT. Section 4 concludes the paper.
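
As a minimal sketch, the relationship taxonomy of Sect. 1.2 can be encoded as a small lookup structure. Only the abbreviations and the UO/OO grouping are taken from the text; the names Category, Relationship, full_name, and by_category are illustrative assumptions and are not part of the classification by Roopa et al. [9].

```python
from enum import Enum


class Category(Enum):
    USER_OBJECT = "UO"      # relationship arises through the users or owners
    OBJECT_OBJECT = "OO"    # relationship arises from properties of the objects


class Relationship(Enum):
    # Each value is (full name, category), following the grouping in the text.
    OOR = ("Owner Object Relationship", Category.USER_OBJECT)
    SOR = ("Social Object Relationship", Category.USER_OBJECT)
    SIBOR = ("Sibling Object Relationship", Category.USER_OBJECT)
    GSTOR = ("Guest Object Relationship", Category.USER_OBJECT)
    POR = ("Parent Object Relationship", Category.OBJECT_OBJECT)
    CLOR = ("Co-location Object Relationship", Category.OBJECT_OBJECT)
    CWOR = ("Co-work Object Relationship", Category.OBJECT_OBJECT)
    GOR = ("Guardian Object Relationship", Category.OBJECT_OBJECT)
    STGOR = ("Stranger Object Relationship", Category.OBJECT_OBJECT)
    STOR = ("Service Object Relationship", Category.OBJECT_OBJECT)

    @property
    def full_name(self):
        return self.value[0]

    @property
    def category(self):
        return self.value[1]


def by_category(category):
    """Return the abbreviations of all relationship types in one category."""
    return [r.name for r in Relationship if r.category is category]
```

For instance, by_category(Category.USER_OBJECT) lists the four user-driven relationship types, which a relationship manager could use to decide whether owner data is needed before a link is evaluated.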


2 Architecture

Although there is no fixed architecture that all SIoT systems must follow, many works have proposed their own interpretation of the architecture of SIoT, including ones that draw inspiration from architectures proposed for IoT [10, 11]. Atzori et al. [2] propose an architecture with two sides, namely the server side and the object side. The server side contains base, component, and application layers, which together take care of the overall computation in the SIoT system. The object side involves the physical, abstraction, and social layers, which communicate with the server side. This architecture views SIoT from a centralized perspective. Malekshahi et al. [2] have envisioned a five-layer architecture that consists of layers already existent in IoT architecture, such as the entity, communication, and application layers, along with abstraction and social interaction layers to aid SIoT-related transactions. These new layers are aimed at solving issues related to heterogeneity and user relationships, respectively. Farhadi et al. [12] suggest a straightforward three-layer architecture that consists of a physical layer that handles all sensor and network operations, a SIoT middleware layer that handles all operations related to the SIoT system, and an application layer that interacts with the user.

We propose an architecture in Fig. 2, inspired by the previously proposed architectures, that puts emphasis on relationship management. Although the SIoT system will perform other activities apart from relationship management, for relationship management the computational engine must have access to data about both the users and the objects. That is why we propose a three-layer model in which the objects, the users, and the logical layer are separate entities. The logical layer, which houses the relationship manager among other functions, has access to data from both the object and user layers.
The logical layer is illustrated within the SIoT system because it can either be centralized or distributed among the objects in the system.

Fig. 2 Proposed SIoT architecture with emphasis on relationship management

3 Related Work

Various techniques have been proposed to form relationships between objects. These objects can be heterogeneous or homogeneous, and the relationship can be based on the relationship types described above or be influenced by the social ties of the objects’ owners. Table 1 gives a concise summary of our findings.

Table 1 Summary of the review
Author | Method used | Contribution | Limitation
Khelloufi et al. [13] | Graph-based method to illustrate relationship among devices | Forming communities among devices with overlapping relationships to provide recommendations | Higher computation costs to form communities
Dhelim et al. [14] | Proposes artificial social intelligence to give objects a social consciousness | Provides various use cases and possible research areas for artificial social intelligence | Algorithms or methods to incorporate artificial social intelligence have not been discussed
Aljubairy et al. [15] | Generating temporal sequence network of SIoT using raw data and using Bayesian model for prediction | Uses the location of devices to generate network of objects and predict their relationship | Does not take the services offered by the devices or the context of their interaction into consideration
Girau et al. [16] | Uses Lysis architecture to manage relationships | Provides a method to form relationships between heterogeneous devices in SIoT. Handles the scalability issue | –
S. Rajendran and R. Jebakumar [17] | Gray wolf algorithm and maximum ranked neighbor | Proposes a distributed relationship management method | Other elements such as friendship updation are yet to be discussed
Atzori et al. [18] | Relationship management is taken care of by the network service provider | Centralization of relationship management | Generalization among NSPs and access of data to the NSPs
Chen et al. [19] | Recommendation system based on time awareness | Includes data related to object usage by the user during decision making | Other factors like location can also be embedded while determining the relationship
S. Rajendran and R. Jebakumar [22] | Common device predilection, social similarity, ranking, GWA-ranking, and SF-GWA-ranking | A novel cognitive-based device recommendation (CDR) model for social device endorsement is proposed. SF-GWA-ranking outperformed other methods | –
Chunying Zhang et al. [23] | Clustering and set pair analysis | A set pair three-way overlapping community discovery algorithm for the weighted SIoT is proposed | It may not work for unbalanced data, and its optimization is not done
Ning et al. [24] | Used a case study to display the implementation of the idea | Introduced the concept of SocialNet to bring physical and cyber relationships together | Collection and management of relevant data will pose a challenge

Khelloufi et al. [13] focus on finding apt service recommendations based on the social relationships of the owners of the devices. A specific service provided by a device is suggested to other devices that have strong social ties with the original device. Such groups are called communities. The social connections between devices promote a group’s similarity and trustworthiness. These groups can be used to implement customized hybrid service recommendations in the SIoT context. Following the formation of the communities, the framework analyzes the interests of the owners in order to establish clusters of users with shared interests, referred to as interest groups. The method computes the device communities for each social connection individually. The communities of socially linked devices are then determined by locating and combining the communities of individual social links. First, the community’s head is identified: it is the node with the highest degree of connection in the social relationship considered. The community is then expanded by recursively incorporating the head’s neighbor nodes and, in turn, their neighbor nodes. A node is included only if its inner connectivity with the current community is greater than its external connectivity. Devices in the same neighborhood receive similar service recommendations. The paper proposes a graph-based method to find communities and shows good results for the relevancy of the service recommendations. The suggested framework comes at the expense of the additional computational effort necessary to determine the communities of socially linked devices. A shared location is one of the suggested social links, and for some mobile devices it is difficult to locate their socially linked device community. The dataset used includes the type of relationship between each pair of devices, which might not be readily available in a real-world situation.

Dhelim et al. [14] discuss the role of IoT in social relationship detection and management and the problem of social relationship explosion in IoT. They also propose a solution that uses artificial social intelligence (ASI) to tackle social relationship explosion. ASI is a field of study that has emerged from combining artificial intelligence and social computing. The authors aim to make the objects aware of their social context so that they can give socially customized services. Many social computing activities can benefit from ASI-based machine learning and deep learning approaches, spanning from social data preprocessing and feature extraction to service suggestions and application modification. The paper also discusses other domains such as healthcare, intelligent transportation, and smart cities as areas with untapped potential in which SIoT could use ASI technology and achieve good results.

Aljubairy et al. [15] propose a method to predict the future relationships among devices in a SIoT environment. The framework is divided into three stages: collecting raw movement data from IoT devices, building SIoT temporal sequence networks, and predicting associations between IoT devices that are likely to develop. The first step involves capturing the location of each object together with its timestamp. Using this, the stay points, that is, the locations where the objects were in the same vicinity, are detected. The average of their locations gives the centers of such meeting points. Next, the sweep line time overlap algorithm is used to find out which objects were in the same vicinity at the same time. This yields the network of objects that have interacted. The final phase predicts relationships using a Bayesian non-parametric model.
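
The time-overlap step described above can be illustrated with a short sweep-line sketch. It is not the implementation of Aljubairy et al. [15]: the Visit record, its field names, and the rule that intervals touching at a single instant count as overlapping are assumptions made for illustration, and the Bayesian prediction stage is not attempted here.

```python
from collections import Counter, defaultdict
from typing import NamedTuple

ARRIVE, DEPART = 0, 1   # arrivals sort before departures at equal timestamps


class Visit(NamedTuple):
    obj: str        # device identifier
    place: str      # stay-point identifier (hypothetical label)
    start: float    # arrival timestamp
    end: float      # departure timestamp


def co_located_pairs(visits):
    """Return pairs of objects whose visits to the same stay point overlap in time."""
    by_place = defaultdict(list)
    for v in visits:
        by_place[v.place].append(v)

    pairs = set()
    for place_visits in by_place.values():
        # Turn every visit into two events and sweep over them in time order.
        events = []
        for v in place_visits:
            events.append((v.start, ARRIVE, v.obj))
            events.append((v.end, DEPART, v.obj))
        events.sort()

        active = Counter()  # objects currently present at this stay point
        for _, kind, obj in events:
            if kind == ARRIVE:
                for other in active:
                    if other != obj:
                        pairs.add(tuple(sorted((obj, other))))
                active[obj] += 1
            else:
                active[obj] -= 1
                if active[obj] == 0:
                    del active[obj]
    return pairs
```

Each resulting pair would become an edge in the interaction network for the corresponding time window; sorting once per stay point keeps the cost at O(n log n) in the number of visits, plus the number of pairs reported.
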
The framework is used to model relationships between heterogeneous devices based on co-location object relationships. Predicting future relationships among IoT objects can be utilized for several applications, such as recommending appropriate services. The work assumes that objects in the same location have interacted by default. Future work includes predicting relationships based on the services offered by the devices.

Girau et al. [16] propose a system where coastal data is monitored and shared using SIoT. SIoT here helps to connect multiple heterogeneous objects in a network and to form connections between them, based on their relationships, in order to share data. The system uses the previously defined Lysis architecture, which has four layers. The real-world layer contains real-world objects; in this case, it includes the beach unit, the sea unit, and the mobile application on users’ phones. Each of these real-world devices is projected as a social virtual object in the virtualization layer. The virtualization layer manages the social relationships between these devices and the functionalities related to them. The aggregation layer takes all the data provided by the previous layer and analyzes it. The application layer then makes the result available to the end user. The virtualization layer has a root social virtual object which passes queries on to its friends, which in turn pass them on to their friends. The usage of SIoT in this work is predominantly seen in the reporting of data to the user through the mobile application. When a user downloads the mobile application, a social virtual object is created for it at the virtualization layer. The beach unit, when in proximity to the mobile phone, forms a social object relationship (SOR) to share data, collect feedback, and show alerts. This work showcases a real-life example of integrating SIoT into existing technology involving heterogeneous devices to solve the scalability problem, with extensively analyzed experimental data.

Rajendran and Jebakumar [17] have conceived two methods to perform object recommendation-based friendship selection. The former is a novel gray wolf algorithm with user object affiliation for smart object recommendation (SOR). The latter is a method for object friendship selection using the maximum ranked neighborhood. The SIoT ecosystem is dynamic and data sparse, and the first framework, for SOR, is built on these two characteristics. The key to this paradigm is the user’s preference associated with object analogies, which can be location based (like co-location) or ownership based. The method assumes a SIoT environment with varied object availability and a sufficient number of objects. There are two key processes in this situation: acquiring the user’s choice and attaining the object’s sociality. These two procedures help to rate the objects, and an object is recommended to a user if it earns the highest rating more frequently than the other objects. The gray wolf algorithm is used to examine the network distance metric, and the best object is recommended to the user. The second work involves the usage of the maximum ranked neighborhood (MRN). Each object is considered to be a node in a network of objects. Nodes are given different ranks, so there are high-ranking nodes and low-ranking nodes. The premise of the method is that the number of friends each node can make is limited by memory and computational power. To get service from a certain node, the requesting node must gain its friendship. To reach a service provider, the neighbor with the highest rank is chosen. If two or more nodes have the same rank, the one with the most friends is chosen, and if that matches too, then the node with the most mutual friends is chosen.
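
The neighbor selection rule just described (highest rank first, then most friends, then most mutual friends) can be sketched as a lexicographic comparison. The function name pick_next_hop and the dictionary-based graph representation are hypothetical choices for illustration and are not taken from Rajendran and Jebakumar [17].

```python
def pick_next_hop(node, friends, rank):
    """Choose the neighbor of `node` to forward a service request to.

    friends: dict mapping each node to the set of its friends
    rank:    dict mapping each node to a numeric rank
    Returns None when the node has no friends.
    """
    candidates = friends.get(node, set())
    if not candidates:
        return None

    def priority(candidate):
        candidate_friends = friends.get(candidate, set())
        mutual = len(friends[node] & candidate_friends)
        # Tuples compare element by element: rank, then friend count,
        # then number of friends shared with the requesting node.
        return (rank[candidate], len(candidate_friends), mutual)

    return max(candidates, key=priority)
```

Using a tuple as the sort key keeps the three tie-breaking levels in one place, so an additional criterion could be appended without restructuring the function.
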
If a node reaches the maximum number of friends that it can have, then its friend with the fewest friends is unfriended (the link is terminated). If there is a tie, the friend with the fewest mutual friends is removed, and the next level of scrutiny is to remove the friend with the lowest rank. Once the relationship between the service requester and the supplier is formed, the requester provides a satisfaction factor depending on the quality of the service, which determines the supplier’s rank. The authors achieve good experimental results with two datasets related to heterogeneous devices in smart homes. The authors are yet to customize any privacy management technique. They also suggest that a similar methodology can be used to identify communities in SIoT networks and to handle other elements of relationship management, such as friendship updating and termination in a secure SIoT network.

Atzori et al. [18] propose an architectural solution for creating and managing relationships between devices for various scenarios. The authors suggest that network service providers (NSPs) must be responsible for storing the information required to form connections and for service discovery and relationship management. The main aim of the work is to achieve distributed computation, as opposed to centralized operations, for storage and relationship management. This reduces the load on a single machine, saves time, and is less susceptible to overall failure. Each device has a virtual identity that houses metadata about the device, such as the type of device, its list of friends, and its trustworthiness level.

In the case of an owner object relationship, the relationship manager receives information from a new device. It consults other relationship managers to check whether they are familiar with this device or whether its information is already available. If the device is recognized, a new owner object relationship is established. A parent object relationship is established when the relationship manager receives information regarding the make or unique identification number of the device; if it matches the batch of other devices, a parent object relationship is created. The co-location object relationship is identified by finding out whether devices are connected to common access points. A similar technique is used to detect a social object relationship by finding out whether two devices have had an encounter with each other. A co-work object relationship is identified when two devices use the same access point and have subscribed to the same topic. The authors use an emulation platform to study the experimental results of the algorithm. Devices are set up on different servers to induce latency. The work provides a generalized algorithm and experimental data to support it, and it serves as a good base on which more nuanced methods can be built.

Chen et al. [19] have introduced the concept of time in forming meaningful associations between devices. The primary idea is that people prefer to interact with different people at different times, so recommendations that take this into account will produce more desirable results. Here, both the user’s preference and the object’s social similarity are considered. The user’s temporal preference includes how often and for how long the user uses the device. The end result is a time-aware object recommendation system. The work provides a fresh idea along with experimental results and a case study.
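
The relationship-detection rules attributed to Atzori et al. [18] earlier in this section can be condensed into a small rule-based sketch. The Device record, its field names, and the encounters set are assumptions introduced for illustration; in particular, owner recognition across relationship managers is simplified here to a comparison of owner identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    dev_id: str
    owner: str                               # owner identifier (assumed known)
    batch: str                               # make or production batch identifier
    access_points: frozenset = frozenset()   # access points the device has used
    topics: frozenset = frozenset()          # topics the device subscribes to


def detect_relationships(a, b, encounters=frozenset()):
    """Return the set of relationship abbreviations that hold between a and b.

    encounters: set of frozenset({id1, id2}) pairs of devices that have met.
    """
    rels = set()
    if a.owner == b.owner:
        rels.add("OOR")                      # same owner
    if a.batch == b.batch:
        rels.add("POR")                      # same make or batch
    if a.access_points & b.access_points:
        rels.add("CLOR")                     # common access point
        if a.topics & b.topics:
            rels.add("CWOR")                 # common access point and common topic
    if frozenset((a.dev_id, b.dev_id)) in encounters:
        rels.add("SOR")                      # devices have had an encounter
    return rels
```

A pair of devices may satisfy several rules at once, which is why the function returns a set rather than a single label.
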
Although many works have touched upon inducing a general consciousness into the Internet of things [20] and spoken about the various features of SIoT, we have found the works mentioned in Table 1 to lean more into the relationship management aspect of the SIoT. They have not only found various methods to initiate and discover relationships but have also proposed new concepts and applied the proposed algorithms to available data providing insightful experimental results. There are still many research gaps in this area which include but are not limited to heterogeneity, context management, scalability, etc. [21].

4 Conclusion

In this paper, we have focused on discovering and analyzing the existing literature, technology, and methods related to relationship management in the SIoT. To the best of our knowledge, a survey on relationship management in SIoT has not been undertaken previously. We have discussed the different types of relations in SIoT and proposed a SIoT architecture with an emphasis on relationship management. We provided insights into the different kinds of relationships and into how different works identify, form, manage, and use relationships to perform actions such as recommendation and prediction using various techniques, which include but are not limited to graph-based, link-based, and artificial intelligence-based techniques. The merits and shortcomings of each work have also been discussed. In the course of this analysis, we have identified research gaps in areas such as forming relationships between devices dynamically based on identified user relationships, policies on the visibility of devices that are open to creating associations or sharing data, and the experimental integration of machine learning into SIoT to discover devices, identify relationships between objects, and form connections.

References
1. Ortiz AM, Hussein D, Park S, Han SN, Crespi N (2014) The cluster between internet of things and social networks: review and research challenges. IEEE Internet Things J 1(3):206–215
2. Malekshahi Rad M, Rahmani AM, Sahafi A, Nasih Qader N (2020) Social Internet of Things: vision, challenges, and trends. Hum-Centric Comput Inf Sci 10(1)
3. Atzori L, Iera A, Morabito G (2014) From “smart objects” to “social objects”: the next evolutionary step of the internet of things. IEEE Commun Mag 52(1):97–105
4. Formo J (2012) A social web of things. Ericsson blog. https://www.ericsson.com/en/blog/2012/4/a-social-web-of-things
5. Bok K, Kim Y, Choi D, Yoo J (2021) User recommendation for data sharing in Social Internet of Things. Sensors 21(2):462
6. Syed Shahul Hameed AS, Rajagopalan N (2022) SPGD: search party gradient descent algorithm, a simple gradient-based parallel algorithm for bound-constrained optimization. Mathematics 10(5):800
7. Yan Z, Zhang P, Vasilakos AV (2014) A survey on trust management for Internet of Things. J Netw Comput Appl 42:120–134
8. Abdelghani W, Zayani CA, Amous I, Sèdes F (2016) Trust management in Social Internet of Things: a survey. In: Dwivedi Y et al (eds) Social media: the good, the bad, and the ugly. I3E 2016. Lecture notes in computer science, vol 9844. Springer, Cham
9. Roopa MS, Pattar S, Buyya R, Venugopal KR, Iyengar SS, Patnaik LM (2019) Social Internet of Things (SIoT): foundations, thrust areas, systematic review and future directions. Comput Commun 139:32–57
10. Fremantle P (2015) A reference architecture for the Internet of Things. WSO2
11. Guth J, Breitenbucher U, Falkenthal M, Fremantle P, Kopp O, Leymann F, Reinfurt L (2018) A detailed analysis of IoT platform architectures: concepts, similarities, and differences. Springer, Berlin
12. Farhadi B, Rahmani AM, Asghari P, Hosseinzadeh M (2021) Friendship selection and management in Social Internet of Things: a systematic review. Comput Netw 201:108568. ISSN: 1389-1286
13. Khelloufi A et al (2020) A social-relationships-based service recommendation system for SIoT devices. IEEE Internet Things J 8(3):1859–1870
14. Dhelim S, Ning H, Farha F, Chen L, Atzori L, Daneshmand M (2021) IoT-enabled social relationships meet artificial social intelligence. IEEE Internet Things J 8(24)
15. Aljubairy A, Zhang WE, Sheng QZ, Alhazmi A (2020) SIoTPredict: a framework for predicting relationships in the Social Internet of Things. In: Advanced information systems engineering. CAiSE 2020. Springer, Cham
16. Girau R et al (2020) Coastal monitoring system based on Social Internet of Things Platform. IEEE Internet Things J 7(2)
17. Rajendran S, Jebakumar R (2021) Object Recommendation based Friendship Selection (ORFS) for navigating smarter social objects in SIoT. Microprocess Microsyst 80:103358
18. Atzori L, Campolo C, Da B et al (2019) Smart devices in the social loops: criteria and algorithms for the creation of the social links. Future Gener Comput Syst
19. Chen Y, Zhou M, Zheng Z, Chen D (2020) Time-aware smart object recommendation in social internet of things. IEEE Internet Things J 7(3):2014–2027
20. Atzori L, Iera A, Morabito G, Nitti M (2012) The Social Internet of Things (SIoT) – when social networks meet the Internet of Things: concept, architecture and network characterization. Comput Netw 56(16):3594–3608
21. Amin F, Majeed A, Mateen A, Abbasi R, Hwang SO (2022) A systematic survey on the recent advancements in the Social Internet of Things. IEEE Access 10:63867–63884
22. Rajendran S, Jebakumar R (2020) Cognitive based device recommendation (CDR) model for Social Internet of Things. In: IEEE 4th conference on information & communication technology (CICT), pp 1–6
23. Zhang C, Ren J, Liu L, Liu S, Li X, Wang L (2022) Set pair three-way overlapping community discovery algorithm for weighted social internet of things. Digit Commun Netw
24. Ning H, Wang W, Farha F, Xie J, Daneshmand M (2022) SocialNet of Things: a ubiquitous relationship network inspired by social space. IEEE Netw 36(3):197–203

A Technique for Finding an Approximate Solution to an Ill-Posed Inverse Problem Using Tikhonov’s Regularization Method

Van Huyen Le and Liudmila V. Chernenkaya

Abstract This article is devoted to finding an approximate solution to an ill-posed inverse problem using the Tikhonov regularization method. The object of study is a mathematical model described by a system of ordinary differential equations. Within this model, the inverse problem is posed as follows: given data measured at certain points in time, determine the parameters of the mathematical model. On the basis of Tikhonov’s regularization method, a four-step technique for solving the inverse problem is constructed. First, the finite difference method is used to form an “exact” system of algebraic equations. Then, the interpolation method is used to form an “approximate” system of algebraic equations. Next, the Tikhonov regularization method is used to construct the regularizing equation. Finally, the regularization parameter and the regularized solution are found. As a numerical example, a mathematical model of the English language learning process is considered. An inverse problem is posed and solved within this model, and the calculated data are checked against the initial data. The results of the calculations show the effectiveness and applicability of the constructed technique in solving practical problems.

Keywords Approximate solution · Ill-posed inverse problem · Tikhonov’s regularization method · Mathematical model · Regularized solution · English language learning

V. H. Le (B) · L. V. Chernenkaya Peter the Great St. Petersburg Polytechnic University, Politekhnicheskaya Ulitsa 29, 195251 St. Petersburg, Russia e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 686, https://doi.org/10.1007/978-981-99-2100-3_41


1 Introduction

An inverse problem is understood as the identification of unknown parameters of a direct problem from information obtained in a series of observations [1–8]. Solving inverse problems requires considerable effort to collect and process measurement data and to perform calculations. With the development of powerful computers, finding solutions to inverse problems has become easier, and in recent decades inverse problems have become a popular area of research in computational and applied mathematics. They have wide applications in system identification, optics, radar, acoustics, communication theory, signal processing, medical imaging, geophysics, oceanography, astronomy, remote sensing, natural language processing and machine learning. Owing to this practical applicability, the field is turning into an interdisciplinary science and attracts growing attention from researchers around the world.

The main difficulty in solving inverse problems is that most of them are ill-posed. Ill-posed problems began to be studied at the beginning of the twentieth century. In 1902, the French mathematician J. Hadamard first defined the well-posedness of problems for partial differential equations [9]. Accordingly, problems that either have no solution, or have more than one solution, or whose solution is unstable under small changes in the initial data are called ill-posed. In practice, the solution of most inverse problems is not stable under small changes in the initial data. There are various methods for solving inverse problems, depending on the type of equation used in the mathematical model. One of the most commonly used is the Tikhonov regularization method.

Tikhonov’s regularization method is an algorithm for finding an approximate solution to ill-posed problems with approximate initial data. The method was developed by A. N. Tikhonov and first introduced in [1, 2] in 1963. In this paper, an inverse problem is formulated and solved within the framework of a mathematical model described by a system of ordinary differential equations.

2 Statement of the Inverse Problem

Within the framework of this work, we consider a mathematical model described by a system of ordinary differential equations of the form

$$\frac{dX(t)}{dt} = AX(t), \qquad X(t)\big|_{t=0} = X(0), \tag{1}$$

where $A$ is a matrix with constant coefficients $a_{ij}$, $i, j = 1, 2, \ldots, n$, and $X(t)$ is a vector, $X(t) = (x_1(t), x_2(t), \ldots, x_n(t))^T$.


Within the framework of the mathematical model (1), the following two tasks are posed.

Task 1. Given the coefficients $a_{ij} = a_{ij}^0$, where $i, j = 1, \ldots, n$, and the initial conditions $X(0) = (x_1(0), x_2(0), \ldots, x_n(0))^T$ at the initial time $t = 0$, determine $X(t) = (x_1(t), x_2(t), \ldots, x_n(t))^T$. Task 1 is called the direct problem within the mathematical model (1). It is the problem of constructing a solution to a linear system of differential equations with given initial conditions (the Cauchy problem).

Task 2. Given the values $X(t_k) = (x_1(t_k), x_2(t_k), \ldots, x_n(t_k))^T$ at time points $t_k$, where $k = 1, 2, \ldots, n$, determine the coefficients $a_{ij} = a_{ij}^0$, where $i, j = 1, \ldots, n$. Task 2 is called the inverse problem with respect to Task 1 within the mathematical model (1). The solution of Task 2 is denoted by $K^0 = (a_{11}^0, \ldots, a_{1n}^0, \ldots, a_{n1}^0, \ldots, a_{nn}^0)^T$.

In this work, instead of the exact solution $K^0$ of Task 2, we look for its approximation. To do this, we use the finite difference method, the interpolation method, the Tikhonov regularization method and methods for choosing the regularization parameter. As a result of the calculations, approximate coefficients $a_{ij}$, where $i, j = 1, \ldots, n$, are found.
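The direct problem (Task 1) can be illustrated with a short numerical sketch. The function name `solve_direct` and the 2×2 test matrix are hypothetical, and the sketch assumes that the matrix is diagonalizable, which the text itself does not require.

```python
import numpy as np

def solve_direct(A, x0, times):
    """Solve dX/dt = A X, X(0) = x0 at the requested times.

    Uses the eigendecomposition A = V diag(w) V^{-1}; this assumes A is
    diagonalizable (an assumption of this sketch, not of the paper).
    """
    A = np.asarray(A, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    w, V = np.linalg.eig(A)                        # eigenvalues w, eigenvectors in columns of V
    c = np.linalg.solve(V, x0.astype(complex))     # coordinates of x0 in the eigenbasis
    # X(t) = V diag(exp(w t)) V^{-1} x0; V * exp(w t) scales column j by exp(w_j t)
    X = np.array([(V * np.exp(w * t)) @ c for t in times])
    return X.real

# Hypothetical 2x2 system whose columns sum to zero, so x1 + x2 is conserved
A_demo = np.array([[-1.0, 0.5],
                   [1.0, -0.5]])
x0_demo = np.array([1.0, 0.0])
X_demo = solve_direct(A_demo, x0_demo, [0.0, 0.5, 1.0])
print(X_demo)
```

The inverse problem (Task 2) reverses this computation: the rows of `X_demo` play the role of the measured values, and the entries of `A_demo` are the unknowns.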

3 A Technique for Solving the Inverse Problem

The technique for solving the inverse problem (Task 2) includes four steps:

• Step 1. Transition from the system of ordinary differential equations (1) to an “exact” system of algebraic equations $XK = B$ with respect to $a_{ij}$. Here, $X$ is a matrix with elements $x_1(t_k), x_2(t_k), \ldots, x_n(t_k)$; $K$ is a vector with elements $a_{ij}$; $B$ is the vector of free terms.
• Step 2. From the “exact” system $XK = B$, formation of an “approximate” system of algebraic equations $X_\eta K = B_\delta$ with respect to $a_{ij}$. Here, $X_\eta$ is an approximation to the matrix $X$ with $\|X_\eta - X\| \le \eta$; $B_\delta$ is an approximation to the vector $B$ with $\|B_\delta - B\| \le \delta$; $\eta$ and $\delta$ are small positive numbers.
• Step 3. Construction of the Tikhonov regularizing equation from the “approximate” system $X_\eta K = B_\delta$ obtained in step 2.
• Step 4. Finding the regularization parameter and determining the regularized solution, which is an approximation to the exact solution $K^0$ of the system $XK = B$ (see step 1).

All four steps are detailed below.


Description of step 1. To transform the system of ordinary differential equations (1) into the “exact” system of algebraic equations $XK = B$ with respect to $a_{ij}$, the finite difference method is used. Applying the method of finite differences to (1), we obtain the system of algebraic equations

$$
\begin{cases}
\dfrac{x_1(t+h) - x_1(t-h) - 2o(h^3)}{2h} = x_1(t)a_{11} + x_2(t)a_{12} + \cdots + x_n(t)a_{1n}, \\[2mm]
\dfrac{x_2(t+h) - x_2(t-h) - 2o(h^3)}{2h} = x_1(t)a_{21} + x_2(t)a_{22} + \cdots + x_n(t)a_{2n}, \\[1mm]
\ldots, \\[1mm]
\dfrac{x_n(t+h) - x_n(t-h) - 2o(h^3)}{2h} = x_1(t)a_{n1} + x_2(t)a_{n2} + \cdots + x_n(t)a_{nn}
\end{cases}
\tag{2}
$$

with respect to the unknowns $a_{ij}$. Here, the step $h$ is a very small positive number. System (2) contains $n^2$ unknowns $a_{11}, a_{12}, \ldots, a_{1n}, a_{21}, a_{22}, \ldots, a_{2n}, \ldots, a_{n1}, a_{n2}, \ldots, a_{nn}$. Substituting $t = t_k$, where $k = 1, 2, \ldots, n$, into (2), we obtain a system of $n^2$ algebraic equations:

$$
\begin{cases}
x_1(t_k)a_{11} + \cdots + x_n(t_k)a_{1n} = \dfrac{x_1(t_k+h) - x_1(t_k-h) - 2o(h^3)}{2h}, \\[2mm]
x_1(t_k)a_{21} + \cdots + x_n(t_k)a_{2n} = \dfrac{x_2(t_k+h) - x_2(t_k-h) - 2o(h^3)}{2h}, \\[1mm]
\ldots, \\[1mm]
x_1(t_k)a_{n1} + \cdots + x_n(t_k)a_{nn} = \dfrac{x_n(t_k+h) - x_n(t_k-h) - 2o(h^3)}{2h}
\end{cases}
\tag{3}
$$

with respect to the unknowns $a_{ij}$. System (3) can be represented in matrix–vector form $XK = B$, where $X$ is a matrix with elements $x_1(t_k), x_2(t_k), \ldots, x_n(t_k)$; $K$ is a vector with elements $a_{ij}$, $K = (a_{11}, \ldots, a_{1n}, \ldots, a_{n1}, \ldots, a_{nn})^T$; $B$ is the right-hand side vector (vector of free terms). The desired solution $K^0 = (a_{11}^0, \ldots, a_{1n}^0, \ldots, a_{n1}^0, \ldots, a_{nn}^0)^T$ of the inverse problem (Task 2) is the exact solution of system (3).

Description of step 2. To form the “approximate” system of algebraic equations $X_\eta K = B_\delta$, the initial data $x_1(t_k), x_2(t_k), \ldots, x_n(t_k)$ and the interpolation method (cubic spline method) are used. Neglecting $o(h^3)$, from (3) we obtain the system of equations

$$
\begin{cases}
x_1(t_k)a_{11} + \cdots + x_n(t_k)a_{1n} = \dfrac{x_1(t_k+h) - x_1(t_k-h)}{2h}, \\[2mm]
x_1(t_k)a_{21} + \cdots + x_n(t_k)a_{2n} = \dfrac{x_2(t_k+h) - x_2(t_k-h)}{2h}, \\[1mm]
\ldots, \\[1mm]
x_1(t_k)a_{n1} + \cdots + x_n(t_k)a_{nn} = \dfrac{x_n(t_k+h) - x_n(t_k-h)}{2h}.
\end{cases}
\tag{4}
$$
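As an illustration, system (4) can be assembled numerically as sketched below. This is a minimal sketch, not the authors' implementation: the function name `assemble_system`, the default step `h` and the demo data are assumptions, and SciPy's `CubicSpline` (with its default boundary conditions and extrapolation at the end points) stands in for the cubic spline method named in the text.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def assemble_system(t, X_meas, h=1e-3):
    """Build the matrix and right-hand side of system (4).

    t      : sample times t_k, shape (m,)
    X_meas : measured states, X_meas[k] = (x_1(t_k), ..., x_n(t_k)), shape (m, n)
    h      : finite-difference step (hypothetical default)
    """
    t = np.asarray(t, dtype=float)
    X_meas = np.asarray(X_meas, dtype=float)
    m, n = X_meas.shape
    spline = CubicSpline(t, X_meas, axis=0)        # one cubic spline per component
    # Central difference (x(t_k + h) - x(t_k - h)) / (2h), evaluated on the spline;
    # at the first and last sample the spline is extrapolated by h.
    dX = (spline(t + h) - spline(t - h)) / (2.0 * h)
    # For K = (a_11, ..., a_1n, ..., a_n1, ..., a_nn)^T the rows belonging to t_k
    # are kron(I_n, x(t_k)^T), so that block @ K equals A x(t_k).
    X_eta = np.vstack([np.kron(np.eye(n), X_meas[k][None, :]) for k in range(m)])
    B_delta = dX.reshape(-1)                       # ordered by (k, i), matching the rows
    return X_eta, B_delta

# Hypothetical data generated from A = diag(-1, -2)
t_demo = np.linspace(0.0, 1.0, 11)
X_demo = np.column_stack([np.exp(-t_demo), np.exp(-2.0 * t_demo)])
X_eta_demo, B_delta_demo = assemble_system(t_demo, X_demo)
print(X_eta_demo.shape, B_delta_demo.shape)
```

Stacking one block per time point gives a tall matrix when more time points than state components are available; the regularized equation handles that case unchanged.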

The given values $x_1(t_k), x_2(t_k), \ldots, x_n(t_k)$ are obtained from the experiment, and $x_1(t_k+h), x_1(t_k-h), x_2(t_k+h), x_2(t_k-h), \ldots, x_n(t_k+h), x_n(t_k-h)$ are obtained by cubic spline interpolation. Since $x_1(t_k), x_2(t_k), \ldots, x_n(t_k)$ are measured experimentally, they may contain measurement and rounding errors. The values $x_1(t_k+h), x_1(t_k-h), \ldots, x_n(t_k+h), x_n(t_k-h)$ are found by interpolation, so they can contain interpolation errors as well as rounding errors. We can rewrite (4) as $X_\eta K = B_\delta$, where $X_\eta$ is an approximation to the matrix $X$ with $\|X_\eta - X\| \le \eta$, $B_\delta$ is an approximation to the vector $B$ with $\|B_\delta - B\| \le \delta$, and $\eta$ and $\delta$ are small positive numbers.

Description of step 3. Tikhonov’s regularization method is used to construct a regularizing equation from the system of algebraic equations $X_\eta K = B_\delta$. The problem of solving system (4) is ill-posed, because it either has no solution, or has more than one solution, or its solution is not stable under small changes in $X_\eta$ and $B_\delta$. To solve system (4), the Tikhonov regularization method is used [3–5]. In this method, instead of finding a solution to system (4), we look for an approximation to $K^0$ that is unique and depends continuously on $X_\eta$, $B_\delta$. The approximation to the desired solution $K^0$ is found from the condition

$$\|X_\eta K - B_\delta\|^2 + \alpha\|K\|^2 \to \min_{\alpha}\,\min_{\|K\|}, \tag{5}$$

where $\alpha$ ($\alpha = \mathrm{const} > 0$) is the regularization parameter. From (5) follows the regularizing equation

$$X_\eta^* X_\eta K + \alpha K = X_\eta^* B_\delta, \tag{6}$$

where $X_\eta^*$ is the conjugate of the matrix $X_\eta$. The solution of system (6), denoted by $K^\alpha$ ($K^\alpha = (a_{11}^\alpha, \ldots, a_{1n}^\alpha, \ldots, a_{n1}^\alpha, \ldots, a_{nn}^\alpha)^T$), is a regularized solution of the system of algebraic equations $X_\eta K = B_\delta$ [3–5]. It is necessary to find the parameter $\alpha$ so that $K^\alpha$ tends to the desired solution $K^0$, i.e., $\|K^\alpha - K^0\| \to 0$.

Description of step 4. To search for the regularization parameter $\alpha$ and determine the regularized solution $K^\alpha$, we consider the method of the generalized discrepancy [1–8] and the method of choosing the quasi-optimal value of the regularization parameter [10–12].

Method of generalized discrepancy [1–8]. This method is used when the values $\eta$, $\delta$ are known. The regularization parameter $\alpha$ is chosen such that $\|X_\eta K^\alpha - B_\delta\| = \delta + \eta\|K^\alpha\|$ [1–8]. To do this, very small values $\alpha = 10^{-1}, 10^{-2}, \ldots, 10^{-9}$ are tried. For each value $\alpha_i$, where $i = 1, 2, \ldots, N$, the solution $K^{\alpha_i}$ of system (6) is calculated. Substituting $K^{\alpha_i}$ into $\|X_\eta K^{\alpha_i} - B_\delta\| = \delta + \eta\|K^{\alpha_i}\|$ gives the values $\eta_i$, $\delta_i$. Values that satisfy the following conditions are selected as regularization parameters: $\eta \le \eta_i$, $\delta \le \delta_i$, $|\eta - \eta_i| \to 0$, $|\delta - \delta_i| \to 0$. The following additional condition is used to check the values $\eta_i$, $\delta_i$: the regularization parameter $\alpha_i$ is chosen if it satisfies the conditions $\alpha_i \to 0$, $\dfrac{(\delta_i + \eta_i)^2}{\alpha_i} \to 0$ as $(\eta_i, \delta_i) \to 0$ [5–8].

Method for choosing the quasi-optimal value of the regularization parameter [10–12]. This method is used when the values $\eta$, $\delta$ are not known. We consider a geometric sequence $\{\alpha_i\}$ with a given initial value $\alpha_0$ and common ratio $q$ that satisfies $q = \dfrac{\alpha_{i+1}}{\alpha_i} \in (0, 1)$, $i = 1, 2, \ldots, N$. Based on the sequence $\{\alpha_i\}$, the sequence $\{K^{\alpha_i}\}$ of solutions corresponding to the regularization parameters $\alpha_i$ is constructed. The quasi-optimal value of the regularization parameter is the element $\alpha_i$ of the sequence for which $\|K^{\alpha_{i+1}} - K^{\alpha_i}\| \to \min\limits_i$ is achieved. In practice, the value $\alpha_i$ is often chosen so that $\|K^{\alpha_{i+1}} - K^{\alpha_i}\| \to 0$. With the value of the regularization parameter found, the regularized solution is determined by solving system (6).
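Steps 3 and 4 can be sketched in a few lines for real-valued data, where the conjugate matrix reduces to the transpose. The function names `tikhonov_solve` and `quasi_optimal`, the default values of `alpha0`, `q` and `N`, and the random test system are all assumptions made for illustration.

```python
import numpy as np

def tikhonov_solve(X_eta, B_delta, alpha):
    """Solve the regularizing equation (X^T X + alpha I) K = X^T B for real data."""
    n = X_eta.shape[1]
    return np.linalg.solve(X_eta.T @ X_eta + alpha * np.eye(n), X_eta.T @ B_delta)

def quasi_optimal(X_eta, B_delta, alpha0=1.0, q=0.8, N=100):
    """Scan alpha_i = alpha0 * q**i and pick the alpha_i that minimises
    the norm of K^{alpha_{i+1}} - K^{alpha_i}."""
    alphas = alpha0 * q ** np.arange(N + 1)
    Ks = [tikhonov_solve(X_eta, B_delta, a) for a in alphas]
    diffs = np.array([np.linalg.norm(Ks[i + 1] - Ks[i]) for i in range(N)])
    i_best = int(np.argmin(diffs))
    return alphas[i_best], Ks[i_best], diffs

# Hypothetical, well-conditioned test system with a known solution
rng = np.random.default_rng(0)
X_test = rng.normal(size=(12, 4))
K_exact = np.array([1.0, -2.0, 0.5, 3.0])
B_test = X_test @ K_exact

alpha_best, K_best, diffs = quasi_optimal(X_test, B_test)
print(alpha_best, K_best)
```

For consistent, noise-free data the differences keep shrinking as the parameter decreases, so the plain minimum tends to sit at the small end of the scan. With noisy data it is usually worth inspecting `diffs` together with the residual and solution norms before fixing the parameter, rather than relying on `argmin` alone.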

4 Numerical Example

As a numerical example, we consider a mathematical model that describes the process of teaching English [13–19]. The process of learning English takes place between two opposites: the mother tongue and English. The interaction of the native and English languages in teaching can be represented graphically by the diagram shown in Fig. 1. This interaction creates three states for each learner: a mother tongue state, an interlanguage state and an English state.

Fig. 1 The English language teaching model

The notation in Fig. 1 is as follows [13–19]:

• $A_0$: the state of the native language;
• $A_1$: the state of the interlanguage;
• $A_2$: the state of the English language;
• $\lambda_0$: the intensity of the use of the native language for learning English;
• $\lambda_1$: the intensity of the appeal to the English language in the process of learning it (obviously, this parameter can also be associated with the forgetting process);
• $\mu_1$: the intensity of the appeal to the native language when forgetting the meanings of words, expressions, concepts of the English language;
• $\mu_2$: the intensity of using the English language in the process of learning it.

The intensity is understood as the number of calls per unit of time (week, month, semester) [13–19]. It is assumed that the process of learning English is probabilistic in nature, as it depends on many, often random, factors. Taking this assumption into account, the following system of differential equations is proposed as a mathematical model for the analysis of subordinate bilingualism, written with respect to the probabilities of the states:

$$
\begin{cases}
\dfrac{dP_0(t)}{dt} = -\lambda_0 P_0(t) + \mu_1 P_1(t), \\[2mm]
\dfrac{dP_1(t)}{dt} = \lambda_0 P_0(t) - (\lambda_1 + \mu_1)P_1(t) + \mu_2 P_2(t), \\[2mm]
\dfrac{dP_2(t)}{dt} = \lambda_1 P_1(t) - \mu_2 P_2(t),
\end{cases}
\qquad
P_0(t)\big|_{t=0} = P_0(0), \quad P_1(t)\big|_{t=0} = P_1(0), \quad P_2(t)\big|_{t=0} = P_2(0),
\tag{7}
$$

where $P_0(t)$ is the probability of the state “knowledge of the native language”; $P_1(t)$ is the probability of the state “knowledge of the interlanguage”; $P_2(t)$ is the probability of the state “knowledge of the English language”; $P_0(t) + P_1(t) + P_2(t) = 1$ [13–19]. The probability of a language state can be understood as a quantitative assessment of the level of knowledge of the language in the range from zero to one. The level of knowledge includes not only the vocabulary, but also the phonetics, grammar and syntax of the language. It is assumed that this assessment is determined by testing the student [13–19].

Direct task. Given the values of the parameters $\lambda_0 = \lambda_0^0$, $\lambda_1 = \lambda_1^0$, $\mu_1 = \mu_1^0$, $\mu_2 = \mu_2^0$ and the initial conditions $P(0) = (P_0(0), P_1(0), P_2(0))^T$ at the initial time $t = 0$, determine $P(t) = (P_0(t), P_1(t), P_2(t))^T$.

Inverse task. Given $P(t_k) = (P_0(t_k), P_1(t_k), P_2(t_k))^T$ at time points $t_k$, where $k = 1, 2, \ldots, N$, determine the values of the parameters $\lambda_0, \lambda_1, \mu_1, \mu_2$ (i.e., $\lambda_0^0, \lambda_1^0, \mu_1^0, \mu_2^0$).

Consider the inverse problem for the process of learning English. Suppose that the probabilities of the states “knowledge of the native language”, “knowledge of the interlanguage” and “knowledge of the English language” of a student are known at different points in time. Specifically, an English language training program lasts 20 weeks. For each student, the teacher conducts an entrance test. The test includes two types of questions: questions and answer options both in English; and questions in English with answer options in the native language, or vice versa. The result of the test gives the probabilities of the states “knowledge of the native language”, “knowledge of the interlanguage” and “knowledge of the English language” of the student at the initial moment of time.

In addition, intermediate tests are conducted during the training. After 1, 2, 3 and 4 weeks, the teacher gives tests similar to the entrance test to check the states “knowledge of the native language”, “knowledge of the interlanguage” and “knowledge of the English language” of the student. For convenience of calculation, the week is not used as the time unit. Instead, the time is normalized to a dimensionless scale from 0 to 2 by the formula $t = \dfrac{\text{number of weeks}}{20} \times 2$. The normalized data are presented in Table 1.


Table 1 Measured probabilities of the states “knowledge of the native language”, “knowledge of the interlanguage” and “knowledge of the English language”

Number of weeks    t      P0(t)    P1(t)    P2(t)
0                  0      0.400    0.600    0
1                  0.1    0.479    0.355    0.166
2                  0.2    0.522    0.238    0.240
3                  0.3    0.547    0.182    0.271
4                  0.4    0.564    0.154    0.282
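For model (7), the data of Table 1 can be arranged into a linear system for the four intensities and passed to the regularizing equation (6), as sketched below. This is only an illustration under stated assumptions: the helper `model_rows` is hypothetical, `np.gradient` replaces the spline-based central differences of step 2, and the result is not claimed to reproduce the values reported in the paper.

```python
import numpy as np

# Table 1: normalized time t and measured probabilities (P0, P1, P2)
t = np.array([0.0, 0.1, 0.2, 0.3, 0.4])
P = np.array([
    [0.400, 0.600, 0.000],
    [0.479, 0.355, 0.166],
    [0.522, 0.238, 0.240],
    [0.547, 0.182, 0.271],
    [0.564, 0.154, 0.282],
])

def model_rows(p):
    """Rows of the linear system for K = (lambda0, lambda1, mu1, mu2) at one time point."""
    p0, p1, p2 = p
    return np.array([
        [-p0, 0.0,  p1, 0.0],   # dP0/dt = -lambda0*P0 + mu1*P1
        [ p0, -p1, -p1,  p2],   # dP1/dt = lambda0*P0 - (lambda1 + mu1)*P1 + mu2*P2
        [0.0,  p1, 0.0, -p2],   # dP2/dt = lambda1*P1 - mu2*P2
    ])

X_eta = np.vstack([model_rows(p) for p in P])        # shape (15, 4)

# Assumption: finite differences on the measurement grid stand in for the
# spline-based central differences described in step 2.
B_delta = np.gradient(P, t, axis=0).reshape(-1)      # shape (15,)

alpha = 0.00155                                      # regularization parameter quoted in the text
K_alpha = np.linalg.solve(X_eta.T @ X_eta + alpha * np.eye(4), X_eta.T @ B_delta)
print(dict(zip(["lambda0", "lambda1", "mu1", "mu2"], K_alpha)))
```

Because the three equations of (7) sum to zero, each block of three rows in `X_eta` and in `B_delta` also sums to zero, which gives a quick sanity check on the assembly.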

We apply the constructed technique to solve the inverse problem. First, the “appropriate” value of the regularization parameter $\alpha_1 = 1$ is selected. Then $K_1^\alpha$ is calculated for the parameter $\alpha_1 = 1$ ($K_1^\alpha = 1.97826$). Next, the geometric sequence $\{\alpha_i\}$ is built from the parameter $\alpha_1$ such that $\alpha_{i+1} = 0.8\alpha_i$, where $i = 1, 2, \ldots, 100$. Finally, the values $K_{i+1}^\alpha$ are calculated and the sequence of approximate solutions $K_i^\alpha$ is constructed. Figure 2 shows an estimate of the norm of the difference of approximate solutions at two adjacent iterations, $\|K_{i+1}^\alpha - K_i^\alpha\|$.

Fig. 2 Estimate of the norm of the difference of approximate solutions at two neighboring iterations

From Fig. 2 it is easy to see that for $i = 26, 27, \ldots, 100$ we have $\|K_{i+1}^\alpha - K_i^\alpha\| \to 0$. For each value of $i = 26, 27, \ldots, 100$, the values of $\|X_\eta K_i^\alpha - B_\delta\|^2$ and $\|K_i^\alpha\|^2$ are calculated. As a result of the calculation, the value of the regularization parameter $\alpha = \alpha_{30} = 0.00155$ is chosen so that the regularization parameter $\alpha_i$ is as small as possible, i.e., $\|X_\eta K_i^\alpha - B_\delta\|^2$ is the smallest, while the value of $\|K_i^\alpha\|^2$ is also as small as possible.

With $\alpha = 0.00155$ we find the approximate solution $K^\alpha = (0.25569, 3.83106, 1.89680, 1.86841)^T$ of the inverse problem. Here, $\lambda_0^\alpha = 0.25569$, $\lambda_1^\alpha = 3.83106$, $\mu_1^\alpha = 1.89680$, $\mu_2^\alpha = 1.86841$. We then solve the direct problem with the coefficients $\lambda_0 = 0.25569$, $\lambda_1 = 3.83106$, $\mu_1 = 1.89680$, $\mu_2 = 1.86841$. Figure 3 shows the change in the probabilities of the states “knowledge of the native language”, “knowledge of the interlanguage” and “knowledge of the English language” for these coefficients.

Fig. 3 Change in the probability of the state “knowledge of the native language”, “knowledge of the interlanguage” and “knowledge of the English language”

In Fig. 3, asterisks indicate the measured probabilities of the states (i.e., the initial data). The curves $P_0(t)$, $P_1(t)$, $P_2(t)$ express the change in the probabilities of the states “knowledge of the native language”, “knowledge of the interlanguage” and “knowledge of the English language” over time. The measured values are very close to the curves. Hence, the approximate intensity values $\lambda_0^\alpha = 0.25569$, $\lambda_1^\alpha = 3.83106$, $\mu_1^\alpha = 1.89680$, $\mu_2^\alpha = 1.86841$ can be taken as a solution to the inverse problem.

Finding the values of the parameters $\lambda_0, \lambda_1, \mu_1, \mu_2$ helps teachers to control and predict the results of teaching students English. Based on this prediction, teachers and students can plan their learning to achieve the greatest efficiency and the best results at the end of the course. To do this, they can change the intensity of the use of the native language for learning English; the intensity of the appeal to the English language in the process of studying it; the intensity of turning to the native language when forgetting the meanings of words, expressions, concepts of the English language; and the intensity of the use of English in the process of learning it.
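The comparison of measured and calculated data can be sketched as follows, using the intensities reported above and the measurements of Table 1. The text does not say which solver was used for the direct problem, so the classical fourth-order Runge–Kutta scheme and the helper name `rk4` are assumptions of this sketch.

```python
import numpy as np

# Intensities reported in the text
lam0, lam1, mu1, mu2 = 0.25569, 3.83106, 1.89680, 1.86841

# Coefficient matrix of model (7); each column sums to zero
A = np.array([
    [-lam0,  mu1,           0.0],
    [ lam0, -(lam1 + mu1),  mu2],
    [ 0.0,   lam1,         -mu2],
])

t_meas = np.array([0.0, 0.1, 0.2, 0.3, 0.4])
P_meas = np.array([
    [0.400, 0.600, 0.000],
    [0.479, 0.355, 0.166],
    [0.522, 0.238, 0.240],
    [0.547, 0.182, 0.271],
    [0.564, 0.154, 0.282],
])

def rk4(A, x0, t_end, steps=1000):
    """Integrate dX/dt = A X from 0 to t_end with the classical Runge-Kutta scheme."""
    x = np.asarray(x0, dtype=float).copy()
    dt = t_end / steps
    for _ in range(steps):
        k1 = A @ x
        k2 = A @ (x + 0.5 * dt * k1)
        k3 = A @ (x + 0.5 * dt * k2)
        k4 = A @ (x + dt * k3)
        x = x + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return x

P_calc = np.array([rk4(A, P_meas[0], tk) for tk in t_meas])
max_dev = np.max(np.abs(P_calc - P_meas))   # largest deviation from the measurements
print(P_calc)
print(max_dev)
```

Since the columns of `A` sum to zero, the calculated probabilities sum to one at every time point, which mirrors the constraint stated for model (7).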

5 Conclusion

In the presented work, an inverse problem was posed within the framework of a mathematical model described by a system of ordinary differential equations. A four-step technique for solving this inverse problem was constructed. The technique uses the finite difference method, the interpolation method (cubic spline method) and the Tikhonov regularization method. Methods for determining the regularization parameter were also considered. The developed technique makes it possible to determine the approximate parameters of the mathematical model, i.e., the coefficients of a system of ordinary differential equations. A numerical example of solving an inverse problem within a mathematical model of the English language learning process was given. Based on the measured probabilities of the states “knowledge of the native language”, “knowledge of the interlanguage” and “knowledge of the English language”, approximate values of the parameters of this mathematical model were found. The results of this example show the effectiveness and applicability of the developed technique.

References

1. Tikhonov AN (1963) On the solution of ill-posed problems and the regularization method. Dokl AN SSSR 151(3):501–504 (in Russian)
2. Tikhonov AN (1963) On regularization of ill-posed problems. Dokl AN SSSR 153(1):49–52 (in Russian)
3. Tikhonov AN, Arsenin VYa (1979) Methods for solving ill-posed problems, 2nd edn. Nauka, Glavnaya redaktsiya fiziko-matematicheskoy, Moscow (in Russian)
4. Tikhonov AN (1965) On ill-posed problems of linear algebra and a stable method for their solution. Dokl AN SSSR 163(3):591–594 (in Russian)
5. Olkhovoy A (2012) Introduction to the theory of inverse and ill-posed problems. LAP Lambert Academic Publishing (in Russian)
6. Kabanikhin SI (2009) Inverse and ill-posed problems. Siberian Federal University, Novosibirsk (in Russian)
7. Ivanov VK, Vasin VV, Tanana VP (1978) Theory of linear ill-posed problems and its applications. Nauka, Moscow (in Russian)
8. Denisov AM (1994) Introduction to the theory of inverse problems. Publishing House of Moscow State University, Moscow (in Russian)
9. Hadamard J (1902) Sur les problèmes aux dérivées partielles et leur signification physique. Princet Univ Bull 13:45–52
10. Morozov VA (1987) Regularization methods for unstable problems. Publishing House of Moscow State University, Moscow (in Russian)
11. Morozov VA (2003) Algorithmic foundations of methods for solving ill-posed problems. Vych met programmirovaniye 4(1):130–141 (in Russian)
12. Samarsky AA, Vabishchevich PN (2009) Numerical methods for solving inverse problems of mathematical physics, 3rd edn. LKI Publishing House, Moscow (in Russian)
13. Kiriy VG, Rogoznaya NN (2009) Mathematical model of subordinate bilingualism. The emergence of interlanguage. Vestnik IrGTU 2(38):189–191 (in Russian)
14. Kiriy VG, Tran Van An (2010) An ambivalent system of distance learning for a non-native language based on network technologies. Educ Technol Soc 13(4):246–267 (in Russian)
15. Kiriy VG, Tran Van An (2010) On one mathematical model of an ambivalent system of teaching a non-native language. Vestnik NSU. Ser Inf Technol 8(1):45–53 (in Russian)
16. Kiriy VG, Tran Van An (2011) Increasing interactive interaction in an ambivalent system of distance learning in a non-native language. Educ Technol Soc 14(3):354–369 (in Russian)
17. Tran Van An (2012) Practical implementation of non-native language teaching technology based on the site “Ambsystedu”. Educ Technol Soc 15(4):390–408 (in Russian)
18. Kiriy VG, Rogoznaya NN, Tran Van An (2012) On the influence of the parameters of the process of teaching a non-native language on the structure of an interlanguage. Vestnik IrGTU 5(64):15–20
19. Kiriy VG, Tran Van An (2012) On the adjustment of the process of teaching a non-native language. Vestnik IrGTU 10(69):23–28

Optimal Prediction of Heart Disease by Identifying the Type of Chest Pain Using Machine Learning Techniques

Ghulab Nabi Ahmad, Hira Fatima, Shafiullah, and Arshil Noor

Abstract Any serious heart ailment is referred to as coronary heart disease. Because such diseases can be fatal, researchers are concentrating on developing smart systems that identify heart disease precisely from electronic health data using machine learning algorithms (MLAs). This study applies ten machine learning techniques (MLTs) for predicting heart disease from patient data on key healthcare outcomes: logistic regression classifier (LRC), decision tree classifier (DTC), random forest classifier (RFC), extra tree classifier (EXTC), K-nearest neighbours classifier (KNNC), support vector machine classifier (SVMC), bagging classifier (BC), gradient boosting classifier (GBC), light GBM classifier (LGBMC) and Gaussian Naive Bayes classifier (GNBC). The prediction models are built on standard datasets from the Cleveland, Hungarian, Switzerland and Long Beach VA databases and the Statlog Heart Disease dataset of the University of California Irvine (UCI) repository. Data preprocessing and feature selection are performed before building the models. The models are evaluated on accuracy, precision, recall, F1-score, specificity, MCC, ROC–AUC, and balanced accuracy. The extra tree classifier performed best with 89.10% accuracy, 93.50% precision, 87.80% recall, 90.06% F1-score, 91.10% specificity, 78.00% MCC, 89.44% ROC–AUC, and 89.10% balanced accuracy.

Keywords Heart disease · Machine learning techniques · Logistic regression classifier (LRC) · Decision tree classifier (DTC) · Random forest classifier (RFC) · Extra tree classifier (EXTC) · K-nearest neighbours classifier (KNNC) · Support vector machine classifier (SVMC) · Bagging classifier (BC) · Gradient boosting classifier (GBC) · Light GBM classifier (LGBMC) · Gaussian Naive Bayes classifier (GNBC)

G. N. Ahmad (B) · H. Fatima Institute of Applied Sciences, Mangalayatan University, Aligarh, Uttar Pradesh, India
H. Fatima e-mail: [email protected]
Shafiullah Department of Mathematics, K.C.T.C College, BRA Bihar University Muzaffarpur, Raxual, India e-mail: [email protected]
A. Noor Department of Computer Science Institute of Technology and Management, Aligarh, Uttar Pradesh, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 686, https://doi.org/10.1007/978-981-99-2100-3_42

1 Introduction

According to estimates, 17.9 million people die from cardiovascular diseases (CVDs) each year, which accounts for 31% of all fatalities worldwide [1–3]. Heart attacks and strokes account for four out of every five CVD fatalities, and one-third of these deaths occur prematurely in people under the age of 70. Heart failure is a common occurrence brought on by CVDs, and the dataset used in this study comprises 11 variables that can be used to forecast potential heart disease. Heart disease (HD) is a critical health issue that affects many individuals all around the world, so early diagnosis of a cardiac disease helps with therapy. Since the number of people with this condition is rapidly rising, a technology that can diagnose heart disease more quickly must be developed [4–8]. The patient’s smoking history influences whether the condition is present or not. A heart disease prediction system may describe the most important characteristics of cardiovascular patients, identify high-risk individuals, and provide a model that makes it simple and clear to distinguish between them. Age, type of chest pain, blood pressure (BP), sex, cholesterol, and heart rate are some of the parameters used to apply and compare the MLAs. The primary goal of this study is to create a fundamental ML model to improve the accurate diagnosis of cardiac conditions. Ten techniques, namely LRC, DTC, RFC, EXTC, KNNC, SVMC, BC, GBC, LGBMC, and GNBC, are applied and achieve improved results in this work.

The rest of the paper is structured as follows: Sect. 2 summarises the previous literature in this field of inquiry; Sect. 3 outlines the research framework, which explains the steps taken to carry out this study; Sect. 4 outlines the performance evaluation metrics, findings, and discussion for the experimental setups. Section 5 includes all of the experiments we conducted, their corresponding results, and a comparison with the state of the art. Section 6 concludes with recommendations for future work.

2 Related Work

Several ML classification methods are compared in this work. Shafique et al. [9] used decision tree, Naive Bayes, and neural network classifiers for data analysis and obtained an accuracy of 78% for the decision tree and 82% for Naive Bayes. Pal et al. [10] used a random forest model to predict heart disease and achieved an accuracy of 86.9%, a sensitivity of 90.6%, and a specificity of 82.7%. Rairikar et al. [11] used a hybrid random forest model for predicting cardiovascular illness; according to the study, 88.7% of CVD predictions were accurate. Araujo et al. [12] obtained 87.24% accuracy with the stacking (KNN, LR, KNN) ML method, which outperformed the competition in Weka, whilst bagging (i = 10, RFC) came second with 86.37%. The best technique in Python scikit-learn has an accuracy of 85.36% for bagging (i = 100, RF) and 84.82% for stacking (RF, DT, LR).

3 Methodology The study’s main objective is to identify and forecast binary heart disease’s risk factors. The ten unique Machine Learning Techniques are used to fulfil the objectives of the paper as shown in Fig. 1 below and described under the references [13–22]. • Logistic Regression Classifier (LRC): A statistical and ML approach called logistic regression uses input field values to categorise records in a collection. In order to forecast results, it predicts a dependent variable based on one or more sets of independent factors [13]. • Decision Tree Classifier (DTC): The most popular MLTs are DTC because they are simple and easy to learn. In decision processes, a decision tree may be used freely and attractively to depict choices and outcomes [14].

Fig. 1 Proposed Block


G. N. Ahmad et al.

• Random Forest Classifier (RFC): An ensemble learning technique for classification, prediction, and other problems, RFC builds a large number of decision trees during the training phase [15].
• Extra Tree Classifier (EXTC): This classifier is a meta-estimator that fits a number of randomised decision trees (extra-trees) to various subsamples of the dataset and uses averaging to improve prediction accuracy and decrease overfitting. The number of trees in the forest is a configurable parameter [16].
• K-Nearest Neighbours Classifier (KNNC): K-nearest neighbours is a supervised MLT that can be used to solve classification and regression problems. It is easy to use and understand, but it has the significant flaw of becoming substantially slower as the amount of data grows [17].
• Support Vector Machine Classifier (SVMC): Linearly separable data are those that can be split into two groups using just one straight line. Such data are categorised using a linear SVM, and the classifier that is used is called the linear SVM classifier [18].
• Bagging Classifier (BC): A bagging classifier is an ensemble meta-estimator that fits base classifiers to separate, randomly selected subsets of the dataset and then combines their individual results (either by voting or by averaging) into a final prediction [19].
• Gradient Boosting Classifier (GBC): Gradient boosting classifiers, a family of MLTs, combine a number of weak learning models to create a strong prediction model. Decision trees are most often used as the base learners for gradient boosting [20].
• Light GBM Classifier (LGBMC): Microsoft developed the Light Gradient Boosting Machine, a free and open source distributed gradient boosting framework for machine learning. It is based on decision tree techniques and is used for classification, ranking, and other machine learning applications [21].
• Gaussian Naive Bayes Classifier (GNBC): Gaussian Naive Bayes is a probabilistic classification algorithm based on Bayes' theorem. It assumes that the features are conditionally independent given the class and that each continuous feature follows a Gaussian (normal) distribution within each class [22].

4 Evaluation Metrics Evaluation metrics are used to determine how well a statistical or machine learning model performs, and every machine learning model or algorithm has to be evaluated during development. A variety of assessment measures are available for testing a model [23]. The measures used here are defined in terms of the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN).

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (1)$$

$$\text{Precision} = \frac{TP}{TP + FP} \quad (2)$$

$$\text{Recall} = \frac{TP}{TP + FN} \quad (3)$$

$$\text{F1-score} = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \quad (4)$$

$$\text{Specificity} = \frac{TN}{TN + FP} \quad (5)$$

$$\text{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \quad (6)$$
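The six measures defined in this section can be computed directly from the four confusion-matrix counts. The following is a minimal sketch, not the authors' code: the function name `classification_metrics` is hypothetical, and the example counts are the EXTC values reported in the paper's confusion-matrix table (Table 5). The sketch assumes that no denominator is zero.

```python
import math


def classification_metrics(tp, fn, fp, tn):
    """Return the measures of Eqs. (1)-(6) as fractions in [0, 1].

    Assumption: every denominator is non-zero (each class is both
    present in the data and predicted at least once).
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "specificity": tn / (tn + fp),
        "mcc": (tp * tn - fp * fn) / mcc_denominator,
    }


# EXTC counts as reported in Table 5: TP=144, FN=20, FP=10, TN=102.
metrics = classification_metrics(tp=144, fn=20, fp=10, tn=102)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

Returning fractions rather than percentages keeps the function reusable; scaling by 100 is left to the caller, as in the print loop.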

5 Data Description The combined heart disease dataset (Cleveland, Hungarian, Switzerland, Long Beach VA, and Statlog) consists of 918 participants with 14 parameters, which are described in Fig. 2.

6 Exploratory Data Analysis The combined heart disease dataset (Cleveland, Hungarian, Switzerland, Long Beach VA, and Statlog) is a sample of 918 people, of whom 725 are male and 193 are female. On the basis of the 14 parameters, 508 people (458 males and 50 females) have heart disease and 410 (267 males and 143 females) are healthy (Fig. 3). Data preparation is used in this step to remove duplicated characteristics and to process corrupt, missing, and incorrect values, in addition to identifying null values. The data are then brought into a standard format through splitting, feature scaling, and normalising. Following data preparation, the dataset is split into a training set (70% of the data) and a test set (30%). Table 2 and Fig. 4 show the analysis of how the chest pain (cp) feature relates to the outcome, including the histogram of the frequency of each chest pain type. A statistical analysis was carried out during the preprocessing stage to locate and eliminate missing values and to determine the maximum, minimum, mean, 25%, 50%, and 75% quantiles, and standard deviation (SD) of each feature. The results are shown in Table 1.
TA: Typical Angina (chest pain related to decreased blood supply to the heart) ASY: Asymptomatic (chest pain not showing signs of heart disease)
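The 70/30 split described in this section can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' procedure: the helper `train_test_split_indices`, the seed, and the choice to round the test share up are all hypothetical. With 918 records, rounding 0.30 × 918 = 275.4 up gives 276 test records, which agrees with the totals in the paper's confusion-matrix table.

```python
import math
import random


def train_test_split_indices(n_samples, test_fraction=0.30, seed=42):
    """Shuffle the row indices and split them into train and test lists."""
    # Assumption: the test share is rounded up to a whole record.
    n_test = math.ceil(test_fraction * n_samples)
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = train_test_split_indices(918)
print(len(train_idx), len(test_idx))  # prints "642 276"
```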


Fig. 2 Heart disease parameter description

Fig. 3 Heart failure male and female bar diagram (heart disease cases: male 458, female 50)

Fig. 4 Histogram frequency of chest pain type

Table 1 The description of dataset-I for the count, minimum, maximum, mean, and standard deviation

Statistical results   Age     Resting BP   Chol     Fasting BS   Thalach   Old peak   Heart disease
Count                 918     918          918      918          918       918        918
Mean                  53.51   132.39       198.79   0.23         136.80    0.88       0.55
SD                    9.43    18.51        109.38   0.42         25.46     1.06       0.49
Minimum               28.00   0.00         0.00     0.00         60.00     -2.60      0.00
25%                   47.00   120.00       173.25   0.00         120.00    0.00       0.00
50%                   54.00   130.00       223.00   0.00         138.00    0.60       1.00
75%                   60.00   140.00       267.00   0.00         156.00    1.50       1.00
Maximum               77.00   200.00       603.00   1.00         202.00    6.20       1.00


Table 2 Identify the chest pain type

Chest pain type               ATA   NAP   ASY   TA   Total
Total heart disease dataset   173   203   496   46   918

NAP: Non-Anginal Pain (typically esophageal spasms, not heart related) ATA: Atypical Angina (chest pain not related to heart)

6.1 Feature Correlation Matrix Correlation refers to the relationship between features. A heatmap makes it straightforward to discover which characteristics are most closely related to the target variable, and we generated one for the related features using the seaborn library. Pearson's correlation coefficient, which measures the strength and direction of the linear relationship between two numeric variables, was applied in this investigation. As shown in Table 3 and Fig. 5, the feature most strongly correlated with heart failure is Oldpeak, with a correlation of 0.4039.
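Each cell of the correlation heatmap is a Pearson coefficient between two columns. As a minimal sketch of that calculation in plain Python (the function name `pearson_r` and the toy values below are made up for illustration and are not taken from the dataset):

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    # Assumption: neither sequence is constant, so the product is non-zero.
    return covariance / math.sqrt(spread_x * spread_y)


# Toy, made-up values standing in for an Oldpeak column and the target.
oldpeak = [0.0, 1.0, 0.0, 1.5, 2.0, 0.0]
heart_disease = [0, 1, 0, 1, 1, 0]
print(round(pearson_r(oldpeak, heart_disease), 3))
```

Applying the function to every pair of columns yields a symmetric matrix with ones on the diagonal, which is the structure of Table 3.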

7 Result Analysis The machine learning techniques used in this experiment are applied with their default settings. Table 4 displays the outcome. The LRC model's accuracy, precision, recall, F1-score, specificity, MCC, ROC–AUC, and Balanced Accuracy were determined to be 85.90%, 92.50%, 82.90%, 87.50%, 90.20%, 71.90%, 86.56%, and 86.50%, respectively. The DTC model's accuracy, precision, recall, F1-score, specificity, MCC, ROC–AUC, and Balanced Accuracy were determined to be 69.90%, 78.70%, 67.70%, 72.80%, 73.20%, 40.20%, 70.44%, and 70.40%, respectively. The RFC model's accuracy, precision, recall, F1-score, specificity, MCC, ROC–AUC, and Balanced Accuracy were determined to be 88.40%, 91.20%, 89.00%, 90.10%, 87.50%, 76.10%, 88.26%, and 88.30%, respectively. The EXTC model's accuracy, precision, recall, F1-score, specificity, MCC, ROC–AUC, and Balanced Accuracy were determined to be 89.10%, 93.50%, 87.80%, 90.06%, 91.10%, 78.00%, 89.44%, and 89.50%, respectively. The KNNC model's accuracy, precision, recall, F1-score, specificity, MCC, ROC–AUC, and Balanced Accuracy were determined to be 85.50%, 91.30%, 83.50%, 87.30%, 88.40%, 70.90%, 85.96%, and 85.90%, respectively. The SVMC model's accuracy, precision, recall, F1-score, specificity, MCC, ROC–AUC, and Balanced Accuracy were determined to be 88.80%, 90.80%, 90.20%, 90.50%, 86.60%, 76.70%, 88.43%, and 88.40%, respectively. LGBMC model's accuracy,

Table 3 Describe the features correlation

               Age         Resting BP   Cholesterol   FastingBS   MaxHR       Oldpeak     Heartfailure
Age            1.000000    0.254399     −0.095282     0.198039    −0.382045   0.258612    0.282039
Resting BP     0.254399    1.000000     0.100893      0.070193    −0.112135   0.164803    0.107589
Cholesterol    −0.095282   0.100893     1.000000      −0.260974   0.235792    0.050148    −0.232741
FastingBS      0.198039    0.070193     −0.260974     1.000000    −0.131438   0.052698    0.267291
MaxHR          −0.382045   −0.112135    0.235792      −0.131438   1.000000    −0.160691   −0.400421
Oldpeak        0.258612    0.164803     0.050148      0.052698    −0.160691   1.000000    0.403951
Heartfailure   0.282039    0.107589     −0.232741     0.267291    −0.400421   0.403951    1.000000

Fig. 5 Heatmap of correlation matrix

precision, recall, F1-score, specificity, MCC, ROC–AUC, and Balanced Accuracy were determined to be 85.10%, 89.20%, 85.40%, 87.20%, 84.80%, 69.60%, 85.09%, and 85.10%, respectively. The GNBC model's accuracy, precision, recall, F1-score, specificity, MCC, ROC–AUC, and Balanced Accuracy were determined to be 85.50%, 85.60%, 90.90%, 88.20%, 77.70%, 69.70%, 84.26%, and 84.30%, respectively. The BC model's accuracy, precision, recall, F1-score, specificity, MCC, ROC–AUC, and Balanced Accuracy were determined to be 84.80%, 88.60%, 85.40%, 87.00%, 83.90%, 68.80%, 84.64%, and 84.60%, respectively. The GBC model's accuracy, precision, recall, F1-score, specificity, MCC, ROC–AUC, and Balanced Accuracy were determined to be 82.60%, 91.40%, 78.00%, 84.20%, 89.30%, 66.10%, 83.66%, and 83.60%, respectively. Table 5 lists the confusion-matrix values of each model, and Fig. 6 compares the ROC–AUC curves.


Table 4 Performance models accuracy

Models   Accuracy (%)   Precision (%)   Recall (%)   F1-score (%)   Specificity (%)   MCC (%)   ROC–AUC (%)   Balanced accuracy (%)
LRC      85.90          92.50           82.90        87.50          90.20             71.90     86.56         86.50
DTC      69.90          78.70           67.70        72.80          73.20             40.20     70.44         70.40
RFC      88.40          91.20           89.00        90.10          87.50             76.10     88.26         88.30
EXTC     89.10          93.50           87.80        90.06          91.10             78.00     89.44         89.50
KNNC     85.50          91.30           83.50        87.30          88.40             70.90     85.96         85.90
SVMC     88.80          90.80           90.20        90.50          86.60             76.70     88.43         88.40
LGBMC    85.10          89.20           85.40        87.20          84.80             69.60     85.09         85.10
GNBC     85.50          85.60           90.90        88.20          77.70             69.70     84.26         84.30
BC       84.80          88.60           85.40        87.00          83.90             68.80     84.64         84.60
GBC      82.60          91.40           78.00        84.20          89.30             66.10     83.66         83.60

Fig. 6 Comparison of ROC–AUC


Table 5 The confusion matrices outcome value

Models   Confusion matrix       True positive   False negative   False positive   True negative   Total
LRC      [[136 28] [11 101]]    136             28               11               101             276
DTC      [[111 53] [30 82]]     111             53               30               82              276
RFC      [[146 18] [14 98]]     146             18               14               98              276
EXTC     [[144 20] [10 102]]    144             20               10               102             276
KNNC     [[137 27] [13 99]]     137             27               13               99              276
SVMC     [[148 16] [15 97]]     148             16               15               97              276
LGBMC    [[140 24] [17 95]]     140             24               17               95              276
GNBC     [[149 15] [25 87]]     149             15               25               87              276
BC       [[140 24] [18 94]]     140             24               18               94              276
GBC      [[128 36] [12 100]]    128             36               12               100             276
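The confusion-matrix counts above are enough to recompute and rank the per-model accuracy. The sketch below is an illustrative cross-check, not part of the authors' pipeline; the dictionary `CONFUSION` and the helper names are hypothetical, while the counts themselves are copied from Table 5 in the order (TP, FN, FP, TN).

```python
# Counts (TP, FN, FP, TN) copied from Table 5.
CONFUSION = {
    "LRC": (136, 28, 11, 101),
    "DTC": (111, 53, 30, 82),
    "RFC": (146, 18, 14, 98),
    "EXTC": (144, 20, 10, 102),
    "KNNC": (137, 27, 13, 99),
    "SVMC": (148, 16, 15, 97),
    "LGBMC": (140, 24, 17, 95),
    "GNBC": (149, 15, 25, 87),
    "BC": (140, 24, 18, 94),
    "GBC": (128, 36, 12, 100),
}


def accuracy(tp, fn, fp, tn):
    """Share of all test records that were classified correctly."""
    return (tp + tn) / (tp + fn + fp + tn)


def balanced_accuracy(tp, fn, fp, tn):
    """Mean of recall (sensitivity) and specificity."""
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


# Rank the models from highest to lowest accuracy.
ranking = sorted(CONFUSION, key=lambda m: accuracy(*CONFUSION[m]), reverse=True)
for model in ranking:
    counts = CONFUSION[model]
    print(
        f"{model:6s} accuracy={100 * accuracy(*counts):.2f}% "
        f"balanced={100 * balanced_accuracy(*counts):.2f}%"
    )
```

Under these counts EXTC ranks first and DTC last, in line with the accuracy column of Table 4.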

8 Comparison with the Previous Research Table 6 compares the proposed machine learning system with pertinent earlier works, which were evaluated using various criteria. The proposed system attained an accuracy, precision, recall, F1-score, specificity, MCC, ROC–AUC, and Balanced Accuracy of 89.10%, 93.50%, 87.80%, 90.06%, 91.10%, 78.00%, 89.44%, and 89.50%, respectively. Prior research achieved accuracy ranging between 84.82% and 86%, whereas the suggested method achieved an accuracy of 89.10% and a precision of 93.50%.

9 Conclusion Heart disease is a broad term for any serious heart ailment. Because such ailments can be fatal, researchers are concentrating on developing smart systems that use machine learning algorithms to identify heart diseases precisely from electronic health data. This study introduces a number of machine learning algorithms for predicting heart diseases using patient data on key healthcare outcomes. The study


Table 6 Comparison of the performance between the proposed system and previous studies Previous studies

Accuracy Precision Recall F1-score Specificity MCC ROC–AUC Balanced (%) (%) (%) (%) accuracy

Araujo 84.82 et al. [12]

–

–

–

–

–

–

–

Liu et al. [24]

86

86

86

–

–

–

–

93.50

87.80

90.06

84.62

Proposed 89.10 model (%)

91.10

78.00 89.44

89.50

included ten classification techniques, LRC, DTC, RFC, EXTC, KNNC, SVMC, BC, GBC, LGBMC, and GNBC, which achieved accuracies of 85.90%, 69.90%, 88.40%, 89.10%, 85.50%, 88.80%, 84.80%, 82.60%, 85.10%, and 85.50%, respectively. Standard datasets from the Cleveland, Hungarian, Switzerland, Long Beach VA, and Statlog heart disease databases of the University of California Irvine (UCI) repository were used to build the prediction models. Data preprocessing and feature selection were carried out before building the models. The models were evaluated based on accuracy, precision, recall, F1-score, specificity, MCC, ROC–AUC, and Balanced Accuracy. The extra tree classifier performed best, with 89.10% accuracy, 93.50% precision, 87.80% recall, 90.06% F1-score, 91.10% specificity, 78.00% MCC, 89.44% ROC–AUC, and 89.50% Balanced Accuracy. In future work, more research should be conducted to improve on the current figures. To enlarge the sample size and give statistical models the chance to train on a large dataset, researchers can construct their own dataset from the existing datasets, which increases the chance that prediction accuracy will improve. False positives and false negatives in the current models should also be reduced with great care. The prediction models should be made accessible to the general public via a Web-based or mobile application so that anyone may monitor their heart health and contact a doctor if their results point to a related illness. Users of these programmes should be aware that they are not a replacement for expert medical advice but are models built using a particular dataset, which may contain inaccuracies.

References
1. Gaziano TA, Bitton A, Anand S, Abrahams-Gessel S, Murphy A (2010) Growing epidemic of coronary heart disease in low-and middle-income countries. Curr Prob Cardiol 35(2):72–115
2. Darba S, Safaei N, Mahboub-Ahari A, Nosratnejad S, Alizadeh G, Ameri H, Yousefi M (2020) Direct and indirect costs associated with coronary artery (heart) disease in Tabriz Iran. Risk Manage Healthc Policy 13:969
3. Evans MA, Sano S, Walsh K (2020) Cardiovascular disease, aging, and clonal hematopoiesis. Annu Rev Pathol 15:419
4. Halaris A (2016) Inflammation-associated co-morbidity between depression and cardiovascular disease. Inflammation-associated depression, Evidence, mechanisms and implications, pp 45–70
5. de la Torre JC (2006) How do heart disease and stroke become risk factors for Alzheimer's disease? Neurol Res 28(6):637–644
6. Hamilton MT, Hamilton DG, Zderic TW (2007) Role of low energy expenditure and sitting in obesity, metabolic syndrome, type 2 diabetes, and cardiovascular disease. Diabetes 56(11):2655–2667
7. Azmi J, Arif M, Nafis MT, Alam MA, Tanweer S, Wang G (2022) A systematic review on machine learning approaches for cardiovascular disease prediction using medical big data. Med Eng Phys, 103825
8. Goff DC Jr, Khan SS, Lloyd-Jones D, Arnett DK, Carnethon MR, Labarthe DR, Loop MS, Luepker RV, McConnell MV, Mensah GA, Mujahid MS (2021) Bending the curve in cardiovascular disease mortality: Bethesda+ 40 and beyond. Circulation 143(8):837–851
9. Shafique U, Majeed F, Qaiser H, Mustafa IU (2015) Data mining in healthcare for heart diseases. Int J Innov Appl Stud 10(4):1312
10. Pal M, Parija S (2021) Prediction of heart diseases using random forest. In: Journal of physics: conference series 1817(1):012009. IOP Publishing
11. Rairikar A, Kulkarni V, Sabale V, Kale H, Lamgunde A (2017) Heart disease prediction using data mining techniques. In: 2017 international conference on intelligent computing and control (I2C2). IEEE, pp 1–8
12. Araujo M, Pope L, Still S, Yannone C (2021) GR-130-prediction of heart disease with machine learning techniques
13. Maalouf M (2011) Logistic regression in data analysis: an overview. Int J Data Anal Tech Strat 3(3):281–299
14. Safavian SR, Landgrebe D (1991) A survey of decision tree classifier methodology. IEEE Trans Syst Man Cybern 21(3):660–674
15. Pal M (2005) Random forest classifier for remote sensing classification. Int J Remote Sens 26(1):217–222
16. Kumar P, Singh SN, Dawra S (2022) Software component reusability prediction using extra tree classifier and enhanced Harris hawks optimization algorithm. Int J Syst Assur Eng Manage 13(2):892–903
17. Toulni Y, Nsiri B, Drissi TB (2023) Heart problems diagnosis using ECG and PCG signals and a k-nearest neighbor classifier. In: IoT based control networks and intelligent systems, pp 547–560. Springer, Singapore
18. Faieq AK, Mijwil MM (2022) Prediction of heart diseases utilising support vector machine and artificial neural network. Indonesian J Electr Eng Comput Sci 26(1):374–380
19. Frasanta MAH, Wijaya DR, Nugroho H, Fahrudin T (2022) Heart diagnose application using bagging algorithm. In: 2022 1st international conference on information system and information technology (ICISIT). IEEE, pp 342–346
20. Ahmad GN, Fatima H, Ullah S, Saidi AS (2022) Efficient medical diagnosis of human heart diseases using machine learning techniques with and without GridSearchCV. IEEE Access 10:80151–80173
21. Zhang S, Yuan Y, Yao Z, Yang J, Wang X, Tian J (2022) Coronary artery disease detection model based on class balancing methods and LightGBM algorithm. Electronics 11(9):1495
22. Reddy VSK, Meghana P, Reddy NS, Rao BA (2022) Prediction on cardiovascular disease using decision tree and Naïve Bayes classifiers. In: Journal of physics: conference series 2161(1):012015. IOP Publishing
23. Boukhatem C, Youssef HY, Nassif AB (2022) Heart disease prediction using machine learning. In: 2022 advances in science and engineering technology international conferences (ASET). IEEE, pp 1–6
24. Liu J, Dong X, Zhao H, Tian Y (2022) Predictive classifier for cardiovascular disease based on stacking model fusion. Processes 10(4):749

VANET Hybrid Routing Protocol Featuring Perpetual Hopfield Network and Enhanced K-Means Clustering Algorithm

Anuranj Pullanatt and A. Anitha

Abstract Passengers and drivers can use a variety of user applications on vehicular ad hoc networks (VANETs), in addition to security and Internet access applications. A dependable routing system is regarded as critical for enabling effective data transmission between vehicles. The main goal of this paper is to propose a new clustering-based routing protocol, KMRP, that improves data transmission in a VANET under high-volume and high-mobility conditions by combining a modified K-means approach with the maximum stable set problem and a continuous Hopfield network. The fundamental input parameters of the K-means algorithm, namely the number of clusters and the initial cluster heads, are chosen using the continuous Hopfield network and the maximum stable set problem rather than at random. The distance factor used for grouping vehicles into clusters is then replaced with a link reliability model. Finally, a weight function is used to pick the cluster head based on the speed, the node degree, and the amount of free buffer space. A simulation was performed in a highway vehicle setting to assess the effectiveness of the proposed technique, and comparisons were made with ICA-RBF and RMRPTS. According to the simulation results, throughput increases dramatically as a result of KMRP's capacity to prevent traffic congestion and collisions. Moreover, KMRP provides a rapid solution that does not require much computation or memory, which reduces the end-to-end delay. Finally, under high density and mobility, KMRP outperforms the other systems in terms of packet delivery ratio (PDR).
Keywords VANET · K-means · Clustering · Hopfield network · Routing protocol · Stability set problem

A. Pullanatt (B) · A. Anitha Noorul Islam Centre for Higher Education, Noorul Islam University, Thucklay, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 686, https://doi.org/10.1007/978-981-99-2100-3_43


1 Introduction In the last few years, industrialization and urbanization have changed significantly, and many businesses and academic institutions have expressed a strong interest in the intelligent transportation system (ITS), which uses information and communication technology to increase traffic efficiency and road safety [1, 2]. The main obstacles are regarded as being those brought on by environmental contamination, the depletion of energy supplies, the rise in accidents, the difficulties in managing and supervising transportation, and the rising demand for transportation [3, 4]. In order to provide autonomous wireless communication between various platforms, such as vehicle to vehicle (V2V), vehicle to infrastructure (V2I), vehicle to broadband cloud computing, and on-board communication, researchers have turned to vehicular networks, which have emerged as the future technology of ITS [5, 6]. Vehicles traveling at high speeds and moving continuously create special requirements and characteristics for this kind of network. The availability of effective and scalable routing protocols is one of the key requirements of VANET applications [7, 8]. Therefore, one of the main problems for VANET is to provide a successful and reliable routing algorithm. Using clustering techniques is one of the best ways to address the VANET's scalability issue. In a clustering-based method, vehicles are arranged in clusters [9]. A cluster is a collection of nodes that can communicate and share data through a connection they have created, and each cluster is led by a cluster head (CH) [10]. The cluster head receives data from the cluster member nodes and can then use data aggregation techniques to remove redundant data [11]. This paper's main contributions are an overview of the various clustering-based routing protocol schemes that have been suggested for VANET in the literature and the proposal of a cluster-based routing protocol for VANET that combines the maximum stable set problem (MSSP) and a continuous Hopfield network (CHN) with a modified K-means technique. The clustering method is divided into two phases: the initial phase and the clustering phase.

2 Related Work A number of clustering techniques and routing protocols have been proposed for VANETs. They fall into three major groups: clustering solutions based on neural networks, clustering solutions based on metaheuristics, and routing protocols based on clustering. Clustering solutions based on neural networks comprise two approaches: a routing technique for VANET employing the virtual annealing technique and neural networks, and a hybrid routing strategy utilizing an RBF neural network and imperialist competitive algorithms. Clustering solutions based on metaheuristics comprise three approaches: a clustering-based reliable low-latency routing scheme using the ACO method for vehicular networks, the reliable multi-level routing protocol with tabu search (RMRPTS) in VANET, and CBQoS-VANET. Routing protocols based on clustering comprise seven approaches: control overhead reduction in a cluster-based VANET routing protocol, a change strategy for energy-efficient pseudonyms based on distance and clustering over road networks, a data diffusion method based on probabilistic broadcasting and clustering in the VANET, VANET clustering using affinity propagation and PassCAR, a clustering-based communication protocol for various mix-zones over road networks, a routing protocol for VANET with passive clustering assistance, and a clustering-based routing protocol appropriate for VANET. The authors of [12] present an improved initial centroid selection approach to improve the efficiency of certificate validation in VANET when using K-means. Although relatively efficient and straightforward in clustering, the K-means technique has the disadvantage that its performance is sensitive to centroid initialization. To address this limitation, the authors of [13] propose the K-harmonic means (KHM) algorithm, which formulates the performance function based on the harmonic averages of the distances from each data point to the centers, so that the algorithm's performance is essentially insensitive to centroid initialization. Sabbagh et al. [14] created an AODV routing protocol and used the clustering concept and the cuckoo search algorithm to create an efficient and stable path between the source and destination in a VANET. They used the K-means technique to construct clusters and the cuckoo search algorithm to select the shortest and most stable path among all possible paths. To provide an efficient and stable path, they calculated three weighted parameters as the fitness function of the cuckoo search algorithm. The simulation was run with the NS-3 simulator and BonnMotion to test the performance, and the results were compared with the popular routing protocol AODV. According to the evaluation results, the routing protocol KMCSA outperforms AODV in terms of packet delivery ratio, packet loss ratio, overhead, average delay, and throughput, even when subjected to black hole attacks.

3 K-Means Clustering Algorithm One of the most well-known algorithms used in the clustering process is the K-means method. It is frequently employed in a number of fields, including data mining, sensor networks, and ad hoc networks (Hajlaoui et al. and Chai et al. [8, 15]). It is a straightforward unsupervised learning technique that divides the data set into K clusters, where K is given at the outset. Its major goal is to minimize the distance between the members of a cluster and the cluster head. The algorithm initially selects K initial clusters, as shown in Fig. 1a. The goal is to rearrange a group of l points, with l > K, into K clusters. K-means randomly chooses K points as centroids, each of which is associated with a cluster C. The algorithm then connects each data point in the group to the nearest centroid. The procedure, shown in Fig. 1b, is based on an objective function that sums all squared distances within each cluster, as given in Eq. (1).

Fig. 1 Clustering method of K-means

$$\arg\min_{x} \sum_{k=1}^{n} \sum_{y_l \in x_k} a(y_l, v_k) = \arg\min_{x} \sum_{k=1}^{n} \sum_{y_l \in x_k} |y_l - v_k|^2 \quad (1)$$

where a(y_l, v_k) = |y_l − v_k|^2 is the squared distance between the point and the cluster's centroid, y_l is the location of the point, v_k is the location of the centroid with k = 1, …, n, and n is the number of clusters. The centroid of each cluster is the mean of its member points:

$$v_k = \frac{1}{|x_k|} \sum_{y_l \in x_k} y_l \quad \forall k \quad (2)$$

The pseudocode of the K-means algorithm is given in Algorithm 1, the clustering it performs is illustrated in Fig. 1, and its flowchart is shown in Fig. 2.

Algorithm 1 – Pseudocode of K-means
Input: Number of centroids k; set of points N; list of randomly assigned centroids C_k
Output: Set of clusters with their centroids
Begin
  Repeat
    For each data point in N do
      Calculate the distance between the data point and the centroid of each cluster using Eq. (1)
      Assign the data point to the nearest centroid
    End for
    For each cluster in C_k do
      Calculate the new centroid position using Eq. (2)
    End for
  Until all data points belong to a cluster or the maximum number of iterations is reached
End
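The loop in Algorithm 1 can be sketched in plain Python as follows. This is an illustrative version under stated assumptions, not the authors' implementation: the names `k_means` and `squared_distance`, the seed, and the toy coordinates are hypothetical, and the loop stops when the assignments no longer change (a common convergence test) or when the iteration limit is reached.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance, as used in Eq. (1)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, max_iter=100, seed=0):
    """Plain K-means: random initial centroids, then assign and update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids
    assignment = None
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:  # assumption: stop once stable
            break
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members, Eq. (2).
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return centroids, assignment


# Toy positions of six vehicles forming two well-separated groups.
vehicles = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = k_means(vehicles, k=2)
print(labels)
```

The random choice of initial centroids in the first line of the function is exactly the step that the proposed protocol replaces with a more informed selection.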

Therefore, K-means has effectively addressed a number of issues that came up during the clustering process in VANET [8]. However, due to the frequent changes in the number of connected vehicles and the network topology, there are some issues that arise, particularly with the random selection of the initial number of clusters and the objective function for attributing each point to a cluster [16].


Fig. 2 K-means clustering algorithm flowchart

4 Proposed System In terms of bandwidth and routing overhead costs, the incorporation of clustering techniques into the routing protocols designed for VANET has proven effective [17– 19]. Because of the significant amount of vehicle mobility and frequent disconnections, cluster-based routing solutions are really efficient and resistant to ongoing changes in network design. In this regard, the suggested method is constructed with a roadway topology and consists of two methods: one for executing clustering, using a modified K-means algorithm, and the other for picking the initial cluster heads, using neural networks. In the suggested system, each cluster is headed by a cluster head that facilitates accepting, gathering, and dispersing a summary of data from all cluster members to other cluster heads.


The proposed approach has two phases: the initial phase and the cluster phase. In the initial phase, the initial cluster heads are chosen using the MSSP and CHN techniques. One of the biggest issues in cluster analysis is still determining how many clusters there should be and which initial cluster heads are appropriate for a given data set [15, 20]. These are the fundamental settings of the K-means algorithm, so the right cluster heads must be specified in order to get good results. To address this gap, we provide a technique for determining these parameters, which is broken down into two steps. First, to discover a stable set of cluster heads, we reformulate the maximum stable set problem, which is then represented as a 0-1 quadratic programming (QP) model. Second, the CHN is applied to the QP problem. This step requires an energy function associated with the CHN, together with a suitable parameterization technique for the MSSP. In Fig. 3, the yellow vehicles are regarded as a stable set of nodes (initial cluster heads), while the red nodes are viewed as cluster members. The MSSP with k binary variables is expressed as the 0-1 quadratic program (QP), which minimizes an objective function subject to a quadratic constraint:

$$(QP) \quad \min f(y) = -\sum_{i=1}^{k} y_i \quad \text{subject to} \quad y^{t} x \, y = 0, \quad y \in \{0, 1\}^{k}$$
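To make the 0-1 program above concrete, the sketch below enumerates all binary vectors for a tiny graph and keeps the largest one that satisfies the quadratic constraint. It rests on two assumptions that the text does not state: the matrix in the constraint is taken to be the adjacency matrix of the vehicle connectivity graph, and exhaustive search is swapped in for the continuous Hopfield network purely for illustration (it is exponential in k and only usable on toy inputs). All names are hypothetical.

```python
from itertools import product


def is_stable(y, adjacency):
    """True when y^T A y == 0, i.e. no two selected nodes share a link."""
    n = len(y)
    return sum(
        y[i] * adjacency[i][j] * y[j] for i in range(n) for j in range(n)
    ) == 0


def max_stable_set(adjacency):
    """Brute-force maximiser of sum(y), which minimises f(y) = -sum(y)."""
    n = len(adjacency)
    best = (0,) * n
    for y in product((0, 1), repeat=n):
        if is_stable(y, adjacency) and sum(y) > sum(best):
            best = y
    return best


# Hypothetical chain of four vehicles: 0 - 1 - 2 - 3.
A = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
heads = max_stable_set(A)
print(heads, "f(y) =", -sum(heads))
```

The selected nodes play the role of initial cluster heads: no two of them are direct neighbours, and their count suggests a value for the number of clusters.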

The second phase is cluster phase where this phase has two type, they are cluster formation and cluster maintenance. In the cluster formation, the value of a vehicle’s connection reliability model with the accompanying cluster head determines how it is assigned to a cluster. Taking into account acceleration and position changes,

Fig. 3 Group of initial cluster heads

VANET Hybrid Routing Protocol Featuring Perpetual Hopfield …

559

Fig. 4 Cluster formation

this value gives an approximation of how long this vehicle can stay in the cluster. Accordingly, using this approach, the likelihood that a vehicle will join a cluster depends on traffic volume and speed in addition to the distance between it and the cluster head. Consequently, the link dependability model-based modified K-means algorithm’s new goal function F is written as follows: F = avgmaxx

n

ρkl (Tkl , λ)

k=1 yk ∈cl

As a result, a vehicle joins the cluster whose cluster head gives it the highest link reliability value. Figure 4 depicts the vehicles that make up each cluster in red, while the cluster heads are depicted in yellow. During cluster maintenance, each cluster node computes its weight value and communicates it to the appropriate cluster head. A node whose weight value exceeds the value of the current CH is then chosen as the new cluster head. Indeed, depending on the network topology, the cluster head will occasionally need to be replaced. The node's free buffer size, speed, and node degree are the three metrics used to calculate the weight value [21]. We therefore used the following formula to determine this value:

$$\text{weight} = \alpha \cdot \frac{B_f}{B_{ini}} + \beta \cdot \frac{u}{u_{max}} + \gamma \cdot \frac{N}{N_{max}}$$

$B_f$ and $B_{ini}$ denote the free space and the initial size of the vehicle's buffer. $u$ and $u_{max}$ specify the vehicle's velocity and the maximum velocity of the vehicle. $N$ and $N_{max}$ represent the number of the vehicle's neighbors and the maximum number of vehicles in the communication range.


A. Pullanatt and A. Anitha

α, β, and γ are weighting factors for the free buffer, velocity, and node degree terms, respectively, with α + β + γ = 1. To choose the values of α, β, and γ, we ran the simulation numerous times with different settings and obtained the best results with α = 0.5, β = 0.3, and γ = 0.2.
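The weight formula can be evaluated directly. The sketch below is illustrative only: the function name `node_weight` and the sample buffer, speed, and neighbor values are hypothetical, while the default coefficients are the values α = 0.5, β = 0.3, and γ = 0.2 reported above.

```python
def node_weight(free_buffer, initial_buffer, speed, max_speed,
                neighbors, max_neighbors,
                alpha=0.5, beta=0.3, gamma=0.2):
    """Weighted sum of the free buffer ratio, speed ratio, and node degree ratio."""
    if abs(alpha + beta + gamma - 1.0) > 1e-9:
        raise ValueError("alpha + beta + gamma must equal 1")
    return (alpha * free_buffer / initial_buffer
            + beta * speed / max_speed
            + gamma * neighbors / max_neighbors)


# Hypothetical node: 40 of 50 buffer slots free, 90 of 120 km/h, 10 of 20 neighbors.
w = node_weight(free_buffer=40, initial_buffer=50,
                speed=90, max_speed=120,
                neighbors=10, max_neighbors=20)
# 0.5 * 0.8 + 0.3 * 0.75 + 0.2 * 0.5 = 0.725
```

During cluster maintenance, each node would report such a value to its cluster head, which can then compare it with its own.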

5 Experiment and Result

This section describes the simulated setting, lists the parameters used, and summarizes the simulation findings. In the simulation, we assessed the effectiveness of our proposed strategy against the results produced by ICA-RBF [22] and RMRPTS [23], two more recent solutions designed for the same goal.

Simulation setup

We used the network simulator NS2 to simulate our proposed routing protocol, KMRP, in order to assess its effectiveness; a sample is shown in Fig. 6. The performance of KMRP is compared, in terms of throughput, average end-to-end delay, and packet delivery ratio (PDR), with that of distributed and centralized routing protocols created to achieve the same goal. Simulation parameters are shown in Table 1. We ran the simulation in two different scenarios, changing the number of vehicles from 150 to 350 and the speed from 90 to 150 km/h. For both scenarios, we considered a 5 km road with two lanes, as illustrated in Fig. 3. The vehicles travel in two directions. Each vehicle's top speed is limited to 150 km/h; this maximum reflects the actual speed of a car on a roadway, which in the simulation can range from 90 to 150 km/h. The simulation was run ten times, and the average of the outcomes was calculated.

Table 1 Simulation parameters

| Parameter | Value |
|---|---|
| Road length | 5 km |
| No. of lanes | 2 |
| Topology | Highway |
| No. of vehicles | 100, 150, 200, 250, 300 |
| Minimum speed | 66 km/hour |
| Maximum speed | 120 km/hour |
| Transmission range | 250 m |
| Packet rate | 4 |
| Packet size | 1000 bytes |
| Queue length | 50 packets |
| Compared protocols | KMRP, ICA-RBF, RMRPTS |
| MAC protocol | IEEE 802.11P |
| Simulation time | 100 s |
| Number of simulation runs | 10 |
| Critical density | 200 |

Fig. 5 Numbers of vehicle and throughput

Simulation result

(A) Throughput: Fig. 5 shows how density affects throughput for the KMRP, ICA-RBF [22], and RMRPTS [23] algorithms. When there are 300 vehicles, the throughput of KMRP drops slightly to 830 kbps. Compared with ICA-RBF and RMRPTS, the KMRP protocol generally gives a slightly higher throughput. This indicates that the bandwidth provided by ICA-RBF and RMRPTS is insufficient for the transmission of many control packets. KMRP, on the other hand, lowers network congestion by forming the right number of clusters, with each cluster head taking responsibility for the exchange of data between the various cluster members. A large improvement in throughput is consequently achieved by reducing collisions, which in turn helps to reduce network congestion.

(B) Average end-to-end delay: The impact of vehicle density on the average end-to-end delay for the two compared methods, ICA-RBF and RMRPTS, is shown in Fig. 7. For both ICA-RBF and RMRPTS, the average end-to-end delay rises with increasing vehicle density and speed. Higher bandwidth usage is what causes the rise in transmission delay.

(C) Packet delivery ratio: The effect of vehicle density on the packet delivery ratio for KMRP, ICA-RBF, and RMRPTS is depicted in Fig. 8. This figure illustrates that KMRP offers a higher packet delivery ratio than the other methods. The packet delivery ratio of ICA-RBF and RMRPTS rises from 93 and 87% to 94 and 92%, respectively, while the packet delivery ratio of KMRP stays above 92% and rises to 94% when the number of vehicles is equal to 350. This is because its effectiveness is consistent under network conditions such as high density. The effectiveness of the proposed protocol mostly relies on its capacity to adapt the size of the cluster and select the right number of CHs using the MSSP method and the CHN algorithm.
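The three reported metrics can be computed from per-packet records. The sketch below is a hedged illustration, not the authors' NS2 post-processing: it assumes the usual definitions (PDR as delivered over sent packets, delay averaged over delivered packets, throughput as delivered bits per second), and the record layout, the function name `summarize`, and the sample trace are all hypothetical.

```python
def summarize(packets, duration_s):
    """Compute PDR (%), average end-to-end delay (s), and throughput (kbps).

    packets: list of (send_time_s, recv_time_s or None, size_bytes);
    recv_time_s is None when the packet was lost (assumed record layout).
    """
    if not packets or duration_s <= 0:
        raise ValueError("need at least one packet and a positive duration")
    delivered = [(s, r, b) for s, r, b in packets if r is not None]
    pdr = 100.0 * len(delivered) / len(packets)
    avg_delay = (sum(r - s for s, r, _ in delivered) / len(delivered)
                 if delivered else None)
    throughput_kbps = sum(b for _, _, b in delivered) * 8 / duration_s / 1000
    return pdr, avg_delay, throughput_kbps


# Hypothetical trace: four 1000-byte packets, one of them lost.
trace = [(0.0, 0.25, 1000), (1.0, 1.5, 1000), (2.0, None, 1000), (3.0, 3.25, 1000)]
pdr, delay, throughput = summarize(trace, duration_s=4.0)
```

With this trace, three of four packets arrive, so the PDR is 75%, the average delay is one third of a second, and the throughput is 6 kbps.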


Fig. 6 Distribution of vehicle in VANET

Fig. 7 Numbers of vehicle and average end-to-end delay

6 Conclusion

In this study, we proposed a new clustering-based routing protocol that combines a modified K-means method with the maximum stable set problem and a continuous Hopfield network to improve data transmission in VANETs under high-density and high-mobility conditions. In our method, the appropriate CHs are chosen by solving the maximum stable set problem with a continuous Hopfield network. Clustering then relies on the link reliability model, and cluster maintenance chooses a new cluster head using criteria such as free buffer rate, cluster node degree, and vehicle speed. A vehicle in a cluster


Fig. 8 Packet delivery ratios and number of vehicles

will be regarded as the cluster head for that cluster if it has sufficient free buffer space, an acceptable velocity, and the maximum cluster node degree. A simulation was run in a highway environment to evaluate the effectiveness of the proposed strategy, and comparisons with ICA-RBF and RMRPTS were made. According to the simulation results, throughput increases significantly because KMRP reduces congestion and collisions. Furthermore, KMRP is a fast approach that does not need much computation or memory, which lowers the end-to-end delay. Finally, under high density and mobility, KMRP offers a higher PDR than the other schemes. By ensuring cluster stability with our proposed method, we can avoid recurrent and redundant data transfer.

References

1. Liu L, Chen C, Pei Q, Maharjan S, Zhang Y (2020) Vehicular edge computing and networking: a survey. Mobile Netw Appl 2:1–24. https://doi.org/10.1007/s11036-020-01624-1
2. Wang S, Huang C, Wang D (2020) Delay-aware relay selection with heterogeneous communication range in VANETs. Wirel Netw 26(2):995–1004
3. An C, Wu C, Yoshinaga T, Chen X, Ji Y (2018) A context-aware edge-based VANET communication scheme for ITS. Sensors 18(7):2022
4. Kandali K, Bennis H (2019) Performance assessment of AODV, DSR and DSDV in an urban VANET scenario. In: Proc advance intelligent system sustainable development 915, 98–109
5. Hasrouny H, Samhat AE, Bassil C, Laouiti A (2019) Trust model for secure group leader-based communications in VANET. Wirel Netw 25(8):4639–4661
6. Hussain R, Rezaeifar Z, Son J, Bhuiyan MZA, Kim S, Oh H (2017) PB-MII: replacing static RSUs with public buses-based mobile intermediary infrastructure in urban VANET-based clouds. Cluster Comput 20(3):2231–2252
7. Kandali K, Bennis H, Benyassi M (2020) A novel route discovery mechanism based on neighborhood broadcasting methods in VANET. In: Ezziyyani M (ed) Advanced intelligent systems for sustainable development (lecture notes in networks and systems), vol 92. Springer, Cham, Switzerland, pp 1–12
8. Hajlaoui R, Alsolami E, Moulahi T, Guyennet H (2019) An adjusted K-medoids clustering algorithm for effective stability in vehicular ad hoc networks. Int J Commun Syst 32(12):e3995
9. Shah AFMS, Karabulut MA, Ilhan H, Tureli U (2020) Performance optimization of cluster-based MAC protocol for VANETs. IEEE Access 8:167731–167738. https://doi.org/10.1109/ACCESS.2020.3023642
10. Shah YA, Habib HA, Aadil F, Khan MF, Maqsood M, Nawaz T (2018) CAMONET: Moth-flame optimization (MFO) based clustering algorithm for VANETs. IEEE Access 6:48611–48624. https://doi.org/10.1109/ACCESS.2018.2868118
11. Ramalingam M, Thangarajan R (2020) Mutated K-means algorithm for dynamic clustering to perform effective and intelligent broadcasting in medical surveillance using selective reliable broadcast protocol in VANET. Comput Commun 150:563–568
12. Zhang Q, Almulla M, Ren Y, Boukerche A (2012) An efficient certificate revocation validation scheme with k-means clustering for vehicular ad hoc networks. IEEE Symp Comput Commun (ISCC) 2012:862–867
13. Runkler T (2011) Partially supervised k-harmonic means clustering. IEEE Symp Comput Intell Data Mining (CIDM) 2011:96–103
14. Sabbagh AA, Shcherbakov MV (2022) A hybrid clustering-based routing protocol for Vanet using k-means and cuckoo search algorithm. In: Vishnevskiy VM, Samouylov KE, Kozyrev DV (eds) Distributed computer and communication networks. DCCN 2021. Communications in computer and information science, vol 1552. Springer, Cham. https://doi.org/10.1007/978-3-030-97110-6_4
15. Chai R, Ge X, Chen Q (2014) Adaptive K-harmonic means clustering algorithm for VANETs. In: Proceedings 14th international symposium on communications and information technologies (ISCIT), Incheon, U.K., pp 233–237
16. Agrawal A, Gupta H (2013) Global K-means (GKM) clustering algorithm: a survey. Int J Comput Appl 79(2):20–24
17. Abuashour A, Kadoch M (1970) Control overhead reduction in cluster-based VANET routing protocol. SpringerLink. Retrieved December 15, 2022, from https://doi.org/10.1007/978-3-319-74439-1_10
18. Abbas F, Fan P (2018) Clustering-based reliable low-latency routing scheme using ACO method for vehicular networks. Veh Commun 12:66–74
19. Bagherlou H, Ghaffari A (2018) A routing protocol for vehicular ad hoc networks using simulated annealing algorithm and neural networks. J Supercomput 74(6):2528–2552
20. Mekelleche RF, Hafid H (2020) Towards the development of vehicular adhoc networks (VANETs): challenges and applications. In: IoT and cloud computing advancements in vehicular ad-hoc networks. IGI Global, Hershey, PA, USA, pp 21–47
21. Wang S-S, Lin Y-S (2013) PassCAR: a passive clustering aided routing protocol for vehicular ad hoc networks. Comput Commun 36(2):170–179
22. Mohammadnezhad M, Ghaffari A (2019) Hybrid routing scheme using imperialist competitive algorithm and RBF neural networks for VANETs. Wireless Netw 25(5):2831–2849
23. Moridi E, Barati H (2017) RMRPTS: a reliable multi-level routing protocol with tabu search in VANET. Telecommun Syst 65(1):127–137

Geo Science-Based Optimization Algorithms: A New Paradigm Aishwarya Mishra and Lavika Goel

Abstract Various problems in science and engineering may be formulated as optimization problems with intricate nonlinear constraints. Although naturally inspiring systems feature a multitude of intricate underlying mechanisms, various Nature-Inspired Optimization Algorithms (NIOAs) have been created recently, drawing inspiration from nature. Likewise, a new paradigm is presented that draws its inspiration from geoscience, which studies the surface of the earth and related natural phenomena. Highly nonlinear problems, which ordinary algorithms may find difficult to solve, typically call for more advanced optimization algorithms. Due to their adaptability and efficiency, geoscience-based optimization algorithms are currently popular. However, there are several significant problems with geoscience and computation inspired by nature. An in-depth analysis of certain new geoscience-based optimization algorithms is presented in this paper.

Keywords Optimization · Geo-science · Nature-inspired optimization algorithm (NIOAs) · Computational intelligence

A. Mishra (B) · L. Goel, Malaviya National Institute of Technology, Jaipur 302017, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 686, https://doi.org/10.1007/978-981-99-2100-3_44

1 Introduction

Many real-world applications involve the optimization of specific goals, such as the reduction of expenses, energy consumption, and environmental impact, and the improvement of sustainability, efficiency, and performance. The resulting optimization problems may include multimodal objectives and are highly nonlinear, subject to a wide variety of complex nonlinear constraints. Dealing with such problems is quite challenging. Even though current computers are


becoming more and more powerful, using simple brute-force approaches is still undesirable and impractical. It is therefore crucial in these circumstances to use effective strategies. However, it is likely that no effective methods exist for solving the optimization problems specific to each application. A large number of alternative optimization strategies exist, although the vast majority are gradient-based and local search approaches such as the interior-point method and the trust-region method [1]. As a result, the final solutions may depend on the initial starting points. Using approximation algorithms, including genetic algorithms (GA) [2] and swarm intelligence (SI) [3], is a current trend. Indeed, a variety of SI-based algorithms, such as Particle Swarm Optimization (PSO), Gravitational Search Algorithm (GSA), Bat Algorithm (BA), Firefly Algorithm (FA), Flamingo Search Algorithm (FSA), Ant Colony Optimization (ACO), and others, have emerged in recent years. These nature-inspired algorithms frequently employ a swarm of several cooperative agents to generate the search moves in the search space. These global optimizers are frequently straightforward, adaptable, and surprisingly effective, as demonstrated in various applications and case studies [2, 9, 10, 11]. Over the past three decades, there has been substantial advancement, and several applications have emerged [3]. This article provides a brief summary of some of these significant developments and is structured as follows. Section 2 gives a high-level overview of a wide range of current NIOAs and geoscience-based concepts and highlights their key traits. Section 3 is devoted to the analysis of NIOAs in terms of applications, time complexity, exploration, and exploitation.

2 NIOAs Study

2.1 With Respect to Time Complexity

The time complexity of optimization algorithms, which are iterative algorithms that generate a series of solutions that converge to an optimal solution to the problem, is displayed in Table 1.
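To make the O(d·m·n) entry for PSO concrete, the following minimal sketch counts the per-dimension updates of a textbook PSO loop. It is an illustration under stated assumptions: the text does not define the symbols, so d is taken to be the problem dimension, m the swarm size, and n the number of iterations; the coefficients `w`, `c1`, `c2`, the search bound, and the sphere test function are likewise illustrative choices, not values from the paper.

```python
import random


def pso(objective, d, m, n, seed=0, w=0.7, c1=1.5, c2=1.5, bound=5.0):
    """Textbook PSO; also returns how many per-dimension updates were made."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-bound, bound) for _ in range(d)] for _ in range(m)]
    vel = [[0.0] * d for _ in range(m)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(m), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    updates = 0
    for _ in range(n):              # n iterations
        for i in range(m):          # m particles
            for j in range(d):      # d dimensions
                r1, r2 = rng.random(), rng.random()
                vel[i][j] = (w * vel[i][j]
                             + c1 * r1 * (pbest[i][j] - pos[i][j])
                             + c2 * r2 * (gbest[j] - pos[i][j]))
                pos[i][j] += vel[i][j]
                updates += 1
            val = objective(pos[i])
            if val < pbest_val[i]:  # update personal best, then global best
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val, updates


def sphere(x):
    """Simple convex test function with its minimum at the origin."""
    return sum(v * v for v in x)


best, best_val, updates = pso(sphere, d=3, m=10, n=50)
# updates == d * m * n == 1500
```

The three nested loops account for the d·m·n factor; the lower-order terms in Table 1 correspond to per-particle bookkeeping such as the best-position updates.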

2.2 With Respect to the Application

Optimization problems with significant nonlinear constraints are present in the majority of real-world applications. Some of the earlier conventional approximation algorithms, shown in Fig. 1, including evolutionary, trajectory-based, swarm-based, and gradient-based methods, have failed to tackle these real-world problems, and the majority of solutions have been found using hit-and-trial methods. Table 2 illustrates the various engineering applications of the nature-inspired optimization techniques that offer adaptive computational tools for challenging optimization problems.

Table 1 NIOAs with time complexity

| NIOAs | Time complexity | References |
|---|---|---|
| Particle swarm optimization (PSO) | (3·d·m + m)·n ~ O(d·m·n) | [4–7] |
| Ant colony optimization (ACO) | (2·m·d + 2·m)·n ~ O(d·m·n) | [8–11] |
| Bat algorithm (BA) | O((4·m·d + m)·n) ~ O(d·m·n) | [12–14] |
| Firefly algorithm (FA) | O((m·m·d)·n) | [15, 16] |
| Gravitational search algorithm (GSA) | O((d + m)·m·n) | [17–19] |
| Flamingo search algorithm (FSA) | 2·m·d | [20] |
| Plate tectonic based neighbourhood search optimization algorithm (PBO) | O(d·n·m) | [21] |

Fig. 1 Classification of approximation algorithm

2.3 With Respect to Exploration and Exploitation

Table 3 illustrates the two fundamental features of population-based algorithms, exploration and exploitation, with local sensitive and global sensitive parameters.


Table 2 NIOAs with applications

| NIOAs | Applications |
|---|---|
| Particle swarm optimization (PSO) | Forensic science, file allocation, game theory, automated computer design, code breaking, etc. |
| Ant colony optimization (ACO) | Classification, AntNet for network routing application, multiple knapsack problem |
| Bat algorithm (BA) | Web document classification, image watermarking, and clustering using multivariable PID controllers |
| Firefly algorithm (FA) | Medical pattern classification, train neural networks, and clustering issues |
| Gravitational search algorithm (GSA) | Intrusion detection, cluster analysis, load frequency control |
| Flamingo search algorithm (FSA) | Path planning |
| Plate tectonic based neighbourhood search optimization algorithm (PBO) | Fault monitoring in IoT systems |

Exploration expands the search space, leading to new candidate solutions, whereas exploitation refines a solution within one specific local search region. A workable trade-off between the two aspects helps assure a near-optimal solution. They are factors which contribute to intensification or convergence. In every iteration a better solution is achieved, leading to a final near-optimal solution. Particle diversity can be maintained throughout the initial iterations, while local search capability is improved during the final iterations.

Table 3 NIOAs best in exploration or exploitation

| NIOAs | Best in? (Exploration / Exploitation) | Performance evaluation (Global SearchArea / Local SearchArea) |
|---|---|---|
| PSO | ✓ | Local SA = 0.026, Global SA = 0.0267 |
| ACO | ✓ | Local SA = 0.1095, Global SA = 0.0990 |
| BA | ✓ ✓ | Local SA = 0.0754, Global SA = 0.0876 |
| FA | ✓ | Local SA = 0.1441, Global SA = 0.1587 |
| GSA | ✓ | Local SA = 0.0141, Global SA = 0.0171 |
| FSA | ✓ | Local SA = 0.0053, Global SA = 0.0102 |
| PBO | ✓ ✓ | Local SA = 0.0222, Global SA = 0.0131 |
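The remark that diversity is kept high early and local search is sharpened late can be sketched in code. This is an illustrative assumption rather than the evaluation procedure behind Table 3: `diversity` is one possible spread measure (mean distance to the centroid), and `inertia` is the linearly decreasing inertia weight commonly used with PSO, with hypothetical start and end values.

```python
import math


def diversity(population):
    """Mean Euclidean distance of the candidates from their centroid."""
    d = len(population[0])
    centroid = [sum(p[j] for p in population) / len(population) for j in range(d)]
    return sum(math.dist(p, centroid) for p in population) / len(population)


def inertia(iteration, n_iterations, w_start=0.9, w_end=0.4):
    """Linearly decreasing weight: large early (explore), small late (exploit)."""
    frac = iteration / max(1, n_iterations - 1)
    return w_start + (w_end - w_start) * frac


# Hypothetical populations: one spread out (exploring), one clustered (exploiting).
spread = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]
tight = [(2.0, 2.0), (2.5, 2.0), (2.0, 2.5), (2.5, 2.5)]
```

A spread-out population yields a larger `diversity` value than a tightly clustered one, and lowering the inertia weight over the iterations is one simple way to move a swarm from the first regime to the second.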


3 Proposed Optimization Techniques Under the Geoscience Paradigm

Geoscience is the study of the earth and its natural resources, such as rocks, minerals, and mountains. It also involves research on the processes taking place in the earth's atmosphere, on its surface, or below it. Plate tectonics, earthquakes, volcanoes, ocean currents, tidal waves, and many other natural phenomena are a few of these processes, as shown in Fig. 2.

Fig. 2 Various geo science phenomena

The Big Bang-Big Crunch (BB-BC) optimization technique was inspired by the universe's evolution [12]. It is carried out in two stages. During the Big Bang (first phase), potential solutions are distributed stochastically in the search space, and during the Big Crunch (second phase), the scattered solutions are contracted to their center of mass. The algorithm maintains a balanced ratio of exploitation to exploration [13]. In [14], the author developed a variation of BB-BC in which chaos was added to the second phase to speed convergence to the representative point. In the first phase, a uniform population was initialized to expand the search space. The procedure is hence known as UBB-CBC. The technique improved the quality of the BB-BC method's solutions.

Plate tectonics refers to the division of the globe into several tectonic plates. The earth is made up mostly of two layers: the asthenosphere, which is the inner, partially molten layer, and the lithosphere, which is the solid, rocky layer and includes the crust and upper mantle. The lithosphere is made up of a number of tectonic plates that rest on the asthenosphere. These tectonic plates move very slowly and continuously. When put into an algorithm, this general surface upkeep can aid in finding the best solution [15].

Ocean currents are defined as the directed flow or movement of seawater caused by a variety of factors, including solar heating, wind, gravity, the Coriolis effect, and others. Ocean water is always in motion due to these factors. The ocean conveyor belt is a term used to describe the continuous transport of water in oceans. The balanced shift in the earth's seasons is caused by this ongoing movement of water, which can serve as an objective function for an optimization algorithm based on the ocean current model.

A volcano is a location where molten rock or magma from the earth's mantle emerges as lava at the surface of the planet. The majority of volcanic eruptions occur near plate boundaries where two plates are moving either apart (diverging) or together (converging). Due to its greater density, the oceanic plate subducts beneath the continental plate. By drawing inspiration from the eruption of a volcano, an optimization algorithm can be created with the cooling down of the volcano as the objective function or fitness function.

An earthquake is a geological event that occurs deep below the ground and produces intense vibrations as a result of energy that is suddenly released in the lithosphere of the planet, which causes seismic waves. The earth shakes when the vibrations reach the surface. Deformation along the fault surfaces, or their roughness, generates the stick-slip behavior.
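The two-phase Big Bang-Big Crunch procedure described earlier in this section can be sketched as follows. The text gives no equations for BB-BC, so this is only one common formulation, assumed here for illustration: the Big Crunch contracts the population to a center of mass weighted by inverse fitness, and each new Big Bang scatters candidates around that center with a spread that shrinks as 1/k. The function name, bound, and sphere test function are hypothetical, and the inverse-fitness weighting assumes a non-negative objective.

```python
import random


def bb_bc(objective, d, m, n, bound=5.0, seed=0):
    """Minimal Big Bang-Big Crunch sketch for minimizing a non-negative objective."""
    rng = random.Random(seed)
    # Big Bang (first phase): scatter m candidates uniformly in the search space.
    pop = [[rng.uniform(-bound, bound) for _ in range(d)] for _ in range(m)]
    best, best_val = None, float("inf")
    for k in range(1, n + 1):
        fitness = [objective(p) for p in pop]
        for p, f in zip(pop, fitness):
            if f < best_val:
                best, best_val = p[:], f
        # Big Crunch (second phase): contract to a fitness-weighted center of mass.
        weights = [1.0 / (f + 1e-12) for f in fitness]
        total = sum(weights)
        center = [sum(w * p[j] for w, p in zip(weights, pop)) / total
                  for j in range(d)]
        # Next Big Bang: scatter around the center, spread shrinking as 1/k.
        pop = [[min(bound, max(-bound, c + bound * rng.gauss(0, 1) / k))
                for c in center]
               for _ in range(m)]
    return best, best_val


def sphere(x):
    """Non-negative test function with its minimum at the origin."""
    return sum(v * v for v in x)


best, best_val = bb_bc(sphere, d=2, m=20, n=30)
```

The shrinking spread is what shifts the search from exploration in early iterations to exploitation around the center of mass in later ones.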

3.1 Governing Forces and Equations

Plate tectonics: The responsible forces in the plate tectonic-based optimization algorithm are as follows: ridge push force, slab pull force, plate drag, and basal drag.

$$F_{\text{plate-motion}} = \begin{cases} F_{\text{slab-pull}} - F_{\text{plate-drag}} & \text{if } T \le T_c \\ F_{\text{ridge-push}} - F_{\text{plate-drag}} & \text{if } T > T_c \end{cases} \tag{1}$$

$$PMI_i = \max\left(|\nabla f(c)_i|\right) \quad \forall i \in [1, d] \tag{2}$$

$$PMI_{\text{optpoint}} \le \varepsilon \tag{3}$$

$$PMI_c \ge 0 \tag{4}$$

where $T$ is the temperature, $T_c$ is a critical temperature, i.e., 635 K, $T > T_c$ is the ridge push condition, $T \le T_c$ is the slab pull condition, and $PMI_i$ is the plate mobility index.

Ocean currents: The responsible forces in the ocean currents are as follows: Coriolis force, gravitational force, wind direction, and pressure gradient force.

$$CF = c\,(2\omega \sin\theta) \tag{5}$$

where $CF$ is the Coriolis force, $c$ is the speed of the moving body, $\omega$ is the angular velocity of the earth's rotation, and $\theta$ is the latitude.

Volcanic eruptions: The responsible factors in volcanic eruptions are as follows: density, pressure, and buoyancy. The volcanic causes and effects can be summarized as:

$$\text{Volcano Eruption} \propto B \text{ and } P$$

$$\text{Volcano Eruption} \propto \frac{1}{T}$$

$$\text{Volcano Eruption} \propto \frac{1}{\rho}$$

$$P = \rho g h \tag{6}$$

where $B$ is buoyancy, $P$ is pressure, $T$ is temperature, $\rho$ is density, $g$ is the acceleration due to gravity, and $h$ is the depth of the fluid.

Earthquake: The responsible factors in the earthquake are as follows: frictional force, potential energy, and kinetic energy, i.e.,

$$\text{Intensity of Earthquake} \propto FF$$

$$\text{Intensity of Earthquake} \propto KE$$

$$FF = \mu \times F \tag{7}$$

where $FF$ is the frictional force, $KE$ is the kinetic energy, $\mu$ is the friction coefficient, and $F$ is the force of the mass.
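As a worked illustration of the governing equations in this subsection, the sketch below evaluates Eqs. (1), (5), (6), and (7) for sample inputs. All function names and numeric inputs are hypothetical. It assumes that the case T ≤ Tc uses the slab pull force and T > Tc the ridge push force, following the condition names given in the text, and that θ is the latitude; the earth's angular velocity is taken as approximately 7.2921e-5 rad/s.

```python
import math

EARTH_ANGULAR_VELOCITY = 7.2921e-5  # rad/s, approximate rotation rate of the earth


def plate_motion_force(temperature, slab_pull, ridge_push, plate_drag,
                       critical_temperature=635.0):
    """Eq. (1): slab pull drives motion for T <= Tc, ridge push for T > Tc."""
    driving = slab_pull if temperature <= critical_temperature else ridge_push
    return driving - plate_drag


def coriolis_force(speed, latitude_deg, omega=EARTH_ANGULAR_VELOCITY):
    """Eq. (5): CF = c * (2 * omega * sin(theta)), theta given in degrees."""
    return speed * (2.0 * omega * math.sin(math.radians(latitude_deg)))


def hydrostatic_pressure(density, depth, gravity=9.81):
    """Eq. (6): P = rho * g * h."""
    return density * gravity * depth


def frictional_force(mu, force):
    """Eq. (7): FF = mu * F."""
    return mu * force


# Hypothetical inputs, chosen only to exercise each formula.
f_cold = plate_motion_force(600.0, slab_pull=8.0, ridge_push=5.0, plate_drag=2.0)
f_hot = plate_motion_force(700.0, slab_pull=8.0, ridge_push=5.0, plate_drag=2.0)
cf = coriolis_force(speed=10.0, latitude_deg=30.0)
p = hydrostatic_pressure(density=1000.0, depth=10.0)
ff = frictional_force(mu=0.5, force=100.0)
```

In an optimizer built on these phenomena, such quantities would play the role of update rules or fitness terms; how exactly they are mapped is a design decision that the equations alone do not fix.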

3.2 Stability Point

Geoscience includes the study of processes taking place in the earth's atmosphere, on its surface, or underneath it. Plate tectonics, earthquakes, volcanoes, ocean currents, tidal waves, and many other natural phenomena are a few of these processes. Geoscience-based algorithms can be advantageous because they can avoid local maxima or minima and, being dynamic in nature, can be useful in managing applications that operate in a dynamic environment. Additionally, geoscience phenomena have the unique ability to stabilize after a specific amount of time, regardless of the situation; they can therefore be useful in managing optimization applications under dynamism when dealing with an unknown space. The variables and stability points that can be used in optimization methods as variable quantities and objective functions have been identified and are given in Table 4.


Table 4 Factors impacting geoscience phenomena's stability point

| Natural phenomenon | Factors impacting | Stability point |
|---|---|---|
| Plate tectonics | Ridge push force; slab pull force; plate drag or basal drag | When all of the plates are in a steady posture and moving at their normal, consistent speed |
| Ocean currents | Gravity; pressure gradient forces; Coriolis effect; frictional forces | Maintain equilibrium between the forces of gravity and Coriolis despite the constant motion of the water |
| Earthquake | Frictional force; potential energy; kinetic energy | There has been no seismic activity on the surface of the earth |
| Volcanic eruptions | Density; buoyancy; pressure; temperature | When the magma cools down and loses its heat, which results in a drop in temperature and pressure, and buoyancy tends to zero |

4 Conclusion

By getting exploration and exploitation in the right proportions, geoscience-inspired techniques improve global optimization results and speed up convergence, as shown by the comparison with NIOAs and the constriction factor of other algorithms. Hence, one possible research direction is the use of new geo-inspired phenomena: the plate tectonics-based neighbourhood search optimization algorithm (PBO) and other natural geoscience phenomena such as ocean currents, earthquakes, and volcanic eruptions may be more effective in the future for complex and large real-world problem-solving applications. New applications of geoscience-inspired optimization algorithms to practical issues are another fascinating field for future study. Future research might either continue to build such algorithms or incorporate them into already existing NIOAs.

References

1. Simon D, Rarick R, Ergezer M, Du D (2011) Analytical and numerical comparisons of biogeography-based optimization and genetic algorithms. Inf Sci (N Y) 181(7):1224–1248. https://doi.org/10.1016/j.ins.2010.12.006
2. Shukla AK, Pippal SK, Chauhan SS (2019) An empirical evaluation of teaching–learning-based optimization, genetic algorithm and particle swarm optimization. Int J Comput Appl. https://doi.org/10.1080/1206212X.2019.1686562
3. Goel L, Mishra A (2022) A survey of recent deep learning algorithms used in smart farming. In: 2022 IEEE region 10 symposium (TENSYMP), pp 1–6. https://doi.org/10.1109/TENSYMP54529.2022.9864477
4. Wu D, Jiang N, Du W, Tang K, Cao X (2020) Particle swarm optimization with moving particles on scale-free networks. IEEE Trans Netw Sci Eng 7(1):497–506. https://doi.org/10.1109/TNSE.2018.2854884
5. Vaze R, Deshmukh N, Kumar R, Saxena A (2021) Development and application of quantum entanglement inspired particle swarm optimization. Knowl Based Syst 219. https://doi.org/10.1016/j.knosys.2021.106859
6. Wagner MP, Slawig T, Taravat A, Oppelt N (2020) Remote sensing data assimilation in dynamic crop models using particle swarm optimization. ISPRS Int J Geoinf 9(2). https://doi.org/10.3390/ijgi9020105
7. Mirzaie N, Banihabib ME, Shahdany SMH, Randhir TO (2021) Fuzzy particle swarm optimization for conjunctive use of groundwater and reclaimed wastewater under uncertainty. Agric Water Manag 256. https://doi.org/10.1016/j.agwat.2021.107116
8. Dorigo M, Birattari M, Stutzle T (2006) Ant colony optimization. IEEE Comput Intell Mag 1(4):28–39. https://doi.org/10.1109/MCI.2006.329691
9. Sebayang AH et al (2017) Optimization of bioethanol production from sorghum grains using artificial neural networks integrated with ant colony. Ind Crops Prod 97:146–155. https://doi.org/10.1016/j.indcrop.2016.11.064
10. Zhang Y, Li M, Zheng L, Qin Q, Lee WS (2019) Spectral features extraction for estimation of soil total nitrogen content based on modified ant colony optimization algorithm. Geoderma 333:23–34. https://doi.org/10.1016/j.geoderma.2018.07.004
11. Karri RR, Sahu JN, Meikap BC (2019) Improving efficacy of Cr (VI) adsorption process on sustainable adsorbent derived from waste biomass (sugarcane bagasse) with help of ant colony optimization. Ind Crops Prod 143. https://doi.org/10.1016/j.indcrop.2019.111927
12. Yang X-S, He X (2013) Bat algorithm: literature review and applications
13. Mishra AR, Pippal SK, Kumar AA, Singh D, Singh A (2021) Clear vision - obstacle detection using bat algorithm optimization technique. In: 2021 9th international conference on reliability, infocom technologies and optimization (trends and future directions) (ICRITO), pp 1–5. https://doi.org/10.1109/ICRITO51393.2021.9596467
14. Senthilnath J, Kulkarni S, Benediktsson JA, Yang XS (2016) A novel approach for multispectral satellite image classification based on the bat algorithm. IEEE Geosci Remote Sens Lett 13(4):599–603. https://doi.org/10.1109/LGRS.2016.2530724
15. Fister I, Yang XS, Brest J (2013) A comprehensive review of firefly algorithms. Swarm Evol Comput 13:34–46. https://doi.org/10.1016/j.swevo.2013.06.001
16. Yang X-S (2014) Cuckoo search and firefly algorithm: overview and analysis, pp 1–26. https://doi.org/10.1007/978-3-319-02141-6_1
17. Rashedi E, Nezamabadi-pour H, Saryazdi S (2009) GSA: a gravitational search algorithm. Inf Sci (N Y) 179(13):2232–2248. https://doi.org/10.1016/j.ins.2009.03.004
18. Rashedi E, Nezamabadi-Pour H, Saryazdi S (2010) BGSA: binary gravitational search algorithm. Nat Comput 9(3):727–745. https://doi.org/10.1007/s11047-009-9175-3
19. Goel L (2022) Path extraction and planning for intelligent battlefield preparation using particle swarm optimization, gravitational search algorithm, and genetic algorithm, pp 77–89. https://doi.org/10.1007/978-981-16-7136-4_7
20. Zhiheng W, Jianhua L (2021) Flamingo search algorithm: a new swarm intelligence optimization algorithm. IEEE Access 9:88564–88582. https://doi.org/10.1109/ACCESS.2021.3090512
21. Goel L, Jain R (2021) A plate-tectonics based neighborhood search optimizer and its application for fault monitoring in IoT systems. Knowl Based Syst 234. https://doi.org/10.1016/j.knosys.2021.107551

Identifying Incorrect Postures While Performing Sun Salutation Using MoveNet Sheetal Girase, Omkar Dutta, Adwait Mahadar, Atharva Ghodmare, and Mangesh Bedekar

Abstract Sun salutation, or salute to the Sun, is a sequence of about twelve poses that are gently linked to one another. The basic movement pattern entails rising to your feet, striking the downward and upward dog postures, and then rising to your feet again. Sun salutation stimulates the parasympathetic nervous system, which helps you to feel relaxed. Yoga necessitates expert coaching since it consists of asanas that must be performed correctly for the best results. COVID has taught us that many activities, such as exercise, can be performed online. For a better experience in online mode, there is a need for a system that not only checks for correct posture but also alerts users if a pose is performed wrongly. This paper addresses this issue by creating a supervised system that detects a yoga pose using the MoveNet architecture, neural networks, and transfer learning for pose estimation. Twelve Sun salutation poses are examined to train the system and gauge its prediction accuracy. The most recent, cutting-edge human posture estimation architecture, the MoveNet model developed by Google's TensorFlow team, is employed. This model's KeyPoints regression field function was applied to the input image frames to locate 17 KeyPoints on the human body. As a result, the paper offers a system with an accuracy of 96% in identifying users' yoga posture. The system is capable of verifying posture and sounding buzzers to alert users when they perform erroneous postures.

Keywords MoveNet · Sun salutation · Neural network · Pose recognition

S. Girase (B) · O. Dutta · A. Mahadar · A. Ghodmare · M. Bedekar
Dr. Vishwanath Karad MIT World Peace University, Pune, Maharashtra 411038, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 686, https://doi.org/10.1007/978-981-99-2100-3_45

1 Introduction

In today’s fast-paced world, most of us lead erratic and hectic lives. Life’s ever-increasing pressures, such as excessive task expectations and health, relationship, and financial issues, can have a detrimental impact on our well-being. All of this can result in anxiety, mood fluctuations, sleeplessness, and even despair. It is scientifically proven that yoga is an effective therapy to help individuals facing health challenges at any level to manage their condition, reduce symptoms, increase vitality, and improve attitude. Sun salutations, also known as Surya Namaskars, are a staple of traditional yoga practice. A Sun salutation is a sequence of twelve gracefully linked asanas, as shown in Fig. 1. It can be performed at any time of day as a standalone physical workout or at the start of a session to warm up the entire body. It allows us to use the body as an instrument of higher awareness, so that one can receive wisdom and knowledge. Sun salutation helps in losing weight, helps to keep the practitioner healthy, balances the body and mind, improves blood circulation, improves the digestive system, and strengthens the heart.

Fig. 1 Twelve Sun salutation asanas

However, as with any other form of exercise, care is needed while performing yoga. Being careless while getting into different asanas can cause serious problems or injury. Performing unsupervised yoga or not following the proper form can cause back pain and stiffness in the neck [1]. While Sun salutation has considerable physical and mental health benefits, one should be careful while performing it, especially in today’s virtual and secluded world, where people cannot always attend yoga classes in person to learn precise techniques. Yoga pose monitoring systems that observe body movements can help both the yoga instructor and the yoga performer to perform and assess the poses efficiently. There is thus a need for an automated system that helps users perform yoga in a disciplined manner by accurately detecting Sun salutation poses and alerting the user to any incorrect posture. Although models capable of this task have been implemented, they are not fast enough in producing the result. To address this issue, the proposed system combines a fast pose detection module with an ANN, as shown in Fig. 2.
Pose detection is the core of the proposed system and is accomplished by MoveNet, a fast and precise model that detects 17 KeyPoints. These KeyPoints are the best indicator of the pose currently being performed. To sum up, the key contributions of this paper are as follows:
• To understand the yoga pose, it is essential to locate the human in the video frame. For this purpose, the MoveNet architecture is employed. It is more accurate than earlier models such as PoseNet [2]. The system classifies whether the performed Sun salutation pose is correct or incorrect in a real-time environment.
• If the pose is incorrect, the model alerts the user with a buzzer to correct their posture.
• Real-time reports with timestamps are generated, which help users correct their posture and improve their performance.
• While performing Sun salutation, it is necessary to perform all 12 asanas in sequence. For this purpose, a user interface is developed that informs users about the next asana to be performed, so that the order of the Sun salutation is maintained.

Fig. 2 a Real-time report generation; b interface that indicates the next asana to be performed

2 Related Work

Recently, the skeleton modality has attracted much attention in the research community. The extracted skeleton-based representation of human movement is compact and accurate compared with optical flow and RGB sequences, which significantly reduces the computational cost of human activity recognition tasks. Badrinarayanan et al. [3] monitored biomechanical signal readings of subjects while they performed Sun salutations, using body-mounted inertial measurement units (IMUs). Their main objective was to provide a framework to quantify grace and consistency for any exercise with repetitive movements. They applied the Hilbert–Huang transform (HHT) to the IMU data and also carried out a comparative analysis of three signal processing techniques: fast Fourier transform (FFT), continuous wavelet transform (CWT), and HHT. They observed that the biomechanical signals generated by the body were nonlinear in nature, and that HHT was more effective than FFT and CWT in analyzing nonlinear and non-stationary signals.

Machine learning techniques have also been applied to the identification of yoga poses. Agrawal et al. [4] used a tf-pose skeleton to extract the angles of the joints in the human body and used them as features for various machine learning models. They achieved high accuracy with a random forest classifier. Other approaches have relied on deep learning techniques along with the OpenPose architecture for human pose estimation [5, 6, 7]; however, the time taken to predict the pose was higher than with other models [7]. Jain and Harit [8] proposed an algorithm that assesses how well a person practices Sun salutation in terms of grace and consistency. Their approach trains individual hidden Markov models (HMMs) for each asana using space-time interest point (STIP) features, followed by automatic segmentation and labeling of the entire Sun salutation sequence using a concatenated HMM. Yadav et al. [5] created a model to recognize various yoga asanas using deep learning algorithms. They created a dataset of six yoga asanas (i.e., Bhujangasana, Padmasana, Shavasana, Tadasana, Trikonasana, and Vrikshasana) performed by 15 individuals. Kumar and Sinha [9] implemented three pose classification models. The first model used a support vector machine, the second a CNN, and the third the OpenPose architecture followed by a CNN and an LSTM. The data [10] consisted of 15 people performing six different asanas. The third model, which applies OpenPose to each frame of the video to obtain KeyPoints using part confidence maps, part affinity fields, and bipartite matching, produced the best results with a precision of 99.87%. These KeyPoints are then sent to the CNN, which classifies the poses. The model then employs the LSTM to track the change in asanas across the video. A yoga self-coaching system based on transfer learning with the MobileNet model was introduced in [11]. The purpose of that system was to identify the yoga postures performed in accordance with the chosen yoga posture guide and, with the help of the MediaPipe algorithm, to identify improper posture.

These systems are generally trained on sets of poses that are distinctly different, which makes it comparatively easier for the proposed networks to perform well. Also, the pretrained architectures are slow in comparison with MoveNet, making it difficult to meet the near real-time requirement of the system.

3 Methodology

In computer vision, human pose estimation remains a challenging task. Figure 3a gives an overview of our methodology. As input for pose estimation, the user’s activity is recorded with a camera as he/she performs Sun salutations. The captured video is first converted into images, which are preprocessed to make them suitable for pose estimation. The pose estimation module employs the MoveNet model to estimate the user’s pose, and an artificial neural network (ANN) classifies it. The assessment module checks whether the user performs the Sun salutations in sequence and whether there are any inaccuracies in the asanas, as shown in Fig. 3a. Figure 3b shows the high-level architecture of the proposed system. The MoveNet architecture is discussed in detail in Sect. 4.

Fig. 3 a Proposed methodology, b high-level system architecture diagram

Stages shown in the figure:
Video: real-time yoga activity captured through the webcam
Converting video: retrieving frames at the rate of 30 frames per second
Preprocessing: applying transformation, resizing, and filtering non-RGB frames
Pose estimation: MoveNet architecture
ANN: calculating the confidence score based on the pose estimation dataframe
Assessment: based on the confidence score, predicting the correctness of the pose performed and the next asana in order
Output: generation of a buzzer in case of an incorrect pose and generation of the report

Table 1 Output of MoveNet model for the Fig. 3a

| Feature name | Position/score | Feature name | Position/score |
| nose_x | 129 | left_knee_score | 0.781883 |
| nose_y | 283 | right_knee_x | 347 |
| nose_score | 0.769335 | right_knee_y | 288 |
| left_eye_x | 118 | right_knee_score | 0.825123 |
| left_eye_y | 271 | left_ankle_x | 311 |
| left_eye_score | 0.683927 | left_ankle_y | 470 |
| right_eye_x | 117 | left_ankle_score | 0.679038 |
| right_eye_y | 271 | right_ankle_x | 321 |
| right_eye_score | 0.714086 | right_ankle_y | 458 |
| left_ear_x | 133 | right_ankle_score | 0.473261 |

3.1 Preprocessing Module

While the subject is performing Sun salutations, a real-time video is captured through the computer’s webcam using the Python-OpenCV module. This video is broken into image frames at 30 fps. The preprocessing module follows the techniques discussed in [12]. The input size used for the MoveNet architecture is 192 × 300, so all frames are resized to 192 × 300. Images that are not in the required RGB format are then filtered out. The resulting images are fed as input to the pose classification module.
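The filtering and resizing steps can be illustrated with a minimal sketch. It is written with plain Python lists so that it runs without OpenCV; the helper names (`is_rgb`, `resize_nearest`, `preprocess`), the nearest-neighbour resizing, and the reading of 192 × 300 as height × width are assumptions made for illustration and are not taken from the paper, which relies on the Python-OpenCV module.

```python
def is_rgb(frame):
    """Return True if the frame is a non-empty H x W grid of 3-channel pixels."""
    if not frame or not frame[0]:
        return False
    return all(
        isinstance(px, (list, tuple)) and len(px) == 3
        for row in frame
        for px in row
    )


def resize_nearest(frame, out_h, out_w):
    """Nearest-neighbour resize (a simple stand-in for an OpenCV resize call)."""
    in_h, in_w = len(frame), len(frame[0])
    return [
        [frame[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def preprocess(frames, out_h=192, out_w=300):
    """Drop frames that are not RGB, then resize the rest to out_h x out_w.

    Treating 192 as the height and 300 as the width is an assumption.
    """
    return [resize_nearest(f, out_h, out_w) for f in frames if is_rgb(f)]
```

Given one RGB frame and one grayscale frame, `preprocess` returns a single resized RGB frame; the grayscale frame is discarded.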

3.2 Pose Classification Module

The pose classification module consists of the MoveNet model and our own ANN. The MoveNet model accepts the preprocessed image as input and locates different parts of the body as landmark vectors. These landmarks are organized in a dataframe, with each row containing the extracted landmark coordinates and scores, as shown in Table 1. The landmarks that the MoveNet model recognizes on the frame in Fig. 4a are shown in Fig. 4b. This dataframe is fed to the ANN model, which reads the landmarks and classifies the Sun salutation pose.
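As a rough illustration of how the 17 KeyPoints could be flattened into one dataframe row with the column names used in Table 1, consider the sketch below. The function name `keypoints_to_row` and the assumed input layout of `(x, y, score)` tuples in pixel units are hypothetical; the MoveNet models published on TensorFlow Hub return normalized `(y, x, score)` triples, so a conversion step would be needed first.

```python
# KeyPoint order follows the index list given with Fig. 5.
KEYPOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]


def keypoints_to_row(keypoints):
    """Flatten 17 (x, y, score) tuples into a dict with 51 named features.

    The (x, y, score) pixel layout is an assumption made for this sketch.
    """
    if len(keypoints) != len(KEYPOINT_NAMES):
        raise ValueError("expected 17 keypoints")
    row = {}
    for name, (x, y, score) in zip(KEYPOINT_NAMES, keypoints):
        row[f"{name}_x"] = x
        row[f"{name}_y"] = y
        row[f"{name}_score"] = score
    return row
```

A list of such rows can be passed directly to a dataframe constructor, giving one row per frame and 51 feature columns for the ANN.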

3.3 Assessment Module

The assessment module receives the pose number of the recognized Sun salutation asana and its confidence score as output from the pose classification module.


Fig. 4 a Input frame to the pose classification module; b overlayed KeyPoints obtained using MoveNet

This module performs two functions. First, it checks whether the pose number is in the correct order of poses. Second, it checks whether the performed pose is correct based on the confidence score. This score is compared with the chosen threshold of 0.1, based on which the pose is classified as correct or incorrect. If the pose is out of order and/or incorrect, the system notifies the user via a buzzer alarm and records this in the report with a timestamp.
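The two checks can be sketched as a small function. The name `assess_pose`, the returned dictionary, and the convention that a score at or above the threshold counts as a correct pose are assumptions for illustration; the text only states that the score is compared with a threshold of 0.1.

```python
def assess_pose(predicted_pose, confidence, expected_pose, threshold=0.1):
    """Check pose order and pose correctness for one sampled frame.

    Assumption: a confidence score >= threshold means the pose is correct.
    """
    in_order = predicted_pose == expected_pose
    pose_correct = confidence >= threshold
    return {
        "in_order": in_order,
        "pose_correct": pose_correct,
        # The buzzer sounds if either check fails.
        "alert": not (in_order and pose_correct),
    }
```

For example, `assess_pose(3, 0.82, 3)` raises no alert, whereas a wrong pose number or a score below the threshold sets `alert` to `True`.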

Algorithm of proposed methodology

Initialize: i ← 0, x ← current time, vid ← VideoCapture(0)
while True do
    if (current time − x) ≥ 5 then              // consider the frame on the fifth second
        if i ≥ 12 then break                    // all twelve asanas processed, exit loop
        frame ← save the current frame from the vid stream
        current-pose ← getPose(frame)
        if expected-pose[i] ≠ current-pose then // incorrect pose detected
            play alarm and note the deviation in the report
        i ← i + 1
        x ← current time                        // restart the five-second timer
end while
Display report
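The loop above can be sketched in Python as follows. To keep the sketch runnable without a camera, the pose source, clock, and sleep function are passed in as parameters; `run_session`, its arguments, and the report tuple layout are hypothetical names chosen for illustration, and the alarm is represented only by the report entry.

```python
import time


def run_session(expected_poses, get_pose, now=time.monotonic,
                sleep=time.sleep, interval=5.0):
    """Sample one pose every `interval` seconds and log deviations.

    get_pose() stands in for frame capture plus pose classification.
    Returns a list of (timestamp, step, expected, observed) tuples.
    """
    report = []
    i = 0
    last = now()
    while i < len(expected_poses):
        if now() - last >= interval:
            observed = get_pose()
            if observed != expected_poses[i]:
                # In the real system the buzzer would sound here.
                report.append((now(), i + 1, expected_poses[i], observed))
            i += 1
            last = now()  # restart the timer for the next asana
        else:
            sleep(0.01)
    return report
```

Injecting `now` and `sleep` makes it possible to exercise the loop with a simulated clock; with the defaults it waits in real time.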


4 Human Pose Estimation and Classification

Human pose estimation (HPE) is a method for identifying and classifying different joints in the human body. In essence, it records a specific set of coordinates for each joint (head, torso, arm, etc.), each of which is marked as a KeyPoint that can be used to characterize the posture of a person. A valid connection between two such points is referred to as a pair. Not all points can form a pair, since the relationship between them needs to be meaningful. The initial goal of HPE is to create a skeleton-like model of the human body, which is subsequently processed further for task-specific applications such as human activity analysis, fitness tracking, and augmented reality. To estimate human pose, Toshev and Szegedy [13] moved from the classical approach to a deep learning-based approach using a CNN, which they called DeepPose: human pose estimation via deep neural networks (DNNs). They treated pose estimation as a CNN-based regression task toward human body joints. The authors also proposed cascading such regressors to obtain more precise and consistent results. They contended that the suggested DNN can model the data in a holistic way, i.e., the network can model hidden poses, which was not possible with the classical method.

4.1 MoveNet Model

MoveNet [14] is a highly accurate and fast model that recognizes 17 body KeyPoints, as shown in Fig. 5. The Lightning and Thunder variants of the model are available on TensorFlow Hub. Thunder is designed for applications that require high accuracy, and Lightning is designed for latency-critical applications. On most modern computers, laptops, and phones, both versions run faster than real time (30+ FPS), which is critical for live fitness, health, and wellness applications. MoveNet is a bottom-up estimation model that uses heatmaps to locate human KeyPoints accurately. The architecture has two parts: a feature extractor and a set of prediction heads. The prediction technique is based on CenterNet, with a few changes that increase speed and accuracy. The TensorFlow object detection API is used to train all of the models. The model relies on the following distance formula [14]:

$$a = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$

where $a$ is the distance between two points $(x_1, y_1)$ and $(x_2, y_2)$.
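As a small worked instance of the distance formula, the sketch below computes the distance between the nose and left-eye KeyPoints using the pixel coordinates listed in Table 1. The function name `keypoint_distance` is a hypothetical helper, not part of MoveNet.

```python
import math


def keypoint_distance(p1, p2):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(p1[0] - p2[0], p1[1] - p2[1])


# Coordinates taken from Table 1.
nose = (129, 283)
left_eye = (118, 271)
nose_to_eye = keypoint_distance(nose, left_eye)  # sqrt(11**2 + 12**2) = sqrt(265)
```

Here the coordinate differences are 11 and 12 pixels, so the distance is the square root of 265, roughly 16.3 pixels.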

Fig. 5 KeyPoints and their index

01 - Nose
02 - Left_Eye
03 - Right_Eye
04 - Left_Ear
05 - Right_Ear
06 - Left_Shoulder
07 - Right_Shoulder
08 - Left_Elbow
09 - Right_Elbow
10 - Left_Wrist
11 - Right_Wrist
12 - Left_Hip
13 - Right_Hip
14 - Left_Knee
15 - Right_Knee
16 - Left_Ankle
17 - Right_Ankle

4.2 Pose Classification Using ANN

In the architecture of our ANN, the first layer is a dense layer with 128 nodes and a ReLU activation function, followed by a dropout layer with a rate of 0.2. Next comes another dense layer with 64 nodes and a ReLU activation function, again followed by a dropout layer with a rate of 0.2. The final layer is an output layer with Softmax activation.
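To make the layer stack concrete, the sketch below lists the layers and counts the trainable parameters. The input width of 51 features (17 KeyPoints × 3 values, following the column layout of Table 1) and the output width of 8 classes (the eight unique poses in the dataset) are assumptions; the paper does not state either size explicitly, and the names `LAYERS` and `count_params` are illustrative.

```python
NUM_FEATURES = 51   # assumption: 17 KeyPoints x (x, y, score)
NUM_CLASSES = 8     # assumption: eight unique poses in the dataset

# (kind, size_or_rate, activation) for each layer, in order.
LAYERS = [
    ("dense", 128, "relu"),
    ("dropout", 0.2, None),
    ("dense", 64, "relu"),
    ("dropout", 0.2, None),
    ("dense", NUM_CLASSES, "softmax"),
]


def count_params(n_features, layers):
    """Count weights and biases of the dense layers; dropout has none."""
    total, width = 0, n_features
    for kind, size, _ in layers:
        if kind == "dense":
            total += width * size + size  # weight matrix plus bias vector
            width = size
    return total


total_params = count_params(NUM_FEATURES, LAYERS)
```

Under these assumptions the network has 6,656 + 8,256 + 520 = 15,432 trainable parameters, which is small enough to train quickly on a modest dataset.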

5 Experiment

5.1 Dataset Preparation

The dataset used in the experiment is a self-curated set of images of eight unique Sun salutation poses. For testing the model, a real-time video of the subject performing Sun salutations is captured through the computer’s webcam using the Python-OpenCV module. A frame is saved every fifth second. Each captured frame is then resized as per the requirements of the MoveNet model. These processed frames are then fed to the pose classification module, which outputs the pose number corresponding to one of the eight unique asanas.

Table 2 Confusion matrix of pose estimation model

| Asana | Precision | Recall | F1-score | Support |
| Pranamasna | 1.00 | 1.00 | 1.00 | 8 |
| Hasta Uttanasna | 1.00 | 1.00 | 1.00 | 7 |
| Hasta_Padasana | 1.00 | 1.00 | 1.00 | 8 |
| Ashwa sanchalanasana | 1.00 | 1.00 | 1.00 | 9 |
| Dandasana | 1.00 | 1.00 | 1.00 | 9 |
| Ashtanga Namaskarasana | 1.00 | 0.89 | 0.94 | 9 |
| Cobra pose | 1.00 | 1.00 | 1.00 | 12 |
| Downward facing dog | 0.90 | 1.00 | 0.95 | 9 |
| Accuracy | | | 0.99 | 71 |
| Macro average | 0.99 | 0.99 | 0.99 | 71 |
| Weighted average | 0.99 | 0.99 | 0.99 | 71 |
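The per-class figures in Table 2 follow from true positive, false positive, and false negative counts. The sketch below shows the computation; the counts used for Ashtanga Namaskarasana (8 true positives, 0 false positives, 1 false negative) are inferred from its support of 9, precision of 1.00, and recall of 0.89, and are therefore an assumption rather than values reported in the paper.

```python
def precision_recall_f1(tp, fp, fn):
    """Per-class precision, recall, and F1 (harmonic mean of the two)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Counts inferred from Table 2 for Ashtanga Namaskarasana (assumption).
p, r, f1 = precision_recall_f1(tp=8, fp=0, fn=1)
rounded = (round(p, 2), round(r, 2), round(f1, 2))
```

With these counts, recall is 8/9 and F1 is 16/17, which round to the 0.89 and 0.94 shown in the table.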

5.2 Experimental Setup

The experiment was performed on an Intel i5 processor with a 2 GB Nvidia MX150 graphics card. Python 3.8 was used to run the system. The data were divided into training and testing sets in an 80:20 ratio. Roughly 40 images were used for training and around 10 images of each pose for testing our model. These are preprocessed for frame extraction as discussed earlier. To assess the performance of the model, a confusion matrix was used. It is displayed as a matrix and provides a comparison of actual and predicted results; the metrics derived from it are shown in Table 2. The N in an N × N matrix represents the number of classes or outputs. As a result, an 8 × 8 matrix for the 8 classes (8 unique poses in Sun salutation) was obtained. For each class, the entries of the confusion matrix fall into four categories: true positive, false negative, false positive, and true negative. Accuracy was used to assess the overall performance of the classifier. Other common metrics are precision, recall (also called sensitivity), and the F1-score. Precision is the proportion of correctly predicted positive instances among all instances predicted as positive, whereas recall is the proportion of actual positive instances that are correctly predicted. The F1-score is the harmonic mean of precision and recall. Initially, the model is trained for 200 epochs with a learning rate of 1e-3. It is observed that the model starts to overfit after 40 epochs, with no significant improvement in validation accuracy after that, as shown in Fig. 6. Precision and recall were also monitored on the validation data after every epoch, as shown in Fig. 6. From the decreasing training loss shown in Fig. 6, it is clear that the model is learning but has overfitted. Early stopping is employed during training to cut down on time, and the learning rate is decreased as training goes on.
To do this, an exponential decay schedule is supplied to the optimizer, with the initial learning rate set to 1e-3 and the decay rate set to 0.9; the batch size is set to 16. A reduced learning rate improves the chance of convergence. Early stopping has been added to the training process to halt it when the model performance begins to stall or when it begins to overfit the training data. It has been observed that training for up to 46 epochs is sufficient, as no significant change was observed after that. The observed validation accuracy was 96%, with a precision of 0.9762 and a recall of 0.8913. After fine tuning, graphs were plotted as shown in Fig. 7. The live video feed from the computer’s webcam was then given as input to the model, and an almost real-time output was observed. Currently, the model can detect and classify all the Sun salutation poses, and the classification model used in the study gives 96% accuracy. However, to further improve the model and achieve even better results, the MoveNet architecture can be fine-tuned: multiple layers of the MoveNet architecture can be unfrozen and trained in order to better detect specific Sun salutation poses. Additionally, the dataset can be augmented to a few thousand images that include people of all shapes, sizes, colors, and ethnicities performing Sun salutations. This would allow the model to expand its feature pool without gaining high variance and would also avoid the problem of high bias.

Fig. 6 Graphical representation of model a precision; b recall; c accuracy; and d loss of overfitted model

Fig. 7 Graphical representation of model a precision; b recall; c accuracy; and d loss after fine tuning
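The exponential learning-rate decay and early stopping described in Sect. 5.2 can be sketched as below. The initial rate (1e-3) and decay rate (0.9) come from the text; `decay_steps`, `patience`, and `min_delta` are not specified in the paper and are assumptions, as are the function names.

```python
def exponential_decay(initial_lr, decay_rate, step, decay_steps):
    """Learning rate after `step` optimizer steps.

    decay_steps (steps per decay period) is an assumed parameter.
    """
    return initial_lr * decay_rate ** (step / decay_steps)


def early_stopping_epoch(val_losses, patience=5, min_delta=0.0):
    """Return the 1-based epoch at which training would stop.

    Training stops once the validation loss has failed to improve by more
    than min_delta for `patience` consecutive epochs (assumed criterion).
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses)
```

With an initial rate of 1e-3 and a decay rate of 0.9, the rate falls to 9e-4 after one full decay period, and the stopping rule ends training a fixed number of epochs after the validation loss stops improving.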

6 Conclusion

In this paper, the supervised pose classification problem was investigated. Various KeyPoint identification models were studied and used to model the problem of identifying incorrect postures while performing Sun salutation. Because no adequate dataset was available, the model was trained on a self-made dataset. The MoveNet model is used to identify the KeyPoints of the poses as it is faster and delivers better accuracy. Our model sounds a buzzer if the subject’s posture is incorrect while performing Sun salutation. It also guides the subject as to which asana is to be performed next and keeps track of the order of the Sun salutation asanas. To the best of our knowledge, this is the first system that guides users in an autonomous mode and also generates reports so that users can improve their yoga experience. In future, we intend to enlarge the dataset so that our architecture can be used for verification of any yoga pose. Also, along with the captions, a voice-over can be included that will make the model more dynamic and robust. For the health benefits, inhalation and exhalation need to be monitored in an automated environment. We intend to include a module that will monitor the subject’s inhalation and exhalation patterns along with the audio for mantra chanting.

References

1. Lein D, Singh H, Kim S (2020) Are screening by yoga instructors and their practice patterns important to prevent injuries in yoga clients? Complement Ther Clin Pract
2. Kendall A, Grimes M, Cipolla R (2015) PoseNet: a convolutional network for real-time 6DOF camera relocalization. In: Proceedings of the IEEE international conference on computer vision, ICCV 2015, 7410693, pp 2938–2946
3. Badrinarayanan B, Rao S, Bhaskar R, Kumar S (2012) A comparative study on performance analysis of sun-salutation using fast Fourier transform, wavelet transform and Hilbert-Huang transform. Int J Sports Sci Eng
4. Agrawal Y, Shah Y, Sharma A (2020) Implementation of machine learning technique for identification of yoga poses. In: 2020 IEEE 9th international conference on communication systems and network technologies, pp 40–43. https://doi.org/10.1109/CSNT48778.2020.9115758
5. Yadav S, Singh A, Gupta A, Raheja J (2019) Real-time yoga recognition using deep learning. Neural Comput Appl
6. Fazil R, De Silva B, Alawathugoda S, Nijabdeen S, Rupasinghe P, Liyanapathirana C (2020) Infinity yoga tutor: yoga posture detection and correction system. In: Proceedings of 5th international conference on information technology research
7. Huang X, Pan D, Huang Y, Deng J, Zhu P, Shi P, Xu R, Qi Z, He J (2021) Intelligent yoga coaching system based on posture recognition. In: Proceedings: 2021 international conference on culture-oriented science and technology, pp 290–293
8. Jain H, Harit G (2016) A framework to assess sun salutation videos. In: Tenth Indian conference on computer vision, graphics and image processing (ICVGIP ’16)
9. Kumar D, Sinha A (2020) Yoga pose detection and classification using deep learning. Int J Sci Res Comput Sci Eng Inf Technol
10. Verma M, Kumawat S, Nakashima Y, Shanmuganathan R (2020) Yoga-82: a new dataset for fine-grained classification of human poses
11. Long C, Jo E, Nam Y (2021) Development of a yoga posture coaching system using an interactive display based on transfer learning. J Supercomput
12. Sharmila G, Rajamohan K (2021) A systematic literature review on image preprocessing and feature extraction techniques in precision agriculture, congress on intelligent systems. In: Lecture notes on data engineering and communications technologies 114. Springer, Singapore. https://doi.org/10.1007/978-981-16-9416-5_24
13. Toshev A, Szegedy C (2014) DeepPose: human pose estimation via deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)
14. Bajpai R, Joshi D (2021) MoveNet: a deep neural network for joint profile prediction across variable walking speeds and slopes. IEEE Trans Instrum Meas

Output Feedback Scheme-Based Network Synchronization of a Class of Discrete Time Systems in Chain and Ring Topology

Ravi Kumar Ranjan, Bharat Bhushan Sharma, and Dipak J. Prajapati

Abstract The work presented in this paper deals with the synchronization problem of a class of nonlinear discrete time networked systems. The synchronization scheme is based on an output feedback methodology derived in the contraction framework for a network connected in chain and ring topology. Output feedback-based network synchronization schemes are well established using Lyapunov theory; however, contraction theory-based procedures are the least explored. An effort is made here to develop a systematic procedure for deriving an explicit structure of a synchronizing output feedback controller for nonlinear discrete systems using contraction theory, for both chain and ring network configurations. Conditions for exponential synchronization between systems at each node of the network are derived in terms of controller gains. Theoretical results are verified numerically by considering a network of chaotic Chua systems.

Keywords Chaotic systems · Output feedback controller · Discrete time systems · Contraction theory · Complex networks

R. K. Ranjan (B) · B. B. Sharma
Department of Electrical Engineering, National Institute of Technology Hamirpur, Hamirpur 177005, India
e-mail: [email protected]
B. B. Sharma
e-mail: [email protected]
D. J. Prajapati
Government Engineering College, Modasa, Gujarat 383 315, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 686, https://doi.org/10.1007/978-981-99-2100-3_46

1 Introduction

Control and synchronization of networked systems have received increasing attention in the past few decades due to their considerable practical applications in a variety of areas of human interest, such as electric power grids, communication systems, social networks, and cellular networks. To establish synchronization of these networks with identical or non-identical subsystems, the subsystems of the network should be coupled. Similar to the two-system synchronization methodology, the adjacent subsystems of a network can be considered as master and slave systems. Synchronization of a master–slave subsystem pair is established first; subsequently, overall network synchronization is established. Synchronization of a network is ensured to achieve a common goal, as seen in many practical applications such as mobile communication, process industry, synchronization of lasers, and neural networks. To obtain synchronization, various control schemes have been developed for continuous and discrete time systems, such as adaptive control [1, 2], backstepping design [3, 4], observer-based schemes [5, 6], reduced order observer techniques [7], and sliding mode control [8, 9]. All these techniques are mainly based on Lyapunov theory. The Lyapunov approach is tedious, as it requires the formulation of a quadratic energy function of the system state vectors for analysis. Also, the analysis provides a stability condition with respect to an equilibrium point. Further, additional conditions must be formulated for non-autonomous stability analysis [10]. The more recently introduced contraction theory [11–13] has certain advantages over Lyapunov analysis. Contraction theory characterizes necessary and sufficient conditions for incremental exponential convergence of multiple state vectors of nonlinear systems to a single trajectory. Several studies on contraction theory for control and synchronization of continuous and discrete time nonlinear systems can be found in the literature, for example [4, 14–17]. Usually, synchronization schemes for discrete systems utilize all the state vectors for controller formulation, as in the previous approaches. Output feedback-based network synchronization has received limited attention, especially for discrete time network systems in the contraction framework.

Some pioneering contributions on contraction-based network synchronization schemes using measurable state vectors can be found in [4, 18], but they are formulated for continuous time systems. The work presented here explores an output feedback-based synchronization methodology derived in the contraction framework for network synchronization of a class of discrete time nonlinear systems. All subsystems of the network are considered to be identical. The subsystems are coupled with each other utilizing only measurable state vectors. The proposed synchronization scheme is derived for chain network topology and then extended to ring network topology. The derived methodology is verified using a network of chaotic Chua systems belonging to the proposed class of discrete time nonlinear systems. The manuscript is structured as follows: a brief introduction to the contraction theory approach is given in Sect. 2. Problem formulation for network synchronization is elaborated in Sect. 3. Sections 4 and 5 describe the proposed synchronization schemes for chain and ring network topology, respectively. Numerical simulations are presented in Sect. 6. Finally, the paper is concluded in Sect. 7.

Output Feedback Scheme-Based Network Synchronization of a Class …

591

2 Contraction Theory for Discrete Time Systems Contraction theory-based approach is recently developed tool for evaluating the convergence behaviour between two nearly arbitrary trajectories of a system defined in the state space. Contraction theory results are utilized for both continuous time and discrete time dynamical systems [11–13, 19]. To briefly review these results, let us consider discrete time nonlinear system as x(k + 1) = {(x(k), k)

(1)

For considered system in Eq. (1), δx(k) is the infinitesimal increment in state vector x(k); thus, the dynamics in differential framework can be obtained as δx(k + 1) =

∂{(x(k), k) δx(k) ∂x(k)

(2)

The virtual squared length between the neighbouring trajectories of systems is evaluated as δxT (k + 1)δx(k + 1) = δxT (k)

∂{T (x(k), k) ∂{(x(k), k) δx(k) ∂x(k) ∂x(k)

(3)

For system in Eq. (1), the state trajectories will exponentially converge to single trajectory if following conditions holds: ∂{T (x(k), k) ∂{(x(k), k) − I ≤ −βI ≤ 0 ∂x(k) ∂x(k)

(4)

For generalization, consider a transformation (x(k), k) so that δZ(k) = (x, k)δx(k)

(5)

The squared distance between system trajectories in transformed domain comes out to be δZ T (k + 1)δZ(k + 1) = δxT (k)

∂{(x(k), k) ∂{T (x(k), k) T k+1 k+1 δx(k) (6) ∂x(k) ∂x(k)

where k+1 = (x(k + 1), k + 1). The squared distance represented above is generalized as δZ T (k + 1)δZ(k + 1) = δZ T (k)J T (k)J (k)δZ(k)

(7)

592

R. K. Ranjan et al. T

where $J(k) = \Theta_{k+1}\,\dfrac{\partial f(x(k),k)}{\partial x(k)}\,\Theta_k^{-1}$ is the generalized Jacobian matrix. Using Eq. (7), the following definition of a contracting region for a discrete time system can be given:

Definition 1 For the discrete time system represented in Eq. (1), the state space is said to be a contracting region with respect to a uniformly positive definite metric $M_k = M(x(k),k) = \Theta_k^T\Theta_k$ if, for a scalar β > 0, ∀x(k), ∀k, the generalized Jacobian J(k) satisfies the following uniformly negative definite (UND) inequality:

$$J^T(k)\,J(k) - I \le -\beta I < 0$$

In terms of the metric, the same condition can be given as

$$\frac{\partial f^T(x(k),k)}{\partial x(k)}\,M_{k+1}\,\frac{\partial f(x(k),k)}{\partial x(k)} - M_k \le -\beta M_k < 0$$

Lemma 1 Given the discrete time nonlinear system x(k + 1) = f(x(k), k), any trajectory of the system that starts in a ball of constant radius with respect to the metric $M_k$, centred at a given trajectory and contained at all times in a contraction region with respect to $M_k$, remains in that ball at all times and converges exponentially to that trajectory [11].
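The contraction condition of Eq. (4) can be checked numerically for a concrete map. The following is a minimal sketch under stated assumptions: the two-dimensional toy map and its Jacobian are illustrative choices that do not come from the text, the identity metric is used, and the condition is only sampled on a grid rather than proved.

```python
import math

def step(x):
    """One step of a toy map (an assumption, not a system from the text)."""
    return [0.5 * math.sin(x[0]) + 0.1 * x[1], 0.2 * x[0] + 0.3 * x[1]]

def jacobian(x):
    """Jacobian of the toy map with respect to the state."""
    return [[0.5 * math.cos(x[0]), 0.1],
            [0.2, 0.3]]

def max_eig_JtJ_minus_I(J):
    """Largest eigenvalue of the symmetric 2x2 matrix J^T J - I (closed form)."""
    a = J[0][0] ** 2 + J[1][0] ** 2 - 1.0      # (1,1) entry
    d = J[0][1] ** 2 + J[1][1] ** 2 - 1.0      # (2,2) entry
    b = J[0][0] * J[0][1] + J[1][0] * J[1][1]  # off-diagonal entry
    return 0.5 * (a + d) + math.sqrt(0.25 * (a - d) ** 2 + b ** 2)

# Sample the condition on a grid of x1 values (this Jacobian depends on x1 only).
worst = max(max_eig_JtJ_minus_I(jacobian([0.01 * i, 0.0])) for i in range(-700, 701))
beta = -worst  # J^T J - I <= -beta * I on the sampled set

# Two arbitrary trajectories of a contracting map should approach each other.
xa, xb = [2.0, -1.0], [-3.0, 4.0]
for _ in range(50):
    xa, xb = step(xa), step(xb)
distance = math.hypot(xa[0] - xb[0], xa[1] - xb[1])
print(f"sampled beta = {beta:.3f}, distance after 50 steps = {distance:.2e}")
```

A positive sampled value of beta indicates that the largest eigenvalue of J^T J − I stays negative on the grid, which is the numerical counterpart of Eq. (4).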

3 Problem Formulation

The proposed class of discrete time nonlinear systems is described by

$$x(k+1) = A\,x(k) + F(x(k)), \qquad y(k) = C\,x(k) \tag{8}$$

Here, $x(k) \in \mathbb{R}^p$ is the state vector, $A \in \mathbb{R}^{p \times p}$ is the parameter matrix, and $F(x(k)) \in \mathbb{R}^p$ is the vector function of the associated system nonlinearity. $y(k) \in \mathbb{R}^r$ is the measurable state vector of the system, and $C \in \mathbb{R}^{r \times p}$ is a constant matrix. In a general practical situation, p > r. The nonlinear function F(x(k)) further satisfies the following assumption:

Assumption 1 For the proposed class of discrete time nonlinear systems described by Eq. (8), the nonlinear function F(x(k)) can be decomposed as $F(x(k)) = \mathbf{f}(y(k))\,x(k)$, where the nonlinear matrix function $\mathbf{f}(y(k)) \in \mathbb{R}^{p \times p}$.


Using Assumption 1, the system description in (8) yields

$$x(k+1) = A\,x(k) + \mathbf{f}(y(k))\,x(k) \tag{9}$$

The main aim of this manuscript is to develop an output feedback-based synchronizing controller in the contraction framework for an output-coupled network in two different configurations, viz.

i. Chain network of n-systems
ii. Ring network of n-systems.

All the subsystems of the network are identical, with dynamics as represented in (9), for both chain and ring configurations. Schematic diagrams of both schemes are shown in Fig. 1a, b.

4 Chain Network of N-Systems

Output feedback-based synchronization of a chain network using the contraction approach is elaborated in this section. The subsystem at each node of the network is synchronized by utilizing the measurable state vector of the preceding system. Interconnections between the subsystems are shown in Fig. 1a, where $x_n(k)$ represents the n-th system of the network. A common controller gain L is utilized for synchronization.

Fig. 1 a Chain network topology and b ring network topology


The dynamics of the chain network of n-systems can be represented as

$$\begin{aligned}
x_1(k+1) &= A x_1(k) + \mathbf{f}(y_1(k))\,x_1(k)\\
x_2(k+1) &= A x_2(k) + \mathbf{f}(y_2(k))\,x_2(k) + U(x_1(k)) - U(x_2(k))\\
x_3(k+1) &= A x_3(k) + \mathbf{f}(y_3(k))\,x_3(k) + U(x_2(k)) - U(x_3(k))\\
&\;\;\vdots\\
x_n(k+1) &= A x_n(k) + \mathbf{f}(y_n(k))\,x_n(k) + U(x_{n-1}(k)) - U(x_n(k))
\end{aligned} \tag{10}$$

Here, $x_i(k) \in \mathbb{R}^p$, for index i = 1, 2, 3, …, n, represents the i-th system of the network, with coupling force $U(x_{i-1}(k)) - U(x_i(k))$ for i = 2, 3, …, n. For the network represented in (10), adjacent systems are one-way coupled through the output variables only. Therefore, the coupling force for the i-th system can be rewritten as

$$U(x_{i-1}(k)) - U(x_i(k)) = L\,(y_{i-1}(k) - y_i(k)) \tag{11}$$

The output feedback-based controller gain L, of size (p × r), is derived using the contraction-based approach, and the result is presented in the form of Theorem 1.

Theorem 1 For any i-th subsystem of the network described in (10), let a discrete time virtual system with state $v(k) \in \mathbb{R}^p$ be defined such that the virtual system is contracting subject to the following conditions:
(i) The Jacobian matrix J of the virtual system dynamics with respect to its state vector is UND.
(ii) The Jacobian matrix J can be made UND with a particular choice of output feedback gain L such that

$$J^T J - I < 0 \quad \text{for} \quad J = [A + \mathbf{f}(y_i(k)) - LC] \tag{12}$$

Further, if the i-th subsystem is contracting, then all the subsystems of the network (10) will be exponentially synchronized irrespective of the choice of their initial conditions.

Proof The i-th system given in Eq. (10), with the coupling force given in Eq. (11), becomes

$$x_i(k+1) = A x_i(k) + \mathbf{f}(y_i(k))\,x_i(k) + L\,(y_{i-1}(k) - y_i(k)) \tag{13}$$

A virtual system for (13) can be considered as

$$v_i(k+1) = A v_i(k) + \mathbf{f}(y_i(k))\,v_i(k) + L\,(y_{i-1}(k) - y_{v_i}(k)) \tag{14}$$

where $y_{v_i}(k) = C v_i(k)$ is the output of the i-th virtual system.


For a virtual increment $\delta v_i(k)$ in the state vector $v_i(k)$, the virtual dynamics in (14) reduce to

$$\delta v_i(k+1) = J\,\delta v_i(k) \tag{15}$$

where the Jacobian matrix is $J = [A + \mathbf{f}(y_i(k)) - LC]$. According to contraction theory for discrete systems, the contracting nature of (14) is ensured if the largest singular value of the Jacobian matrix J remains uniformly less than unity, viz. $P = J^T J - I < 0$, which holds when the controller gain L is suitably selected. The designer can choose a suitable value of the output feedback gain L for the desired system performance. The contracting nature of the virtual system (14) ensures that its particular solution (13) is also contracting. For the i-th subsystem, if $[L\,y_{i-1}(k)]$ is considered as the input, then replacing $y_i(k)$ by $y_{i-1}(k)$ in the dynamics of the i-th system gives the uncoupled dynamics of the (i − 1)-th system. Thus, the (i − 1)-th system is a particular solution of the i-th system. Therefore, the derived gain L guarantees the contracting nature of the (i − 1)-th subsystem as well. Proceeding along the chain network in this way, all subsystems are shown to be contracting, i.e. all the subsystems of the network synchronize exponentially. This completes the proof.
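The construction in Theorem 1 can be tried on a small example. The sketch below is only illustrative: the two-state system, the bounded nonlinearity f(y), the gain L and the initial conditions are all assumptions chosen so that the condition in (12) holds on the sampled outputs, and it does not reproduce the Chua example discussed later. It samples the largest eigenvalue of J^T J − I and then simulates a one-way chain of six nodes coupled through L(y_{i−1} − y_i).

```python
import math

# Toy member of the class x(k+1) = A x + f(y) x, y = C x (all numbers are assumptions).
A = [[1.5, 0.2], [0.3, 0.4]]
C = [1.0, 0.0]            # only the first state is measured (r = 1)
L = [0.7, 0.3]            # output feedback gain of size p x r with p = 2

def f(y):
    """State-multiplying nonlinearity f(y); bounded so trajectories stay bounded."""
    return [[-math.tanh(y * y), 0.0], [0.0, 0.0]]

def output(x):
    return C[0] * x[0] + C[1] * x[1]

def node_step(x, y_prev=None):
    """One step of a node; y_prev is the preceding node's output (None for node 1)."""
    y = output(x)
    F = f(y)
    u = 0.0 if y_prev is None else y_prev - y      # output error y_{i-1} - y_i
    return [sum((A[r][c] + F[r][c]) * x[c] for c in range(2)) + L[r] * u
            for r in range(2)]

def max_eig_JtJ_minus_I(y):
    """Largest eigenvalue of J^T J - I for J = A + f(y) - L C (2x2, closed form)."""
    F = f(y)
    J = [[A[r][c] + F[r][c] - L[r] * C[c] for c in range(2)] for r in range(2)]
    a = J[0][0] ** 2 + J[1][0] ** 2 - 1.0
    d = J[0][1] ** 2 + J[1][1] ** 2 - 1.0
    b = J[0][0] * J[0][1] + J[1][0] * J[1][1]
    return 0.5 * (a + d) + math.sqrt(0.25 * (a - d) ** 2 + b ** 2)

# Condition (12), sampled over a range of output values.
worst = max(max_eig_JtJ_minus_I(0.05 * i) for i in range(-100, 101))

# Chain of six nodes started from different initial conditions.
nodes = [[0.5, -0.2], [-0.7, 0.9], [1.2, 0.3], [-0.3, -0.8], [0.9, 0.6], [-1.1, 0.1]]
for k in range(400):
    outs = [output(x) for x in nodes]
    nodes = [node_step(x, None if i == 0 else outs[i - 1]) for i, x in enumerate(nodes)]

sync_error = max(abs(x[j] - nodes[0][j]) for x in nodes for j in range(2))
print(f"max eig of J^T J - I: {worst:.3f}, final sync error: {sync_error:.2e}")
```

In this toy case the gain cancels most of the first column of A, which is the only column that output feedback through C can modify; the remaining columns of A + f(y) must already be small for the condition in (12) to be achievable.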

5 Ring Network of N-Systems

The proposed scheme for network synchronization is now extended to the case in which the subsystems of the network are coupled in a ring. All the network links are considered to be unidirectional. As in the previous section, the subsystems are coupled only through measurable states. Mathematically, these systems can be expressed as

$$\begin{aligned}
x_1(k+1) &= A x_1(k) + \mathbf{f}(y_1(k))\,x_1(k) + U(x_n(k)) - U(x_1(k))\\
x_2(k+1) &= A x_2(k) + \mathbf{f}(y_2(k))\,x_2(k) + U(x_1(k)) - U(x_2(k))\\
x_3(k+1) &= A x_3(k) + \mathbf{f}(y_3(k))\,x_3(k) + U(x_2(k)) - U(x_3(k))\\
&\;\;\vdots\\
x_n(k+1) &= A x_n(k) + \mathbf{f}(y_n(k))\,x_n(k) + U(x_{n-1}(k)) - U(x_n(k))
\end{aligned} \tag{16}$$

Here the coupling force of the i-th system is considered as

$$U(x_{i-1}(k)) - U(x_i(k)) = L\,[y_{i-1}(k) - y_i(k)] \tag{17}$$

where the index i = 1, 2, 3, …, n is computed circularly, so that the n-th system precedes the first one.

Theorem 2 For any i-th subsystem of the network described in (16), if a discrete time virtual system $v_i(k) \in \mathbb{R}^p$ is defined for which the Jacobian matrix


$$J = [A + \mathbf{f}(y_i(k)) - LC] \tag{18}$$

of the virtual system dynamics with respect to its state vector is UND with a suitable choice of output feedback gain L such that $J^T J - I < 0$, then the contracting nature of the i-th subsystem guarantees exponential synchronization of the network described in (16), irrespective of the initial conditions of the systems.

Proof The i-th subsystem of the ring network in Eq. (16), using the coupling force considered in (17), can be given as

$$x_i(k+1) = A x_i(k) + \mathbf{f}(y_i(k))\,x_i(k) + L\,[y_{i-1}(k) - y_i(k)] \tag{19}$$

Let the i-th discrete time virtual system for Eq. (19) be defined as

$$v_i(k+1) = A v_i(k) + \mathbf{f}(y_i(k))\,v_i(k) + L\,[y_{i-1}(k) - y_{v_i}(k)] \tag{20}$$

Introducing a virtual increment $\delta v_i(k)$ in the state vector $v_i(k)$ of the above equation, one gets

$$\delta v_i(k+1) = J\,\delta v_i(k) \tag{21}$$

where the Jacobian matrix is

$$J = [A + \mathbf{f}(y_i(k)) - LC] \tag{22}$$

The virtual system in Eq. (20) is contracting with a proper selection of the output feedback gain L such that $J^T J - I < 0$. The i-th subsystem in Eq. (19) is a particular solution of the virtual system (20), which implies that the contracting nature of the virtual system (20) ensures that the i-th subsystem (19) is also contracting. Moreover, if any i-th subsystem of the network (16) is contracting, it guarantees the exponential synchronization of all the subsystems in the network, as discussed in detail in the proof of Theorem 1.

6 Numerical Simulations

Numerical simulations are carried out in this section to demonstrate the efficacy of the control schemes proposed in the previous sections. The discrete time chaotic Chua system with cubic nonlinearity, which belongs to the proposed class of nonlinear systems, is considered as an example to elaborate the simulation results. The mathematical description of the three-dimensional discrete time Chua system is given as


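$$\begin{bmatrix} x_1(k+1)\\ x_2(k+1)\\ x_3(k+1) \end{bmatrix} = \begin{bmatrix} -\alpha c & \alpha & 0\\ 1 & -1 & 1\\ 0 & \beta & 0 \end{bmatrix} \begin{bmatrix} x_1(k)\\ x_2(k)\\ x_3(k) \end{bmatrix} + \begin{bmatrix} -\alpha r y_1^2(k) & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x_1(k)\\ x_2(k)\\ x_3(k) \end{bmatrix}, \qquad y_1(k) = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix} x(k) \tag{23}$$

where

$$A = \begin{bmatrix} -\alpha c & \alpha & 0\\ 1 & -1 & 1\\ 0 & \beta & 0 \end{bmatrix}, \quad \mathbf{f}(y_1(k)) = \begin{bmatrix} -\alpha r y_1^2(k) & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 0 \end{bmatrix} \quad \text{and} \quad C = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix}.$$

The Chua system exhibits chaotic behaviour when the parameters are chosen as α = 9.5, r = 4/63, c = −1/7 and β = 100/7. As the state values of a nonlinear chaotic system remain bounded, $|y_1(k)| \le |\gamma_{max}|$, where $\gamma_{max}$ is the maximum value of the output $y_1(k)$. Therefore, in the present case $(y_1^2(k))_{min} = 0$ and $(y_1^2(k))_{max} = \gamma_{max}^2$.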

6.1 Numerical Simulations for Chain Network

The numerical simulation in this section is performed for the case in which the network is connected in a one-way chain configuration. Six output-coupled chaotic Chua systems are considered to be connected in the network. The system parameters of all the subsystems are presumed to be identical. The first member of the network is given in (23), and the rest of the members of the network are expressed as

$$\begin{bmatrix} x_{3i+1}(k+1)\\ x_{3i+2}(k+1)\\ x_{3i+3}(k+1) \end{bmatrix} = A \begin{bmatrix} x_{3i+1}(k)\\ x_{3i+2}(k)\\ x_{3i+3}(k) \end{bmatrix} + \mathbf{f}(y_{i+1}(k)) \begin{bmatrix} x_{3i+1}(k)\\ x_{3i+2}(k)\\ x_{3i+3}(k) \end{bmatrix} + \begin{bmatrix} L_1\\ L_2\\ L_3 \end{bmatrix} \big(y_i(k) - y_{i+1}(k)\big) \tag{24}$$

with measurable states $y_{i+1}(k) = C x_{i+1}(k)$, where the index i = 1, 2, 3, 4, 5. The i-th virtual system of (24) is considered as

$$\begin{bmatrix} v_{3i+1}(k+1)\\ v_{3i+2}(k+1)\\ v_{3i+3}(k+1) \end{bmatrix} = A \begin{bmatrix} v_{3i+1}(k)\\ v_{3i+2}(k)\\ v_{3i+3}(k) \end{bmatrix} + \mathbf{f}(y_{i+1}(k)) \begin{bmatrix} v_{3i+1}(k)\\ v_{3i+2}(k)\\ v_{3i+3}(k) \end{bmatrix} + \begin{bmatrix} L_1\\ L_2\\ L_3 \end{bmatrix} \big(y_i(k) - y_{v,(i+1)}(k)\big) \tag{25}$$

where $y_{v,(i+1)} = C v_{i+1}$ is the output of the virtual system of the (i + 1)-th Chua system. Introducing a virtual increment $\delta v_i(k)$ in the state vector $v_i(k)$, the Jacobian for the above equation comes out to be


$$J = \begin{bmatrix} -\alpha\big(c + r y_{i+1}^2(k)\big) - L_1 & \alpha & 0\\ 1 - L_2 & -1 & 1\\ -L_3 & \beta & 0 \end{bmatrix} \;\Longrightarrow\; J^T J - I = \begin{bmatrix} \mathcal{A} & \mathcal{B} & \mathcal{C}\\ \mathcal{B} & \beta^2 + \alpha^2 & -1\\ \mathcal{C} & -1 & 0 \end{bmatrix} < 0$$

where $\mathcal{A} = \big[L_1 + \alpha\big(r y_{i+1}^2(k) + c\big)\big]^2 + (L_2 - 1)^2 + L_3^2 - 1$, $\mathcal{B} = -\alpha\big[L_1 + \alpha\big(r y_{i+1}^2(k) + c\big)\big] + L_2 - L_3\beta - 1$ and $\mathcal{C} = 1 - L_2$. The controller gains $L_1$, $L_2$ and $L_3$ are selected to meet the negative definiteness condition, i.e. $J^T J - I < 0$. The following inequalities should hold to ensure uniform negative definiteness of $(J^T J - I)$:

$$\mathcal{A} < 0; \qquad \mathcal{B}^2 - \mathcal{A}\,(\beta^2 + \alpha^2) < 0; \qquad \mathcal{A} + 2\mathcal{B}\mathcal{C} + \mathcal{C}^2(\beta^2 + \alpha^2) < 0$$

For simulation, the controller gains are suitably selected as $L_1 = 1.25$, $L_2 = 1$ and $L_3 = 0$, and are activated at the 5th step. The initial conditions of five subsystems are taken as $[-0.2\ \ -0.5\ \ 0.2]^T$; $[-0.7\ \ -0.1\ \ 0.9]^T$; $[-0.8\ \ -0.4\ \ 0.6]^T$; $[-0.3\ \ -0.2\ \ 0.31]^T$; and $[-0.6\ \ -0.47\ \ 0.1]^T$, respectively. The simulation is run for 50 steps with step size 0.01, and the synchronization plots are shown in Fig. 2. Synchronization of the first, second and third states of the respective subsystems is shown in Fig. 2a–c, respectively. The cumulative synchronization error $E = [E_1\ E_2\ E_3]^T$ is computed as:

$$E_1 = \frac{1}{n-1}\sum_{i=1}^{n-1}\big(x_{3i-2} - x_{3i+1}\big); \quad E_2 = \frac{1}{n-1}\sum_{i=1}^{n-1}\big(x_{3i-1} - x_{3i+2}\big); \quad E_3 = \frac{1}{n-1}\sum_{i=1}^{n-1}\big(x_{3i} - x_{3i+3}\big) \tag{26}$$

Here, six subsystems are present in the network, so n = 6. The cumulative errors converge exponentially to zero, as shown in Fig. 2d; the corresponding states of all the subsystems are synchronized, so the overall chain network is synchronized.
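The cumulative error of Eq. (26) can be evaluated directly from the stacked state vector. The following minimal sketch assumes the states of all n three-state subsystems are stored in one flat list, x_1 first; the function name and the sample numbers are illustrative only.

```python
def cumulative_sync_error(x, n):
    """Return [E1, E2, E3] of Eq. (26) for a stacked state list x of length 3n.

    x[0] corresponds to x_1 in the text's 1-based indexing.
    """
    E = [0.0, 0.0, 0.0]
    for i in range(1, n):                 # i = 1, ..., n-1
        for j in range(3):                # j-th state of subsystem i minus subsystem i+1
            E[j] += x[3 * (i - 1) + j] - x[3 * i + j]
    return [e / (n - 1) for e in E]

# Three subsystems with made-up states, purely to exercise the indexing.
states = [1.0, 2.0, 3.0,   1.5, 2.0, 2.0,   0.0, 2.0, 4.0]
print(cumulative_sync_error(states, 3))
```

Because the differences in Eq. (26) are signed, each sum telescopes to the difference between the first and the last subsystem divided by n − 1; summing absolute differences instead would be one option if the intermediate nodes need to be tracked as well.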

6.2 Numerical Simulations for Ring Network

In this section, numerical simulations are performed for an output-coupled ring network of chaotic Chua systems with cubic nonlinearity. The mathematical expression of the i-th subsystem of the network can be given as

$$\begin{bmatrix} x_{3i-2}(k+1)\\ x_{3i-1}(k+1)\\ x_{3i}(k+1) \end{bmatrix} = A \begin{bmatrix} x_{3i-2}(k)\\ x_{3i-1}(k)\\ x_{3i}(k) \end{bmatrix} + \mathbf{f}(y_i(k)) \begin{bmatrix} x_{3i-2}(k)\\ x_{3i-1}(k)\\ x_{3i}(k) \end{bmatrix} + \begin{bmatrix} L_1\\ L_2\\ L_3 \end{bmatrix} \big(y_{i-1}(k) - y_i(k)\big) \tag{27}$$

with measurable states $y_i(k) = C x_i(k)$,


Fig. 2 Synchronization of six Chua systems connected in chain configuration: a–c Synchronization behaviour of corresponding states of subsystems and d variation of cumulative synchronization error of corresponding states.

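where the index i = 1, 2, …, 6 is computed circularly. The i-th virtual system dynamics for (27) are defined as

$$\begin{bmatrix} v_{3i-2}(k+1)\\ v_{3i-1}(k+1)\\ v_{3i}(k+1) \end{bmatrix} = A \begin{bmatrix} v_{3i-2}(k)\\ v_{3i-1}(k)\\ v_{3i}(k) \end{bmatrix} + \mathbf{f}(y_i(k)) \begin{bmatrix} v_{3i-2}(k)\\ v_{3i-1}(k)\\ v_{3i}(k) \end{bmatrix} + \begin{bmatrix} L_1\\ L_2\\ L_3 \end{bmatrix} \big(y_{i-1}(k) - y_{v_i}(k)\big) \tag{28}$$

Introducing a virtual increment $\delta v_i(k)$ in the state vector $v_i(k)$ yields the Jacobian as

$$J = \begin{bmatrix} -\alpha\big(c + r y_i^2(k)\big) - L_1 & \alpha & 0\\ 1 - L_2 & -1 & 1\\ -L_3 & \beta & 0 \end{bmatrix} \;\Longrightarrow\; J^T J - I = \begin{bmatrix} \mathcal{A} & \mathcal{B} & \mathcal{C}\\ \mathcal{B} & \beta^2 + \alpha^2 & -1\\ \mathcal{C} & -1 & 0 \end{bmatrix} < 0$$

The controller gains $L_1$, $L_2$ and $L_3$ are selected to ensure $J^T J - I < 0$. Here, $\mathcal{A} = \big[L_1 + \alpha\big(r y_i^2(k) + c\big)\big]^2 + (L_2 - 1)^2 + L_3^2 - 1$, $\mathcal{B} = -\alpha\big[L_1 + \alpha\big(r y_i^2(k) + c\big)\big] + L_2 - L_3\beta - 1$ and $\mathcal{C} = 1 - L_2$. The following inequalities ensure uniform negative definiteness of the $(J^T J - I)$ matrix:

$$\mathcal{A} < 0; \qquad \mathcal{B}^2 - \mathcal{A}\,(\beta^2 + \alpha^2) < 0; \qquad \mathcal{A} + 2\mathcal{B}\mathcal{C} + \mathcal{C}^2(\beta^2 + \alpha^2) < 0$$

The controller choice, its activation time, the simulation run time and the initial parameter settings are kept the same as for the chain network synchronization presented in the previous section.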


Fig. 3 Synchronization of six Chua’s system connected in ring configuration: a–c Synchronization behaviour of corresponding states of subsystems and d Variation of cumulative synchronization error of corresponding states.

Simulation plots for the ring network of discrete time Chua systems are shown in Fig. 3. The synchronization behaviour of corresponding states is shown in Fig. 3a–c. The cumulative synchronization errors, calculated as given in Eq. (26), are plotted in Fig. 3d. The simulation plots show that exponential synchronization of the network is achieved in nearly 15 steps.

7 Conclusions

In the present work, network synchronization for a class of discrete time nonlinear systems is addressed. A contraction-based output feedback controller for synchronization is proposed for networks coupled through measurable states only. The proposed methodology, developed in the contraction framework, draws a parallel to Lyapunov-based synchronization of networks of discrete time systems. The synchronization methodology is analysed for chain and ring network topologies. Suitable selection of the controller or coupling gains to establish $J^T J - I < 0$ ensures synchronization of corresponding states. The concept of virtual system stability from the contraction framework is utilized throughout. The theoretical results are verified with networks of chaotic Chua systems having cubic nonlinearity. The derived results can be further explored for more complex network configurations.


References

1. Zhang J (2004) Adaptive neural network control of discrete-time nonlinear systems. Ph.D. thesis
2. Dousseh YP, Monwanou AV, Koukpémèdji AA, Miwadinou CH, Chabi Orou JB (2022) Dynamics analysis, adaptive control, synchronization and anti-synchronization of a novel modified chaotic financial system. Int J Dyn Control 1–15
3. Anand P, Sharma BB (2021) Finite-time stabilization of a general class of nonlinear systems using Lyapunov based backstepping procedure. In: Innovations in power and advanced computing technologies (i-PACT). IEEE, pp 1–6
4. Chauhan Y, Sharma B, Ranjan RK (2021) Synchronization of nonlinear systems in chain network configuration with parametric uncertainty. In: IEEE 18th India Council International Conference (INDICON). IEEE, pp 1–6
5. Ranjan RK, Sharma BB, Chauhan Y (2021) Stabilization of a class of chaotic systems with uncertainty using output feedback control methodology. In: 2021 IEEE 6th international conference on computing, communication and automation (ICCCA), pp 533–538
6. Boubakir A, Labiod S (2022) Observer-based adaptive neural network control design for projective synchronization of uncertain chaotic systems. J Vib Control 10775463221101935
7. Wang L, Chen CLP (2021) Reduced-order observer-based dynamic event-triggered adaptive NN control for stochastic nonlinear systems subject to unknown input saturation. IEEE Trans Neural Netw Learn Syst 32(4):1678–1690
8. Zhang H (2012) Chaos synchronization between two different hyperchaotic systems with uncertain parameters. In: Advances in electrical engineering and automation. Springer, Heidelberg, pp 389–394
9. Lü L, Yu M, Li C, Liu S, Yan B, Chang H, Zhou J, Liu Y (2013) Projective synchronization of a class of complex network based on high-order sliding mode control. Nonlinear Dyn 73(1):411–416
10. Tsukamoto H, Chung S-J, Slotine J-JE (2021) Contraction theory for nonlinear stability analysis and learning-based control: a tutorial overview. Ann Rev Control 52:135–169
11. Lohmiller W, Slotine J-JE (1998) On contraction analysis for non-linear systems. Automatica 34(6):683–696
12. Lohmiller W (1999) Contraction analysis of nonlinear systems. Ph.D. thesis, Department of Mechanical Engineering, MIT
13. Lohmiller W, Slotine J-J (2000) Control system design for mechanical systems using contraction theory. IEEE Trans Autom Control 45(5):984–989
14. DeLellis P, di Bernardo M, Gorochowski TE, Russo G (2010) Synchronization and control of complex networks via contraction, adaptation and evolution. IEEE Circuits Syst Mag 10(3):64–82
15. Li Q, Shen B, Wang Z, Huang T, Luo J (2018) Synchronization control for a class of discrete time-delay complex dynamical networks: a dynamic event-triggered approach. IEEE Trans Cybernetics 49(5):1979–1986
16. Sharma BB, Kar IN (2009) Recursive algorithm based controller design for a class of discrete time chaotic systems. In: 7th Asian control conference. IEEE, pp 1162–1167
17. Sharma BB, Kar IN (2011) Stabilization and tracking controller for a class of nonlinear discrete-time systems. Chaos Solitons Fractals 44(10):902–913
18. Sharma BB, Kar IN (2011) Observer-based synchronization scheme for a class of chaotic systems using contraction theory. Nonlinear Dyn 63(3):429–445
19. Jouffroy J (2003) A simple extension of contraction theory to study incremental stability properties. In: European control conference (ECC). IEEE, pp 1315–1321

Contrast Enhancement of Medical Images Using Otsu Thresholding

Kurman Sangeeta, Modalavalasa Divya, and Bammidi Divyajyothi

Abstract Low contrast in medical pictures hinders accurate interpretation, whether the analysis is done by medical professionals or by machines in an automated diagnostic system. Contrast enhancement techniques produce high contrast images with increased visibility and interpretability. We propose a hybrid method for contrast enhancement that uses Otsu thresholding as the fundamental binarization procedure; one of the resulting sub-image histograms is histogram equalized, while CLAHE is used to equalize the other. The results are compared with state-of-the-art contrast enhancement techniques such as MMBEBHE and BBHE and evaluated with quantitative image quality assessment metrics: structural similarity index measurement (SSIM), absolute mean brightness error (AMBE), mean absolute error (MAE), FSIM and entropy.

Keywords Contrast limited adaptive histogram equalization (CLAHE) · Absolute mean brightness error (AMBE) · Structural similarity index metric (SSIM) · Feature similarity index metric (FSIM)

K. Sangeeta (B) · M. Divya · B. Divyajyothi
CSM Department, AITAM, Tekkali, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 686, https://doi.org/10.1007/978-981-99-2100-3_47

1 Introduction

Medical imaging aids in the visualization of internal body parts for analysis by medical professionals. However, owing to drawbacks in the acquisition process, the images are often of poor quality: low contrast, noisy, blurry, and distorted. Contrast enhancement techniques produce high contrast images with the better visualization and interpretability required for accurate computer-aided diagnostic systems. They reduce noise and distortions and enhance the image quality. The difficulty is that low contrast and illumination must be enhanced while original image details such as brightness are preserved. The enhanced image leads to correct segmentation and feature extraction, and hence to accurate diagnosis. The literature suggests various contrast enhancement methods. In general, they may be divided into direct and


indirect ways. Contrast measure specification and enhancement methods are direct approaches. Indirect methods include modification of the high and low frequency signals of an image, as in homomorphic filtering, as well as histogram modification and histogram transformation-based methods. Of all these techniques, histogram modification methods such as histogram equalization and histogram matching are the most common because of their low computational cost and fast execution. Histogram equalization (HE) maps the original image's histogram to an approximately uniform distribution. HE is affected by intensity saturation, and as a result undesirable visual artefacts can appear in the output image; HE therefore does not preserve the image brightness well. Kim [1] proposed brightness preserving bi-histogram equalization (BBHE), an image segmentation-based approach in which the histogram of the input image is partitioned at the mean intensity value and each sub-image histogram is equalized so that brightness is preserved. Yu [2] presented Dualistic Sub Image Histogram Equalization (DSIHE), which partitions the histogram at the median rather than the mean intensity. Both BBHE and DSIHE have brightness shifting and intensity saturation issues. Recursive Mean Separate Histogram Equalization (RMSHE), proposed by Soong and Ramli [3], recursively segments the input picture using the local mean as the threshold value. Brightness is preserved better than with BBHE. However, this technique over-enhances low contrast areas, and determining the best value for the recursion level r is a challenging task.
Soong and Ramli also suggested minimum mean brightness error bi-histogram equalization (MMBEBHE) [4], in which the histogram is partitioned at the intensity value that gives the least absolute mean brightness error. Sim and Tan [5] proposed Recursive Sub Image Histogram Equalization (RSIHE), a modified DSIHE and a variation of RMSHE that recursively partitions the histogram using the median intensity level as the threshold. The approach was more effective than DSIHE in preserving information and brightness. Ibrahim and Pik Kong suggested Brightness Preserving Dynamic Histogram Equalization (BPDHE) [6], which smooths the input histogram with a one-dimensional Gaussian filter before dividing it using local maxima as the threshold values. Gautam and Tiwari [7] presented range limited bi-histogram equalization with adaptive gamma correction (RLBHE with AGC), a technique that partitions the histogram using Otsu's single threshold. According to its experimental findings, it enhances poor contrast images more effectively than RLBHE with AGCWD. However, this method is unsuccessful at segmenting multi-object images with complicated backgrounds. Xu et al. [8] suggested Range Limited Double Threshold Weighted Histogram Equalization (RLDTWHE), which applies double thresholding followed by histogram equalization on a range of grey levels. Rani and Agarwal [9] proposed a novel and successful statistical method, Range Limited Double Threshold Weighted Histogram Equalization (RLDTWHE). This strategy integrates adaptive gamma correction, weighted distribution, homomorphic filtering, Otsu's double thresholding and dynamic range stretching. Thepade and Pardhi [10] suggested a linear weighted fusion of Contrast Limited Adaptive HE (CLAHE) with a colour restoration technique and brightness preserving dynamic HE (BPDHE). Lather


and Singh [18] presented a novel technique for enhancing the contrast of brain MRI images. They applied the minimum mean brightness error bi-histogram equalization (MMBEBHE) approach for contrast enhancement and Wiener and bilateral filters for denoising. In order to assess the outcomes of the suggested method, speckle and Gaussian noise were added. Acharya and Kumar [19] presented image sub-division and quadruple clipped adaptive histogram equalization (ISQCAHE) for improving low-exposure pictures. Their solution computes the histogram and comprises a novel way of picture sub-division, an enhancement controlling mechanism, a modified probability density function (PDF), and histogram equalization (HE).

The proposed technique uses Otsu global thresholding to segment the original grey image, which ensures maximum between-class variance between the resulting sub-images. Each sub-image is processed independently so that local features are also considered. The sub-image with grey levels between 0 and t is histogram equalized, while the sub-image with grey levels between t + 1 and the maximum intensity level (255 for the 256 grey levels used here) is equalized using Contrast Limited Adaptive Histogram Equalization (CLAHE), thereby preserving brightness.

1.1 Organization of the Paper

This paper is organized into five sections. Section 2 describes the technical background and the image quality assessment methods used with the proposed method. Section 3 discusses the suggested enhancement technique. Section 4 presents the result analysis. Section 5 summarizes the proposed method in the conclusion.

2 Technical Background

2.1 Otsu Thresholding [11]

Otsu's method automatically determines the best threshold for binarization. Binarizing a picture typically involves two steps: determining a grey threshold based on some objective criterion, and classifying each pixel as either foreground or background. The foreground class is composed of pixels whose intensity lies above the threshold; all other pixels belong to the background [12]. Otsu's method uses variance as its criterion: it selects the threshold that minimizes the within-class variance or, equivalently, maximizes the variance between classes. The total variance is the sum of the within-class and between-class variances, and the within-class variance is given in Eq. (1).

$$\sigma_w^2(t) = w_0(t)\,\sigma_0^2(t) + w_1(t)\,\sigma_1^2(t) \tag{1}$$


where the variances of the two classes are represented by $\sigma_0^2(t)$ and $\sigma_1^2(t)$, while the probabilities (pixel fractions) of class 1 and class 2 at threshold t are represented by $w_0(t)$ and $w_1(t)$, respectively. With $P_{total}$ = count of pixels in the image, $P_0(t)$ = class 1 pixel count at threshold t, and $P_1(t)$ = class 2 pixel count at threshold t, the weights are given in Eqs. (2) and (3).

$$w_0(t) = P_0(t)/P_{total} \tag{2}$$

$$w_1(t) = P_1(t)/P_{total} \tag{3}$$

The variance of each class can be computed using Eq. (4).

$$\sigma^2(t) = \frac{\sum_i (x_i - \bar{x})^2}{N - 1} \tag{4}$$

where $x_i$ is the grey value of the ith pixel in the class, $\bar{x}$ is the mean of the pixel values in the class, and N is the number of pixels in the class.
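The threshold search described by Eqs. (1)–(4) can be sketched as an exhaustive loop over candidate thresholds. This is a minimal illustration, not the authors' MATLAB implementation; it works on a histogram rather than on an image, and as a simplifying assumption it uses the population variance (division by N rather than the N − 1 of Eq. (4)) so that single-pixel classes do not cause a division by zero.

```python
def otsu_threshold(hist):
    """Return the threshold t that minimizes the within-class variance of Eq. (1).

    hist[g] is the number of pixels with grey level g; class 0 holds levels 0..t
    and class 1 holds levels t+1..len(hist)-1.
    """
    total = sum(hist)
    levels = len(hist)
    best_t, best_var = 0, float("inf")
    for t in range(levels - 1):
        p0 = sum(hist[: t + 1])
        p1 = total - p0
        if p0 == 0 or p1 == 0:
            continue                      # one class is empty: not a valid split
        mean0 = sum(g * hist[g] for g in range(t + 1)) / p0
        mean1 = sum(g * hist[g] for g in range(t + 1, levels)) / p1
        var0 = sum(hist[g] * (g - mean0) ** 2 for g in range(t + 1)) / p0
        var1 = sum(hist[g] * (g - mean1) ** 2 for g in range(t + 1, levels)) / p1
        within = (p0 / total) * var0 + (p1 / total) * var1   # Eq. (1)
        if within < best_var:
            best_t, best_var = t, within
    return best_t

# A small bimodal histogram with 8 grey levels (made-up counts).
example_hist = [5, 5, 0, 0, 0, 0, 5, 5]
threshold = otsu_threshold(example_hist)
print("threshold:", threshold)
```

The loop recomputes the class statistics for every candidate, which is easy to follow but quadratic in the number of grey levels; running sums would give the same threshold in a single pass.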

2.2 CLAHE [13]

CLAHE limits the maximum number of pixels in a histogram bin by clipping the histogram at a clip limit parameter; the clipped pixels are redistributed equally among all the histogram bins, so the total histogram count stays the same. The parameters of this algorithm that influence the output are the clip limit and the size of the contextual region.
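The clipping and redistribution step can be illustrated on a single histogram. The sketch below covers only that step for one contextual region and assumes integer bin counts; the tiling of the image and the interpolation between neighbouring tiles that a full CLAHE performs are omitted, and the function name is illustrative.

```python
def clip_histogram(hist, clip_limit):
    """Clip each bin at clip_limit and spread the excess evenly over all bins."""
    excess = sum(max(h - clip_limit, 0) for h in hist)
    clipped = [min(h, clip_limit) for h in hist]
    share, leftover = divmod(excess, len(hist))
    out = [h + share for h in clipped]
    for i in range(leftover):             # hand out the remainder one count at a time
        out[i] += 1
    return out

original = [10, 2, 0, 0]
redistributed = clip_histogram(original, clip_limit=4)
print(redistributed, sum(redistributed) == sum(original))
```

After redistribution some bins can again exceed the clip limit; practical implementations may repeat the step or accept the small overshoot.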

2.3 Evaluation Metrics for Image Quality

Image quality assessment (IQA) is primarily split into two categories: reference-based evaluations and no-reference evaluations. The primary distinction is that reference-based approaches require a high-quality image as a reference in order to assess the differences between pictures; an example is the Structural Similarity Index (SSIM). In no-reference image quality evaluation, the algorithm receives only the distorted picture whose quality is being evaluated, without a base image.


Reference-Based Measures of Image Quality

Peak Signal-to-Noise Ratio (PSNR)
The peak signal-to-noise ratio (PSNR) of a picture is determined with reference to the source image. A higher PSNR score indicates greater image quality. PSNR describes the relationship between the maximum possible value (power) of a signal and the power of the distorting noise that reduces the fidelity of its representation. Since many signals have a very wide dynamic range (the ratio between the largest and smallest possible values of a variable), the PSNR is usually expressed on the logarithmic decibel scale, as given by Eq. (5).

$$\mathrm{PSNR} = 20\,\log_{10}\!\left(\frac{\mathrm{Max}_f}{\sqrt{\mathrm{MSE}}}\right) \tag{5}$$

where $\mathrm{Max}_f$ is the maximum possible pixel value and MSE is the mean squared error between the two images.

Absolute Mean Brightness Error (AMBE)
AMBE is an objective measure of how well the original brightness is maintained. It is defined as the absolute difference between the mean intensities of the input and output images, AMBE = |E(X) − E(Y)|, where X and Y denote the input and output images, respectively. A lower AMBE means that the brightness has been preserved better.

Structural Similarity Index Metric (SSIM) [14]
The structural similarity index metric (SSIM), which is based on the human vision system's capacity to extract structural information from a scene, assesses features including sharpness, contrast and brightness.

Feature Similarity Index Metric (FSIM) [15]
Subjective fidelity scores do not correspond well with traditional measurements such as the peak signal-to-noise ratio (PSNR) and mean squared error (MSE), which act directly on the intensity of the picture. As a result, several efforts have been made to build IQA measurements based on the human vision system (HVS). These models stress the significance of the HVS sensitivity to various visual signals, including brightness, contrast, frequency content, and the interplay between various signal components.

Entropy
The Shannon entropy, often simply called the entropy, is a measure of the uncertainty associated with a random variable. It quantifies the information content of the image, usually in bits. Formally, it is given by Eq. (6).


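$$H(x) = \sum_{i=1}^{n} p(x_i)\,I(x_i) = -\sum_{i=1}^{n} p(x_i)\,\log_2 p(x_i) \tag{6}$$

where $p(x_i)$ is the probability of grey level $x_i$ and $I(x_i) = -\log_2 p(x_i)$ is its information content.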

A better-quality picture is one with a greater entropy.
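The simpler metrics above can be computed in a few lines. The following minimal sketch treats an image as a flat list of integer grey values and assumes a peak value of 255; the function names are illustrative. As a consistency check, the MSE and PSNR pairs reported in Table 1 agree with Eq. (5) for this peak value.

```python
import math

def mse(x, y):
    """Mean squared error between two equally sized images (flat lists)."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

def psnr_from_mse(m, max_value=255.0):
    """PSNR in dB as in Eq. (5); infinite when the images are identical."""
    return float("inf") if m == 0 else 20.0 * math.log10(max_value / math.sqrt(m))

def psnr(x, y, max_value=255.0):
    return psnr_from_mse(mse(x, y), max_value)

def ambe(x, y):
    """Absolute difference between the mean intensities of the two images."""
    return abs(sum(x) / len(x) - sum(y) / len(y))

def entropy(x, levels=256):
    """Shannon entropy in bits, Eq. (6), from the grey-level histogram."""
    hist = [0] * levels
    for v in x:
        hist[v] += 1
    n = len(x)
    return -sum((h / n) * math.log2(h / n) for h in hist if h > 0)

# Tiny made-up images, purely to exercise the functions.
x = [0, 0, 255, 255]
y = [0, 0, 255, 250]
print(psnr(x, y), ambe(x, y), entropy(x))
print(psnr_from_mse(8.1265e3))   # MSE of MMBEBHE in Table 1
```

SSIM and FSIM are left out because they involve local windows and, for FSIM, phase congruency, which do not fit in a short sketch.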

3 Proposed Method

The original image needs to be segmented in order to deal with local features. To segment the histogram of an input picture, the suggested approach makes use of Otsu global thresholding. To improve the contrast, each sub-image is treated separately. The sub-image with grey levels between 0 and t is histogram equalized, while the sub-image with grey levels between t + 1 and the maximum intensity level (255 for the 256 grey levels used here) is equalized using Contrast Limited Adaptive Histogram Equalization (CLAHE), thereby preserving brightness. The model architecture is shown in Fig. 1. The proposed algorithm is implemented in MATLAB. The images considered are from the Medical Images Database [17].

Algorithm:
1. The given low contrast image is converted to grayscale.
2. The picture is divided into two sub-images using Otsu thresholding, based on the threshold value that maximizes the between-class variance, say t.

Fig. 1 Model architecture: Input image → Otsu thresholding → Sub-image 1 (HE) and Sub-image 2 (CLAHE) → Output image


Table 1 Evaluation parameters for comparing picture quality of cervical cancer image

Metric     MMBEBHE       BBHE          Proposed method
Entropy    5.0476        5.5741        5.3990
AMBE       44.0458       44.0828       34.3150
FSIM       0.5235        0.5165        0.6016
MSE        8.1265e+03    5.1067e+03    3.5232e+03
PSNR       9.0318        11.0494       12.6614
SNR        7.4751        9.4927        11.1047
SSIM       0.2895        0.3249        0.4350

3. Each sub-image is processed independently with a different contrast enhancement technique. The sub-image with grey levels between 0 and t is histogram equalized, while the sub-image with grey levels between t + 1 and the greatest intensity level (here 256) is equalized with CLAHE.
4. The resulting output image is compared with two established methods: brightness preserving bi-histogram equalization (BBHE) and minimum mean brightness error bi-histogram equalization (MMBEBHE). The output and input images are compared using the PSNR, AMBE, entropy, SSIM, and FSIM image quality assessment (IQA) metrics.
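The algorithm steps can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' MATLAB implementation: the image is a flat list of 8-bit grey levels, all function names are hypothetical, and plain range-restricted histogram equalization is used as a stand-in for CLAHE on the upper sub-image (CLAHE itself works on tiles with a clip limit and is not reproduced here).

```python
def otsu_threshold(pixels, levels=256):
    """Return the grey level t that maximizes the between-class variance."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels with level <= t
    sum0 = 0    # sum of their grey levels
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

def equalize_range(pixels, lo, hi):
    """Map levels in [lo, hi] back onto [lo, hi] via the sub-image CDF."""
    if not pixels:
        return {}
    hist = [0] * (hi - lo + 1)
    for p in pixels:
        hist[p - lo] += 1
    mapping, running = {}, 0
    for i, h in enumerate(hist):
        running += h
        mapping[lo + i] = lo + round((hi - lo) * running / len(pixels))
    return mapping

def enhance(pixels, levels=256):
    """Split at the Otsu threshold and equalize each sub-image separately."""
    t = otsu_threshold(pixels, levels)
    low_map = equalize_range([p for p in pixels if p <= t], 0, t)
    # Plain HE here; the paper applies CLAHE to this upper sub-image.
    high_map = equalize_range([p for p in pixels if p > t], t + 1, levels - 1)
    return t, [low_map[p] if p <= t else high_map[p] for p in pixels]

sample = [10, 12, 14, 200, 210, 220]   # hypothetical two-cluster image
threshold, output = enhance(sample)
```

Because each sub-image is mapped back onto its own grey-level range, dark pixels stay below t and bright pixels stay above it, which is what limits the shift in mean brightness.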

4 Result Analysis

4.1 Cervical Cancer [16]

A single-cell cervical cancer image from the Herlev Pap-Smear dataset [16] is used to assess the proposed approach. The values of the various metrics for each contrast enhancement technique are shown in Table 1. Figure 2a shows the original grey image and its histogram, while Fig. 2b shows the final image and its histogram. According to Table 1, the proposed technique compares favorably with the benchmark contrast enhancement approaches. The AMBE of the resulting image is considerably lower with the proposed approach. The FSIM and PSNR values are better than those of the benchmark techniques MMBEBHE and BBHE. The entropy is better than that of MMBEBHE and close to that of BBHE. The efficacy of the proposed approach is also examined by adjusting the parameters of the CLAHE algorithm. The tile size and clip limit are varied during the experiment, and the resulting image quality metrics are shown in Table 2.
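As a consistency check on Table 1, the PSNR row can be recomputed from the MSE row. The sketch below assumes the standard definition PSNR = 10 log10(peak² / MSE) with a peak value of 255; the text does not state which definition was used, so this is an assumption.

```python
import math

def psnr_from_mse(mse, peak=255.0):
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    return 10.0 * math.log10(peak ** 2 / mse)

# MSE values as printed in Table 1 (five significant digits).
table1_mse = {"MMBEBHE": 8.1265e3, "BBHE": 5.1067e3, "Proposed": 3.5232e3}
table1_psnr = {"MMBEBHE": 9.0318, "BBHE": 11.0494, "Proposed": 12.6614}

recomputed = {name: psnr_from_mse(mse) for name, mse in table1_mse.items()}
for name, value in recomputed.items():
    print(name, round(value, 3), "reported:", table1_psnr[name])
```

Under this assumption the recomputed values agree with the reported PSNR row to within rounding of the printed MSE, which suggests the two rows are mutually consistent.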


Fig. 2 (a) Input image and its histogram (b) output image and its histogram

Table 2 Comparison of image quality assessment metrics with different tile size and clip limit

| Metrics | CL = 0.3 and tile size = 15 × 15 | CL = 0.09 and tile size = 15 × 15 | CL = 0.2 and tile size = 19 × 19 |
|---|---|---|---|
| Entropy | 5.1530 | 5.7295 | 5.3679 |
| AMBE | 22.7558 | 49.6619 | 34.9608 |
| FSIM | 0.6135 | 0.6026 | 0.5983 |
| PSNR | 14.1727 | 10.9857 | 12.5710 |
| SNR | 12.6161 | 9.4291 | 11.0143 |
| SSIM | 0.4632 | 0.4155 | 0.43170 |


4.2 Breast Digital X-ray Image [17]

Another experiment is conducted with the breast digital X-ray image [17] as the input to the proposed approach, with the CLAHE parameters set to a tile size of [5 5] and a clip limit of 0.09. Figure 3a shows the original grey image and its histogram, while Fig. 3b shows the final image and its histogram. The values of the image quality assessment metrics for the different algorithms are shown in Table 3. The AMBE of the resulting image is reported as comparable to that of MMBEBHE but not better than that of BBHE. The FSIM and PSNR values are better than those of the benchmark techniques MMBEBHE and BBHE. The entropy is also better than that of MMBEBHE and BBHE.

Fig. 3 (a) Input image and its histogram (b) output image and its histogram

Table 3 Evaluation parameters for comparing picture quality of Breast digital X-ray image

| Metrics\method | MMBEBHE | BBHE | Proposed method |
|---|---|---|---|
| Entropy | 4.9543 | 4.9220 | 6.3849 |
| AMBE | −15.9967 | −15.2397 | −43.4486 |
| FSIM | 0.7949 | 0.7471 | 0.8566 |
| PSNR | 17.3952 | 11.9248 | 21.4418 |
| SNR | 7.4688 | 1.9984 | 11.5154 |
| SSIM | 0.8360 | 0.7195 | 0.8217 |

4.3 Kidney Image [17]

For the kidney image [17], the CLAHE parameters are set to a tile size of [5 5] and a clip limit of 0.09. Figure 4a shows the original grey image and its histogram, while Fig. 4b shows the final image and its histogram. The values of the image quality assessment metrics for the different algorithms are shown in Table 4. The AMBE of the resulting image is reported as considerably better than that of MMBEBHE and BBHE. The FSIM and PSNR values are better than those of the benchmark techniques MMBEBHE and BBHE. The entropy is also better than that of MMBEBHE and BBHE.

5 Conclusion

We proposed a contrast enhancement technique for medical images which not only preserves the brightness but also ensures maximum between-class variance. Otsu thresholding, used to segment the image into classes, ensures maximum class variance, and brightness is retained through appropriate parameters of the CLAHE algorithm used for histogram equalization. The experimental results and the parameter values are shown in the tables for different medical images. The results are compared using the quality assessment metrics AMBE, entropy, FSIM, SSIM, SNR, and PSNR. As the experimental results suggest, the proposed method on average performs better than the benchmark techniques BBHE and MMBEBHE. The results of the proposed method are highlighted in bold in Tables 3 and 4.


Fig. 4 (a) Input image and its histogram (b) output image and its histogram

Table 4 Evaluation parameters for comparing picture quality of kidney image

| Metrics\method | MMBEBHE | BBHE | Proposed method |
|---|---|---|---|
| Entropy | 6.1278 | 6.0526 | 7.4413 |
| AMBE | −1.7000 | −4.8479 | −20.2059 |
| FSIM | 0.8601 | 0.8246 | 0.8630 |
| PSNR | 20.1263 | 17.6767 | 22.4767 |
| SNR | 13.8873 | 11.4377 | 16.2377 |
| SSIM | 0.8238 | 0.8010 | 0.8028 |


References

1. Kim Y (1997) Contrast enhancement using brightness preserving bi-histogram equalization. IEEE Trans Consum Electron 43:1–8. https://doi.org/10.1109/30.580378
2. Wang Y, Chen Q, Zhang B (1999) Image enhancement based on equal area dualistic sub-image histogram equalization method. IEEE Trans Consum Electron 45(1):68–75. https://doi.org/10.1109/30.754419
3. Chen S-D, Ramli AR (2003) Contrast enhancement using recursive mean-separate histogram equalization (RMSHE) for scalable brightness preservation. IEEE Trans Consum Electron 49(4):1301–1309. https://doi.org/10.1109/TCE.2003.1261233
4. Chen S-D, Ramli AR () Minimum mean brightness error bi-histogram equalization (MMBEBE) in contrast enhancement. IEEE Trans Consum Electron 49(4):1310–1319. https://doi.org/10.1109/TCE.2003.1261234
5. Sim KS, Tso CP, Tan YY (2007) Recursive sub-image histogram equalization applied to gray scale images. Pattern Recogn Lett 28(10):1209–1221. ISSN 0167-8655. https://doi.org/10.1016/j.patrec.2007.02.003
6. Ibrahim H, Pik Kong NS (2007) Brightness preserving dynamic histogram equalization for image contrast enhancement. IEEE Trans Consum Electron 53(4):1752–1758. https://doi.org/10.1109/TCE.2007.4429280
7. Gautam, Tiwari (2015) Efficient color image contrast enhancement using range limited bi-histogram equalization with adaptive gamma correction. In: Proceedings of the IEEE international conference on industrial instrumentation and control (ICIC), pp 175–180
8. Xu H, Chen Q, Zuo C (2015) Range limited double-thresholds multi-histogram equalization for image contrast enhancement. Opt Rev 22:246–255. https://doi.org/10.1007/s10043-015-0073-x
9. Rani G, Agarwal M (2020) Contrast enhancement using optimum threshold selection. Int J Softw Innov (IJSI) 8(3):96–118. https://doi.org/10.4018/IJSI.2020070107
10. Thepade SD, Pardhi PM (2022) Contrast enhancement with brightness preservation of low light images using a blending of CLAHE and BPDHE histogram equalization methods. Int J Inf Technol. https://doi.org/10.1007/s41870-022-01054-0
11. Otsu N (1979) A threshold selection method from gray-level histogram. IEEE Trans Syst Man Cybern 9(1):62–66
12. Jain BD (1995) Goal directed evaluation of binarization methods. IEEE Trans Pattern Anal Machine Intell 17:1191–1200
13. Zuiderveld K (1994) Contrast limited adaptive histogram equalization. Graphics Gems IV, 474–485, code: 479–484
14. Wang Z, Bovik AC, Sheikh HR, Simoncelli EP (2004) Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process 13(4):600–612
15. Zhang L, Zhang L, Mou X, Zhang D (2011) FSIM: a feature similarity index for image quality assessment. IEEE Trans Image Process 20(8):2378–2386. https://doi.org/10.1109/TIP.2011.2109730
16. Jan J, Jonas N, George D, Beth B (2005) Pap-smear benchmark data for pattern classification. Nature inspired smart information systems (NiSIS)
17. Medical Images Database. http://www.imageprocessingplace.com/DIP-3E/dip3e_book_images_downloads.htm
18. Lather M, Singh P (2022) Contrast enhancement and noise removal from medical images using a hybrid technique. In: Kountchev R, Mironov R, Nakamatsu K (eds) New approaches for multidimensional signal processing. Smart innovation, systems and technologies, vol 270. Springer, Singapore. https://doi.org/10.1007/978-981-16-8558-3_17
19. Acharya UK, Kumar S (2022) Image sub-division and quadruple clipped adaptive histogram equalization (ISQCAHE) for low exposure image enhancement. Multidim Syst Sign Process. https://doi.org/10.1007/s11045-022-00853-9

Color Image Encryption Using Hybrid Three-Scroll Unified Chaotic Attractor and 6D Hyperchaotic System Subhashish Pal, Arghya Pathak, Ansuman Mahanty, Hrishikesh Mondal, and Mrinal Kanti Mandal

Abstract This paper proposes a hybrid encryption algorithm using the three-scroll unified chaotic attractor (TSUCA) and a 6D hyperchaotic system. With the help of a 32-character key, six highly sensitive initial conditions are generated. Of these six initial conditions, the first three are used in TSUCA, and all six are used in the 6D hyperchaotic system, along with the image information, to generate the chaotic sequences. The proposed algorithm involves pixel confusion and pixel shuffling to achieve a high security level. Two-level encryption using chaotic sequences generated from the TSUCA and 6D hyperchaotic systems is used in the encryption algorithm. To check the efficacy of the proposed algorithm, standard security tests such as key space and key sensitivity, histogram analysis, correlation analysis, NPCR, UACI, entropy, and noise effect have been performed. The proposed cryptosystem shows promising results compared with the other methods mentioned in this paper.

Keywords Encryption · Decryption · Cryptosystem · Chaotic attractors · Hyperchaotic system · Lyapunov exponent

S. Pal · A. Pathak · M. K. Mandal (B)
Department of Physics, National Institute of Technology, Durgapur 713209, India
e-mail: [email protected]
S. Pal · A. Mahanty
Department of Physics, Dr. B. C. Roy Engineering College, Durgapur 713206, India
H. Mondal
Department of Physics, Durgapur Government College, Durgapur 713214, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 686, https://doi.org/10.1007/978-981-99-2100-3_48

1 Introduction

With the rise of Internet technology and modern computing, a strong cryptosystem is always required to deal with security breaches and cyberattacks. Key-based cryptosystems [1] are of two types: symmetric key and asymmetric key encryption. In the first, the same key is used for encryption and decryption, while in the second, different keys are used for encrypting and decrypting information. Depending on the domain in which they are used, both systems have advantages and disadvantages. Data Encryption Standard (DES) [1], Advanced Encryption Standard (AES) [1], International Data Encryption Algorithm (IDEA) [1], and the Blowfish algorithm [2, 3] are examples of symmetric key encryption. The most well-known examples of asymmetric key encryption are the Diffie-Hellman exchange method [2], Rivest Shamir Adleman (RSA) [2], and elliptic curve cryptography (ECC) [4]. In the recent past, chaotic and hyperchaotic systems, in conjunction with other approaches such as chaotic maps [5, 6], bit permutation [7], dynamic S-box [8], discrete wavelet transformations (DWT) [9], genetic codes (DNA) [10–12], sparse matrix [13], and many more, have shown promising performance in image encryption [14]. A nonlinear dynamical system exhibits chaotic behavior under some specific conditions. The main factors that decide its chaotic nature are the dependence on the control parameters and on the initial conditions [15]. A chaotic system having at least two positive Lyapunov exponents is referred to as a hyperchaotic system. A hyperchaotic system produces pseudorandom sequences that are difficult to predict because of their inherent complexity. As a result of their inherent randomness, sensitivity to the initial conditions, and ease of manipulation, chaotic systems have grown in prominence in the field of cryptography. Chaotic encryption was first proposed by Fridrich in 1998 [16]. Confusion and diffusion are the two most important operations in designing a strong cryptosystem, and chaotic equations perform both efficiently.
Researchers have made continuous efforts to improve the quality of cryptosystems, yet it remains challenging to design a robust cryptosystem that resists attacks and from which information is difficult to extract. Here, we propose a hybrid system that comprises a three-scroll unified chaotic attractor (TSUCA) [17] and a 6D hyperchaotic system [18], tweaked for optimum performance as discussed in detail in Sects. 2 and 3. The proposed system is tested with standard image samples and compared with the results of some recently reported works in Sect. 4. The motivation behind this work is to design an encryption algorithm that fulfills the requirements of a robust cryptosystem; moreover, to our knowledge, no such hybrid model based on chaotic and hyperchaotic systems has been reported to date.

2 Chaotic and the Hyperchaotic Systems 2.1 Three-Scroll Unified Chaotic Attractor (TSUCA) Pan et al. [17] proposed a three-scroll unified chaotic system, which is represented by the set of Eq. (1).


Fig. 1 Projection of the attractor: a TSUCA in xyz phase space and 6D hyperchaotic system in b xyz, c yzu, d zuv, e uvw, and f vwx phase space

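$$ \left.\begin{aligned} \dot{x} &= \alpha (y - x) + \delta x z \\ \dot{y} &= \beta x - x z + \xi y \\ \dot{z} &= \gamma z + x y - \varepsilon x^{2} \end{aligned}\right\} \tag{1} $$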

The system shows chaotic behavior for the control parameter values α = 32.48, β = 45.84, γ = 1.18, δ = 0.13, ε = 0.57, and ξ = 14.7. The Lyapunov exponents for this system are L1 = 0.1566, L2 = −0.1864, and L3 = −3.7889. Since L1 is positive, the system is chaotic; its chaotic attractor is shown in Fig. 1a.

2.2 6D Hyperchaotic System

The 6D hyperchaotic system described by Eq. (2) was introduced by Yang et al. [18]. The dynamical equations of their system are

$$ \left.\begin{aligned} \dot{x} &= a(y - x) + z \\ \dot{y} &= -c y - x z + w \\ \dot{z} &= -b + x y \\ \dot{u} &= -y - v \\ \dot{v} &= d y + u \\ \dot{w} &= e x + f y \end{aligned}\right\} \tag{2} $$


For the control parameter values a = 10, b = 100, c = 2.7, d = 2, e = −3, and f = 1, the Lyapunov exponents of this system are L1 = 1.3163, L2 = 0.0733, L3 = 0.0478, L4 = 0.0189, L5 = 0.0000, and L6 = −14.2010. As four of the listed Lyapunov exponents are positive, the system exhibits hyperchaotic behavior. Figure 1b–f shows the 3D attractor in different phase spaces.

3 Proposed Encryption Scheme

For a color image matrix P(m, n, 3), m and n represent the number of rows and columns, respectively. Here, P(i, j, k) is the intensity value at position (i, j) in color channel k, where k = 1, 2, and 3 for the R, G, and B channels, respectively. The steps of the proposed encryption algorithm are as follows:

Step 1: A double-precision array $K_{1,32}$ is generated from the 32 characters of the secret key.

Step 2: Six constants $C_1$, $C_2$, $C_3$, $C_4$, $C_5$, and $C_6$ are generated from $K_{1,32}$ using Eq. (3).

$$ \left.\begin{aligned} C_1 &= C_1 + \sum_{i=1}^{8} \frac{K_{1,i} + K_{1,33-i}}{2} \\ C_2 &= C_2 - \sum_{i=1}^{8} \frac{K_{1,i+8} + K_{1,33-i-8}}{2} \\ C_3 &= C_3 + \sum_{i=1}^{8} \frac{K_{1,i} + K_{1,33-i}}{2} \\ C_4 &= C_4 - \sum_{i=1}^{8} \frac{K_{1,i+8} + K_{1,33-i-8}}{2} \\ C_5 &= C_5 + \sum_{i=1}^{8} \frac{K_{1,i} + K_{1,33-i}}{2} \\ C_6 &= C_6 - \sum_{i=1}^{8} \frac{K_{1,i+8} + K_{1,33-i-8}}{2} \end{aligned}\right\} \tag{3} $$

Step 3: A chaotic sequence of m × n × 3 values is generated from Eq. (1) with the initial conditions $x_0 = C_1/10^4$, $y_0 = C_2/10^4$, and $z_0 = C_3/10^4$. The corresponding iterative solutions $(x_i, y_i, z_i)$ are reshaped into a three-dimensional matrix A(m, n, 3). The values of the chaotic sequence are then mapped to the range 0 to 255 by a modulo operation, and a new chaotic image matrix PC(m, n, 3) is formed by a bitwise XOR with the original pixels of the image.

Step 4: Each element of PC is divided by 255 to generate a new matrix I(p, q, 3) for each color channel, where p = mn/32 and q = 32, so that the elements of I(p, q, 3) lie between 0 and 1.
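Steps 1 to 3 can be sketched as follows. This is a hedged illustration, not the authors' implementation. Assumptions: the key characters are converted with `ord`, the constants start from zero, the coefficient of the x² term in the three-scroll system is 0.57 (its symbol is lost in this copy of the text), the system is integrated with a fixed-step fourth-order Runge-Kutta scheme, and the state values are quantized as floor(|v| · 10⁶) mod 256. The step size, burn-in length, quantization rule, and all function names are hypothetical.

```python
ALPHA, BETA, GAMMA, DELTA, EPS, XI = 32.48, 45.84, 1.18, 0.13, 0.57, 14.7

def key_constants(key):
    """Six constants from a 32-character key, following the printed pattern."""
    if len(key) != 32:
        raise ValueError("key must have exactly 32 characters")
    k = [float(ord(ch)) for ch in key]                          # 0-based K
    outer = sum((k[j] + k[31 - j]) / 2 for j in range(8))       # K(i), K(33-i)
    inner = sum((k[j + 8] + k[23 - j]) / 2 for j in range(8))   # K(i+8), K(25-i)
    return [outer, -inner, outer, -inner, outer, -inner]

def tsuca(state):
    """Right-hand side of the three-scroll system with the stated parameters."""
    x, y, z = state
    return (ALPHA * (y - x) + DELTA * x * z,
            BETA * x - x * z + XI * y,
            GAMMA * z + x * y - EPS * x * x)

def rk4_step(f, s, dt):
    """One classical Runge-Kutta step for a tuple-valued state."""
    k1 = f(s)
    k2 = f(tuple(a + 0.5 * dt * b for a, b in zip(s, k1)))
    k3 = f(tuple(a + 0.5 * dt * b for a, b in zip(s, k2)))
    k4 = f(tuple(a + dt * b for a, b in zip(s, k3)))
    return tuple(a + dt / 6.0 * (p + 2 * q + 2 * r + w)
                 for a, p, q, r, w in zip(s, k1, k2, k3, k4))

def keystream(key, n, dt=0.0005, burn_in=2000):
    """n key bytes in 0..255 derived from the chaotic trajectory."""
    c = key_constants(key)
    s = (c[0] / 1e4, c[1] / 1e4, c[2] / 1e4)
    for _ in range(burn_in):          # discard the transient (assumed length)
        s = rk4_step(tsuca, s, dt)
    out = []
    while len(out) < n:
        s = rk4_step(tsuca, s, dt)
        out.extend(int(abs(v) * 1e6) % 256 for v in s)   # assumed quantization
    return out[:n]

def xor_image(pixels, key):
    """Bitwise XOR of a flat pixel list with the key-dependent stream."""
    return [p ^ k for p, k in zip(pixels, keystream(key, len(pixels)))]

demo_key = "0123456789abcdefghijklmnopqrstuv"   # hypothetical 32-character key
plain = list(range(0, 240, 20))
cipher = xor_image(plain, demo_key)
recovered = xor_image(cipher, demo_key)
```

Because XOR is its own inverse, applying the same keystream twice returns the original pixels, which is why the receiver only needs the key to regenerate the sequence.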

Step 5: Using Eq. (4), a matrix F(p, 1, 3) is generated by taking the average of $D_i^{c1}$, $D_i^{c2}$, $D_i^{c3}$, $D_i^{c4}$, $D_i^{c5}$, and $D_i^{c6}$ for each color channel, where c stands for the color channels R, G, and B.

$$ \left.\begin{aligned} D_i^{c1} &= \operatorname{mod}\!\left( D_i^{c1} + \sum_{j=1}^{8} \frac{I(i,j,1) - I(i,33-j,1)}{2},\; D_i^{c6} \right) \\ D_i^{c2} &= \operatorname{mod}\!\left( D_i^{c2} + \sum_{j=1}^{8} \frac{I(i,j+8,1) - I(i,33-j-8,1)}{2},\; D_i^{c5} \right) \\ D_i^{c3} &= \operatorname{mod}\!\left( D_i^{c3} + \sum_{j=1}^{8} \frac{I(i,j,1) - I(i,33-j,1)}{2},\; D_i^{c4} \right) \\ D_i^{c4} &= \operatorname{mod}\!\left( D_i^{c4} + \sum_{j=1}^{8} \frac{I(i,j+8,1) - I(i,33-j-8,1)}{2},\; D_i^{c3} \right) \\ D_i^{c5} &= \operatorname{mod}\!\left( D_i^{c5} + \sum_{j=1}^{8} \frac{I(i,j,1) - I(i,33-j,1)}{2},\; D_i^{c2} \right) \\ D_i^{c6} &= \operatorname{mod}\!\left( D_i^{c6} + \sum_{j=1}^{8} \frac{I(i,j+8,1) - I(i,33-j-8,1)}{2},\; D_i^{c1} \right) \end{aligned}\right\} \tag{4} $$
Step 6: A two-dimensional array L(p, q) is then built from the chaotic sequence generated by Eq. (2) with the initial conditions $x_0 = C_1/10^4$, $y_0 = C_2/10^4$, $z_0 = C_3/10^4$, $u_0 = C_4/10^4$, $v_0 = C_5/10^4$, and $w_0 = C_6/10^4$. The value of each element of L is then mapped to the range 0–255 by a modulo operation.

Step 7: The arrays $D_i^{R}$, $D_i^{G}$, and $D_i^{B}$ are converted element-wise to the range 0–9.

Step 8: The elements of the jth row of L(p, q) are rotated clockwise N times to obtain a new array $L_r(p, q)$, where N is the value of the jth element of $D_i^{R}$. The same approach is applied with $D_i^{G}$ and $D_i^{B}$ to obtain $L_g(p, q)$ and $L_b(p, q)$. The newly generated $L_r(p, q)$, $L_g(p, q)$, and $L_b(p, q)$ are then combined to form a three-dimensional array $L_{rgb}(p, q, 3)$, which is reshaped to $L_{rgb}(m, n, 3)$.

Step 9: The fractional part of F(1, 1, i) is then taken and assigned as $x_1^1 = F(1, 1, 1)$, $x_1^2 = F(1, 1, 2)$, $x_1^3 = F(1, 1, 3)$, and $x_1^4 = (x_1^1 + x_1^2 + x_1^3)/3$. Using the initial conditions $x_1^1$ and $x_1^3$ in Eq. (5), the chaotic sequences $x_i^1$ and $x_i^3$ are generated. For the initial conditions $x_1^2$ and $x_1^4$, the chaotic sequences $x_i^2$ and $x_i^4$ are generated using Eq. (6).

$$ x_{i+1}^{a} = 3.999\, x_i^{a} \left(1 - x_i^{a}\right) \tag{5} $$

$$ x_{i+1}^{b} = 2.565\, x_i^{b} \left(1 - x_i^{b}\right) \left(1 + x_i^{b}\right) \tag{6} $$

Here, the index i varies from 1 to mn, a takes the values 1 and 3, and b takes the values 2 and 4. The values of $x_i^1$ and $x_i^2$ are mapped to the range 1–256, and the values of $x_i^3$ and $x_i^4$ are mapped to the range 1–3.

Step 10: For each channel of the chaotic image, pixel shuffling is carried out by interchanging the pixels at positions $I(x_i^1, x_i^2, x_i^3)$ and $I(x_i^2, x_i^1, x_i^4)$, where i varies from 1 to mn. The resulting I is reshaped to I(m, n, 3).

Step 11: Finally, a bitwise XOR is carried out between $L_{rgb}(m, n, 3)$ and I(m, n, 3) to obtain the encrypted image EI(m, n, 3).

At the receiver's end, the original image is retrieved from the encrypted image by using the 32-character key and the decryption algorithm.
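The shuffling stage can be sketched as follows. This is an illustrative reading of Steps 9 and 10, not the authors' code. Assumptions: the first map is the logistic map with parameter 3.999, the second is a cubic map of the form r·x·(1 − x)·(1 + x) with r = 2.565, values in (0, 1) are quantized to indices with floor(x · size) + 1, and a 4 × 4 × 3 toy image stored as a dictionary replaces the 256 × 256 × 3 image. The seeds and all names are hypothetical.

```python
def logistic_map(x0, n, r=3.999):
    """Iterate x -> r x (1 - x) and return n successive values."""
    xs, x = [], x0
    for _ in range(n):
        x = r * x * (1.0 - x)
        xs.append(x)
    return xs

def cubic_map(x0, n, r=2.565):
    """Iterate x -> r x (1 - x)(1 + x) and return n successive values."""
    xs, x = [], x0
    for _ in range(n):
        x = r * x * (1.0 - x) * (1.0 + x)
        xs.append(x)
    return xs

def to_index(xs, size):
    """Quantize values in (0, 1) to 1-based indices 1..size (assumed rule)."""
    return [min(size, int(x * size) + 1) for x in xs]

def swap_pairs(seeds, n_swaps, size):
    """Index pairs ((x1, x2, x3), (x2, x1, x4)) to be interchanged."""
    x1 = to_index(logistic_map(seeds[0], n_swaps), size)
    x2 = to_index(cubic_map(seeds[1], n_swaps), size)
    x3 = to_index(logistic_map(seeds[2], n_swaps), 3)
    x4 = to_index(cubic_map(seeds[3], n_swaps), 3)
    return [((a, b, c), (b, a, d)) for a, b, c, d in zip(x1, x2, x3, x4)]

def shuffle(img, pairs, inverse=False):
    """Apply the swaps in order, or in reverse order to undo them."""
    out = dict(img)
    for p, q in (reversed(pairs) if inverse else pairs):
        out[p], out[q] = out[q], out[p]
    return out

size = 4
seeds = (0.21, 0.47, 0.83, (0.21 + 0.47 + 0.83) / 3)   # fourth seed is the mean
image = {(r, c, k): 16 * r + 4 * c + k
         for r in range(1, size + 1)
         for c in range(1, size + 1)
         for k in range(1, 4)}
pairs = swap_pairs(seeds, 16, size)
shuffled = shuffle(image, pairs)
restored = shuffle(shuffled, pairs, inverse=True)
```

Since each swap is its own inverse, replaying the same index pairs in reverse order restores the image, so decryption only needs to regenerate the same chaotic sequences.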

4 Result and Security Analysis The proposed encryption scheme has been tested with different standard image samples. The following analysis was carried out to test the efficacy of our proposed mechanism.

4.1 Key Space and Key Sensitivity

We have proposed a hybrid system using a chaotic attractor and a 6D hyperchaotic system. The initial conditions govern the system, and any change to them results in a new chaotic sequence. The key space is $2^{256} \approx 1.158 \times 10^{77}$, which makes brute-force attacks impractical. A change of a single character in the input key changes the chaotic sequence. Figure 2 shows the result of changing a single character of the key at the decryption end and confirms the key sensitivity of the proposed algorithm.
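The key-space figure follows from simple arithmetic, sketched below under the assumption that each of the 32 key characters contributes a full 8 bits (that is, all 256 byte values are allowed).

```python
KEY_CHARACTERS = 32
BITS_PER_CHARACTER = 8          # assumption: every byte value is a valid character

key_bits = KEY_CHARACTERS * BITS_PER_CHARACTER
key_space = 2 ** key_bits       # exact integer

print(key_bits)                 # 256
print(f"{key_space:.3e}")       # 1.158e+77
```

If the key were restricted to printable characters, the effective key space would be smaller than this upper bound.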

Fig. 2 a Lena original, b Lena encrypted, c Lena decrypted with same key, and d Lena decrypted with slightly different key


Table 1 Test results of NPCR and UACI

| Images | NPCR R | NPCR G | NPCR B | UACI R | UACI G | UACI B |
|---|---|---|---|---|---|---|
| Lena | 99.6401 | 99.6331 | 99.6368 | 33.9909 | 34.0687 | 34.1176 |
| Baboon | 99.6248 | 99.6309 | 99.6297 | 33.9913 | 33.9304 | 33.9813 |
| Pepper | 99.6324 | 99.6309 | 99.6290 | 34.1280 | 33.9997 | 33.9641 |
| Airplane | 99.6429 | 99.6689 | 99.6250 | 33.9588 | 33.9578 | 34.0691 |

4.2 Differential Attack Analysis

Attackers can try to establish a correlation between the original image and the encrypted image by making small changes to the original and observing the corresponding change in the encrypted image. The number of pixels change rate (NPCR) and the unified average changing intensity (UACI) are commonly used to assess the resistance of an encryption algorithm against such differential attacks [19]. The NPCR and UACI are determined using the following relations:

$$ \mathrm{NPCR} = \frac{1}{L} \sum_{i,j} D(i,j) \times 100\% \tag{7} $$

$$ \mathrm{UACI} = \frac{1}{L} \sum_{i,j} \frac{\left| E_1(i,j) - E_2(i,j) \right|}{255} \times 100\% \tag{8} $$

Here, L is the total number of pixels in the image. $E_1$ and $E_2$ are the two encrypted images corresponding to the original image and to the original image with one pixel value changed, respectively. D(i, j) is defined as follows: if $E_1(i,j) \neq E_2(i,j)$, then D(i, j) = 1; if $E_1(i,j) = E_2(i,j)$, then D(i, j) = 0. NPCR and UACI tests were performed to assess the strength of the proposed cryptosystem against differential attacks. Test results are shown in Table 1, and a comparison with recently reported work is shown in Table 2. The results show that the NPCR and UACI values of the proposed cryptosystem are more than 99.6% and close to 33.3%, respectively; hence, the proposed algorithm is good enough to withstand differential attacks [19].
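The two measures can be sketched for a single channel as follows. This is a minimal illustration with hypothetical names, assuming 8-bit images stored as lists of rows and taking D(i, j) = 1 where the two cipher images differ.

```python
def npcr_uaci(e1, e2):
    """Return (NPCR, UACI) in percent for two equally sized 8-bit images."""
    flat1 = [p for row in e1 for p in row]
    flat2 = [p for row in e2 for p in row]
    total = len(flat1)
    changed = sum(1 for a, b in zip(flat1, flat2) if a != b)        # D(i, j)
    intensity = sum(abs(a - b) / 255 for a, b in zip(flat1, flat2))
    return 100.0 * changed / total, 100.0 * intensity / total

# Toy 2 x 2 cipher images (hypothetical values): two of four pixels differ.
cipher_a = [[0, 255], [10, 10]]
cipher_b = [[255, 255], [10, 20]]
npcr, uaci = npcr_uaci(cipher_a, cipher_b)
```

For a colour image the same function would be applied to each of the R, G, and B channels separately, as in the tables.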

4.3 Information Entropy Analysis

Shannon [22] introduced the concept of entropy in communication systems. It measures the randomness of the encrypted image: the greater the entropy, the greater the unpredictability. Entropy can be determined using the following equation:

$$ En(m) = \sum_{i=0}^{2^{N}-1} P(m_i) \log_2 \frac{1}{P(m_i)} \tag{9} $$

Table 2 Comparison of NPCR and UACI value of Lena with other methods

| Method | NPCR R | NPCR G | NPCR B | UACI R | UACI G | UACI B |
|---|---|---|---|---|---|---|
| Reference [7] | 99.6124 | 99.6134 | 99.6192 | 33.4438 | 33.5232 | 33.5010 |
| Reference [12] | 99.6300 | 99.6000 | 99.6000 | 33.6000 | 33.3000 | 33.4399 |
| Reference [20] | 99.6531 | 99.6522 | 99.6518 | 33.4572 | 33.4715 | 33.4384 |
| Reference [21] | 99.6137 | 99.6053 | 99.6079 | 33.4655 | 33.4781 | 33.4746 |
| Proposed | 99.6401 | 99.6331 | 99.6368 | 33.9909 | 34.0687 | 34.1176 |

Here, $P(m_i)$ is the probability of $m_i$. If En(m) = N, the output of a source that emits $2^N$ symbols is completely random. Because each symbol in our system is represented by an 8-bit number, the optimum value of En(m) is 8. The entropy of each channel of the original and encrypted images is shown in Table 3, and a comparison with other methods is given in Table 4.

Table 3 Test results of entropy

| Images | Original R | Original G | Original B | Encrypted R | Encrypted G | Encrypted B |
|---|---|---|---|---|---|---|
| Lena | 7.2353 | 7.5683 | 6.9176 | 7.9982 | 7.9981 | 7.9990 |
| Baboon | 7.7141 | 7.5069 | 7.7519 | 7.9986 | 7.9982 | 7.9982 |
| Pepper | 7.3669 | 7.6775 | 7.1925 | 7.9987 | 7.9983 | 7.9981 |
| Airplane | 6.7232 | 6.8167 | 6.1987 | 7.9981 | 7.9983 | 7.9986 |

Table 4 Comparison of information entropy of encrypted Lena with other methods

| Method | R | G | B |
|---|---|---|---|
| Reference [23] | 7.998 | 7.9979 | 7.9978 |
| Reference [24] | 7.9974 | 7.997 | 7.9971 |
| Reference [25] | 7.9892 | 7.9898 | 7.9899 |
| Reference [26] | 7.9971 | 7.9974 | 7.9973 |
| Proposed | 7.9982 | 7.9981 | 7.9990 |
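As a quick check of the stated optimum of 8, consider the idealized case (an assumption for illustration) in which all $2^8 = 256$ grey levels of an encrypted channel are equally likely, so that $P(m_i) = 1/256$, and take the logarithm in the entropy formula to base 2:

```latex
En(m) = \sum_{i=0}^{255} \frac{1}{256}\,\log_2 256
      = 256 \cdot \frac{1}{256} \cdot 8
      = 8
```

Any deviation from a uniform histogram lowers this value, which is why encrypted-image entropies slightly below 8 are read as close to ideal.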


4.4 Histogram and Chi-Square Test

A histogram displays the frequency of occurrence of each value. Pixels in an image may take any value between 0 and 255, and an image histogram shows how often each pixel value occurs. To prevent the frequency of a specific pixel value from being predictable, the histogram of the encrypted image must be uniform and independent of the characteristics of the original image. For reference, Fig. 3 shows the histogram plots for the encrypted and decrypted images of Lena for the different color channels. The plain image histogram has spikes, but the encrypted image histogram is relatively homogeneous. This validates the efficacy of the proposed method. The encrypted image histogram provides evidence of the success of the encryption method in statistical analysis, but it is inadequate to prove the accuracy of the pixel values in a decrypted image [27]. To quantify the uniformity of the histogram, we have used the chi-square test, Eq. (10).

$$ \chi^2 = \sum_{i=0}^{o(\max)} \frac{(o_i - e_i)^2}{e_i} \tag{10} $$

Here, $e_i = mn / o(\max)$, $o(\max)$ is the maximum pixel value, $o_i$ is the observed frequency of pixel value i in the histogram, and $e_i$ is the expected frequency, which is the same for every i. The theoretically accepted value of the chi-square test for the histogram of the encrypted image is 293 [27]. A lower chi-square value indicates a better encryption algorithm. In Table 5, the estimated chi-square values for the encrypted images are lower than the accepted threshold value. This means that the proposed method passes this test.

Fig. 3 Histogram plot of Lena original (in top row) and corresponding Lena decrypted (in bottom row) for different color channels

Table 5 Chi-square test results for encrypted images

| Images | R | G | B | Remark |
|---|---|---|---|---|
| Lena | 250.2266 | 260.5703 | 235.2344 | Pass |
| Baboon | 215.3438 | 253.8906 | 250.3906 | Pass |
| Pepper | 262.8906 | 248.4141 | 259.6875 | Pass |
| Airplane | 261.9531 | 246.8906 | 221.1406 | Pass |
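The chi-square statistic of a histogram can be sketched as follows. This is a minimal illustration with hypothetical names; it assumes 256 histogram bins with an expected count of (number of pixels) / 256 per bin, which is the usual convention and may differ slightly from the normalization printed above.

```python
def chi_square(pixels, levels=256):
    """Chi-square statistic of the grey-level histogram against uniformity."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    expected = len(pixels) / levels          # same expected count for every bin
    return sum((o - expected) ** 2 / expected for o in hist)

# Perfectly uniform toy channel: every level occurs exactly four times.
uniform = list(range(256)) * 4
chi_uniform = chi_square(uniform)

# Move one pixel from level 0 to level 1: counts become 3 and 5.
perturbed = list(uniform)
perturbed[0] = 1
chi_perturbed = chi_square(perturbed)

THRESHOLD = 293                              # value quoted in the text
passes = chi_perturbed < THRESHOLD
```

A perfectly flat histogram gives zero, and every departure from flatness adds to the statistic, so smaller values indicate a more uniform cipher image.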

4.5 Correlation Analysis

Every pixel of a plain image is highly correlated with its neighboring pixels, whether in the horizontal, vertical, or diagonal direction. These correlations need to be eliminated during encryption so that the resulting encrypted image has low correlation and resembles noise. To generate the correlation scatter plots for all three channels, we used more than a thousand pairs of neighboring pixels from both the plain image and the encrypted image. As an example, Fig. 4 shows the correlation scatter plot in the horizontal direction for Lena. Similar observations hold in the vertical and diagonal directions. It is clear from Fig. 4 that the correlation has decreased remarkably in the encrypted images. We have also calculated the correlation coefficient using the relationships given below.

$$ r_{xy} = \frac{\operatorname{cov}(x, y)}{\sqrt{D_x D_y}} \tag{11} $$

where $\operatorname{cov}(x, y) = \frac{1}{N} \sum_{i=1}^{N} (x_i - E(x))(y_i - E(y))$, $E(x) = \frac{1}{N} \sum_{i=1}^{N} x_i$, and $D_x = \frac{1}{N} \sum_{i=1}^{N} (x_i - E(x))^2$. Here, N stands for the total number of pixel pairs taken from the test image, and x and y represent the values of the neighboring pixels. The calculated correlation coefficients for the sample images are listed in Table 6, and a comparison with other methods for the Lena image is listed in Table 7. The correlation coefficients for the proposed algorithm are small and comparable to or smaller than those of the other methods.
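The correlation coefficient of horizontally adjacent pixels can be sketched as follows. This is a minimal illustration with hypothetical names, using all adjacent pairs of a small single-channel image rather than a random sample of pairs.

```python
import math

def correlation(x, y):
    """Correlation coefficient cov(x, y) / sqrt(D_x D_y)."""
    n = len(x)
    ex, ey = sum(x) / n, sum(y) / n
    cov = sum((a - ex) * (b - ey) for a, b in zip(x, y)) / n
    dx = sum((a - ex) ** 2 for a in x) / n
    dy = sum((b - ey) ** 2 for b in y) / n
    return cov / math.sqrt(dx * dy)

def horizontal_pairs(img):
    """Split all horizontally adjacent pixel pairs into two lists."""
    pairs = [(row[j], row[j + 1]) for row in img for j in range(len(row) - 1)]
    return [p[0] for p in pairs], [p[1] for p in pairs]

# Smooth toy image (hypothetical): each pixel is its left neighbour plus one.
gradient = [[0, 1, 2, 3], [4, 5, 6, 7]]
xs, ys = horizontal_pairs(gradient)
r_gradient = correlation(xs, ys)
```

A smooth image such as this gradient gives a coefficient of 1, whereas a well-encrypted image should give a value near 0, as in the encrypted columns of the tables. Vertical and diagonal directions only change how the pairs are selected.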

4.6 Effect of Noise During transmission through the communication channel, the quality of the decrypted image may be degraded by the influence of noise on the transmitted encrypted image.


Fig. 4 Correlation between two horizontally adjacent pixels for Lena original (in the top row) and Lena encrypted (in the bottom row) for different color channels

Table 6 Correlation coefficients for different channels in horizontal direction

| Images | Original R | Original G | Original B | Encrypted R | Encrypted G | Encrypted B |
|---|---|---|---|---|---|---|
| Lena | 0.9545 | 0.9437 | 0.9245 | −0.0028 | −0.0025 | 0.0045 |
| Baboon | 0.8578 | 0.7340 | 0.8106 | −0.0126 | −0.0044 | 0.0117 |
| Pepper | 0.9730 | 0.9846 | 0.9709 | 0.0127 | 0.0263 | −0.0459 |
| Airplane | 0.9297 | 0.9220 | 0.9384 | 0.0026 | 0.0048 | 0.0047 |

Table 7 Comparison of correlation coefficients of encrypted Lena with other methods

| Method | Direction | R | G | B |
|---|---|---|---|---|
| Reference [7] | Horizontal | −0.0127 | −0.0075 | −0.0007 |
| Reference [7] | Vertical | 0.0067 | −0.0068 | 0.0042 |
| Reference [7] | Diagonal | 0.006 | −0.0078 | 0.0026 |
| Reference [20] | Horizontal | 0.0092 | 0.0002 | 0.0076 |
| Reference [20] | Vertical | 0.0203 | −0.0025 | 0.0006 |
| Reference [20] | Diagonal | −0.0073 | −0.0131 | 0.0111 |
| Reference [21] | Horizontal | 0.0137 | −0.0246 | −0.0137 |
| Reference [21] | Vertical | −0.0237 | −0.017 | 0.0023 |
| Reference [21] | Diagonal | 0.0109 | −0.0133 | −0.0013 |
| Proposed | Horizontal | −0.0028 | −0.0025 | 0.0045 |
| Proposed | Vertical | 0.0067 | 0.0022 | 0.0030 |
| Proposed | Diagonal | 0.0021 | 0.0021 | −0.0073 |


Fig. 5 Lena decrypted for SnP noise with density a d = 0.001, b d = 0.005, c d = 0.01, and GN with mean zero and variance are d v = 0.0001, e v = 0.0025, f v = 0.01

To test the strength of the proposed algorithm, we have used two types of noise. Salt and pepper noise (SnP) with varying densities (d) and Gaussian noise (GN) with varying variances (v) have been superimposed on the encrypted image. Figure 5 shows that the decrypted image is recovered with only small alterations at low noise levels, while the degradation becomes noticeable as the noise level increases. These findings demonstrate the robustness of the proposed algorithm against interference from the communication channel.

5 Conclusion

In this article, we have proposed an efficient image encryption algorithm based on the TSUCA and 6D hyperchaotic hybrid systems. Image pixels are encrypted with chaotic key sequences that depend on the neighboring pixels. Therefore, the chaotic sequence is very sensitive to a slight variation in the pixel intensity of the input image or to a change of a character in the input key. The cryptanalysis results confirm the efficiency of the proposed encryption algorithm. The proposed method has been checked with a variety of security tests, and all of them have yielded satisfactory results. To demonstrate the efficacy of the algorithm, the obtained results are also compared with recently reported work. On this basis, the proposed cryptosystem compares favorably with the algorithms considered and has potential for use in secure image transmission.


References 1. Schneier B (2015) Applied Cryptography—protocols, algorithms, and source code, 20th Anniversary edn. C. John Wiley & Sons, Inc., New York 2. Buchmann J (2004) Introduction to cryptography, 335. Springer, New York 3. Ferguson N, Bruce S (2003) Practical cryptography. 141. New York: Wiley 4. Koblitz N, Menezes A, Vanstone S (2000) The state of elliptic curve cryptography. Des Codes Crypt 19(2):173–193 5. Xu J, Li P, Yang F, Yan H (2019) High intensity image encryption scheme based on quantum logistic chaotic map and complex hyperchaotic system. IEEE Access 7:167904–167918 6. Karmakar J, Debashis N, Mandal MK (2019) Hyper-chaotic Image Encryption using ACM and GBS. International conference on advanced computational and communication paradigms (ICACCP), IEEE, pp 1–6 7. Wang X, Zhang HL (2015) A color image encryption with heterogeneous bit-permutation and correlated chaos. Opt Commun 342:51–60 8. Liu Y, Xiaojun T, Jing M (2016) Image encryption algorithm based on hyper-chaotic system and dynamic S-box. Multim Tools Appl 75(13):7739–7759 9. Wu X, Wang D, Kurths J, Kan H (2016) A novel lossless color image encryption scheme using 2D DWT and 6D hyperchaotic system. Inf Sci 349:137–153 10. Wang X, Maochang Z (2021) An image encryption algorithm based on hyperchaotic system and DNA coding. Opt Laser Technol 143:107316 11. Kar M, Kumar A, Nandi D, Mandal MK (2020) Image encryption using DNA coding and hyperchaotic system. IETE Tech Rev 37(1):12–23 12. Wang XY, Zhang HL, Bao XM (2016) Color image encryption scheme using CML and DNA sequence operations. Biosystems 144:18–26 13. Karmakar J, Debashis N, Mandal MK (2020) A novel hyper-chaotic image encryption with sparse-representation based compression. Multim Tools Appl 79(37):28277–28300 14. Kaur M, Kumar V (2020) A comprehensive review on image encryption techniques. Arch Comput Methods Eng 27(1):15–43 15. Kocarev L, Szczepanski J, Amigo JM, Tomovski I (2006) Discrete chaos-I: theory. 
IEEE Trans Circuits Syst I Regul Pap 53(6):1300–1309 16. Fridrich J (1998) Symmetric ciphers based on two-dimensional chaotic maps. Int J Bifurcat Chaos 8(06):1259–1284 17. Pan L, Zhou W, Fang J, Li D (2010) A new three-scroll unified chaotic system coined. Int J Nonlinear Sci 10(4):462–474 18. Yang L, Yang Q, Chen G (2020) Hidden attractors, singularly degenerate heteroclinic orbits, multistability and physical realization of a new 6D hyperchaotic system. Commun Nonlinear Sci Numer Simul 90:105362 19. Wu Y, Joseph P, Agaian S (2011) NPCR and UACI randomness tests for image encryption. J Sel Areas Telecommun 1(2):31–38 20. Xuejing K, Guo Z (2020) A new color image encryption scheme based on DNA encoding and spatiotemporal chaotic system. Signal Process: Image Commun 80:115670 21. Wu XJ, Wang KS, Wang XY, Kan HB, Kurths J (2018) Color image DNA encryption using NCA map-based CML and one-time keys. Signal Process 148:272–287 22. Shannon CE (1948) A mathematical theory of communication. Bell Syst Tech J 27:623–656 23. Zhang YQ, He Y, Li P, Wang XY (2020) A new color image encryption scheme based on 2DNLCML system and genetic operations. Opt Lasers Eng 128:106040 24. Wu XG, Wang KS, Wang XY (2018) Color image DNA encryption using NCA map-based CML and one-time keys. Signal Process 148:272–287 25. Rehman A, Liao X, Ashraf R, Ullah S, Wang H (2018) A color image encryption technique using exclusive-OR with DNA complementary rules based on chaos theory and SHA-2. Optik 159:348–367

26. Wu X, Wang K, Wang X, Kan H (2017) Lossless chaotic color image cryptosystem based on DNA encryption and entropy. Nonlinear Dyn 90(2):855–875 27. Ma S, Zhang Y, Yang Z, Hu J, Lei X (2019) A new plaintext-related image encryption scheme based on chaotic sequence. IEEE Access 7:30344–30360

IoT Adoption in Agriculture: Awareness and Challenges Faced by Rural Farmers in Delta Districts of Tamil Nadu

S. Arjune and V. Srinivasa Kumar

Abstract The implementation of the Internet of Things (IoT) in farming, as in other sectors, promises previously unattainable efficiency, reductions in energy use and expense, digitization, and improved information processes. In agriculture, however, these benefits serve not as upgrades but as remedies for an entire market confronted with a variety of serious difficulties. The implementation of IoT in farming is analogous to a second wave of the agricultural revolution. Farmers have reaped two advantages from IoT: they can now complete the same number of tasks in far less time, and they can boost agricultural production thanks to the precise information collected through IoT. This study is confined to the delta districts of Tamil Nadu, comprising Pudukkottai, Perambalur, Ariyalur, Cuddalore, Trichy, Nagapattinam, Thiruvarur, and Thanjavur. Only questions pertaining to farmers' profile characteristics, the problems they face, and their awareness of IoT adoption are examined. The findings reveal the influence of farmers' awareness level on IoT adoption, along with the influence of the challenges they face. IoT technology is the need of the hour and will help farmers progress. Improving the agriculture sector through technology adoption helps not only the farmers but also the whole nation to progress.

Keywords Internet of Things · Awareness · Challenges · IoT adoption

S. Arjune (B) · V. Srinivasa Kumar
School of Management, SASTRA Deemed to be University, Thanjavur, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 686, https://doi.org/10.1007/978-981-99-2100-3_49

1 Introduction

Many crops are grown in India, with wheat and rice being the most important dietary staples. In addition to grains, Indian farmers grow potato, sugarcane, oilseeds, cotton, tea, coffee, rubber, and jute. India is also a
fishing powerhouse. With an annual catch of over 3 million tonnes, India ranks among the top ten fishing nations in the world. Despite the vast scale of the agricultural industry, agricultural output per acre in India is often low compared with worldwide norms. The farming industry includes firms that are primarily concerned with washing, manufacturing, preserving, or packing various forms of agricultural products. Fibres, foods, and raw materials are among the most common farm commodities. Breeding fish and livestock, cultivating crops, harvesting wood, and dairying are all part of it. It also includes companies that support agricultural operations by manufacturing seeds, chemicals, and farm machinery. As a consequence, agriculture is a vital element of the Indian economy and employs the vast majority of the country's people.

With agricultural areas dwindling and scarce natural resources expected to be depleted by 2050, increasing farm production has become crucial. The scarcity of natural resources such as groundwater and farmland, along with declining yield trends in key staple commodities, has exacerbated the situation. Furthermore, agricultural employment has fallen in the majority of countries. As agricultural manpower has declined, the use of Internet networks and services in farming has increased in order to reduce the demand for physical labour. IoT systems aim to help producers close the shortfall by ensuring greater returns, revenue, and environmental preservation. Precision agriculture refers to the use of the Internet of Things (IoT) to ensure optimal resource use, so as to produce high agricultural output while lowering operating expenses. IoT technologies in agriculture include special tools, Internet networks, software, and information technology services.

1.1 The Application of IoT in Farming

UAVs, sometimes known as drones, have grown in popularity in the sector. Drones are commonly used in precision farming as IoT-based surveillance systems and as instruments for farm management, on-demand water management, and pesticide control. The Xaircraft P30, a Red Dot Award winner, is an unmanned crop protection drone. It employs innovative algorithms for superior flight performance and precise chemical dosing, allowing it to save up to 30% of pesticide and 90% of groundwater. Irrigation control based on smart sprinklers allows producers to cut water usage dramatically, making farming more affordable. Linked coolers and heaters in transport and storage infrastructure improve product preservation and help reduce waste. Intelligent LED bulbs adjust to changing conditions and ensure that every region of a greenhouse or floor space receives the appropriate quantity of light.

The key is the creative approach to agricultural landscapes and the widespread application of IoT in farming practices. Glass covers the bulk of cropland in the Netherlands, and vegetables are cultivated in self-contained greenhouses. To keep a stable climate in greenhouses, the Dutch use linked technology: detectors to monitor CO2 levels, moisture, and crop growth, along with smart lights and database management. As a consequence, they obtain ten times the average yield of an open field. Companies such as Infarm
and Jones Nutrition propose to construct independent greenhouses in urban settings such as a market, a residential garden, or even a pharmaceutical plant. Smart vertical gardens are soilless owing to aquatic technology: plants grow in nutrient liquids regulated and managed by detector technology and do not require typical irrigation, making them capital efficient. Detectors for soil, humidity, moisture, sunlight, ambient temperature, carbon dioxide, and solar energy, along with a variety of other IoT technologies, are used in agriculture. Sensors mounted across the fields, in IoT-based monitoring devices, on precision agriculture vehicles, and in weather stations gather information continually, bringing insight and control to farm operations. The integration of data from various sensors enables farmers to generate plant models and anticipate how plants will grow in certain circumstances, incorporate precision farming approaches, develop harvesting plans, and so on.
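As a rough illustration of how field sensor readings might feed an irrigation decision of the kind described above, the following minimal Python sketch averages soil-moisture readings and compares them with a threshold. The function name, the 30% threshold, and the validity range are hypothetical choices made for illustration only; they are not taken from the chapter or from any particular product.

```python
from statistics import fmean


def should_irrigate(moisture_readings, threshold_pct=30.0, min_valid=1):
    """Return True when the average valid soil moisture is below the threshold.

    moisture_readings: volumetric moisture values in percent; None marks a
    sensor that did not report. The threshold is an assumed example value.
    """
    # Discard missing readings and physically impossible percentages.
    valid = [r for r in moisture_readings if r is not None and 0.0 <= r <= 100.0]
    if len(valid) < min_valid:
        raise ValueError("not enough valid sensor readings")
    return fmean(valid) < threshold_pct


# Example: two plausible readings, one missing sensor, one faulty value.
print(should_irrigate([20.0, 25.0, None, 150.0]))
```

Filtering out missing and out-of-range values before averaging keeps a single faulty detector from triggering or suppressing irrigation, which is one simple way of handling unreliable field data.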

1.2 Agriculture's IoT Issues

1.2.1 Connectivity

To make an IoT infrastructure work, connectivity must be available throughout the agricultural area, warehouses, buildings, greenhouses, and so on, which is a large space to cover. Ideally, it should be a dependable, continuous connection that can endure severe weather and open-field settings. Regrettably, connectivity remains an issue in the Internet of Things overall, as different systems employ different methods and data transfer techniques.

1.2.2 Design and Lifespan

Any IoT device used in farming should be able to handle not only connectivity but also outdoor conditions. Robots, portable detectors, IoT devices on the power grid, and weather stations should be simple yet practical, with enough toughness to “operate in the farm.” This comes on top of the general difficulty of building any IoT tool.

1.2.3 Time and Funds Are Limited

The role of IoT in farming is critical, but smart innovation in this field is implemented in a constantly shifting environment and under time constraints. Companies that develop and implement IoT for farming must account for rapid changes in climate and increasingly extreme weather, as well as deal with limited farmland and negative factors such as declining pollinators.

2 Literature Review

Smart farming is an approach that promotes the incorporation of advanced technologies such as IoT and AI to increase the performance of agricultural methods. One of the greatest obstacles with the vast volumes of information involved is their wide variation in structure and definition. Furthermore, the various services and capabilities in a smart agro-ecosystem have limited ability to collaborate because common system and data integration techniques are lacking. These difficulties pose significant obstacles to collective service delivery, information and technological convergence, and information policies [1].

Existing Internet of Things options for precision agriculture are often split between those that employ aerial sensing via UAVs and those that use ground-based sensing technologies. Although ground-based sensing gives highly accurate data, it requires large arrays of ground-based detectors, which have high operating costs and complexity. Aerial sensing is significantly less expensive than ground-based sensing, but the information it records is less reliable and covers a shorter time span [2].

The introduction of the Internet of Things (IoT) into different agricultural operations has improved overall farming metrics such as productivity, the quantity and quality of produce, and revenue. IoT has begun to show its tangible benefits to end users, as it can help them make decisions. On the other hand, IoT must face numerous challenges in terms of technological sophistication, customization, user-friendliness, deployment, efficiency, and capacity factor [3].

Overall, the survey of state-of-the-art IoT smart farming concludes that IoT has various uses in agriculture and that applications and services have already incorporated IoT technologies. However, many obstacles and research questions remain in the areas of confidentiality and protection, connectivity, governance, accessibility, and dependability, as well as in finding ways to improve them [4].

IoT and its key technologies have had a major effect on precision agriculture as a main subfield of agriculture. New technology facilitates the collection of data from IoT systems across numerous farming activities. The vast amount of precision agriculture data obtained can be used for daily planning and assessment in areas such as crop yield, revenue growth, quality control, livestock and fishery, and farming techniques [5].

In smart agriculture, Internet of Things digital technology is largely used in the production and distribution of agricultural goods. Transmission technology is an essential component of the Internet of Things sector. It contributes to the creation of detectors and other products in the higher layers, as well as docking with end devices at the application level. Each communication medium has benefits and drawbacks [6].

Deploying agricultural detectors, robots, UAVs, and other instruments and innovations on farmland for surveillance and operation involves high cost. Highly skilled personnel are required to oversee and deploy the equipment and technology. Maintenance costs for end devices, tools, and technology are high, and continuous high-bandwidth Internet access is necessary. Collecting high-definition crop and satellite data using detectors and drones requires a large-capacity database to retain periodic logs of plant and other activity data. A reliable, consistent, and energy-efficient environment is a key challenge in IoT adoption [7].

Forecasting microalgae bioconversion has always been difficult. A hurdle for microalgae industries is that technicians are required to take observations, and microalgae mass is added to the production tank every day. Nonetheless, deploying sensing devices together with machine learning optimization has the potential to improve the input utilization rate by finding the best production conditions with the least chemical usage, while producing a high microalgae biomass [8].

The implementation of the Internet of Things in smart greenhouses frequently requires trade-offs among farm production costs, nature conservation, environmental destruction, and viability. Building IoT infrastructure is costly and frequently results in increased electricity generation, which raises the risk of global warming. The increased usage of IoT sensors and connections introduces new management problems: the removal of harmful by-products, resource shortages, and the devastation of delicate ecosystems all contribute to global warming. The global roll-out of upgraded 5G networks would complement the incorporation of IoT technology in greenhouse innovation [9].

A modest trial design methodology allows for the creation of a farm surveillance system. Even though the paradigm offers many advantages, there are significant obstacles. First, concerns about material costs, application costs, and energy efficiency remain at the top of the list. Second, there is data storage and dependability. Third, there is data availability and access, as well as rapid real-time deployment. The sensors can undoubtedly provide detection coverage and farming parameters, but details such as where sensors should be installed and how many are needed for a given land area should be precisely described [10].

Modern digital technology, which is largely employed by agricultural producers, makes a significant contribution to agricultural production. However, current online services for small-scale farmers are not viable in the rural context and do not address the need for a full range of solutions across the agricultural cycle. In most emerging regions, particularly Tanzania, modern technology and services are used to address a problem at a single point in the agricultural process or along a particular production chain [11].

The integration of IoT in farming is making a significant difference to modern agricultural practices and has led to a tremendous upheaval in the farming industry. Many conventional farming tasks are being conducted faster, with less effort and less manpower, thanks to the use of IoT devices. IoT devices
have also enhanced crop yields and helped to balance crop demand and supply. IoT devices mostly employ wireless detectors to collect crop data and deliver it to central computers. Data collected from detectors provide information about ambient conditions, plant conditions, and so on [12].

The different phases of digitized farming present numerous research challenges and opportunities. In agriculture, digital twin concepts can be useful for land and water management, plants, automation and farm equipment, and post-harvest food manufacturing. Artificial models can anticipate and solve unknown problems on farms by using accurate and frequent feedback about agricultural assets. They may help farmers reduce financial stress and labour concerns in the agriculture industry, while also helping authorities in charge of food security and ecological protection to boost the farming industry [13].

According to the survey results, IoT elements for the modern agriculture industry, comprising both software and hardware, have been a focus of research and have attained numerous milestones. Several Internet of Things (IoT) systems have been deployed on large farms and fields. However, extensive IoT adoption in farming continues to face obstacles, the two major ones being effectiveness and technical difficulties. These concerns are discussed in conjunction with the regulations that will promote IoT adoption in farming technologies [14].

The rapid growth of Internet of Things-based tools has altered practically every aspect of life, including commerce, farming, and monitoring. In the face of multiple barriers, these dramatic changes are upending old farming techniques and offering new possibilities. IoT helps in collecting information valuable to the farming sector, such as changes in weather conditions, soil quality, the quantity of water needed by plants, cultivation, bug and pest identification, detection of animal intrusion into the field, and gardening. IoT enables producers to use innovation to check their farms remotely [15].

Relevant papers were selected from those with full text available. Only 14 studies met the eligibility requirements, and these were examined. The study reported outcomes for previously specified challenges relevant to farm production and IT. The selected articles identify three major concerns and obstacles to IT adoption in entrepreneurship and the agricultural industry: facilities, manpower, and institutions [16].

In terms of data transfer, dependability is a major concern for IoT devices. Devices must collect and transmit trustworthy data so that correct decisions can be made when needed. Incorrect measurements have a significant impact on system dependability [17].

The “modern agricultural revolution” will alter agriculture, enhancing productivity, ecology, equity, and openness. However, to take advantage of its greater capabilities, technology must be integrated into the agrarian system on a larger scale. Several security issues in agriculture, such as interoperability, variability, and the handling and processing of enormous amounts of information, also have to be resolved [18].

The future of farming lies in a robust and sustainable agrarian economy. Precision farming is currently being automated. However, a lack of information about technical improvements may stymie the development of new advanced agricultural features. Moreover, emerging innovations such as digital physical systems, virtualization, IoT, and big data have transformed precision agriculture. Adopting such techniques in a timely manner is still difficult around the world owing to limitations such as Internet access, privacy, expense, and high computing cost [19].

The Internet of Things is transforming the agriculture sector by giving farmers a wide range of resources to handle a variety of difficulties on the farm. Using IoT-enabled devices, farmers can access their land from practically anywhere and at any time. Sensors and actuators are used to control agricultural activities, while wireless sensors are employed to analyse the field. Wireless cameras and sensors are deployed to access the field remotely and gather data in the form of images and snapshots [20].

Smart land irrigation system: This system would prepare the agricultural soil for yield through ploughing, weeding, planting preparation, and staking. Smart irrigation process: This technology would automate the controlled provision of the necessary amount of water for plant life. Smart fertilization: This is the technique of automating the application of fertilizer to the land while maintaining oversight of the quality of the manure as well as the timing of its application. Smart insect control and detection systems: This system continuously monitors and identifies pest problems and evaluates agricultural damage [21].

3 Research Methodology

A structured questionnaire was used to collect responses for the study. Only questions pertaining to socio-economic characteristics, awareness, and challenges among farmers in IoT adoption are examined.

3.1 Sample Design

Using the convenience sampling method, questionnaires were distributed to 224 farmers in the delta districts.

3.2 Hypothesis

H1: There is a positive relationship between awareness level and IoT adoption among farmers.

H2: There is a positive relationship between challenges faced and IoT adoption among farmers.

3.3 Framework of Analysis

The collected data are analysed using PLS SEM and SPSS. Figure 1 represents the influence of the awareness variables on IoT adoption among the farmers. Figure 2 represents the influence of the challenge variables on IoT adoption among the farmers.

Fig. 1 PLS SEM output (awareness level on IoT adoption)
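The path-analysis tables in this chapter report five columns per path: original sample, sample mean, standard deviation, T statistics, and P-value. The chapter does not state how these were computed. As a minimal sketch, assuming the usual bootstrap convention in which the T statistic is the absolute original estimate divided by the bootstrap standard deviation and the p-value is two-sided under a normal approximation, the Python code below produces the same five quantities for a single path. It uses a one-predictor standardized coefficient in place of the full PLS algorithm, and the data generator is entirely hypothetical (only the sample size of 224 mirrors the study).

```python
import math
import random
import statistics


def path_coefficient(x, y):
    """Standardized slope of y on one predictor x (equal to Pearson's r)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_path(x, y, n_resamples=500, seed=0):
    """Bootstrap one path and return the five columns of a path-analysis table."""
    rng = random.Random(seed)
    n = len(x)
    original = path_coefficient(x, y)
    estimates = []
    for _ in range(n_resamples):
        # Resample respondents with replacement and re-estimate the path.
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(path_coefficient([x[i] for i in idx], [y[i] for i in idx]))
    std_dev = statistics.stdev(estimates)
    t_stat = abs(original) / std_dev
    p_value = math.erfc(t_stat / math.sqrt(2))  # two-sided, normal approximation
    return {
        "original_sample": original,
        "sample_mean": statistics.fmean(estimates),
        "standard_deviation": std_dev,
        "t_statistic": t_stat,
        "p_value": p_value,
    }


def hypothetical_responses(n=224, seed=1):
    """Made-up 1-5 awareness scores and noisy adoption scores (not the study's data)."""
    rng = random.Random(seed)
    awareness = [rng.randint(1, 5) for _ in range(n)]
    adoption = [a + rng.gauss(0, 1) for a in awareness]
    return awareness, adoption


awareness, adoption = hypothetical_responses()
for name, value in bootstrap_path(awareness, adoption).items():
    print(f"{name}: {value:.3f}")
```

Under this convention a p-value always lies between 0 and 1, and a hypothesis of a positive relationship would typically be retained when the coefficient is positive and the p-value falls below a chosen significance level such as 0.05 (the level here is an assumption, not one stated in the chapter).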

4 Findings

Table 1 shows that the majority of respondents were aged between 30 and 40, had completed higher education, were married, belonged to families of 4–5 members, and earned Rs 15,000–20,000.

Table 1 Profile characteristics of rural farmers

S. No.  Profile characteristics  Majority          Number of respondents
1       Age                      30–40             115
2       Education level          Higher education  115
3       Marital status           Married           210
4       Family members           4–5               187
5       Personal income in Rs    15,000–20,000     202

Source: Primary data collected

The path analysis in Table 2 represents the influence of the independent variables on the dependent variable. Considering the p-value, variables such as smart irrigation system, smart livestock farming, and smart pest detection and control systems were found to have a positive influence on IoT adoption. Hence, H1 is accepted for these variables. Smart fertilizer systems, smart harvest systems, smart quality groundwater management systems, and smart soil cultivation systems, by contrast, have a negative influence on IoT adoption. Hence, H1 is rejected for these variables. It is clear that farmers' awareness of certain technologies influences their IoT adoption. The farmers are aware of irrigation, livestock monitoring, and pest control technologies, and they know the improvements happening in these areas. However, a knowledge gap remains among the farmers regarding fertilizer systems, harvest systems, water management, and cultivation, which is the major reason they are not able to reap the benefits of IoT technologies. If they learn about the technologies emerging in those areas, they will definitely consider IoT adoption in their fields.

Table 2 Path analysis

Variable                                     Original sample  Sample mean  Standard deviation  T statistics  P-value
Smart fertilizer system                      −0.032           −0.034       0.056               0.561         −0.575
Smart harvest system                         0.179            0.174        0.078               0.280         −0.023
Smart irrigation system                      0.717            0.725        0.077               0.281         0.000
Smart livestock farming                      0.282            0.230        0.111               0.456         0.000
Smart pest detection and control systems     0.567            0.113        0.234               0.567         0.000
Smart quality groundwater management system  −0.433           0.117        0.677               0.721         −0.751
Smart soil cultivation system                0.555            0.005        0.678               0.235         −0.534

Source: PLS SEM extract from primary data collected

Fig. 2 PLS SEM output (challenges faced on IoT adoption)

Table 3 shows that network issues, technical problems, trust, connection issues, skills, infrastructure, ease of use, and investment cost were major concerns for farmers in IoT adoption in agriculture. For these variables, H2 is accepted. Security issues, risk assessment, standardization, and scalability were not regarded as significant challenges by the farmers, so H2 is rejected for these variables. The results show that basic problems such as cost, infrastructure, trust, and technical hitches are the major concerns that farmers see as barriers stopping them from adopting IoT technologies on their farms.

5 Recommendations

This study focuses only on farmers in the delta districts and on a limited set of factors. Future researchers can extend the study across India and relate the growth of farmers to the country's growth. Finally, this paper will help policy makers better understand the interest of rural farmers in the delta districts in IoT adoption.

Table 3 Path analysis

Variable                 Original sample  Sample mean  Standard deviation  T statistics  P-value
Security issue           0.113            0.036        0.112               1.123         −0.467
Risk assessment          0.234            0.127        0.079               1.456         −0.073
Network issue            0.005            0.345        0.078               0.678         0.000
Lack of standardization  0.256            0.009        0.116               0.456         −0.085
Technical problems       0.113            0.116        0.324               0.897         0.000
Trust                    0.433            0.118        0.656               0.721         0.000
High investment cost     0.782            0.008        0.678               0.278         0.000
Connection issue         0.224            0.114        0.789               0.756         0.000
Scalability              0.784            0.666        0.966               0.345         −0.564
Lack of skills           0.478            0.145        0.765               0.989         0.000
Lack of infrastructure   0.008            0.008        0.876               0.765         0.000
Ease of use              0.897            0.078        0.844               0.969         0.000

Source: PLS SEM extract from primary data collected

6 Conclusion

The rise of farmers will raise the nation. The implementation of IoT in farming is analogous to a second wave of the agricultural revolution. Farmers have reaped two advantages from IoT: they can now complete the same number of tasks in far less time, and they can boost agricultural production thanks to the precise information collected through IoT. Precision agriculture based on IoT not only revolutionizes traditional agricultural practices but also supports other agrarian methods such as organic production and household farming (intricate or small spaces, specific livestock and/or crops, preservation of specific or high-quality varieties, etc.), and promotes highly transparent farming. Precision agriculture powered by IoT is also advantageous from an ecological standpoint: it can help farmers make better use of fresh water and optimize inputs and treatments. IoT technology is the need of the hour and will help farmers progress. Improving the agriculture sector through technology adoption helps not only the farmers but also the whole nation to progress. Farmers who adopt IoT despite the hindrances will reap the benefits it provides. It is vital for farmers to adopt technology to survive in this competitive world.

References

1. Amiri-Zarandi MH (2022) A platform approach to smart farm information processing. Agriculture 12(6):838
2. Bagha HY (2022) Hybrid sensing platform for IoT-based precision agriculture. Fut Internet 14(8):233
3. Boursianis AD-T (2022) Internet of things (IoT) and agricultural unmanned aerial vehicles (UAVs) in smart farming: a comprehensive review. Internet of Things 18:100187
4. Chaganti RV (2022) Blockchain-based cloud-enabled security monitoring using Internet of Things in smart agriculture. Fut Internet 14(9):250
5. De Alwis SH (2022) A survey on smart farming data, applications and techniques. Comput Ind 138:103624
6. DishaSharma DD (2022) A review: development of smart agriculture using IOT, agriculture robots and wireless sensory networks. Comput Intell Syst 167–176
7. Gupta RB (2022) Selection of suitable IoT-based end-devices, tools, and technologies for implementing smart farming: issues and challenges. Int J Stud Res Technol Manag 10(2):28–35
8. Lim HR (2022) Smart microalgae farming with internet-of-things for sustainable agriculture. Biotechnol Adv 57:107931
9. Maraveas CP (2022) Applications of IoT for optimized greenhouse environment and resources management. Comput Electron Agric 198:106993
10. Mohapatra BN (2022) A prototype of smart agriculture system using Internet of Thing based on Blynk application platform. J Electron, Electromed Eng, Med Informat 4(1):24–28
11. Mushi GE (2022) Digital technology and services for sustainable agriculture in Tanzania: a literature review. Sustainability 14(4):2415
12. Namana MS (2022) Internet of Things for smart agriculture-state of the art and challenges. Ecol Eng Environ Technol 23(6):147–160
13. Nasirahmadi A (2022) Toward the next generation of digitalization in agriculture based on digital twin paradigm. Sensors 22(2):498
14. Quy VK (2022) IoT-enabled smart agriculture: architecture, applications, and challenges. Appl Sci 12(7):3396
15. Rehman AS (2022) A revisit of internet of things technologies for monitoring and control strategies in smart agriculture. Agronomy 12(1):127
16. Sabirin NH (2022) Information technology (IT) in agriculture sector: issues and challenges. Soc Manag Res J 19(2):111–137
17. Shaikh TA (2022) Machine learning for smart agriculture and precision farming: towards making the fields talk. Arch Comput Methods Eng, 1–41
18. Shaikh TA (2022) Towards leveraging the role of machine learning and artificial intelligence in precision agriculture and smart farming. Comput Electron Agric 198:107119
19. Sharma VT (2022) Technological revolutions in smart farming: current trends, challenges & future directions. Comput Electron Agric 201:107217
20. Sinha BB (2022) Recent advancements and challenges of Internet of Things in smart agriculture: a survey. Futur Gener Comput Syst 126:169–184
21. Vangala AD (2022) Security in IoT-enabled smart agriculture: architecture, security solutions and challenges. Cluster Comput, 1–24

Optimized Reversible Arithmetic and Logic Unit

Saroja S. Bhusare, Veeramma Yatnalli, E. Shreyas, Shreeram Aithal, Gayana A. Jain, and O. Sreekaar

Abstract Reversible logic has emerged in recent years as a prominent and efficient approach. Its techniques have a large number of applications in low-power VLSI, DNA computing, digital image processing, cryptography, quantum computing, and optical information processing. A reversible ALU is presented in this paper. The control block and the adder block are the two main components of the ALU, and they are derived using the basic reversible gates. The proposed ALU has been designed and implemented at CMOS level using the Cadence tool. The proposed reversible control unit has two R-I gates, two COG gates, one Feynman gate, and a reversible AND gate, which are used to derive the controlled outputs. Comparative results show improved performance over other available techniques with respect to parameters such as quantum cost, number of reversible gates, constant inputs, and garbage outputs.

Keywords Arithmetic and logic unit (ALU) · Reversible logic gates · Quantum cost · Low power

S. S. Bhusare · V. Yatnalli (✉) · E. Shreyas · S. Aithal · G. A. Jain · O. Sreekaar
ECE Department, JSS Academy of Technical Education, Bangalore, Karnataka, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 686, https://doi.org/10.1007/978-981-99-2100-3_50

1 Introduction

Technological advancements in very large-scale integration (VLSI) provide a massive amount of computing power. As Gordon Moore stated, the number of active components such as transistors and FETs integrated on a single silicon chip roughly doubles every 18 months. Because so many transistors are integrated on a small silicon chip, power dissipation in the form of heat becomes a growing concern. Part of this dissipation is tied to information loss through the Landauer limit: erasing one bit of information dissipates at least kT ln 2 of energy, where k and T denote Boltzmann's constant and the absolute temperature, respectively. It has also been shown that power dissipation is minimized when the system is reversible, because a reversible logic circuit can uniquely recover its input vector from its output vector, so no information is lost [1].

The first questions about the reversibility of computation were raised in the 1970s. Logical and physical reversibility were two intertwined challenges. Logical reversibility is the ability to reproduce the input from the output. For example, the AND gate is irreversible, since its two inputs are mapped to a single output, whereas the NOT gate is reversible, since its single input is mapped to a single output. Because AND has only one output, one of its inputs is effectively erased and its information is lost. The entropy change due to this loss is ln 2 (in units of k), which leads to the Landauer bound kT ln 2. Reversible logic circuits avoid this dissipation because they lose no information.

A reversible logic gate realizes a one-to-one mapping between n inputs and n outputs: the outputs are determined by the inputs, and the inputs can be uniquely recovered from the outputs. Reversible synthesis is not achievable for one-to-many or many-to-one mappings, and it is more difficult than irreversible logic synthesis because feedback and fan-out are not permitted. Garbage outputs are outputs that are not used in the circuit's subsequent computations. Constant inputs are inputs held at a fixed value in order to implement the given function.

In electronic circuits such as processors, which fetch, decode, execute, and write back the operations of an instruction, the ALU plays an essential role. With advancing technology and the scaling of devices down to the nanoscale regime, arithmetic logic circuits must be compact, consume little power, and have low propagation delay. As the arithmetic and logic unit is one of the most important components of a processor's core, it must be efficient and should dissipate as little power as possible. In this article, a design of the ALU based on reversible logic gates is presented.
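As a rough illustration of the two ideas in this introduction, the following Python sketch evaluates the Landauer bound kT ln 2 and tests logical reversibility by checking whether a gate's truth table is one-to-one. The temperature of 300 K and the helper names (`landauer_limit`, `is_reversible`) are assumptions made for the example and do not come from the paper.

```python
import math
from itertools import product

BOLTZMANN = 1.380649e-23  # Boltzmann's constant in J/K (exact SI value)


def landauer_limit(temperature_kelvin):
    """Minimum energy in joules dissipated per erased bit: k * T * ln(2)."""
    return BOLTZMANN * temperature_kelvin * math.log(2)


def is_reversible(gate, n_inputs):
    """A gate is logically reversible iff distinct inputs give distinct outputs."""
    outputs = [gate(*bits) for bits in product((0, 1), repeat=n_inputs)]
    return len(set(outputs)) == len(outputs)


def and_gate(a, b):
    # Two inputs, one output: several inputs share the same output.
    return (a & b,)


def not_gate(a):
    # One input, one output: the input can always be recovered.
    return (1 - a,)


if __name__ == "__main__":
    # 300 K is an assumed room temperature, chosen only for illustration.
    print(f"Landauer bound at 300 K: {landauer_limit(300.0):.3e} J per bit")
    print("AND reversible:", is_reversible(and_gate, 2))
    print("NOT reversible:", is_reversible(not_gate, 1))
```

The same `is_reversible` check applies to any n-input, n-output gate whose outputs are returned as a tuple.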

2 Previous Work

A one-bit ALU built from reversible gates was proposed in [2]. The design uses three NOT gates, two C-NOT gates, two Fredkin gates, two Toffoli gates, and two Peres gates, and can perform twelve different arithmetic and logic functions. It was implemented using Verilog and the Quartus II simulator. Two different reversible ALU designs were proposed in [3]. The controller block consists of three Feynman gates, three R-I gates, and one Fredkin gate, and the two adder blocks are designed using the PFAG gate and the HNG gate, respectively. The HNG-based ALU performed better than the PFAG-based one. This proposal was also implemented using the Quartus II simulator. An 8-bit ALU using COG gates in the controller and HNG gates in the adder unit was introduced in [4]. The propagation delay was less than 5.52 ns, a performance improvement of 33.41%, and the design was created using Cadence software.

A multi-logic function generator circuit based on COG gates, with eight and sixteen functions, was proposed in [1]. The ALU with eight functions requires seven COG gates, whereas the ALU with sixteen functions uses 15 COG gates. Different logical functions were generated using these function generators. A novel reversible gate named IG was proposed in [5]. This gate is fault tolerant and preserves parity. Two IG gates were used to implement a one-bit fault-tolerant full adder, and a four-bit ripple carry adder was also implemented. Two programmable reversible logic gate structures, MRG and PAOG, were proposed in [6] for implementing an ALU, yielding an efficient reversible ALU that surpassed the performance of the HNG-based design. The design was written in Verilog and verified using ModelSIM SE 6.3.

A reversible ALU with a multiplexer unit and control signals was proposed in [7]. Using DPG, YAG, Fredkin, and Peres gates, the gate count could be lowered, which reduced the number of constant inputs and garbage outputs. The implemented ALU is analyzed in QCA with the necessary specifications [8]. The paper [9] presents a one-bit ALU using reversible gates, implemented using the Xilinx ISE Suite. The efficiency of the constituent parts of the ALU is improved in [10]. The architecture in [11] reports quantum cost (35%), garbage output (37%), and ancilla input (64%) results for arithmetic and logic units relative to other implementations. The article [12] describes a reversible ALU based on the QCA technique and a unique BS1 block.

The paper is structured as follows: Sect. 3 describes the reversible gates used. Sect. 4 explains the design and implementation of the 1-bit ALU. Results are presented in Sect. 5.

3 Background

Quantum computing theory can be applied to realize quantum gates, that is, reversible logic gates. A qubit is the basic unit of quantum data and is represented as a two-dimensional vector. A single-qubit quantum gate is represented by a 2 × 2 unitary matrix. Such gates are used in quantum circuits as well as in logic and digital circuits. For example, the Hadamard gate acts on a single qubit, and its matrix H is given by

$$ H = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} $$

Since H is a unitary matrix, its rows are orthogonal. With current quantum technology, quantum gates larger than 2 × 2 cannot be realized directly. The number of primitive quantum gates (1 × 1 or 2 × 2) required to realize a gate is known as its quantum cost.

Consider a function F. It is reversible if there is a one-to-one relation between its inputs and outputs. An n × n reversible logic gate is represented as follows:

Input vector: Iv = (I1, I2, I3, ..., In)
Output vector: Ov = (O1, O2, O3, ..., On)
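The unitarity of the Hadamard matrix can be checked numerically. The sketch below is a minimal pure-Python check, valid for real-valued matrices only, that H multiplied by its transpose gives the identity; the helper names are hypothetical and introduced only for this example.

```python
import math

# Hadamard matrix: (1 / sqrt(2)) * [[1, 1], [1, -1]]
S = 1 / math.sqrt(2)
H = [[S, S],
     [S, -S]]


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def is_unitary_real(m, tol=1e-12):
    """For a real matrix, unitary means M * M^T equals the identity."""
    prod = matmul(m, transpose(m))
    n = len(m)
    return all(abs(prod[i][j] - (1.0 if i == j else 0.0)) < tol
               for i in range(n) for j in range(n))


if __name__ == "__main__":
    print("H is unitary:", is_unitary_real(H))
    # Without the 1/sqrt(2) factor the rows are orthogonal but not normalized.
    print("Unscaled matrix is unitary:", is_unitary_real([[1, 1], [1, -1]]))
```

The second call shows why the normalization factor matters: the rows of the unscaled matrix are still orthogonal, but the matrix is no longer unitary.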

3.1 Reversible Logic Gates

Feynman, Peres, Fredkin, COG, and R-I reversible gates are employed in the proposed ALU design. These gates are described below, along with their quantum costs.

3.1.1 Feynman Gate

The Feynman reversible gate, shown in Fig. 1, is commonly known as the C-NOT gate; its matrix representation is shown in Fig. 2. It has two inputs, A and B, which are mapped to two outputs, P and Q. Output P equals A. Output Q equals NOT B if A is 1; otherwise it equals B. When input B is set to 0, both outputs equal A, so the Feynman gate is also used to copy a signal, since fan-out is not allowed in reversible logic circuits. Its quantum cost is one. The quantum form of the Feynman gate, which consists of one Ex-OR and a buffer, is also illustrated in Fig. 1. The truth table of the Feynman gate is shown in Table 1.

Fig. 1 Feynman gate and its quantum representation

Fig. 2 Matrix representation of C-NOT

Table 1 C-NOT truth table

Input      Output
A   B      P   Q
0   0      0   0
1   0      1   1
0   1      0   1
1   1      1   0
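The behavior of the Feynman gate can be sketched in a few lines of Python. The function below uses the standard C-NOT mapping P = A, Q = A XOR B, which agrees with Table 1; the names `feynman` and `TABLE_1` are introduced here for illustration only.

```python
from itertools import product


def feynman(a, b):
    """Feynman (C-NOT) gate: P = A, Q = A XOR B."""
    return a, a ^ b


# Rows of Table 1 as (A, B) -> (P, Q).
TABLE_1 = {
    (0, 0): (0, 0),
    (1, 0): (1, 1),
    (0, 1): (0, 1),
    (1, 1): (1, 0),
}

if __name__ == "__main__":
    for inputs in product((0, 1), repeat=2):
        outputs = feynman(*inputs)
        # Applying the gate twice restores the inputs, so it is its own inverse.
        restored = feynman(*outputs)
        print(inputs, "->", outputs, "->", restored)
    # With B fixed to 0 the gate copies A onto both outputs.
    print("copy of 1:", feynman(1, 0))
```

Because applying the gate twice returns the original inputs, the gate is its own inverse, which is one way to see that it is reversible.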

Fig. 3 Peres reversible gate

Fig. 4 Matrix representation

3.1.2 Peres Gate

The Peres reversible gate, shown in Fig. 3 (matrix representation in Fig. 4), has three inputs, A, B, and C, which are mapped to three outputs, P, Q, and R. The Peres gate is a universal gate, and its truth table is shown in Table 2. It is built from one Feynman gate, two V+ gates, and one V gate, so its quantum cost is four.

3.1.3 COG Gate

The COG gate, shown in Fig. 5, is also known as the controlled operation gate. Its matrix representation is shown in Fig. 6. It is a reversible gate whose inputs A, B, and C are mapped to the outputs P, Q, and R. The quantum cost of the COG gate is 4, and its truth table is shown in Table 3.


Table 2 Truth table of Peres gate

Input          Output
A   B   C      P   Q   R
0   0   0      0   0   0
0   0   1      0   0   1
0   1   0      0   1   0
0   1   1      0   1   1
1   0   0      1   1   0
1   0   1      1   1   1
1   1   0      1   0   1
1   1   1      1   0   0
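The Peres truth table can be reproduced with the standard Peres mapping P = A, Q = A XOR B, R = (A AND B) XOR C. The paper does not print these expressions, so they are stated here as an assumption that happens to match every row of Table 2. The sketch also shows the common use of the gate as a half adder when C is held at the constant 0.

```python
from itertools import product


def peres(a, b, c):
    """Assumed Peres mapping: P = A, Q = A XOR B, R = (A AND B) XOR C."""
    return a, a ^ b, (a & b) ^ c


# Rows of Table 2 as (A, B, C) -> (P, Q, R).
TABLE_2 = {
    (0, 0, 0): (0, 0, 0),
    (0, 0, 1): (0, 0, 1),
    (0, 1, 0): (0, 1, 0),
    (0, 1, 1): (0, 1, 1),
    (1, 0, 0): (1, 1, 0),
    (1, 0, 1): (1, 1, 1),
    (1, 1, 0): (1, 0, 1),
    (1, 1, 1): (1, 0, 0),
}


def half_adder(a, b):
    """With the constant input C = 0, Q is the sum bit and R the carry bit."""
    _, total, carry = peres(a, b, 0)
    return total, carry


if __name__ == "__main__":
    outputs = [peres(*bits) for bits in product((0, 1), repeat=3)]
    print("one-to-one:", len(set(outputs)) == 8)
    print("1 + 1 -> (sum, carry) =", half_adder(1, 1))
```

In the half-adder use, C is a constant input and P is a garbage output, which is how these two cost measures arise in practice.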

Fig. 5 COG gate

Fig. 6 Matrix representation of COG

Table 3 Truth table of COG gate Input

Output

A

B

C

P

Q

R

0

0

0

0

0

1

0

1

0

0

0

0

1

0

0

1

0

0

1

1

0

1

1

1

0

0

1

0

1

1

0

1

1

0

0

1

1

0

1

0

0

1

1

1

1

1

1


3.1.4 R-I Gate

The R-I reversible gate, shown in Fig. 7 (matrix representation in Fig. 8), is a 3 × 3 reversible gate whose three inputs, A, B, and C, are uniquely mapped to the outputs P, Q, and R. This gate can be used to duplicate as well as to invert a signal, and its quantum cost is 4. The truth table of the R-I gate is shown in Table 4.

Fig. 7 R-I gate

Fig. 8 Matrix representation of R-I

Table 4 Truth table of R-I gate

Input          Output
A   B   C      P   Q   R
0   0   0      0   0   0
0   0   1      0   0   1
0   1   0      1   0   0
0   1   1      1   1   1
1   0   0      0   1   0
1   0   1      0   1   1
1   1   0      1   0   1
1   1   1      1   1   0
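The rows of Table 4 are consistent with the Boolean expressions P = B, Q = A·(NOT B) + B·C, and R = (A AND B) XOR C. These expressions are inferred from the table and are not printed in the paper, so the sketch below should be read as an assumption that reproduces Table 4. It also illustrates the duplicate and invert uses mentioned in the text, with constant inputs chosen for the example.

```python
def ri_gate(a, b, c):
    """Expressions inferred from Table 4 (an assumption, not the paper's text)."""
    p = b
    q = (a & (1 - b)) | (b & c)
    r = (a & b) ^ c
    return p, q, r


# Rows of Table 4 as (A, B, C) -> (P, Q, R).
TABLE_4 = {
    (0, 0, 0): (0, 0, 0),
    (0, 0, 1): (0, 0, 1),
    (0, 1, 0): (1, 0, 0),
    (0, 1, 1): (1, 1, 1),
    (1, 0, 0): (0, 1, 0),
    (1, 0, 1): (0, 1, 1),
    (1, 1, 0): (1, 0, 1),
    (1, 1, 1): (1, 1, 0),
}

if __name__ == "__main__":
    print("matches Table 4:", all(ri_gate(*k) == v for k, v in TABLE_4.items()))
    for b in (0, 1):
        # With A = 0 and C = 1, outputs P and Q both carry B (duplication).
        p, q, _ = ri_gate(0, b, 1)
        # With A = 1 and C = 1, output R carries NOT B (inversion).
        _, _, r = ri_gate(1, b, 1)
        print(f"B={b}: copies=({p}, {q}), inverted={r}")
```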


4 Design and Implementation

With advancing technology, reducing area, optimizing power consumption, and reducing delay are key concerns in the design of any integrated circuit. ALUs are generally designed using multiplexers, full adders, and multipliers built from logic gates, which increases the size and complexity of the overall system. To overcome this, a reversible-logic-based ALU has been designed that replaces this bulk with a reversible control circuit and a reversible adder. The proposed ALU, shown in Fig. 9, provides the same functionality as a conventional ALU. The proposed one-bit ALU provides eight arithmetic and four logic operations, which are listed in Table 5.

Fig. 9 Proposed 1-bit ALU

Table 5 Functionality table for proposed ALU

Function                Output F          S2   S1   S0   Cin
Decrement A             F = A - 1         0    1    1    0
Transfer A              F = A             0    1    1    1
OR logic                F = A|B           1    0    0    x
XOR logic               F = A^B           1    0    0    x
XNOR logic              F = ~(A^B)        1    0    1    x
NOR logic               F = ~(A|B)        1    0    1    x
Transfer A              F = A             0    0    0    0
Increment A             F = A + 1         0    0    0    1
Addition                F = A + B         0    0    1    0
Add with carry          F = A + B + 1     0    0    1    1
Subtract with borrow    F = A + ~B        0    1    0    0
Subtraction             F = A + ~B + 1    0    1    0    1

4.1 Control Unit of ALU

Control units are the most challenging part of any digital circuit to design, and they also have the most stringent constraints. The proposed reversible control unit, shown in Fig. 10, uses two R-I gates, two COG gates, one Feynman gate, and one reversible AND gate to derive the controlled outputs X, Y, and Z. These outputs are fed to the reversible adder as controlled inputs. The output Boolean expressions for the control unit are given by the following equations:

X = A + B S2 (S0 S1)
Y = S1 B + S0 B
Z = S1 BC

Figure 11 represents the control unit implementation using Cadence. Figures 12 and 13 represent the full adder unit and its implementation using Cadence, respectively.
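The arithmetic rows of Table 5 (S2 = 0) can be modeled behaviorally. The sketch below is not the authors' circuit: it assumes that the adder operand is Y = S1·(NOT B) + S0·B and that the adder computes A + Y + Cin, a reading chosen because it reproduces every arithmetic row of Table 5, with subtraction realized as A + NOT B + 1 in two's complement. The word width of 4 bits is also an assumption, used so that decrement and subtraction wrap around visibly; the paper itself describes a one-bit slice.

```python
def control_y(b, s1, s0, mask):
    """Assumed adder operand: Y = S1*(NOT B) + S0*B, applied bitwise."""
    y = 0
    if s0:
        y |= b
    if s1:
        y |= ~b & mask
    return y & mask


def alu_arithmetic(a, b, s1, s0, cin, width=4):
    """Arithmetic rows of Table 5 (S2 = 0): F = A + Y + Cin modulo 2**width."""
    mask = (1 << width) - 1
    return (a + control_y(b, s1, s0, mask) + cin) & mask


# (S1, S0, Cin) select codes for the arithmetic rows of Table 5.
OPERATIONS = {
    "Transfer A": (0, 0, 0),
    "Increment A": (0, 0, 1),
    "Addition": (0, 1, 0),
    "Add with carry": (0, 1, 1),
    "Subtract with borrow": (1, 0, 0),
    "Subtraction": (1, 0, 1),
    "Decrement A": (1, 1, 0),
    "Transfer A (second code)": (1, 1, 1),
}

if __name__ == "__main__":
    a, b = 9, 3  # example operands, chosen arbitrarily
    for name, (s1, s0, cin) in OPERATIONS.items():
        print(f"{name:26s} F = {alu_arithmetic(a, b, s1, s0, cin)}")
```

The logic rows (S2 = 1) are left out because the corresponding control expressions cannot be read unambiguously from the printed equations.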

Fig. 10 Control unit of ALU


Fig. 11 Control unit implementation using Cadence

Fig. 12 Full adder unit

Fig. 13 Full adder implementation using Cadence

5 Simulation Results

Figure 14 shows the design of the proposed ALU in Cadence. It has three constant pins, four selector lines, and nine unused output pins (garbage outputs). Figure 15 depicts the output waveform of the proposed control unit.


Fig. 14 Design of proposed ALU in Cadence

Fig. 15 Simulation results of control unit

Figure 16 shows the simulation results of the proposed full adder. Figure 17 depicts the output waveform of the proposed one-bit ALU. The simulated output matches the expected functionality for the various input combinations. Figure 18 compares the performance characteristics of the existing and proposed arithmetic logic units. The performance metrics compared are quantum cost, garbage outputs, gate count, and constant pins. The comparison of these performance parameters across the various techniques is shown in Table 6.

Fig. 16 Simulation results of full adder unit

Fig. 17 Output waveform of 1-bit ALU block

Fig. 18 Comparison with three different designs of one-bit ALU (bar chart; legend: QC quantum cost, GO garbage output, GC gate count, CP constant pin)

Table 6 Comparison of various techniques with the proposed work

Techniques       Quantum cost   Garbage output   Gate count   Constant pin
[2]              29             8                10           4
[3]              28             10               9            5
[3]              28             10               8            5
[4]              26             9                6            5
Proposed work    22             9                8            3
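To make the comparison in Table 6 concrete, the following sketch computes the relative reduction of each metric for the proposed design against the labeled rows of the table. The numbers are copied from Table 6; the second, unlabeled row of the table is omitted, and the helper name `percent_reduction` is hypothetical. A negative value means the proposed design is worse on that metric.

```python
# Values copied from Table 6: (quantum cost, garbage output, gate count, constant pin).
METRICS = ("quantum cost", "garbage output", "gate count", "constant pin")
TABLE_6 = {
    "[2]": (29, 8, 10, 4),
    "[3]": (28, 10, 9, 5),
    "[4]": (26, 9, 6, 5),
}
PROPOSED = (22, 9, 8, 3)


def percent_reduction(reference, proposed):
    """Reduction of the proposed value relative to the reference, in percent."""
    return 100.0 * (reference - proposed) / reference


if __name__ == "__main__":
    for technique, values in TABLE_6.items():
        parts = [f"{name}: {percent_reduction(ref, new):+.1f}%"
                 for name, ref, new in zip(METRICS, values, PROPOSED)]
        print(technique, "|", ", ".join(parts))
```

Read this way, the table shows the largest and most consistent gains in quantum cost and constant pins, while gate count and garbage output are not lower than every reference design.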

6 Conclusion

A 1-bit ALU based on reversible logic and implemented at the CMOS level is presented in this paper. The simulation results demonstrate the functions of the implemented reversible ALU, while Table 6 compares it with existing one-bit ALU designs. The proposed design outperforms the other techniques with respect to performance parameters such as garbage outputs, hardware complexity, constant inputs, number of gates, and overall quantum cost. The design is built and verified using the Cadence tool. This approach might be used to generate more elaborate designs in nanotechnology. In the future, there may be further ways to reduce garbage outputs as well as constant inputs, and techniques such as gate diffusion input (GDI) can be used to reduce the number of MOSFETs.

References

1. Mamataj S, Das B (2014) An approach to design a controlled multi-logic function generator by using COG reversible logic gates 3(3):56
2. Dixit A, Kapse V (2012) Arithmetic & logic unit (ALU) design using reversible control unit. Int J Eng Innov Technol 1:55–60
3. Gopal L, Mahayadin NSM, Chowdhury AK, Gopalai AA, Singh AK (2014) Design and synthesis of reversible arithmetic and logic unit (ALU). In: 2014 International conference on computer, communications, and control technology (I4CT). IEEE, pp 289–293
4. Deeptha A, Muthanna D, Dhrithi M, Pratiksha M, Kariyappa BS (2016) Design and optimization of 8 bit ALU using reversible logic. In: 2016 IEEE International conference on recent trends in electronics, information & communication technology (RTEICT). IEEE, pp 1632–1636
5. Islam Saiful Md, Rahman MM, Begum Z, Zulfiquar Hafiz M, Al Mahmud A (2009) Synthesis of fault tolerant reversible logic circuits. In: 2009 IEEE circuits and systems international conference on testing and diagnosis, pp 1–4
6. Morrison M, Ranganathan N (2011) Design of a reversible ALU based on novel programmable reversible logic gate structures. In: 2011 IEEE computer society annual symposium on VLSI. IEEE, pp 126–131
7. Syamala Y, Tilak AVN (2011) Reversible arithmetic logic unit. In: 2011 3rd International conference on electronics computer technology, vol 5. IEEE, pp 207–211
8. Kamaraj A, Marichamy P (2017) Design and implementation of arithmetic and logic unit (ALU) using novel reversible gates in quantum cellular automata. In: 2017 4th International conference on advanced computing and communication systems (ICACCS). IEEE, pp 1–8
9. Khatter P, Pandey N, Gupta K (2018) An arithmetic and logical unit using reversible gates. In: 2018 International conference on computing, power and communication technologies (GUCON). IEEE, pp 476–480
10. Duggi N, Rajula S (2021) Implementation of low area ALU using reversible logic formulations. In: Intelligent manufacturing and energy sustainability. Springer, Singapore, pp 455–465
11. Pandey P, Kumari K, Malvika, Prathima A, Mummaneni K (2022) Optimized design of ALU using reversible gates. In: Das KN, Das D, Ray AK, Suganthan PN (eds) Proceedings of the international conference on computational intelligence and sustainable technologies. Algorithms for Intelligent Systems. Springer, Singapore
12. Safaiezadeh B, Mahdipour E, Haghparast M, Sayedsalehi S, Hosseinzadeh M (2022) Novel design and simulation of reversible ALU in quantum dot cellular automata. J Supercomput 78(1):868–882

A Review on Diagnosis of Breast Cancer Using Mammography Techniques

Bahareh Nazar Hosseini Saber and Reyhaneh Nazar Hosseini Saber

Abstract Breast cancer affects many women and men today and is associated with abnormalities such as lumps and calcifications. Diagnostic methods include MRI, biopsy, and X-ray mammography. The main goal of this article is to introduce different mammography techniques. These include key techniques for CAD systems, such as (1) image enhancement and (2) asymmetry detection and region-of-interest (ROI) extraction, as well as pre-processing operations, such as (1) removal of labels and annotations and (2) K-means classification and noise removal. Further techniques are tumor segmentation, in which the final image of the tumor is extracted, and feature extraction, which describes the texture of interest in terms of measures such as mean and entropy. The final step is classification, which reports what percentage of the tumor findings are true positives and what percentage are false positives.

Keywords Mammography · Breast cancer · GA · PSO · CAD · SVM · CC · DDSM · BI-RADS

B. N. H. Saber (✉)
Department of Biomedical Engineering, Islamic Azad University, Tehran Medical Branch, Tehran, Iran

R. N. H. Saber
Department of Biomedical Engineering, Islamic Azad University, Central Tehran Branch, Tehran, Iran

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 686, https://doi.org/10.1007/978-981-99-2100-3_51

1 Introduction

A patient's chances of survival increase with early detection. Cancer is a major cause of mortality: lung cancer death rates are highest in men, while women are most likely to die from breast cancer [1]. Several techniques [2] can be used to diagnose breast cancer, including magnetic resonance imaging (MRI), biopsy, and X-ray mammography. The American Cancer Society recommends mammography as a necessary screening procedure for the early detection of breast cancer. Breast cancer mortality is reduced by more than 40% in women who undergo mammography [3], owing to the accuracy of the screening, which ranges from 85 to 90%. The sensitivity of mammography for detecting cancer ranges between 68 and 78%. Surgical biopsy is more sensitive, but it is uncomfortable and expensive; X-ray mammography is therefore considered one of the safest methods for the early detection of breast cancer [4]. Mammography can be used to detect breast changes in women who show no symptoms or have no new breast abnormalities [5]. Every woman should have a mammogram at least once a year from the age of 40 [2]. During mammography, the breast is compressed between two solid surfaces so that the breast tissue is spread out for imaging. The objective is to identify cancer before clinical signs appear [5].

Masses or tumors appear as white areas on a mammogram. Tumors may be malignant or benign. A benign tumor poses no danger to the patient's health and is unlikely to grow or change over time. In general, benign breast tumors are the most common type. Small white specks are usually harmless. If white tissue is present whose density, shape, or pattern could indicate cancer, the radiologist examines it to determine whether it is cancerous. Besides distinguishing fatty, fatty-glandular, and dense-glandular tissue, mammography can also provide information about background tissue characteristics [5].

Radiologists have difficulty evaluating the vast number of mammograms collected in large-scale mammography screening. An evaluation may be wrong either because benign tissue is interpreted as malignant (false positive) or because cancerous tissue is missed (false negative). Consequently, conventional breast cancer screening fails to detect 10–30% of cancers. Radiologists can now evaluate mammographic images more effectively with computer-aided diagnosis (CAD) technologies. A CAD system can identify suspicious regions, assess tissue as normal or abnormal, and differentiate cancerous from benign tissue. This may improve the accuracy with which abnormal tissue conditions (such as microcalcifications and masses) are detected [4–6].

2 Breast Cancer Diagnosis Using Mammography

A variety of imaging methods are available for breast exams, including X-rays, magnetic resonance imaging, and ultrasound. Mammography, which images the breast with a low-dose X-ray machine, is currently the most reliable way to find breast cancer before it manifests clinically [7]. Mammography is available in two forms: film mammography and digital mammography. In film mammography, the image is captured directly on film, whereas in digital mammography a digital image is acquired and stored [8]. Digital mammography has potential advantages over film. Screen-film mammography has the following limitations compared to digital mammography: (1) a limited range of X-ray exposure, (2) contrast cannot be adjusted after the image has been acquired, (3) the film must serve as detector, display, and storage medium, and (4) film processing is labor-intensive and can introduce artifacts into the image. In response to these limitations, researchers have developed more advanced devices for digital mammography. Digital mammography overcomes these shortcomings of film, which leads to the following potential benefits [9, 10]: (1) a wider dynamic range and lower noise, (2) improved contrast, (3) improved image quality, and (4) reduced X-ray exposure. When used for breast cancer screening, the diagnostic accuracy of digital mammography is similar to that of conventional film mammography. Compared to film mammography, however, digital mammography detects a larger number of malignancies, particularly lesions presenting as microcalcifications (MCs) [10].

Mammography examinations can be classified as either screening or diagnostic. Screening mammography is used to detect breast cancer in individuals who have no symptoms of the disease [11]. In a screening examination, two views of each breast are taken: the craniocaudal (CC) view and the mediolateral oblique (MLO) view. Diagnostic mammography is used to examine patients who have already shown abnormal clinical symptoms, such as a breast mass, and it may acquire additional views of each breast. Diagnostic mammography is also commonly performed after an abnormal screening mammogram to determine whether further breast imaging or a biopsy is necessary, and for women who have previously been diagnosed with breast cancer [10].

Mammography examinations, particularly screening mammograms, have been shown to improve cancer detection and reduce mortality and morbidity [7]. However, the resolution of mammograms is frequently inadequate, which makes it difficult for radiologists to assess the images effectively. Several studies have indicated that mammography can generate a high number of false positives and false negatives, so that many women who do not have malignancies undergo further clinical evaluation or breast biopsy. Various solutions have been proposed to improve the sensitivity and specificity of mammography [10] and to minimize unnecessary biopsies.

3 Mammogram Database

Several public mammography databases are available, including the Digital Database for Screening Mammography (DDSM). With 10,480 images, DDSM is the most extensive public database. Its images are stored in LJPEG, a nonstandard variant of the lossless JPEG format, and each image is annotated with regions of interest [5]. The Mammographic Image Analysis Society (MIAS) database consists of films carefully selected from the UK National Breast Screening Programme; Mini-MIAS is a reduced-resolution version of it. The films were scanned using a microdensitometer with an optical density response range of 0.0–3.2, with 8 bits recorded for each pixel.


Fig. 1 Sample of dense tissue from the Mini MIAS database [3]

The Mini-MIAS database has a spatial resolution of 200 µm per pixel, compared with 50 µm for the originally scanned MIAS database. Each mammogram image measures 1024 by 1024 pixels. The Mini-MIAS images contain tags and annotations. The ground truth includes information on tissue density, abnormality class, and abnormality severity. The center of each abnormality is given in pixel coordinates, together with the radius (in pixels) of the circle enclosing it. The mammograms can be categorized by tissue density as fatty, fatty-glandular, or dense-glandular. Each category is illustrated in Fig. 1 [3].

4 Computer-Assisted Breast Cancer Diagnosis

In computer-aided diagnosis (CAD), a radiologist uses the output of computerized image analysis as an aid to interpretation. CAD combines diagnostic imaging, image analysis, artificial intelligence methods, and pattern recognition. CAD systems for mammography fall into two categories: those based on conventional screen-film mammography and those based on digital mammography [10].

5 Critical Techniques for CAD Systems

(A) Image analysis techniques for locating MC clusters
On mammograms, calcium deposits appear as bright spots. Clusters of MCs may be a helpful indicator of breast cancer: they are present in 30–50% of the cases diagnosed by mammography screening [12]. Detection methods for MCs are generally classified into four categories: (a) image enhancement methods, (b) stochastic modeling methods, (c) multi-scale analysis methods, and (d) machine learning methods.

(1) Image enhancement techniques
A collection of image enhancement algorithms has been developed based on the fact that MCs are brighter than their surroundings. These techniques first increase the contrast of the MCs and then apply a threshold to separate them from their surroundings. Filtering is one of the methods used for enhancement [13]. In this approach a difference image is formed from two filtered versions of the mammogram, one that enhances the MCs and one that suppresses them. The difference-image approach can be viewed as a band-pass filter, which is sensitive to noise. Morphological erosion is applied during post-processing to reduce false positives.

(2) Stochastic modeling techniques
Stochastic modeling methods exploit the statistical differences between MCs and their surroundings [14]. For example, regions without MCs have a distribution close to Gaussian, whereas regions with MCs are non-Gaussian. Higher-order statistics such as skewness and kurtosis therefore take different values in the two kinds of region. This method is limited, however, because the spatially varying (non-MC) background does not always follow Gaussian statistics. Markov random field (MRF) models [15, 16] make use of energy functions based on generalized Gaussian kernels (of the spin-glass type). Compared with other statistical techniques for image segmentation, MRF models provide a better description of the spatial intensity distribution of an image [7, 10–15].

(3) Multi-scale analysis techniques
The objective of multi-scale analysis is to exploit the differences in frequency content between MC locations and their surroundings. Wavelet transforms have been studied extensively for MC detection. In one approach, a biorthogonal wavelet transform was used in which the MCs were represented as circular Gaussian shapes whose width varies across scales. Undecimated wavelet transforms have the advantage of translation invariance. MC detection has been reported to perform better with higher-order wavelets than with low-order Daubechies wavelets.

(4) Machine learning techniques
In machine learning techniques, the objective is to learn relationships from training data. In MC detection, the task is frequently treated as a binary classification problem in which each pixel location is examined for the presence of MCs. The support vector machine (SVM) is a widely used family of learning algorithms [10]. An SVM maps the input to a higher-dimensional feature space through a nonlinear kernel and constructs an optimal hyperplane classifier, that is, one that maximizes the separation margin between the two classes. SVMs can identify MCs with high accuracy [17, 18]. Figure 2 shows the result of applying an SVM classifier to a screening mammogram, with the detected MCs highlighted [10].
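The stochastic-modeling idea in item (2), that regions containing MCs deviate from Gaussian statistics, can be illustrated with two standard higher-order statistics, skewness and excess kurtosis. The choice of these two measures, the helper names, and the toy pixel windows are assumptions made for this sketch and are not taken from the cited methods.

```python
def central_moment(values, order):
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Third standardized moment; zero for any symmetric distribution."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Fourth standardized moment minus 3, so that a Gaussian gives zero."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


if __name__ == "__main__":
    # Toy gray-level windows (hypothetical values, not real mammogram data).
    background = [1, 2, 3, 4, 5]            # symmetric spread of intensities
    with_bright_spot = [10, 10, 10, 10, 60]  # one bright outlier pixel
    for name, window in (("background", background),
                         ("bright spot", with_bright_spot)):
        print(name, round(skewness(window), 3), round(excess_kurtosis(window), 3))
```

A window that contains a single bright pixel produces a clearly positive skewness, while the symmetric window gives zero, which is the kind of contrast such statistical detectors rely on.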


Fig. 2 ROI of mammography, in which MCs are identified by circles [16]

(B) Region of interest (ROI)
Region-based techniques divide the image into regions, extract features from each region, and use those features to classify the regions [10]. Each annotated abnormality is marked by a circle. The object is extracted within the rectangular area enclosing that circle, which is obtained from Eq. 1 [1], where (x, y) is the annotated center of the abnormality and r is the radius of its enclosing circle. In Fig. 3, the radiologist's annotation is shown as a green circle and the ROI rectangle as a red square [1].

$$ I = [\,x - r,\; 1000 - y,\; 2r,\; 2r\,] \tag{1} $$
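Equation 1 can be turned into a small cropping routine. The sketch below assumes that the four entries of I are to be read as [x_min, y_min, width, height], the convention used by common image-cropping functions; the paper does not state this explicitly. The entries are computed exactly as printed in Eq. 1, and the function names and the synthetic image are hypothetical.

```python
def roi_rectangle(x, y, r):
    """Rectangle of Eq. 1, read here as [x_min, y_min, width, height]."""
    return (x - r, 1000 - y, 2 * r, 2 * r)


def crop(image, rect):
    """Crop a row-major nested-list image (0-based indices) to the rectangle."""
    x_min, y_min, width, height = rect
    return [row[x_min:x_min + width] for row in image[y_min:y_min + height]]


if __name__ == "__main__":
    # Synthetic 1024 x 1024 image whose pixel value encodes its row index.
    image = [[row_index] * 1024 for row_index in range(1024)]
    # Hypothetical annotation: center (500, 600), radius 50 pixels.
    rect = roi_rectangle(500, 600, 50)
    roi = crop(image, rect)
    print("rectangle:", rect)
    print("ROI size:", len(roi), "x", len(roi[0]))
```

The term 1000 - y flips the vertical axis, which is needed when the annotation's origin is at the bottom of the image while array rows are counted from the top.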

Fig. 3 Marking the area of interest [1]

(C) Mammogram detection of architectural distortion
The Breast Imaging-Reporting and Data System (BI-RADS) describes architectural distortion as follows: "The normal architecture (of the breast) is distorted with no definite mass visible." This includes spiculations radiating from a point and focal retraction or distortion at the edge of the parenchyma [7, 10–19]. Despite being the third most common mammographic indicator of nonpalpable breast cancer, architectural distortion is often overlooked during screening because of the variety of its appearance [10]. Architectural distortion accounts for 12–45% of the breast malignancies that are missed or misinterpreted in mammography screening [20, 21].

(D) Mammography diagnosis of bilateral asymmetry
To diagnose breast cancer, radiologists use the asymmetry between left and right mammograms [22]. According to BI-RADS, asymmetry is present when one breast has a greater volume or density of tissue, or more prominent ducts, than the corresponding area of the other breast, without a distinct mass. Analyzing asymmetry can reveal early signs of breast cancer such as developing densities, parenchymal distortion, and small asymmetric dense regions. A pair of mammograms from a case with asymmetry, together with the resulting rose diagrams, is shown in Fig. 4. Bilateral asymmetry was diagnosed with 82.6% sensitivity and 86.4% specificity on a set of 88 mammograms [10].

(E) Enhancing the diagnostic imaging for breast cancer
As mammographic images often lack contrast and detail visibility, image enhancement methods have been proposed to improve clarity and readability. Image enhancement is the process of improving the quality of an image so that it is better suited than the original image to a specific application or set of purposes. Contrast enhancement methods are either direct or indirect. In direct contrast enhancement, a contrast measure is first defined and the contrast is then improved by modifying that measure directly [10].
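As a generic illustration of contrast enhancement, the sketch below applies a linear contrast stretch that maps the darkest pixel of an image to 0 and the brightest to 255. This is a textbook operation chosen for the example, not the specific method of any work cited here, and the function name and sample values are hypothetical.

```python
def stretch_contrast(image, out_min=0, out_max=255):
    """Linearly rescale gray levels so they span [out_min, out_max]."""
    pixels = [p for row in image for p in row]
    low, high = min(pixels), max(pixels)
    if high == low:
        # A flat image has no contrast to stretch.
        return [[out_min for _ in row] for row in image]
    scale = (out_max - out_min) / (high - low)
    return [[round(out_min + (p - low) * scale) for p in row] for row in image]


if __name__ == "__main__":
    # Low-contrast 2 x 2 patch: gray levels span only 100 to 130.
    patch = [[100, 110],
             [120, 130]]
    for row in stretch_contrast(patch):
        print(row)
```

After stretching, the four gray levels are spread evenly over the full 8-bit range, which makes small intensity differences easier to see.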

6 Phases of Cancer Diagnosis

(1) Image capture

The images were taken from a database of mammograms that had been labeled as benign or malignant according to the abnormalities present. The first task is to distinguish normal cases from abnormal ones [16].

(2) Pre-processing

Mammograms from the MIAS database [10] are digitized from scanned film and may therefore contain noise [3], which is caused by the grain of the film itself [3]. Several obstacles must be overcome to extract features from mammographic images, including noise, variation in resolution, image quality, and contrast. To give an accurate picture of the breast anatomy and of the tumor location, mammograms must be processed first [23]. Pre-processing is primarily intended to reduce noise [3]. Pre-processing normally consists of two


Fig. 4 a, b Images mdb111 and mdb112 of a case with bilateral asymmetry, showing the straight-line estimate of the pectoral muscle edge obtained with the Hough transform and the line perpendicular to it that is used in the alignment method. c, d Rose diagrams obtained after alignment [10]

steps. The first is digital image processing of the mammogram, which reduces noise and enhances contrast. The second is image segmentation, which identifies the regions that are likely to contain a tumor. In this work, pre-processing consists of isolating the target region, removing image labels, eliminating noise, and enhancing contrast [16].

Removing labels and annotations

Digitized mammograms contain labels and annotations whose intensity is similar to that of the breast [24]. Figure 5a shows an example. To remove them, the image is first converted to binary form. As Fig. 5a shows, there is considerable contrast between the breast and the background. It is therefore possible to separate the breast from the


Fig. 5 a Example of labels and annotations in a scanned film mammogram, b whole-breast mask, and c output after label and annotation removal [3]

background by thresholding. A threshold of 0.05 was found to be optimal after trying a range of threshold values. Thresholding produces two objects in the binary image: one is the whole breast, and the other is the label and annotation. The two objects may be connected. The second step therefore separates them by morphological erosion with a disk-shaped structuring element of radius three pixels. Once the objects are separated, only the largest object in the image, the breast region, is retained. Labels and annotations are removed at this point, as is the tissue below the breast region. Morphological dilation is then applied to restore and smooth the boundary of the remaining breast region. The output of this stage is the whole-breast mask. Finally, the mask is applied to the original mammogram. Figure 5c shows the resulting breast region, free of labels and annotations but still including the pectoral muscle. Figure 6 summarizes the whole procedure.

The k-means classification

In a mammogram, the pectoral muscle appears at the top of the image as a uniform region of high intensity, from which the remaining breast tissue hangs. The muscle is excluded from this analysis because it is not breast tissue. The contrast at the interface between the pectoral muscle and the breast tissue varies greatly with breast density (fatty, fatty-glandular, or dense-glandular). To separate pectoral muscle from breast tissue reliably regardless of contrast, a six-class k-means classification with the city-block distance metric was used. The pectoral muscle, the breast tissue, and the background were then identified from the mean intensity of each of the six classes. In order of decreasing mean intensity, the six classes are shown in yellow, honey, green, alder, blue, and eggplant. As shown in Fig. 7, in fatty and fatty-glandular breasts, the two classes with the highest intensity correspond to the pectoral muscle, the class with the lowest intensity corresponds to the background, and the remaining classes correspond to the rest of the breast. Figure 7e, f shows that this rule does not hold for some dense breasts.
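The six-class clustering described above can be illustrated with a minimal Python sketch. It clusters pixel intensities only; with the city-block metric the natural cluster center is the median, which is what the sketch uses. The function names, the evenly spaced initial centers, and the iteration limit are assumptions made for illustration and are not taken from [3].

```python
from statistics import median

def kmeans_cityblock(values, k=6, max_iter=100):
    """Cluster scalar intensities into k classes using city-block (L1) distance."""
    lo, hi = min(values), max(values)
    # Assumption: start from k centers spread evenly over the intensity range.
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(max_iter):
        # Assignment step: each value goes to the nearest center under |v - c|.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: the median minimizes the summed L1 distance in a cluster.
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(median(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

def rank_classes(centers):
    """Return class indices ordered from the brightest to the darkest center."""
    return sorted(range(len(centers)), key=lambda c: centers[c], reverse=True)

# Hypothetical normalized intensities forming six well-separated groups.
pixels = [0.00, 0.01, 0.20, 0.21, 0.40, 0.41, 0.60, 0.61, 0.80, 0.81, 0.99, 1.00]
labels, centers = kmeans_cityblock(pixels)
order = rank_classes(centers)
```

Following the rule stated above, the first two classes in `order` would be the candidates for the pectoral muscle in fatty and fatty-glandular breasts, and the last class would be the background.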

Fig. 6 Label removal and annotation process [3]. Steps shown: thresholding, erosion, breast object selection, breast object boundary smoothing, and removal of objects outside the total breast mask (TBM)
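The label-removal steps of Fig. 6 can be sketched in plain Python as follows. This is an illustrative sketch rather than the implementation used in [3]: it assumes intensities normalized to the range 0 to 1 (so that the threshold of 0.05 applies), 4-connected objects, and background outside the image border. All function names are hypothetical.

```python
from collections import deque

def disk_offsets(radius):
    """Offsets (dy, dx) of a disk-shaped structuring element."""
    return [(dy, dx)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)
            if dy * dy + dx * dx <= radius * radius]

def _hit(mask, y, x):
    """True if (y, x) is inside the image and set; outside counts as background."""
    return 0 <= y < len(mask) and 0 <= x < len(mask[0]) and mask[y][x]

def erode(mask, radius):
    offs = disk_offsets(radius)
    return [[all(_hit(mask, y + dy, x + dx) for dy, dx in offs)
             for x in range(len(mask[0]))] for y in range(len(mask))]

def dilate(mask, radius):
    offs = disk_offsets(radius)
    return [[any(_hit(mask, y + dy, x + dx) for dy, dx in offs)
             for x in range(len(mask[0]))] for y in range(len(mask))]

def largest_component(mask):
    """Keep only the largest 4-connected object of a binary mask."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    best = []
    for sy in range(h):
        for sx in range(w):
            if not mask[sy][sx] or seen[sy][sx]:
                continue
            seen[sy][sx] = True
            queue, comp = deque([(sy, sx)]), []
            while queue:
                y, x = queue.popleft()
                comp.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if _hit(mask, ny, nx) and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(comp) > len(best):
                best = comp
    out = [[False] * w for _ in range(h)]
    for y, x in best:
        out[y][x] = True
    return out

def total_breast_mask(image, threshold=0.05, radius=3):
    """Threshold, erode, keep the largest object, then dilate to restore its edge."""
    binary = [[v > threshold for v in row] for row in image]
    return dilate(largest_component(erode(binary, radius)), radius)
```

Erosion breaks thin connections between the breast and a label, so that selecting the largest object discards the label; dilation with the same structuring element then restores the breast boundary.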

Breast mask creation

The pectoral muscle region is identified in the k-means image. The next step removes the pectoral muscle and creates a breast mask that covers only breast tissue. The pixels of the two classes with the highest intensity (yellow and honey) are selected as the primary pectoral muscle mask, and those of the third and fourth classes (green and alder) as the secondary pectoral muscle mask. The edge of the primary mask is first refined with morphological operations: the mask is eroded with a disk-shaped structuring element of radius 3 pixels, and its holes are then filled. The convex hull of the result is then used to give the mask a smooth edge, and the hull is kept as tight as possible so that the boundary remains a smooth line. The final breast mask is obtained by subtracting this mask from the whole-breast mask. All phases of this procedure are illustrated in Fig. 8 [3], and the outcome is shown in Fig. 9. To obtain the breast region, the final mask is applied to the original mammogram. Figure 9a shows the k-means classification of the breast, Fig. 9b the final breast mask, and Fig. 9c the final segmentation of the breast tissue without the pectoral muscle.

Noise removal

All mammographic images are first oriented to the left. To do this, the summed intensity of the first half of the columns is compared with that of the second half (from left to right). If the image is already left-oriented, nothing changes, which is the case when the sum of the columns


Fig. 7 Input and output images of k-means classification for fatty (a, b), fatty-glandular (c, d), and dense-glandular (e, f) breasts [3]


Fig. 8 Breast mask production process [3]

Fig. 9 Output of breast mask production process a image classified with k-means, b final breast mask, and c final breast segmentation [3]

in the first half exceeds the sum in the second half. Otherwise, the image is mirrored so that its orientation changes from right to left. Noise is then removed with a two-dimensional median filter: each output pixel is the median of the 5 × 5 neighborhood around the corresponding input pixel, and the image is padded with zeros at its borders. Unlike linear filters, the median filter reduces noise while preserving the texture of mammographic images. Figure 10a–c shows filtered mammograms containing circumscribed, spiculated, and ill-defined masses, respectively. These mammograms were obtained from the database of the Mammographic Image Analysis Society (MIAS). After noise and artifacts have been removed, a seeded region growing method is used to extract the pectoral muscle. The method is accurate, fully automated, and applicable to mammograms of varying density. The pectoral muscle appears in the upper left corner of left images and in the upper right corner of right images, which is why every image was oriented to the left before region growing was applied. The tumor region is then identified using the same region growing method [4].
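The orientation check and the 5 × 5 median filter described above can be sketched as follows. The sketch assumes an image stored as a list of rows and treats pixels outside the image as zeros, which is one reading of the border handling described in the text; the function names are hypothetical and not taken from [4].

```python
from statistics import median

def orient_left(image):
    """Mirror the image if the right half is brighter than the left half."""
    half = len(image[0]) // 2
    left = sum(sum(row[:half]) for row in image)
    right = sum(sum(row[len(row) - half:]) for row in image)
    return image if left >= right else [row[::-1] for row in image]

def median_filter(image, size=5):
    """Replace each pixel by the median of its size x size neighborhood."""
    h, w = len(image), len(image[0])
    r = size // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Neighbors outside the image are taken as zero (zero padding).
            window = [image[yy][xx] if 0 <= yy < h and 0 <= xx < w else 0
                      for yy in range(y - r, y + r + 1)
                      for xx in range(x - r, x + r + 1)]
            out[y][x] = median(window)
    return out
```

Because the median ignores isolated extreme values, a single bright noise pixel inside a uniform region is removed, whereas a linear averaging filter would smear it over its neighbors.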


Fig. 10 Mammographic images MIAS 132 (circumscribed mass), MIAS 134 (spiculated mass), and MIAS 202 (ill-defined mass), respectively: a–f after removal of noise and labels and smoothing of the pectoral muscle, g–i after enhancement and segmentation, and j–l after ROI extraction [4]

Improving image contrast

Most mammograms are either too dark or too bright. In a dark image, the histogram components are concentrated at the low end of the intensity scale [16]; in a bright image, they are concentrated at the high end. A high-contrast image has a broad histogram, whereas a low-contrast image has a comparatively narrow one [25]. The histogram of an image shows how many pixels are present at each brightness level.
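One standard way to broaden a narrow histogram is histogram equalization, the enhancement applied in this study. The following is a minimal sketch for an 8-bit image stored as a list of rows; the function name and the rounding rule are assumptions made for illustration.

```python
def equalize_histogram(image, levels=256):
    """Histogram-equalize an image given as rows of integers in [0, levels - 1]."""
    hist = [0] * levels
    for row in image:
        for v in row:
            hist[v] += 1
    # Cumulative distribution of the gray levels.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    total = cdf[-1]
    cdf_min = min(c for c in cdf if c > 0)
    if total == cdf_min:
        # Constant image: there is nothing to stretch.
        return [row[:] for row in image]
    # Map each gray level so that the cumulative distribution becomes linear.
    lut = [max(0, round((c - cdf_min) * (levels - 1) / (total - cdf_min)))
           for c in cdf]
    return [[lut[v] for v in row] for row in image]

# Four gray levels crowded into 10..40 are spread over the full range.
stretched = equalize_histogram([[10, 20], [30, 40]])
```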


Equalizing the histogram maximizes the contrast of the input image. In this study, mammographic images were enhanced by histogram equalization. Figure 11 shows a mammogram and its histogram before equalization, and Fig. 12 shows the ROI histogram before and after equalization. The width of the histogram after equalization indicates that the equalization was effective [16].

(3) Segmentation

Finding and extracting malignant tumors requires a thorough segmentation technique. Region-based segmentation separates masses from their surroundings. Clustering-based image thresholding is carried out automatically using Otsu’s method. The method assumes that the image contains two classes of pixels (foreground and background) that follow a bimodal histogram, and it determines the threshold that separates the two classes so that their intra-class variance is as low as possible. The segmented image (binary mask) is multiplied by the original image as part of the

Fig. 11 a Main image and b histogram of the original image [16]

Fig. 12 a ROI image histogram and b ROI image histogram after equalization [16]


normalization procedure after segmentation [23]. These steps are illustrated in Fig. 13.

(4) Feature extraction

The gray-level co-occurrence matrix (GLCM) is a statistical method for extracting texture features. These features include contrast, correlation, energy, homogeneity, mean, standard deviation, entropy, variance, smoothness, elasticity, and skewness. In contrast, shape features such as area, strength, eccentricity, perimeter, and major axis length are extracted as well [23]. Digital mammography may be the most effective method for diagnosing breast cancer [16]. In addition to indicating tumor density, texture features can provide spatial information about the tumor. Texture is one of the most important characteristics for analyzing masses and microcalcifications. The features extracted here include compactness, entropy, mean, and smoothness [26].
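The four features named above can be computed directly from the gray-level histogram of a region, following the definitions in Sect. 7 (Eqs. 2–5). The sketch below is illustrative only: it assumes a base-2 logarithm in the entropy, integer gray levels, and hypothetical function names.

```python
import math

def histogram_features(pixels, levels=256):
    """Mean, variance, entropy, and smoothness of a list of integer gray levels."""
    n = len(pixels)
    counts = [0] * levels
    for z in pixels:
        counts[z] += 1
    p = [c / n for c in counts]                      # normalized histogram p(z_i)
    mean = sum(z * pz for z, pz in enumerate(p))     # Eq. (4)
    variance = sum((z - mean) ** 2 * pz for z, pz in enumerate(p))
    # Eq. (3); levels with p = 0 contribute nothing. Base 2 is an assumption.
    entropy = -sum(pz * math.log2(pz) for pz in p if pz > 0)
    smoothness = 1 - 1 / (1 + variance)              # Eq. (5)
    return {"mean": mean, "variance": variance,
            "entropy": entropy, "smoothness": smoothness}

def compactness(perimeter, area):
    """Eq. (2): equals 1 for a circle and grows for less round shapes."""
    return perimeter ** 2 / (4 * math.pi * area)

# A region with two equally frequent gray levels, 0 and 2.
features = histogram_features([0, 0, 2, 2])
```

For this two-level region the mean is 1, the variance is 1, the entropy is 1 bit, and the smoothness is 0.5; a perfectly uniform region gives zero entropy and zero smoothness.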

Fig. 13 Images at different stages: a original mammogram, b enhanced image, and c segmented image [23]
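The segmentation step shown in Fig. 13c, Otsu thresholding followed by multiplying the binary mask with the image, can be sketched as follows. The sketch maximizes the between-class variance, which is equivalent to minimizing the intra-class variance; it assumes an 8-bit image stored as a list of rows, and the function names are hypothetical.

```python
def otsu_threshold(image, levels=256):
    """Return the gray level t that best separates pixels <= t from pixels > t."""
    hist = [0] * levels
    for row in image:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_between = 0, -1.0
    w0 = sum0 = 0
    for t in range(levels):
        w0 += hist[t]                 # number of pixels in the lower class
        sum0 += t * hist[t]           # intensity sum of the lower class
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mean0, mean1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t

def segment(image, levels=256):
    """Multiply the image by its binary Otsu mask: background pixels become 0."""
    t = otsu_threshold(image, levels)
    return [[v if v > t else 0 for v in row] for row in image]

# Hypothetical image with a dark background row and a bright mass row.
roi = segment([[10, 12, 11], [200, 210, 205]])
```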


7 Geometric Properties

Compactness

Compactness measures how round a region is, based on its area and perimeter; it takes its minimum value of 1 for a circle. Apart from errors introduced by digitization, compactness is a dimensionless quantity that does not depend on orientation [6–27]:

$$\mathrm{Compactness} = \frac{P^2}{4\pi A} \qquad (2)$$

where P is the perimeter of the object and A is its area.

Entropy

Entropy expresses the degree of disorder of the gray levels. Images whose intensities are widely scattered, such as images with salt-and-pepper noise, have high entropy. A low-entropy image has little contrast, and a perfectly smooth image has zero entropy. The entropy of a tumor region is non-zero [16]:

$$e = -\sum_{i=0}^{L-1} p(z_i) \log_2 p(z_i) \qquad (3)$$

where $z_i$ is a random variable representing intensity, $p(z_i)$ is the histogram of the intensity levels in a region, and L is the number of possible intensity levels.

Mean

The mean is the average intensity of the pixels in the ROI. Because tumors have high intensity values $z_i$, the mean indicates how bright the ROI is:

$$m = \sum_{i=0}^{L-1} z_i \, p(z_i) \qquad (4)$$

Smoothness

Smoothness measures the relative uniformity of the intensity in a given ROI and is used to determine whether a region is flat. A value close to zero indicates a flat region, and the value approaches 1 as the region becomes more textured. An ROI with a smoothness near zero is a flat region that is potentially cancerous [16]:

$$R = 1 - \frac{1}{1 + \sigma^2} \qquad (5)$$


where σ is the standard deviation. The standard deviation measures how far values deviate from their arithmetic mean and is computed as the root mean square of those deviations. It is the most widely used measure of statistical dispersion and describes how values are spread when data are gathered. The standard deviation is close to zero if every data point lies near the mean, and it is large if many points lie far from the mean. It is exactly zero when all data values are identical. The standard deviation of a tumor region is non-zero [28].

$$\sigma = \sqrt{\mu_2(z)} = \sqrt{\sigma^2} \qquad (6)$$

where $\mu_2$ is the second-order moment about the mean:

$$\mu_2 = \sum_{i=0}^{L-1} (z_i - m)^2 \, p(z_i) \qquad (7)$$

where m is the average intensity [16].

(5) Classification

To demonstrate the efficacy of the proposed CAD system for identifying breast cancer, 100 mammograms from the MIAS database were examined. The MIAS database contains 322 images, labeled as normal, benign, or malignant according to their ground truth. Each mammogram has a resolution of 1024 × 1024 pixels with 8 bits per pixel. The 100 mammograms evaluated in this study comprise 50 normal and 50 abnormal images; the abnormal images consist of 25 benign and 25 malignant cases. In the first step, the proposed CAD system classifies all 100 mammograms as normal or abnormal. In the second step, the system classifies the 50 abnormal mammograms as benign or malignant, using 25 of them for training and 25 for testing. The proposed CAD system is assessed by processing time, sensitivity, specificity, and accuracy, where sensitivity (SN) is the true-positive rate and specificity (SP) is the true-negative rate [4]: SN = TP/(TP + FN), SP = TN/(TN + FP), and accuracy = (TP + TN)/(TP + FP + TN + FN), where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.

A. Results of normal-abnormal categorization

Table 1 summarizes the normal-abnormal classification results for two feature selection strategies (GA²-SVM-MI and PSO³-SVM-MI⁴) and three classifiers (KNN, linear

² Genetic algorithm. ³ Particle swarm optimization. ⁴ Mutual information.


Table 1 Results of the classifiers for normal-abnormal cases [4]

GA-SVM + MI (60 training and 40 testing)
Classifier    FP   FN   Accuracy %   SN %    SP %
Linear SVM    0    2    95           91.3    100
Kernel SVM    0    1    97.5         93.75   100
KNN           2    0    95           100     90.48

PSO-SVM + MI (60 training and 40 testing)
Classifier    FN   FP   Accuracy %   SN %    SP %
Linear SVM    1    2    92.5         95.83   87.5
Kernel SVM    2    0    95           90.47   100
KNN           3    1    90           95      89.5

SVM and kernel SVM). Accuracy, sensitivity, and specificity are high in all cases (about 90% or above). As a feature selection method, GA-SVM-MI outperforms PSO-SVM-MI in accuracy and specificity, whereas PSO-SVM-MI requires less processing time than GA-SVM-MI. According to Table 1, the kernel SVM classifier is the most accurate of the three classifiers with either GA-SVM-MI or PSO-SVM-MI. The proposed CAD system, which uses GA-SVM-MI for feature selection and a kernel SVM for classification, therefore has the best accuracy (97.5%) of all the tested configurations. The results of the proposed approach (GA-SVM-MI + kernel SVM) were also compared with those of other CAD systems. One such system uses an artificial neural network to separate normal from abnormal mammograms with 96% accuracy, 100% sensitivity, and 93% specificity. Compared with that system, the proposed method reports improved accuracy (97.5%), specificity (95%), and sensitivity (100%) while using a smaller share of the data for training and a larger share for testing [4].

B. Benign-malignant categorization results

Table 2 presents the corresponding results for benign versus malignant tumors. Compared with existing approaches, the proposed methodology (GA-SVM-MI + kernel SVM) has the greatest accuracy (96%) in classifying benign and malignant tumors. Radiologists prefer false positives (benign cases classified as malignant) to false negatives (malignant cases classified as benign). With the proposed method, only one benign case was misclassified as malignant (one false positive), which demonstrates its effectiveness. In terms of sensitivity (100% vs. 83.9%) and specificity (93.3% vs. 91%), the proposed diagnostic system performs better than existing CAD techniques for benign-malignant categorization [4].
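The three evaluation measures can be computed from the four confusion-matrix counts as in the short sketch below. Table 1 reports only FP and FN; the TP and TN counts used here are hypothetical values chosen to be consistent with the kernel SVM row under GA-SVM + MI (40 test images, FP = 0, FN = 1), not figures reported in [4].

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (sensitivity, specificity, accuracy) as fractions between 0 and 1."""
    sensitivity = tp / (tp + fn)               # true-positive rate
    specificity = tn / (tn + fp)               # true-negative rate
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy

# Assumed counts: 16 abnormal and 24 normal test images (hypothetical split).
sn, sp, acc = classification_metrics(tp=15, tn=24, fp=0, fn=1)
# sn = 0.9375, sp = 1.0, acc = 0.975, matching 93.75%, 100%, and 97.5% in Table 1
```

Note that sensitivity depends on the false negatives and specificity on the false positives, which is why a classifier with FP = 0 reaches 100% specificity even when it misses one abnormal case.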


Table 2 Results of classifiers for benign-malignant cases [4]

GA-SVM + MI (25 training and 25 testing)
Classifier    FP   FN   Accuracy %   SN %    SP %
Linear SVM    1    1    92           92.8    92.3
Kernel SVM    1    0    96           100     93.3
KNN           2    0    92           100     88.2

PSO-SVM + MI (25 training and 25 testing)
Classifier    FP   FN   Accuracy %   SN %    SP %
Linear SVM    1    1    92           92.3    92.8
Kernel SVM    0    2    92           88.89   100
KNN           2    1    88           93.3    84.6

8 Result

As mentioned previously, breast cancer is the cancer that most commonly affects women. Besides mammography, other diagnostic techniques such as biopsy and MRI are available, but mammography is the most cost-effective and safest way to diagnose the disease. Mammograms are either film-based or digital. Several mammogram databases are available; in this article, the MIAS database was used most often. The article presented an overview of developments in CAD systems and of key techniques, including image enhancement and the detection of asymmetry and distortion. Feature extraction is a further aid to diagnosis that analyzes the geometric features of the tissue. It should be noted, however, that statistical features can sometimes give contradictory information; for example, they may describe the characteristics of a malignant tumor for what is in fact a benign one. For the classification stage, classification accuracy was used as the criterion. The results indicate that GA-SVM-MI feature selection combined with the kernel SVM classifier has the highest classification accuracy: 97.5% for normal-abnormal cases and 96% for benign-malignant cases, with only one false positive in the latter test.

References

1. Swapnil P et al (2016) Region marking and grid based textural analysis for early identification of breast cancer in digital mammography. In: International conference on advanced computing, IEEE
2. Giger ML (2018) Machine learning in medical imaging. J Am Coll Radiol 15(3 Pt B):512–520
3. Rahmatika A et al (2019) Automated segmentation of breast tissue and pectoral muscle in digital mammography. 978-1-5386-8448-1/19/$31.00, IEEE
4. Salama MS et al (2018) An improved approach for computer-aided diagnosis of breast cancer in digital mammography. 978-1-5386-3392-2/18/$31.00, IEEE
5. Parisa Beham M et al (2020) MAMMSIT: a database for the diagnosis and detection of breast cancer in mammography images. IEEE
6. Birdwell RL, Ikeda DM, O’Shaughnessy KD, Sickles EA (2001) Mammographic characteristics of 115 missed cancers later detected with screening mammography and the potential utility of computer-aided detection. Radiology 219:192–202
7. Ng KH, Muttarak M (2003) Advances in mammography have improved early detection of breast cancer. J Hong Kong College Radiol 6(3):126–131
8. [Online] (2005) Available: http://www.cancer.gov/cancertopics/factsheet/DMISTQandA
9. Yang W (2006) Digital mammography update. Biomed Imag Intervention J 2(4):45–12
10. Tang J et al (2009) Computer-aided detection and diagnosis of breast cancer with mammography: recent advances. IEEE Trans Inf Technol Biomed 13(2)
11. NCI Cancer Fact Sheets (2007) [Online]. Available: http://www.cancer.gov/cancertopics/factsheet/Detection/screening-mammograms
12. Kopans DB (2007) Breast imaging, 3rd edn. Williams & Wilkins, Baltimore, MD
13. Nishikawa RM, Giger ML, Doi K, Vyborny CJ, Schmidt RA (1995) Computer-aided detection of clustered microcalcifications on digital mammograms. Med Biol Eng Comput 33(2):174–178
14. Gurcan MN, Yardimci Y, Cetin AE, Ansari R (1997) Detection of microcalcifications in mammograms using higher order statistics. IEEE Signal Process Lett 4(8):213–216
15. Casaseca-de-la-Higuera P, Arribas JI, Munoz-Moreno E, Alberola-Lopez C (2005) A comparative study on microcalcification detection methods with posterior probability estimation based on Gaussian mixture models. In: Proc 27th Annu Int Conf Eng Med Biol Soc 1:49–54
16. Goudarzi M et al (2018) Extraction of fuzzy rules at different concept levels related to image features of mammography for diagnosis of breast cancer. Journal homepage: www.elsevier.com/locate/bbe, ScienceDirect
17. El Naqa I, Yang Y, Wernick MN, Galatsanos NP, Nishikawa RM (2002) A support vector machine approach for detection of microcalcifications. IEEE Trans Med Imag 21(12):1552–1563
18. Wei L, Yang Y, Nishikawa RM, Wernick MN, Edwards A (2005) Relevance vector machine for automatic detection of clustered microcalcifications. IEEE Trans Med Imag 24(10):1278–1285
19. American College of Radiology (ACR) (1998) Illustrated breast imaging reporting and data system (BI-RADS), 3rd edn. Amer Coll Radiol, Reston, VA
20. Yankaskas BC, Schell MJ, Bird RE, Desrochers DA (2001) Reassessment of breast cancers missed during routine screening mammography: a community based study. Amer J Roentgenol 177(3):535–541
21. Burrell H, Evans A, Wilson A, Pinder S (2001) False-negative breast screening assessment: what lessons we can learn? Clin Radiol 56(5):385–388
22. Homer MJ (1997) Mammographic interpretation: a practical approach. McGraw-Hill, Boston, MA
23. Ghongade RD et al (2017) Computer-aided diagnosis system for breast cancer using RF classifier. Accepted to be presented at the IEEE WiSPNET 2017 conference, IEEE
24. Shi P, Zhong J, Rampun A, Wang H (2018) A hierarchical pipeline for breast boundary segmentation and calcification detection in mammograms. Comput Biol Med 96:178–188
25. Cheng E, Xie N, Ling H, Bakic PR, Maidment ADA, Megalooikonomou V (2010) Mammographic image classification using histogram intersection. In: IEEE international symposium on biomedical imaging: from nano to macro, pp 197–200
26. Al-shamlan H, El-zaart A (2010) Feature extraction values for breast cancer mammography images. In: International conference on bioinformatics and biomedical technology, pp 335–340
27. Pak F, Rashidy Kanan H, Alikhassi A (2015) Breast cancer detection and classification in digital mammography based on non-subsampled contourlet transform (NSCT) and super resolution. Comput Methods Program Biomed 122(2):89–107
28. Sheshadri HS, Kandaswamy A (2007) Experimental investigation on breast tissue classification based on statistical feature extraction of mammograms. Comput Med Imag Graph 31(1):46–48

Pragmatic Way of Analyzing Malware Attacks Detection in IoT Devices Using Deep Learning

Moushumi Barman and Bobby Sharma

Abstract The Internet of Things (IoT) plays a vital role in transforming the world: from telephone to smartphone, from typewriter to laptop and then notebook, from ordinary home to smart home, and from hard work to smart work. The widespread use of IoT in numerous applications not only makes life easier but also frees time for innovative work. It also has a negative side, however. Large-scale use of IoT has a damaging effect: it encourages attackers to target networks and systems, which leads to various kinds of cyber-attacks. Among these, malware has proved to be the most dangerous, interrupting daily operations and making users think twice before clicking on any link. This paper covers the different types of malware attacks along with the IoT architecture. Researchers have proposed different deep learning approaches to detect malware attacks on IoT, but some security issues remain unresolved. The paper therefore also describes security issues which show that IoT devices are still vulnerable to malware attacks.

Keywords IoT · Malware · Ransomware · Botenago

M. Barman (B) · B. Sharma
Department of CSE, Assam Don Bosco University, Guwahati, Assam, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 686, https://doi.org/10.1007/978-981-99-2100-3_52

1 Introduction

A collection of small, delicate sensor nodes, combined with hardware and software that transmit and store data over the Internet, is termed the Internet of Things (IoT) (see Fig. 1). IoT is currently the largest computerized platform connecting the physical and virtual worlds. Mohanta et al. [1] noted that modern IoT devices are more efficient than traditional ones because of their wide range of sensors, which has increased the demand for IoT in agriculture, medicine, healthcare, the military, industry,


Fig. 1 Sample of IoT and associated networks

transportation, etc. Sikder et al. [2] noted that sensors adapt to fluctuations in their surroundings and can take the actions required to carry out ongoing tasks effectively. The widespread use of IoT exposes IoT networks to several security attacks. Ngo et al. [3] observed that malware is considered one of the most serious threats to IoT devices. This paper describes:

(i) The background of IoT and the attacks that are possible in each layer.
(ii) An analysis of different malware attacks on IoT devices.
(iii) Malware and the security challenges reported by different researchers.

2 Background

2.1 IoT Architecture and Layer Vulnerability

The architecture of IoT devices consists of four layers: the sensing layer, the network layer, the data processing layer, and the application layer (see Fig. 2). Each layer is exposed to different cyber-attacks, which are listed in Table 1.

Sensing Layer: Sikder et al. [2] stated that the purpose of the sensing layer is to perceive any phenomena in the device’s periphery and to acquire data from the physical world. This layer comprises multiple sensors of different kinds connected to


Fig. 2 Four-layer structure in IoT devices

Table 1 Different types of malware attacks in IoT architecture

Type of attack    Layer                                      Possible attack
Physical attack   Application layer, sensing layer           Side channel attack, virus, worms, Trojan horses, spyware, adware
Software attack   Application layer, data processing layer   Malware, data inconsistency, unauthorized access, DDoS
Network attack    Data processing layer, network layer       Traffic analysis attack, RFID spoofing, routing information attack, DDoS, data breach
Data attack       Application layer                          Tampering attack, malicious code injection, fake node injection

IoT devices through sensor hubs. Sensors are categorized as motion sensors, environmental sensors, and position sensors. Mohurle and Patil [4] described motion sensors as monitoring changes in the motion of a device, such as its linear and angular motion. Environmental sensors for light, pressure, moisture, heat, etc., are integrated into IoT devices to sense changes in the environment. Position sensors determine the physical position of a device; examples are the Global Positioning System (GPS) for navigation and magnetic sensors used as a digital compass.

Network Layer: Nandhini et al. [5] described the purpose of the network layer as transmitting the data collected by the sensing layer to other IoT devices. The network layer is implemented with different communication technologies (e.g., Wi-Fi, Bluetooth, LoRa) that allow data to flow between devices within the same network.


The most common attack on this layer is the denial-of-service (DoS) attack, since numerous nodes are involved in exchanging data. Other risks at the network layer include viruses, man-in-the-middle attacks, and threats to integrity and confidentiality.

Data Processing Layer: The data processing layer comprises the main data processing unit of IoT devices. It takes the information gathered by the sensing layer and analyzes it to make decisions based on the outcome [2]. In some IoT devices, the data processing layer also saves the results of previous analyses to improve the user experience. This layer may share the results of its processing with other connected devices through the network layer.

Application Layer: The application layer implements and presents the results of the data processing layer to serve the different uses of IoT devices. Sikder et al. [2] presented the application layer as a user-centric layer that executes various tasks for users. Applications include smart transportation, smart homes, personal care, healthcare, and so on. Table 1 lists the possible attacks on the IoT architecture.

3 Related Work

Waheed et al. [6] described malware as an abbreviation of malicious software. Xiao et al. [7] stated that IoT devices, along with their services and networks, are exposed to cyber-attacks, physical attacks, software attacks, and privacy leakage. Ngo et al. [3] noted the rise of malware attacks that target computers running Microsoft Windows in various ways. They also noted that malware is often created by copying source code according to online instructions, or as a variant of the same malicious code produced by the original malware author. As IoT has spread across most of the world, cybercriminals inject malware such as Aidra, Bashlite, and Mirai, using scanners that locate exposed ports and credentials on IoT devices. Attackers are drawn to IoT devices by their weak security design or implementation. IoT malware has several characteristic capabilities, such as performing DDoS attacks, targeting IoT services such as FTP, SSH, or Telnet, and performing brute-force attacks to gain access to IoT devices. Shoban et al. [8] collected system calls from IoT devices using the strace tool on Ubuntu. For pre-processing, the authors applied N-gram techniques and term frequency-inverse document frequency (TF-IDF). The extracted system-call sequences were classified as benign or malicious using a recurrent neural network (RNN). The method was tested on the IOTPOT dataset, where it achieved an accuracy of 98.31%. Aishahrani [9] proposed the CoLLaborative intruder detection system (CoLL-IoT), an interactive system that detects malware attacks on IoT devices. The interactive

Pragmatic Way of Analyzing Malware Attacks Detection in IoT Devices …


system consists of four layers, namely the IoT layer, network layer, fog layer, and cloud layer. All the layers work together for data analysis and monitoring. The proposed method was tested on the UNSW-NB15 dataset and achieved an accuracy of 98.35%, an F1-score of 97.39%, a precision of 96.36%, and a type I error of 3.61%. Alissa et al. [10] proposed a dwarf mongoose optimization with machine-learning-driven ransomware detection (DWOML-RWD) model, developed primarily for identifying and categorizing different ransomware attacks. For feature selection, the authors used an enhanced krill herd optimization algorithm with dynamic oppositional-based learning (QOBL). The proposed method was tested on a benchmark dataset and achieved an accuracy of 99.40%, an F1-score of 99.40%, an FNR of 0.60%, and an FPR of 0.60%. Riaz et al. [11] proposed a deep learning-based entity classification approach that consists of three steps: (i) data preprocessing using scaling, normalization, and denoising techniques, (ii) feature selection, and (iii) one-hot encoding followed by an entity classifier based on CNN and LSTM. The authors also evaluated KNN, SVM, LR, and fuzzy C-means techniques for malware classification. The proposed method was tested on the IoBT malware dataset and achieved an accuracy of 99.5%. Asam et al. [12] proposed a CNN-based IoT malware detection architecture (iMDA) that incorporates multiple feature learning blocks, such as edge exploration and smoothing, multi-path dilated convolutional operations, and channel squeezing and boosting in the CNN, to learn a diverse set of features. Table 2 presents a comparative analysis of the existing work mentioned in this paper.
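The N-gram and TF-IDF preprocessing reported for system-call traces in [8] can be illustrated with a minimal Python sketch. It is not the authors' implementation: the traces, the function names `ngrams` and `tfidf`, and the plain `tf * log(N / df)` weighting are assumptions chosen for illustration, and the RNN classifier that would consume these vectors is omitted.

```python
import math
from collections import Counter

def ngrams(calls, n=2):
    """Return the list of n-grams (as tuples) found in one system-call trace."""
    return [tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)]

def tfidf(traces, n=2):
    """Compute TF-IDF weights of system-call n-grams for every trace."""
    counts_per_trace = [Counter(ngrams(trace, n)) for trace in traces]
    # Document frequency: the number of traces in which each n-gram occurs.
    df = Counter()
    for counts in counts_per_trace:
        df.update(counts.keys())
    n_docs = len(traces)
    vectors = []
    for counts in counts_per_trace:
        total = sum(counts.values())
        vectors.append({
            gram: (count / total) * math.log(n_docs / df[gram])
            for gram, count in counts.items()
        })
    return vectors

# Hypothetical traces, such as one might parse from strace output.
traces = [
    ["open", "read", "write", "close"],
    ["open", "read", "read", "close"],
    ["socket", "connect", "write", "close"],
]
vectors = tfidf(traces, n=2)
```

An n-gram that occurs in only one trace, such as ("read", "write"), receives a higher weight than one shared by several traces, which is the property that makes TF-IDF features useful for separating benign from malicious call sequences.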

4 Analysis of Different Malware Attacks

A malware attack occurs when an attacker creates malicious software that is installed on another person's device without their knowledge, in order to access personal data or to damage the device, usually for financial gain. Table 3 shows some existing malware attacks. Chen et al. [13] and Ahn et al. [14] describe ransomware as one of the most dangerous of all malware attacks. In January 2020, Travelex, a major international foreign currency exchange, had its services suspended owing to a ransomware attack. A ransomware attack demanded that Richmond Community Schools pay $10,000 in bitcoin. In Warren, New Jersey, the network of Temple Har Shalom was breached and several machines were encrypted by the cybercriminals behind the Sodinokibi ransomware. In February 2020, the University of Maastricht paid US$220,000 to the hackers who had infected it with ransomware. In March 2020, ransomware hit the medical and military contractor Kimchuk. In April 2020, ransomware locked two law firms in Manitoba out of their systems. In May 2020, attackers stole data on up to 100,000 people at Interserve. In June 2020, attackers breached A1 Telekom, Austria's largest ISP. In July 2020, Athens ISD paid a ransom for the release of school data [15]. Ransomware has become a serious threat to the world, and serious actions are


M. Barman and B. Sharma

Table 2 Comparative analysis on existing work

Method | Dataset | Accuracy and performance | Limitation
RNN [8] | IOTPOT | Accuracy: 98.31% | Lack of multiclass classification on system calls
CoLL-IoT [9] | UNSW-NB15 | Accuracy: 98.35%; F1-score: 97.39%; Precision: 96.36%; Type I error: 3.61% | The values of FPR, FDR, FNR, FOR, and time complexity are not mentioned
DWOML-RWD [10] | Benchmark | Accuracy: 99.40%; F1-score: 99.40%; FNR: 0.60%; FPR: 0.60% | The algorithm was evaluated on a very small dataset and considered only one malware attack (ransomware)
CNN and LSTM [11] | IoBT | Accuracy: 99.5% | Lack of accuracy on a huge dataset
iMDA [12] | NA | Accuracy: 97.93%; F1-score: 93.94%; Precision: 98.64%; MCC: 87.96%; AUC-PR: 96.89%; AUC-ROC: 99.38% | The detection accuracy is low; feature selection techniques need to be considered to improve detection accuracy
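The measures compared in Table 2 all derive from the confusion matrix of a binary detector. The following Python sketch shows the standard definitions; the function name `detection_metrics` and the counts in the usage line are hypothetical and are not taken from any of the cited papers. The type I error is treated here as the false positive rate, which is the usual convention but an assumption about how [9] reports it.

```python
def detection_metrics(tp, fp, tn, fn):
    """Standard binary-classification measures from confusion-matrix counts.

    tp: malicious samples flagged as malicious
    fp: benign samples flagged as malicious
    tn: benign samples passed as benign
    fn: malicious samples passed as benign
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "fpr": fp / (fp + tn),  # false positive rate (type I error rate)
        "fnr": fn / (fn + tp),  # false negative rate (type II error rate)
    }

# Hypothetical counts for a detector evaluated on 1000 samples.
m = detection_metrics(tp=90, fp=10, tn=880, fn=20)
```

Because malware datasets are often imbalanced, a high accuracy can coexist with a noticeable FNR, which is why the limitation column of Table 2 notes when FPR or FNR is not reported.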

needed to control or reduce such malware attacks. Kanranja et al. [16] presented the effects of ransomware attacks. Table 4 shows some existing ransomware attacks that had a global impact and caused widespread damage.

5 Malware Security Issues in IoT

Aslan and Samet [44] described several obfuscation techniques that reduce the chances of detecting malware in a system. Malware can transparently bypass defense software that runs in kernel mode, such as antivirus software and Windows security, and some malware can multiply its variants exponentially, which makes it increasingly difficult to detect all the malware on a system with a single detection technique. Some obfuscation techniques are as follows:
• To conceal its presence on the host and hide the malicious code, the attacker ships encrypted code alongside the actual code; this is known as the encryption method and is used, for example, in ransomware attacks. Alzarooni [45] developed an extended version of the dynamic domain reduction (DDR) method for AAPL programs to mitigate encryption malware attacks. The objective of the DDR technique is to automatically find a domain of program inputs that can be used to generate test data for feasible program paths. To adapt this


Table 3 Different types of malware attacks

Malware attack | Year | Description
Virus [17, 18] | 1986 | Cybercriminals design a malicious program that infects legitimate software, so that the user runs the infected software and lets the virus spread through the computer without the user's knowledge. A virus must be executed before it can spread in the system. Some popular viruses are Creeper, Elk Cloner, ILOVEYOU, Code Red, Nimda, Slammer, Blaster, Welchia, etc.
Spyware [19, 20] | 1995 | The name combines "spy" (a secret agent) and "ware" (denoting a kind of malware). The attacker designs software that hides in the background on a computer and aims to steal the user's personal and confidential data, such as credit card details, passwords, and other sensitive data, and that may harm the user's device. Websites may also engage in spyware practices such as web tracking. Spyware threats include Cool Web Search (CWS), Gator (GAIN), 180search Assistant, ISTbar/AUpdate, Transponder (vx2), Internet Optimizer, BlazeFind, Hot as Hell, Advanced Keylogger, and TIBS Dialer
Ransomware [21, 22] | 1989 | The name combines "ransom" (a payment) and "ware" (denoting a kind of malware). In a ransomware attack, also referred to as a scareware attack, attackers use strong encryption algorithms to encrypt the victim's data and demand payment for the decryption key. The result is a loss of data, mostly temporary but in some cases permanent, together with unusual behavior in system operations and monetary loss. Examples are WannaCry, Ryuk, SamSam, Petya, TeslaCrypt, CryptoLocker, and the AIDS Trojan (PC Cyborg)
Trojan horses [23] | 1971 | The term Trojan derives from the story of the deceptive Trojan horse that led to the fall of the city of Troy in ancient Greek legend. Attackers deliver Trojans through social engineering, such as a malicious email attachment, a lure disguised as free software, videos, or music, seemingly legitimate advertisements, malvertising, or cross-site scripting. Some popular Trojan horses are Backdoor, Exploit, Rootkit, Trojan-Banker, and Trojan-Spy
Adware [24] | 2007 | Malware that automatically generates online advertisements. It earns revenue for its developer by showing advertisements on the screen
Worm [25] | 1975 | A worm is a self-contained piece of malware that replicates itself to spread to other computers or devices. Worms self-replicate without a host program and, within a short period, spread exponentially to infect more and more computers. Some examples of worms are the Morris Worm, Storm Worm, SQL Slammer, and Zotob
BotenaGo [26] | 2021 | The malware reaches its victims through two backdoor ports, 31412 and 19412. It listens on port 19412 to receive the victim's IP address. Once a connection carrying that information is established on the port, it loops through its mapped exploit functions and executes them against the given IP
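From a defender's point of view, the two backdoor ports listed for BotenaGo in Table 3 suggest a simple first check: test whether anything on a device accepts TCP connections on them. The Python sketch below is illustrative only. The helper names, the timeout, and the idea of probing a single host are assumptions, and an open port is merely a hint that warrants further investigation, not proof of infection.

```python
import socket

# Backdoor ports named for BotenaGo in Table 3.
BOTENAGO_PORTS = (31412, 19412)

def is_port_open(host, port, timeout=0.5):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0

def suspicious_listeners(host, ports=BOTENAGO_PORTS):
    """List the given ports on which the host accepts TCP connections."""
    return [port for port in ports if is_port_open(host, port)]
```

A call such as `suspicious_listeners("192.0.2.10")`, where the address is a documentation placeholder, would return the subset of the two ports that accept connections on a device the operator is authorized to scan.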


Table 4 Different types of ransomware attacks

Ransomware attack | Year | Description
WannaCry [27–29] | 2017 | WannaCry is ransomware that exploits flaws in the Microsoft Windows Server Message Block (SMB) protocol and has a self-propagating mechanism that lets it infect other machines. WannaCry is packaged as a dropper, a self-contained program that extracts the encryption/decryption application, files containing encryption keys, and the Tor communication program
Cerber [30–32] | 1995 | Cerber is ransomware leased to attackers on the dark web as ransomware-as-a-service (RaaS). It infects a system via malvertising, phishing emails, or malware-hosting websites, encrypts data when the victim clicks the link or visits the website, and then holds the data hostage, demanding a ransom payment in return for decryption. Once it has encrypted the data on the system, it displays a ransom note on the desktop background. The attackers used a 2048-bit RSA key together with 256-bit AES encryption in CBC mode to encrypt the data on the system
CryptoLocker [33, 34] | 2013 | The attackers withheld the decryption keys until a ransom was paid, via bitcoin or MoneyPak, within 72 h; otherwise the decryption key would supposedly be destroyed and recovery of the data would be effectively impossible. Initially, CryptoLocker attackers targeted business people through spam emails posing as customer complaints addressed to the recipients. The Trojan version of CryptoLocker disguises itself as UPS and Xerox PDF attachments, while the newer worm form can traverse internal networks and travels on USB drives. CryptoLocker's encryption specifically targets professional file types, such as Word, Excel, Photoshop, and InDesign, while disregarding music and video files
Locky [35–37] | 2016 | Locky, introduced in February 2016, uses a hybrid cryptosystem, that is, a combination of a key encapsulation scheme (public-key cryptosystem) and a data encapsulation scheme (symmetric-key cryptosystem). Locky ransomware is distributed through spam campaigns and exploit kits such as Rig. The malicious program scans each drive of a system, including network drives, and encrypts them using the AES and RSA algorithms. To find the command-and-control (C&C) server, Locky uses a domain generation algorithm (DGA)
Petya [38, 39] | 2016 | Petya is a ransomware family that infects a computer system by attacking the master file table (MFT); encrypting it makes the entire disk inaccessible, although the actual data are not encrypted. Petya was released in 2016 and spread through a fake job application message linking to an infected file stored in Dropbox. Only Windows computers are victims of Petya ransomware. Petya needs the user's consent to make admin-level changes. Once the user agrees, the system reboots automatically with a fake system crash screen, which allows Petya to encrypt the drive unnoticed. After successfully encrypting the data, it shows a ransom notice. The attackers found that Petya was not very successful; later they developed a modified version known as NotPetya, which is more dangerous and capable of spreading automatically
NotPetya [39, 40] | 2017 | NotPetya first spread in Ukraine through a backdoor in accounting software and further by exploiting EternalBlue and EternalRomance, vulnerabilities in the Windows SMB protocol. NotPetya is a strong ransomware that encrypts not only the MFT but also the rest of the files on the drive. The encryption process damages the files in such a manner that they cannot be recovered even after paying the ransom. In terms of lateral movement, the malware is initially injected through M.E.Doc, a Ukrainian finance application, and then spreads as a DLL file over the internal network using different lateral movement methods


Table 4 (continued)

Ransomware attack | Year | Description
Ryuk [41] | 2018 | Ryuk infects machines through phishing emails or drive-by downloads. It uses a dropper, which extracts a Trojan on the victim's machine and sets up a persistent network connection. Attackers can then use Ryuk as the basis for an advanced persistent threat (APT), installing additional tools such as keyloggers and performing privilege escalation and lateral movement. Ryuk is installed on each additional system the attackers gain access to. Once the attackers have installed the Trojan on as many machines as possible, they activate the ransomware and encrypt the data. In a Ryuk-based attack campaign, the ransomware is only the last stage of the attack, after the attackers have already done damage and stolen the files they need
GandCrab [42, 43] | 2018 | GandCrab encrypts files on a user's machine and demands a ransom. It was used to launch ransomware-based extortion attacks that threatened to expose victims and their private habits. There are several versions, all of which target Windows machines. Free decryptors are now available for most versions of GandCrab. A typical spam campaign carries malware files with subject lines such as "Greeting Card", "Jackpot winner", "Invalid Account", "Congratulations for winning lottery", etc., to persuade the victim to open the attached file. The JavaScript is heavily obfuscated with base64 and URL encoding. It uses bitsadmin.exe and PowerShell.exe to download artifacts from the malware hosting domain. The initial payload is only a downloader, which copies itself, terminates itself, and runs from another location. This downloader fetches three new components: a mail spammer, a Monero miner, and the GandCrab ransomware

technique to low-level executable malware programs, the author first developed an extension of the DDR algorithm for analyzing executable malware programs (low-level code). The author then showed that the DDR technique is correct in producing an under-approximate solution (a subset) from the initial domains of the program input variables. With automated test data generation using the extended DDR technique, a complete analysis of feasible program paths can be performed. The technique also automatically provides safe under-approximate program inputs, which malware detection systems can use to trigger and capture malicious behaviors.
• In the oligomorphic method, attackers use different keys for encrypting and decrypting the malware payload, which makes the malware harder to detect. Szor [46] shows how malicious code attacks on various platforms. The author classifies malware strategies for infection, in-memory operation, self-protection, payload delivery, and exploitation; covers the identification of, and response to, code obfuscation threats such as encryption, polymorphism, and metamorphism; provides empirical approaches for analyzing malicious code; and addresses these issues with technical defenses such as scanning, code emulation, disinfection, inoculation, integrity checking, sandboxing, honeypots, and behavior blocking.


• In the polymorphic method, the malware likewise uses different keys to encrypt and decrypt, as in the oligomorphic method. In addition, the encrypted payload contains several copies of the decryptor and can be encrypted in layers. The polymorphic method is harder to detect than the oligomorphic method. Stallings et al. [47] describe a polymorphic virus as one that creates copies during replication that are functionally equivalent but have distinctly different bit patterns, in order to defeat programs that scan for viruses. In this case, the "signature" of the virus varies with each copy. To achieve this variation, the virus may randomly insert superfluous instructions or interchange the order of independent instructions. A more effective approach is to use encryption, following the strategy of the encryption virus. The portion of the virus that is responsible for generating keys and performing encryption/decryption is referred to as the mutation engine, and the mutation engine itself is altered with each use. Wong and Stamp [48] describe the oligomorphic method as follows: unlike encrypted viruses, oligomorphic viruses do change their decryptors in new generations. Win95/Memorial could build 96 different decryptor patterns. Detecting the virus on the basis of the decryptor's code was therefore possible but not practical. Most products instead tried to deal with the virus by dynamic decryption of the encrypted code, so detection is still based on the constant code of the decrypted virus body. Interestingly, some products that the authors tested could not detect all instances of Memorial, because such viruses must be examined down to their smallest details to find and understand the oligomorphic decryptor generator. Without such careful manual analysis, slow oligomorphic virus techniques are difficult to detect reliably. Clearly, they are a great challenge for automated virus analysis.
• Alam et al. [49] described how attackers use the metamorphic method to change the opcodes on every iteration during malicious code execution. Detection becomes difficult because each new copy is generated with a different signature. Aslan and Samet [44] noted that signature-based and heuristic-based approaches can detect malware attacks, but the signature-based approach fails to detect unknown malware. On the other hand, behavior-based, model checking-based, and cloud-based approaches perform well for unknown and complicated malware, and deep learning-based, mobile device-based, and IoT-based approaches are also emerging to detect some portion of known and unknown malware. However, no approach can detect all malware in the wild. This shows that building an effective malware detection method is a very challenging task and that there is a large gap for new studies and methods. Stallings et al. [47] describe a metamorphic virus as one that, like a polymorphic virus, mutates with every infection. The difference is that a metamorphic virus rewrites itself completely at each iteration, using multiple transformation techniques, which increases the difficulty of detection. Metamorphic viruses may change their behavior as well as their appearance. Wong and Stamp [48] define metamorphic viruses as body-polymorphic: they do not have a decryptor or a constant virus body.


However, they can create new generations that look different. Metamorphic viruses do not use a constant data area filled with string constants; instead, they have one single code body that carries data as code.
• A code protection method applies numerous countermeasures to prevent the malware from being analyzed correctly. For example, it makes frequent changes to the system to hide from the detection system.

Soliman et al. [50], Hassan [51], and Kuzlu et al. [29] mentioned some other security issues in IoT devices, as follows:
• Preserving end-to-end communication between two IoT devices without third-party intrusion is considered a significant challenge in the heterogeneous IoT architecture. Moreover, because of their low computational and power resources, IoT devices cannot run protocols such as Transport Layer Security (TLS), which provides secure communication over the web to prevent forgery and eavesdropping, or Internet Protocol Security (IPSec), which authenticates and encrypts the packets of data sent.
• Data privacy and security remain the single largest issue in today's interconnected world. Data on IoT devices must be protected from malicious software, since access to this data may lead to disastrous consequences. Data is constantly being harnessed, transmitted, stored, and processed by large companies using a wide array of IoT devices, such as smart televisions, speakers and lighting systems, connected printers, air conditioning systems, and smart thermostats. Privacy is a major challenge in IoT devices because private data has become more vulnerable to attackers who want to access data about the environment and the user. IoT devices in smart homes and smart vehicles log information about the environment and the users, which breaches user privacy.
• Preserving data from unauthorized persons who steal data and use it for their own profit. Although the data held in a single smart device is not dangerous on its own, its depth could certainly leave consumers and organizations in dire straits. The data could quickly reveal everything about a user, which gives criminals all the facts they need to exploit the user's identity. Moreover, entertainment devices gather a great deal of data and even watch the user, which increases the chances of identity theft and misuse. The security of these systems is tied to digital identities


which simply means that a robust identity and access management (IAM) system should be a fundamental part of all IoT devices and networks to mitigate any risk. Privacy-related issues can thus lead to compromised identities, which in turn affect both consumers and service providers.
• Protecting the IoT device against distributed denial-of-service (DDoS) attacks, in which the targeted device is flooded with requests until its resources are exhausted, requires an intrusion detection system, but the implementation of such systems in IoT is currently still under research.
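The obfuscation techniques discussed in this section share one effect on signature-based detection: the stored bytes differ from copy to copy while the decoded body stays constant. The following Python sketch illustrates this with a harmless placeholder string. The one-byte XOR encoding, the key values, and the helper names are assumptions chosen for brevity; they stand in for the far more elaborate encryption and mutation engines described in [47, 48].

```python
import hashlib

def xor_encode(data: bytes, key: int) -> bytes:
    """XOR every byte with a one-byte key; applying it twice restores the input."""
    return bytes(b ^ key for b in data)

def signature(data: bytes) -> str:
    """A static signature, modeled here as the SHA-256 digest of the raw bytes."""
    return hashlib.sha256(data).hexdigest()

# Harmless placeholder standing in for a constant code body.
body = b"harmless placeholder standing in for a constant code body"
keys = (0x21, 0x5A, 0xC3)

# The scanner only knows the signature of the copy encoded with the first key.
known_signatures = {signature(xor_encode(body, keys[0]))}

variants = [xor_encode(body, key) for key in keys]

# Static matching recognizes only the copy whose exact bytes were seen before.
static_hits = [signature(v) in known_signatures for v in variants]

# Decoding first and then comparing the constant body recognizes every copy.
decoded_hits = [xor_encode(v, key) == body for v, key in zip(variants, keys)]
```

Here `static_hits` is `[True, False, False]` while every entry of `decoded_hits` is `True`, which mirrors the observation above that detection has to rely on the constant code of the decrypted body rather than on the stored bytes. Metamorphic code removes even that constant body, which is why behavior-based approaches are needed there.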

6 Conclusion

With the rapid rise of attacks on IoT devices, specialists are turning to artificial intelligence (AI) to harden these devices intelligently and in real time. Several AI techniques and applications have been introduced to investigate cybercrime, but in many cases these models are not yet common in commercial applications and remain the subject of demanding research. Nevertheless, the discussed models are promising and detect known attacks in a very short time. Machine learning (ML) and deep learning (DL) are the best-known AI techniques that researchers apply in their work. Together, IoT and ML deliver insights otherwise hidden in data for rapid, automated responses and improved decision making. ML for IoT can be used to project future trends, detect anomalies, and augment intelligence by ingesting image, video, and audio. ML can help to uncover the hidden patterns in IoT data by analyzing massive volumes of data using sophisticated algorithms. ML inference can supplement or replace manual processes with automated systems that use statistically derived actions in critical processes. Deep learning is a type of ML and AI that imitates the way humans gain certain types of knowledge. Deep learning is an important element of data science, which includes statistics and predictive modeling. It is very beneficial to data scientists who are tasked with collecting, analyzing, and interpreting large amounts of data; deep learning makes this process faster and easier. This paper has presented malware issues in IoT devices. Some reasons to pursue research on mitigating malware issues in IoT devices using deep learning are as follows:
• The program builds the feature set by itself without supervision. Unsupervised learning is not only faster, but it is usually more accurate.
• Deep learning can create complex statistical models directly from its own iterative output, so it can build accurate predictive models from large quantities of unlabeled, unstructured data. This is important as the Internet of Things (IoT) continues to become more pervasive, because most of the data humans and machines create is unstructured and is not labeled.
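As a small illustration of what detecting anomalies in unlabeled IoT data means, the sketch below flags outliers in a series of per-minute packet counts using the median absolute deviation (MAD). This is a simple statistical stand-in, not a deep learning model; the function name, the sample values, and the conventional 3.5 threshold with the 0.6745 scaling constant are assumptions made for the example.

```python
import statistics

def mad_anomalies(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds the threshold."""
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        # All values are (almost) identical; nothing can be ranked as an outlier.
        return []
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - median) / mad > threshold
    ]

# Hypothetical packets-per-minute readings from one IoT device.
rates = [12, 11, 13, 12, 14, 11, 12, 250, 13, 12]
flagged = mad_anomalies(rates)
```

No labels are required: the burst at index 7 is flagged purely because it deviates from the device's own typical behavior. A learned model would replace the fixed statistic with features extracted from the data, but the unlabeled setting is the same.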


References

1. Mohanta, B.K., Jena, D., Satapathy, U., Patnaik, S. (2020). Survey on IoT Security: Challenges and Solution using Machine Learning, Artificial Intelligence and Blockchain Technology. Internet of Things, 100227.
2. Sikder, A.K., Petracca, G., Aksu, H., Jaeger, T., Uluagac, A.S. (2018). A survey on sensor-based threats to internet-of-things (iot) devices and applications. arXiv preprint arXiv:1802.02041.
3. Ngo, Q.D., Nguyen, H.T., Nguyen, L.C., Nguyen, D.H. (2020). A Survey of IoT malware and detection methods based on static features. ICT Express.
4. Mohurle S, Patil M (2017) A brief study of wannacry threat: Ransomware attack. Int J Adv Res Comput Sci 8(5):1938–1940
5. Nandhini, R., Aparna, R., Srilakshmi, P. (2018). Study on Security issues in Internet of Things. International Conference on Social Impact of Internet of Things (IoT).
6. Waheed, N., He, X., Usman, M. (2020). Security & Privacy in IoT Using Machine Learning & Blockchain: Threats & Countermeasures. arXiv preprint arXiv:2002.03488.
7. Xiao L, Wan X, Lu X, Zhang Y, Wu D (2018) IoT security techniques based on machine learning: How do IoT devices use AI to enhance security? IEEE Signal Processing Magazine 35(5):41–49
8. Shobana, M., Poonkuzhali, S. (2020). A novel approach to detect IoT malware by system calls using Deep learning techniques. In 2020 International Conference on Innovative Trends in Information Technology (ICITIIT), pp. 1–5. IEEE.
9. Alshahrani HM (2021) Coll-iot: A collaborative intruder detection system for internet of things devices. Electronics 10(7):848
10. Alissa, K.A., Elkamchouchi, D.H., Tarmissi, K., Yafoz, A., Alsini, R., Alghushairy, O., Mohamed, A., Al Duhayyim, M. (2022). Dwarf Mongoose Optimization with Machine-Learning-Driven Ransomware Detection in Internet of Things Environment. Applied Sciences, 12(19), 9513
11. Riaz S, Latif S, Usman SM, Ullah SS, Algarni AD, Yasin A, Anwar A, Elmannai H, Hussain S (2022) Malware Detection in Internet of Things (IoT) Devices Using Deep Learning. Sensors 22(23):9305
12. Asam M, Khan SH, Akbar A, Bibi S, Jamal T, Khan A, Ghafoor U, Bhutta MR (2022) IoT malware detection architecture using a novel channel boosted and squeezed CNN. Sci Rep 12(1):1–12
13. Chen, Q., Islam, S.R., Haswell, H., Bridges, R.A. (2017). Automated ransomware behavior analysis: Pattern extraction and early detection. In International Conference on Science of Cyber Security (pp. 199–214). Springer, Cham.
14. Ahn, G.J., Doupe, A., Zhao, Z. and Liao, K. (2016). Ransomware and cryptocurrency: partners in crime. In Cybercrime Through an Interdisciplinary Lens (pp. 119–140). Routledge.
15. Teceze Digital Innovation & Excellence, https://www.teceze.com/how-to-prevent-ransomware-attack-in-2020, last accessed 2022/12/09.
16. Kanranja EM, Masupe S, Jaffery MG (2020) Analysis of internet of things malware using image texture features and machine learning techniques. Internet of Things 9:100153
17. Aycock, J. (2006). Computer viruses and malware. Springer Science & Business Media.
18. Iliev A, Kyurkchiev N, Rahnev A, Terzieva T (2019) Some models in the theory of computer viruses propagation. LAP LAMBERT Academic Publishing, Saarbrucken, Germany
19. DeNardis, L. (2007). A history of internet security. In The history of information security, pp. 681–704. Elsevier Science BV.
20. Spyware Wikipedia, https://en.wikipedia.org/wiki/Spyware, last accessed on 2022/12/09.
21. Humayun, M., Jhanjhi, N.Z., Alsayat, A., Ponnusamy, V. (2020). Internet of things and ransomware: Evolution, mitigation and prevention. Egyptian Informatics Journal.
22. Ransomware Wikipedia, https://en.wikipedia.org/wiki/Ransomware, last accessed on 2022/12/09.
23. Trojan Horse Wikipedia, https://en.wikipedia.org/wiki/Trojan_horse_(computing), last accessed on 2022/12/09.


24. Adware Wikipedia, https://en.wikipedia.org/wiki/Adware, last accessed on 2022/12/09.
25. Computer Worm Wikipedia, https://en.wikipedia.org/wiki/Computer_worm, last accessed on 2022/12/09.
26. Fouzas, K.P. (2022). Evaluation of the open source HELK SIEM through a series of simulated attacks.
27. Mohurle S, Patil M (2017) A brief study of wannacry threat: Ransomware attack 2017. Int J Adv Res Comput Sci 8(5):1938–1940
28. WannaCry Wikipedia, https://en.wikipedia.org/wiki/WannaCry_ransomware_attack, last accessed on 2022/12/09.
29. Kuzlu M, Fair C, Guler O (2021) Role of artificial intelligence in the Internet of Things (IoT) cybersecurity. Discover Internet of Things 1(1):1–14
30. Kurniawan A, Riadi I (2018) Detection and analysis cerber ransomware based on network forensics behavior. International Journal of Network Security 20(5):836–843
31. Ganorkar SS, Kandasamy K (2017) Understanding and defending crypto-ransomware. ARPN Journal of Engineering and Applied Sciences 12(12):3920–3925
32. Butt, U.J., Abbod, M.F., Kumar, A. (2020). Cyber threat ransomware and marketing to networked consumers. In Handbook of research on innovations in technology and marketing for the connected consumer, pp. 155–185. IGI Global.
33. Liao, K., Zhao, Z., Doupé, A., Ahn, G.J. (2016). Behind closed doors: measurement and analysis of CryptoLocker ransoms in Bitcoin. In 2016 APWG symposium on electronic crime research (eCrime) (pp. 1–13). IEEE.
34. Hansberry, A., Lasse, A., Tarrh, A.: Cryptolocker: 2013's most malicious malware. Retrieved February, 9, 2017.
35. Almashhadani AO, Kaiiali M, Sezer S, O'Kane P (2017) A multi-classifier network-based crypto ransomware detection system: A case study of locky ransomware. IEEE Access 7:47053–47067
36. Prakash, K.P., Nafis, T. and Biswas, S.S. (2017). Preventive Measures and Incident Response for Locky Ransomware. International Journal of Advanced Research in Computer Science, 8(5).
37. Locky Wikipedia, https://en.wikipedia.org/wiki/Locky, last accessed on 2022/12/09.
38. Fayi, S.Y.A. (2018). What Petya/NotPetya ransomware is and what its remidiations are. In Information technology-new generations (pp. 93–100). Springer, Cham.
39. Watson, F.C., CISM, C., ECSA, A. (2017). Petya/NotPetya Why It Is Nastier Than WannaCry and Why We Should Care. ISACA, 6, 1–6.
40. Adamov, A., Carlsson, A. (2017). The state of ransomware. Trends and mitigation techniques. In EWDTS, pp. 1–8.
41. Budke CA, Enko PJ (2020) Physician Practice Cybersecurity Threats: Ransomware. Mo Med 117(2):102
42. Lemmou Y, Souidi EM (2018) Inside gandcrab ransomware. International Conference on Cryptology and Network Security. Springer, Cham, pp 154–174
43. Luntovskyy, A. and Gütter, D. (2022). Highly-distributed systems: IoT, robotics, mobile apps, energy efficiency, security. Springer Nature.
44. Aslan ÖA, Samet R (2020) A comprehensive review on malware detection approaches. IEEE Access 8:6249–6271
45. Alzarooni, K. (2012). Malware variant detection. PhD Dissertation, Department of Computer Science, University College London, London, UK.
46. Szor, P. (2012). The Art of Computer Virus Research and Defense. Upper Saddle River, NJ, USA, Pearson Education.
47. Stallings, W., Brown, L., Bauer, M.D., Howard, M. (2012). Computer security: principles and practice. Upper Saddle River, Pearson Education.
48. Wong W, Stamp M (2006) Hunting for metamorphic engines. J Comput Virol 2(3):211–229
49. Alam, S., Horspool, R.N., Traore, I., Sogukpinar, I. (2015). A framework for metamorphic malware analysis and real-time detection. Computers & Security, 48, 212–233.


50. Soliman, S.W., Sobh, M.A. and Bahaa-Eldin, A.M. (2017).Taxonomy of malware analysis in the IoT. In 2017 12th International Conference on Computer Engineering and Systems (ICCES), pp. 519–529. IEEE. 51. Hassan NA (2019) Ransomware families. In: Ransomware revealed. Apress, Berkeley, CA, pp 47–68

Network Security Risks, Challenges, and Solutions for Underwater Wireless Sensor Network’s Trusted Node-to-Node Communication: A Survey

D. Jocil and R. Vadivel

Abstract Underwater wireless sensor networks (UWSNs) are underwater networked systems and a subclass of wireless sensor networks. They have attracted the interest of many academics and researchers because of their broad research scope and recent developments. UWSNs have gained attention for their essential role in applications such as environmental monitoring, disaster prevention, pollution monitoring, oil/gas spill detection, mine detection, and assisted navigation. Underwater wireless communication remains challenging because of the unique and harsh conditions that characterize underwater channels. The harsh underwater environment and its peculiar characteristics also expose the network to dangerous attacks and threats, and they give attackers an advantage in stealing information during communication. Security and reliability are the two essential requirements for secure, dependable UWSN communication in the face of internal and external attacks. To ensure secure data transfer in UWSNs, academics and researchers are concentrating on internal and external attacks. This article presents a survey of UWSN security threats, requirements, issues, and solutions.

Keywords UWSN security · Malicious attacks · Requirements · Challenges · Solutions

D. Jocil (B) · R. Vadivel
Department of Information Technology, Bharathiar University, Coimbatore, Tamil Nadu 641046, India
e-mail: [email protected]
R. Vadivel
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 686, https://doi.org/10.1007/978-981-99-2100-3_53

1 Introduction

Wireless sensor networks include terrestrial and underwater subclasses (TWSNs and UWSNs, respectively). Underwater wireless sensor networks, more commonly known as UWSNs, are used to analyze and monitor the underwater environment, enabling and facilitating the evaluation of natural resources in marine areas. Examples of UWSN applications include marine data collection, disaster prevention, underwater resource discovery, personnel rescue, and military operations [1]. Limited power, high propagation delay, limited bandwidth, variable speed, and path loss make UWSNs vulnerable to severe attacks [2]. Water covers roughly 71% of the Earth's surface, but less than 10% of it has been explored [3]. For this reason, researchers and academics are paying attention to UWSN studies. Signal attenuation, salinity, temperature, low sound speed (1500 m/s), background noise, and multipath scattering are a few of the factors that make the underwater environment peculiar [4]. Reliable cluster-based secure routing is a popular technique in underwater sensor networks. The challenge for researchers is to defend trusted node-to-node communication from attackers [5].
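To make the "high propagation delay" point concrete, the following minimal sketch compares the one-way delay of an acoustic link at the 1500 m/s sound speed quoted above with a radio link of the same length. The radio speed (approximately the speed of light) and the 1500 m link distance are assumptions chosen for illustration; they do not come from the survey.

```python
# One-way propagation delay: acoustic underwater link vs. terrestrial radio link.

SOUND_SPEED_WATER_M_S = 1500.0  # nominal sound speed quoted in the text
RADIO_SPEED_M_S = 3.0e8         # assumption: roughly the speed of light


def propagation_delay(distance_m: float, speed_m_s: float) -> float:
    """Return the one-way propagation delay in seconds."""
    if distance_m < 0 or speed_m_s <= 0:
        raise ValueError("distance must be >= 0 and speed must be > 0")
    return distance_m / speed_m_s


if __name__ == "__main__":
    link_m = 1500.0  # hypothetical node-to-node distance
    acoustic = propagation_delay(link_m, SOUND_SPEED_WATER_M_S)
    radio = propagation_delay(link_m, RADIO_SPEED_M_S)
    print(f"acoustic delay: {acoustic:.3f} s")
    print(f"radio delay:    {radio * 1e6:.1f} microseconds")
    print(f"acoustic/radio: {acoustic / radio:.0f}x")
```

Under these assumptions the acoustic delay is several orders of magnitude larger, which is one reason why handshake-heavy security protocols designed for terrestrial networks tend to be costly underwater.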

2 UWSN Architecture

The UWSN system architecture shown in Fig. 1 consists of a base station, surface buoys, radio or acoustic links, cluster heads (CHs), and sensor nodes. The sensor nodes sense data, process it, and transfer the gathered information. The cluster heads, acting as cluster connecting points, forward the gathered information to the buoys on the water's surface. The surface buoys act as hubs, and the base station on land remotely controls the underwater sensor nodes by controlling the surface buoys of the acoustic wireless communication network.

Fig. 1 UWSN architecture

3 UWSN Security Threats and Requirements

3.1 Security Threats

Many potential security threats and challenges remain unaddressed in underwater wireless sensor networks (UWSNs). Many research articles emphasize deploying new approaches and techniques to improve UWSN communication. Even so, the network environment is susceptible to attacks on layers such as the physical, data link, network, and application layers. Because of the insecure working environment, underwater acoustic sensor channels suffer from many limitations, such as transmission errors, physical vulnerability, transmission loss, dynamic network topology, limited bandwidth, variable propagation delay, ambient noise, multipath, and the Doppler effect [6]. UWSN threats and attacks are categorized as passive or active, as shown in Fig. 2.

Fig. 2 UWSNs’ attack classification

Passive Attacks. Passive attacks result from unauthorized eavesdropping on the communication path. As shown in Fig. 3, the eavesdropper monitors communication exchanges and collects data without the knowledge of the sender or receiver. The primary issues with passive attacks are eavesdropping, interference, message distortion, imitation, and leakage of secret information. Eavesdroppers use underwater microphones and hydrophones to monitor packets broadcast on the communication channel. Because acoustic channels are open and vulnerable, an attacker can easily detect packet exchanges, identify where data is being moved, influence node stations, and follow packet movements. The most effective way to prevent these attacks is encryption, which makes it difficult for attackers to obtain any information during communication [7].

Fig. 3 Passive attack

Active Attacks. The objective of active attacks is to capture network data and to insert, modify, destroy, or remove data during communication, as shown in Fig. 4. A threat launched from an insider node may cause substantial damage to the network. According to previous research, insider attacks are more damaging than outsider attacks and are harder to defeat [8]. Trust management techniques, authentication protocols, encryption techniques, and trusted node-to-node cooperative routing techniques can be used to detect and avoid these attacks [9]. Active attacks can be classified by their objectives as follows.

Fig. 4 Active attack


Node compromise attacks. In UWSNs, the sensor nodes are placed in harsh, hostile water conditions. It is not possible to ensure the safety of every one of the tens or hundreds of nodes spread over a wide area. Attackers can capture nodes and then decrypt and modify the data stored in their memory. The compromised nodes are then reinserted into the network as legitimate nodes to monitor traffic, steal data, and interfere with network communication, doing serious harm [9].

Repudiation attacks. In a repudiation attack, malicious nodes try to deny their involvement in a specific operation or communication with another node. A malicious user can falsify authorship information by supplying false user credentials. The network is then unable to trace the malicious behavior back to the attacker.

DoS attacks. A Denial-of-Service (DoS) attack, which prevents authorized nodes on the network from accessing resources, can be carried out in several ways. The attacker's goal is to deny legitimate nodes access to network services. DoS attacks interrupt network protocols, lower the overall availability of the communication network, and are especially harmful because they are difficult to identify [10].

3.2 Security Requirements

UWSNs need the same essential security services as TWSNs, comprising the following requirements.

Confidentiality. Confidentiality must be guaranteed when data packets are exchanged between UWSN nodes: the exchanged packets must not be accessible to an unauthorized party. Privacy should cover user information as well as MAC and routing protocol information. In [11], entropy-based ant colony optimization (E-ACO) is presented for secure route selection in UWSNs. With 500 nodes, it provides a higher Packet Delivery Ratio (0.89) than existing approaches such as FRR-MFO, SAPDA, and IACO.

Integrity. Integrity means protecting data in transit from modification, corruption, or removal by unauthorized nodes through a malicious attack. An inter-cluster authentication technique based on cryptography can protect packets in delivery from unauthorized users. The Trusted Third Party (TTP) plays a vital part in reviewing all critical transactions between the source and the destination.

Availability. Availability guarantees that data are available to an authorized user on demand. The UWSN must remain sufficiently healthy to provide services even when the system is under attack. Redundancy, disaster recovery, monitoring, server clustering, and continuity-of-operations planning can provide availability for UWSNs. Detecting and preventing DoS attacks is an important way to avoid heavy losses in case of natural disasters [12].


Authenticity. Incoming packets must be authenticated to ensure that they were sent by a valid neighbor node, so that packets can be exchanged in a protected way. Node authentication schemes, cooperative authentication schemes, and trusted secure identity authentication schemes are popular techniques for providing trusted Node-to-Node (N2N) cooperative authentication [12].

Non-repudiation. A node cannot deny having carried out specific tasks, such as sending or receiving data. Digital signatures are used to achieve non-repudiation in UWSN communications. All packets exchanged with the sink are encrypted and verified using a special key. SecFUN, Quartz, BLS, and ZSS are a few of the signature-based schemes used in existing system models for authentication [13, 14].
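As a rough illustration of the integrity and authenticity requirements above, the sketch below tags each packet with an HMAC-SHA256 computed under a key shared by two neighbor nodes. This is an assumption made for illustration only: HMAC is a symmetric message authentication code, not one of the signature schemes named in the text (SecFUN, Quartz, BLS, ZSS), and because both nodes hold the same key it does not provide non-repudiation. The packet fields (`sender_id`, `seq`, `payload`) are hypothetical.

```python
import hashlib
import hmac


def sign_packet(key: bytes, sender_id: str, seq: int, payload: bytes) -> bytes:
    """Return an HMAC-SHA256 tag over the sender ID, sequence number and payload."""
    header = f"{sender_id}|{seq}|".encode("utf-8")
    return hmac.new(key, header + payload, hashlib.sha256).digest()


def verify_packet(key: bytes, sender_id: str, seq: int,
                  payload: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = sign_packet(key, sender_id, seq, payload)
    return hmac.compare_digest(expected, tag)


if __name__ == "__main__":
    shared_key = b"hypothetical-pairwise-key"  # assumed to be pre-distributed
    tag = sign_packet(shared_key, "node-7", 42, b"temp=11.5")
    print(verify_packet(shared_key, "node-7", 42, b"temp=11.5", tag))  # genuine
    print(verify_packet(shared_key, "node-7", 42, b"temp=99.9", tag))  # tampered
```

Including the sequence number in the tagged data lets a receiver that tracks sequence numbers reject replayed packets; how keys are distributed and refreshed is the key management problem discussed in Sect. 5.1.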

4 UWSN Characteristics and Challenges

4.1 Unsecure UWSN Environment

Nodes in an underwater architecture are usually installed in opaque and hostile surroundings, which gives adversaries the opportunity to capture and damage communication nodes. It is practically impossible to physically protect all of them [15, 16]. The secrets held by compromised nodes are likely to leak; this should be handled by logically removing the compromised nodes through rekeying the whole network [17].

4.2 Imperfect Bandwidth

Many researchers have studied network communication through simulation analysis and real experiments. According to previous studies, many issues still need to be addressed, chiefly in the Medium Access Control (MAC) protocol at the data link layer. The main objective of the MAC protocol is to coordinate the access of UWSN nodes to the shared channel; it should also ensure that valid data are sent and reach their destination efficiently [18].

4.3 Unreliable Data Communication

Traditional positioning and localization systems do not work reliably in the underwater environment. As a result, node locations and the network topology are disturbed by underwater conditions, which causes unreliable data transmission. Underwater noise also impairs communication, affecting the reliability of data transfer [19].


4.4 Communication Channel and Transmission Range

The operating frequency of underwater acoustic communication differs with the communication range. A distinct feature of the underwater environment is that signal absorption depends on water depth. To reduce the effects of signal absorption, the frequency should be kept low. As the transmission range becomes longer, the possibility of disruption and of frequent data collisions becomes a challenging concern [19].
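The link between frequency and absorption can be illustrated with Thorp's empirical formula, which is widely used in the underwater acoustics literature to approximate the absorption coefficient of seawater in dB/km for a frequency given in kHz. Using this formula is an assumption made here; the survey itself does not name a specific absorption model, and the formula ignores depth, temperature, and salinity effects.

```python
def thorp_absorption_db_per_km(freq_khz: float) -> float:
    """Thorp's empirical absorption coefficient in dB/km (frequency in kHz)."""
    if freq_khz <= 0:
        raise ValueError("frequency must be positive")
    f2 = freq_khz ** 2
    return (0.11 * f2 / (1.0 + f2)
            + 44.0 * f2 / (4100.0 + f2)
            + 2.75e-4 * f2
            + 0.003)


def absorption_loss_db(freq_khz: float, distance_km: float) -> float:
    """Total absorption loss in dB over a path of the given length."""
    return thorp_absorption_db_per_km(freq_khz) * distance_km


if __name__ == "__main__":
    # Hypothetical carrier frequencies, chosen only to show the trend.
    for f in (1.0, 10.0, 50.0, 100.0):
        alpha = thorp_absorption_db_per_km(f)
        print(f"{f:6.1f} kHz: {alpha:8.3f} dB/km, "
              f"{absorption_loss_db(f, 5.0):8.2f} dB over 5 km")
```

The coefficient grows quickly with frequency, so long-range links are pushed toward low frequencies, which in turn limits the usable bandwidth.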

4.5 Cooperation of Heterogeneous Nodes

Effective integration and data transfer between sensor nodes and underwater vehicles are necessary to sustain wide-ranging application development. Collaboration among heterogeneous nodes in a functioning system is a critical task because common standards and interfaces to support communication are lacking [20]. Moreover, the different kinds of resources and the heterogeneity of mobile and static nodes in specific applications pose an additional challenge.

5 Primary Solutions for UWSN Challenges

5.1 Key Management

Maintaining secrecy, authenticity, and non-repudiation calls for cryptographic methods and key management. With encryption, sensitive data can be transmitted through insecure networks such as the underwater acoustic network with far less risk of unauthorized users reading or changing it. In traditional public key cryptography, the distribution of certificates and keys must be supervised by a certificate authority, which incurs high computation and communication costs [21, 22].

5.2 Trust Management

Alongside cryptographic protocols, a trust evaluation system is a crucial part of security defense and offers significant advantages for intrusion detection. Investigating trust management solutions for UWSNs is difficult because of the characteristics and challenges of UWSNs. The three major trust frameworks currently in use are hierarchical, distributed, and centralized schemes [21].
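A trust evaluation system can be pictured with a very small example. The sketch below assumes a beta-reputation style score, in which a node's trust is the expected value of a Beta distribution updated by observed successful and failed packet forwardings. This model, the `TrustRecord` class, and the 0.6 threshold are all illustrative assumptions; the survey does not prescribe a particular trust formula.

```python
from dataclasses import dataclass


@dataclass
class TrustRecord:
    """Forwarding behaviour observed for one neighbor node (hypothetical model)."""
    successes: int = 0
    failures: int = 0

    def update(self, forwarded: bool) -> None:
        """Record one observation of the neighbor's forwarding behaviour."""
        if forwarded:
            self.successes += 1
        else:
            self.failures += 1

    @property
    def trust(self) -> float:
        """Expected value of Beta(successes + 1, failures + 1)."""
        return (self.successes + 1) / (self.successes + self.failures + 2)


def is_trusted(record: TrustRecord, threshold: float = 0.6) -> bool:
    """Return True if the trust score reaches the (assumed) threshold."""
    return record.trust >= threshold


if __name__ == "__main__":
    neighbor = TrustRecord()
    for outcome in [True] * 8 + [False] * 2:
        neighbor.update(outcome)
    print(f"trust = {neighbor.trust:.2f}, trusted = {is_trusted(neighbor)}")
```

In a distributed scheme each node would keep such records for its own neighbors, whereas a centralized or hierarchical scheme would aggregate them at the base station or at cluster heads.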


5.3 Routing Security

Routing security uses fundamental transport and link security techniques to safeguard both individual nodes and routing protocols. To create the network topology, nodes must also communicate with one another. The two elements of routing security are secure forwarding and secure routing. Secure routing requires nodes to cooperate in order to provide accurate network data and sustain network connectivity. Data packets must be protected in transit to stop unauthorized parties from viewing, deleting, or changing them [22].

5.4 AI-Inspired Cybersecurity Solutions for Securing UWSNs

Underwater wireless sensor networks need an AI-based intrusion detection system that can identify a range of potential threats. More effective and efficient AI-based cybersecurity solutions continue to be of interest. The purpose of an AI-based cybersecurity system is to provide the highest level of protection against threats while offering the quickest and most efficient means of incident detection and recovery. For instance, [23] uses a novel AI-centered technique to identify, group, and pinpoint underwater events. The efficacy of this method for tracking underwater events was demonstrated and evaluated in a real-time scenario.

5.5 Big Data Integration

The integration of big data with UWSNs is a popular development that presents difficulties in real-time analytics, accuracy, communication, and visualization. Because of the different data formats, protocols, and service constraints of multiple applications, adapting one system to another is both necessary and challenging. After deployment, underwater vehicles are difficult to reconfigure. Because numerous manufacturers supply these vehicles, it is challenging to develop cooperative communication between UWSN technologies [24].

6 Conclusion and Future Work

This paper gave a brief overview of security in UWSNs, outlining the specific architecture of these networks and their security threats, requirements, characteristics, and challenges, together with possible primary solutions. The techniques required to overcome security issues were discussed. Finding solutions for the issues described below is essential.

• New theoretical models are essential for progress in both computation and analysis.
• Extensive use of field experiments and testbeds is the essential route to the next step in underwater wireless acoustic sensor networks (UWASNs). More accurate performance analysis and characterization of the network system should be carried out with field experiments and testbeds. Such work may carry UWASN communication into the next generation.
• Sensor nodes with pre-installed security schemes are necessary in future work, and the security scheme must be reconfigured from time to time in case of attacks on the network. The design and implementation of such reconfiguration are challenging and critical issues.

These research issues remain wide open for future investigation.

References

1. Irfan Ahmad, Taj Rahman, Asim Zeb, Inayat Khan, Inam Ullah, Habib Hamam, Omar Cheikhrouhou (2021) Analysis of security attacks and taxonomy in underwater wireless sensor networks. Wireless Communications and Mobile Computing, vol. 2021, Article ID 1444024, 15 pages. https://doi.org/10.1155/2021/1444024
2. A. Al Guqhaiman, O. Akanbi, A. Aljaedi, C. E. Chow (2020) Lightweight multi-factor authentication for underwater wireless sensor networks. In: 2020 International conference on computational science and computational intelligence (CSCI), pp 188–194. https://doi.org/10.1109/CSCI51800.2020.00039
3. Ali T, Irfan M, Shaf A, Saeed Alwadie A, Sajid A, Awais M, Aamir M (2020) A secure communication in IoT enabled underwater and wireless sensor network for smart cities. Sensors 20:4309. https://doi.org/10.3390/s20154309
4. Karim S et al (2021) GCORP: geographic and cooperative opportunistic routing protocol for underwater sensor networks. IEEE Access 9:27650–27667. https://doi.org/10.1109/ACCESS.2021.3058600
5. Krishnaswamy V, Sunilkumar S. Manvi (2021) Trusted node selection in clusters for underwater wireless acoustic sensor networks using fuzzy logic. Phys Commun 47:101388, ISSN 1874-4907. https://doi.org/10.1016/j.phycom.2021.101388
6. Shahapur S, Khanai R (2016) Localization, routing and its security in UWSN: a survey. Int Conf Electr, Electron, Optimization Techniques, ICEEOT 2016:1001–1006
7. Ateniese G, Capossele A, Gjanci P, Petrioli C, Spaccini D (2015) SecFUN: security framework for underwater acoustic sensor networks. In: OCEANS 2015 - Genova, pp 1–9
8. Ahmed MR, Aseeri M, Kaiser MS, Zenia NZ, Chowdhury ZI (2015) A novel algorithm for malicious attack detection in UWSN. In: 2nd International conference on electrical engineering and information and communication technology, ICEEICT 2015, May, pp 21–23
9. Jiang S (2019) On securing underwater acoustic networks: a survey. IEEE Commun Surv Tutor 21(1):729–752
10. Suresh Wati, Nitin Rakesh, Parma Nand Astya, Ashish Kumar (2020) Attacks in underwater sensor network. Int J Interdisc Innov Res Dev (IJIIRD), ISSN: 2456-236X, 05(Special Issue 01)
11. Premkumar Deepak S, Mukeshkrishnan MB (2022) Secured route selection using E-ACO in underwater wireless sensor networks. Intell Autom Soft Comput 32(2):963–978. https://doi.org/10.32604/iasc.2022.022126
12. Kajwadkar S, Jain VK (2018) A novel algorithm for DoS and DDoS attack detection in Internet of Things. In: 2018 Conference on information and communication technology (CICT), pp 1–4. https://doi.org/10.1109/INFOCOMTECH.2018.8722397
13. Ateniese G, Capossele A, Gjanci P, Petrioli C, Spaccini D (2015) SecFUN: security framework for underwater acoustic sensor networks. In: Proceedings of the MTS/IEEE OCEANS, Genoa, Italy, May 2015
14. Basagni S, Petrioli C, Petroccia R, Spaccini D (2015) CARP: a channel-aware routing protocol for underwater acoustic wireless networks. Ad Hoc Netw 34:92–104
15. Kong J, Ji Z, Wang W, Gerla M, Bagrodia R, Bhargava B (2005) Low-cost attacks against packet delivery, localization and time synchronization services in underwater sensor networks. In: WiSe: Proceedings of the 2005 ACM workshop on wireless security, pp 87–96. https://doi.org/10.1145/1080793.1080808
16. Patron P, Petillot Y (2008) The underwater environment: a challenge for planning. In: Proceedings of the 27th workshop of the UK Planning and Scheduling Special Interest Group, Edinburgh, UK, Dec
17. Dini G, Lo Duca A (2012) A secure communication suite for underwater acoustic sensor networks. Sensors 12(11):15133–15158. https://doi.org/10.3390/s121115133
18. Etter PC (1996) Underwater acoustic modeling: principles, techniques and applications, 2nd edn. E & FN Spon
19. Stojanovic M (2008) Underwater acoustic communications: design considerations on the physical layer. In: 2008 Fifth annual conference on wireless on demand network systems and services, pp 1–10. https://doi.org/10.1109/WONS.2008.4459349
20. Fattah S, Gani A, Ahmed I, Idris MYI, Targio Hashem IA (2020) A survey on underwater wireless sensor networks: requirements, taxonomy, recent advances, and open research challenges. Sensors 20(18):5393. https://doi.org/10.3390/s20185393
21. Taguigguig EH, Touati Y, Ali-Cherif A (2017) ECC based-approach for keys authentication and security in WSN. In: Proceedings of the 2017 9th IEEE-GCC conference and exhibition (GCCCE), Manama, Bahrain, 8–11 May 2017, pp 1–4
22. Simplicio MA, Silva MV, Alves RC, Shibata TK (2017) Lightweight and escrow-less authenticated key agreement for the internet of things. Comput Commun 98:43–51
23. Stork J, Wenzel P, Landwein S, Algorri M-E, Zaefferer M, Kusch W, Staubach M, Bartz-Beielstein T, Kohn H, Dejager H, Wolf C (2021) Underwater acoustic networks for security risk assessment in public drinking water reservoirs. Artif Intell (cs.AI). https://doi.org/10.48550/arXiv.2107.13977
24. Kim B-S, Kim K-I, Shah B, Chow F, Kim KH (2019) Wireless sensor networks for Big Data systems. Sensors 19:1565. https://doi.org/10.3390/s19071565

False Data Injection Attack Detection in VANET Using Upgraded Grey Wolf Optimization Algorithm Using LSTM Classifier

M. S. Bennet Praba and R. Rathna

Abstract Detecting cyberattacks is of utmost importance in applications such as vehicular ad hoc networks (VANETs). Despite potential benefits in areas such as traffic management, reduced fuel consumption, and driver assistance, VANET safety is challenging because the network must defend itself against cyberattacks such as the False Data Injection Attack (FDIA). In the proposed system, the Grey Wolf Optimization algorithm is integrated with a Long Short-Term Memory (LSTM) model to detect FDIAs and ensure trusted data flow in a Smart Intelligent Transport System. The primary goal of this system is to detect false data injection. To accomplish this goal, the data flow between vehicles must be controlled effectively, and the Grey Wolf Optimization method is employed as an optimized feature extraction technique. The test results showed that the technique can detect multiple types of FDIA more accurately and robustly, with 98% precision.

Keywords False data injection attack · Cyber-attack · VANET · Deep learning technique · Grey wolf optimization

M. S. Bennet Praba (B)
Department of CSE, SRMIST, Chennai, India
e-mail: [email protected]
R. Rathna
Department of IT, SRMIST, Chennai, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 686, https://doi.org/10.1007/978-981-99-2100-3_54

1 Introduction

A smart intelligent vehicle networking system permits communication between vehicles as well as between vehicles and pedestrians and other road users [1]. However, the openness and intelligence of such systems expose them to new cyber-physical attacks [2, 3]. An efficient algorithm is needed to handle connected vehicles effectively and to resolve any problems properly. A few studies examine the authentication and protection strategies that have been put forth. Currently, processing and verifying these messages takes a lot of computation time in vehicles [4]. When sharing information with another vehicle or a roadside unit (RSU), the vehicular ad hoc network is subject to numerous dangers and difficulties. The data obtained from another vehicle or RSU must be reliable, and intruders must not be able to tamper with it. Additionally, the sender's identity and privacy should be protected when transmitting information. However, if the sender delivers unwanted or fraudulent messages, it harms other vehicles; in this situation, a trusted authority should disclose the identity of the vehicle. Researchers have investigated a variety of security and privacy-enhancing techniques [5].

Compared to normal vehicles, connected vehicles are more susceptible to cyberattacks and hostile adversaries. Cyberattacks on the Dedicated Short Range Communication (DSRC) network compromise its dependability and create risky situations. Cyberattacks on connected cars reduce system performance and result in severe financial losses, injuries, and energy losses. An FDIA is a cyberattack that targets the integrity of Cyber-Physical Systems (CPS) by gaining access to some of the sensors and providing false data to the controller [6]. Because anomalous sensor measurements often raise an alarm, the attacker must carefully plan the injected input to deceive the controller [7]. False data injection can be detected with well-known fault detection algorithms such as the Kalman filter [7, 8] and linear and nonlinear observer design [7]. However, attackers can now launch cyberattacks against CPS. Paper [9] investigates the potential of several strategies for developing supervised and semi-supervised approaches, based entirely on machine learning, for the various types of attack detection components. New deep learning-based methods for detecting FDIA in connected automobiles have recently been presented. This research focuses on detecting FDIA and providing an optimized solution against FDIA in intelligent vehicle networking systems.

The rest of the paper is organized as follows. Section 2 reviews related work. Section 3 explains the proposed model. Section 4 presents the proposed algorithm. Section 5 focuses on experimental results and design. Section 6 concludes the work with future enhancements.

2 Related Works

Numerous studies have been conducted on detecting cyberattacks. Injecting false information into an Intelligent Transport System can send fake messages to other vehicles and cause them harm. Depending on their motivation, attackers are typically categorized as either selfish or malicious. A selfish driver can influence other vehicles by issuing warnings that encourage them to take detours, which frees the road for the attacker. An attacker selects a target node and sends a packet that appears to have come from it. To spread the fake data throughout the network, the attacker generates packets for a randomly selected destination node and broadcasts them [10]. A malicious vehicle can cause traffic jams by sending unwanted information or fake accident alerts, or can steal information. This can be highly effective when no other vehicle is available [11]. A support vector machine learning strategy has been proposed for the detection and identification of FDIA [12]. By injecting unwanted information into the transmitted data, an FDIA can trick the standard detection mechanism. After studying the concealed characteristics of FDIA, Guo et al. [13] presented the best attack strategies against the cyber-physical system of intelligent vehicle networking. General detection techniques cannot detect FDI attacks because these attacks mimic established techniques. As a result, it is quite challenging to stop FDI attacks in intelligent transportation systems.

Using bio-inspired methods, deep learning networks can be tuned and optimized to attain low complexity and excellent performance on very large datasets. Feature selection has a significant impact on the performance of an intrusion detection system. Bio-inspired meta-heuristic algorithms used in engineering problems imitate biological behaviours such as searching for prey [14]. Many meta-heuristic techniques have limitations, such as an imbalance between their explorative and exploitative capabilities. Grey Wolf Optimization (GWO) is widely used to overcome these issues. GWO algorithms are used in areas such as classical engineering design problems, image processing, clustering, surface waves, job scheduling, Unmanned Combat Aerial Vehicles (UCAV), Optimal Reactive Power Dispatch (ORPD), bankruptcy prediction, the economic load dispatch problem, optimization problems, and smart greenhouses. GWO is also used in machine learning operations such as feature selection, clustering, classification, and hyperparameter tuning [15]. The Grey Wolf Optimization Algorithm has few parameters, is simple to implement, and is widely used in research [16]. When used in intrusion detection systems, the Grey Wolf Optimization Algorithm (GWOA) is reported to produce better results than other approaches.

3 Proposed Model

In vehicle-to-vehicle communication, messages gathered by one vehicle are sent to another. However, an intruder might insert false data, endangering the safety of the vehicle. To ensure vehicle safety, the proposed intrusion detection system uses the upgraded Grey Wolf Optimization Algorithm for feature selection, which reduces the dimension of the data. An LSTM classifier is used to increase the system's classification accuracy. Finally, the system determines whether a message is genuine or the product of an FDIA. The proposed model, including its feature selection process, is explained below.


3.1 Deep Learning Technique

For real-time applications such as VANET, where a huge dataset is required, deep learning networks such as the Recurrent Neural Network (RNN) and the LSTM model are suggested. A plain RNN cannot retain earlier VANET data over long sequences; LSTM overcomes this problem. We propose integrating the upgraded GWOA with Long Short-Term Memory to achieve highly accurate detection with less computational complexity. According to [17], deep learning techniques can be used to identify FDIA. Although measures for detecting FDIA were first applied in the smart grid, they can be used in other domains such as governance, finance, healthcare, medical imaging, credit fraud analysis, cyber security, and defence [18].

3.2 FDIA-GWO-Based Intrusion Detection System The Smart Intelligence Transportation Intrusion Detection System (SITIDS) is placed in the smart vehicle and the installation procedure is shown in Fig. 1 in order to protect each vehicle against fake data injection attacks. Every packet that is received goes through feature extraction, which extracts the pertinent data from the packet and creates a component vector. The updated Grey Wolf Optimization Algorithm that has been suggested has better feature size reduction performance. The goal is to use an LSTM classifier to reduce the feature size and increase classification accuracy. The dataset includes information about the vehicle’s identification number, speed, location information, including longitude and latitude positions, sender ID, message count, receiving time, and sensor data. Only pertinent

Fig. 1 Model of SITIDS for message classification

False Data Injection Attack Detection in VANET Using Upgraded Grey …


features are chosen from the dataset after the feature selection technique has been applied. Data items such as longitude and latitude can be quite noisy. To address this noise, we consider characteristics computed over sequences of messages rather than single messages, and features for noisy data are deleted. Inspired by the false data injection attacker model, we also examine features that verify the consistency of the supplied numbers, such as comparing the message count with the count calculated from the reported positions. Additional factors, such as throughput and packet delivery ratio, must also be taken into account. These data are then fed into the LSTM classifier, which is appropriate for VANET and decides whether to accept or reject the message. Attackers can insert false data into sensor data, or they can inject false location data by modifying genuine data. If the classifier decides to accept the data, the vehicle updates its information by adding the data to the table of nodes that surround it. The value of the xth component in the vector is obtained using

$$\mathrm{component}_{x} = \mathrm{component}_{x,\mathrm{payload}} - \mathrm{component}_{x,\mathrm{avg}} \tag{1}$$

where component_{x,payload} is the xth feature obtained from the payload and component_{x,avg} is the average value of the xth feature as determined by the other vehicles. The component vector is then assessed and classified as either reliable or unreliable using the GWO algorithm.
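As an illustration of Eq. (1), the following minimal Python sketch computes the component vector for one received message. The function name, the feature order, and the numeric values are hypothetical and chosen only for the example; the paper does not specify how the neighbours' averages are collected.

```python
from statistics import mean

def component_vector(payload, neighbour_reports):
    """Eq. (1): deviation of each payload feature from the neighbours' average."""
    n_features = len(payload)
    averages = [mean(report[x] for report in neighbour_reports)
                for x in range(n_features)]
    return [payload[x] - averages[x] for x in range(n_features)]

# Hypothetical feature order: speed (m/s), latitude, longitude
payload = [30.0, 12.9716, 77.5946]
neighbours = [[14.0, 12.9715, 77.5945],
              [16.0, 12.9717, 77.5947]]
deviation = component_vector(payload, neighbours)
print(deviation)  # a large first entry flags a speed far from the neighbours' average
```

A component close to zero means the reported value agrees with what surrounding vehicles report; a large magnitude is the kind of inconsistency the classifier is meant to pick up.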

4 Proposed Algorithm

4.1 Grey Wolf Optimization Algorithm

The intrusion detection system is inspired by GWO, which imitates the leadership hierarchy and hunting tactics of grey wolves in the wild. Grey wolves are a canine species that hunts in packs of five to twelve animals. GWO uses four levels of the wolf leadership hierarchy, alpha, beta, delta, and omega, to solve optimization problems. The three primary phases of the hunting behaviour in GWO are exploration, encirclement, and attack. The alpha, as leader of the pack, makes decisions on behalf of the group and serves as its advisor. Figure 2 depicts the grades of the wolf hierarchy. The first three wolves are in charge of the optimization, while the fourth wolf, omega, follows the other wolves [19]. The prey represents the optimal solution of the optimization problem. Most of the logic follows the equations:

$$\vec{D}_G = \left| \vec{C} \cdot \vec{X}_p(t) - \vec{X}(t) \right| \tag{2}$$


Fig. 2 Hierarchy of grade of wolf

$$\vec{X}(t+1) = \vec{X}_p(t) - \vec{A} \cdot \vec{D}_G \tag{3}$$

where t denotes the current iteration, $\vec{A}$ and $\vec{C}$ are coefficient vectors, and $\vec{X}_p$ is the position vector of the prey.

$$\vec{A} = 2\vec{a} \cdot \vec{r}_1 - \vec{a} \tag{4}$$

$$\vec{C} = 2\vec{r}_2 \tag{5}$$

where the components of $\vec{a}$ are linearly decreased from 2 to 0 and $\vec{r}_1$ and $\vec{r}_2$ are random vectors with values in [0, 1], calculated for each wolf at each iteration. $\vec{A}$ controls the trade-off between exploration and exploitation, while $\vec{C}$ always adds some degree of randomness. Since the real position of the optimal solution is unknown, $\vec{X}_p$ is estimated from the three best solutions, and the formulas for updating each agent (wolf) are:

$$\vec{D}_{G\alpha} = \left| \vec{C}_1 \cdot \vec{X}_\alpha - \vec{X} \right|, \quad \vec{D}_{G\beta} = \left| \vec{C}_2 \cdot \vec{X}_\beta - \vec{X} \right|, \quad \vec{D}_{G\delta} = \left| \vec{C}_3 \cdot \vec{X}_\delta - \vec{X} \right| \tag{6}$$

$$\vec{X}_1 = \vec{X}_\alpha - \vec{A}_1 \cdot \vec{D}_{G\alpha}, \quad \vec{X}_2 = \vec{X}_\beta - \vec{A}_2 \cdot \vec{D}_{G\beta}, \quad \vec{X}_3 = \vec{X}_\delta - \vec{A}_3 \cdot \vec{D}_{G\delta} \tag{7}$$

$$\vec{X}(t+1) = \frac{\vec{X}_1 + \vec{X}_2 + \vec{X}_3}{3} \tag{8}$$

where $\vec{X}$ and $\vec{X}(t+1)$ represent the current and the updated position, respectively. Each wolf's position is thus updated to the average of the three positions suggested by the alpha, beta, and delta wolves.
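The update rules can be sketched in a few lines of Python. This is a minimal continuous GWO written for illustration, not the authors' implementation: it uses the standard form of the update in which each leader's distance term is scaled by the coefficient A, and the population size, iteration count, search bounds, clipping step, and the sphere test function are all assumptions made for the example.

```python
import random

def gwo_minimize(objective, dim, n_wolves=10, n_iter=50,
                 lower=-5.0, upper=5.0, seed=0):
    """Minimal continuous GWO; returns the best position found."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lower, upper) for _ in range(dim)]
              for _ in range(n_wolves)]
    best = list(min(wolves, key=objective))
    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter  # decreases linearly from 2 towards 0
        # alpha, beta, delta: copies of the three best wolves of this iteration
        leaders = [list(w) for w in sorted(wolves, key=objective)[:3]]
        for wolf in wolves:
            for j in range(dim):
                proposals = []
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a       # coefficient A
                    C = 2.0 * rng.random()               # coefficient C
                    D = abs(C * leader[j] - wolf[j])     # distance to this leader
                    proposals.append(leader[j] - A * D)  # position suggested by leader
                # average of the three proposals, clipped to the bounds (assumption)
                wolf[j] = min(max(sum(proposals) / 3.0, lower), upper)
        candidate = min(wolves, key=objective)
        if objective(candidate) < objective(best):
            best = list(candidate)
    return best

def sphere(x):
    """Hypothetical test objective with its minimum at the origin."""
    return sum(v * v for v in x)

best = gwo_minimize(sphere, dim=2)
print(best, sphere(best))
```

Keeping a separate best-so-far copy is a design choice of this sketch: it guarantees that the returned solution is never worse than the best wolf of the initial population, even if the pack drifts in a later iteration.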


4.2 Upgraded Grey Wolf Optimization Algorithm

Dimension reduction through feature selection is an essential step in data mining and machine learning applications. Feature selection (FS) is one of the applications of metaheuristics: it involves choosing the most pertinent and informative features while disregarding noisy and redundant ones. When the search space is exceptionally wide, feature selection is therefore regarded as a challenging problem. The binary version of GWOA has been reported to perform better on the FS problem [20–22]. For intrusion detection systems, a modified version of GWOA (MGWOA) has been suggested that changes the population initialization stage in particular [23]. We use a guided initialization that reaches a good solution within the first few iterations and so hastens the convergence of the algorithm. The primary change is that the population of the wrapper-based approach is initialized using information from a filter-based approach rather than at random. The information gain of a feature determines whether the feature is chosen and is used to construct the initial population. Equation 9 gives the information gain of a particular feature $s_f$:

$$GoI(s_f) = -\sum_{x} Pb(Cl_x)\log Pb(Cl_x) + Pb(s_f)\sum_{x} Pb(Cl_x \mid s_f)\log Pb(Cl_x \mid s_f) + Pb(\bar{s}_f)\sum_{x} Pb(Cl_x \mid \bar{s}_f)\log Pb(Cl_x \mid \bar{s}_f) \tag{9}$$

where $Cl_x$ denotes the xth class label and $Pb(Cl_x)$ is the probability of the xth class. $Pb(s_f)$ and $Pb(\bar{s}_f)$ are the probabilities that the feature $s_f$ is present or absent, and $Pb(Cl_x \mid s_f)$ and $Pb(Cl_x \mid \bar{s}_f)$ are the conditional probabilities of the class given that the feature is present or absent. The MGWOA population is split into two groups. The first is the injected part, whose share of the population is the injection ratio and which is initialized with the suggested modified technique. A feature with a high information gain is important for classifying an instance. By applying Eq. 9 together with Eq. 10, the suggested method makes sure that features with high information gain are favoured in the initial population. Equation 10 gives the initialization of the injected population, which depends on the information gain values:

$$Pb(x) = \begin{cases} 1, & \text{if } r_{no} < \text{Normalized } GoI(x) \\ 0, & \text{if } r_{no} \ge \text{Normalized } GoI(x) \end{cases} \tag{10}$$

where $Pb(x)$ is the binary representation of the xth feature in the initial population and $r_{no}$ is a random number in the range [0, 1]. The other part is the remaining population (1 − injection ratio), which is initialized as

$$Pb(i) = \begin{cases} 1, & \text{if } r_{no} > 0.5 \\ 0, & \text{if } r_{no} \le 0.5 \end{cases} \tag{11}$$
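The guided initialization can be sketched as follows. This is an illustrative reading of the information-gain formula and the two initialization rules, under stated assumptions: features are binary (present or absent), logarithms are taken to base 2, and the information gain is normalized by dividing by the largest gain. None of these choices is confirmed by the paper, and the function names are hypothetical.

```python
import math
import random

def plogp(p):
    """p * log2(p), with the usual convention that 0 * log(0) = 0."""
    return p * math.log2(p) if p > 0 else 0.0

def information_gain(feature_column, labels):
    """Information gain of one binary feature (1 = present, 0 = absent)."""
    n = len(labels)
    classes = set(labels)
    gain = -sum(plogp(labels.count(c) / n) for c in classes)
    for value in (1, 0):
        subset = [y for f, y in zip(feature_column, labels) if f == value]
        if subset:
            weight = len(subset) / n
            gain += weight * sum(plogp(subset.count(c) / len(subset))
                                 for c in classes)
    return gain

def init_population(gains, n_wolves, injection_ratio, rng):
    """Injected wolves follow the information gain; the rest are random."""
    top = max(gains) or 1.0                  # assumed normalization: divide by max
    normalized = [g / top for g in gains]
    n_injected = round(n_wolves * injection_ratio)
    population = []
    for w in range(n_wolves):
        if w < n_injected:   # feature kept when the random draw is below its gain
            wolf = [1 if rng.random() < g else 0 for g in normalized]
        else:                # plain random initialization with threshold 0.5
            wolf = [1 if rng.random() > 0.5 else 0 for _ in gains]
        population.append(wolf)
    return population

labels = list("aabb")
gain_useful = information_gain([1, 1, 0, 0], labels)   # feature separates the classes
gain_useless = information_gain([1, 0, 1, 0], labels)  # feature carries no information
population = init_population([gain_useful, gain_useless], n_wolves=4,
                             injection_ratio=1.0, rng=random.Random(1))
print(gain_useful, gain_useless, population)
```

With an injection ratio of 1.0, every wolf in this toy run keeps the informative feature and drops the uninformative one, which is the bias the guided initialization is meant to introduce.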


A fast machine learning approach for multilayer feed-forward neural networks was described in [20]. The fitness function is based on the crossover error rate; a low crossover error rate indicates successful performance. The fitness function is calculated as

$$\mathrm{Fitness}_{fn} = \alpha \times \left| FitPRate - FitNRate \right| + \beta \times \frac{|CF|}{|Nof|} \tag{12}$$

where α and β are weights between zero and one, CF is the set of chosen features, Nof is the total number of features, and FitPRate and FitNRate are the fitness positive rate and negative rate. Algorithm-1 explains how the proposed architecture works.

Algorithm-1 for proposed GWOA-LSTM model

S. No. | Step
01 | Input: X_i, i = 1 to n, where i indexes the input attributes
02 | Output: Detection of FDIA attack
03 | Begin
04 | Initialize the population using equations (2) and (3). Twenty epochs were used to initialize, and the learning rate was 0.001
05 | Randomly distribute the input bias, hidden layers, and weights
06 | Set α, β, δ positions using equation (6) and initialize using equations (9) and (10)
07 | While (true)
08 |   Calculate the fitness function using equation (12)
09 |   Update the agent by using equation (12)
10 |   If (fitness function == threshold)
11 |     Go to Step 16
12 |   Else
13 |     Go to Step 08
14 |   End
15 | End
16 | If output ≤ 1: normal data traces // No attack traces were discovered
17 | If output > 1: FDIA attack was discovered
18 | End
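The fitness function of Eq. (12) is simple enough to state as code. In the sketch below, the weights α = 0.9 and β = 0.1 and the example rates are hypothetical values chosen for illustration; the paper only states that both weights lie between zero and one, and the two rates would in practice come from evaluating the classifier on the selected features.

```python
def fitness(mask, fit_p_rate, fit_n_rate, alpha=0.9, beta=0.1):
    """Eq. (12): weighted rate gap plus the fraction of selected features.

    mask is a binary list (1 = feature chosen); lower fitness is better.
    """
    chosen = sum(mask)        # |CF|
    total = len(mask)         # |Nof|
    return alpha * abs(fit_p_rate - fit_n_rate) + beta * chosen / total

# Same (hypothetical) error rates, different numbers of selected features
compact = fitness([1, 0, 1, 0], fit_p_rate=0.02, fit_n_rate=0.01)
full = fitness([1, 1, 1, 1], fit_p_rate=0.02, fit_n_rate=0.01)
print(compact, full)
```

For equal error rates the wolf that selects fewer features receives the lower (better) fitness, which is how the second term pushes the search towards smaller feature subsets.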

5 Experimental Design and Results

OMNeT++ and Python on a Windows 10 Pro operating system were used as the experimental setup. All learning models were implemented on an NVIDIA Tesla K40 using the TensorFlow v-4 backend with Keras5 as the higher-level API. To generate the VANET dataset for normal and attack scenarios, a data generation unit was constructed from a combination of OMNeT++, SUMO, TraCI, Veins, and a Python-based attack injection module. Table 1 lists the performance metrics used to evaluate the proposed upgraded GWO algorithm. To evaluate the performance of the proposed framework, standard parameters, namely accuracy, sensitivity, specificity, F1-score, and precision, are calculated. The proposed learning model is trained on the training data, and the testing data are used to analyse it. The training and testing accuracy of the proposed model for FDIA is given in Fig. 3. The training accuracy is found to be 98.5% and the testing accuracy 98.45% in detecting FDIA. The receiver operating characteristic (ROC) curve helps in assessing a classifier's effectiveness by plotting its true positive rate against its false positive rate. Figure 4 shows that the maximum area covered by the proposed model is approximately 0.93 and that the maximum accuracy is 98.2% in detecting FDIA. Figure 5 shows that the proposed model achieves good results (98.45% accuracy, 98.34% precision, 98.4% recall, 98.35% specificity, and F1-score 98.5%) under the standard instances.

Table 1 Performance metrics used for evaluation of proposed framework

S. No. | Standard parameter | Mathematical expression
01 | Accuracy | (TPoV + TNeV) / (TPoV + TNeV + FPoV + FNeV)
02 | Sensitivity | TPoV / (TPoV + FNeV) × 100
03 | Specificity | TNeV / (TNeV + FPoV)
04 | Precision | TPoV / (TPoV + FPoV)
05 | F1-Score | 2 × Precision × Sensitivity / (Precision + Sensitivity)

where TPoV = true positive values, TNeV = true negative values, FPoV = false positive values, and FNeV = false negative values
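For reference, the five metrics can be computed from the four confusion-matrix counts as in the short sketch below. It uses the standard definitions (precision as true positives over all predicted positives, F1 as the harmonic mean of precision and sensitivity) and returns fractions rather than percentages; the counts are made-up numbers, not results from the paper.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard metrics from confusion-matrix counts (values in [0, 1])."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)          # also called recall
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "precision": precision,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }

# Hypothetical counts for 200 classified messages
m = classification_metrics(tp=90, tn=85, fp=15, fn=10)
for name, value in m.items():
    print(f"{name}: {100 * value:.2f}%")
```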

Fig. 3 Training versus testing accuracy for FDIA

Fig. 4 Receiver operating characteristic curves for FDIA (axis labels: True Positive, False Negative)

Fig. 5 Performance of the proposed framework in detecting under the regular instance and FDIA attack instance (y-axis: Performance Metrics (%); x-axis: Probability of attacks; series: Accuracy, Precision, Recall, Specificity, F1-Score)

6 Conclusions and Future Enhancements

This work proposed an upgraded version of the Grey Wolf Optimization Algorithm for feature selection, combined with an LSTM model, to detect FDIA effectively in smart intelligent transportation systems. The proposed model achieved good results, with 98.34% precision, 98.4% recall, 98.35% specificity, and an F1-score of 98.5% under the standard instances. Lower fitness values in the upgraded Grey Wolf Optimization with LSTM classifier imply better performance. Simulation results show that the proposed model reaches an accuracy of 98% in detecting FDIA in VANETs. Similar techniques can also be used to detect other attacks, such as man-in-the-middle and DoS attacks, and the proposed GWOA is expected to give higher accuracy there as well. However, the proposed model still needs to be studied with more performance metrics, and comprehensive research is needed on defending against more attacks in VANET without sacrificing the network's quality of service.

References

1. Zhu F, Li Z, Chen S, Xiong G (2016) Parallel transportation management and control system and its applications in building smart cities. IEEE Trans Intell Transp Syst 17:1576–1585
2. Work DB, Bayen AM (2008) Impacts of the mobile Internet on transportation cyber-physical systems: traffic monitoring using smartphones. In: Proceedings of the national workshop for research on high-confidence transportation cyber-physical systems: automotive, aviation and rail, Washington, DC, USA, 18–20 November, pp 18–20
3. Fallah YP, Huang C, Sengupta R, Krishnan H (2010) Design of cooperative vehicle safety systems based on tight coupling of communication, computing and physical vehicle dynamics. In: Proceedings of the 1st ACM/IEEE international conference on cyber-physical systems (ICCPS '10), Stockholm, Sweden, 13–15 April, pp 159–167
4. Hubaux JP, Čapkun S, Luo J (2004) The security and privacy of smart vehicles. IEEE Secur Priv 2:49–55
5. Bennet Praba MS, Femilda Josephin JS (2020) Review on various authentication schemes and attacks on connected vehicles. IOP Conf Ser: Mater Sci Eng 993
6. Real-time false data injection attack detection in connected vehicle systems with PDE modelling (2020) In: 2020 American Control Conference, July 2020. https://doi.org/10.23919/ACC45564.2020.9147977
7. Mo Y, Sinopoli B (2010) False data injection attacks in control systems. In: Preprints of the 1st workshop on secure control systems, pp 1–6
8. Xie L, Mo Y, Sinopoli B (2010) False data injection attacks in electricity markets. In: 2010 First IEEE international conference on smart grid communications (SmartGridComm), IEEE, pp 226–231
9. Ozay M, Esnaola I, Vural FTY, Kulkarni SR, Poor HV (2015) Machine learning methods for attack detection in the smart grid. IEEE Trans Neural Netw Learn Syst 27:1773–1786
10. Mintemur O, Sen S (2017) Attack analysis in vehicular ad hoc networks, pp 35–46, 09
11. Sakiz F, Sen S (2017) A survey of attacks and detection mechanisms on intelligent transportation systems: VANETs and IoV. Ad Hoc Netw 61:33–50
12. Cabelin JD, Alpano PV, Pedrasa JR (2021) SVM-based detection of false data injection in intelligent transportation system. In: Proceedings of the 2021 international conference on information networking (ICOIN), Jeju Island, Korea, 13–16 January, pp 279–284
13. Guo L, Yu H, Hao F (2021) Optimal allocation of false data injection attacks for networked control systems with two communication channels. IEEE Trans Control Netw Syst 8:2–14
14. Mnasri S, Bossche AVD, Nasri N, Val T (2017) The 3D redeployment of nodes in wireless sensor networks with real testbed prototyping. In: Proceedings of the international conference on ad-hoc networks and wireless, Messina, Italy, 20–22 Sept 2017. Springer, Berlin/Heidelberg, Germany, pp 18–24
15. Emmanuel et al (2022) Application of Grey wolf optimization algorithm: recent trends, issues, and possible horizons. Gazi Univ J Sci 35(2):485–504
16. Long W, Jiao J, Liang X et al (2018) Inspired grey wolf optimizer for solving large-scale function optimization problems. Appl Math Model 60
17. Ahmed M, Najmul Islam AKM (2020) Deep learning: hope or hype. Ann Data Sci 7(3):427–432
18. Ahmed M, Pathan (2020) False data injection attack (FDIA): an overview and new metrics for fair evaluation of its countermeasure. Complex Adapt Syst Model
19. Alzaqebah A, Aljarah I (2022) A modified Grey wolf optimization algorithm for an intrusion detection system. Math Comput Sci 10(6):999
20. Al-Tashi Q, Rais HM, Abdulkadir SJ, Mirjalili S, Alhussian H (2020) A review of grey wolf optimizer-based feature selection methods for classification. Evol Mach Learn Tech, pp 273–286
21. Emary E, Zawbaa HM, Hassanien AE (2016) Binary grey wolf optimization approaches for feature selection. Neurocomputing 172:371–381
22. Faris H, Aljarah I, Al-Betar MA, Mirjalili S (2018) Grey wolf optimizer: a review of recent variants and applications. Neural Comput Appl 3
23. Tang X, Dai Y, Xiang Y (2019) Feature selection based on feature interactions with application to text categorization. Expert Syst Appl 120:207–216

Investigation of the Role of Test Size, Random State, and Dataset in the Accuracy of Classification Algorithms

Raj Kishor Bisht and Ila Pant Bisht

Abstract Test size, random state, and dataset play an important role in deciding the accuracy of an algorithm, yet this aspect is not given due weight in the research literature. The objectives of the present work are to explore the role of test size, random state, and dataset in the accuracy of classification algorithms; to check whether there is any significant difference between the average accuracy score with and without cross validation; and, finally, to define confidence intervals for the accuracy scores of different classification algorithms. Twelve datasets and five classification algorithms were selected randomly for the study. For each dataset and algorithm, accuracy is analyzed for different test sizes and random states. Graphical representation and the Chi square test of independence are used to check the dependence of accuracy on test size, random state, and dataset. A paired t-test is used to check whether the average accuracy score with and without cross validation is the same. The t-test is used to define confidence intervals for the accuracy scores of different classification algorithms. We find that accuracy depends on test size, random state, and dataset. Test size and random state play minor roles, as they affect accuracy only within a small interval, while the dataset plays a major role, as the accuracy scores for different datasets lie in different intervals.

Keywords Test size · Random state · Classification algorithms · Accuracy

R. K. Bisht (B) School of Computing, Graphic Era Hill University, Dehradun, India e-mail: [email protected]
I. P. Bisht Department of Economics and Statistics, Government of Uttarakhand, Dehradun, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 686, https://doi.org/10.1007/978-981-99-2100-3_55

1 Introduction

Machine learning (ML) algorithms are popular nowadays because of their applications in fields such as business and medicine. Working with big datasets is itself a challenging task, so ML algorithms play an important role in analyzing


big data. ML algorithms are based on the concept of training a model using a training set and, once the model is trained, testing its performance using a test set. A classification algorithm learns from the training data the features that are relevant for predicting certain outcomes. The performance of a classification algorithm depends on how well it learns from the training data. The sizes of the training and test data are open to debate, since the accuracy of a classification algorithm also depends on these sizes. The random state in ML software decides which entities fall in the test set and which in the training set. By combinatorics, if a population has N elements and we select a sample of n elements, there are C(N, n) possible samples, and the random state picks one of them. A change in the random state yields a new set of entities in the sample and hence a new accuracy score. Thus, the random state also has a role in the performance of ML algorithms, and this needs to be studied. Studying the effect of test size and random state on accuracy is important because in some fields, such as medicine, a small error may lead to dangerous results. In almost all cases, when we apply an algorithm, we consider a particular test size and random state; that is, we select a single sample, and our results depend on the choice of that sample. In k-fold cross validation, we consider k samples of the same size, and our results are based on the average scores over these samples. We do not usually verify, on the basis of statistical theory, whether the results are statistically significant. Thus, there is a need to study the effect of test size and random state on the accuracy of classification algorithms.
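The counting argument can be made concrete with a few lines of Python. The sketch below is only an illustration: the population of 20 row indices, the test-set size of 5, and the helper name draw_test_set are hypothetical, and the standard-library generator stands in for whatever generator a given ML library uses internally.

```python
import math
import random

population = list(range(20))   # hypothetical dataset of N = 20 row indices
n_test = 5                     # test-set size n

# Number of distinct test sets that could be drawn: C(N, n)
n_possible = math.comb(len(population), n_test)

def draw_test_set(random_state):
    """Pick one of the C(N, n) possible test sets, determined by the seed."""
    return sorted(random.Random(random_state).sample(population, n_test))

print(n_possible)        # 15504
print(draw_test_set(0))  # the same seed always reproduces the same test set
print(draw_test_set(1))  # another seed selects one of the other possible sets
```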
In the literature, the role of test size and random state in deciding the accuracy of classification algorithms has received little discussion, although many researchers have studied other aspects that affect accuracy. The size of a dataset is an important factor in the performance of ML algorithms [1–5]. Many other factors, such as feature size and scaling techniques, affect the accuracy of a classification algorithm [6]. An algorithm may perform one way on one dataset and another way on a different dataset, which indicates that the dataset is a significant factor in the accuracy of an algorithm. Further, within a single dataset, the accuracy score may differ for different test sizes. Thus, test size affects the accuracy of an ML algorithm [7, 8]. For an arbitrarily chosen test size, the obtained accuracy may not be a true representation of the accuracy, so there is a need to analyze the variation in accuracy scores across test sizes. One of the earlier works in this direction, for pattern classifiers, can be found in [9]. The authors of [10] observed a certain decrease in accuracy with decreasing training size. K-fold cross validation may help to improve the accuracy of a classifier [11]. The authors of [12] used sample selection methods as well as k-fold cross validation for ML classification. In [13], the authors examined K-nearest neighbor (KNN) and k-fold cross validation for the accuracy of classification problems. The accuracy of ML algorithms for astronomical data is investigated in [14]. An overview of the accuracy of classification algorithms in various ML applications is given in [15].


The objective of the present study is to investigate the role of test size and random state in the accuracy of different classification algorithms. More precisely, the study investigates the following questions:
Q1. Is the average accuracy for different test sizes and random states the same as the average accuracy with k-fold cross validation?
Q2. Does the accuracy of a classification algorithm depend on test size and random state?
Q3. What are the confidence intervals for the accuracy scores of different classification algorithms?

2 Methodology

We randomly selected 12 datasets (Table 1) for the present study. Five datasets are from Kaggle: DS1, DS5, DS9, DS11, and DS12 [16–20], and seven are from UCI: DS2, DS3, DS4, DS6, DS7, DS8, and DS10 [21]. We chose five classification algorithms: KNN, Naïve Bayes, Random Forest, Decision Tree, and Support Vector Machine (SVM). We consider different test sizes (10, 15, …, 50%) and random states (0, 1, 2, …, 100) for analyzing the accuracy scores of each algorithm on each dataset. We calculate accuracy scores with and without k-fold cross validation for each algorithm, dataset, test size, and random state. We use statistical tests such as the paired t-test and the Chi square test for the analysis [22, 23].

Table 1 Datasets used for the study

Dataset notation | Name of dataset | Size | Number of attributes | Type
DS1 | Brain Tumor | 3762 | 14 | Numeric
DS2 | Breast Cancer Wisconsin (Diagnostic) | 699 | 10 | Numeric
DS3 | Chronic_Kidney_Disease | 400 | 25 | Mixed
DS4 | Climate Model Simulation Crashes | 540 | 19 | Numeric
DS5 | Diabetes | 768 | 9 | Numeric
DS6 | Diabetic Retinopathy Debrecen | 1151 | 19 | Mixed
DS7 | Eye State Classification EEG | 14,980 | 15 | Numeric
DS8 | Haberman's Survival | 306 | 4 | Numeric
DS9 | Heart Failure Prediction | 918 | 12 | Mixed
DS10 | MAGIC Gamma Telescope | 19,020 | 11 | Numeric
DS11 | Rice Classification | 18,185 | 11 | Numeric
DS12 | Water_potability | 3276 | 10 | Numeric


To check whether the average accuracy scores with and without k-fold cross validation are the same, we apply the paired t-test. The null hypothesis is framed as H0: there is no significant difference between the average accuracy scores with and without k-fold cross validation of a classification algorithm X. The test statistic for the paired t-test is defined as:

$$t = \frac{\bar{d}}{s_d / \sqrt{n}} \tag{1}$$

where $\bar{d}$ is the mean of the differences between paired observations (with and without k-fold cross validation), $s_d$ is the sample standard deviation of these differences, and n is the number of pairs under study. To check the dependency of accuracy score on test size and random state, we consider accuracy scores for different test sizes and random states and use the Chi square test of independence. We consider the null hypotheses H01: test size is independent of accuracy score, and H02: random state is independent of accuracy score. The observed frequency is the number of times an accuracy score falls within an interval. The test statistic for the Chi square test of independence is defined as:

$$\chi^2 = \sum_{i} \sum_{j} \frac{\left(f_{ij} - e_{ij}\right)^2}{e_{ij}} \tag{2}$$

where $f_{ij}$ is the observed frequency and $e_{ij}$ is the expected frequency. Finally, the confidence interval for the accuracy scores of an algorithm X is calculated as $\bar{x} \pm t_\alpha S / \sqrt{n}$, where $\bar{x}$ is the mean and S is the sample standard deviation of the average accuracy scores of X over the different datasets, n is the number of datasets, and $t_\alpha$ is the tabulated value of t for n − 1 degrees of freedom at the α level of significance. For our study we choose α = 0.05.
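To make the Chi square statistic concrete, the sketch below evaluates it with the Python standard library on the observed frequencies that the paper reports in Table 3 (KNN, test size versus accuracy interval). The function name is hypothetical, and instead of computing a p-value the statistic is compared with 15.507, the tabulated 5% critical value of the Chi square distribution with 8 degrees of freedom.

```python
def chi_square_statistic(table):
    """Chi square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Observed frequencies from Table 3; columns are test sizes 10, 15, ..., 50 (%)
observed = [
    [15, 23, 19, 23, 18, 13, 8, 10, 11],   # lower accuracy interval
    [86, 78, 82, 78, 83, 88, 93, 91, 90],  # upper accuracy interval
]
stat, dof = chi_square_statistic(observed)

CRITICAL_5PCT_DF8 = 15.507   # tabulated Chi square value, 8 degrees of freedom
reject_independence = stat > CRITICAL_5PCT_DF8
print(round(stat, 2), dof, reject_independence)
```

The statistic comes out at about 18.56 with 8 degrees of freedom, which exceeds the critical value and agrees with the reported p-value of 0.017.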

3 Analysis

In this section, we analyze the effect of test size and random state on the performance of various classification algorithms. The accuracy (AC) of an algorithm is a function of test size (TS), random state (RS), and dataset (DS). For an algorithm X, accuracy can be represented as follows:

$$AC(X) = f(TS, RS, DS) \tag{3}$$

Figures 1 and 2 show the effect of test size and random state in the accuracy scores of KNN classifiers without k-fold cross validation and with k-fold cross validation, respectively, for different datasets.


Fig. 1 Accuracy of KNN classifier for different test size, random state, and datasets without k-fold cross validation


We observe that the dataset plays a major role, while test size and random state play a minor role, in determining the accuracy of an algorithm X. Given a dataset DS_i, we found that there exists an interval [a_i, b_i] ⊆ [0, 1] such that a_i ≤ AC(X) ≤ b_i for any random state and test size; TS and RS only affect the variation in accuracy between a_i and b_i. Thus, for a given dataset and algorithm, we obtain a least upper bound and a greatest lower bound for the accuracy score irrespective of test size and random state, and accuracy can be regarded as a function of TS and RS that takes values within the interval [a_i, b_i]. We further apply k-fold (k = 10) cross validation for each test size and a given random state and observe the average accuracies. We found that the interval [a_i, b_i] is further reduced to [a_i + δ, b_i − δ], where δ is a very small number that can be interpreted as a cross validation impact factor. Figure 2 shows that the accuracy interval shrinks for each dataset. To test the hypothesis H0, Table 2 lists the average accuracy scores with and without k-fold cross validation for the 12 datasets using the KNN method. The paired t-test gives a p-value of 0.161. Since the p-value is greater than 0.05, we fail to reject the null hypothesis at the 5% level of significance and conclude that the average accuracies obtained with the two methods do not differ significantly. The same results are obtained for the other methods. This indicates that the average accuracy of an algorithm over different test sizes and random states can be used for further processing without k-fold validation, which is less time efficient. Algorithm 1 shows the steps for checking the independence of test size and random state from the accuracy score of an algorithm.


Fig. 2 Accuracy of KNN classifier for different test size, random state, and datasets with k-fold cross validation

Table 2 Average accuracy scores of KNN classifier

Dataset | Average accuracy without k-fold | Average accuracy with k-fold
DS1 | 98.13 | 98.24
DS2 | 96.17 | 97
DS3 | 94.85 | 95.61
DS4 | 93.09 | 92.52
DS5 | 74.09 | 72.79
DS6 | 61.47 | 61.54
DS7 | 88.24 | 87.58
DS8 | 70.85 | 70.1
DS9 | 86 | 85.1
DS10 | 83.52 | 83.38
DS11 | 98.84 | 98.89
DS12 | 63.84 | 62.74
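The paired comparison can be checked directly from the values in Table 2. The sketch below computes the paired t statistic with the Python standard library; rather than a p-value, it compares the statistic with 2.201, the tabulated two-sided 5% critical value of the t distribution with 11 degrees of freedom. Variable names are chosen for the example.

```python
import math
from statistics import mean, stdev

# Average KNN accuracy per dataset (DS1..DS12), taken from Table 2
without_kfold = [98.13, 96.17, 94.85, 93.09, 74.09, 61.47,
                 88.24, 70.85, 86, 83.52, 98.84, 63.84]
with_kfold = [98.24, 97, 95.61, 92.52, 72.79, 61.54,
              87.58, 70.1, 85.1, 83.38, 98.89, 62.74]

diffs = [a - b for a, b in zip(without_kfold, with_kfold)]
n = len(diffs)
t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(n))   # paired t statistic

T_CRITICAL_DF11 = 2.201   # tabulated two-sided 5% value, n - 1 = 11 d.o.f.
reject_h0 = abs(t_stat) > T_CRITICAL_DF11
print(round(t_stat, 3), reject_h0)
```

The statistic is about 1.50, below the critical value, so H0 is not rejected; this is consistent with the reported p-value of 0.161.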


Algorithm 1: (Independence of test size and random state on accuracy score).
1. Select an algorithm and a dataset randomly.
2. for TS = 10 to 50 step size 5
     for RS = 0 to 100 step size 1
       calculate accuracy AC of the algorithm for given TS and RS.
3. Find the minimum and maximum accuracy for all TS and RS.
4. a = Min AC
5. b = Max AC
6. c = a + (b − a)/2
7. for TS = 10 to 50 step size 5
     Observed Frequency for TS = Number of accuracy scores in the intervals [a, c) and [c, b].
8. for RS = 0 to 100 step size 1
     Observed Frequency for RS = Number of accuracy scores in the intervals [a, c) and [c, b].
9. Combine the observed frequencies for random states [0–10), [10–20), …, [90–100].
10. Apply the Chi square test of independence to test the independence of test size and accuracy scores.

Table 3 shows the observed frequencies of accuracy scores for different test sizes, considering different random states for each test size. The KNN classifier and the 'Breast Cancer Wisconsin (Diagnostic)' dataset are used as an example. The test gives a p-value of 0.017. Since the p-value is less than 0.05, the null hypothesis is rejected at the 5% level of significance, and we conclude that there is some dependency between test size and accuracy score.

Table 3 Observed frequencies for test size versus accuracy using KNN classifier

Accuracy | Test size: 10 | 15 | 20 | 25 | 30 | 35 | 40 | 45 | 50
95.6–97.8 | 15 | 23 | 19 | 23 | 18 | 13 | 8 | 10 | 11
97.8–100 | 86 | 78 | 82 | 78 | 83 | 88 | 93 | 91 | 90
Total | 101 | 101 | 101 | 101 | 101 | 101 | 101 | 101 | 101

Table 4 shows the observed frequencies of accuracy scores for different random states, combined into intervals of twenty, considering different test sizes for each interval of random states. The KNN classifier and the 'Breast Cancer Wisconsin (Diagnostic)' dataset are again used as an example. The test gives a p-value of 0.014. Since the p-value is less than 0.05, the null hypothesis is rejected at the 5% level of significance, and we conclude that there is some dependency between random state and accuracy score. We get the same results for other datasets and algorithms.

Table 4 Observed frequencies for random states versus accuracy using KNN classifier

Accuracy | Random states: 0–20 | 20–40 | 40–60 | 60–80 | 80–100
95.6–97.8 | 40 | 23 | 32 | 26 | 19
97.8–100 | 140 | 157 | 148 | 154 | 170

Since accuracy score depends on test size and random state, we look for limits on the accuracy score of each algorithm. We take the set of twelve datasets as a sample and apply the five algorithms to these datasets to get the average accuracy over different test sizes and random states. Table 5 shows the average accuracy scores for the different datasets and algorithms. For each algorithm, the assumptions of the t-test are checked. The datasets are selected randomly, and the variance in accuracy can be assumed to be the same, since we have already shown that average accuracy scores lie in a small interval. We checked the normality of the scores of each algorithm using the Shapiro–Wilk test; all accuracy scores except those of Naïve Bayes were found to be normally distributed. Table 6 shows the 95% confidence intervals for the average accuracy scores of the different algorithms. Since Naïve Bayes did not satisfy the normality condition, we could not apply the t-test to it. For the other algorithms we found the confidence intervals for the average accuracy scores. Thus, with 95% confidence, we can conclude that the accuracy of these algorithms should lie within the given range. An accuracy above the upper limit or below the lower limit may be due to the chance selection of a particular test size or random state.

Table 5 Average accuracy scores for different datasets and algorithms

Average accuracy KNN

Naïve Bayes

Random forest

Decision tree

SVC

DS1

98.13

96.68

98.5

97.96

98.29

DS2

96.17

96

96.13

94.54

96.68

DS3

94.85

1

99.21

97.77

99.31

DS4

93.09

94.83

93.36

92.39

96.02

DS5

74.09

74.07

74.2

70.35

76.91

DS6

61.47

57.03

65.9

61.25

72.28

DS7

88.24

47.84

89.3

83.43

61.55

DS8

70.85

75.01

70.62

65.97

73.59

DS9

86

86.64

84.78

79.81

86.6

DS10

83.52

72.4

86.81

81.96

79.07

DS11

98.84

98.38

98.8

98.36

98.93

DS12

63.84

62.63

65.15

60.5

61.08
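The chi-square tests reported for Tables 3 and 4 can be checked with a short script. The following is a minimal sketch, not the authors' code: the function name `chi_square_independence` and the variable names are illustrative, and the p-value uses the closed-form chi-square tail that holds only for an even number of degrees of freedom (8 for Table 3 and 4 for Table 4), so that no external library is needed.

```python
import math


def chi_square_independence(observed):
    """Pearson chi-square test of independence for a contingency table.

    Returns (statistic, degrees_of_freedom, p_value). The p-value uses the
    closed-form chi-square survival function, which is valid only for even
    degrees of freedom (an assumption of this sketch).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    if df % 2 != 0:
        raise ValueError("this sketch only handles even degrees of freedom")

    # For df = 2m: P(X > x) = exp(-x/2) * sum_{k<m} (x/2)^k / k!
    half = stat / 2.0
    p_value = math.exp(-half) * sum(
        half ** k / math.factorial(k) for k in range(df // 2)
    )
    return stat, df, p_value


# Observed frequencies copied from Table 3 (rows: accuracy bands,
# columns: test sizes 10..50) and Table 4 (columns: random-state intervals).
table3 = [
    [15, 23, 19, 23, 18, 13, 8, 10, 11],
    [86, 78, 82, 78, 83, 88, 93, 91, 90],
]
table4 = [
    [40, 23, 32, 26, 19],
    [140, 157, 148, 154, 170],
]

for name, table in (("Table 3", table3), ("Table 4", table4)):
    stat, df, p = chi_square_independence(table)
    print(f"{name}: chi2 = {stat:.2f}, df = {df}, p = {p:.3f}")
```

Run as written, the script gives p-values of about 0.017 for Table 3 and 0.014 for Table 4, in line with the values reported in the text.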

Table 6 Confidence interval for accuracy of different algorithms

Algorithm       Confidence interval for accuracy score
KNN             (75.58, 92.61)
Naïve Bayes     Not defined
Random Forest   (76.94, 93.52)
Decision Tree   (72.78, 91.27)
SVC             (74.14, 92.58)
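The intervals in Table 6 can be reproduced from the columns of Table 5. The sketch below assumes that each interval is a two-sided 95% t-interval for the mean accuracy over the twelve datasets, that is, mean ± t × s/√n with n = 12 and 11 degrees of freedom. The names `table5`, `T_CRIT_DF11`, and `mean_confidence_interval` are illustrative, and the critical value 2.201 is taken from a standard t-table rather than computed. Naïve Bayes is left out because it did not satisfy the normality condition.

```python
import math
import statistics

# Average accuracy (%) for DS1..DS12, copied column by column from Table 5.
table5 = {
    "KNN": [98.13, 96.17, 94.85, 93.09, 74.09, 61.47,
            88.24, 70.85, 86, 83.52, 98.84, 63.84],
    "Random Forest": [98.5, 96.13, 99.21, 93.36, 74.2, 65.9,
                      89.3, 70.62, 84.78, 86.81, 98.8, 65.15],
    "Decision Tree": [97.96, 94.54, 97.77, 92.39, 70.35, 61.25,
                      83.43, 65.97, 79.81, 81.96, 98.36, 60.5],
    "SVC": [98.29, 96.68, 99.31, 96.02, 76.91, 72.28,
            61.55, 73.59, 86.6, 79.07, 98.93, 61.08],
}

# Two-sided 95% critical value of Student's t with 11 degrees of freedom,
# hard-coded from a standard t-table (assumption: n = 12 datasets).
T_CRIT_DF11 = 2.201


def mean_confidence_interval(scores, t_crit=T_CRIT_DF11):
    """Return (lower, upper) of the t-interval for the mean of `scores`."""
    n = len(scores)
    mean = statistics.mean(scores)
    # statistics.stdev is the sample standard deviation (n - 1 denominator).
    std_err = statistics.stdev(scores) / math.sqrt(n)
    return mean - t_crit * std_err, mean + t_crit * std_err


for algorithm, scores in table5.items():
    lower, upper = mean_confidence_interval(scores)
    print(f"{algorithm}: ({lower:.2f}, {upper:.2f})")
```

For these inputs the computed intervals agree with Table 6 to two decimal places, which supports the assumption that the table reports t-based intervals for the mean.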

4 Conclusion

The present study analyzed the effect of test size, random state, and dataset on the accuracy of different classification algorithms. We have shown that the accuracy score depends on test size and random state. Further, we found that test size and random state affect accuracy only within a certain range, while the dataset plays the major role in determining accuracy. We have also shown that the average accuracy score over different test sizes and random states is similar to the average accuracy score obtained with k-fold cross-validation. Based on a sample of different datasets, we have tried to provide a confidence interval of accuracy scores for each of the five algorithms. The findings provide a basis for the claim that an accuracy score reported for an algorithm may not be accurate to the stated level. The confidence interval gives a range of accuracy scores for each classification algorithm that can be accepted with 95% confidence. An accuracy score outside this interval may be due to chance or to a particular sample selection. Hence, these results are important for defining accuracy scores for any algorithm. Here we have considered only five algorithms; the same work can be extended to other algorithms.


Author Index

A Adithya Mohanavel, 335 Adwait Mahadar, 575 Aishwarya Mishra, 565 Akshit Karande, 321 Ali Akber Dewan, M., 261 Alka Singhal, 187 Amit Kumar, 77 Angeloni, Silvia, 299 Anitha, A., 553 Anjali, M. N., 235 Anjeneya Swami Kare, 163 Ankita Maitra, 435 Ankit Kumar Jain, 23 Ansuman Mahanty, 615 Anuranj Pullanatt, 553 Anushaa, S. S., 101 Aravindhan Thaninayagam, 503 Arghya Pathak, 615 Arjune, S., 629 Arshil Noor, 539 Arvind Kiwelekar, 321 Ashly Mary Tom, 345 Asmath Haseena, M. I., 335 Atharva Ghodmare, 575 Atishek Kumar, 149

B Bammidi Divyajyothi, 603 Bazaz, M. A., 209 Bennet Praba, M. S., 703 Bharat Bhushan Sharma, 589 Bhawana Tyagi, 125 Bidyapati Thiyam, 407 Bobby Sharma, 677

C Chander Prabha, 111 Chaudhary Dharminder, 101 Chernenkaya, Liudmila V., 527

D Darshana Othayoth, 503 Dhiraj Kumar Kadam, 223 Dinesh Acharya, U., 451 Dipak J. Prajapati, 589 Divya Arora Bhayana, 35 Divyakant Meva, 175 Durgarao, M. S. P., 101

E Eby Sebastian, 91

F Febin Daya, J. L., 345

G Gayana A. Jain, 641 Ghulab Nabi Ahmad, 539 Gourashyam Moirangthem, 489

H Harsha Gaikwad, 321 Harshita Mangotra, 63 Hira Fatima, 539 Hiruthick Roshan, R. S., 503 Hrishikesh Mondal, 615

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in Networks and Systems 686, https://doi.org/10.1007/978-981-99-2100-3


I Ila Pant Bisht, 715

J Jaspreet Singh, 111 Jayashree, T. R., 451 Jocil, D., 693

K Kavita Suryavanshi, 223 Keshab Nath, 235 Kiran Malik, 421 Kishorjit Nongmeikapam, 489 Komal Saini, 137 Koppula Manasa, 279 Kumari Misa, 235 Kurman Sangeeta, 603

L Lavika Goel, 565 Lavinia Nongbri, 489 Leo Joseph, L. M. I., 279 Le, Van Huyen, 527

M Maheshkumar H. Kolekar, 13 Mangesh Bedekar, 575 Manjushree Laddha, 321 Manoj Jha, 435 Meena Malik, 111 Mehak Aggarwal, 421 Minu, R. I., 311 Modalavalasa Divya, 603 Monali Ramteke, 477 Monica Shinde, 223 Moushumi Barman, 677 Mrinal Kanti Mandal, 615 Muskan Sharma, 391

N Narasa Reddy, K. R. S. V. V. P. P., 163 Narendra Kumar Jain, 51 Narendran Rajagopalan, 515 Naundhini, S., 101 Nazia Aslam, 13 Nidhi Gooel, 63 Ningthoujam Johny Singh, 489

O Omkar Dutta, 575 Om Prakash Verma, 35 Ovais Farooq, 209

P Palak Handa, 63 Paramita De, 361 Pooja Kumari, 23 Pooja Ravi, 261 Poonam Bansal, 421 Pragati Kumari, 391 Priyanka Kushwaha, 391 Priya Sawant, 463 Priyesh D. Hemrom, 13 Pushpanjali Kumari, 391 Pushpendra Kumar, 435

R Rahul Maurya, 51 Rajiv Singh, 125 Raj Kishor Bisht, 715 Rakesh Meena, 373 Ramesh Chandra Poonia, 91 Raswanth Prasath, S. V., 503 Rathna, R., 703 Ravi Kumar Ranjan, 589 Richa Yadav, 391 Rishabh Jain, 149 Rudresh Dwivedi, 149

S Saber, Bahareh Nazar Hosseini, 655 Saber, Reyhaneh Nazar Hosseini, 655 Sabiyath Fatima, N., 335 Sandeep Raghuwanshi, 373 Sandeep Sharma, 137 Saroja S. Bhusare, 641 Sarvat Ali, 201 Sayali Bhongade, 321 Sendil Vadivu, D., 515 Shafiullah, 539 Sheetal Girase, 575 Shital Raut, 201, 477 Shouvik Dey, 407 Shreeram Aithal, 641 Shreyas, E., 641 Shruthi, M., 515 Sivabalan, S., 311 Sohali Baisla, 421 Sreekaar, O., 641

Sreemathy, R., 463 Sreenidhi Ganachari, 247 Srinivasa Kumar, V., 629 Srinivasa Rao Battula, 247 Subba Reddy, N. V., 451 Subhashish Pal, 615 Suhail Ahmad Suhail, 209 Sukhwinder Kaur, 111 Suman Ramteke, 51 Sunandita Debnath, 1 Sunil Joshi, 373 Surbhi Sharma, 187 Swati Nigam, 125

T Tanvi Desai, 175 Tejal Kadam, 477 Tejash More, 235

V Vadivel, R., 693 Veeramma Yatnalli, 641