Innovations in Computer Science and Engineering: Proceedings of the Tenth ICICSE, 2022
ISBN: 9811974543, 9789811974540

This book features a collection of high-quality, peer-reviewed research papers presented at the 10th International Conference on Innovations in Computer Science and Engineering (ICICSE-2022), held at Guru Nanak Institutions, Hyderabad, India, during September 16–17, 2022.


Language: English. Pages: 753 [754]. Year: 2023.


Table of contents:
Preface
Contents
Editors and Contributors
A Contemporary Analysis of Privacy Preserving Strategies in MANET
1 Introduction
2 Literature Review
3 Problem Identification
4 Results and Discussion
4.1 Testing Set-Up
4.2 Packet Transmission Under Privacy Preservation Condition
4.3 Routing Time Under Privacy Preservation Condition
5 Conclusion
References
An Explainable AI Approach for Diabetes Prediction
1 Introduction
2 Related Works
3 Explainable AI-Based Prediction System for Healthcare
4 Methodology
5 Experiments and Result
6 Conclusion
References
Classification of Cervical Cells Using Deep Learning Feature Extraction
1 Introduction
2 Related Work
3 The Motivation for the Proposed Work
3.1 Bilateral Filter and Genetic Algorithm
3.2 Morphological Operations
4 HSI Color Model
4.1 Algorithm Implementation
4.2 Experimental Design
5 Conclusion and Future Work
References
Nature-Inspired Techniques for Terrain Features Extraction
1 Introduction
1.1 Natural Computing
1.2 Image Classification
2 Proposed Method
2.1 Dataset
2.2 Kappa Coefficient
2.3 Overall Accuracy
3 Result Analysis
4 Conclusion and Future Scope
References
An Investigation on the Detection of Intrusions into a Network Using Convolutional Neural Networks
1 Introduction
2 Related Research
3 Proposed CNN-Based NIDS
4 Experiment Analysis and Obtained Results
5 Conclusions and Future Research
References
Hybrid Binary Whale Optimization Algorithm for Feature Selection Optimization Problem
1 Introduction
2 Hybrid Particle Swarm and Whale Optimization Algorithm (HPSOWOA)
2.1 Whale Optimization Algorithm (WOA)
2.2 Particle Swarm Optimization (PSO)
2.3 Hybrid PSOGWO
2.4 Proposed Binary Variant of PSOWOA (HBPSOWOA)
3 Results and Discussions
4 Conclusion and Future Scope
References
Transfer Learning for Arabic Question-Answering
1 Introduction
2 Literature Survey
3 The Developed Approach
4 Experimental Setup
4.1 Fine-Tuning AraBERT
4.2 Embedding
4.3 Evaluation Metrics
4.4 Results
5 Conclusion
References
Smart Healthcare System to Predict Ailments Based on Preliminary Symptoms
1 Introduction
2 Related Works
2.1 Coronary Heart Disease
2.2 Polycystic Ovarian Disease
2.3 Myopia
2.4 Colour Blindness
3 System Architecture
4 Implementation Details
4.1 Coronary Heart Disease
4.2 Polycystic Ovarian Disease
4.3 Myopia
4.4 Colour Blindness
5 Dataset
5.1 Coronary Heart Disease
5.2 Polycystic Ovarian Disease
6 Discussion of Results
6.1 Coronary Heart Disease
6.2 Polycystic Ovarian Disease
6.3 Myopia
6.4 Colour Blindness
7 Conclusion
References
Mathematical Models for the Ouroboros Protocol Based on Attacks Over Blockchain Systems
1 Introduction
2 Double-Spend Attack: General Overview
3 Bitcoin Double-Spend Attack
3.1 The Model of S. Nakomoto
3.2 The Model of M. Rosenfeld
3.3 Other Models
3.4 Models Comparison
4 Blockchain Splitting Attacks
4.1 Splitting Attack: General Overview
4.2 Bitcoin Splitting Attack
4.3 GHOST Splitting Attack
5 Ouroboros Double-Spend Attacks
5.1 General Overview
5.2 The Attacks on the Common Prefix
5.3 Probabilities of a Fork
6 Protocols Comparison
7 Conclusion
References
Hybrid Power Generation Forecasting Using an Intellectual Evolutionary Energy-Preserve Rate Clustering Technique
1 Introduction
2 Related Works
3 Materials and Methods
3.1 Basis of Wind Power
3.2 Time Series Clustering WPG
3.3 The EEPRC-Based Classification Model
3.4 EEPRC Algorithm
4 Result and Discussion
5 Conclusion and Future Work
References
Machine Learning-Based Indian Stock Market’s Price Movement Prediction and Trend Analysis
1 Introduction
2 Related Work
3 Proposed Model
4 Methods and Algorithms
4.1 Technical Parameters
4.2 Algorithms
4.3 Evaluation Criteria
5 Result Analysis
6 Conclusion
7 Future Scope
References
Machine Learning-Based Mortality Prediction of COVID-19 Patients
1 Introduction
2 Related Works
3 Methodology
3.1 Data Collection
3.2 Data Preprocessing
3.3 Study Design
4 Result
5 Conclusion
References
Smart Computer Monitoring System Using Neural Networks
1 Introduction
2 Literature Review
3 Methodology
3.1 Overall Design of Computer Monitoring System
3.2 Monitoring Software Design
3.3 Database Module Design
3.4 Neural Network Model
4 Results
5 Conclusion
References
Using Deep Learning to Perform Payload Classification
1 Introduction
2 Literature Review
2.1 Rule-Based Packet Classification
2.2 Classify Packets Using Deep Learning
2.3 Payload-Based Traffic Classification
3 Data Preprocessing
3.1 Data Split
3.2 Learning Data Generation
4 Deep Learning Models
4.1 Convolution Neural Network Architecture
4.2 Residual Network Architecture
4.3 Recurrent Neural Network Architecture
4.4 Long Short-Term Memory Architecture
4.5 CNN and RNN Combination Network Model Architecture
5 Model Tuning
5.1 CNN and ResNet Model Tuning
5.2 RNN and LSTM Model Tuning
6 Evaluation
6.1 Experiments Environment
6.2 Performance Metrics
6.3 Experiments Results
7 Conclusion
References
Malicious Domain Detection Using Memory Augmented Deep Autoencoder
1 Introduction
2 Literature Survey
3 Auto Encoders
3.1 Memory Augmented Deep Autoencoders
4 Proposed Method
5 Experiment
6 Conclusion and Future Work
References
Graph Analysis Using Page Rank Algorithm to Find Influential Users
1 Introduction
2 Problem Definition
3 Related Work
4 Proposed Method
5 Implementation
6 Results
7 Conclusion
References
Hate Speech and Offensive Language Detection in Twitter Data Using Machine Learning Classifiers
1 Introduction
2 Related Work
3 Proposed Approach
3.1 Data Collection
3.2 Neutral Words
3.3 Hate Words
3.4 Offensive Words
3.5 Data Cleaning
3.6 RTs and @ Removal
3.7 Unnecessary URL Removal
3.8 Special Character and Stop Word Removal
3.9 Data Labeling
3.10 Word Cloud
3.11 Data Analysis
4 Results and Discussion
5 Conclusion and Future Work
6 Contribution and Novelty
References
Secrecy Rate Optimization for Energy Efficient Cognitive Relay Networks
1 Introduction
2 Problem Formulation
2.1 Best Relay Selection (RS1)
2.2 Best Relay Selection (RS2)
2.3 Best Relay Selection (RS3)
3 Proposed Model
4 Simulation Result
5 Conclusion
References
Avian Influenza Prediction Using Machine Learning
1 Introduction
2 Related Work
3 Materials and Method
3.1 Data Collection
3.2 Data Preprocessing
3.3 Training and Testing
3.4 Classifier Application
3.5 Performance Evaluation
3.6 Dimensionality Reduction
3.7 Pandemic Prediction Using Facebook Prophet
4 Results and Discussion
4.1 Performance Validation with the Weather Conditions in Addition to Patient Symptoms Data
4.2 Summary of Results
4.3 Prediction/Forecast Using Facebook Prophet
5 Conclusion
References
Prediction and Comparison of Diabetes with Logistic Regression, Naïve Bayes, Random Forest, and Support Vector Machine
1 Introduction
2 Dataset
3 Methodology
3.1 Logistic Regression
3.2 SVM
3.3 SVM Functioning
3.4 KNN
3.5 Random Forest
3.6 Naïve Bayes
3.7 Gradient Boost
4 Conclusion
5 Results
References
Bone Cancer Detection Using Deep Learning
1 Introduction
2 Related Work
3 Proposed Solution
3.1 Architecture
3.2 Steps Involved in the CNN Algorithm for Pre-Processing the Input
4 Result and Discussion
4.1 Dataset
4.2 Experimental Results
4.3 Discussion
5 Conclusion and Future Work
References
Energy and Buffer Size-Based Routing Protocol for Internet of Things
1 Introduction
2 Related Work
3 The Proposed E-RPL Protocol
4 Result and Discussions
4.1 Number of Parent Changes
4.2 Average Energy Consumption
4.3 Average Packet Loss Ratio
4.4 Average End-To-End Delay
5 Conclusion
References
V-Shaped Binary Version of Whale Optimization Algorithm for Feature Selection Problem
1 Introduction
2 Binary Version of WO Algorithm
3 Results and Discussions
4 Conclusion and Future Scope
References
An Energy-Efficient Deep Neural Network Model for Photometric Redshift Estimation
1 Introduction
1.1 Related Work
2 Data
3 Proposed Model
3.1 Feature Engineering
3.2 Training and Validation
4 Implementations
4.1 Step 1: Implementing the Initial Model with Low-Redshift Data
4.2 Step 2: Implementing the Initial Model with High-Redshift Data
4.3 Step 3: Using High-Redshift Data, Put the Final/Proposed Model into Action
5 Results and Discussion
5.1 Metrics
5.2 Results and Output
5.3 Comparison with Other Methods
5.4 Energy Consumption
6 Future Enhancements
7 Conclusion
References
Deep Learning-Based Diabetic Retinopathy Screening System
1 Introduction
2 Related Work
3 Problem Definition and Existing System
4 Methodology
5 System Architecture
6 Results
7 Conclusion
References
Artificial Intelligence-Based Data Analytics Techniques in Medical Imaging
1 Introduction
2 Literature Survey
2.1 AI-Based BDA in Medical Imaging
2.2 AI-Based BDA in Medical Imaging Using Machine Learning Techniques
2.3 AI-Based BDA in Medical Imaging Using Deep Learning Techniques
References
Ensuring Data Protection Using Machine Learning Integrating with Blockchain Technology
1 Introduction
1.1 Background
2 Material and Methodology
2.1 Smart Contract and Security
2.2 The Collaboration of Machine Learning and Blockchain Technology
3 Implementation
4 Results and Outcomes
4.1 Advantages of Proposed Architecture Over Existing Approaches
4.2 Challenges to Machine Learning and Blockchain
5 Conclusion
References
Evaluation and Language Training of Multinational Enterprises Employees by Deep Learning in Cloud Manufacturing Resources
1 Introduction
2 Related Works
3 Materials and Methods
3.1 Services Analysis
3.2 Cloud Provider
3.3 Load Balancer (LB)
3.4 Resource Scheduler Using Fuzzy Logic
3.5 Cloud Allocator
4 Result and Discussion
5 Conclusion and Future Work
References
Development of a Cognitive Question Answering System to Learn Concepts for Placement Assistance
1 Introduction
2 Background and Related Work
2.1 Limitation of the Existing System
3 System Architecture
4 Proposed Methodology
4.1 User Capability Level Identification
4.2 Construction of Assertion Graph
4.3 QA Analyzer
4.4 Question Analyzer and Primary Search Analysis
4.5 Hypothesis Generation
4.6 Evidence Identification and Evidence Scorer
4.7 Final Evidence Identification and User Answer Validation and Resource Generator
5 Discussion and Results
5.1 Performance Evaluation
6 Conclusion and Future Work
References
Cervical Cancer Prediction Using Optimized Meta-Learning
1 Introduction
2 Proposed Work
3 Results and Discussion
4 Conclusions
References
An Ensemble Deep Closest Count and Density Peak Clustering Technique for Intrusion Detection System for Cloud Computing
1 Introduction
2 Related Work
3 Density Peak Clustering Technique and Three Closest Cluster Counts Called Deep DCP-CC
4 Results Analysis
5 Conclusion
References
Design of Concurrent Engineering Systems for Global Product Development Using Artificial Intelligence
1 Introduction
2 Related Works
3 Materials and Methods
3.1 Enterprise Modelling
3.2 Product Development Process with CE
3.3 Failure Mode and Effects Analysis
4 Result and Discussion
4.1 Global Industrial Growth Analysis
4.2 Grade of Failure Modes
5 Conclusion and Future Work
References
Comparison of Public and Critics Opinion About the Taliban Government Over Afghanistan Through Sentiment Analysis
1 Introduction
2 Related Work
3 Proposed Approach
3.1 Data Collection
3.2 Data Cleaning
3.3 Unnecessary Data Removal
3.4 Web Scraping News Articles
3.5 Data Labeling
3.6 Word Cloud
3.7 Special Character and Stop Words Removal
3.8 Proposed Architecture
4 Results and Discussion
4.1 Comparison of Sentiments of Tweets and News Articles
5 Conclusion and Future Work
References
Vehicle Object Detection Using Deep Learning-Based Anchor-Free Detector
1 Introduction
2 Related Work
3 Problem Definition
4 Proposed Solution
4.1 Dataset Creation
4.2 Object Detection Model
5 Experiment and Result Analysis
6 Conclusion
References
Certain Investigations of MEMS for Optimised Sensor Coverage
1 Introduction
2 Related Works
3 Coverage and Classification of Coverage
4 Coverage Problem Classification
4.1 Discovery of Network Coverage
4.2 Randomly Selected Coverage
4.3 Goal of Coverage
4.4 Algorithms for Coverage
4.5 Algorithms for k-Coverage
4.6 Point Coverage
4.7 Creating a Barrier
4.8 Protocols for Coverage-Aware Deployment
5 Conclusion and Future Work
References
Voice-Based Intelligent Virtual Assistant for Windows
1 Introduction
2 Literature Survey
3 System Design
4 Implementation
4.1 Dataset Creation
4.2 Speech Recognition
4.3 Text Cleaning
4.4 Task or Command Classification
4.5 Execution of Tasks
5 Results
6 Conclusion
References
Cryptocurrency Trading Bot with Sentimental Analysis and Backtracking Using Predictive ML
1 Introduction
2 Related Work
3 Literature Review Summary
4 Methodology
5 Results
6 Conclusion
References
Efficient Pseudo-Random Number Generator Using Number-Theoretic Transform
1 Introduction
2 Preliminaries
2.1 Pseudo-Random Number Generator
2.2 Chinese Remainder Theorem
2.3 Discrete Fourier Transform
2.4 Twiddle Factor
2.5 Lattice-Based Cryptography
3 Proposed Methodology to Generate Efficient Pseudo-Random Number Generator
4 Implementation
5 Result
6 Conclusion and Future Scope
References
Medical Images Analysis for Segmentation and Classification Using DNN
1 Introduction
2 Literature Review
3 Proposed Methodology
3.1 Data Collection
3.2 Data Pre-Processing
3.3 Model Architecture
3.4 Model Training
3.5 Model Testing
4 Results and Discussion
4.1 Comparative Analysis
5 Conclusion and Future Work
References
Robotic Basket and Intelligent Stacking Supported Automated Hygienic Shopping System
1 Introduction
2 Related Work
2.1 Research Gap
2.2 Objectives
3 Methodology and Design
4 Impact of the Proposed System
4.1 For Shopkeeper
4.2 For Customer
4.3 For Society
5 Conclusion
References
Traffic Sign Detection—A Module in Autonomous Vehicles
1 Introduction
2 Related Works
3 Description of Dataset
4 Methodology
5 Results and Discussion
6 Conclusion and Future Work
References
A Deep Learning Model for Arabic Fake News Detection Based on Transformers
1 Introduction
2 Related Works
3 Proposed Model: ArabFand
3.1 Preprocessing
3.2 Classification
4 Evaluation
4.1 Data Sets
4.2 Results
5 Conclusion and Future Work
References
Virtual Lab Simulator for Software Engineering Experiment Report to Evaluate Student Assessment
1 Introduction
2 Related Works
2.1 Research Gap
3 Methodology
3.1 Design
3.2 System Diagram
4 Implementation
5 Results and Discussion
5.1 Experimental Analysis
5.2 Performance Evaluation
5.3 Discussion
6 Conclusion
References
Analyze Dark Web and Security Threats
1 Introduction
1.1 Deep Web
1.2 Dark Web
1.3 Tor
2 Literature Review
3 Problem Statement
4 Objectives
5 Methodology
6 Machine-Learning Models
7 Results and Discussion
8 Conclusions
References
Deep Convolutional Neural Networks-Based Market Strategy for Early-Stage Product Development
1 Introduction
2 Related Work
3 Material Method
3.1 Analysis Function and Variables in NPD
3.2 New Product Development Process
4 Result and Discussion
5 Conclusion and Future Work
References
Calorie Count of a Fruit Image Using Convolutional Neural Network
1 Introduction
1.1 Related Work
2 Proposed Work
3 Discussion on Experimental Investigations
4 Results and Observations
5 Conclusion and Future Work
References
A Novel Adaptive Fault Tolerance Algorithm Towards Robust and Reliable Distributed Applications to Reuse System Components
1 Introduction
2 Related Work
3 Proposed System
4 Experimental Results
5 Conclusion and Future Work
References
Virtual Machine Migration Framework with Configuration Change Management
1 Introduction
2 Benefits of Virtualization
3 Virtualization Components and VMMS
4 Current State of Art
5 Proposed Computing Virtualization Management Framework
6 Results
7 Conclusion
References
Glioma Segmentation in MR Images Using 2D Double U-Net: An Empirical Investigation
1 Introduction
2 Methods
2.1 Pre-processing
2.2 Deep Neural Networks
3 Implementation
3.1 Data Preparation
3.2 Model Setup
3.3 Training Model
4 Results and Discussion
5 Conclusion and Future Work
References
A Stochastic Weighted Model for Task Scheduling and Resource Utilization in the Cloud
1 Introduction
2 Related Works
3 Methodology
3.1 Stochastic Model
3.2 Weighted Scheduling Algorithm
4 Experimental Analysis
5 Conclusion
References
Machine Learning and Recommendation System in Agriculture: A Survey and Possible Extensions
1 Introduction
2 Classification of ML and RS Algorithms
2.1 Machine Learning Algorithms
2.2 Recommendation Systems
3 Existing Literature Work
3.1 Work Has Been Done in Crop Recommendation
3.2 Work Has Been Done in Fertilizer Recommendation
3.3 Work Has Been Done in Pesticide Recommendation
4 Problems in Agriculture Sector
4.1 Reduction in Crop Yield
4.2 Reduction in Crop Quality
4.3 Reduction in Profit
4.4 Selection of Fertilizers
4.5 Crop Disease
4.6 Uneducated Farmers
4.7 Unpredictable Weather
4.8 Crop Demand
4.9 Hindered Decision-Making Process
5 Recent Trends and Future Augmentation in Agriculture
6 Conclusion
References
A Study on Accident Detection Systems Using Machine Learning
1 Introduction
2 Related Work
3 Models and Techniques for Accident Detection
4 ML/AI-Based Techniques
5 Conclusion
References
Prediction of Cardiac Arrest Using Ensemble Methods
1 Introduction
2 Literature Survey
3 Proposed Methodology
4 Result Analysis
5 Conclusion
References
Real-Time Object Detection and Tracking Design Using Deep Learning with Spatial–Temporal Mechanism for Video Surveillance Applications
1 Introduction
2 Object Detection and Tracking
3 Convolutional Neural Networks (CNNs)
3.1 Object Tracking
3.2 Simple Online Real-Time Tracking (SORT)
4 Design
4.1 Single Object Detection
4.2 Multiple Object Detection
4.3 Metrics of Performance
5 Conclusion
References
Equipment Planning for an Automated Production Line Using a Cloud System
1 Introduction
2 Related Works
3 Materials and Methods
3.1 Lean Production System
3.2 System Design of Automation System
3.3 Business Process Intelligence
3.4 Design Procedure of LAS
4 Result and Discussion
4.1 Accuracy of LAS
4.2 Prediction of Manual and LAS
5 Conclusion and Future Work
References
Comparison Between Property-Based Software Security Testing Technique and Fault Injection
1 Introduction
2 Background and Feasibility Study
2.1 Penetration Analysis
2.2 Formal Security Testing
2.3 Model-Based Security Testing
2.4 Fuzzy Testing
2.5 White Box Testing
2.6 Risk-Based Testing
3 Proposed Model
3.1 Fault Injection-Based Testing
3.2 Property-Based Testing
4 Implemented Software for Testing Software Vulnerability
4.1 Fault Injector
4.2 Property-Based Injector
5 Advantages and Disadvantages of Fault Injection Property-Based Technique
5.1 Fault Injection-Based Testing
5.2 Property-Based Testing
6 Basic Comparison Between Fault Injection and Property-Based Techniques
7 Conclusion
References
Sentiment Analysis on Social Media Data: A Survey
1 Introduction
2 Significance of Social Media
3 Social Networks: Cluster and Community
3.1 Clustering
3.2 Community Detection
4 Social Networks: Centrality Factors Measuring the Impact of a Node
5 Sentiment Analysis on Social Media Data
5.1 Data Acquisition
5.2 Sentiment Analysis Techniques
5.3 Research Observed Over Different Dimensions of Sentiment Analysis Over Social Media Data
6 Conclusion and Future Work
References
Smart Cradle System
1 Introduction
2 Related Work
3 Proposed Method
4 Algorithms
5 Block Diagram
6 Results
7 Conclusion and Future Work
References
A Soft Computing Based Approach for Pixel Labelling on 2D Images Using Fine Tuned R-CNN
1 Introduction
1.1 Post-fully Convolutional Networks
1.2 Contribution Highlights
1.3 Structure of the Paper
2 Related Work
3 Our Modeling Pipeline and Experimental Evaluation
3.1 Simulation Set-Up and Dataset
3.2 Methodology and Obtained Results
4 Conclusion
5 Future Scope
References
An Advanced Approach to Detect Plant Diseases by the Use of CNN Based Image Processing
1 Introduction
2 Research Review
3 Dataset Preparation
4 Methodology
5 Experiment and Results
5.1 Evaluation Metric
5.2 Variation of Training Ratio with Fixed Data Size
5.3 Variation of Data Size with Fixed Training Ratio
5.4 Variation of Epoch with Fixed Training Ratio and Fixed Data Size
6 Conclusions and Future Scope
References
Author Index


Lecture Notes in Networks and Systems 565

H. S. Saini · Rishi Sayal · A. Govardhan · Rajkumar Buyya, Editors

Innovations in Computer Science and Engineering: Proceedings of the Tenth ICICSE, 2022

Lecture Notes in Networks and Systems Volume 565

Series Editor: Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

Advisory Editors:
Fernando Gomide, Department of Computer Engineering and Automation—DCA, School of Electrical and Computer Engineering—FEEC, University of Campinas—UNICAMP, São Paulo, Brazil
Okyay Kaynak, Department of Electrical and Electronic Engineering, Bogazici University, Istanbul, Turkey
Derong Liu, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, USA; Institute of Automation, Chinese Academy of Sciences, Beijing, China
Witold Pedrycz, Department of Electrical and Computer Engineering, University of Alberta, Alberta, Canada; Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Marios M. Polycarpou, Department of Electrical and Computer Engineering, KIOS Research Center for Intelligent Systems and Networks, University of Cyprus, Nicosia, Cyprus
Imre J. Rudas, Óbuda University, Budapest, Hungary
Jun Wang, Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong

The series “Lecture Notes in Networks and Systems” publishes the latest developments in Networks and Systems—quickly, informally and with high quality. Original research reported in proceedings and post-proceedings represents the core of LNNS. Volumes published in LNNS embrace all aspects and subfields of, as well as new challenges in, Networks and Systems. The series contains proceedings and edited volumes in systems and networks, spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor Networks, Control Systems, Energy Systems, Automotive Systems, Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems, Robotics, Social Systems, Economic Systems and other. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution and exposure which enable both a wide and rapid dissemination of research output. The series covers the theory, applications, and perspectives on the state of the art and future developments relevant to systems and networks, decision making, control, complex processes and related areas, as embedded in the fields of interdisciplinary and applied sciences, engineering, computer science, physics, economics, social, and life sciences, as well as the paradigms and methodologies behind them. Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science. For proposals from Asia please contact Aninda Bose ([email protected]).

H. S. Saini · Rishi Sayal · A. Govardhan · Rajkumar Buyya Editors

Innovations in Computer Science and Engineering: Proceedings of the Tenth ICICSE, 2022

Editors

H. S. Saini, Guru Nanak Institutions, Ibrahimpatnam, Telangana, India
Rishi Sayal, Guru Nanak Institutions, Ibrahimpatnam, Telangana, India
A. Govardhan, Jawaharlal Nehru Technological University, Hyderabad, Telangana, India
Rajkumar Buyya, Cloud Computing and Distributed Systems Laboratory (CLOUDS), University of Melbourne, Melbourne, VIC, Australia

ISSN 2367-3370  ISSN 2367-3389 (electronic)
Lecture Notes in Networks and Systems
ISBN 978-981-19-7454-0  ISBN 978-981-19-7455-7 (eBook)
https://doi.org/10.1007/978-981-19-7455-7

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Preface

This volume contains 60 papers that were presented at the Tenth International Conference on Innovations in Computer Science and Engineering (ICICSE-2022), held during September 16–17, 2022, at Guru Nanak Institutions, Hyderabad, India, in collaboration with the Computer Society of India (CSI). The aim of this conference is to provide a vibrant virtual international forum that brings together researchers, scientists, academicians, corporate professionals and technically sound students under one roof for a phenomenal, informative and interactive session, which is acutely needed to pave the way for research advancements in the field of Computer Science and Engineering. ICICSE-2022 received more than 450 research papers from various subfields of Computer Science and Engineering. Each submitted paper was meticulously reviewed by our review committee consisting of senior academicians, industry professionals and professors from premier institutions and universities.

• This conference was inaugurated and attended by top dignitaries such as Prof. Valentina Emilia Balas, Professor of Automation and Applied Informatics, University of Arad, Romania; Dr. Abdul Khadar Jilani, Program Director, University of Technology, Bahrain; Dr. D. D. Sharma, Senior Fellow CSI, Recipient of the LTA Award, CSI Council Member; Dr. A. Govardhan, Rector, JNTUH Hyderabad; and Dr. Somitra Kumar Sanadhya, Professor, IIT Jodhpur, Rajasthan.
• The conference had a fantastic line-up of keynote sessions by eminent speakers and paper presentation sessions covering the latest outcomes related to advancements in computing technologies.
• The keynote sessions were conducted on cutting-edge technologies such as Natural Language Processing, Artificial Intelligence and Blockchain Technologies; the invited speakers were Dr. Aruna Malapati, Professor, BITS Pilani Hyderabad; Prof. (Dr.) P. S. Grover, Professor of Computer Science Engineering, Kamrah Institute of Information Technology (KIIT); and Dr. Pilli Emmanuel Shubhakar, Associate Professor, Computer Science and Engineering, Malaviya National Institute of Technology, Jaipur, Rajasthan, respectively.


• The organizing committee of ICICSE-2022 takes this opportunity to thank the invited speakers, session chairs and reviewers for the time they devoted during an unprecedented pandemic.
• The quality of the research papers is a courtesy of the respective authors and reviewers, who brought them up to the desired level of excellence. We thank the program committee members and external reviewers for helping to produce the best-quality research papers in a short span of time. We also thank the CSI delegates for their valuable suggestions in making this event a grand success.

H. S. Saini, Hyderabad, India
Rishi Sayal, Hyderabad, India
A. Govardhan, Hyderabad, India
Rajkumar Buyya, Melbourne, Australia

Contents

A Contemporary Analysis of Privacy Preserving Strategies in MANET, by Chinna Reddaiah, M. V. Narayana, M. Vijaya Sudha, and Ch. Subba Laxmi (p. 1)
An Explainable AI Approach for Diabetes Prediction, by Aishwarya Jakka and J. Vakula Rani (p. 15)
Classification of Cervical Cells Using Deep Learning Feature Extraction, by Abolfazl Mehbodniya, Julian L. Webber, Devi Mani, D. Stalin David, Amudha Kandasamy, Rajasekar Rangasamy, and Sudhakar Sengan (p. 27)
Nature-Inspired Techniques for Terrain Features Extraction, by Sharad Bajaj, Harish Kundra, Sheetal Kundra, Nehalika Neha, and Suyash Agrawal (p. 43)
An Investigation on the Detection of Intrusions into a Network Using Convolutional Neural Networks, by N. D. Patel and Ajeet Singh (p. 53)
Hybrid Binary Whale Optimization Algorithm for Feature Selection Optimization Problem, by V. Ramya, E. Vinay Kumar, G. S. Gopika, and G. Manoj (p. 63)
Transfer Learning for Arabic Question-Answering, by Abdullah M. Baqasah (p. 75)
Smart Healthcare System to Predict Ailments Based on Preliminary Symptoms, by Chirag Jagad, Ishika Chokshi, Devanshi Jhaveri, Himanshu Harlalka, and Prachi Tawde (p. 87)
Mathematical Models for the Ouroboros Protocol Based on Attacks Over Blockchain Systems, by Sai Tejaswi Guntupalli and Khushi Saxena (p. 101)
Hybrid Power Generation Forecasting Using an Intellectual Evolutionary Energy-Preserve Rate Clustering Technique, by Julian L. Webber, Vellingiri Jayagopal, Abolfazl Mehbodniya, Sudhakar Sengan, Priya Velayutham, Rajasekar Rangasamy, and D. Stalin David (p. 121)
Machine Learning-Based Indian Stock Market's Price Movement Prediction and Trend Analysis, by Athira, Arya Raj, Achu Pushpan, and R. C. Jisha (p. 139)
Machine Learning-Based Mortality Prediction of COVID-19 Patients, by R. Ani, O. S. Deepa, M. Arundhathi, and J. Darsana (p. 153)
Smart Computer Monitoring System Using Neural Networks, by Stephen Jeswinde Nuagah, Bontha Mamatha, B. Hyma, and H. Vijaya (p. 169)
Using Deep Learning to Perform Payload Classification, by Jayesh Thakur and Kaushik Rane (p. 183)
Malicious Domain Detection Using Memory Augmented Deep Autoencoder, by Pavan Kartheek Rachabathuni, Hiranmayee Nandyala, G. Prasanthi, and Singamaneni Krishnapriya (p. 201)
Graph Analysis Using Page Rank Algorithm to Find Influential Users, by D. Venkata Swetha Ramana, T. Anusha, V. SumaSree, C. R. Renuka, and Taiba Sana (p. 213)
Hate Speech and Offensive Language Detection in Twitter Data Using Machine Learning Classifiers, by Seyed Muzaffar Ahmad Shah and Satwinder Singh (p. 221)
Secrecy Rate Optimization for Energy Efficient Cognitive Relay Networks, by Md. Khorshed Alam and Trina Saha (p. 239)
Avian Influenza Prediction Using Machine Learning, by Maana Shori and Kriti Saroha (p. 253)
Prediction and Comparison of Diabetes with Logistic Regression, Naïve Bayes, Random Forest, and Support Vector Machine, by Sarthak Choudhary, Abhineet Kumar, and Sakshi Choudhary (p. 273)
Bone Cancer Detection Using Deep Learning, by Mansoor Habib Mazumder and Maheshwari Prasad Singh (p. 285)
Energy and Buffer Size-Based Routing Protocol for Internet of Things, by Tariq Ahamed Ahanger, Chatti Subbalakshmi, and M. V. Narayana (p. 297)
V-Shaped Binary Version of Whale Optimization Algorithm for Feature Selection Problem, by S. Hameetha Begum, C. Balasubramanyam, J. T. Thirukrishna, and G. Manoj (p. 305)
An Energy-Efficient Deep Neural Network Model for Photometric Redshift Estimation, by K. Shreevershith, Snigdha Sen, and G. B. Roopesh (p. 319)
Deep Learning-Based Diabetic Retinopathy Screening System, by Rajkumar Kalimuthu, Limbika Zangazanga, S. Jayanthi, and Ignatius A. Herman (p. 331)
Artificial Intelligence-Based Data Analytics Techniques in Medical Imaging, by Prasanalakshmi Balaji, Prasun Chakrabarti, and Bui Thanh Hung (p. 343)
Ensuring Data Protection Using Machine Learning Integrating with Blockchain Technology, by Princy Diwan, Brijesh Khandelwal, and Bhupesh Kumar Dewangan (p. 359)
Evaluation and Language Training of Multinational Enterprises Employees by Deep Learning in Cloud Manufacturing Resources, by Arodh Lal Karn, Julian L. Webber, Abolfazl Mehbodniya, D. Stalin David, Balu Subramaniam, Rajasekar Rangasamy, and Sudhakar Sengan (p. 369)
Development of a Cognitive Question Answering System to Learn Concepts for Placement Assistance, by R. Dhana Lakshmi, Abirami Murugappan, and M. Srivani (p. 381)
Cervical Cancer Prediction Using Optimized Meta-Learning, by P. Dhivya, M. Karthiga, A. Indirani, and T. Nagamani (p. 393)
An Ensemble Deep Closest Count and Density Peak Clustering Technique for Intrusion Detection System for Cloud Computing, by B. Sudharkar, V. B. Narsimha, and G. Narsimha (p. 403)
A Soft Computing Based Approach for Pixel Labelling on 2D Images Using Fine Tuned R-CNN, by Nedumaran Arappal, Ajeet Singh, and D. Saidulu (p. 415)
Design of Concurrent Engineering Systems for Global Product Development Using Artificial Intelligence, by Arodh Lal Karn, Abolfazl Mehbodniya, Julian L. Webber, Vellingiri Jayagopal, D. Stalin David, Rajasekar Rangasamy, and Sudhakar Sengan (p. 425)
Comparison of Public and Critics Opinion About the Taliban Government Over Afghanistan Through Sentiment Analysis, by Md Majid Reza, Satwinder Singh, Harish Kundra, and Md Rashid Reza (p. 435)
Vehicle Object Detection Using Deep Learning-Based Anchor-Free Detector, by Mansi Verma and Maheshwari Prasad Singh (p. 455)
An Advanced Approach to Detect Plant Diseases by the Use of CNN Based Image Processing, by Sovan Bhattacharya, Ayan Banerjee, Saikat Ray, Samik Mandal, and Debkanta Chakraborty (p. 467)
Certain Investigations of MEMS for Optimised Sensor Coverage, by Abolfazl Mehbodniya, Muruganantham Rajamanickam, Julian L. Webber, D. Stalin David, Devi Mani, Rajasekar Rangasamy, and Sudhakar Sengan (p. 479)
Voice-Based Intelligent Virtual Assistant for Windows, by K. M. Bhargav, Akash Bhat, Snigdha Sen, A. Vamsi Kalyan Reddy, and S. D. Ashrith (p. 491)
Cryptocurrency Trading Bot with Sentimental Analysis and Backtracking Using Predictive ML, by Abhishek Srinivas Murthy, A. Akshay, and Bhagyashri R. Hanji (p. 501)
Efficient Pseudo-Random Number Generator Using Number-Theoretic Transform, by Anupama Arjun Pandit, Atul Kumar, and Arun Mishra (p. 513)
Medical Images Analysis for Segmentation and Classification Using DNN, by Abolfazl Mehbodniya, Satheesh Narayanasami, Julian L. Webber, Amarendra Kothalanka, Sudhakar Sengan, Rajasekar Rangasamy, and D. Stalin David (p. 525)
Robotic Basket and Intelligent Stacking Supported Automated Hygienic Shopping System, by Rishiraj Jagdish Tripathi, Aakanksha Tripathi, and Geeta Tripathi (p. 535)
Traffic Sign Detection—A Module in Autonomous Vehicles, by I. Amrita and Bhagyashri R. Hanji (p. 549)
A Deep Learning Model for Arabic Fake News Detection Based on Transformers, by Ahmed Binmahdfoudh (p. 559)
Virtual Lab Simulator for Software Engineering Experiment Report to Evaluate Student Assessment, by Sushama A. Deshmukh and Geeta Tripathi (p. 569)
Analyze Dark Web and Security Threats, by Samar Ansh and Satwinder Singh (p. 581)
Deep Convolutional Neural Networks-Based Market Strategy for Early-Stage Product Development, by Gladson Maria Britto James, Abolfazl Mehbodniya, Anto Bennet Maria, Julian L. Webber, D. Stalin David, Rajasekar Rangasamy, and Sudhakar Sengan (p. 597)
Calorie Count of a Fruit Image Using Convolutional Neural Network, by B. Kirananjali, M. Himasai, T. Lakshmi Sujitha, and Y. Kalyan Chakravarti (p. 607)
A Novel Adaptive Fault Tolerance Algorithm Towards Robust and Reliable Distributed Applications to Reuse System Components, by Lalu Banothu, M. Chandra Mohan, and Charupalli Sunil Kumar (p. 617)
Virtual Machine Migration Framework with Configuration Change Management, by Y. Niranjan, M. V. Narayana, and M. Vijaya Sudha (p. 633)
Glioma Segmentation in MR Images Using 2D Double U-Net: An Empirical Investigation, by Julian L. Webber, R. S. Nancy Noella, Abolfazl Mehbodniya, V. Ramachandran, D. Stalin David, Rajasekar Rangasamy, and Sudhakar Sengan (p. 645)
A Stochastic Weighted Model for Task Scheduling and Resource Utilization in the Cloud, by Rajkumar Kalimuthu and Brindha Thomas (p. 655)
Machine Learning and Recommendation System in Agriculture: A Survey and Possible Extensions, by Krupa Patel and Hiren B. Patel (p. 665)
A Study on Accident Detection Systems Using Machine Learning, by S. Savitha and N. Sreedevi (p. 675)
Prediction of Cardiac Arrest Using Ensemble Methods, by K. Sreekanth and J. Hyma (p. 687)
Real-Time Object Detection and Tracking Design Using Deep Learning with Spatial–Temporal Mechanism for Video Surveillance Applications, by T. Kusuma and K. Ashwini (p. 697)
Equipment Planning for an Automated Production Line Using a Cloud System, by K. Bhavana Raj, Julian L. Webber, Divyapushpalakshmi Marimuthu, Abolfazl Mehbodniya, D. Stalin David, Rajasekar Rangasamy, and Sudhakar Sengan (p. 707)
Comparison Between Property-Based Software Security Testing Technique and Fault Injection, by Trina Saha and Md. Khorshed Alam (p. 719)
Sentiment Analysis on Social Media Data: A Survey, by Kanchan Naithani and Y. P. Raiwani (p. 735)
Smart Cradle System, by P. Harika, T. Chihnitha, V. Chaitanya, and M. Vani Pujitha (p. 747)
Author Index (p. 759)

Editors and Contributors

About the Editors

Dr. H. S. Saini, Managing Director of Guru Nanak Institutions, obtained his Ph.D. in the field of Computer Science. He has over 32 years of experience at the university/college level in teaching UG/PG students and has guided several B.Tech. and M.Tech. projects and 7 Ph.D. scholars. He has published/presented over 105 high-quality research papers in international and national journals and proceedings of international conferences. He has published nine books with Springer. He is a lover of innovation, is an advisor for the NBA/NAAC accreditation process to many institutions in India and abroad, is Chief Editor of many innovative journals and chairs various international conferences.

Dr. Rishi Sayal, Associate Director, Guru Nanak Institute of Technical Campus (Autonomous), has completed his B.E. (CSE), M.Tech. (IT), and Ph.D. (CSE). He obtained his Ph.D. in Computer Science and Engineering in the field of Data Mining from the prestigious Mysore University of Karnataka State. He has over 30 years of experience in training, consultancy, teaching, and placements. His current areas of research interest include Data Mining, Network Security, and Databases. He has published nine books with Springer and a wide number of research papers in international conferences and journals. He has guided many UG and PG research projects and is the recipient of many research grants from government funding agencies. He is Co-editor of various innovative journals and has convened many international and national conferences.

Dr. A. Govardhan is presently Professor of Computer Science and Engineering, Rector, and Executive Council Member at Jawaharlal Nehru Technological University Hyderabad (JNTUH), India. He did his Ph.D. from JNTUH. He has 27 years of teaching and research experience. He is a member of advisory boards, academic boards and technical program committees for more than 100 international and national conferences, and a member of boards of governors and academic councils for a number of colleges. He has four monographs and 20 chapters with Springer, Germany. He has guided 89 Ph.D. theses, 3 M.Phil., and 150 M.Tech. projects. He has published 575 research papers in international/national journals/conferences including IEEE, ACM, Springer, Elsevier, and Inderscience. He has delivered more than 130 keynote speeches and invited lectures and has chaired 31 sessions at international/national conferences in India and abroad. He has research projects (completed/ongoing) worth Rs. 1.159 Crores.

Dr. Rajkumar Buyya is Redmond Barry Distinguished Professor and Director of the Cloud Computing and Distributed Systems (CLOUDS) Laboratory at the University of Melbourne, Australia. He is also serving as Founding CEO of Manjrasoft Pvt. Ltd., a spin-off company of the university, commercializing its innovations in cloud computing. He served as a Future Fellow of the Australian Research Council during 2012–2016. He received his Ph.D. from Monash University, Melbourne, Australia, in 2002. Dr. Buyya has authored/co-authored over 640 publications. He has co-authored five textbooks and edited proceedings of over 40 international conferences. He is one of the most highly cited authors in Computer Science and Software Engineering (h-index = 134, g-index = 298, and 95,300+ citations). He has edited proceedings of over 35 international conferences published by prestigious organizations, namely the IEEE Computer Society Press and Springer Verlag.

Contributors Suyash Agrawal Guru Nanak Institutions Technical Campus, Ibrahimpatnam, India Tariq Ahamed Ahanger College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University, Al-Kharj, Saudi Arabia A. Akshay Computer Science and Engineering, Global Academy of Technology, Bangalore, India Md. Khorshed Alam Department of Computer Science and Engineering, State University of Bangladesh, Dhaka, Bangladesh I. Amrita Department of Computer Science, Global Academy of Technology, Bangalore, India R. Ani Department of Computer Science and Applications, Amrita Vishwa Vidyapeetham, Amritapuri, India Samar Ansh Department of Computer Science and Technology, Central University of Punjab, Bathinda, India T. Anusha Rao Bahadur Y Mahabaleswarappa Engineering College, Ballari, Karnataka, India


Nedumaran Arappal School of Electrical and Computer Engineering, Kombolcha Institute of Technology, Wollo University, Dessie, Ethiopia M. Arundhathi Department of Computer Science and Applications, Amrita Vishwa Vidyapeetham, Amritapuri, India S. D. Ashrith Department of CSE, Global Academy of Technology, Bengaluru, India K. Ashwini Global Academy of Technology, Bangalore, India Athira Department of Computer Science and Applications, Amrita Vishwa Vidyapeetham, Amritapuri, India Sharad Bajaj Amazon Web Services, Seattle, USA Prasanalakshmi Balaji Faculty of Information Technology, Post-Doctoral Researcher, Artificial Intelligence Laboratory, Ton Duc Thang University, Ho Chi Minh, Vietnam C. Balasubramanyam Department of Computer Science and Engineering, Guru Nanak Institute of Technology, Hyderabad, Telangana, India Ayan Banerjee Department of CSE, Dr. B. C. Roy Engineering College, Durgapur, West Bengal, India Lalu Banothu Professor, Department of Computer Science and Engineering, JNTUH College of Engineering, Hyderabad, India Abdullah M. Baqasah Department of Information Technology, College of Computers and Information Technology, Taif University, Taif, Saudi Arabia S. Hameetha Begum Department of Computing, Muscat College, Muscat, Sultanate of Oman K. M. Bhargav Department of CSE, Global Academy of Technology, Bengaluru, India Akash Bhat Department of CSE, Global Academy of Technology, Bengaluru, India Sovan Bhattacharya Department of CSE, Dr. B. C. Roy Engineering College, Durgapur, West Bengal, India; Department of CSE, NIT Durgapur, Durgapur, West Bengal, India K. Bhavana Raj Department of Management Studies, Institute of Public Enterprise, Hyderabad, India Ahmed Binmahdfoudh Department of Computer Engineering, College of Computers and Information Technology, Taif University, Taif, Saudi Arabia V. Chaitanya CSE Department, V. R. Siddhartha Engineering College, Vijayawada, Andhra Pradesh, India


Prasun Chakrabarti Deputy Provost, ITM SLS Baroda University, Vadodara Gujarat, India Debkanta Chakraborty Department of CSE, Dr. B. C. Roy Engineering College, Durgapur, West Bengal, India M. Chandra Mohan Professor, Department of Computer Science and Engineering, JNTUH College of Engineering, Hyderabad, India T. Chihnitha CSE Department, V. R. Siddhartha Engineering College, Vijayawada, Andhra Pradesh, India Ishika Chokshi Dwarkadas J. Sanghvi College of Engineering, Mumbai University, Mumbai, Maharashtra, India Sakshi Choudhary SRM Institute of Science and Technology, Uttar Pradesh, Ghaziabad, India Sarthak Choudhary SRM Institute of Science and Technology, Uttar Pradesh, Ghaziabad, India J. Darsana Department of Computer Science and Applications, Amrita Vishwa Vidyapeetham, Amritapuri, India O. S. Deepa Department of Mathematics, Amrita Vishwa Vidyapeetham, Coimbatore, India Sushama A. Deshmukh Maharashtra Institute of Technology, Aurangabad, Maharashtra, India Bhupesh Kumar Dewangan Department of Computer Science and Engineering, School of Engineering, O. P. Jindal University, Raigarh, India P. Dhivya Bannari Amman Institute of Technology, Sathyamangalam, Erode, India Princy Diwan Department of Computer Science and Engineering, Amity School of Engineering, Amity University, Raipur, India; Department of Computer Science and Engineering, School of Engineering, O. P. Jindal University, Raigarh, India G. S. Gopika Sathyabama Institute of Science & Technology, Chennai, Tamilnadu, India Sai Tejaswi Guntupalli SRM Institute of Science and Technology, Kattankulathur, Tamil Nadu, India Bhagyashri R. Hanji Computer Science and Engineering, Global Academy of Technology, Bangalore, India; Department of Computer Science, Global Academy of Technology, Bangalore, India P. Harika CSE Department, V. R. Siddhartha Engineering College, Vijayawada, Andhra Pradesh, India


Himanshu Harlalka Dwarkadas J. Sanghvi College of Engineering, Mumbai University, Mumbai, Maharashtra, India Ignatius A. Herman DMI—Group of Institutions, Lilongwe, Malawi M. Himasai Velagapudi Ramakrishna Siddhartha Engineering College, Vijayawada, Andhra Pradesh, India

Bui Thanh Hung Faculty of Information Technology, Artificial Intelligence Laboratory, Ton Duc Thang University, Ho Chi Minh, Vietnam B. Hyma Department of CSE, Guru Nanak Institutions Technical Campus, Ibrahimpatnam, India J. Hyma CSE, GITAM University, Visakhapatnam, India A. Indirani Bannari Amman Institute of Technology, Sathyamangalam, Erode, India Chirag Jagad Dwarkadas J. Sanghvi College of Engineering, Mumbai University, Mumbai, Maharashtra, India Aishwarya Jakka University of Pittsburgh, Pittsburgh, PA, USA Gladson Maria Britto James Department of Computer Science and Engineering, Malla Reddy College of Engineering, Secunderabad, Telangana, India Vellingiri Jayagopal Department of Software and System Engineering, School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India S. Jayanthi Department of Information Technology, Guru Nanak Institute of Technology, Hyderabad, India Devanshi Jhaveri Dwarkadas J. Sanghvi College of Engineering, Mumbai University, Mumbai, Maharashtra, India R. C. Jisha Department of Computer Science and Applications, Amrita Vishwa Vidyapeetham, Amritapuri, India Rajkumar Kalimuthu School of Computers Science and Information Technology, DMI St. John the Baptist University, Lilongwe, Malawi; Department of Computer Science and Engineering, Noorul Islam Centre for Higher Education, Kanniyakumari, India Y. Kalyan Chakravarti Velagapudi Ramakrishna Siddhartha Engineering College, Vijayawada, Andhra Pradesh, India Amudha Kandasamy Department of Electronics and Communication Engineering, Kongunadu College of Engineering and Technology, Trichy, Tamil Nadu, India


Arodh Lal Karn Department of Financial and Actuarial Mathematics, School of Mathematics and Physics, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, China M. Karthiga Bannari Amman Institute of Technology, Sathyamangalam, Erode, India Brijesh Khandelwal Department of Computer Science and Engineering, Amity School of Engineering, Amity University, Raipur, India B. Kirananjali Velagapudi Ramakrishna Siddhartha Engineering College, Vijayawada, Andhra Pradesh, India

Amarendra Kothalanka Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Guntur, Vaddeswaram, Andhra Pradesh, India Singamaneni Krishnapriya Department of CSE (Cyber Security), Guru Nanak Institutions Technical Campus, Hyderabad, India Abhineet Kumar University of Delhi, New Delhi, India Atul Kumar Computer Science and Engineering, Defence Institute of Advanced Technology, Pune, India E. Vinay Kumar Guru Nanak Institute of Technology, Telangana, & Research Scholar, GITAM (Deemed to Be University, Andhra Pradesh), Visakhapatnam, Andhra Pradesh, India Harish Kundra Guru Nanak Institutions Technical Campus, Ibrahimpatnam, Hyderabad, Telengana, India Sheetal Kundra Guru Nanak Institute of Technology, Ibrahimpatnam, India T. Kusuma Global Academy of Technology, Bangalore, India T. Lakshmi Sujitha Velagapudi Ramakrishna Siddhartha Engineering College, Vijayawada, Andhra Pradesh, India R. Dhana Lakshmi Department of Information Science and Technology, College of Engineering Gunidy, Anna University, Chennai, India Ch. Subba Laxmi Guru Nanak Institutions Technical Campus, Hyderabad, Telangana, India Bontha Mamatha K L University, Vaddeswaram, Vijayawada, India; Department of CSE, Guru Nanak Institutions Technical Campus, Ibrahimpatnam, India Samik Mandal Department of CSE, Dr. B. C. Roy Engineering College, Durgapur, West Bengal, India


Devi Mani Department Of Computer Science, College of Science and Arts (Female), Sarat Abidah Campus, King Khalid University, Asir - Abha, Kingdom of Saudi Arabia G. Manoj Guru Nanak Institute of Technology, Telangana, & Research Scholar, GITAM (Deemed to Be University, Andhra Pradesh), Visakhapatnam, Andhra Pradesh, India; Department of Information Science and Engineering, Dayananda Sagar Academy of Technology and Management Bangalore, Bangalore, India Anto Bennet Maria Department of Electronics and Communication Engineering, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, Tamil Nadu, India Divyapushpalakshmi Marimuthu Department of Computer Science and Engineering, GITAM University, Bengaluru, Karnataka, India Mansoor Habib Mazumder National Institute of Technology, Patna, Bihar, India Abolfazl Mehbodniya Department of Electronics and Communication Engineering, Kuwait College of Science and Technology (KCST), Doha, Kuwait City, Kuwait Arun Mishra Computer Science and Engineering, Defence Institute of Advanced Technology, Pune, India Abirami Murugappan Department of Information Science and Technology, College of Engineering Gunidy, Anna University, Chennai, India T. Nagamani Kongu Engineering College, Erode, India Kanchan Naithani Department of Computer Science and Engineering, HNB Garhwal University, Srinagar Garhwal, Uttarakhand, India R. S. Nancy Noella Department of Computer Science and Engineering, Sathyabama Institute of Science and Technology, Chennai, Tamil Nadu, India Hiranmayee Nandyala Department of CSE, Sri Vasavi Engineering College, Tadepalligudem, India M. V. Narayana Department of Computer Science & Engineering, Guru Nanak Institutions Technical Campus, Hyderabad, Telangana State, India; Guru Nanak Institutions Technical Campus, Hyderabad, Telangana, India Satheesh Narayanasami Department of Computer Science and Engineering, St. Martin’s Engineering College, Secunderabad, Telangana, India G. Narsimha JNTUHUCES, Sultanpur, Telangana, India V. B. Narsimha Department of CSE, University College of Engineering, OU, Hyderabad, Telangana, India Nehalika Neha Guru Nanak Institutions Technical Campus, Ibrahimpatnam, India


Y. Niranjan Useful Sensors Inc., Suite 180, Mountain View, CA, USA Stephen Jeswinde Nuagah Department of Electrical Engineering, Tamale Technical University, Tamale, Ghana Anupama Arjun Pandit Computer Science and Engineering, Defence Institute of Advanced Technology, Pune, India Hiren B. Patel Kadi Sarva Vishwavidhyalaya, Gandhinagar, India Krupa Patel Kadi Sarva Vishwavidhyalaya, Gandhinagar, India N. D. Patel Institute for Development and Research in Banking Technology (IDRBT), Hyderabad, TS (500057), India G. Prasanthi Department of CSE, Sri Vasavi Engineering College, Tadepalligudem, India M. Vani Pujitha CSE Department, V. R. Siddhartha Engineering College, Vijayawada, Andhra Pradesh, India Achu Pushpan Department of Computer Science and Applications, Amrita Vishwa Vidyapeetham, Amritapuri, India Pavan Kartheek Rachabathuni Department of Information Engineering, University of Florence, Florence, Italy Y. P. Raiwani Department of Computer Science and Engineering, HNB Garhwal University, Srinagar Garhwal, Uttarakhand, India Arya Raj Department of Computer Science and Applications, Amrita Vishwa Vidyapeetham, Amritapuri, India Muruganantham Rajamanickam Department of Information Technology, TKR College of Engineering and Technology, Telangana, India V. Ramachandran Department of Computer Science and Engineering, GITAM University, Bengaluru, Karnataka, India D. Venkata Swetha Ramana Rao Bahadur Y Mahabaleswarappa Engineering College, Ballari, Karnataka, India V. Ramya Excel Engineering College, Komarapalayam, Tamilnadu, India Kaushik Rane Department of Information Technology, Mumbai University, Mumbai, India Rajasekar Rangasamy Department of Computer Science and Engineering, GITAM School of Technology, GITAM University, Bengaluru, Karnataka, India Saikat Ray Department of CSE, Dr. B. C. Roy Engineering College, Durgapur, West Bengal, India Chinna Reddaiah Extreme Networks, Reading, United Kingdom


A. Vamsi Kalyan Reddy Department of CSE, Global Academy of Technology, Bengaluru, India C. R. Renuka Rao Bahadur Y Mahabaleswarappa Engineering College, Ballari, Karnataka, India Md Majid Reza Department of Computer Science and Technology, Central University of Punjab, Bathinda, Punjab, India Md Rashid Reza Department of Computer Science and Information Technology, Mahatma Gandhi Central University, Motihari, Bihar, India G. B. Roopesh Global Academy of Technology, Bengaluru, Karnataka, India Trina Saha Department of Computer Science and Engineering, State University of Bangladesh, Dhaka, Bangladesh D. Saidulu Department of Information Technology, Guru Nanak Institutions Technical Campus, Hyderabad, Telangana, India Taiba Sana Rao Bahadur Y Mahabaleswarappa Engineering College, Ballari, Karnataka, India Kriti Saroha Centre for Development of Advanced Computing, Noida, India S. Savitha BMS Institute of Technology, Bengaluru, India Khushi Saxena Banasthali Vidyapith, Rajasthan, India Snigdha Sen Department of CSE, Global Academy of Technology, Bengaluru, Karnataka, India Sudhakar Sengan Department of Computer Science and Engineering, PSN College of Engineering and Technology, Tirunelveli, Tamil Nadu, India Seyed Muzaffar Ahmad Shah Department of Computer Science and Technology, Central University of Punjab, Bathinda, India Maana Shori Centre for Development of Advanced Computing, Noida, India K. Shreevershith Global Academy of Technology, Bengaluru, Karnataka, India Ajeet Singh SML2029 Research and Consulting Pvt Ltd, Banjara Hills, Hyderabad, TS (500034), India; School of Computer and Information Sciences, University of Hyderabad, Hyderabad, Telangana, India Maheshwari Prasad Singh National Institute of Technology, Patna, Bihar, India Satwinder Singh Department of Computer Science and Technology, Central University of Punjab, Bathinda, Punjab, India N. Sreedevi CMR Institute of Technology, Bengaluru, India


K. Sreekanth CSE, GITAM University, Visakhapatnam, India; CSE, Nalla Narasimha Reddy Education Society’s Group of Institutions, Hyderabad, India Abhishek Srinivas Murthy Computer Science and Engineering, Global Academy of Technology, Bangalore, India M. Srivani Department of Information Science and Technology, College of Engineering Gunidy, Anna University, Chennai, India D. Stalin David Department of Information Technology, Vel Tech Multi Tech Dr. Rangarajan Dr. Sakunthala Engineering College, Chennai, Tamil Nadu, India Chatti Subbalakshmi Department of Computer Science & Engineering, Guru Nanak Institutions Technical Campus, Hyderabad, Telangana State, India Balu Subramaniam Department of Computer Science and Engineering, Paavai Engineering College, Namakkal, Tamil Nadu, India B. Sudharkar Department of Computer Science and Engineering, JNTUH, Hyderabad, Telangana, India V. SumaSree Rao Bahadur Y Mahabaleswarappa Engineering College, Ballari, Karnataka, India Charupalli Sunil Kumar Professor & Dean in CSE, Apollo University, Chittoor, Andhra Pradesh, India Prachi Tawde Dwarkadas J. Sanghvi College of Engineering, Mumbai University, Mumbai, Maharashtra, India Jayesh Thakur Department of Information Technology, Mumbai University, Mumbai, India J. T. Thirukrishna Department of Computer Science and Engineering, Guru Nanak Institute of Technology, Hyderabad, Telangana, India Brindha Thomas Department of Information Technology, Noorul Islam Centre for Higher Education, Kanniyakumari, India Aakanksha Tripathi Maharashtra Institute of Technology, Aurangabad, MS, India Geeta Tripathi Guru Nanak Institutions Technical Campus Hyderabad, Ibrahimpatnam, Telangana, India Rishiraj Jagdish Tripathi University of Hertfordshire, Hartfield, UK J. Vakula Rani CMR Institute of Technology, Bengaluru, Karnataka, India Priya Velayutham Department of Computer Science and Engineering, Paavai Engineering College, Namakkal, Tamil Nadu, India Mansi Verma National Institute of Technology Patna, Patna, India


M. Vijaya Sudha Ramachandra College of Engineering, Eluru, Andhra Pradesh, India H. Vijaya Department of CSE (CS/DS), Guru Nanak Institutions Technical Campus, Ibrahimpatnam, India Julian L. Webber Department of Electronics and Communication Engineering, Kuwait College of Science and Technology (KCST), Doha, Kuwait City, Kuwait Limbika Zangazanga School of Computers Science and Information Technology, DMI St. John the Baptist University, Lilongwe, Malawi

A Contemporary Analysis of Privacy Preserving Strategies in MANET Chinna Reddaiah, M. V. Narayana, M. Vijaya Sudha, and Ch. Subba Laxmi

Abstract Taking all of the important factors into account, this paper presents a study of a variety of security flaws (mostly wormhole attacks) in MANETs. Two types of nodes are used: trusted nodes and untrusted nodes. The specialised and non-specialised mobile nodes at a disaster site correspond to the trusted and untrusted mobile nodes, respectively. All of these elements help build confidence in the trust management system that has been suggested. A network node is judged on how well other nodes can compare their own data with that of the node. The veracity of a node can be determined when two nodes are close to each other. Truth values that fall below a certain level are considered malicious. The trust management method has been added to the routing protocol in order to find a safe route in the MANET.

Keywords MANET · Routing time · Packet delivery · Privacy analysis

1 Introduction

Considering all of the important factors, Sharma and Sharma [1] performed a study on a variety of security vulnerabilities (mostly wormhole attacks) in MANETs. Trusted mobile nodes and untrusted mobile nodes are the two kinds of nodes used in the proposed MANET architecture. The specialised and non-specialised mobile nodes at the catastrophe site are analogous to the trusted and untrusted mobile nodes. Reputation, suggestion, and context all play a role in building trust in the suggested trust management system. The reputation of a network node is determined by how well other nodes can compare their own data with that of the node. The veracity of a node is derived from its nearby nodes. Truth values below a threshold are considered malevolent.

C. Reddaiah
Extreme Networks, 250 Longwater Ave, Reading RG2 6GE, United Kingdom
M. V. Narayana (B) · Ch. S. Laxmi
Guru Nanak Institutions Technical Campus, Hyderabad, Telangana, India
e-mail: [email protected]
M. V. Sudha
Ramachandra College of Engineering, Eluru, Andhra Pradesh, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_1


The trust management technique has been implemented in the routing protocol to find a reliable route in the MANET. The suggested trust-based routing protocol computes the path's veracity before selecting it for data delivery. Route trust is calculated explicitly by a trusted node, based on recommendations from other nodes along the path. Thanks to the suggested approach, the MANET is protected against packet drop attacks and their variations, such as active forge and resource consumption attacks. From a security standpoint, this survey article examines how wormhole attacks are detected and avoided. The wormhole attack is one of the most serious attacks on wireless mobile ad hoc networks, and it occurs at the network layer. In a wormhole attack, malicious nodes create an illusion for both the sender and the recipient. Every device in a network is termed a node, and each node takes on the responsibilities of both a client and a router. MANETs are developed for a specific activity when the task is clearly defined, dynamic, and requires little infrastructure. The Authenticated Security Framework (ASF) was developed specifically for MANETs to address the unique security challenges they present. As shown by Krishnamurthy et al. [2], ASF offers a well-organised technique for protecting routing and application data, ensuring that the MANET delivers consistent, confident, and reliable message transactions with all valid nodes in a network. Networked systems are becoming more complex in response to rising demand, and simple systems are rapidly evolving into more complex and useful ones. Wired computer networks gave way to wireless ones, and rapidly deployable wireless infrastructure emerged as a new trend. As mentioned by Kshatriya et al. [3], the MANET was one of the numerous wireless network types that evolved. Traditional routing protocols and security mechanisms do not function in MANETs because of their dynamic topology and lack of centralisation. Because of the unique properties of MANETs, many attacks and destructive actions that are not conceivable in other networks become possible. Passive and active attacks are more likely to occur when packets are handled by intermediate network nodes. With the ever-increasing challenges to the security of MANETs, a variety of solutions are being developed. A detection engine is proposed in that work to identify these actions at the MAC layer. Wireless 802.11 protocols with CSMA/CA are used. To begin, the detection engine adapts to the MAC layer's RTS/CTS model. Malicious behaviour is reported based on delays in packet processing or broadcasts. The detection of malicious activity is improved by using timing mechanisms in the MAC layer. Lastly, the communication network is cleared of erroneous nodes. An FSM implementation is used to verify the outcomes and theoretical values of the experiment, and the proposed benefits are also outlined. The rest of the paper is organised as follows: Sect. 2 presents the literature review; Sect. 3 identifies the research problem; Sect. 4 presents the results and discussion; and Sect. 5 concludes the paper.
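The MAC-layer detection idea summarised above (flag a node whose RTS/CTS handling or packet-forwarding delay repeatedly exceeds what the timing model allows) can be illustrated with a rough sketch such as the one below. The delay bound, window size, and violation ratio are hypothetical values chosen for illustration only, not parameters taken from the surveyed work.

```python
# Rough illustration of timing-based misbehaviour detection at the MAC layer:
# a node is flagged once too many of its recent forwarding delays exceed an
# expected bound. The bound, window, and ratio are hypothetical.
from collections import deque

class TimingDetector:
    def __init__(self, max_delay_ms=20.0, window=10, violation_ratio=0.5):
        self.max_delay_ms = max_delay_ms
        self.violation_ratio = violation_ratio
        self.history = deque(maxlen=window)

    def observe(self, forwarding_delay_ms):
        """Record whether one observed RTS/CTS or forwarding delay violates the bound."""
        self.history.append(forwarding_delay_ms > self.max_delay_ms)

    def is_malicious(self):
        """Flag the node when the recent violation ratio crosses the threshold."""
        if not self.history:
            return False
        return sum(self.history) / len(self.history) >= self.violation_ratio

if __name__ == "__main__":
    detector = TimingDetector()
    for delay in [5, 7, 30, 42, 38, 55, 6, 61]:
        detector.observe(delay)
    print(detector.is_malicious())  # True: 5 of the last 8 delays exceed 20 ms
```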


2 Literature Review

After setting the initial context of this research, this section analyses parallel and recent research outcomes. As Kumar et al. [4] correctly pointed out, a successful mobile ad hoc network today relies on the integration of a wide range of functions within the network (such as routing and administration), as well as security and QoS management. Currently, when security and quality of service (QoS) are considered separately, both have a detrimental impact on network performance. QoS and security methods are likely to be affected by this, as are the MANET's most vital and critical services. Achieving security and a high level of quality are the two goals discussed in that article. Protocol design and implementation for policy-based network administration, and a methodology for key management in a MANET, are presented as the best ways to achieve these goals.

This wireless technology, known as a "mobile ad hoc network" (MANET), connects a collection of mobile nodes in a dynamically decentralised manner without the need for a base station or centralised management, with each mobile node acting as a router. The MANET topology is always changing due to the dynamic nature of MANET creation and the freedom of nodes to move at random. As a stand-alone network, a MANET may also be linked to other networks. When compared to stationary nodes, mobile nodes have a smaller footprint and less power and memory. There are certain drawbacks to MANETs, despite the fact that they are extensively used in a variety of vital sectors. As established by Abdel-Fattah et al. [5], the MANET is susceptible to a number of different forms of attacks across the protocol stack. Unlike traditional networks, which need substantial set-up time and resources, mobile ad hoc networks may be set up and deployed in minutes.

Bhargavi and Raju [6] noted that all of the nodes in this kind of network function as routers in addition to their usual roles. The topology of a MANET constantly changes due to the network's mobility and dynamic nature, which allows all nodes to move freely. As a result, packets must be routed carefully from the point of origin to the point of destination. A multi-hop network such as a MANET has many security difficulties, since packets must transit through intermediary nodes that are not necessarily authentic. Security-based routing schemes that take into account information from neighbouring nodes are discussed in that study. A trust value is assigned to every node in the network based on the trust information it receives from its neighbours and its previous transaction history. Routing paths should not include hostile nodes whose value falls below a pre-determined threshold established by the coordinator nodes; this ensures that the path is safe. The protocol was developed on NS-2, and the findings reveal that it outperformed conventional routing techniques in terms of packet delivery and throughput.
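The trust-aware routing idea summarised above (each node's trust value is aggregated from neighbour reports and its own transaction history, and routes containing nodes below a threshold are rejected) can be sketched roughly as follows. The weights and the 0.5 threshold are illustrative assumptions, not values from [6].

```python
# Minimal sketch of threshold-based trust filtering during route selection.
# The 0.6/0.4 weights and the 0.5 threshold are illustrative assumptions.
from statistics import mean

def node_trust(neighbour_reports, history_success_rate,
               w_neighbours=0.6, w_history=0.4):
    """Combine neighbour-reported trust with the node's own transaction history."""
    neighbour_score = mean(neighbour_reports) if neighbour_reports else 0.0
    return w_neighbours * neighbour_score + w_history * history_success_rate

def route_is_trusted(route, trust_table, threshold=0.5):
    """Accept a route only if every node on it clears the trust threshold."""
    return all(trust_table[node] >= threshold for node in route)

if __name__ == "__main__":
    trust_table = {
        "A": node_trust([0.9, 0.8], 0.95),
        "B": node_trust([0.2, 0.3], 0.40),   # behaves suspiciously
        "C": node_trust([0.7, 0.9], 0.85),
    }
    print(route_is_trusted(["A", "C"], trust_table))       # True
    print(route_is_trusted(["A", "B", "C"], trust_table))  # False
```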


There are few studies that encourage the collection of security-related data in mobile ad hoc networks. B4SDC, a blockchain system for collecting security-related data in MANETs, is proposed in the study of Liu et al. [7], who conducted a detailed investigation and experiments to demonstrate the usefulness and effectiveness of B4SDC. People's need for wireless connectivity is clearly shown by the studies of Kumari and Vydeki [8]. Due to the absence of a central coordinator, MANETs are among the most susceptible wireless networks. The security of MANETs may be threatened by any malicious node. For example, rogue nodes might lure packets from a source node and then drop them without delivering them to their intended destination. Black-hole attacks are one of these security threats. The use of ad hoc on demand distance vector (AODV), a reactive routing technology, is the primary focus of that research. Performance measures such as PDR and throughput are compared between the black-hole-afflicted MANET and the regular network at low, medium, and high traffic levels, and NS-2 is used for the simulation. Because of its broad range of applications, such as defence, MANET security has been a hot research area for the last two decades, according to the study by Yadav et al. [9]. Numerous attempts have been made in this regard, yet this issue may not be entirely resolved by the current set of security algorithms, methodologies, models, and frameworks. That work proposes a "novel simple but effective outlier detection scheme-based security algorithm for protecting the ad hoc on demand distance vector (AODV) reactive routing protocol from a black-hole attack in a mobile ad hoc environment, which is motivated by various existing security methods and outlier detection". Network simulator findings show that the suggested technique is simple, resilient, and effective compared with the original AODV protocol and other approaches. In a MANET, many mobile nodes work together to route packets in a multi-hop way without the need for any kind of centralised control. One of the most difficult tasks in a MANET is locating the rogue node. MANETs are vulnerable to a variety of security threats, including wormhole, black-hole, grey-hole, and rushing attacks. A grey-hole attack's misbehaviour is likely to affect network performance. As a result, defensive mechanisms for MANETs are being developed to identify and prevent the occurrence of grey-hole nodes in the network; such mechanisms have been studied by Roshani and Patel [10]. The emergence of the wireless mobile ad hoc network (MANET) has made it possible to communicate at any time and any place, without the backing of a pre-existing network infrastructure. As a result, the MANET poses new security difficulties because of its unique properties, which are not seen in ordinary wired networks or cellular wireless networks. A temporary network like a MANET needs the cooperation of its nodes in order to achieve the necessary information exchange efficiency. Traditional MANET reactive routing protocols have used different power-aware techniques in order to extend network lifetimes, due to the small size and mobility of MANET devices. Using just the watchdog security methods to protect MANETs against black-hole attacks has become problematic with the advent of power-aware strategies. Ochola et al. [11] propose a framework that overcomes the watchdog methods' vulnerability in the presence of power-aware routing protocols in order to correctly identify and eliminate black-hole nodes.


Trust can serve as a soft security solution to secure ad hoc wireless networks such as MANETs, wireless sensor networks (WSNs), and other types of ad hoc networks. A malicious node might join and damage the network because of the networks' openness. Using trust models, a network's nodes may be separated into benign and malicious ones. As a result of the variations between MANETs and WSNs, the trust mechanisms used by each kind of network must also differ. Several MANET and WSN trust models are briefly discussed in that line of work, along with some of the most significant considerations for creating a new trust model, and the trust models for MANETs and WSNs are also discussed in detail.

To extend the life of a mobile ad hoc network, energy conservation is the most important consideration. The limits of mobile ad hoc nodes have been taken into account while designing a secure geographic routing protocol with data compression. The secure multi-hop strong path protocol (SMHSP) is based on advanced AODV. In the context of wireless and mobile ad hoc networks, the term AODV refers to the routing protocol known as ad hoc on demand distance vector. It creates a single-destination route, and the routing protocol may be used for both unicast and multicast traffic. As a result, the amount of transmission is reduced. The MLZW technique is used to compress data for safe routing. Vimala and Srivatsa [12] applied the suggested approach, which is quicker and compresses data to a smaller size, resulting in an increase in efficiency.

The KECCAK algorithm is used as the foundation for the Secure Hash Algorithm-3 functions. Some works have implemented SHA3-256 for safe routing in the hybrid routing approach of mobile ad hoc networks (MANETs), as in the work of Dilli and Reddy [13]. An HMAC has been employed to ensure the integrity and authenticity of the data. The Zone Routing Protocol (ZRP), a hybrid MANET routing protocol, was utilised as the hybrid routing approach in network simulator 2 (NS2). To boost throughput and packet delivery percentage, that study introduces an HMAC-SHA3-512 algorithm in ZRP, which results in an increased end-to-end latency. HMAC-SHA3-512 has been shown to perform 50% better than the equivalent HMAC-SHA3-256 when run on a 64-bit Intel processor, according to that performance study.

According to Alsumayt et al. [14], a mobile ad hoc network (MANET) is a self-configuring, dynamic, and non-fixed infrastructure made up of many nodes. These nodes interact with one another without an administrative point of contact. A MANET, however, is vulnerable to a variety of attacks, including DoS attacks. A denial of service (DoS) attack is serious because it prohibits authorised users from accessing their services. DDoS attacks may be detected using the Monitoring, Detection, and Rehabilitation (MrDR) technique. To determine whether or not nodes can be trusted, the MrDR approach uses a trust value calculation. That study compares the MrDR DoS attack detection approach to the existing Trust Enhanced Anonymous on-demand routing Protocol (TEAP), which is similarly based on trust. The TEAP technique and the suggested approach are compared on the basis of their packet delivery ratio and their network overhead. According to the findings, the MrDR approach outperforms the TEAP method in network performance.


Amraoui et al. [15] observed that everyone is interested in wireless connection technology in their daily lives, and that the usage of mobile devices and applications based on wireless networking is on the rise. Mobile ad hoc networks (MANETs) may benefit from improved architectural support for the security concept in the future Internet architecture. Because this is an essential cross-cutting study area, this notion is slowly taking shape. The uniqueness of that work lies in the references and guidance it provides to help readers navigate their way to fresh studies in related subjects. There is an introduction to MANETs and their development patterns before it explains the security needs of such networks, sketching an outline of threats and attacks. As a result, a new cross-cutting study area called the Cooperative Internet of MANETs (CIMANETs) is defined for the first time. Also included is a comparison of different current security algorithms, as well as an explanation of these algorithms.

Routing in a mobile ad hoc network (MANET) is essential, and it should be done quickly before a node departs the network. MANET routing methods are vulnerable to a wide range of attacks. In terms of wireless networking technology, the MANET is one of the most enticing domains and is today one of the most vibrant and active fields of communication between systems. A MANET is a self-sufficient group of mobile nodes that communicate with each other through wireless connections and coordinate in an appropriate manner so as to provide connectivity without a fixed infrastructure. A MANET has transmission speed constraints; however, it allows for autonomous communication across diverse clients. Node mobility and route topology changes lead to unpredictability in the system's architecture over time. Decentralisation requires safe route identification among nodes in order to facilitate communication. In order to include trustworthy nodes in the route discovery process, trust is calculated among nodes. New routing methods are provided in that paper that identify routes between trustworthy nodes and regularly update the routing table information according to network topology changes. Alapati and Ravichandran [16] showed that the suggested approach has a superior routing methodology compared to current techniques.

The mobile ad hoc network (MANET) consists of a large number of nodes that are only able to communicate with each other over a very small range. The dynamic performance of the network is degraded by the presence of malicious nodes in the MANET, which disrupt normal routing. During the routing process, malevolent nodes are always trying to outsmart their surrounding nodes, since all nearby nodes simply forward the replies and responses of neighbouring nodes. The activity of intermediary nodes is critical in the routing process. One method of preventing rogue nodes in the network from discarding packets has been advised in that line of work. The suggested approach detects an attacker by monitoring the link used to pass data or information between sender and recipient. Detection and prevention of packet drops on a link, via a node, is done by an IDS security system. The system not only detects malicious nodes but also takes steps to stop them. Data packets dropped in an abnormally large number reveal the identity of the attacker. This may be prevented by selecting an alternative route in which the attacker undertaking malicious behaviour is not among the nodes between sender and recipient.


The response of malicious nodes confirms the harmful activity performed on the adjacent or intermediate nodes. After blocking hostile nodes that engage in network-wide harmful activities, the recommended IDS system both protects the network and improves its performance. Performance indicators such as PDR, throughput, and others are used to assess how well an IDS performs in the face of an attack. In the study of Chourasia and Boghey [17], it was shown that safe routing increases data reception and decreases data dropouts in the network. "MANET is a sort of network in which autonomous nodes link directly without a top-down network design or a central controller", as noted by Kamel et al. [18, 19]. Due to the lack of base stations in a MANET, nodes must depend on one another to send messages. Because nodes may move around in a MANET, the connections between them are insecure. Denial of service attacks may be launched by malicious nodes at the network layer; one such attack is known as a black-hole attack. STAODV (secure and trust-based ad hoc on demand distance vector) has been offered as a technique to increase the security of the AODV routing protocol. In this method, malicious nodes that attempt to attack the network are isolated based on past knowledge about them. A trust level is assigned to each node in order to determine how trustworthy it is. To avoid the black-hole attack, each incoming packet is scrutinised. Further, in the next section of this work, the research challenges are identified.

3 Problem Identification

After the detailed analysis of the recent research outcomes for privacy preservation, in this section of the work, the existing research challenges are listed.

• Keeping users' personal location information private in mobile ad hoc networks (MANETs) is a major difficulty. Most of the current LPP solutions safeguard the privacy of the user while compromising the ability to obtain the position on the server-side, that is, valid devices other than the user cannot receive the location in most circumstances.
• When it comes to a variety of applications, such as geographic routing and location verification, location information must be available at a trustworthy server or access point. More widespread location-based services are projected to be used as networking and caching technologies advance, increasing the danger of leakage of location information through wireless channels.
• Wireless channels need to protect user privacy while still allowing users to obtain their actual position.

4 Results and Discussion

In this section of the work, under an identical test condition, the parallel research attempts are analysed.

Table 1 Experimental set-up

Parameter name                             | Value
Number of nodes                            | 100
Range of the network (metres)              | 500
Initial network energy (%)                 | 100
Mean distance between the nodes (metres)   | 35.8
Number of nodes with privacy preserved     | 62
Number of test iterations                  | 100

Fig. 1 Experimental set-up

4.1 Testing Set-Up

Firstly, the initial testing conditions are elaborated here (Table 1). The experimental set-up is also visualised graphically here (Fig. 1). The experimentation is conducted for 100 iterations with 100 nodes; however, here only 25 iterations are furnished in the upcoming sub-sections.
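To make the set-up in Table 1 concrete, a simple harness along the following lines can generate comparable test topologies. This is only an illustrative sketch: uniform random placement is an assumption and will not reproduce the exact 35.8 m mean node spacing reported in Table 1.

```python
# Illustrative harness mirroring Table 1: 100 nodes in a 500 m field,
# 62 of them privacy-preserving, evaluated over 100 test iterations.
# Uniform random placement is an assumption made only for this sketch.
import random

AREA_SIDE_M = 500
NUM_NODES = 100
NUM_PRIVACY_NODES = 62
NUM_ITERATIONS = 100

def deploy_nodes(seed):
    """Place nodes at random and mark the privacy-preserving subset."""
    rng = random.Random(seed)
    positions = [(rng.uniform(0, AREA_SIDE_M), rng.uniform(0, AREA_SIDE_M))
                 for _ in range(NUM_NODES)]
    privacy_ids = set(rng.sample(range(NUM_NODES), NUM_PRIVACY_NODES))
    return positions, privacy_ids

if __name__ == "__main__":
    for iteration in range(NUM_ITERATIONS):
        positions, privacy_ids = deploy_nodes(seed=iteration)
        # ... run one privacy-preservation trial on this topology ...
    print(f"{NUM_ITERATIONS} iterations with {NUM_NODES} nodes each, "
          f"{NUM_PRIVACY_NODES} privacy-preserving")
```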

4.2 Packet Transmission Under Privacy Preservation Condition

Secondly, under the testing conditions, the packet transmission characteristics of the network are analysed (Table 2). Although theoretical and practical bounds have been established on the total capacity of wireless ad hoc networks, their decentralised character makes them suited for a wide range of applications where central nodes cannot be depended upon. Ad hoc networks are ideal for emergency scenarios such as natural catastrophes or military conflicts, since they need little set-up and may be deployed in a matter of minutes. Ad hoc networks may be swiftly constructed because of the availability of dynamic and adaptive routing technologies.


Table 2 Packet Delivery Analysis

Trial sequence | Research outcome | Packet type | Packets sent | Packets received | Successful packet delivery (%)
1  | [2]  | Pt-Data    | 51 | 50 | 98.04
2  | [2]  | Pt-Control | 79 | 61 | 77.22
3  | [2]  | Pt-Data    | 53 | 51 | 96.23
4  | [2]  | Pt-Control | 86 | 58 | 67.44
5  | [2]  | Pt-Data    | 61 | 58 | 95.08
6  | [4]  | Pt-Control | 84 | 58 | 69.05
7  | [4]  | Pt-Control | 58 | 54 | 93.10
8  | [4]  | Pt-Data    | 64 | 60 | 93.75
9  | [4]  | Pt-Control | 74 | 70 | 94.59
10 | [4]  | Pt-Control | 62 | 51 | 82.26
11 | [6]  | Pt-Control | 80 | 59 | 73.75
12 | [6]  | Pt-Control | 66 | 55 | 83.33
13 | [6]  | Pt-Data    | 75 | 64 | 85.33
14 | [6]  | Pt-Control | 60 | 50 | 83.33
15 | [6]  | Pt-Control | 73 | 69 | 94.52
16 | [11] | Pt-Data    | 86 | 56 | 65.12
17 | [11] | Pt-Control | 61 | 60 | 98.36
18 | [11] | Pt-Control | 70 | 70 | 100.00
19 | [11] | Pt-Data    | 66 | 66 | 100.00
20 | [11] | Pt-Control | 53 | 53 | 100.00
21 | [12] | Pt-Data    | 86 | 61 | 70.93
22 | [12] | Pt-Data    | 85 | 73 | 85.88
23 | [12] | Pt-Control | 69 | 66 | 95.65
24 | [12] | Pt-Data    | 78 | 55 | 70.51
25 | [12] | Pt-Data    | 57 | 54 | 94.74

The obtained results are analysed graphically in Fig. 2.

Fig. 2 Packet delivery analysis
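The per-trial figures in Table 2 are simply the ratio of received to sent packets, expressed as a percentage. A short script of the following form can reproduce the column and aggregate it per surveyed work; only two sample rows from Table 2 are shown, and the remaining trials follow the same pattern.

```python
# Successful packet delivery (%) = received / sent * 100, as in Table 2.
# Only two rows from Table 2 are shown; aggregation per cited work follows
# the same pattern for the remaining trials.
from collections import defaultdict

rows = [
    # (work, packet type, packets sent, packets received)
    ("[2]", "Pt-Data", 51, 50),
    ("[2]", "Pt-Control", 79, 61),
]

def delivery_pct(sent, received):
    return 100.0 * received / sent

per_work = defaultdict(list)
for work, _ptype, sent, received in rows:
    per_work[work].append(delivery_pct(sent, received))

for work, values in per_work.items():
    print(work, f"mean successful delivery: {sum(values) / len(values):.2f}%")
```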

4.3 Routing Time Under Privacy Preservation Condition

Further, the routing time analysis is carried out in this sub-section and presented in Table 3.


Table 3 Routing time analysis

Trial sequence | Research outcome | Number of nodes in the routing path | Routing time (ns)
1  | [2]  | 3 | 9.30
2  | [2]  | 4 | 8.61
3  | [2]  | 3 | 5.62
4  | [2]  | 7 | 4.61
5  | [2]  | 3 | 8.31
6  | [4]  | 4 | 5.81
7  | [4]  | 4 | 6.54
8  | [4]  | 4 | 5.20
9  | [4]  | 6 | 7.63
10 | [4]  | 7 | 5.60
11 | [6]  | 3 | 8.29
12 | [6]  | 5 | 8.81
13 | [6]  | 3 | 8.26
14 | [6]  | 7 | 8.76
15 | [6]  | 7 | 6.13
16 | [11] | 4 | 4.96
17 | [11] | 4 | 8.82
18 | [11] | 3 | 7.48
19 | [11] | 5 | 5.90
20 | [11] | 6 | 5.18
21 | [12] | 7 | 5.10
22 | [12] | 4 | 6.21
23 | [12] | 3 | 9.77
24 | [12] | 4 | 8.21
25 | [12] | 7 | 9.33


Fig. 3 Routing time analysis

Decentralised networks have the benefit of being more reliable than centralised ones because of the several hops involved in the transmission of data. There is less risk of single-point failure with MANETs, since the data may travel over several paths instead of relying on a single point of failure, as is the case in cellular networks. Isolation and separation from the network are problems that may be resolved as the MANET architecture develops over time. Additionally, MANETs are more flexible (an ad hoc network can be set up anywhere using mobile devices), more scalable (additional nodes can be added to the network quickly), and less expensive to administer than traditional networks (there is no need to build an infrastructure first). The results are visualised graphically in Fig. 3. Hence, it is conclusive that the work by Ochola et al. [11] demonstrates the highest performance across all types of analysis.

5 Conclusion

A study on a range of security issues (mainly wormhole attacks) in MANETs was conducted after taking into consideration all of the key factors. There are two kinds of nodes used in this system: trusted nodes and untrusted nodes. The specialised and non-specialised mobile nodes at the disaster site are analogous to the trusted and untrusted mobile nodes in terms of their capabilities. All these factors contribute to the development of trust in the trust management system that has been proposed. Network nodes are evaluated on their ability to compare their own data with that of another node, which is a measure of their performance. It is possible to determine the validity of a node if two nodes are near to one another. Truth values that fall below a given threshold are considered undesirable. The trust management approach has been included in the routing protocol in order to discover a safe path in the MANET.


References
1. Sharma PK, Sharma V (2016) Survey on security issues in MANET: wormhole detection and prevention. In: 2016 International conference on computing, communication and automation (ICCCA), Greater Noida, India, pp 637–640
2. Krishnamurthy H, Ashokkumar PS, Patil SS, Swamy M (2017) An authenticated security framework protecting routes and data in MANET. In: 2017 IEEE International conference on computational intelligence and computing research (ICCIC), Coimbatore, India, pp 1–4
3. Kshatriya N, Mallawat K, Biswas AS (2016) Security in MANET using detection engine. In: 2016 International conference on computing, analytics and security trends (CAST), Pune, India, pp 128–132
4. Kumar M, Bhandari R, Rupani A, Ansari JH (2018) Trust-based performance evaluation of routing protocol design with security and QoS over MANET. In: 2018 International conference on advances in computing and communication engineering (ICACCE), Paris, France, pp 139–142
5. Abdel-Fattah F, Farhan KA, Al-Tarawneh FH, AlTamimi F (2019) Security challenges and attacks in dynamic mobile ad hoc networks MANETs. In: 2019 IEEE Jordan international joint conference on electrical engineering and information technology (JEEIT), Amman, Jordan, pp 28–33
6. Bhargavi VS, Raju SV (2016) Enhancing security in MANETS through trust-aware routing. In: 2016 International conference on wireless communications, signal processing and networking (WiSPNET), Chennai, India, pp 1940–1943
7. Liu G, Dong H, Yan Z, Zhou X, Shimizu S (2022) B4SDC: a blockchain system for security data collection in MANETs. IEEE Trans Big Data 8(3):739–752
8. Kumari B, Vydeki D (2017) Performance analysis of MANET in the presence of malicious nodes. In: 2017 International conference on nextgen electronic technologies: silicon to software (ICNETS2), Chennai, India, pp 79–83
9. Yadav S, Trivedi MC, Singh VK, Kolhe ML (2017) Securing AODV routing protocol against black hole attack in MANET using outlier detection scheme. In: 2017 4th IEEE Uttar Pradesh section international conference on electrical, computer and electronics (UPCON), Mathura, India, pp 1–4
10. Roshani P, Patel A (2017) Techniques to mitigate grayhole attack in MANET: a survey. In: 2017 International conference on innovations in information, embedded and communication systems (ICIIECS), Coimbatore, India, pp 1–4
11. Ochola EO, Eloff MM, van der Poll JA (2016) Beyond watchdog schemes in securing MANET's reactive protocols operating on a dynamic transmission power control technique. In: 2016 SAI computing conference (SAI), London, UK, pp 637–643
12. Vimala S, Srivatsa SK (2017) Security using data compression in MANETS. In: 2017 Third international conference on advances in electrical, electronics, information, communication and bio-informatics (AEEICB), Chennai, India, pp 528–531
13. Dilli R, Reddy PCS (2016) Implementation of security features in MANETs using SHA-3 standard algorithm. In: 2016 International conference on computation system and information technology for sustainable solutions (CSITSS), Bengaluru, India, pp 455–458
14. Alsumayt A, Haggerty J, Lotfi A (2018) Evaluation of detection method to mitigate DoS attacks in MANETs. In: 2018 1st International conference on computer applications & information security (ICCAIS), Riyadh, Saudi Arabia, pp 1–5
15. Amraoui H, Habbani A, Hajami A, Bilal E (2017) Security & cooperation mechanisms over mobile ad hoc networks: a survey and challenges. In: 2017 International conference on electrical and information technologies (ICEIT), Rabat, Morocco, pp 1–6
16. Alapati YK, Ravichandran S (2019) Efficient route identification method for secure packets transfer in MANET. In: 2019 Third international conference on I-SMAC (IoT in social, mobile, analytics and cloud) (I-SMAC), Palladam, India, pp 467–471


17. Chourasia R, Boghey RK (2017) Novel IDS security against attacker routing misbehavior of packet dropping in MANET. In: 2017 7th International conference on cloud computing, data science & engineering - confluence, Noida, India, pp 456–460
18. Kamel MBM, Alameri I, Onaizah AN (2017) STAODV: a secure and trust based approach to mitigate blackhole attack on AODV based MANET. In: 2017 IEEE 2nd Advanced information technology, electronic and automation control conference (IAEAC), Chongqing, China, pp 1278–1282
19. Reddy VB, Negi A, Venkataraman S (2016) A comparison of trust in MANETs and WSNs. In: 2016 IEEE 6th international conference on advanced computing (IACC), Bhimavaram, India, pp 577–581

An Explainable AI Approach for Diabetes Prediction Aishwarya Jakka and J. Vakula Rani

Abstract Diabetes mellitus is one of the most common chronic diseases worldwide. According to the World Health Organization global report 2019, diabetes was the ninth major cause of mortality. Machine learning techniques have been extensively used in healthcare for medical diagnosis and surgical applications. These algorithms often exhibit superior performance but cannot explain their predictions because of their complex behavior and black-box nature. Explainable AI (XAI) systems have become more popular for post hoc explanations and model transparency. This approach helps to understand the complex structure of ML models and make them transparent and trustworthy in the data-driven and fact-based decision-making process. A practical implementation of an XAI system, local interpretable model-agnostic explanations (LIME), for diabetes prediction is presented in this paper. The experimental results demonstrate the explanations and factors contributing to diabetes and help medical practitioners in decision-making to address clinical diagnosis and treatment measures.

Keywords Artificial intelligence (AI) · Machine learning · Explainable Artificial Intelligence (XAI) · Local interpretable model-agnostic explanations (LIME)

1 Introduction

Diabetes mellitus is a chronic condition that affects people all over the world. Diabetes was the ninth largest cause of mortality in 2019, according to the World Health Organization (WHO) global report, with an estimated 1.5 million deaths. Diabetes can lead to blindness, kidney failure, heart attacks, and lower limb amputation if left untreated or undiagnosed. Hence, early detection of the condition could save lives.

A. Jakka
University of Pittsburgh, Pittsburgh, PA, USA
e-mail: [email protected]
J. Vakula Rani (B)
CMR Institute of Technology, Bengaluru, Karnataka, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_2


Type I and Type II diabetes are the two most common forms. Type I diabetes is a hereditary disorder in which the human body cannot generate insulin. Type II diabetes is non-insulin-dependent and arises mainly because of lifestyle [1, 2]. AI can assist healthcare professionals in a variety of ways, including patient care and administrative tasks. AI and machine learning models have been studied extensively and used in medical diagnosis, social networks, self-driving cars, natural language processing, and chatbot systems [3, 4]. Machine learning models are complex and act as black-box models. They provide predictions without explanations, giving users little guidance about the underlying process [5]. ML-based solutions in healthcare have limited adoption and utility because of their opaque nature. In order to have a better understanding of the model, there should be an effective approach to improve the explainability and utility of its outputs [6, 7]. In recent years, XAI has become more popular for decision support systems and predictive model interpretations [8, 9]. The internal evaluations, performance, and actions are communicated to the external users by the XAI system. Healthcare professionals and end-users need to know how the XAI model decides and how the algorithms act in different situations, through analysis and visualization of data [10]. Explainable AI is one of the most reliable approaches with high predictive performance levels, and it explains how and why AI models make predictions. The main advantages of XAI are insight into model behavior, transparency, and better decision-making [11]. It bridges the gap between the AI system and the end-user for better accuracy and explainability. An overview of the XAI system is shown in Fig. 1. The XAI model provides information and explanations for questions about input, output, performance, how, why, why not, what if, etc. It answers questions regarding the input dataset, model performance, and output for different clinical scenarios or patient populations [3, 10]. It gives explanations about the model performance in different clinical scenarios, and about how the input parameters are selected and configured. Numerous explainable AI techniques are described in the literature. There are two main categories of XAI techniques: intrinsic explanations and post hoc explanations, shown in Fig. 2 [7]. Intrinsic explanations involve making the internal functionality of the AI model accessible to the end user. This is typically done by using interpretable machine learning models, such as short decision trees or sparse linear models, which have simple structures and are easy to interpret. Post hoc explanations, on the other hand, are provided after the model has been trained.

Fig. 1 Overview of the XAI system


Fig. 2 Types of XAI methods

They involve a trade-off between predictive performance and interpretability. These methods can be model-agnostic, meaning they can be applied to any model, or model-specific, meaning they are tailored to a specific type of model. Examples of intrinsic methods include decision trees, KNN, linear regression, and logistic regression, while examples of post hoc methods [12] include neural networks, dimensionality reduction, and text and visual explanations. Model-specific and model-agnostic post hoc explanations are the two types of post hoc explanations. Model-specific explanation is limited to specific model classes and necessitates an examination of the model's internal structure. Within the domain of explanation, it provides either global or local explanations. Global explanations could be used to explain individual predictions, but they are less accurate than local explanations. Model-agnostic explanation, on the other hand, does not necessitate any prior knowledge of the model's internal structure; however, it requires the analysis of model predictions on a set of input data. The most popular model-agnostic explanation methods used by researchers are LIME and SHAP [12–14]. In this paper, we introduce an XAI framework for healthcare applications and experimentally test an XAI approach for diabetes prediction. This paper is organized into six sections. The introduction and related work are presented in Sects. 1 and 2. Sections 3 and 4 discuss the proposed explainable AI-based prediction system for healthcare and the methodology. Sections 5 and 6 present the experiments, findings and explanations, and the conclusion and future work.

2 Related Works

Zhou et al. [2] proposed an XAI system to bridge the gap between user requirements for interpreting predictive model outcomes and AI systems' technical capabilities. Their question bank concerns the reliability of the prediction as assessed by metrics, as well as how questions address the process behind the prediction. Madumal et al. [8] worked on a DSS based on XAI that helps in the clinical diagnosis of chronic diseases. For the various applications, the XAI system can perform feature identification and assess feature significance.


HbA1c and BMI have been recognized as the two most relevant variables in the case of diabetes, according to the model, which is in line with standard practice. Collaris et al. [11] demonstrated visualizations that help model interpretation by enhancing correctness and confidence. The proposed visualizations give data scientists a detailed picture of a feature's contribution to a prediction, allowing them to diagnose models, make decisions, and justify model interpretability. Zhang et al. [12] investigated the use of XAI in clinical DSS systems from a technological viewpoint. According to their findings, explainability is a prerequisite for addressing the difficulties in a long-term, sustainable manner that is consistent with professional standards and values.

3 Explainable AI-Based Prediction System for Healthcare

The XAI approach helps to understand the complex structure of ML models and to make them transparent and trustworthy in data-driven and fact-based decision-making processes [15]. This model helps the AI-based decision support system by (i) answering what has been done, (ii) exploring the knowledge-based actions, and (iii) verifying present knowledge. The XAI methods can help in decision-making and provide explanations to the medical practitioners. The proposed framework of the XAI model for diabetes prediction is shown in Fig. 3. It has three phases: (i) training, (ii) prediction, and (iii) interpretation/explanation. In the training phase, the machine learning model is trained using historical data, and the hyperparameters are fine-tuned to get an optimized trained model. In the second phase, the model is ready for prediction on new data. Next, the XAI layer provides transparency and interpretability. The predicted outputs are interpreted by analyzing the factors contributing to the disease, helping the medical practitioners in decision-making to address clinical diagnosis and treatment measures.
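The three phases in Fig. 3 can be summarised as a thin pipeline skeleton, shown below. The function names and the choice of a decision tree as the underlying model are illustrative placeholders, not the authors' implementation; the explainer is assumed to expose a LIME-style explain_instance call.

```python
# Skeleton of the three-phase workflow in Fig. 3: training, prediction,
# and interpretation. Names and model choice are illustrative placeholders.
from sklearn.tree import DecisionTreeClassifier

def training_phase(X_train, y_train, max_depth=8):
    """Phase 1: fit (and tune) the underlying ML model on historical data."""
    model = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    model.fit(X_train, y_train)
    return model

def prediction_phase(model, X_new):
    """Phase 2: apply the trained model to new patient records."""
    return model.predict(X_new)

def interpretation_phase(explainer, model, instance):
    """Phase 3: the XAI layer explains an individual prediction (LIME-style)."""
    return explainer.explain_instance(instance, model.predict_proba)
```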

4 Methodology

We applied LIME to the diabetes dataset to illustrate this approach; the dataset is downloaded from the UCI ML repository. This dataset contains 768 samples, of which 268 samples are positive for diabetes and 500 are non-diabetic. The dataset contains eight independent variables and one dependent variable. The dataset is found to be inconsistent due to missing values. A few columns in the dataset have missing values, including glucose, blood pressure, skin thickness, insulin, and BMI. These missing values were filled with the median of the column. The dataset also has an imbalanced output class distribution. In order to train a good classification model, it needs to have an equal number of instances for each class; otherwise, it will be biased towards predicting patients as non-diabetic. Oversampling or under-sampling methods are used to handle this problem. A balanced dataset gives accurate predictions without bias. Here, the oversampling method is used to adjust the output class distribution.


Fig. 3 Proposed framework of XAI model for diabetic prediction

Now, the dataset consists of a total of 1000 samples, with 500 samples for each of the diabetic and non-diabetic classes. The dataset is divided into 70% training data and 30% test data. Decision tree and neural network models are chosen for prediction, and the XAI method LIME is used for analysis and explanation of the predicted outcomes. The decision tree classifier classifies the output class by building a decision tree. Each internal node in the tree represents a test on a feature, and each branch descending from that node corresponds to one of the possible values for that feature. Each leaf node represents a class label that specifies the decision after processing all the features. Neural networks are a series of algorithms that mimic the operations of a human brain to recognize relationships between vast amounts of data.
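A sketch of the preprocessing just described (median imputation of the affected columns, oversampling the minority class to a 500/500 balance, and a 70/30 split) is given below. The CSV file name and column names are assumptions based on the standard layout of the Pima diabetes data, not details reported in the paper.

```python
# Hedged sketch of the described preprocessing: median imputation,
# oversampling to balance the classes, and a 70/30 train/test split.
# File and column names are assumptions (standard Pima dataset layout).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

df = pd.read_csv("diabetes.csv")

# Zero acts as a missing-value marker in these columns; replace with medians.
for col in ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]:
    df[col] = df[col].replace(0, df[col].median())

# Oversample the minority (diabetic) class so both classes reach 500 samples.
majority = df[df["Outcome"] == 0]
minority = df[df["Outcome"] == 1]
minority_up = resample(minority, replace=True,
                       n_samples=len(majority), random_state=42)
balanced = pd.concat([majority, minority_up])

X = balanced.drop(columns="Outcome")
y = balanced["Outcome"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)
```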


5 Experiments and Result

We have implemented a decision tree model and used grid search to tune the hyperparameters in order to optimize the model. Hyperparameter tuning is the process of adjusting the hyperparameters of a model in order to improve its performance on a given dataset. Grid search involves specifying a grid of hyperparameter values and then training and evaluating a model for each combination of values. The optimization process resulted in a reduction in the depth of the tree from 13 to 8, which may have improved the speed and performance of the model. Figure 4 shows the tree generated by the optimized model. Table 1 describes the feature importance for the false-negative case generated by the decision tree (DT) model, and Table 2 presents the corresponding prediction. The decision tree model gives higher importance to glucose, BMI, and age. This is an indication for medical practitioners to focus on these patient features when making the prediction. Furthermore, the receiver operating characteristic for the decision tree is plotted to understand how it makes the prediction, as shown in Fig. 5. The model accuracy is 75%, and the area under the curve for the optimized DT is 83.9%.

The data was then trained using a neural network model, in which the hidden layers are modeled and tuned to get the best model. The ReLU activation function with threshold zero is used in the hidden layers, and SoftMax is used for the output layer to get the binary output of diabetic or non-diabetic. The prediction accuracy depends on the number of hidden layers and nodes chosen for both training and validating the dataset. This neural network model is designed with three hidden layers and one output layer. The model was built on training data, and cross-validation was done to fine-tune it. The model was fine-tuned by changing hyperparameters to find the best parameters that minimize the loss and maximize the accuracy. The accuracy of the model is 76%, and the area under the ROC curve is 85.6%. The learning curve is plotted to understand how the neural network model learns. The visualizations of the learning curve, the loss curve of train and test data, and the ROC curve of the optimal neural network model are shown in Fig. 6a, 6b and 6c, respectively.

Now, the model is ready for new predictions. The XAI model was tested for model evaluation and interpretation of the test data. At run time, the LIME library analyses local instances of the samples and gives the interpretations. The test sample instances are selected randomly, and the explanations and interpretations generated for the neural network's true-negative and false-positive cases are shown in Fig. 7a, 7b and 7c. The left side of each sub-figure displays the selected features along with the learned linear parameters, while the right side corresponds to the feature values of the sample. The system describes two working rules and gives parameter values to arrive at the output. The sample is classified into two classes, class-0 (non-diabetic) and class-1 (diabetic), according to the most significant features: glucose, age, and body mass index (BMI).
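The grid-search step can be reproduced with scikit-learn along the lines below, continuing from the preprocessing sketch above. The parameter grid is an illustrative assumption (the paper reports only that the optimised tree depth came out at 8), and ROC AUC is used as the scoring metric to mirror the reported area-under-curve figures.

```python
# Hedged sketch of the tuning step: grid search over decision-tree settings,
# scored by ROC AUC. The grid itself is an illustrative assumption; the paper
# reports an optimised depth of 8. X_train/y_train come from the split above.
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

param_grid = {
    "max_depth": list(range(2, 14)),
    "criterion": ["gini", "entropy"],
}
search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                      param_grid, scoring="roc_auc", cv=5)
search.fit(X_train, y_train)

best_tree = search.best_estimator_
test_auc = roc_auc_score(y_test, best_tree.predict_proba(X_test)[:, 1])
print("best parameters:", search.best_params_)
print(f"test ROC AUC: {test_auc:.3f}")
```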

Fig. 4 Tree construction by the decision tree model after hyperparameter tuning


Table 1 Feature importance by DT (false-negative case)

Features                 | Weightage
Glucose                  | 0.4694138690088672
BMI                      | 0.24092997980695113
Age                      | 0.12287304277967197
BloodPressure            | 0.08283588881637777
DiabetesPedigreeFunction | 0.04959101423514869
Pregnancies              | 0.02135644169606756
Insulin                  | 0.01299976365691569
SkinThickness            | 0.0

Table 2 Predictions by decision tree

Patient id: 90
Predicted: No diabetes
True diagnosis: Diabetes
Patient features:
  Pregnancies: 5.0
  Glucose: 115.0
  BloodPressure: 76.0
  SkinThickness: 29.0
  Insulin: 125.0
  BMI: 31.2
  DiabetesPedigreeFunction: 0.343
  Age: 44.0

Fig. 5 ROC for decision tree

As given in Rule 1, if glucose is less than or equal to 100, BMI is less than or equal to 30, age is less than or equal to 30, pregnancies are less than or equal to 4, skin thickness is less than or equal to 35, and insulin is less than or equal to 35, then the system predicts that the patient is non-diabetic. Under Rule 2, if glucose is greater than 100, BMI is greater than 30, age is greater than 30, pregnancies are greater than 4, skin thickness is greater than 35, and insulin is greater than 35, then the system predicts that the patient is diabetic. The present model gives an accuracy of 85%.
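Rules 1 and 2 summarise the kind of local explanation LIME returns for a single test instance. A minimal sketch of that call is shown below, continuing from the training sketches above; the explained instance index is arbitrary and the class names are illustrative.

```python
# Minimal LIME sketch for explaining one prediction of the trained model.
# Continues from the sketches above; the instance index is arbitrary.
import numpy as np
from lime.lime_tabular import LimeTabularExplainer

explainer = LimeTabularExplainer(
    training_data=np.asarray(X_train),
    feature_names=list(X_train.columns),
    class_names=["non-diabetic", "diabetic"],
    mode="classification",
)

instance = np.asarray(X_test)[0]
explanation = explainer.explain_instance(instance, best_tree.predict_proba,
                                          num_features=8)
# Each pair is (feature rule, weight), e.g. ("Glucose <= 100.00", -0.21).
for rule, weight in explanation.as_list():
    print(f"{rule:35s} {weight:+.3f}")
```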


Fig. 6 6a Learning curve 6b Loss curve 6c ROC curve

This system provides the general rules used to predict the output and the performance attributes of the algorithm, which help the user to analyze performance. It tracks the model's behavior on a global scale and inspects whether the model has performed any undesired operation.

6 Conclusion

XAI (eXplainable AI) approaches are designed to provide insight into how AI models make decisions and predictions. This is particularly useful in healthcare applications, where it is important for medical practitioners and end-users to understand how the AI system is making decisions and predictions.

Fig. 7 7a LIME model output for random sample [test case true negative (TN)] 7b LIME model output for random sample [case false positive (FP)] 7c Prediction and explanations for case false positive (FP)


In this study, the use of LIME (Local Interpretable Model-agnostic Explanations) allowed for the identification of features and their importance in the diabetes prediction model. The model gave greater weight to glucose, BMI, and age, and the XAI system provided explanations that helped medical practitioners and end-users understand how the model was working. By providing insight into the decision-making process of the AI model, XAI approaches can help healthcare professionals understand and use the system more effectively. The decision tree model can be explored further with ensemble techniques and deep learning models in future work.


References
1. Jakka A, Rani VJ (2019) Performance evaluation of machine learning models for diabetes prediction. Int J Innovative Technol Exploring Eng (IJITEE) 8(11):1–5. ISSN: 2278–3075
2. Zhou H et al (2020) Diabetes prediction model based on an enhanced deep neural network. J Wireless Com Network 2020:148. https://doi.org/10.1186/s13638-020-01765-7
3. Yang CC (2022) Explainable artificial intelligence for predictive modelling in healthcare. J Healthc Inform Res 6(2):228–239. https://doi.org/10.1007/s41666-022-00114-1. PMID: 35194568; PMCID: PMC8832418
4. Amann J et al (2020) Explainability for artificial intelligence in healthcare: a multidisciplinary perspective. BMC Med Inform Decis Mak 20:310. https://doi.org/10.1186/s12911-020-01332-6
5. Wang X et al (2021) Exploratory study on classification of diabetes mellitus through a combined random forest classifier. BMC Med Inform Decis Mak 21:105. https://doi.org/10.1186/s12911-021-01471-4
6. Gerlings J et al (2021) Reviewing the need for explainable artificial intelligence (xAI). In: Proceedings of the 54th Hawaii international conference on system sciences
7. Markus AF et al (2021) The role of explainability in creating trustworthy artificial intelligence for health care: a comprehensive survey of the terminology, design choices, and evaluation strategies. J Biomed Inform 113:103655
8. Madumal P et al (2019) A grounded interaction protocol for explainable artificial intelligence. In: Proceedings of the 18th International conference on autonomous agents and multiagent systems (AAMAS 2019). International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC, pp 1033–1041
9. Belle V et al (2020) Principles and practice of explainable machine learning. arXiv:2009.11698v1
10. Zhou Z et al (2021) S-LIME: stabilized-LIME for model explanation. In: KDD '21: Proceedings of the 27th ACM SIGKDD conference on knowledge discovery & data mining, pp 2429–2438
11. Collaris D, van Wijk JJ (2022) Comparative evaluation of contribution-value plots for machine learning understanding. J Vis 25:47–57
12. Zhang Y, Weng Y, Lund J (2022) Applications of explainable artificial intelligence in diagnosis and surgery. Diagn 12(2):237. https://doi.org/10.3390/diagnostics12020237
13. Lundberg S et al "LIME"
14. Kaur J, Suryakant, Kaur K (2021) Explainable AI in diabetes prediction system. Acta Sci Med Sci 5(10):131–136
15. Naik H et al (2021) Explainable artificial intelligence (XAI) for population health management – an appraisal. Eur J Electr Eng Comput Sci (EJECE) 5(6):64–76. ISSN: 2736–5751

Classification of Cervical Cells Using Deep Learning Feature Extraction Abolfazl Mehbodniya, Julian L. Webber, Devi Mani, D. Stalin David, Amudha Kandasamy, Rajasekar Rangasamy, and Sudhakar Sengan

Abstract A practical approach is required for segmenting cervical cancer images, and this study presents a novel approach to this segmentation task. An RGB image is first converted to an HSI color model. For the saturation and intensity components, thresholding is used to generate binary images, and a new mask is created by combining the binary images. Using the connected component concept, the nucleus and cytoplasm are appropriately separated. This technique delivers higher accuracy than edge- and region-based segmentation when segmenting the nucleus.

27

28

A. Mehbodniya et al.

segmenting the nucleus. A suggested framework for determining the cancer stage as normal or abnormal is based on nucleus size and other factors. The experimental findings and performance assessment revealed that the suggested system delivers excellent outcomes. The performance of the proposed system is compared with that of the current system using the above parameters, and it is demonstrated that the image is segmented successfully with excellent features. The execution is done and implemented in MATLAB. Keywords Medical image segmentation · Cervix · CIN · HIS · Thresholding · Morphology

1 Introduction Cervical malignancy is a disease generally caused by the Human Papilloma Virus (HPV). The WHO estimates that there are approximately 530,000 new cases of cervical cancer each year. The disease can be cured if diagnosed at an early stage. Early sexual contact, multiple sexual partners, and the use of contraceptive pills are among the risk factors for the disease [1]. The different symptoms of cervical cancer are weight loss, fatigue, back pain, leg pain, leakage of urine, bleeding between periods, and abnormal changes in the cervix. The Pap test and colposcopy are commonly used to examine the cervix and vagina [2]. The test identifies irregular cells in the cervix, graded from normal to abnormal; a positive test result indicates the presence of cervical cancer. The Pap test has numerous drawbacks, including dependence on experts, low sensitivity, and the necessity for repeated examinations. This is where cervical image segmentation comes in. Various techniques, such as thresholding, level sets, active contours, k-means, watershed, and morphological operations, are used to segment the images obtained from the Pap test into different objects. The structure of the female cervix is shown in Fig. 1. The proposed method relies on the ability to identify all nuclei in an image and exclude all other components; the information about the nuclei indicates whether a nucleus is normal or abnormal. The increase in the precision of segmentation is calculated using quantitative methods. Fig. 1 Cervix structure

The major problem is image degradation, such as blur, noise, and color or contrast imperfections, which severely affects segmentation. Segmentation and classification are the two most essential parts of any computer-aided screening system. The main goal of segmentation is to separate the nucleus from the background area, while cell classification categorizes the cells into different stages (CIN-0, CIN-1, and CIN-2). Image noise is eliminated during the preprocessing step, and the enhanced image is converted into an HSI color model. With the help of thresholding, the nucleus is segmented from the saturation component of the input image. Labeling is done using the connected component concept [3]: pixels are labeled together if they are connected or have similar intensity values within a connected component. The rest of the article is organized as follows: Sect. 2 presents related work, Sect. 3 describes the proposed model for detecting cervical malignancy, Sect. 4 presents the results and discussion, and Sect. 5 concludes the article and outlines future work.

2 Related Work In image processing, automated segmentation is a difficult task in general. Several algorithms for the segmentation of microscopic images have been published. For segmenting cervical images, researchers use different mainstream procedures, such as clustering, edge detection, thresholding, region-based methods, and watershed transformation. An integrated system for detecting and classifying cervical cancer was proposed by the author of [4]. Image classification and processing are performed using the Adaptive Neuro-Fuzzy Inference System (ANFIS) and watershed segmentation techniques, which are compared with other techniques, and malignancy is assigned to an irregular image using a detailed set of fuzzy rules. Tests demonstrate that the proposed method is feasible and provides better precision in identifying tumor forms. The author of [5] proposed an algorithm for segmenting cervical cancer images. An adaptive median filter is used in the preprocessing of cervical images to reduce noise, HAAR wavelets address the abnormal staining effect in cervical images, and background subtraction is accomplished using the bit-plane slicing technique. The preprocessed images are segmented using the Intersecting Cortical Model (ICM) and cuckoo algorithms. The system developed by the author of [6] uses color features and k-means clustering to identify the nucleus in cervical cytology images. Colors are separated using de-correlation stretching, and k-means is used to classify the regions of interest. K-means works well for classifying images of single cells, but false segmentation occurs if the image intensity is not uniform, so the centroid must be chosen carefully. The author of [7] proposed the segmentation of overlapping cells in Pap smear images using distance metrics and morphological operations. Morphological operations and thresholding aid in the initial partitioning of the cytoplasm in the cervical image, and the overlapping area in the cytoplasm region is then separated using morphological operations and distance criteria on each pixel. The author of [8] proposed a method to automate cell segmentation and classification in Pap smears. A single cervical cell image is sectioned into the nucleus, cytoplasm, and background using FCM; this strategy is not always suitable for images with more than one cell. By ranking the cells based on their feature characteristics and modeling abnormality degrees, classification is presented as a grouping problem. A method for determining whether or not a cervical cell is cancerous partitioned cervical biopsy images into nuclei, red blood cells, and stroma using a global threshold value. Global thresholding does not make use of spatial information and is sensitive to noise. Several global thresholding problems can be overcome using local threshold algorithms, but their major drawback is that they depend on many parameters. According to the author of [9], an active contour is defined as a planar curve with an energy function; contour initialization and poor convergence to boundary concavities were two of the most significant issues in the snake model. The intensity image is complemented, thresholded, merged, and smoothed with the product image. Using the extended-minima function, the local minima are suppressed, and then, using a Hill-Climbing-based watershed algorithm, the multi-scale gradient of the resulting grayscale image is partitioned into different regions.

3 The Motivation for the Proposed Work The quality of captured medical images is not always adequate because of sensor heat, faulty sensors, and acquisition parameters. Many methods address image density, but few consider the quality of the output. Image quality is further degraded when an image passes through processing steps such as preprocessing and segmentation, and the preprocessing output is reflected in the segmentation output. Many segmentation algorithms fail to preserve the good features of medical images. To avoid this problem, effective processing can be done before segmentation to produce the expected output. A genetic algorithm (GA) is used for the effective selection of the kernel coefficients; with the correct filter values, an image can be decomposed and further preprocessing can be carried out effectively [10]. The HSI color model is a suitable option for developing image processing algorithms and also helps in extracting the correct information. The correct threshold value is chosen and applied to the intensity and saturation components by increasing the number of iterations. In global thresholding, a single value is assigned to all pixels, but pixel intensity in the image is neither precise nor uniform, so global thresholding extracts only a few essential or relevant parts of an image. In a microscopic image, the pixel intensity can change at the beginning of the process; a local threshold was therefore used, which further avoids data loss. By combining the binary images, a proper mask can be produced, and it is then used as the input image to separate the nucleus from the cytoplasmic background [11]. The following problems can be avoided: contour initialization, poor convergence, poor boundary identification, and over-segmentation. The issues that need to be addressed are over-segmentation and keeping the delicate features of an input image during segmentation. Motivated by these factors, image preprocessing and segmentation have been carried out in this paper, and it is also justified that the proposed methodology provides better results and clarity than the existing techniques. The overall working principle of the proposed framework is given in Fig. 2: an input image is subjected to segmentation, and finally characteristics are extracted from the segmented image, in which the cancer is marked as normal or abnormal. As shown in Fig. 3, the first module contains preprocessing using a GA and a bilateral filter and includes steps for segmenting the nucleus from the cytoplasmic background. In module II, thresholding, morphology, masking, and connected component labeling are used to produce the best segmentation of cervical images [12].

Fig. 2 Flow of the proposed system

Fig. 3 Overview of proposed model

3.1 Bilateral Filter and Genetic Algorithm The bilateral filter combines domain and range filtering. The primary idea is that, in the spatial domain, nearby pixels occupy nearby locations and therefore tend to have similar intensity values. Range filtering weights the pixel levels by the variation in intensity, with the weight decaying as the intensity difference grows; the weight is therefore defined by the image intensity [13]. A GA is a meta-heuristic algorithm that mimics the natural selection procedure. It is an iterative procedure, and the set of candidate solutions in each iteration (generation) is called the population. Here the GA aims to optimize the range and domain sigma of the bilateral filter.

Step 1: Generate a random initial population whose size depends on the problem (P).
Step 2: For each generation, evaluate the fitness of the population and select the fittest individuals.
Step 3: Apply genetic crossover between the selected parents to produce children.
Step 4: Apply mutation to the children so that they offer more options for solving the problem.
Step 5: When the terminating condition is met, the process ends [14].
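The paper's implementation is in MATLAB, so the following is only a minimal Python sketch of the idea described above: a GA searching over the two bilateral-filter parameters (range sigma and spatial sigma). The fitness function used here, PSNR against a median-filtered reference, is an illustrative placeholder, since the paper does not state its exact fitness measure, and the population size, generation count, and parameter ranges are assumed values.

```python
# Sketch only: GA tuning of bilateral-filter sigmas (fitness is a placeholder).
import random
import cv2
import numpy as np

def fitness(image, sigma_color, sigma_space):
    filtered = cv2.bilateralFilter(image, 9, sigma_color, sigma_space)
    reference = cv2.medianBlur(image, 5)               # placeholder "clean" reference
    mse = np.mean((filtered.astype(np.float64) - reference) ** 2) + 1e-12
    return 10.0 * np.log10(255.0 ** 2 / mse)           # PSNR in dB

def ga_tune_bilateral(image, pop_size=20, generations=30, mutation_rate=0.2):
    # Step 1: random initial population of (sigma_color, sigma_space) pairs
    pop = [(random.uniform(10, 150), random.uniform(1, 15)) for _ in range(pop_size)]
    for _ in range(generations):
        # Step 2: evaluate fitness and keep the better half as parents
        scored = sorted(pop, key=lambda p: fitness(image, *p), reverse=True)
        parents = scored[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            child = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)     # Step 3: crossover
            if random.random() < mutation_rate:                # Step 4: mutation
                child = (child[0] * random.uniform(0.8, 1.2),
                         child[1] * random.uniform(0.8, 1.2))
            children.append(child)
        pop = parents + children                               # next generation
    return max(pop, key=lambda p: fitness(image, *p))          # Step 5: best pair found
```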

3.2 Morphological Operations Mathematical morphology is an effective method for addressing many problems in image processing and computer vision. Mathematical morphology theory uses binary, grayscale, and color images for preprocessing or feature analysis. As a family of non-linear operations, morphological image processing is concerned with shaping or altering the features in an image. Images can be processed using various mathematical morphology operations, which extract image shape features by working with different structuring elements. See Fig. 4 for a representation of a 3 × 3 structuring element with the set of coordinates {(−1, −1), (0, −1), (1, −1), (−1, 0), (0, 0), (1, 0), (−1, 1), (0, 1), (1, 1)}. When dilating with this structuring element, the background pixels of the input image are considered (Fig. 5): the structuring element is placed on top of the image so that its origin coincides with the input pixel, and the input pixel is set to the foreground value if any structuring-element pixel matches a foreground pixel in the image below [15]. For erosion with this structuring element, each foreground pixel is considered: the structuring element is superimposed so that its origin aligns with the pixel's coordinates, and the input pixel is left unchanged only if the structuring element is fully matched; otherwise it is set to the background value [16]. A pixel's neighbors are thus defined by the structuring element matrix, and the application's requirements dictate how the Structuring Element (SE) is constructed. Fig. 4 Set of coordinate values

Fig. 5 Disk-shaped structuring element of radius 3

Because abnormality sizes can vary, medical images should use scalable structuring elements. Typically, the shape and size of the mask are selected subjectively; disc-shaped masks have been used for medical images more frequently than other masks. The structuring element is given in Eq. (1), and dilation and erosion are defined in Eqs. (2) and (3):

Se = strel('disk', 3)   (1)

Dilation: (f ⊕ k)(x, y) = max{f(x − m, y − n) + k(m, n)}   (2)

Erosion: (f ⊖ k)(x, y) = min{f(x − m, y − n) − k(m, n)}   (3)

Here f is the image (grayscale or binary) operated on by the corresponding structuring element k, and (x, y) are its pixel coordinates. The opening operator is equivalent to performing erosion followed by dilation on the same image, and the closing operator to dilation followed by erosion, as in Eqs. (4) and (5). The opening operator removes small details and thin connections, while the closing operator fills in gaps. The shape and fit of a mask are largely a subjective choice; masks in the shape of discs are frequently used in medical imaging [17, 18].

A ∘ B = (A ⊖ B) ⊕ B   (4)

A • B = (A ⊕ B) ⊖ B   (5)
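As a minimal illustration of Eqs. (1), (4), and (5), the sketch below uses scikit-image (rather than the MATLAB strel call in the text) to open and close a toy binary mask with a disk-shaped structuring element; the mask and hole are invented purely for demonstration.

```python
# Sketch: opening and closing with a radius-3 disk structuring element.
import numpy as np
from skimage.morphology import disk, binary_opening, binary_closing

se = disk(3)                        # Eq. (1): disk-shaped structuring element, radius 3

mask = np.zeros((64, 64), dtype=bool)
mask[20:40, 20:40] = True           # toy binary mask standing in for a thresholded image
mask[25, 25] = False                # a small hole that closing should fill

opened = binary_opening(mask, se)   # Eq. (4): erosion followed by dilation
closed = binary_closing(mask, se)   # Eq. (5): dilation followed by erosion
```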

4 HSI Color Model Multimedia, graphics, and computer vision all employ color image processing. Color models include RGB, CMY, YIQ, and HSI; HSI is proposed as an enhancement of RGB, although the conversion from RGB to HSI (or vice versa) is more involved than for other color models. Humans perceive color objects in terms of their hue, saturation, and intensity, and the HSI color model separates these three components in a color image. The HSI color model is therefore an essential tool for creating image processing algorithms based on human-friendly color descriptions. The proposed algorithm transforms the RGB image into the HSI color space, and the output image is described by its hue, saturation, and intensity components.

Hue: a color attribute that describes a pure color.
Saturation: the degree to which a pure color is diluted by white light.
Intensity: the perceived brightness, which is subjective and nearly impossible to gauge precisely.

The Red-Green-Blue (RGB) color model is an orderly representation of color images. However, since the three components of the RGB color model are highly interrelated, the chromatic data cannot be used directly. The HSI color space is a very important and attractive color model for image processing applications since it represents colors in a way that corresponds to how the human eye perceives color. Therefore, the HSI color space is adopted, and the conversion from RGB, which is also related to human perception, uses Eqs. (6), (7), and (8):

h = cos⁻¹{ 0.5[(r − g) + (r − b)] / [(r − g)² + (r − b)(g − b)]^(1/2) }, h ∈ [0, π] for b ≤ g   (6)

s = 1 − 3 · min(r, g, b), s ∈ [0, 1]   (7)

i = (R + G + B) / (3 × 255), i ∈ [0, 1]   (8)

Connected Component Labeling Connected Component Analysis (CCA) is a method in image processing that takes an image as input and labels its pixels according to pixel connectivity. Pixels are assigned the same label if they are connected to each other or share similar intensity values within a connected component. Undesirably small regions in the binary image were removed by discarding all connected components whose area is below a threshold roughly equal to the area of the smallest nucleus present in an image, and each remaining connected component was considered a cell group. The obtained binary image is then labelled by applying 4-connected component analysis, and each labelled region is checked for the likelihood of being a nucleus by calculating region-based features, i.e., the area (the actual number of pixels in the connected component) and the region strength, where Strength = Area / Convex Area.
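A minimal sketch of this step using scikit-image is shown below: the binary mask is labelled with 4-connectivity, small components are discarded, and Strength = Area / Convex Area is computed per region. The min_area and min_strength thresholds are hypothetical values chosen only for illustration; the paper does not specify them.

```python
# Sketch: 4-connected labeling and the Strength = Area / Convex Area check.
from skimage.measure import label, regionprops

def candidate_nuclei(mask, min_area=50, min_strength=0.9):
    labels = label(mask, connectivity=1)          # 4-connected component labeling
    keep = []
    for region in regionprops(labels):
        if region.area < min_area:                # drop undesirably small regions
            continue
        strength = region.area / region.convex_area
        if strength >= min_strength:              # plausible nucleus-shaped component
            keep.append(region.label)
    return keep
```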

4.1 Algorithm Implementation Image thresholding techniques are used to analyze the images. Using image thresholding, an image can be divided into foreground and background; binary images are generated from grayscale images with this image analysis method, which isolates individual objects. Thresholding extracts an object from its background by assigning an intensity value to every pixel. Global thresholding is very suitable for bi-modal medical images, in which most of the pixels are distributed over two dominant regions. The gray threshold function uses Otsu's technique to pick the threshold that minimizes the weighted intra-class variance of the thresholded pixels; it is an image thresholding algorithm for reducing a gray-level picture to a binary image, Eq. (9):

σ²ω(t) = ω₀(t)σ₀²(t) + ω₁(t)σ₁²(t)   (9)

where ω₀ and ω₁ are the probabilities of the two classes separated by a threshold t, and σ₀² and σ₁² are the variances of the two classes.

Step 1: Input the microscopic biopsy image.
Step 2: Remove the noise in the image using a bilateral filter and GA.
Step 3: Convert the enhanced image to the proposed HSI color model.
Step 4: With thresholding, separate the nucleus from the image's saturation component.
Step 5: Globally threshold the image's intensity component.
Step 6: Apply morphological operations separately to the saturation and intensity components.
Step 7: Convert the saturation and intensity components to binary images.
Step 8: Combine the binary images to form a new mask.
Step 9: Label the mask using connected components.
Step 10: Segment the nucleus to extract and classify features.
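A minimal sketch of Steps 4-8 is given below, reusing the rgb_to_hsi function sketched earlier and scikit-image's Otsu threshold for Eq. (9). The assumptions that nuclei appear darker (so the intensity mask keeps low-intensity pixels) and that the two binary images are combined with a logical AND are illustrative choices, not details stated in the paper.

```python
# Sketch: Otsu thresholding of saturation and intensity and combination into a mask.
from skimage.filters import threshold_otsu

def nucleus_mask(rgb_image):
    h, s, i = rgb_to_hsi(rgb_image)
    s_bin = s > threshold_otsu(s)      # Step 4: threshold the saturation component
    i_bin = i < threshold_otsu(i)      # Step 5: assume nuclei are darker than background
    return s_bin & i_bin               # Step 8: combine the binary images into a mask
```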

4.2 Experimental Design Several initial parameters, such as threshold levels and morphological operations, are established prior to conducting the experiments. The threshold value varies from 0.1 to 1.0. Morphological operations and connected component labeling are used throughout the experiments, which are carried out for all types of images. The proposed approach is well suited for segmenting the nucleus from the background. It is evaluated and compared using the performance metrics precision, recall, PSNR, MSE, SSIM, AD, NK, SC, LMSE, and NAE. The different stages of cervical cancer images are shown in Fig. 6; the stages are mild, moderate, severe, and carcinoma, and the nucleus size varies with each stage. The output of the proposed algorithm is shown in Fig. 7. Precision is defined as the ratio of TP to the total number of predicted positives, TP/(TP + FP), i.e., the percentage of the positive class that was correctly predicted. Recall is the fraction of points in class 1 that the model predicts as class 1, TP/(TP + FN).

Fig. 6 The 4 classes of cervical cells

Fig. 7 Output images (a-k)

Precision = TP / (TP + FP)   (10)

Recall = TP / (TP + FN)   (11)

where
TP: the pixel location is inside in both the actual and the segmented image
TN: the pixel location is inside in the actual image and outside in the segmented image
FP: the pixel location is outside the actual image and inside in the segmented image
FN: the pixel location is outside in both the actual and the segmented image.

PSNR is given by Eq. (12):

PSNR = 10 log₁₀(255² / MSE)   (12)

Equation (13) calculates the MSE:

MSE = [Σ_{i=0}^{N−1} Σ_{j=0}^{N−1} (û_{i,j} − u_{i,j})²] / [Σ_{i=0}^{N−1} Σ_{j=0}^{N−1} (u_{i,j})²]   (13)

where u_{i,j} is the original and û_{i,j} is the segmented image. SSIM is computed with Eq. (14):

SSIM(u, v) = [(2μᵤμᵥ + c₁)(2σᵤᵥ + c₂)] / [(μᵤ² + μᵥ² + c₁)(σᵤ² + σᵥ² + c₂)]   (14)

where μᵤ is the average of u, μᵥ is the average of v, c₁ and c₂ are constants, σᵤᵥ is the covariance of u and v, σᵤ² is the variance of u, and σᵥ² is the variance of v. The average difference (AD) is calculated using Eq. (15); it reflects the average difference between the input image u(i, j) and the segmented image v(i, j):

AD = (1/MN) Σ_{i=1}^{N} Σ_{j=1}^{M} (u(i, j) − v(i, j))²   (15)

The NK is calculated using Eq. (16):

NK = [Σ_{i=1}^{N} Σ_{j=1}^{M} u(i, j) · v(i, j)] / [Σ_{i=1}^{N} Σ_{j=1}^{M} (u(i, j))²]   (16)

The SC is calculated using Eq. (17). Figure 8 shows the PSNR values of different segmentation techniques on mild, moderate, severe, and carcinoma images at different threshold levels, and Fig. 9 shows the corresponding precision values.

SC = [Σ_{i=1}^{N} Σ_{j=1}^{M} (v(i, j))²] / [Σ_{i=1}^{N} Σ_{j=1}^{M} (u(i, j))²]   (17)

The LMSE is calculated using Eq. (18):

LMSE = [Σ_{i=1}^{m−1} Σ_{j=1}^{n−1} (O{f(i, j)} − O{f′(i, j)})²] / [Σ_{i=1}^{m−1} Σ_{j=1}^{n−1} (O{f(i, j)})²]   (18)

For the purpose of calculating the NAE, Eq. (19) is used:

NAE = [Σ_{j=1}^{M} Σ_{i=1}^{N} |O{F(i, k)} − O{F̂(j, k)}|] / [Σ_{j=1}^{M} Σ_{i=1}^{N} |O{F(i, k)}|]   (19)

Figures 10 and 11 show the SSIM values of different segmentation techniques on mild, moderate, severe, and carcinoma images at different threshold levels, and Fig. 12 shows the corresponding MSE values.

Fig. 8 PSNR

Fig. 9 Precision

Fig. 10 SSIM

Fig. 11 SSIM with Region

Fig. 12 MSE
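To make the use of these measures concrete, the sketch below computes precision, recall, MSE, PSNR, and SSIM (Eqs. (10)-(12) and (14)) for a ground-truth mask and a predicted segmentation mask. It uses the conventional pixel-wise MSE rather than the normalized form of Eq. (13), and takes SSIM from scikit-image instead of re-implementing it, so it is an approximation of the paper's evaluation, not a reproduction of it.

```python
# Sketch: segmentation quality metrics on binary ground-truth/predicted masks.
import numpy as np
from skimage.metrics import structural_similarity

def evaluate_segmentation(truth_mask, pred_mask):
    t = truth_mask.astype(bool)
    p = pred_mask.astype(bool)
    tp = np.sum(t & p)
    fp = np.sum(~t & p)
    fn = np.sum(t & ~p)
    precision = tp / (tp + fp + 1e-12)                      # Eq. (10)
    recall = tp / (tp + fn + 1e-12)                         # Eq. (11)
    u = t.astype(np.float64) * 255.0                        # reference image (0/255)
    v = p.astype(np.float64) * 255.0                        # segmented image (0/255)
    mse = np.mean((u - v) ** 2)                             # conventional pixel-wise MSE
    psnr = 10.0 * np.log10(255.0 ** 2 / (mse + 1e-12))      # Eq. (12)
    ssim = structural_similarity(u, v, data_range=255.0)    # Eq. (14)
    return {"precision": precision, "recall": recall,
            "mse": mse, "psnr": psnr, "ssim": ssim}
```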

This paper compares the proposed segmentation technique with edge-based and region-based segmentation. In edge-based segmentation, the shape, size, and number of nucleus cells are not identified. In region-based segmentation, the following problems occur: over-segmentation, selection of the initial centroid (k-means), and no guarantee of producing ideal results. The highest PSNR values for the proposed technique are 45.14523, 47.48173, 61.38371, and 62.17337, respectively. With the proposed segmentation technique, a threshold of 0.1 can achieve a high PSNR (>40 dB). The proposed method achieves improved precision, recall, and SSIM. The lowest MSE values are 0.03142, 0.02541, 0.09139, and 0.03182. The low values of AD, NK, SC, LMSE, and NAE for the proposed method show that the quality of the image is high.

5 Conclusion and Future Work To remove the nucleus from the cytoplasm while minimizing over-segmentation, the proposed method uses thresholding, morphological operations, and connected component labeling. Identifying and removing non-nucleus cells in an image is critical to the proposed approach: if the information about the nuclei is accurate, it can reveal whether or not a nucleus is normal. The improvement in segmentation accuracy can be measured quantitatively. The primary objective of the proposed method is to improve the quality of segmentation. The proposed algorithm has increased the PSNR, precision, recall, and SSIM values, while the AD, LMSE, MSE, and NK values have been reduced, showing that segmentation is done very precisely. In future work, the accuracy of segmentation could be improved further, making it easier to tell whether cervix tissue is normal or not.

References
1. Bandyopadhyay H, Nasipuri M (2020) Segmentation of pap smear images for cervical cancer detection. In: IEEE Calcutta conference, pp 30–33
2. Nandanwar PD, Wadhai VM, Chanchlani AS, Thakare VM (2021) Analysis of pixel intensity variation by performing morphological operations for image segmentation on cervical cancer pap smear image. In: International conference on computational intelligence and computing applications, pp 1–6
3. Gautam S, Bhavsar A, Sao AK, Harinarayan KK (2018) CNN based segmentation of nuclei in PAP-smear images with selective pre-processing. In: Medical imaging, digital pathology
4. Gençtav A, Aksoy S, Önder S (2012) Unsupervised segmentation and classification of cervical cell images. Pattern Recognit 45(12):4151–4168
5. Guo P et al (2016) Nuclei-based features for uterine cervical cancer histology image analysis with fusion-based classification. IEEE J Biomed Health Inform 20(6):1595–1607
6. Jaya S, Latha M (2020) Channel-based threshold segmentation of multi-class cervical cancer using mean and standard deviation on Pap smear images. In: International conference on electronics and sustainable communication systems, pp 721–726
7. Kaaviya S, Saranyadevi V, Nirmala M (2015) PAP smear image analysis for cervical cancer detection. In: IEEE International conference on engineering and technology, pp 1–4
8. Labeit A, Peinemann F, Kedir A (2013) Cervical cancer screening service utilization in UK. Sci Rep 3(1):2362
9. Navarro M, Razmilic D, Araos I, Rodrigo A, Andia M (2018) Rendimiento de la mamografia espectral de energia dual con contraste en la deteccion de cancer de mama: experiencia en un centro de referencia. Rev Med Chile 146:141–149
10. Rahaman MM et al (2020) A survey for cervical cytopathology image analysis using deep learning. IEEE Access 8:61687–61710
11. Rahmadwati, Naghdy G, Ros M, Todd C, Norahmawati E (2011) Cervical cancer classification using gabor filters. In: IEEE First international conference on healthcare informatics, imaging and systems biology, pp 48–52
12. Sangworasil M et al (2018) Automated screening of cervical cancer cell images. In: 11th Biomedical engineering international conference, pp 1–4
13. Sudhakar S, Pandian SC (2012) Secure packet encryption and key exchange system in mobile ad hoc network. J Comput Sci 8(6):908–912

14. Sudhakar S, Pandian SC (2016) Hybrid cluster-based geographical routing protocol to mitigate malicious nodes in mobile ad hoc network. Int J Ad Hoc Ubiquitous Comput 21(4):224–236
15. Priyadarshni AU, Sudhakar S (2015) Cluster based certificate revocation by cluster head in mobile ad-hoc network. Int J Appl Eng Res 10(20):16014–16018
16. Sudhakar S, Pandian SC (2015) Investigation of attribute aided data aggregation over dynamic routing in wireless sensor. J Eng Sci Technol 10(11):1465–1476
17. Sudhakar S, Pandian SC (2013) Trustworthy position-based routing to mitigate against the malicious attacks to signifies secured data packet using geographic routing protocol in MANET. WSEAS Trans Commun 12(11):584–603
18. Sudhakar S, Pandian SC (2013) A trust and co-operative nodes with affects of malicious attacks and measure the performance degradation on geographic aided routing in mobile ad hoc network. Life Sci J 10(4s):158–163

Nature-Inspired Techniques for Terrain Features Extraction Sharad Bajaj, Harish Kundra, Sheetal Kundra, Nehalika Neha, and Suyash Agrawal

Abstract A key technique in remote sensing for automated pattern recognition and analysis of satellite data is satellite image classification, which enables the automatic comprehension of large volumes of data. In this paper, we analyze various nature-inspired techniques, such as the hybrid CS and PSO algorithm, the hybrid CS and ACO algorithm, the hybrid FPAB/BBO algorithm, biogeography-based classification, and the hybrid PSO and Firefly algorithm. These nature-inspired techniques have a substantial influence that is crucial to remote sensing applications. They are highly helpful for categorizing terrain characteristics and have produced outstanding findings that demonstrate higher effectiveness and higher kappa coefficient values. Keywords Natural computing · Image classification · Satellite image · Remote sensing · Particle swarm optimization · Kappa coefficient · Ant colony optimization · Cuckoo search · Firefly-based optimization

S. Bajaj — Amazon Web Services, Seattle, USA
H. Kundra (B) · N. Neha · S. Agrawal — Guru Nanak Institutions Technical Campus, Ibrahimpatnam, India; e-mail: [email protected]
S. Kundra — Guru Nanak Institute of Technology, Ibrahimpatnam, India; e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_4

1 Introduction Remote sensing is the science and art of acquiring information about an object, place, or phenomenon without physically touching the thing under investigation. Reconnaissance, the construction of mapping products for both military and civilian uses, the assessment of environmental damage, growth control, monitoring of land use, soil assessment, urban planning, radiation monitoring, and agricultural yield appraisal are a few uses for remotely sensed imagery. One of the most crucial methods for identifying different natural terrain characteristics is the classification of remote sensing images.

1.1 Natural Computing Nature, which serves as a source of inspiration, is vital to the development of technology and the resolution of computationally challenging problems because it offers compelling and appealing solutions. To capture this, a new terminology has been developed: techniques for computing that are influenced by nature are known as "Natural Computing". Natural Computing refers to computational processes found in nature as well as artificially created computing that takes its cues from nature. Therefore, Natural Computing can be described as the computing process using artificially created, nature-inspired approaches that act as natural computation agents; artificial systems are understood to be those created using human intellect with the same functional capabilities as natural systems. To better comprehend the nature and ideas behind Natural Computing, it is divided into four main fields: swarm intelligence techniques, Artificial Immune System techniques, Geosciences techniques, and Human Mind Modeling techniques. The goal of swarm intelligence (SI), a branch of artificial intelligence (AI), is to create intelligent multi-agent systems by modeling their behavior after that of social animals such as flocks of birds or schools of fish, as well as social insects such as ants, termites, bees, and wasps. Examples of swarm intelligence-based algorithms include the Bat Algorithm, biogeography-based optimization [1], Ant Colony Optimization, Cuckoo Search, Particle Swarm Optimization [2], Artificial Bee Colony Optimization, the Firefly algorithm, and more. Swarm intelligence [3] research aims to improve a number of soft computing and nature-inspired approaches. The idea of remote sensing is used to collect data from satellites in order to derive natural landscape features; remote sensing is a technique for gathering, processing, and analyzing geospatial data and related satellite images without actually being in the target area. The proposed work studies different Natural Computing techniques and develops their algorithms for categorizing landscape features. The data images are categorized using hybrid algorithms based on swarm intelligence. Alwar, a well-known place with many terrain features, is located in Rajasthan, India; this area has the following feature classes: barren, rocky, vegetation, urban, and water. The multi-spectral images of the Alwar region were captured using the Indian LISS-III satellite sensor along with Canadian satellite data.

1.2 Image Classification The main component of the approach taken to address the land cover mapping issue is image classification. As a result, the classification of satellite images has emerged as an important area of image processing study. The right set of image features is required for every form of classification.

2 Proposed Method A comparative study of the hybrid CS and PSO algorithm [4], the hybrid CS and ACO algorithm [5], the hybrid FPAB/BBO algorithm [6], biogeography-based classification [7], and the hybrid PSO and Firefly algorithm [8] has been carried out in this study. Tables 1, 2, 3, 4 and 5 show the error matrices of the respective algorithms. Interpreting an error matrix along its columns indicates the number of pixels successfully classified by the algorithm. For instance, in Table 2, the algorithm properly categorized 138 of the 150 vegetation pixels in the first column as vegetation, whereas ten were incorrectly labeled as rocky and two as barren. There are no incorrectly identified water pixels.

Table 1 Error matrix after applying biogeography-based satellite image classification

              Vegetation  Urban  Rocky  Water  Barren  Total
Vegetation    127         9      0      0      2       138
Urban         0           88     1      0      32      121
Rocky         6           2      176    1      17      202
Water         0           0      3      69     0       73
Barren        17          91     20     0      119     247
Total         150         190    200    70     170     780

Table 2 Error matrix after applying satellite image classification using a hybrid FPAB/BBO algorithm

              Vegetation  Urban  Rocky  Water  Barren  Total
Vegetation    138         11     0      0      1       150
Urban         0           90     2      0      36      128
Rocky         10          0      181    0      28      219
Water         0           0      0      70     0       70
Barren        2           89     17     0      105     213
Total         150         190    200    70     170     780

Table 3 Error matrix after using the hybrid CS and ACO algorithm

              Vegetation  Urban  Rocky  Water  Barren  Total
Vegetation    161         1      4      0      0       166
Urban         0           159    0      2      15      176
Rocky         0           0      94     0      0       94
Water         0           0      3      72     0       75
Barren        0           0      0      0      52      52
Total         161         160    101    74     67      563

Table 4 Error matrix upon use of the hybrid CS and PSO method

              Vegetation  Urban  Rocky  Water  Barren  Total
Vegetation    161         1      0      0      0       162
Urban         0           149    0      0      5       154
Rocky         0           0      101    0      0       101
Water         0           0      0      74     0       74
Barren        0           10     0      0      62      72
Total         161         160    101    74     67      563

Table 5 Error matrix of the hybrid PSO and Firefly algorithm for extraction of natural terrain features

              Vegetation  Urban  Rocky  Water  Barren  Total
Vegetation    161         1      0      0      0       162
Urban         0           158    0      1      10      169
Rocky         0           0      101    0      0       101
Water         0           0      0      73     0       73
Barren        0           1      0      0      57      58
Total         161         160    101    74     67      563

2.1 Dataset For our dataset, we have considered locations that have good land cover characteristics, such as vegetation, water, urban areas, barren areas, and rocky areas. The image is 548 × 474 pixels in size. The suggested algorithms are applied to this image, and a categorized image with the various classes is produced. The bands taken into consideration are the red, green, MIR (middle infra-red), NIR (near infra-red), RS1 (radar set 1), RS2 (radar set 2), and DEM (digital elevation model) bands. Figure 1 shows the different bands of the satellite image used as input.

Fig. 1 Alwar region’s band satellite image

2.2 Kappa Coefficient The kappa coefficient is the discrete multivariate technique used to evaluate the results of the error matrix. Compared with overall accuracy metrics [9–11], the kappa statistic provides a more useful evaluation of accuracy because it incorporates both the diagonal observations and the off-diagonal elements of the rows and columns. For the Alwar image, the kappa coefficient is obtained by applying the following formula to the error matrix:

k̂ = [N Σ_{i=1}^{r} x_ii − Σ_{i=1}^{r} (x_{i+} · x_{+i})] / [N² − Σ_{i=1}^{r} (x_{i+} · x_{+i})]

where
r = total number of rows in the error matrix
x_ii = number of observations in row i and column i (on the major diagonal)
x_{i+} = total number of observations in row i
x_{+i} = total number of observations in column i
N = total number of observations included in the matrix.

2.3 Overall Accuracy The accuracy assessment is carried out as part of the image classification procedure to ascertain the effectiveness of the suggested method. This accuracy statement seeks to quantify how well the pixels in the investigated region were assigned to the appropriate feature classes. The error matrix, widely used in remote sensing [12], is used to assess the classification accuracy: it compares, category by category, the relationship between the identified reference data and the corresponding outcomes of an automated classification. It is not reasonable to test every pixel of a classified image, so a set of randomly chosen reference pixels is used to conduct the experiment; reference pixels are areas of the classified image whose actual features can be identified. The overall accuracy is calculated as the ratio of accurate observations to all classifications and can be determined as follows: O = total number of valid classifications (sum of all values along the major diagonal) divided by the total number of classifications.
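As a concrete illustration of Sects. 2.2 and 2.3, the sketch below computes the kappa coefficient and overall accuracy from an error matrix whose rows are classified labels and columns are reference labels. The example matrix is the Table 1 error matrix (biogeography-based classification), for which the paper reports kappa = 0.6715 and overall accuracy = 74.23%.

```python
# Sketch: kappa (KHAT) coefficient and overall accuracy from an error matrix.
import numpy as np

def kappa_and_accuracy(error_matrix):
    m = np.asarray(error_matrix, dtype=np.float64)
    n = m.sum()                                          # N: total observations
    diag = np.trace(m)                                   # sum of major-diagonal entries
    marginals = (m.sum(axis=1) * m.sum(axis=0)).sum()    # sum of x_i+ * x_+i
    kappa = (n * diag - marginals) / (n ** 2 - marginals)
    overall_accuracy = diag / n
    return kappa, overall_accuracy

table1 = [[127,   9,   0,  0,   2],
          [  0,  88,   1,  0,  32],
          [  6,   2, 176,  1,  17],
          [  0,   0,   3, 69,   0],
          [ 17,  91,  20,  0, 119]]
print(kappa_and_accuracy(table1))   # approximately (0.6715, 0.7423)
```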

3 Result Analysis In this paper, we have performed a comparative study of five algorithms: hybrid CS and PSO, hybrid CS and ACO, hybrid FPAB/BBO, biogeography-based classification, and hybrid PSO and Firefly. Their kappa coefficients are 0.9633, 0.9422, 0.6792, 0.6715, and 0.97, respectively, and their overall accuracies are 97.67, 95.55, 74.87, 74.23, and 97.15, respectively. Figures 2, 3, 4, 5 and 6 show the algorithms applied to the 7-band Alwar image and the respective classified images obtained. The black, green, yellow, red, and blue colors represent barren, vegetation, rocky, urban, and water, respectively. Figure 7 compares the kappa coefficients of the algorithms, and Fig. 8 compares their overall accuracy. From the comparative study, it is clear that the hybrid PSO and Firefly algorithm has the highest kappa coefficient, i.e., 0.97, whereas hybrid CS and PSO has the highest overall accuracy of 97.69.

Fig. 2 Using biogeography technique, the original Alwar image (on the left) and the classified image (on the right) are compared

Fig. 3 Using hybrid FPAB/BBO algorithm, the original Alwar image (on the left) and the classified image (on the right) are compared

Fig. 4 Using a hybrid CS and ACO technique, the original Alwar image (on the left) and the classified image (on the right) are compared

Fig. 5 Using a hybrid CS and PSO technique, compare the original Alwar image (on the left) and the classified image (on the right)

Fig. 6 Comparing the original image on the (left) and classified image on the (right) using hybrid PSO and Firefly for natural terrain feature extraction


Fig. 7 Kappa coefficient comparison of various algorithms


Fig. 8 Overall accuracy comparison of various algorithms

4 Conclusion and Future Scope This paper examined and contrasted the findings obtained from several nature-inspired methodologies, including the hybrid CS and PSO, hybrid CS and ACO, hybrid FPAB/BBO, biogeography-based classification, and hybrid PSO and Firefly algorithms. The kappa (KHAT) coefficient has been employed as a gauge of elicited knowledge; it also provides a method to assess the knowledge content of several supervised and mixed classification paradigms. Overall accuracy is calculated to compare the effectiveness of the algorithms. The research's future objectives include suggesting changes to the algorithms that would further enhance the kappa coefficient and also achieve better overall accuracy.

References
1. Simon D (2008) Biogeography-based optimization. IEEE Trans Evol Comput 12(6):702–713
2. Wang X, Yang J, Teng X, Xia W, Jensen R (2007) Feature selection based on rough sets and particle swarm optimization. Pattern Recognit Lett 28(4):459–471
3. Shi Y, Eberhart R (1998) A modified particle swarm optimizer. In: 1998 IEEE International conference on evolutionary computation proceedings. IEEE World congress on computational intelligence (Cat. No.98TH8360), pp 69–73
4. Kundra H, Sadawarti H (2015) Hybrid algorithm of CS and PSO for natural terrain feature. Res J Inform Technol
5. Kundra H, Sadawarti H (2013) Hybrid algorithm of CS and ACO for image classification of natural terrain features. Int J Adv Comput Sci Commun Eng (IJACSCE) 1(1)
6. Johal NK, Singh S, Kundra H (2010) A hybrid FPAB/BBO algorithm for satellite image classification. Int J Comput Appl 6(5):31–36
7. Panchal VK, Singh P, Kaur N, Kundra H (2009) Biogeography based satellite image classification. Int J Comput Sci Inf Secur 6(2):269–274
8. Deepam S, Kundra H (2016) Hybrid algorithm of particle swarm optimization and firefly for natural terrain feature extraction. Int J Comput Sci Inf Secur (IJCSIS) 14(12):752
9. Congalton RG (1991) A review of assessing the accuracy of classifications of remotely sensed data. Remote Sens Environ 37(1):35–46
10. Story M, Congalton RG (1986) Accuracy assessment: a user's perspective. Photogram Eng Remote Sens 52(3):397–399
11. Verbyla DL, Hammond TO (1995) Conservative bias in classification accuracy assessment due to pixel-by-pixel comparison of classified images with reference grids. Int J Remote Sens 16:581–587
12. Lillesand T, Kiefer RW, Chipman J (1979) Remote sensing and image interpretation, 5th edn. Wiley, pp 586–592

An Investigation on the Detection of Intrusions into a Network Using Convolutional Neural Networks N. D. Patel and Ajeet Singh

Abstract As attacks on the network environment have rapidly become more sophisticated and intelligent in recent years, the limitations of existing signature-based intrusion detection systems are becoming more evident. To solve this problem, machine learning research on constructing automated network intrusion detection systems (NIDS) is being actively conducted. NIDS model research using the convolutional neural network (CNN) algorithm is conducted in this study. For learning with an image-classification-based CNN algorithm, discretization of continuous attributes is added to the preprocessing step: the predictor variables are converted into easy-to-interpret data with a linear relationship and then arranged in a square matrix, and the pixel-image structure matched to that matrix is learned by the model. Network packet data (UNSW-NB15) is used to evaluate the performance of the model, and accuracy, precision, and recall are used as statistical performance indicators. The test results show that the model performs well in detecting specific attack categories. The proposed framework gives 97.05% accuracy for the DoS, exploits, fuzzers, generic, and reconnaissance attack classes. Keywords Intrusion detection system · Advanced persistent threat · UNSW-NB15 · CNN · Machine learning · Continuous variable discretizations

N. D. Patel (B) — Institute for Development and Research in Banking Technology (IDRBT), Hyderabad, TS (500057), India; e-mail: [email protected]
A. Singh (B) — SML2029 Research and Consulting Pvt Ltd, Banjara Hills, Hyderabad, TS (500034), India; e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_5

1 Introduction Recently, with the advent of the internet of things (IoT) and various wearable gadgets, Internet technology contributes to more timely information and business performance. However, as the Internet is exploited in multiple areas, the number of attack surfaces exposed to attacks is increasing, and attempts to break into networks to obtain undue profits such as personal information acquisition, counterfeiting, and cyber-terrorism are also growing [1]. As time goes by, attacking techniques tend to become more intelligent, and many researchers are studying various countermeasures in response. Firewalls and intrusion detection systems are representative countermeasures against intrusions that are becoming more intelligent for these Internet services. Firewalls serve to block suspected intrusions but cannot block all of them, so the role of intrusion detection systems, which monitor network activity by identifying attacks and alerting network administrators, is growing. Various intrusion detection system (IDS) approaches have emerged that use artificial intelligence techniques such as data mining (DM), machine learning (ML), deep learning (DL), statistical analysis, genetic algorithms (GA), artificial neural networks (ANNs), fuzzy logic, linear regression (LR), and swarm intelligence [2]. In this paper, we use the UNSW-NB15 dataset; the class imbalance between normal and abnormal data is adjusted during data preprocessing, and a CNN-based network IDS is proposed. Convolution neural network (CNN)-based IDS research is conducted to detect normal and malicious network traffic, and model learning and evaluation are carried out using the UNSW-NB15 dataset, which contains representative network traffic. The rest of the paper is organized as follows: Sect. 2 examines the existing studies related to ML-based intrusion detection, and Sect. 3 proposes the CNN-based NIDS. In Sect. 4, we present the experiment analysis and obtained results. Finally, in Sect. 5, conclusions and future research directions are described.

2 Related Research Janarthanan and Zargari [3] proposed a high-dimensional feature analysis on the NSL-KDD dataset that can support research in anomaly detection, as well as an improved UNSW-NB15 dataset. Khammassi and Krichen [4] used the wrapper method to select features, LR as the classifier, and a GA as the feature search strategy. Moustafa and Slay [5] proposed a model that can reduce the center points and the false alarm rate of feature values using an association rule mining algorithm. Patel et al. [6] proposed a novel attribute selection technique using the NSL-KDD dataset for IDS and obtained the highest accuracy for DoS-type attacks. Kamarudin et al. [7] proposed an anomaly-based IDS using an ensemble classification method, removing irrelevant or overlapping attributes in the feature selection procedure using a LogitBoost-based LR algorithm. Mwitondi and Zargari [8] proposed a data flow application method for search-based intrusion detection using clustering and cross-validation. Belouch et al. [9] presented a two-stage classifier based on the RepTree algorithm and protocol subsets. Guha [10] collected network data of a cyber system based on the multimodal artificial neural network (MANN) algorithm and proposed an unsupervised learning model using CNN, the elbow method, and the k-means clustering algorithm. Moustafa et al. [11] improved the performance of the decision engine by sniffing and collecting network data based on a Dirichlet mixture model and analyzing and filtering the data. CNN is a model used in fields such as image classification and face recognition; it was inspired by the architecture of the part of the human brain that processes the visual senses. The whole image is vectorized and then partially sampled through a filter to create a feature map; the weights are calculated through the feature map and transmitted to the output layer for learning [12]. In the study of Azizjon et al. [13], the class imbalance of the UNSW-NB15 dataset was addressed using a random oversampling method, and a 1D CNN achieved the highest accuracy of 91.2%. The related works discussed above are more prone to overfitting the model compared to deterministic nonlinear algorithms. Most of the currently available methods in the literature get stuck in a local minimum while searching for an optimal solution. Results are promising, but exact approaches are only able to handle a few hundred or thousand variables at most (so high-dimensional data cannot be handled by exact approaches).

3 Proposed CNN-Based NIDS In this study, in order to enhance the classification performance of learning models on the UNSW-NB15 dataset, we propose the architecture shown in Fig. 1. First, the data characteristic types of UNSW-NB15 are identified, and then preprocessing with a continuous-variable discretization algorithm [14] is added to the general preprocessing. Through this, the data is processed so that it has a linear relationship for the relational analysis of the predictor variables, and it is converted into an image pixel form using a square matrix so that it matches the input form of the CNN algorithm used in the experiments. The core strength of the chosen CNN is that it provides an efficient dense neural network that performs prediction more optimally; its core computational power lies in identifying uniquely distinct variables by itself without any external aid. We demonstrate the performance of the proposed model using accuracy, precision, and recall, which are commonly used with ML and DL models.

UNSW-NB15 Characterization
In the UNSW-NB15 dataset, network packets were collected using IXIA PerfectStorm in the Cyber Range Lab. It is a public dataset that classifies normal/abnormal data using the Argus and Bro-IDS tools. The UNSW-NB15 dataset used in this paper consists of 257,673 records, which are stored and distributed across files, with the following numbers of normal and abnormal records: normal (93,000), reconnaissance (13,987), backdoor (2329), DoS (16,353), exploits (44,525), analysis (2677), fuzzers (24,246), worms (174), shellcode (1511), and generic (58,871), classified into 1 normal and 9 attack types. The UNSW-NB15 dataset has a total of 45 features.

Data Preprocessing
Continuous and discrete data are accompanied by the problem of degrees of freedom (DOF), which complicates the training of the model by creating nonlinear correlations with the predictors.

Fig. 1 Proposed CNN-based NIDS architecture: the IDS dataset (UNSW-NB15/IoTID20) passes through preprocessing and normalization (one-hot encoding of categorical data, KBinsDiscretizer for discrete data, and min-max scaling of continuous data), feature combination, and network-traffic-to-image conversion (grayscale or color channel images), followed by the CNN learning model (convolution filters of 32 and 128 with kernel size 2, pooling of size 2, flatten, and a softmax output over the normal class and the nine attack classes), and finally the performance evaluation metrics (accuracy, precision, recall, and F1-score)

For example, the network packet size is the most important characteristic for discriminating DoS traffic from other network packets, but packet sizes of 10,000,000 and 100,000,000 convey the same information, namely that the packet size is large. When expressed in a discretized way, such as Packet_size = 1 if size > 10,000,000, else 0, the feature exhibits a linear connection with the predicted attribute and can therefore be transformed into data that is easy to interpret. Therefore, in this study, we propose a preprocessing procedure that adds a discretization algorithm for continuous and discrete properties to the usual ML preprocessing. The preprocessing of each data type is as follows:
1. Data is classified into binary, categorical, discrete, and continuous types, which are the characteristics of the UNSW-NB15 dataset.
2. All nominal data are encoded as categorical and then changed into one-hot vector representations [15].
3. Continuous data is normalized to a range of 0 to 255 through the min-max scaler [16].
4. Discrete data properties complete preprocessing through the KBinsDiscretizer algorithm [14].
5. Binary data consists of 0/1 values and is used without any further preprocessing.

Network Traffic to Image
The data that has completed the preprocessing process described above can be expressed as image color channels and is converted into a pixel-image form so that it can be processed by convolution operations. The input for training and validation was generated as two types of images corresponding to a square matrix. The first is an RGB color-channel image, in which the three colors are overlapped and matched to an M × N × 3 pixel arrangement; the other is a grayscale image, which corresponds to an M × N × 1 pixel array. Once the preprocessing is completed, the training and evaluation dataset is ready.

CNN Learning Model
CNNs are modified neural networks (NNs) that use convolution and aim to learn feature representations of data; they differ in the following ways from DNNs, the most basic NNs in DL. By generating different feature maps at each layer, a CNN can construct many convolution kernels. Because each region of adjacent neurons in a layer is connected to a neuron in the next layer's feature map, the kernel can be shared across all information areas. A CNN contains convolutional layers and pooling layers, and a fully connected layer is used to calculate outputs such as classifications and distances between properties, depending on the learning purpose. Pooling layers reduce computation costs and enlarge the receptive field by reducing the number of parameters coupled between convolutional layers. The design of this study uses a CNN with these characteristics as the performance analysis model for IDS data turned into images, as CNNs show high performance in image and signal processing fields. Figure 1 represents the proposed CNN-based architecture, with the architecture and parameters specified for each layer.

A grayscale or color image is used as the input of the model, and the output layer's softmax function produces both the normal label and the attack labels.
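The sketch below strings together the pieces described above (preprocessing steps 1-5, the traffic-to-image conversion, and a small CNN compiled with the hyperparameters stated in Sect. 4). It is an illustrative approximation, not the authors' code: the column lists, the number of discretization bins, and the exact layer stacking are assumptions, with only the filter counts (32 and 128), kernel size 2, pool size 2, softmax output, SGD optimizer, learning rate 2e-4, and sparse categorical cross-entropy taken from the paper.

```python
# Sketch: UNSW-NB15 preprocessing, traffic-to-image conversion, and a small CNN.
import numpy as np
import tensorflow as tf
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, MinMaxScaler, KBinsDiscretizer

def build_preprocessor(categorical_cols, discrete_cols, continuous_cols):
    # Column lists are assumed inputs (e.g., names of a pandas DataFrame's columns).
    return ColumnTransformer([
        ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_cols),   # step 2
        ("bins", KBinsDiscretizer(n_bins=10, encode="ordinal"), discrete_cols), # step 4
        ("scale", MinMaxScaler(feature_range=(0, 255)), continuous_cols),       # step 3
    ], remainder="passthrough")                                                  # step 5: binary as-is

def to_square_images(features):
    # Pad each feature vector to the next square length and reshape to M x M x 1
    # (the grayscale variant of the traffic-to-image conversion).
    if hasattr(features, "toarray"):          # densify sparse one-hot output if needed
        features = features.toarray()
    n, d = features.shape
    m = int(np.ceil(np.sqrt(d)))
    padded = np.zeros((n, m * m), dtype=np.float32)
    padded[:, :d] = features
    return padded.reshape(n, m, m, 1)

def build_cnn(input_shape, num_classes=10):
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 2, activation="relu", padding="same",
                               input_shape=input_shape),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(128, 2, activation="relu", padding="same"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=2e-4),
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                  metrics=["accuracy"])
    return model

# Training would then follow the stated schedule, e.g.:
# model.fit(x_train_images, y_train, epochs=100, batch_size=256,
#           validation_data=(x_val_images, y_val))
```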

4 Experiment Analysis and Obtained Results

Experiment Setup Details
The hardware test environment was a PC with an Intel(R) Core(TM) i7-4790 CPU @ 3.60 GHz, 32 GB RAM, and the Windows 10 Pro operating system. Experimental simulations were performed using TensorFlow, scikit-learn, seaborn, Pandas, NumPy, and Keras, which are among the most used ML frameworks [17]. Python was used as the programming language.

Learning Parameters (Hyperparameters)
The learning parameters (hyperparameters) used for constructing the learning model are Epochs = 100, Batch size = 256, Optimizer = stochastic gradient descent (SGD), Learning rate = 2e−4, and Loss = sparse categorical cross-entropy.

Performance Evaluation
Accuracy, precision, and recall are used as the statistical performance metrics for the ML and DL classification models; the formulas are presented in Table 1.

Experimental Results
The experimental results are presented in Table 2, which shows the accuracy, precision, and recall of the overall performance. In terms of intrusion detection, misclassifying normal network traffic as attack traffic corresponds to a false positive. Although false positives have a lower priority than misclassifying an attack as normal, the false-negative performance in intrusion detection is crucial. From an intrusion detection viewpoint, the recall performance for individual classes, i.e., the percentage of actual attacks correctly predicted by the model, is therefore examined together with the false negatives. The comparison of the proposed method with a significant existing work is given in Table 3.

Table 1 Statistical performance metrics

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Precision = TP / (TP + FP)

Recall = TP / (TP + FN)
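The Table 1 metrics can be computed per attack class directly from model predictions; the sketch below does this with scikit-learn's built-in implementations, where class_names would be the ten UNSW-NB15 labels (normal plus the nine attack types).

```python
# Sketch: per-class precision/recall and overall accuracy from predictions.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def per_class_report(y_true, y_pred, class_names):
    accuracy = accuracy_score(y_true, y_pred)
    precision, recall, _, _ = precision_recall_fscore_support(
        y_true, y_pred, labels=list(range(len(class_names))), zero_division=0)
    for name, p, r in zip(class_names, precision, recall):
        print(f"{name:15s} precision={p:.4f} recall={r:.4f}")
    print(f"overall accuracy={accuracy:.4f}")
```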

Table 2 Validation confusion matrix for UNSW-NB15 dataset (per-class summary)

Class            Correctly classified  Total   Accuracy  Precision
Analysis         610                   649     0.9090    0.9399
Backdoor         478                   495     0.8597    0.9656
DoS              3889                  4077    0.9657    0.9538
Exploits         10777                 11220   0.9792    0.9605
Fuzzers          4903                  5094    0.9753    0.9625
Generic          14445                 14475   0.9776    0.9979
Normal           23278                 23278   1         1
Reconnaissance   3358                  3567    0.9575    0.9414
Shellcode        296                   399     0.7493    0.7418
Worms            27                    31      0.6136    0.8709

Table 3 Comparative analysis of the UNSW-NB15 IDS dataset with existing methods

Method                 Accuracy (%)   Precision (%)   Recall (%)
Proposed method        ≈97.05         ≈95.56          ≈96.25
Azizjon et al. [13]    90.91          85.54           96.07

5 Conclusions and Future Research In this study, a CNN-based NIDS model was designed to detect normal and malicious network traffic. The designed model, together with a continuous-variable discretization algorithm, improves the classification performance for the predictors. The proposed method was applied to the various properties of UNSW-NB15 analyzed in previous studies, and the interpretability of the model with respect to the predictors was increased through the discretization algorithm. In addition, in order to train the CNN algorithm, network traffic was mapped to a square matrix and converted into an image pixel structure. The UNSW-NB15 dataset was used for training and validation. To evaluate the learned model, we used the statistical measures accuracy, precision, and recall, which show that the model performs particularly well in detecting certain attack categories.

References 1. Hajiheidari S, Wakil K, Badri M, Navimipour NJ (2019) Intrusion detection systems in the internet of things: a comprehensive investigation. Comput Netw 160:165–191 2. Soniya SS, Vigila SMC (2016) Intrusion detection system: classification and techniques. In: 2016 International conference on circuit, power and computing technologies (ICCPCT). IEEE, pp 1–7 3. Janarthanan T, Zargari S (2017) Feature selection in UNSW-NB15 and KDDCUP’99 datasets. In: 2017 IEEE 26th International symposium on industrial electronics (ISIE). IEEE, pp 1881– 1886 4. Khammassi C, Krichen S (2017) A GA-LR wrapper approach for feature selection in network intrusion detection. Comput Secur 70:255–277 5. Moustafa N, Slay J (2017) A hybrid feature selection for network intrusion detection systems: central points. arXiv preprint arXiv:1707.05505 6. Patel N, Mehtre B, Wankar R (2021) Novel attribute selection technique for an efficient intrusion detection system. Int J Inf Priv Secur Integrity 5(2):154–172 7. Kamarudin MH, Maple C, Watson T, Safa NS (2017) A logitboost-based algorithm for detecting known and unknown web attacks. IEEE Access 5:26190–26200 8. Mwitondi K, Zargari S (2017) A repeated sampling and clustering method for intrusion detection. In: International conference in data mining (DMIN’17), CSREA Press, pp 91–96 9. Belouch M, El Hadaj S, Idhammad M (2017) A two-stage classifier approach using reptree algorithm for network intrusion detection. Int J Adv Comput Sci Appl 8(6):389–394 10. Guha S (2016) Attack detection for cyber systems and probabilistic state estimation in partially observable cyber environments. Arizona State University 11. Moustafa N, Slay J, Creech G (2017) Novel geometric area analysis technique for anomaly detection using trapezoidal area estimation on large-scale networks. IEEE Trans Big Data 5(4):481–494


12. Rawat W, Wang Z (2017) Deep convolutional neural networks for image classification: a comprehensive review. Neural Comput 29(9):2352–2449 13. Azizjon M, Jumabek A, Kim W (2020) 1D CNN based network intrusion detection with normalization on imbalanced data. In: 2020 International conference on artificial intelligence in information and communication (ICAIIC). IEEE, pp 218–224 14. Liu H, Setiono R (1997) Feature selection via discretization. IEEE Trans Knowl Data Eng 9(4):642–645 15. Uysal AK, Murphey YL (2017) Sentiment classification: feature selection based approaches versus deep learning. In: 2017 IEEE International conference on computer and information technology (CIT). IEEE, pp 23–30 16. Peng H, Long F, Ding C (2005) Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 27(8):1226–1238 17. Gulli A, Pal S (2017) Deep learning with Keras. Packt Publishing Ltd

Hybrid Binary Whale Optimization Algorithm for Feature Selection Optimization Problem V. Ramya, E. Vinay Kumar, G. S. Gopika, and G. Manoj

Abstract In order to address feature selection (FS) optimization problems, a binary variant of the hybrid particle swarm optimization (PSO) and whale optimization algorithm (WOA) is suggested in this article. The original PSOWOA is a hybrid algorithm that takes advantage of the abilities of both PSO and WOA. Despite its high performance, the initial hybrid solution is designed for continuous search spaces, whereas feature selection is a binary task. Thus, a binary variant of the hybrid PSOWOA, named HBPSOWOA, is suggested to obtain a more significant reduction in dimension while preserving feature selection accuracy. The K-nearest neighbors (KNN) classifier is used to find the best possible solution. Benchmark databases from the UCI machine learning repository are used to evaluate the HBPSOWOA algorithm. The findings show that, across a variety of performance metrics (best features, run time, and accuracy), HBPSOWOA significantly exceeds the grey wolf optimizer (GWO), PSO, the sine cosine algorithm (SCA), and WOA. Keywords Classification · Feature selection · Hybrid binary particle swarm optimization whale optimization algorithm (HBPSOWOA) · Particle swarm optimization (PSO) · Whale optimization algorithm (WOA)

V. Ramya (B) Excel Engineering College, Komarapalayam, Tamilnadu, India e-mail: [email protected] E. V. Kumar · G. Manoj Guru Nanak Institute of Technology, Telangana, & Research Scholar, GITAM (Deemed to Be University, Andhra Pradesh), Visakhapatnam, Andhra Pradesh, India e-mail: [email protected] G. Manoj e-mail: [email protected] G. S. Gopika Sathyabama Institute of Science & Technology, Chennai, Tamilnadu, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_6


1 Introduction Data mining (DM) is seen as the most rapidly growing IT sector due to the extensive data captured daily and the need to turn those data into meaningful information [1]. DM includes multiple preprocessing processes (processing, filtering, integration, reduction, etc.), knowledge presentation, and pattern evaluation [2]. One of the critical preprocessing stages is FS, which eliminates irrelevant or inappropriate features of a particular dataset. In general, FS strategies are divided into two approaches, filter and wrapper strategies [3, 4]. The first class consists of methods that are unrelated to classifiers and operate on the data directly; these methods are usually based on correlations between variables. Wrapper selection methods, on the other hand, include classifiers and find associations between variables. As the analysis suggests, wrapper-based methods built on classification techniques are more robust than filter-based methods [5, 6]. Three key elements must usually be defined when using a wrapper-based method: the classifier (for instance, KNN or the support vector machine (SVM)), the feature subset evaluation measure, and the search process for the subset with the best features. It is difficult and computationally costly to find the optimum set of features. The latest metaheuristics for various optimization problems (for instance, feature selection, engineering design, data mining, and machine learning) tend to be efficient and reliable techniques [7, 8]. Two conflicting concepts need to be observed while using or modeling a metaheuristic: exploring the search space and exploiting the optimum solution identified [9, 10]. A good balance between exploitation and exploration increases the efficiency of the search algorithm. To reach a successful balance, one alternative is to use a hybrid strategy where two or several methods are combined in order to boost the efficiency of each method; the resulting hybrid method is called a memetic method. Therefore, a binary variant of the PSOWOA is proposed, since the original algorithm searches a continuous space, to improve the selection and identification of features. In view of the WOA shortcomings, this paper presents an effective algorithm called HBPSOWOA. In order to balance search capacity using a nonlinear convergence variable, the proposed algorithm incorporates a PSO technique to guarantee the diversity of the population and lower the risk of slipping into a local optimum [11–13]. This allows greater accuracy with a minimum feature subset to be selected. The test results reveal that the proposed method has the highest accuracy and efficiency and a greater dimension reduction ability compared to GWO, PSO, SCA, and WOA. In this article, a binary variant of the HBPSOWOA is proposed and used as a wrapper-based selection tool. The significant contribution is the use of appropriate operators to overcome binary challenges with PSOWOA.


2 Hybrid Particle Swarm and Whale Optimization Algorithm (HPSOWOA)

2.1 Whale Optimization Algorithm (WOA)

The whale optimization algorithm (WOA) is constructed as a swarm intelligence algorithm. The fascinating behaviour of the humpback whale is its unique hunting process, a strategy called bubble-net feeding. Humpback whale hunting has three stages: shrinking the hunting range, following a spiral path toward the prey, and random searching. In the traditional WOA, the best candidate solution found so far is assumed to lie near the target prey, and the remaining whales update their positions toward this best solution. Numerically, the WOA imitates this social behaviour as follows:

D_i = |B · Y'(t) − Y(t)|                                      (1)

Y(t + 1) = Y'(t) − A · D_i                                    (2)

where t denotes the current iteration, Y'(t) denotes the best position vector, Y(t) denotes the current position vector, and A and B denote the coefficient vectors given as follows:

A = 2 · a · r1 − a                                            (3)

B = 2 · r2                                                    (4)

a = 2 (1 − t / Max_t)                                         (5)

where r1 and r2 are random vectors that vary between [0, 1], Max_t is the maximum number of iterations, and the value of a decreases linearly from 2 to 0. The algorithm has two stages, exploration and exploitation. The exploitation stage is separated into two phases, the shrinking encircling technique and the spiral update, in which the position of the agent is updated using Eq. (6):

Y(t + 1) = Y'(t) − A · D_i,                   if p < 0.5
Y(t + 1) = Y'(t) − D_i · e^(bl) · cos(2πl),   if p ≥ 0.5       (6)

where l denotes a random value that varies between [−1, 1], the value of b is fixed, and p is a uniformly distributed random number. In addition to the above hunting strategies, whales can also search randomly, which maintains population diversity. In the exploration stage, the value of A is random in the range of [−1, 1] to allow the population to move far from the current location, and this is modeled as follows:

D_i = |B · Y_rand(t) − Y(t)|                                  (7)

Y(t + 1) = Y_rand(t) − A · D_i                                (8)
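The sketch below walks through one WOA position update built from Eqs. (1)-(8); the spiral constant b, the switch on |A| between encircling and random search, and the variable names are assumptions made for illustration rather than the paper's reference implementation.

```python
# Minimal sketch of one WOA position update (Eqs. 1-8); b and the |A| switch are assumed.
import numpy as np

def woa_step(Y, Y_best, t, max_t, b=1.0, rng=np.random.default_rng()):
    n, dim = Y.shape
    a = 2.0 * (1.0 - t / max_t)                         # Eq. (5)
    for i in range(n):
        r1, r2 = rng.random(dim), rng.random(dim)
        A = 2.0 * a * r1 - a                            # Eq. (3)
        B = 2.0 * r2                                    # Eq. (4)
        p, l = rng.random(), rng.uniform(-1.0, 1.0)
        if p < 0.5:
            if np.all(np.abs(A) < 1.0):                 # exploitation: encircle the best
                D = np.abs(B * Y_best - Y[i])           # Eq. (1)
                Y[i] = Y_best - A * D                   # Eq. (2)
            else:                                       # exploration: follow a random whale
                Y_rand = Y[rng.integers(n)]
                D = np.abs(B * Y_rand - Y[i])           # Eq. (7)
                Y[i] = Y_rand - A * D                   # Eq. (8)
        else:                                           # spiral update around the best
            D = np.abs(Y_best - Y[i])
            Y[i] = Y_best - D * np.exp(b * l) * np.cos(2 * np.pi * l)   # Eq. (6)
    return Y
```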

2.2 Particle Swarm Optimization (PSO)

PSO was introduced to solve many engineering optimization problems. As a swarm intelligence method, it imitates the behaviour of bird swarms in nature. Each particle in PSO is defined by a velocity vector and a position vector. Each particle keeps track of the best solution it has found so far (pbest), and the swarm keeps track of the best solution found by any particle so far (gbest). The velocity and position vectors are updated with the following formulas:

V_i^(t+1) = w · V_i^t + c1 r1 · (pbest_i^t − Y_i^t) + c2 r2 · (gbest^t − Y_i^t)    (9)

Y_i^(t+1) = Y_i^t + V_i^(t+1)                                                      (10)

2.3 Hybrid PSOGWO

The hybrid PSOWOA methodology was originally suggested for other engineering optimization problems. The core principle of PSOWOA is to combine the exploitation ability of PSO with the exploration ability of WOA in order to achieve the strengths of both optimizers. In the HPSOWOA, the search agent's location is not modified by the regular mathematical expressions alone; instead, an inertia term is monitored continuously to control the exploration and exploitation of the humpback whales. The individual population members track the globally best solution during the WOA updating process. The WOA is utilized to update the search agents and guide the population toward the best solution in the search space in every iteration. Each population member is considered to be a particle in PSO. The PSO technique improves the current population and prevents local optimum traps. Eventually, the fitness of each individual is computed. If no better solution is found, the population is not updated; otherwise, the population is updated as follows:

V_i^(t+1) = w · V_i^t + c1 r1 · (Y'(t) − Y_i^t) + c2 r2 · (Y'(t) − Y_(i,g)^t)    (11)

Y_i^(t+1) = Y_i^t + V_i^(t+1)                                                    (12)

where w is the weight factor, c1 and c2 are the inertia constants, r1 and r2 are random numbers between [0, 1], and Y_(i,g)^t denotes the best individual solution.
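A minimal sketch of the hybrid velocity and position update of Eqs. (11)-(12) is shown below; the particle is pulled toward the WOA leader Y_best relative to its current position and to its own best solution Y_pbest. The names, default constants, and array layout are assumptions for illustration.

```python
# Sketch of the hybrid PSOWOA update of Eqs. (11)-(12); names and constants are illustrative.
import numpy as np

def hybrid_update(Y, V, Y_best, Y_pbest, w=0.5, c1=0.5, c2=0.5,
                  rng=np.random.default_rng()):
    r1 = rng.random(Y.shape)
    r2 = rng.random(Y.shape)
    # Eq. (11): both attraction terms point toward the WOA leader Y_best
    V = w * V + c1 * r1 * (Y_best - Y) + c2 * r2 * (Y_best - Y_pbest)
    # Eq. (12): move each particle by its new velocity
    return Y + V, V
```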

2.4 Proposed Binary Variant of PSOWOA (HBPSOWOA)

Feature selection is by default a binary task; therefore, the method explained in Sect. 2 cannot be used without adjustments to obtain the solution, and a binary variant must be built to resolve the feature selection issue. In the original hybrid algorithm, the population can move continuously around the solution space, as its position vector has a real, continuous domain. For agents operating in a binary space, the location update can be performed using the following equation:

Y_dim^(t+1) = 1, if sigmoid(Y(t + 1)) ≥ rand; 0, otherwise        (13)

where Y_dim^(t+1) denotes the binary position update at iteration t in dimension dim, rand is a random number that varies uniformly between [0, 1], and the sigmoid expression is given as follows [30]:

sigmoid(x) = 1 / (1 + e^(−10(x − 0.5)))                           (14)

Equation (6) is updated as follows:

Y_i^dim = 1, if (Y_i^dim + bstep^dim) ≥ 1; 0, otherwise           (15)

where bstep^dim is the binary step in dimension dim, and it can be mathematically represented as follows:

bstep^dim = 1, if cstep^dim ≥ rand; 0, otherwise                  (16)

where cstep^dim is the continuous value in dimension dim, and it can be mathematically represented as follows:

cstep^dim = 1 / (1 + e^(−10(A^dim · D_i^dim − 0.5)))              (17)
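A short sketch of this sigmoid-based binarization follows, implementing Eqs. (13)-(17) directly; the function names and the vectorized layout are assumptions for illustration.

```python
# Sketch of the sigmoid transfer and binarization of Eqs. (13)-(17).
import numpy as np

def transfer(x):
    # Eqs. (14)/(17): steep sigmoid centred at 0.5
    return 1.0 / (1.0 + np.exp(-10.0 * (x - 0.5)))

def binarize_position(Y_cont, rng=np.random.default_rng()):
    # Eq. (13): keep a dimension when the transferred value beats a uniform draw
    return (transfer(Y_cont) >= rng.random(Y_cont.shape)).astype(int)

def binary_step(Y_bin, A, D, rng=np.random.default_rng()):
    cstep = transfer(A * D)                      # Eq. (17)
    bstep = cstep >= rng.random(cstep.shape)     # Eq. (16)
    return ((Y_bin + bstep) >= 1).astype(int)    # Eq. (15)
```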


Fig. 1 Flowchart of the proposed HBPSOWOA

Considering the optimal solutions updated in Eq. (15) in HPSOWOA, exploitation and exploration are managed by a mathematically based inertia constant [31]:

D_i = |B · Y'(t) − w · Y(t)|                                                        (18)

Accordingly, the velocity and position vectors are updated as follows:

V_i^(t+1) = w · V_i^t + c1 r1 · (Y_i^dim − Y_i^t) + c2 r2 · (Y_i^dim − Y_(i,g)^t)    (19)

Y_i^(t+1) = Y_i^t + V_i^(t+1)                                                        (20)

The pseudocode of the proposed HBPSOWOA is presented in the Algorithm below, and the flowchart is shown in Fig. 1.

Algorithm: Pseudocode of HBPSOWOA
  Initialize w, A, a, and B
  Initialize the search agents of N_p whale positions randomly between [0, 1]
  Find the solution for the given fitness function
  Find the search agent fitness by Eq. (18)
  While (t < Max_t)
    For every search agent
      Update the particle velocity using Eq. (19)
      Update the position of the search agent into a binary value using Eq. (20)
    End for
    Update w, a, A, and B
    Compute the fitness of each particle with the given fitness function
    Update the population position and set t = t + 1
  End while

The solution is described as a one-dimensional vector in this analysis. The vector length corresponds to the number of features, and the values 0 and 1 have the following meaning in the feature selection problem:

• 0 indicates that the feature is not selected.
• 1 indicates that the feature is selected.

The feature selection problem is by nature a two-objective problem: one target is to select the fewest features, and the other is to optimize the classification accuracy. Equation (21) is used as the fitness function to combine both objectives, taking the KNN classifier into consideration:

F = α E_R(D) + β |R| / |C|                                           (21)

where F denotes the objective function, |R| is the length of the selected feature subset, |C| is the total number of features, E_R(D) refers to the classification error rate, α and β weight the error rate and the subset length, the classification weight is calculated as β = 1 − α, and α ∈ [0, 1]. The KNN classifier is used in this paper to ensure that the selected features are the most significant.
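A sketch of this wrapper fitness is given below, using a KNN classifier with K = 5 and α = 0.99, β = 0.01 as listed later in Table 2; the cross-validated error estimate is an assumption made for a self-contained example, since the paper itself uses a train/test/validation split.

```python
# Sketch of the wrapper fitness of Eq. (21): KNN error rate plus a subset-size penalty.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def fitness(mask, X, y, alpha=0.99, beta=0.01):
    if mask.sum() == 0:
        return 1.0                                   # an empty subset is the worst solution
    X_sub = X[:, mask.astype(bool)]                  # keep only the selected features
    knn = KNeighborsClassifier(n_neighbors=5)
    acc = cross_val_score(knn, X_sub, y, cv=5).mean()
    error_rate = 1.0 - acc                           # E_R(D)
    return alpha * error_rate + beta * mask.sum() / X.shape[1]   # Eq. (21)
```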

3 Results and Discussions In order to evaluate the proposed binary algorithm, the HBPSOWOA is checked against five datasets obtained from the UCI machine learning repository, as presented in Table 1. A group of other algorithms, PSO, GWO, WOA, and SCA, is compared with the HBPSOWOA. The control parameters of the various algorithms are listed in Table 2. The databases were chosen to represent different types of problems in various situations. The entities in each database are split randomly into three separate subsets, namely training, testing, and validation. In the laboratory tests, KNN is used, and the best value of K is equal to five. During the learning process, each whale position generates one feature subset. The test dataset is used throughout the optimization procedure to review and validate the KNN classifier's output on the validation subset. Then, the HPSOWOA is applied to direct the feature selection process at the same time. In this study, each dataset is subdivided with cross-validation: K − 1 folds are used in K-fold cross-validation for training and testing,


Table 1 Datasets utilized for the validation

S. no   Dataset     Number of samples   Number of features
1       Colon       62                  2000
2       ORL         400                 1024
3       Yale        165                 1024
4       Lung        203                 3312
5       Lymphoma    96                  4026

Table 2 Control parameters of all algorithms

S. no   Parameter                               Value
1       Search agents, N_p                      10
2       Maximum number of iterations, Max_t     100
3       Dimension, dim                          Based on the features
4       Inertia constants, c1 and c2            0.5
5       Argument, α                             0.99
6       Argument, β                             0.01
7       Total number of runs                    10

and the remaining fold is used for validation. The suggested solution is repeated X times; the solution proposed in this paper is therefore evaluated K × X times for each database. The training and testing statistics are also calculated. Table 3 lists the outcomes of the HBPSOWOA for the feature selection (FS) problem after 10 individual runs of the algorithm. For the first four datasets, the average accuracy of the HBPSOWOA is higher than 94%, and the proposed HBPSOWOA also achieves the best fitness function value for all datasets except the ORL dataset. In addition, the number of selected features is also lower for all datasets except the Colon and Yale datasets. The computation time of the proposed HBPSOWOA is also lower for all selected datasets.

Table 3 Results obtained by the proposed Hybrid Binary Particle Swarm Optimization and Whale Optimization Algorithm

S. no   Dataset     Average accuracy   Best fitness   Selected features   Computation time (seconds)
1       Lymphoma    94.7368            0.05216        22                  7.12
2       Colon       100                1.521E-05      120                 6.11
3       Lung        100                0.000129       43                  8.14
4       ORL         96.25              0.03919        211                 5.25
5       Yale        87.8788            0.1211         108                 4.69


The performance of the proposed hybrid binary particle swarm optimization and whale optimization algorithm is compared with the particle swarm optimizer (PSO), grey wolf optimizer (GWO), whale optimization algorithm (WOA), and sine cosine algorithm (SCA). Table 4 shows that the average accuracy of the HBPSOWOA is better than that of all other algorithms for all datasets except ORL. The bold values in all the tables indicate the best result. For better visibility, a bar graph of the classification accuracy is illustrated in Fig. 2. The average numbers of selected features obtained by the proposed HBPSOWOA and the other techniques are listed in Table 5. The proposed HBPSOWOA gives better outcomes than PSO, GWO, WOA, and SCA on all datasets except Colon and Yale. Examining the outputs in Tables 4 and 5, a significant difference is identified in the selected features and the accuracy, and it is also observed that the proposed HBPSOWOA performs better in choosing fewer features with good classification accuracy. For better visibility, a bar graph of the selected features is illustrated in Fig. 3. Tables 6, 7, and 8 list the statistical measures for all the datasets, such as the best fitness, worst fitness, and standard deviation (SD) obtained during multiple runs of all algorithms.

Table 4 Average classification accuracy of all algorithms

S. no   Dataset     PSO       GWO       WOA       SCA       HPSOWOA
1       Lymphoma    89.4737   94.7368   94.7368   89.4737   94.7368
2       Colon       91.6667   83.3333   83.3333   100       100
3       Lung        100       100       97.5      100       100
4       ORL         91.25     98.75     91.25     93.75     96.25
5       Yale        60.6061   75.7576   78.7879   78.7879   87.8788

Fig. 2 Average classification accuracy of all datasets

Table 5 Average selected features of all algorithms

S. no   Dataset     PSO    GWO   WOA    SCA   HPSOWOA
1       Lymphoma    1811   337   64     27    22
2       Colon       826    208   177    35    120
3       Lung        1523   384   1536   87    43
4       ORL         459    215   534    223   211
5       Yale        470    172   388    30    108

In terms of all the statistical data, it is confirmed that the proposed HBPSOWOA performs better. In addition to the statistical data, the convergence curves of all the selected algorithms are provided for all datasets for a better understanding. It is clear from the discussion that the proposed algorithm is superior in handling the feature selection problem.

Fig. 3 Average selected features of all datasets

Table 6 Best fitness function of all algorithms

S. no   Dataset     PSO       GWO        WOA       SCA        HPSOWOA
1       Lymphoma    0.1087    0.05294    0.05226   0.1043     0.05216
2       Colon       0.08663   0.166      0.165     0.000175   1.521e-05
3       Lung        0.00458   0.001159   0.02498   0.000131   0.000129
4       ORL         0.0911    0.01438    0.09184   0.06038    0.03919
5       Yale        0.3946    0.2417     0.2138    0.2103     0.1211


Table 7 Worst fitness function of all algorithms

S. no   Dataset     PSO       GWO       WOA       SCA       HPSOWOA
1       Lymphoma    0.1091    0.1612    0.1091    0.1611    0.1091
2       Colon       0.0873    0.3348    0.4999    0.2524    0.0872
3       Lung        0.02984   0.02987   0.02972   0.0297    0.0544
4       ORL         0.1164    0.05463   0.1039    0.09189   0.1288
5       Yale        0.4848    0.3649    0.365     0.3049    0.3648

Table 8 SD of fitness function of all algorithms

S. no   Dataset     PSO        GWO        WOA       SCA       HPSOWOA
1       Lymphoma    6.23e-05   0.0359     0.02392   0.02072   0.02475
2       Colon       0.00017    0.0503     0.04453   0.0725    0.01823
3       Lung        0.002514   0.01215    0.00146   0.01019   0.01493
4       ORL         0.00786    0.011122   0.00122   0.00878   0.01857
5       Yale        0.01956    0.04096    0.0456    0.02254   0.05239

4 Conclusion and Future Scope This paper suggests HBPSOWOA, which utilizes a nonlinear convergence factor for searching and incorporates the PSO approach in the update stage. The HBPSOWOA can balance the feature dimensions and accuracy, enhance the update process of the conventional method, and therefore determine the optimal feature subset while ensuring the classification accuracy. The validation utilizes five benchmark test datasets from the UCI repository to verify various facets against other competitive algorithms in order to assess the efficiency of the proposed HBPSOWOA. The findings demonstrate that, compared to all the selected competitive algorithms, the proposed HBPSOWOA obtained better results. Furthermore, the results show that the HBPSOWOA requires less computation time and produces a minimum feature subset with high classification accuracy. The proposed HBPSOWOA can also be applied to various other optimization problems.

References 1. Neggaz N, Houssein EH, Hussain K (2020) An efficient henry gas solubility optimization for feature selection. Expert Syst Appl 152:113364 2. Yi JH, Deb S, Dong J, Alavi AH, Wang GG (2018) An improved NSGA-III algorithm with adaptive mutation operator for big data optimization problems. Future Gener Comput Syst 88:571–585 3. Sayed SAF, Nabil E, Badr A (2016) A binary clonal flower pollination algorithm for feature selection. Pattern Recogn Lett 7:21–27


4. Liu H, Yu L (2005) Toward integrating feature selection algorithms for classification and clustering. IEEE Trans Knowl Data Eng 17:491–502 5. Khalid S, Khalil T, Nasreen S (2014) A survey of feature selection and feature extraction techniques in machine learning. In: Proceedings of the science and information conference, London, UK, pp 372–378 6. Kumar KS, Suganthi N, Muppidi S, Kumar BS (2022) FSPBO-DQN: SeGAN based segmentation and fractional student psychology optimization enabled deep Q network for skin cancer detection in IoT applications. Artif Intell Med 139:102299. ISSN 0933–3657 7. Li F, Achyut S, Kumar BS (2021) Fog-internet of things-assisted multi-sensor intelligent monitoring model to analyse the physical health condition. Technol Health Care 29(6):1319–1337 8. Jia L, Kumar BS, Parthasarathy R (2022) Research and application of artificial intelligence based integrated teaching-learning modular approach in colleges and universities. J Interconnection Networks 22(Suppl 2):2143006. https://doi.org/10.1142/S0219265921430064 9. Kumar BS, Karthik S, Arunachalam VP (2018) Upkeeping secrecy in information extraction using ‘k’ division graph based postulates. Cluster Comput 22(Suppl 1):57–63. SpringerLink. ISSN 1386–7857. https://doi.org/10.1007/s10586-018-1705-2, pp 1–7 10. Ganeshan R, Muppidi S, Thirupurasundari DR, Kumar BS (2022) Autoregressive-elephant herding optimization based generative adversarial network for copy-move forgery detection with interval type-2 fuzzy clustering. Signal Process Image Commun 108:116756. ISSN 09235965, pp 1–10 11. Zhang Y, Qin G, Cheng L, Marimuthu K, Kumar BS (2021) Interactive smart educational system using AI for students in the higher education platform. J Mult Valued Logic Soft Comput 36:83–98 12. Kumar BS, Ranjitham PK, Karthekk KR, Gokila J (2016) Survey on various small file handling strategies on Hadoop. In: 2016 International conference on communication and electronics systems (ICCES). IEEE, pp 1–4 13. Prabu MK, Kumar BS, Karthik S (2015) Optimized scheduling for data anonymization in cloud using top down specialization. Int J Appl Eng Res IJAER 10(41):30546–30549

Transfer Learning for Arabic Question-Answering Abdullah M. Baqasah

Abstract Intent classification aims at determining customers' future actions based on the produced text or the language used. This process differs from topic classification, which studies subject matter, and from subjective text classification such as sentiment and emotion classification, which deal with the present state of affairs. This paper focuses on the problem of Arabic question-answering by applying several machine learning (ML) methods, which are compared on an open-sourced academic dataset. However, small-scale human-labeled training data represent the major weakness of the Arabic question-answering dataset: they reduce the generalization capability, particularly when analyzing rare words. In recent years, a novel language representation model, called Arabic bidirectional encoder representations from transformers (AraBERT), has been introduced to facilitate the pre-training of deep bidirectional representations on large-scale unlabeled corpora. Subsequently, novel methods have been suggested and applied to various natural language processing tasks after simple fine-tuning. Experiments show that the support vector machine (SVM) models outperform the other models, are more efficient, and can be integrated into a basic prototype responder. Keywords Intent classification · Support vector machine · AraBERT · Machine learning · Long short-term memory

1 Introduction Question-answering (QA) system provides adequate answers to questions written/expressed in natural languages (French, English, etc.). QA is, for instance, a critical natural language processing (NLP) problem and a long-standing artificial intelligence milestone. QA systems make it possible for the user to express a question in the form of natural language and get a direct and brief response. QA systems are A. M. Baqasah (B) Department of Information Technology, College of Computers and Information Technology, Taif University, Taif, Saudi Arabia e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_7


now found in search engines and phone conversational interfaces, and they are fairly good at answering simple requests for snippets of information. In fact, closed domains are not practical, because language is characterized by high dynamicity and people have an unlimited number of ways to pose a question. Subsequently, the information retrieval-based QA system gave rise to a new search experience: it treats factoid questions, which are answered by simple facts, as search queries in order to extract the most pertinent responses. For instance, given the following context:

We ask the question:

We expect the QA system to respond with something like the following:

Deep learning (DL) architectures and algorithms have recently witnessed important progress in several domains such as sentiment analysis, image recognition, and speech processing [1]. In fact, since 2017, transformer models have been shown to outperform existing approaches for this task. Currently, many pre-trained transformer models exist, including BERT [2], GPT-2 [3], and XLNet [4]. BERT, standing for bidirectional encoder representations from transformers, is an NLP algorithm released by Google that has demonstrated state-of-the-art results for eleven NLP tasks, as it allows the system to learn from input text in a bidirectional way instead of analyzing sentences from left-to-right or right-to-left. GPT-2 is a unidirectional transformer-based language model that generates a contextualized representation of each token by focusing on various parts of the input sentence. XLNet is a generalized autoregressive (AR) pre-training method that uses a permutation language modeling objective to combine the advantages of AR and autoencoding (AE) methods. Our idea in this paper is to use AraBERT, a pre-trained BERT for Arabic language understanding developed by [5], to construct a system of open-domain factual Arabic question-answering. We fine-tune AraBERT for the task of QA and use it for inference. The main objective of the present research work consists in proving whether good findings for Arabic QA can be obtained by applying recently emerged transformers combined with word-embedding input representations. We conducted experiments on the Arabic Reading Comprehension Dataset (ARCD), containing 1,395 questions, and a machine-translated Stanford Question Answering Dataset (Arabic-SQuAD) including


48,344 questions. Simulation findings demonstrate that the introduced model provided better accuracy, compared to the existing ones. We organize this paper as follows: Sect. 2 presents the related work. Section 3 describes the developed approach. In Sect. 4, we explain the experimental setup in detail and discuss the obtained results. Finally, Sect. 5 provides a conclusion and future work.

2 Literature Survey Question-answering includes many activities, such as yes/no responses, true–false questions, and the presentation of complex results obtained from various data sources. In fact, QA systems were designed according to various assumptions related to the specific task the system must perform as well as to the resources available to it. This section presents some existing approaches to question-answering. Due to the development of the field and the importance that detecting the intended question requirements in a natural way has gained over the last few years, QA systems have been applied to analyze linguistically asked questions. Among these systems, we can mention MASQUE [6], which first represents the queries logically and subsequently converts them into a database query that is later employed in information retrieval; MASQUE separates the mapping and linguistic procedures. Besides, [7] utilized semantic means and statistical tools in the mapping process, which were also applied by [8] to classify queries and specify the answer types. After that, QA systems focused more on the open domain. QA systems and question-answering became popular [9, 10] and widely employed to provide short, detailed, and question-specific answers [11]. After being launched in 1999 by the TREC evaluation campaign, open domain QA became a yearly program [12]. Later on, it confronted many challenges related to open domain questions, for example the assessment of their problem sets as well as the rise in volume and complexity, and it also provided datasets for answer generation. The huge amount of data available on the web makes the web itself a data source [13]. QA systems can also be classified into open-domain and closed-domain systems [14]. They are essentially used to answer factoid questions by applying various techniques (e.g., snippet tolerance and word match [15]). The outputs of factoid QA systems can be texts, XML, or Wiki documents [16]. The knowledge bases of the systems introduced in [17–19] are generally extended by leveraging the huge amount of data on the web to retrieve answers to queries employing linguistic methods and rule-based techniques. Apart from the previously mentioned sources, the semantic web is also used by QA systems [20]. Moreover, a template-based technique was applied in [21] on resource description framework (RDF) data relying on SPARQL. In addition, phrase-to-concept mapping was employed with a lexicon trained from a paraphrase corpus built from WikiAnswers [22]. Besides, the authors in [23]


designed a question-answering prototype based on syntactic analysis. The semantic web interface using patterns (SWIP) was introduced in [24] to create a pivot query, forming a hybrid solution that combines the formal SPARQL target query and the natural language question. A phrase-level dependency graph was employed in [25] to define the question structure; subsequently, a database was utilized in order to produce an instance of the generated pattern. In [26], the authors suggested a multi-staged open domain QA framework taking advantage of the available datasets and the web. In the domain of deep learning models for processing Arabic text, [27] applies a deep learning model that exploits a gated recurrent unit network (GRUN) to detect the mother tongue of learners of the Arabic language, and [28] adopts deep learning models using convolutional neural network (CNN) and long short-term memory (LSTM) methods to detect age and gender from Twitter feeds. In a QA system, answer processing is the most important step. It applies extraction methods to the results provided by the document processing module in order to give an answer, which necessitates integrating data extracted from various sources, summarization, and solving uncertainty or contradiction issues. QA systems can be classified not only according to their structures, but also based on the paradigm they implement:

1. Information Retrieval (IR) QA: In this system, search engines are used for answer retrieval, while filters and ranking are applied to the recovered passages.
2. NLP QA: In this system, linguistic intuitions and ML techniques are used for the extraction of answers from the retrieved snippets.
3. Knowledge Base (KB) QA: Such a system searches for answers in a structured data source or knowledge base rather than an unstructured one. In fact, standard database queries are utilized to substitute word-based searches. This paradigm considers structured information such as an ontology, which depicts the conceptual representation of concepts and shows how they are related in a particular domain; it is a knowledge base with a more sophisticated form than a relational database. Structured languages, such as SPARQL, were introduced for the execution of queries and the retrieval of knowledge from this base.
4. Hybrid QA: Efficient QA systems employ several kinds of resources, particularly due to the wide use of modern search engines and the increasing community-contributed knowledge available on the web. In the hybrid approach, IR QA, NLP QA, and KB QA are combined. IBM Watson [29] is a well-known instance of this paradigm.

3 The Developed Approach In this section, we describe the developed system of reading comprehension used in open domain question-answering. It mainly focuses on “machine reading at scale” (MRS). The suggested system is employed to search for an answer to a question


in a huge corpus containing unstructured and non-redundant documents. It combines the challenges of document retrieval, used to find the relevant documents, and those of machine comprehension of text, used to determine the answers from those documents. Its objective is to simplify the input sequence and maintain the most important information from which the answer can be deduced. An LSTM bidirectional recurrent neural network (RNN) architecture was, therefore, combined with word embedding techniques in the performed simulations. RNN and LSTM networks used to be the main models applied in the NLP field until the advent of pre-trained language models and transformer models. These new models have caused a revolution in this field by dropping the recurrent part and keeping only attention mechanisms. They have been widely adopted and have achieved state-of-the-art results for many complex and difficult NLP tasks. One of the leading pre-trained language models is BERT (introduced in Sect. 1). BERT is pre-trained on large corpora using two unsupervised tasks: masked language modeling (MLM) and next sentence prediction (NSP). MLM is a fill-in-the-blank task used to train the BERT model in order to fine-tune it later. The main idea of MLM is to mask one or more tokens; the model then predicts the most likely substitution for each according to the context of the words surrounding the masked word. NSP allows the BERT model to learn longer-term dependencies across sentences; more precisely, it allows the BERT model to predict whether the second sentence in a pair is the sentence that follows the first in the original document. Our second challenge consists in designing a system based on AraBERT, which uses the BERT model. BERT has been used by many previous researchers for several NLP tasks in different languages. In this study, the BERT-base configuration with six encoder blocks, 192 hidden dimensions, three attention heads, and a maximum sequence length of 128 was employed, as illustrated in Fig. 1. Indeed, the BERT model architecture is a multi-layer bidirectional transformer encoder that utilizes the original transformer model. Its input representation consists of segment embeddings and the concatenation of word embeddings [30]. In single-sentence classification and tagging tasks, there is no discrimination in the segment embedding. In fact, a special classification embedding ([CLS]) and a special token ([SEP]) are inserted as the first token and the final token, respectively. Considering the input token sequence x = (x_1, . . . , x_T), the output of BERT is obtained as H = (h_1, . . . , h_T). The transformers library built by Hugging Face,1 which is an extremely useful implementation of the transformer models in both TensorFlow and PyTorch, was used. The input is the question tokens and the paragraph tokens divided by the special token [SEP]. T_i is the final hidden vector of AraBERT. The new parameters learned during fine-tuning are a start vector S and an end vector E, and the probability that word i is the start/end of the answer span is calculated as a dot product between T_i and S or E, followed by a Softmax.

1 https://huggingface.co/docs/transformers.


Fig. 1 Overall architecture of the developed system

In the performed analysis, tenfold cross-validation and leave-one-out techniques were utilized in order to evaluate the developed model's performance.

Long Short-Term Memory Bidirectional RNN Model. The LSTM model, used here within a bidirectional recurrent neural network (BRNN), can be defined as an enhanced version of the RNN structure. It was first presented by Hochreiter et al. [11], who substituted the conventional neuron with a memory cell supervised by the forget, input, and output gates in order to solve the vanishing gradient problem of traditional RNNs. This model may be trained in the positive and negative time directions, so a BRNN can be trained without the constraint of utilizing the input information only up to a predetermined future frame. LSTM networks introduce a novel architecture, named a memory cell, containing two memory blocks and an output layer. The LSTM cell calculates its internal state based on the iterative process presented below; when there are multiple blocks, the computations are iterated for each block.

i_t = σ(W_hi h_(t−1) + W_xi x_t + W_ci c_(t−1) + b_i)                 (1)

f_t = σ(W_hf h_(t−1) + W_xf x_t + W_cf c_(t−1) + b_f)                 (2)

o_t = σ(W_ho h_(t−1) + W_xo x_t + W_co c_t + b_o)                     (3)

c_t = f_t ⊙ c_(t−1) + i_t ⊙ tanh(W_xc x_t + W_hc h_(t−1) + b_c)       (4)

h_t = o_t ⊙ tanh(c_t)                                                 (5)

where i_t is the input gate, which determines the quantity of new data transferred into the memory cell; f_t is the forget gate, which deletes data from the memory cell; o_t is the output gate, which determines the amount of data presented in the following step; c_t is the memory cell state; σ is the sigmoid function; h_t is the final output; ⊙ is the element-wise vector product; W are the parameter matrices with their various subscripts; and b are the bias vectors.

Support Vector Machines (SVM). The SVM searches for a hyperplane dividing the data into two different categories with the maximal margin separating the closest samples in each class.

Naive Bayes (NB). NB is a probabilistic technique intensively used to classify texts thanks to its high efficiency and relative simplicity. It builds the conditional probability distributions of the primary characteristics given a class label from only the training data; unseen data are then classified by comparing the class likelihoods.

Decision Tree (DT). A decision tree classifier constructs a hierarchical tree of the training instances in which conditions on feature values are used to partition the data hierarchically. It is worth noting that ensemble methods employ several decision tree learners to improve predictive results.

K-Nearest Neighbor (KNN). In text classification, the KNN algorithm classifies data by finding the K closest matches in the training data and then predicting using the labels of the closest matches. It is a proximity-based classifier that applies distance-based measures.
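A minimal sketch of how the classical classifiers described above can be compared is given below; the TF-IDF features and the 5-fold cross-validation are assumptions made for a self-contained example and do not reproduce the authors' exact pipeline.

```python
# Sketch comparing the classical classifiers described above on a text classification task.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

def compare(texts, labels):
    models = {
        "SVM": LinearSVC(),
        "NB": MultinomialNB(),
        "DT": DecisionTreeClassifier(),
        "KNN": KNeighborsClassifier(n_neighbors=5),
    }
    for name, clf in models.items():
        pipe = make_pipeline(TfidfVectorizer(), clf)          # TF-IDF features (assumed)
        score = cross_val_score(pipe, texts, labels, cv=5).mean()
        print(f"{name}: mean accuracy = {score:.3f}")
```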


4 Experimental Setup The data collection process is applied to facilitate QA dialogs between crowd workers. The Arabic Reading Comprehension Dataset (ARCD),2 containing 1,395 questions asked by crowd workers on Wikipedia articles, and a machine translation of the Stanford Question Answering Dataset (Arabic-SQuAD), including 48,344 questions, are used. The experiments were performed on a train/dev/test split of question/answer pairs; in this split, no section was shared among the various folds. In the training set, questions had a single reference answer, whereas dev and test questions had five references each. The process of finding a solution to the problem was divided into two distinct steps. First, a basic model was created and used to classify only the major class. Subsequently, a second model was utilized for the categorization of the second class. To attain the first objective, the whole input sequence of the network was mapped into a vector with a predetermined, constant dimension equal to the number of major categories. Each word forming the question (taken in temporal order and after being mapped through the embedding layer) is fed to the LSTM, whose final state is mapped via a classical neural network with Softmax activation into one of the six most important prediction classes.

4.1 Fine-Tuning AraBERT This model selects a span of text containing the answer by predicting a "start" token and an "end" token, on the condition that the latter appears after the former. In the training phase, the final embedding of each token was fed into two different classifiers, each of which had a single set of weights applied to every token. Afterward, the dot product of the output embeddings and the classifier weights was fed into a Softmax layer to produce a probability distribution over all tokens; the token given the highest probability of being a "start" token was thus selected, and the same steps were then repeated for the "end" token. All hyper-parameters were tuned on the data: the maximum length was 50, while the batch size was 128. The Adam algorithm [9] was employed to optimize the developed model with an initial learning rate equal to 5e-5, the dropout probability was 0.1, and the maximum number of epochs was chosen from the list (1, 5, 10, 20, and 30).
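A sketch of such a fine-tuning setup with the Hugging Face transformers library is shown below; the checkpoint name, the Trainer usage, and the weight decay are assumptions for illustration, while the learning rate, batch size, and epoch choice follow the values listed above.

```python
# Illustrative AraBERT fine-tuning setup for extractive QA with Hugging Face transformers.
from transformers import (AutoTokenizer, AutoModelForQuestionAnswering,
                          TrainingArguments, Trainer)

checkpoint = "aubmindlab/bert-base-arabertv02"   # assumed AraBERT checkpoint name
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForQuestionAnswering.from_pretrained(checkpoint)

args = TrainingArguments(
    output_dir="arabert-qa",
    learning_rate=5e-5,               # Adam initial learning rate, as above
    per_device_train_batch_size=128,  # batch size, as above
    num_train_epochs=5,               # one value from the list (1, 5, 10, 20, 30)
    weight_decay=0.01,                # illustrative assumption
)

# train_dataset / eval_dataset are tokenized question-context pairs (max length 50).
# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_dataset, eval_dataset=eval_dataset)
# trainer.train()
```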

2 https://metatext.io/datasets/arabic-reading-comprehension-dataset-(arcd).


Fig. 2 Overall architecture of ELMo (depicted from [2])

4.2 Embedding Word embedding, or word vectors, is considered the most famous learned representation for texts and documents. It allows each word used in a specific language to be represented by a vector, so that words having the same meaning end up with similar embedding values. The Word2Vec model was created based on the corpus of Arabic Wikipedia. A pre-trained GloVe embedding [31], with a window size equal to 5 (1 center word + 2 words before and 2 words after), a minimum frequency equal to 10, and a dimension of 300, was used. Contrary to the bag-of-words model, word embedding is able to solve the sparse matrix problem by translating large, high-dimensional sparse vectors into a lower-dimensional space. It also deals easily with semantic relationships, as semantically similar items are placed close to each other in the vector space. In the NLP field, downstream tasks must be able to understand the actual context of every word, which has represented a big challenge for word embedding. In order to solve this context problem, contextual embedding methods were proposed: they provide a context-dependent representation that associates a representation with each word of the sentence based on its context. The context-based embeddings are established using pre-trained models. For example, embeddings from language models (ELMo) [32] is widely used to create contextualized word embeddings. The overall architecture of ELMo is presented in Fig. 2.
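A minimal sketch of training such embeddings with gensim is given below, assuming the settings described above (window 5, minimum frequency 10, 300 dimensions); the corpus variable and the gensim choice are assumptions, not the authors' tooling.

```python
# Sketch of training 300-dimensional word embeddings with gensim Word2Vec.
from gensim.models import Word2Vec

def train_embeddings(sentences):
    # `sentences` is assumed to be an iterable of tokenized Arabic Wikipedia sentences.
    return Word2Vec(sentences, vector_size=300, window=5, min_count=10, workers=4)

# model = train_embeddings(sentences)
# vector = model.wv[some_word]                # 300-dimensional embedding of a word
# similar = model.wv.most_similar(some_word)  # semantically close words
```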

4.3 Evaluation Metrics The various readers were assessed by applying three distinct metrics. The first is the exact match (EM), used to calculate the ratio of predictions that match the ground truth exactly. The second is the (macro-averaged) F1-score, employed to measure the mean overlap between predictions and ground truth answers. The third metric is the sentence match (SM), employed to compute the percentage of predictions belonging to the same sentence of a paragraph as the ground truth.

Table 1 Models' performance on ARCD and Arabic-SQuAD datasets

Model       EM     F1     SM
AraBERT     91.0   88.8   79.9
RNN-LSTM    86.6   80.2   76.1
SVM         96.5   91.0   87.3
NB          92.8   76.5   70.2
DT          88.2   83.8   80.6
KNN         76.5   74.2   70.2
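The sketch below shows how the exact match and token-level F1 metrics described above can be computed; the whitespace tokenization and the absence of answer normalization are simplifying assumptions.

```python
# Sketch of exact match (EM) and token-level F1 for a single prediction/reference pair.
from collections import Counter

def exact_match(prediction, truth):
    return int(prediction.strip() == truth.strip())

def f1_score(prediction, truth):
    pred_tokens, truth_tokens = prediction.split(), truth.split()
    common = Counter(pred_tokens) & Counter(truth_tokens)   # shared tokens with multiplicity
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)
```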

4.4 Results Table 1 depicts the performance of the introduced models in terms of exact match, F1, and sentence match on the ARCD and Arabic-SQuAD datasets. It shows that the AraBERT model presents very good performance compared to existing models on the test sets, as it attained EM, F1, and SM values equal to 91, 88.8, and 79.9%, respectively. Compared to datasets in other languages, the ARCD and Arabic-SQuAD datasets cover multiple domains and are characterized by a larger vocabulary. For the more complex datasets, AraBERT considerably increases the sentence-level semantic frame accuracy, which reveals the important generalization capability of the AraBERT model.

5 Conclusion In this paper, we propose a QA system based on the AraBERT transformer. Through the obtained findings, we noticed that the proposed AraBERT model outperforms the majority of other traditional models, demonstrating the efficacy of exploiting the relationship between trained models and transfer experience. The developed AraBERT model enhanced intent classification accuracy, slot filling F1, and sentence-level semantic frame accuracy on ARCD and Arabic-SQuAD datasets. In future work, we will assess the suggested approach on other large-scale and more complex QA datasets. We will also study the efficiency of the external knowledge and BERT combined.


References 1. Pouyanfar S, Sadiq S, Yan Y, Tian H, Tao Y, Reyes MP, Shyu ML, Chen SC, Iyengar SS (2018) A survey on deep learning: algorithms, techniques, and applications. ACM Comput Surv CSUR 51(5):1–36 2. Devlin J, Chang MW, Lee K, Toutanova K (2018) BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 3. Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I (2019) Language models are unsupervised multitask learners. OpenAI Blog 1(8):9 4. Yang Z, Dai Z, Yang Y, Carbonell J, Salakhutdinov RR, Le QV (2019) XLNET: generalized autoregressive pretraining for language understanding. In: Advances in neural information processing systems 32 (NeurIPS) 5. Antoun W, Baly F, Hajj H (2020) AraBERT: transformer-based model for Arabic language understanding. arXiv preprint arXiv:2003.00104 6. Androutsopoulos I, Ritchie G, Thanisch P (1993) Masque/sql: an efficient and portable natural language query interface for relational databases. In: Proceedings of the 6th international conference on industrial and engineering applications of artificial intelligence and expert systems, Gordon and Breach Publisher Inc., Edinburgh, pp 327–330 7. Burke RD, Hammond KJ, Kulyukin VA, Lytinen SL, Tomuro N, Schoenberg S (1997) Question answering from frequently asked question files: experiences with the FAQ finder system. AI Mag 18(2):57–66 8. Carbonell J, Harman D, Hovy E, Maiorano S, Prange J, Sparck-Jones K (2000) Vision statement to guide research in question & answering (Q&A) and text summarization. Rapp Tech, NIST 9. Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 10. Fader A, Zettlemoyer L, Etzioni O (2013) Paraphrase-driven learning for open question answering. In: Proceedings of the 51st annual meeting of the association for computational linguistics, Sofia, Bulgaria, pp 1608–1618 11. Hochreiter S, Schmidhuber J (1996) LSTM can solve hard long time lag problems. Adv Neural Inf Process Syst 9:473–479 12. Voorhees EM (2003) Overview of the TREC 2003 question answering track. In: Proceedings of the twelfth text REtrieval conference 13. Katz B, Felshin S, Yuret D, Ibrahim A, Lin J, Marton G, McFarland AJ, Temelkuran B (2002) Omnibase: uniform access to heterogeneous data for question answering. In: International conference on application of natural language to information systems, Springer, Berlin, Heidelberg, pp 230–234 14. Mishra A, Jain SK (2016) A survey on question answering systems with classification. J King Saud Univ Comput Inf Sci 28(3):345–361 15. Lee YH, Lee CW, Sung CL, Tzou MT, Wang CC, Liu SH, Shih CW, Yang PY, Hsu WL (2008) Complex question answering with ASQA at NTCIR 7 ACLIA. In: Proceedings of NTCIR-7 workshop meetings, entropy 16. Lopez V, Uren V, Sabou M, Motta E (2011) Is question answering fit for the semantic web?: a survey. Seman Web 2(2):125–155 17. Soricut R, Brill E (2006) Automatic question answering using the web: beyond the factoid. Inf Retrieval 9(2):191–206 18. Rajpurkar P, Zhang J, Lopyrev K, Liang P (2016) Squad: 100,000+ questions for machine comprehension of text. arXiv preprint arXiv:1606.05250 19. Riloff E, Thelen M (2000) A rule-based question answering system for reading comprehension tests. In: ANLP-NAACL 2000 Workshop: reading comprehension tests as evaluation for computer-based language understanding systems 20. Sun H, Duan N, Duan Y, Zhou M (2013) Answer extraction from passage graph for question answering. 
In: Proceedings of the twenty-third international joint conference on artificial intelligence, pp 2169–2175


21. Unger C, Bühmann L, Lehmann J, Ngomo ACN, Gerber D, Cimiano P (2012) Template-based question answering over RDF data. In: Proceedings of the 21st international conference on World Wide Web, pp 639–648 22. Vanitha, Sanampudi SK, Lakshmi IM (2010) Approaches for question answering system. IJEST 3:992–995 23. Xu J, Licuanan A, May J, Miller S, Weischedel RM (2003) Answer selection and confidence estimation. In: New directions in question answering, pp 134–137 24. Pradel C, Peyet G, Haemmerlé O, Hernandez N (2013) SWIP at QALD-3: results, criticisms and lesson learned. In: 3rd Open challenge on question answering over linked data (QALD), Valencia, Spain, pp 1–13 25. Yin W, Kann K, Yu M, Schütze H (2017) Comparative study of CNN and RNN for natural language processing. arXiv preprint arXiv:1702.01923 26. Zhou C, Sun C, Liu Z, Lau F (2015) A C-LSTM neural network for text classification. arXiv preprint arXiv:1511.08630 27. Mechti S, Alroobaea R, Krichen M, Rubaiee S, Ahmed A (2020) Deep learning model for identifying the Arabic language learners based on gated recurrent unit network. Int J Adv Comput Sci Appl 11(5):620–627 28. Alroobaea R, Alafif S, Alhomidi S, Aldahass A, Hamed R, Mulla R, Alotaibi B (2020) A decision support system for detecting age and gender from twitter feeds based on a comparative experiments. Int J Adv Comput Sci Appl 11(12):370–376 29. Ferrucci D, Brown E, Chu-Carroll J, Fan J, Gondek D, Kalyanpur AA, Lally A, Murdock JW, Nyberg E, Prager J, Schlaefer N, Welty C (2010) Building Watson: an overview of the DeepQA project. AI Mag 31(3):59–79 30. Wang P, Xu J, Xu B, Liu C, Zhang H, Wang F, Hao H (2015) Semantic clustering and convolutional neural network for short text categorization. In: Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing (short papers), vol 2, pp 352–357 31. Pennington J, Socher R, Manning CD (2014) Glove: global vectors for word representation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 1532–1543 32. Peters ME, Neumann M, Iyyer M, Gardner M, Clark C, Lee K, Zettlemoyer L (2018) Deep contextualized word representations. arXiv preprint arXiv:1802.05365v2

Smart Healthcare System to Predict Ailments Based on Preliminary Symptoms Chirag Jagad , Ishika Chokshi , Devanshi Jhaveri , Himanshu Harlalka , and Prachi Tawde

Abstract With the advancement of technology, there have been a plethora of breakthroughs in the healthcare sector. As a result of this, smart health care has progressively come into focus. In this paper, we have proposed and implemented a system that aids in the prediction and diagnosis of four medical ailments. The purpose of the proposed system is to make technology more useful in solving problematic healthcare challenges. By using this technology, we can interpret data which is obtained by diagnosing various symptoms, analysing data, and then predicting the ailment. Through this system, we aspire to enhance public health by providing a system in their pocket that can diagnose ailments. The current lifestyle of humans has seen a drastic change over the past few years. Eating habits and the evolution of technology have played a major role in the same. Keeping this in mind, we have chosen myopia, colour blindness, polycystic ovarian disease (PCOD), and probability of heart attack to implement in our system. We employ the power of machine learning to predict the probability of heart attack, PCOD, and myopia. A comprehensive introduction and literature work on the prediction of these ailments, along with the methods used to predict them, are presented in this paper. Furthermore, we have discussed the results that prove the robustness and reliability of our system. Keywords Health care · Coronary heart disease (CHD) · Polycystic ovarian disease (PCOD) · Myopia · Colour blindness · Machine learning

C. Jagad (B) · I. Chokshi · D. Jhaveri · H. Harlalka · P. Tawde Dwarkadas J. Sanghvi College of Engineering, Mumbai University, Mumbai, Maharashtra, India e-mail: [email protected] P. Tawde e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_8


1 Introduction

Health care is defined as the maintenance and advancement of health, physical or mental, through the prevention, diagnosis, or cure of any medical ailment [1]. With the evolution of technology, steps have been taken to gradually digitise traditional medicine. Smart health care utilises a variety of innovative technologies that have resulted in overall development, making medicine more systematic and convenient [2]. Machine learning and artificial intelligence are booming, and they can be applied in a plethora of fields, including health care. AI in health care is enhancing medical treatment and patient experiences, from speedier diagnosis to robot-assisted procedures [3]. Smart health care leverages a new generation of technologies, viz. big data, cloud computing, and artificial intelligence, making health care more effective, accessible, and customised [2]. It focuses on illness prevention rather than treatment after the illness has occurred; routine screening and counselling help achieve this. Chronic illnesses such as diabetes, heart disease, arthritis, and others need ongoing healthcare treatments such as screening, medication, symptom surveillance, and lifestyle modification. The majority of these can be automated by smart healthcare systems using health apps, smart devices, health bots/telemedicine, and standardised processes. This increases patients' quality of life while also streamlining care-giving in a cost-effective manner [4].

Our system focuses on the prediction of medical ailments that have become increasingly common in recent times due to people's current lifestyle. In today's digital world, people rely on the latest technology for numerous things and have become so busy that their health often takes a backseat. To increase awareness of the lifestyle changes that have left people prone to ailments like heart attack, PCOD, myopia, and colour blindness, we aim to build a system that is available to them anytime and anywhere. With our system, we aim to diagnose ailments in their earlier stages, which will save lives and raise awareness.

Heart disease is the leading cause of death worldwide. According to the WHO, 17.9 million people die from heart disease each year, accounting for 32% of all global fatalities. Of these global deaths, coronary heart disease (also known as a heart attack) accounts for 85% and is by far the most prevalent and lethal of all heart illnesses [5]. In this paper, we propose an AI-driven method of predicting heart attack probability along with estimating the heart rate from a facial video. We measure the tiny head movements generated by the Newtonian reaction to the entry of blood at each beat to extract the heart rate. The proposed system allows people to check the probability of heart attack and their heart rate in a contactless manner.

Polycystic ovarian syndrome is a hormonal disorder that is largely prevalent among women of reproductive age. The average age group affected by PCOD ranges from 15 to 44 years [6], and it affects 6–12% of women of reproductive age [7]. In this paper, we propose to predict PCOD using ML techniques by answering a few lifestyle-related questions. Early prognosis of PCOD will aid in its swifter treatment, which can be supported by making some alterations in lifestyle choices.

Myopia, popularly known as nearsightedness or shortsightedness, is a visual ailment where an individual sees nearby objects clearly while distant objects appear blurry. Myopia usually worsens between infancy and adolescence depending on age, gender, and the initial degree of myopia [8], reaching 70–90% of the population in Asian countries [9]. As a result, we propose a system to forecast an individual's approximate visual acuity score and determine whether or not they require medical assistance.

Colour blindness, also known as colour vision deficiency, is a visual ailment in which an individual cannot differentiate among certain colours. This condition is usually inherited, and red-green colour blindness is the most common type. Another colour deficiency, blue-yellow colour blindness, exists but is quite uncommon and is not covered by our test. In our system, we concentrate on testing for red-green colour blindness. This method also enables users to take the colour blindness test from the palm of their hand without the need to visit a clinic.

In light of the aforementioned considerations, we chose these diseases. This paper includes the related works as well as the system architecture. The implementation of the prediction of the said diseases, the dataset details, and the discussion of the results are covered in detail in the remaining sections of the paper.

2 Related Works

In this section, we elaborate on the literature surveyed to implement our proposed system.

2.1 Coronary Heart Disease

Heart-rate-from-motion algorithms attempt to carefully measure the cyclic head motion at roughly the expected frequency and then work backwards to infer a heart rate. The authors of [10] take an input video of a person's head and return a pulse rate as well as a series of beat locations that can be used for beat-to-beat variability analysis. They first use feature tracking to extract the motion of the head and then project this motion onto a 1D axis. Signal-processing filters restrict the motion to the frequency range corresponding to the heart rate, removing sources of noise, and principal component analysis (PCA) then decomposes the remaining mixed motion into sub-motion vectors, with the most periodic motion vector corresponding to the heart rate. The findings demonstrate that precise pulse rate measurements may be obtained from head movements. The majority of the beat interval distributions resembled the ECG distributions in terms of qualitative similarity, demonstrating that they actually represent true physiological variability.


In [11], the heart rate is determined from the variation in facial skin colour caused by blood circulation. Face recognition paired with object tracking is utilised to generate a series of face rectangles, which are sampled for colour changes in the following stages of the pipeline. The average colour in a selected region of interest (ROI) on the face yields a signal corresponding to the heart rate, and a heart-rate frequency may be derived from this signal via signal processing using the fast Fourier transform (FFT). Though this method shows impressive results, one major challenge is ensuring that it works well on people with different skin colours.
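The core of this colour-based pipeline can be sketched in a few lines. The snippet below is a simplified illustration (not the implementation of [11]); it assumes the mean green-channel value of the face ROI has already been extracted for each frame:

```python
import numpy as np

def heart_rate_from_roi_means(green_means, fps):
    """Estimate heart rate (bpm) from the per-frame mean green-channel value
    of a face ROI; a simplified sketch of an FFT-based approach."""
    signal = np.asarray(green_means, dtype=float)
    signal -= signal.mean()                      # remove the DC component
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    # Keep only frequencies in the plausible resting heart-rate band.
    band = (freqs >= 0.75) & (freqs <= 2.0)
    peak_freq = freqs[band][np.argmax(spectrum[band])]
    return peak_freq * 60.0                      # Hz -> beats per minute
```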

2.2 Polycystic Ovarian Disease

A prior study attempted to discover repeating patterns among the symptoms of PCOD patients with the help of frequent itemset mining. It also concentrated on the Apriori algorithm, which was used to forecast who might be prone to this ailment. Frequent itemset mining was used to divide the dataset into groups, and then multiple classification, association, and clustering techniques were applied. The data was also passed through numerous preprocessing techniques in order to attain maximum accuracy. The classification algorithms used in this study are decision tree, Naive Bayes, random forest, and artificial neural network (ANN). However, these results did not reach 100% accuracy [12].

In [13], the authors proposed to identify whether a woman has PCOD based on an ultrasound and four medical parameters (FSH, LH, BMI, and cycle length) with the help of machine learning techniques. The input to the system was the ultrasound along with the four medical parameters mentioned above. This study aimed to statistically evaluate metabolic and clinical features on the basis of the probability density function and box plot. Thereafter, the statistical parameters were fed to two statistical models, i.e. logistic regression and a Bayesian classifier, for the prediction. For binary categorical outcomes, such as PCOD and normal, the logistic regression model is utilised. In such instances, multivariate logistic regression offers the most accurate forecast based on the dependent variable.

2.3 Myopia

The most common methodology used to detect myopia is Snellen's chart. In this method, the user has to stand at a distance of approximately 1.2 m from the screen. For each line the user can read, he/she scrolls to the next line of Snellen's chart [14]. This continues till the user cannot read a line or reaches the end of the chart. The user then checks the result table for the visual score corresponding to the line he/she could not read; this visual score is his/her visual acuity.

Recent literature [15] has used another technique where the user holds the phone in front of him and an algorithm calculates the distance between the user's eye and the front camera, including the angle of the phone. The font size of Snellen's chart varies with respect to the distance between the camera and the user. The user then has to read out the letters he can see. A speech-to-text converter checks whether the user has read the letters out correctly and proceeds to the next set of letters. Eventually, it gives out the visual acuity score using a formula.

2.4 Colour Blindness

“Automatic testing of colour blindness” is a system designed by Dey et al. [16] which enables a user to test for this ailment using a computer. The implementation of this proposed system is based on the idea that a colour-blind person cannot differentiate between similar colours in which he is colour blind. For this, the digital eight (8) is considered: as it consists of seven lines, these are permuted with varying line and background colours to test for the type of colour blindness. The above method dates back to 2010 and is no longer the most effective way to test for colour blindness. Thus, we use the more traditional approach of Ishihara testing, which we have implemented to work on a mobile phone [17]. Twenty-one images consisting of Ishihara plates are shown, each containing a numeric value. The user is asked to choose the value he sees or to indicate whether he is unable to see a value partially or at all. At the end of the 21 plates, the programme summarises the test, presenting which answers are correct and which are not, the final score, and the final diagnosis decision according to the instruction sheet.

3 System Architecture

The proposed system architecture is described in Fig. 1. A new user signs up using their Google Account and is prompted to fill in the health journal, which contains certain medical attributes like height, weight, blood group, current medical ailments, and medication. These details are stored in the database. The user can then access all the medical tests. An already registered user is able to log in and is directed to choose among the four tests. In the PCOD module, the user fills in a form containing 19 questions, and the result is displayed once the form is submitted. Similarly, in the heart attack probability module, along with the form, the user has to upload a video of their face, after which the result is displayed. In the myopia module, users see the tumbling E chart and speak out the direction of each letter; based on the answers, the test proceeds further, and once the test is finished the result is displayed. For the colour blindness module, 21 Ishihara plates are displayed, and the user has to identify and enter the number visible to them on each. Based on the answers, the result is displayed. Once any test has been completed, the result is displayed and stored in the database for the user to keep track of their health.


Fig. 1 System architecture

4 Implementation Details

In this section, we discuss the methods and algorithms used to predict the ailments. The website of the proposed system has been built using HTML, CSS, JavaScript, and Bootstrap with the Django framework, and the mobile application is built using Flutter. Firebase has been used as the authentication database, which is common to both. Once the user logs in, they can choose from the four ailments. For data preprocessing as well as training and testing, we have used Python libraries like Pandas, NumPy, and Scikit-learn.

4.1 Coronary Heart Disease

Our approach takes a video (20–25 s long) of a person's face and the other attributes affecting heart attack (mentioned in Sect. 5) as input and returns a heart attack likelihood and a heart rate. Our heart-rate detection approach is based on Balakrishnan's approach [10]. The head movements associated with heart activity are minor and are mixed in with a wide range of other involuntary head motions. He et al. [18] discovered that the vertical axis is the ideal axis for measuring the movement of the upright head induced by pulse. All other involuntary head motions are filtered out since they do not contribute to pulse detection and would potentially interfere with our predictions. We use the Haar cascade classifier [19] to recognise faces and only consider the centre of the face, which is 50% of the breadth and 90% of the length, as the region of interest. We also remove the area surrounding the eyes because its movements are not related to blood flow. We extract the motion of the feature head points in our region of interest using feature tracking. The resting pulse rate of a healthy adult lies between [0.75, 2] Hz, or [45, 120] beats per minute. Hence, we filter out any frequencies less than 0.75 Hz. However, frequencies larger than 2 Hz can still be useful for peak detection. We utilise PCA to separate the motion corresponding to the pulse, which is then projected onto a 1D signal, allowing us to recover individual beat boundaries from the trajectory's peaks. PCA aids in the decomposition of trajectories into a number of independent source signals, from which we choose the source that best matches the pulse. This frequency is used to calculate the average pulse rate and to identify the peaks that correspond to the ECG ground truth signals. As a result, we obtain the heart rate.

The obtained heart rate is fed into the classification model along with the other attributes to predict the probability of a heart attack. For our classification model, we divided our dataset into an 8:2 train-test split and trained our model on various classifiers with different parameters to filter out the best model, as discussed in Sect. 6.
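A minimal sketch of this pipeline is given below. It is an illustration of the steps described above, not our exact implementation: it assumes a Haar cascade has been loaded (e.g. OpenCV's bundled frontal-face model) and that feature tracking has already produced a matrix of per-frame vertical positions of tracked points inside the ROI.

```python
import cv2
import numpy as np
from scipy.signal import butter, filtfilt
from sklearn.decomposition import PCA

def face_roi(frame, cascade):
    # Detect the face and keep its central region (roughly 50% of the width and
    # 90% of the height), as described above; masking of the eye area is omitted.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    x, y, w, h = cascade.detectMultiScale(gray, 1.3, 5)[0]
    return (x + w // 4, y + int(0.05 * h), w // 2, int(0.9 * h))

def pulse_component(y_tracks, fps):
    # y_tracks: array of shape (num_frames, num_points) with the vertical
    # positions of tracked feature points inside the ROI.
    low, high = 0.75 / (fps / 2), 2.0 / (fps / 2)   # normalised band edges
    b, a = butter(2, [low, high], btype="band")
    filtered = filtfilt(b, a, y_tracks, axis=0)      # keep 0.75-2 Hz motion
    comps = PCA(n_components=5).fit_transform(filtered)
    # Pick the most periodic component (sharpest spectral peak) as the pulse.
    spectra = np.abs(np.fft.rfft(comps, axis=0))
    periodicity = spectra.max(axis=0) / spectra.sum(axis=0)
    return comps[:, int(np.argmax(periodicity))]
```

The average pulse rate then follows from the dominant frequency of the returned component, and its peaks give the individual beat boundaries.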

4.2 Polycystic Ovarian Disease

The approach to predict PCOD requires the user to answer 19 lifestyle and food intake habit questions, most of which require a YES or NO answer. This approach is based on the one described in [12]. The 19 attributes and the class label in the dataset were preprocessed using Python libraries like Pandas and NumPy. We then divided the dataset into training and testing sets using the Scikit-learn train_test_split module with an 8:2 split and applied various classification algorithms, as discussed in Sect. 6. Based on the answers given by the user to the 19 questions, the result is displayed.
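A minimal sketch of the training step is shown below. The file name pcod_survey.csv and the label column "PCOD" are hypothetical placeholders for the real survey data described in Sect. 5.2:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

# Hypothetical file/column names standing in for the 19 attributes and class label.
df = pd.read_csv("pcod_survey.csv")
X, y = df.drop(columns=["PCOD"]), df["PCOD"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

for name, clf in [("Decision tree", DecisionTreeClassifier()),
                  ("Naive Bayes", GaussianNB()),
                  ("Random forest", RandomForestClassifier())]:
    clf.fit(X_train, y_train)
    print(name, clf.score(X_train, y_train), clf.score(X_test, y_test))
```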

4.3 Myopia

The tumbling E chart is used to predict a person's visual acuity in order to determine whether or not they have myopia; we use the tumbling E chart rather than Snellen's chart to help patients who are unable to read the alphabet. For greater accuracy, the user must adhere to a set of instructions before beginning the test: the user's honesty, earphones, and a calm environment are crucial. The user must extend his arm straight out, making a right angle with his body. Because each user's arm length is unique, we use linear regression on a database of heights and arm lengths to predict the most likely arm length for a particular user's height. The ratio between the actual 1.2 m distance required in a clinic and the corresponding arm length of the user is used to determine the size of the tumbling E chart image. An instruction voice note is played prior to each question, and the user is then expected to respond by saying the direction of the letter E aloud. Speech-to-text conversion is used to verify the accuracy of this response. This process continues until the user either responds with an incorrect direction, is unable to read any further, or his vision becomes slightly blurred.
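The arm-length regression and the chart scaling can be sketched as below; the height/arm-length samples are illustrative placeholders, not our actual database:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative height (cm) -> arm length (cm) samples; the real model is fit
# on a database of measured heights and arm lengths.
heights = np.array([[150], [160], [170], [180], [190]])
arm_lengths = np.array([63, 68, 73, 78, 83])

reg = LinearRegression().fit(heights, arm_lengths)
user_arm_cm = reg.predict([[172]])[0]            # predicted arm length for a 172 cm user

# Scale the tumbling-E chart so that viewing it at arm's length is equivalent
# to viewing the full-size chart at the 1.2 m clinic distance.
scale_factor = (user_arm_cm / 100.0) / 1.2
print(round(user_arm_cm, 1), round(scale_factor, 3))
```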


4.4 Colour Blindness

The goal is to digitalise the standard Ishihara testing done using physical plates and check whether the user is suffering from colour blindness. Before starting the test, the user has to follow a certain set of instructions for better accuracy:
1. Proper room lighting and phone brightness should be ensured.
2. The user is expected to wear glasses if necessary for near vision.
3. The user should hold the screen at a comfortable reading distance (14 inches or 35 cm).

The system has been implemented such that a person can access the test from their phone. This method has been approved by a certified doctor, and we have incorporated his suggestions. In this test, 21 images consisting of Ishihara plates [20] are shown one by one, each containing a numeric value. The user is asked to choose the value he sees or to indicate whether he is unable to see a value partially or at all. At the end of the 21 plates, the program summarises the test, presenting which answers are correct and which are not, the final score, and the final diagnosis decision according to the instruction sheet.
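The scoring step can be sketched as follows; the thresholds used for the diagnosis below are illustrative assumptions, not the values from the Ishihara instruction sheet:

```python
def summarise_ishihara(answers, expected):
    """answers/expected: lists of 21 responses per plate; 'none' means the user
    saw no value. The cut-offs are placeholder assumptions for illustration."""
    correct = sum(a == e for a, e in zip(answers, expected))
    if correct >= 17:
        diagnosis = "normal colour vision"
    elif correct >= 9:
        diagnosis = "mild red-green deficiency"
    else:
        diagnosis = "strong red-green deficiency"
    return correct, diagnosis
```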

5 Dataset

This section contains the details of the datasets used for the implementation of our proposed system.

5.1 Coronary Heart Disease

The dataset for the probability of heart attack is publicly available on the Kaggle website and comes from an ongoing cardiovascular study of the residents of Framingham, Massachusetts. The classification goal is to forecast the patient's risk of developing coronary heart disease in the future (either 0 for no risk or 1 for risk). The data collection contains information about the patients: it has about 4200 records with 15 attributes, each of which is a potential risk factor. There are risk variables that are demographic, behavioural, and medical in nature [21]. The following are the attributes considered in the dataset: gender, age, education, current smoker, cigarettes per day, BP medicines, prevalent stroke, prevalent hypertension, diabetes, total cholesterol, systolic blood pressure, diastolic blood pressure, BMI, heart rate, and glucose.


5.2 Polycystic Ovarian Disease

The PCOD dataset has 19 attributes and one class label. Of the 19 attributes, 14 are binary attributes whose values can be either YES or NO. Of the remaining five attributes, two are time attributes and three are categorical; two of the categorical attributes have four classes and the third has three classes. The attributes are mainly basic health and lifestyle questions to be answered by the user to obtain the predicted result. This dataset was taken from [12], and to gather more data for better accuracy, we also circulated a survey with questions matching the attributes of the dataset. To avoid biased predictions by our classifier, we used the synthetic minority oversampling technique (SMOTE) [22], which generates synthetic samples for the minority class and helps overcome the overfitting problem caused by random oversampling.
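A minimal sketch of the SMOTE step, using toy data standing in for the 19-attribute survey matrix, is shown below:

```python
from collections import Counter
import numpy as np
from imblearn.over_sampling import SMOTE

# Toy imbalanced data standing in for the survey attributes X and PCOD label y.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 19))
y = np.array([1] * 40 + [0] * 160)

X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
print(Counter(y), Counter(y_res))   # minority class is oversampled to parity
```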

6 Discussion of Results

In this section, we discuss the experiments conducted and the accuracy obtained for the methods used to predict the ailments.

6.1 Coronary Heart Disease

To test our heart-rate prediction module, we performed tests on 50 random users of different ages and skin colours. We took most of the data from users in a normal resting state, which is the ideal state. To check for outliers, we also collected a few data points from users in an abnormal state, i.e. just after running, just after a gym session, when the user is stressed, etc. Table 1 displays the observed and predicted heart rates of five test users in their normal state.

Table 1 Heart-rate experimental testing results of five test users in their normal state

Observed heart rate | Predicted heart rate
80 | 82
77 | 78
79 | 79
82 | 83
86 | 85

Table 2 Heart-rate experimental testing results of five test users in their abnormal state

Observed heart rate | Predicted heart rate
114 | 114
109 | 111
118 | 119
106 | 104
103 | 102

Table 3 Heart attack prediction train and test accuracy (%)

Classification algorithm | Train | Test
Random forest | 100 | 87
Decision tree | 100 | 80
K neighbours | 86 | 78
Gradient boost | 83 | 81
SVC | 65 | 67
CatBoost | 94 | 89

We chose the mean absolute percentage error (MAPE) as the metric to evaluate the results. MAPE is a standard measure of the accuracy of a prediction method in statistics. It is expressed by the formula

\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{a_i - p_i}{a_i} \right|   (1)

where a_i is the observed value, p_i is the predicted value, and n is the number of test cases.

We got MAPE = 2.389% for the 50 test cases that we used, which is a decent score. Table 2 also shows that our model is good at handling outliers, i.e. abnormal heart rates. For the classification model, we experimented with the random forest, decision tree, K-nearest neighbour, gradient boosting, support vector machine, and CatBoost classifiers. Table 3 shows the accuracy comparison of all the models; among them, the CatBoost classifier showed the best performance with 94% training accuracy and 89% test accuracy. Hence, we chose the CatBoost classifier as our classification model to predict the probability of a heart attack.
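For reference, Eq. (1) can be computed directly; the small sketch below uses the Table 1 sample values:

```python
def mape(actual, predicted):
    # Eq. (1): mean absolute percentage error over n test cases.
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

print(mape([80, 77, 79, 82, 86], [82, 78, 79, 83, 85]))  # about 1.2% on the Table 1 sample
```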

6.2 Polycystic Ovarian Disease

Similarly, for the PCOD classification model, we used the random forest classifier, decision tree classifier, Naive Bayes classifier, and CatBoost classifier. Table 4 shows the accuracy comparison of all the models; here too, CatBoost gave us the highest training and testing accuracy of 98% and 94%, respectively. Hence, we decided to go ahead with the CatBoost classifier for the PCOD classification model.

Table 4 PCOD prediction train and test accuracy (%)

Classification algorithm | Train | Test
Decision tree | 99 | 79
Naive Bayes | 84 | 88
CatBoost | 98 | 84
Random forest | 86 | 85

Table 5 Myopia experimental testing results

Predicted visual acuity-lens number | Original lens number
20/50 = − 1.00 | − 1.25
20/10 = Normal | Normal
20/200 = − 2.50 | − 3.0
20/60 = − 1.25 | − 1.25
20/100 = − 1.75 to − 2.00 | + 1.25

6.3 Myopia

To test our myopia module, we performed tests on 50 random users of different ages and heights who were affected by myopia or hypermetropia or who had normal vision. To check the accuracy, we tested the application on this set of users and converted the predicted visual acuity to a lens number [23]. From Table 5, we can clearly see that most of the results are very close to the actual values; the major error occurs for the user who has reading glasses. We got MAPE = 8.485% for the 50 test cases that we used, which is a decent score.

6.4 Colour Blindness

To check the accuracy of the colour blindness module, we tested the application on a set of 100 users. By taking the mean of the accuracies of all the classes mentioned below, we obtained an overall accuracy of 96.75%. The results are shown in Table 6.

Table 6 Colour blindness experimental testing results

Result | Accuracy (%)
Normal vision | 77/80 = 96.25
Mild deutan colour blindness | 7/8 = 87.5
Strong deutan colour blindness | 4/4 = 100
Mild protan colour blindness | 6/6 = 100
Strong protan colour blindness | 2/2 = 100

7 Conclusion

Since medical services are not accessible or economical for everyone, despite good infrastructure and cutting-edge technologies, we have proposed a smart healthcare system that aims to assist users by informing them about their medical state and keeping them health-aware. In this paper, we have proposed a smart healthcare system to predict four medical ailments. A thorough study has been done on the current methods used to predict these ailments, and more accurate methods have been proposed. From the experimental results, we can conclude that the methods used to predict all four ailments are accurate enough to indicate whether the users are at risk. The accuracy of the predictions can be further improved by incorporating the latest medical reports of the user. In the future, we can extend the system with more ailments to provide a holistic health prediction experience, recommend doctors based on the test results, and suggest doctor-recommended natural remedies/medicines for the ailments users suffer from.

References
1. Healthcare. Retrieved from https://en.wikipedia.org/wiki/Health_care. Accessed on 24 Jun 2022
2. Tian S, Yang W, Grange JML, Wang P, Huang W, Ye Z (2019) Smart healthcare: making medical care more intelligent. Glob Health J 3(3):62–65. https://doi.org/10.1016/j.glohj.2019.07.001. Retrieved from www.sciencedirect.com/science/article/pii/S2414644719300508
3. 40 AI in healthcare examples improving the future of medicine (2022). Retrieved from https://builtin.com/artificial-intelligence/artificial-intelligence-healthcare
4. Smart healthcare: how IoT can improve health services. Retrieved from www.techaheadcorp.com/blog/smart-healthcare-solutions/. Accessed on 9 Jul 2022
5. Cardiovascular diseases (CVDs). Retrieved from www.who.int/news-room/fact-sheets/detail/cardiovascular-diseases-(cvds). Accessed on 24 Jun 2022
6. Polycystic ovary syndrome (PCOS): symptoms, causes, and treatment. Retrieved from www.healthline.com/health/polycystic-ovary-disease#what-is-pcos. Accessed on 24 Jun 2022
7. The prevalence of polycystic ovary syndrome, its phenotypes and cardio-metabolic features in a community sample of Iranian population: Tehran lipid and glucose study. https://doi.org/10.3389/fendo.2022.825528. Accessed on 24 Jun 2022
8. Tricard D, Marillet S, Ingrand P, Bullimore MA, Bourne RRA, Leveziel N (2021) Progression of myopia in children and teenagers: a nationwide longitudinal study. British J Ophthalmol. https://bjo.bmj.com/content/bjophthalmol/early/2021/03/11/bjophthalmol-2020-318256.full.pdf
9. Fredrick DR (2002) Myopia. BMJ 324(7347):1195–1199. https://doi.org/10.1136/bmj.324.7347.1195
10. Balakrishnan G, Durand F, Guttag J (2013) Detecting pulse from head motions in video. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 3430–3437
11. Rahman H, Ahmed MU, Begum S, Funk P (2016) Real time heart rate monitoring from facial RGB color video using webcam. In: The 29th Annual workshop of the Swedish artificial intelligence society (SAIS), Malmö, Sweden. Linköping Electronic Conference Proceedings, No. 129, Linköping University Electronic Press
12. Vikas B, Sarangi S, Chilla M, Bhargav KS, Anuhya B (2017) A literature review on the rising phenomenon PCOS. Int J Adv Eng Technol 10(2):216
13. Mehrotra P, Chatterjee J, Chakraborty C, Ghoshdastidar B, Ghoshdastidar S (2011) Automated screening of polycystic ovary syndrome using machine learning techniques. In: 2011 Annual IEEE India conference, pp 1–5. IEEE
14. Myopia digital test. Retrieved from www.mdcalc.com/visual-acuity-testing-snellen-chart. Accessed on 24 Jun 2022
15. Agarwal A, Abhishek K, Kumar V, Kumar V, Prasad N, Singh M (2015) Dr. eye: an android application to calculate the vision acuity. Procedia Comput Sci 54:697–702. https://doi.org/10.1016/j.procs.2015.06.082
16. Dey S, Roy S, Roy K (2010) Automatic testing of color blindness
17. Marey HM, Semary NA, Mandour SS (2015) Ishihara electronic color blindness test: an evaluation study. Ophthalmol Res Int J 3(3):67–75
18. David DH, Winokur ES, Sodini CG (2011) A continuous, wearable, and wireless heart monitor using head ballistocardiogram (BCG) and head electrocardiogram (ECG). In: 2011 Annual international conference of the IEEE engineering in medicine and biology society, pp 4729–4732. IEEE
19. Viola P, Jones M (2001) Rapid object detection using a boosted cascade of simple features. In: Proceedings of the 2001 IEEE computer society conference on computer vision and pattern recognition. CVPR 2001, vol 1, pp I–I. IEEE
20. Ishihara plates for colour blindness. Retrieved from https://www.color-blindness.com/ishiharas-test-for-colour-deficiency-38-plates-edition/. Accessed on 24 Jun 2022
21. Framingham CHD. Retrieved from https://www.kaggle.com/datasets/captainozlem/framingham-chd-preprocessed-data. Accessed on 24 Jun 2022
22. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357
23. Conversion of visual acuity to lens number. Retrieved from https://www.happyeyesight.com/get-20-20-vision/. Accessed on 24 Jun 2022

Mathematical Models for the Ouroboros Protocol Based on Attacks Over Blockchain Systems Sai Tejaswi Guntupalli and Khushi Saxena

Abstract Decentralized ledger technology has garnered traction worldwide due to the high degree of security that it entails. The secure functioning of blockchains built on this technology has shown results so promising that it drew the attention of the world's most revered scientists. In this study, we offer an analysis of diverse consensus protocols and adversarial models. We mainly consider the classical Bitcoin-like proof-of-work protocol and the Greedy Heaviest-Observed Sub-Tree (GHOST) protocol. The objective of this work is to present a comparative analysis between the proof-of-work protocol that underlies Bitcoin and the novel proof-of-stake algorithm that was introduced by Ouroboros.

Keywords Ouroboros · Blockchain · Greedy Heaviest-Observed Sub-Tree (GHOST) protocol · Bitcoin · Splitting attack

1 Introduction

Bitcoin is a payment system where digitally signed transactions are grouped into blocks and stored securely in a structure called a blockchain. A blockchain is a sequence of blocks linked via hash pointers, where each new block contains a hash of the previous block. This structure preserves an ordered list of transactions that uniquely determines the state of the system.

Unlike in other, centralized payment systems, in Bitcoin a transaction cannot be considered confirmed immediately once it is added to the blockchain. A user needs to wait some time to be sure that the transaction is set in stone in the blockchain. This is because of the decentralized nature of the system, where everyone can add blocks to the blockchain. To guarantee consistency among different users and to preserve the

S. T. Guntupalli SRM Institute of Science and Technology, Kattankulathur, Tamil Nadu, India K. Saxena (B) Banasthali Vidyapith, Rajasthan, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_9


inability to revert previously added blocks, a special mechanism called proof-of-work is used. The following idea underlies a proof-of-work system: a computational effort (the calculation of a hash value below some target) should be spent to produce a block, and only the chain of blocks with the most computations is considered valid.

As the blockchain technology evolves, alternatives to the computationally heavy proof-of-work mechanism appear. The most promising one is called proof-of-stake: it does not require heavy computations to produce blocks; instead, a block producer is chosen through a fair procedure among all stakeholders in the system. Ouroboros is a good example of such a system [1]. To the best of our knowledge, it is the first provably secure proof-of-stake protocol with rigorous security guarantees.

The concept of a blockchain could be undermined if someone could revert blocks by submitting a chain that would substitute the one currently accepted. For example, it can lead to the following attack: a buyer pays a merchant with Bitcoins; after the corresponding transaction is included into the blockchain, the merchant accepts the payment and sends a product to the buyer; upon receiving the product, the buyer issues a conflicting chain of blocks which does not contain the payment to the merchant but instead sends the coins back to the buyer. So, as long as the merchant cannot be sure that the payment is irreversible, it is insecure to deliver the product. Satoshi Nakamoto argues in the original Bitcoin white paper [2] that the system is secure (with some probability) against such attacks unless an adversary possesses 50% or more of the total computational power.

The described double-spend attack is relevant not only for Bitcoin, but also for other proof-of-work systems, for instance those based on the GHOST algorithm [3], as well as for proof-of-stake systems like Ouroboros. In this paper, we describe known double-spend models for Bitcoin and present new mathematical models for the Ouroboros protocol that allow calculating the security bounds for different types of adversaries. We also provide the results of splitting-attack simulations for the Bitcoin and GHOST algorithms. Finally, we compare the different protocols; as a measure, we focus on the block confirmation time needed to provide reasonable security guarantees for the users.

The paper is structured in the following way: the second chapter describes the classical double-spend attack on Bitcoin as it was introduced by S. Nakamoto. The third chapter describes splitting attacks that are relevant to those proof-of-work systems where the time difference between blocks is not negligible compared to the block propagation time. The fourth chapter presents two adversarial models for the Ouroboros proof-of-stake protocol. The last chapter summarizes and compares the results from the previous chapters.

2 Double-Spend Attack: General Overview

This section presents an overview of a double-spend attack that could happen in a blockchain-based system. As we briefly mentioned before, it does not really matter what type of consensus mechanism underlies the system: a double-spend could happen in both proof-of-work and proof-of-stake systems. Here, we describe the main essence of the attack.

As the name suggests, the whole idea of a double-spend attack is to use the same coins twice. In general, it implies that someone pays for some goods, but after receiving them, reverts the payment so that both the goods and the money are in the hands of the attacker. While it is infeasible to change the transaction with the payment itself (because that would require forging a digital signature), it is possible to reject an entire block which includes the transaction. To do this, an attacker needs to substitute a valid sub-chain of blocks with a new one that has a bigger score (score calculation depends on the actual blockchain type). Even though this attack requires tremendous resources (computational in the case of proof-of-work or financial in the case of proof-of-stake systems), it could be profitable.

3 Bitcoin Double-Spend Attack

In this section, we give an overview of the existing mathematical models of the Bitcoin double-spend attack. The first one was introduced by S. Nakamoto in the original Bitcoin white paper [2]. M. Rosenfeld continued research on Nakamoto's model and improved it in [4]. We also look into two models proposed by Pinzon et al. [5] that introduce a notion of time advantage into the original model analysed by Nakamoto and Rosenfeld. It is worth noting that the presented theoretical models for the double-spend attack could also be applied to other Bitcoin-like proof-of-work systems.

3.1 The Model of S. Nakamoto

S. Nakamoto considers the scenario when an adversary tries to secretly generate an alternate chain that would be longer (in terms of computational difficulty) than the honest chain. The race between the adversarial and honest chains is characterized as a binomial random walk. Given that an adversary starts with some deficit K (the honest chain is K blocks longer than the adversarial one), the probability that the adversary would ever catch up with the honest chain is analogous to the Gambler's Ruin problem and could be calculated as follows:

q_K = \begin{cases} 1 & \text{if } p \le q, \\ (q/p)^K & \text{if } p > q, \end{cases}

where

p: the fraction of hashing power possessed by honest nodes (equivalent to the probability that an honest node finds the next block);
q: the fraction of adversarial hashing power (equivalent to the probability that an adversary finds the next block);
q_K: the probability that an adversary would ever catch up from a deficit of K blocks.

Assuming that an adversary starts to work on the malicious fork right after the payment transaction is included into the blockchain (i.e. he does not wait for the z blocks after which it is confirmed by the merchant), he may already have mined some number of blocks, so the deficit K is reduced. The adversarial progress follows a Poisson distribution with expected value \lambda = z \frac{q}{p}. The overall probability of a successful double-spend attack can be found by multiplying the Poisson density for each possible amount of progress by the probability of catching up from the remaining deficit:

DS_N(q, z) = \sum_{k=0}^{\infty} \frac{\lambda^k e^{-\lambda}}{k!} \cdot \begin{cases} (q/p)^{z-k} & \text{if } k \le z \\ 1 & \text{if } k > z \end{cases} = 1 - \sum_{k=0}^{z} \frac{\lambda^k e^{-\lambda}}{k!} \left( 1 - (q/p)^{z-k} \right)   (1)

More detailed explanation of this model can be found in the original paper [2].
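As an illustration, Eq. (1) transcribes directly into code; the following sketch reproduces the familiar values from the white paper:

```python
from math import exp, factorial

def nakamoto_double_spend(q, z):
    """Eq. (1): probability that an attacker with hash share q reverts a
    transaction buried under z blocks."""
    p = 1.0 - q
    if q >= p:
        return 1.0
    lam = z * q / p
    s = 1.0
    for k in range(z + 1):
        s -= (lam ** k * exp(-lam) / factorial(k)) * (1.0 - (q / p) ** (z - k))
    return s

print(nakamoto_double_spend(0.1, 6))   # about 0.00024, as in the Bitcoin white paper
```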

3.2 The Model of M. Rosenfeld

Another well-known mathematical model for the Bitcoin double-spend attack, besides the one presented by S. Nakamoto, is the model of M. Rosenfeld. In [4], he clarifies and expands the work of S. Nakamoto. The same basic model is taken: for a successful double-spend attack, an adversary needs to catch up from z = n − m blocks, where n is the number of confirmations that a user waits before sending goods and m is the number of blocks that an adversary is expected to mine during the confirmation period. The author considers the catching-up process as a Markov chain, where each step is defined as finding a block by an honest node or the adversary:

z_{i+1} = \begin{cases} z_i + 1 & \text{with probability } p, \\ z_i - 1 & \text{with probability } q. \end{cases}

The attack succeeds if z ever reaches −1. Let us denote by α_z the probability that an adversary would be able to catch up when he is z blocks behind. If z < 0, then α_z = 1, otherwise α_z = p α_{z+1} + q α_{z−1}, where q is the fraction of hashing power possessed by the adversary (equivalent to the probability that he will find the next block) and p = 1 − q. In this case, the probability to catch up from z blocks behind can be defined as follows:

\alpha_z = \min\left( (q/p)^{\max(z+1,\,0)},\, 1 \right) = \begin{cases} 1 & \text{if } z < 0 \text{ or } q > p, \\ (q/p)^{z+1} & \text{if } z \ge 0 \text{ and } q \le p. \end{cases}   (2)

Similar to the model of S. Nakamoto, it is possible that an adversary mines some number of blocks while the merchant waits for n confirmations in the honest chain. Recall that S. Nakamoto considers the progress of an adversary in this case as a Poisson distribution with expected value n q/p. M. Rosenfeld takes another assumption: he models the progress as a negative binomial distribution. The probability that an adversary will mine a given number of blocks m is

P(m) = \binom{m+n-1}{m} p^n q^m   (3)

It follows that the probability of a successful double-spend attack, where a merchant waits for n confirmations and an adversary succeeds in finding m + 1 blocks during the confirmation period, is equal to

DS_R(q, n) = \sum_{m=0}^{\infty} P(m)\, \alpha_{n-m-1} = \begin{cases} 1 - \sum_{m=0}^{n} \binom{m+n-1}{m} \left( p^n q^m - p^m q^n \right) & \text{if } q < p, \\ 1 & \text{if } q \ge p. \end{cases}   (4)

An interested reader could find more rigorous description of this model in the original paper [4].
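Equation (4) is equally easy to evaluate numerically; the sketch below is a direct transcription of the closed form above:

```python
from math import comb

def rosenfeld_double_spend(q, n):
    """Eq. (4): success probability when the merchant waits for n confirmations."""
    p = 1.0 - q
    if q >= p:
        return 1.0
    return 1.0 - sum(comb(m + n - 1, m) * (p ** n * q ** m - p ** m * q ** n)
                     for m in range(n + 1))

print(rosenfeld_double_spend(0.1, 6))  # about 0.0006 for a 10% attacker and 6 confirmations
```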

3.3 Other Models

It is worth mentioning two theoretical models that were presented by Pinzon et al. [5]. The first one generalizes the model of M. Rosenfeld by adding an extra parameter that represents the time advantage of an adversary. The second one, which is called the “time-based model”, is completely different from those described in the previous sections. In this model, the lengths of the valid and adversarial chains are assumed to be equal; instead, the authors focus on the time parameter t that represents the time difference between the nth block in the adversarial and honest chains.


Since these models are consistent with the model of M. Rosenfeld and give almost the same results, we do not examine them in depth.

3.4 Models Comparison

Since all the considered models aim to estimate the probability of the same double-spend attack on Bitcoin, the results are similar, apart from differences between the model of S. Nakamoto and the others. The models of M. Rosenfeld and C. Pinzon et al. give essentially identical results (assuming that the time advantage in the models of C. Pinzon is equal to zero). Table 1 gives the values computed for the different models: it presents the number of blocks that a user should wait to be 99.9% sure that his transaction would not be reverted by an adversary.

Table 1 The number of blocks that a user should wait to be 99.9% sure that his transaction would not be reverted by an adversary with the given hashing power

Adversarial hashing power | The model of S. Nakamoto | The model of M. Rosenfeld | The model of C. Pinzon (generalized)
0.1 | 5 | 6 | 5
0.15 | 8 | 9 | 8
0.2 | 11 | 19 | 12
0.25 | 15 | 20 | 19
0.3 | 24 | 32 | 32
0.35 | 42 | 58 | 58
0.4 | 89 | 133 | 134
0.45 | – | 539 | 541
0.46 | – | 844 | 846
0.47 | – | 1502 | 1505
0.48 | – | 3382 | 3387
0.49 | – | 13,533 | 13,544

4 Blockchain Splitting Attacks

This section provides a description of the splitting attack that was described in [6]. It could be considered a variation of the double-spend attack, since the main goal is to create a fork of the required length. The splitting attack targets proof-of-work-based protocols with a short block generation time that is comparable to the block propagation time in the network.


Fig. 1 The fork that keeps running while the adversary is able to equalize the lengths of both branches with malicious blocks (marked with M)

We will start with a general overview of the splitting attack and then provide experimental results showing the possibility of its application to different proof-of-work consensus protocols.

4.1 Splitting Attack: General Overview

In contrast to the classic double-spend attack described in Sect. 2, where an adversary is supposed to create a fork secretly and publish it only when needed, the splitting attack is public to all nodes from the beginning. Moreover, not only the adversary contributes blocks to the forked branch but also honest nodes [7]. The idea of the attack is the following: when a fork of depth 1 accidentally happens, an adversary splits his hashing power between both branches to keep their lengths equal as long as possible. (In most proof-of-work protocols, the actual criterion for branch selection is the branch difficulty, i.e. the winning branch is the one that required the most computations to create; for simplicity, and because it is usually the case in the real world, it is assumed here that all blocks have the same difficulty, so the longest chain is also the most difficult one.) In this case, honest miners are also split due to their arbitrary choice between branches of equal length. When honest miners publish a new block on one of the branches, the adversary publishes a block on the other branch to keep the fork running (see Fig. 1). If the branches are of the same length, the adversary does nothing, so the honest miners are again split in half [8]. Thus, the adversary tries to keep both chains balanced by length. If the lengths differ, the adversary extends the chain that is behind by publishing the number of blocks needed to equalize the lengths of both chains. The attack continues as long as the adversary has a sufficient number of blocks for each chain in his reserve; if he cannot equalize the chain lengths at the end of some round, the attack is finished. The notion of a round, initially taken from [9], represents a complete round of information propagation to all nodes in a p2p network. In practice, information propagation time is a random variable on the order of tens of seconds; in the described model, it is assumed that one full communication round takes 12.6 s (the average block propagation time in the Bitcoin network [10]).

The general essence of the splitting attack is the following: when the block generation time is comparable to the block propagation time, the probability of generating 2 or more blocks in the same round (and at the same block height) becomes non-negligible. In this case, at the beginning of the next round, the network would be split into two branches. An adversary leverages such block collisions to keep the fork running. Thus, an important parameter that facilitates a splitting attack is the number of POW solutions (mined blocks) per complete round of information propagation. In [6], where this parameter is designated as f, it was shown that when f decreases and gets closer to 0, the probability of a splitting attack decreases too (an adversary needs almost 50% of the hashing power to make a split). And vice versa, when f increases, the security bound becomes worse (the attack becomes feasible with less than 50% of the hashing power). The splitting attack is most effective when f ≥ 1, i.e. at a rate of 1 block per round or more. It follows that a short block generation time (relative to the block propagation time) creates favourable conditions for a splitting attack to occur. Hence, it becomes interesting to investigate the resistance of proof-of-work protocols with different values of the parameter f. In our experiments, we take the two most widespread protocols: Bitcoin and GHOST. The following sections present experimental results obtained from computational modelling of both protocols.
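As a rough illustration of the round-based model just described, the following Monte Carlo sketch estimates how long an adversary can keep a fork alive. It is a simplification (honest power is assumed to split evenly between the branches each round, and only the branch-length difference is tracked), not the simulator used for the experiments below:

```python
import numpy as np

def fork_lifetime(q, f, rng, max_rounds=10_000):
    """q: adversarial hashing share, f: expected blocks per round.
    Returns the number of rounds the adversary keeps the branches balanced."""
    reserve = 0
    for rounds in range(max_rounds):
        honest_a = rng.poisson(f * (1 - q) / 2)   # honest blocks on branch A
        honest_b = rng.poisson(f * (1 - q) / 2)   # honest blocks on branch B
        reserve += rng.poisson(f * q)             # adversarial blocks mined this round
        deficit = abs(honest_a - honest_b)        # blocks needed to re-balance
        if deficit > reserve:
            return rounds                         # the fork collapses
        reserve -= deficit
    return max_rounds

rng = np.random.default_rng(0)
print(np.mean([fork_lifetime(0.35, 1.0, rng) for _ in range(1000)]))
```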

4.2 Bitcoin Splitting Attack

As is well known, the average block generation time in Bitcoin is equal to 10 min [2]. Given that the average block propagation time is 12.6 s [10], the parameter f = 12.6/600 = 0.021. In what follows, it is more convenient to use, instead of f, the parameter k that shows the average number of communication rounds between 2 consecutive blocks: k = 1/f. It is interesting to estimate the possibility of a successful splitting attack for the original choice k = 47.6 made in Bitcoin and to see how the security degrades when k decreases. To accomplish this, we performed an experimental analysis of the described attack. The results of the simulations are summarized in Fig. 2, which shows what fork length an adversary can maintain with a probability of success of at least 0.1%. It is easy to see that when the time between blocks decreases, the adversary gets a chance to create a longer fork [11]. Our simulation shows that for the choice of k = 47.6 (as in Bitcoin), 6 confirmations are needed to be sure that the probability of a splitting attack is less than 0.1% (considering an adversary that possesses 35% of the hashing power). If we assume that the average block generation time is equal to the block propagation time (so that k = 1), then 9 confirmations are needed for the same level of security.


Fig. 2 The fork length that an adversary with a given hashing power can create with a probability of success of at least 0.1%. Different lines represent different choices of the parameter k

4.3 GHOST Splitting Attack

In this section, we first briefly describe the GHOST algorithm itself and then continue with the application of the splitting attack.

GHOST Overview The Greedy Heaviest-Observed Sub-Tree (GHOST) protocol was initially proposed as an improvement of the Bitcoin protocol that allows reducing the time between blocks while preserving the same level of security [3]. The main modification that was suggested is that blocks that are not included in the main chain can still contribute to a chain's irreversibility. The basic observation behind the protocol is that the blocks built on top of some block B add additional weight to block B even if they are not in the main chain. So, in contrast to the Bitcoin protocol, where only the blocks that are in the main chain contribute to the difficulty of this chain, in GHOST the whole sub-tree of blocks is considered (Fig. 3). More information can be found in [3].

Fig. 3 The calculation of the chain's difficulty D is shown for the Bitcoin and GHOST protocols. As can be seen, in GHOST even the blocks that are not included in the main chain add weight to it

Since the authors declare that the GHOST protocol has comparable security even with a short block generation time (it is stated that even when blocks are issued every second, the security level is the same as in the original Bitcoin protocol


Fig. 4 The fork length that an adversary with a given hashing power can create for the GHOST protocol with a probability of success of at least 0.1%. Different lines represent different choices of the parameter k

[3]), it becomes interesting to investigate the resistance of the GHOST protocol against the splitting attack.

Splitting Attack The splitting attack for the GHOST protocol is slightly different compared to Bitcoin [6]. There are two differences:
1. The adversary has to compensate the difference in the total number of honestly mined blocks on both branches at the end of each round, while in Bitcoin-like protocols he has to compensate only the maximal number of honestly mined blocks to keep both chains balanced.
2. All blocks produced by the adversary are always valid. This facilitates the attack for the adversary, because he can simply mine the first blocks after the common prefix of the two branches. In contrast, in Bitcoin, an adversary has to extend only the head of the diverging chains, so all blocks must be recent.

The results of the simulation (Fig. 4) show that the attack is extremely effective when the parameter k is close to 1.

5 Ouroboros Double-Spend Attacks

This section provides the analysis of the Ouroboros protocol. As stated in [1], it is the first provably secure proof-of-stake blockchain protocol with rigorous security guarantees, comparable to those achieved by the Bitcoin blockchain protocol. First, we briefly discuss the protocol itself and then present two models for different types of adversaries.


Fig. 5 A general scheme of the Ouroboros protocol. The time is divided into slots, and each slot has an associated stakeholder who should produce a block in this slot. It is not necessary that the block in the given slot will be produced (for instance, a corresponding stakeholder could be offline at the moment), but there is a strict rule that only one block can be produced in the slot

5.1 General Overview

As previously stated, Ouroboros is a proof-of-stake protocol; thus, it does not require heavy computations for block production. While in proof-of-work protocols like Bitcoin the blocks are produced by miners (who do not necessarily have a stake in the system), in Ouroboros only the stakeholders can produce blocks. Given that the stakeholders are well incentivized to maintain the overall stability of the system (as it would consequently keep the value of their coins), this creates an additional incentive for block producers to act honestly, thus making the system more secure in general.

The main idea behind the protocol is that time is divided into so-called epochs, and each epoch consists of a predefined number of slots [12]. Each slot has an associated stakeholder who should produce a block during the time of that slot. The model requires synchrony among stakeholders, and blocks that are produced in incorrect time slots are considered invalid. At most one block can be produced in a given slot (Fig. 5). The owners of the slots are chosen randomly before the beginning of the epoch. The randomness for the selection procedure is generated collectively by a set of stakeholders using a special cryptographic protocol based on the PVSS scheme (Publicly Verifiable Secret Sharing [13]).

5.2 The Attacks on the Common Prefix

Following the terminology given in [1], the attack that consists in creating a fork is called an attack on the common prefix. There are two possible models for an adversary that is going to create a fork: one whose adversarial behaviour is discovered immediately and one that remains covert. We will briefly describe both of them. Despite the rule that a slot winner can produce only one block per slot in a given chain of blocks, nothing prevents him from creating several blocks in the same slot but on different chains, thus creating a fork (see Fig. 6). An adversary


Fig. 6 An adversary that possesses some slots (depicted in red) tries to split honest slot winners on two chains, thus facilitating an attack

can facilitate an attack by publishing blocks on both chains, forcing the honest slot winners to be split between them. In what follows, we will call such an adversary a general adversary. While the described attack provides an adversary significant opportunities, it leaves a suspicious “audit trail” (multiple signed blocks at the same slot), which immediately reveals the malicious behaviour. This motivates considering a restricted class of covert adversaries, who produce not more than one block per slot (though not necessarily in the expected slot [1]). An interested reader could find more details in [1].

General Adversary Central to the security arguments given in [1] are the notions of characteristic and forkable strings. A characteristic string is a binary string in {0, 1}^n where each element indicates a slot that is assigned either to the adversary (denoted by 1) or to an honest user (denoted by 0). A forkable string is a characteristic string with such a disposition of adversarial slots that allows fork creation. Understanding the density of forkable strings among all characteristic strings helps to determine the probability of an attack. In [1], an upper bound on the probability of a string being forkable is given. In our research, we are interested in the exact probabilities of forks. To obtain such probabilities, we utilize a recursive algorithm that detects a forkable string (see lemma 4.18 in [1] for more details):

m(w1) = (\lambda(w) + 1, \mu(w) + 1) \quad \text{and} \quad m(w0) = \begin{cases} (\lambda(w) - 1, 0) & \text{if } \lambda(w) > \mu(w) = 0, \\ (0, \mu(w) - 1) & \text{if } \lambda(w) = 0, \\ (\lambda(w) - 1, \mu(w) - 1) & \text{otherwise,} \end{cases}   (5)

where w is a characteristic string; m(w) = (λ, μ) is the state of the string w represented by the two variables λ and μ; and m(ε) = (0, 0) is the initial state of the algorithm. Given a characteristic string w and the initial state m(ε), the state is updated sequentially with each element of the string. Finally, when all elements of w are processed, the variable μ is checked: if μ ≥ 0, then the string w is forkable, otherwise it is not.
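The recursion in Eq. (5) translates directly into a short forkability check; the sketch below follows the reconstruction of the rules given above:

```python
def is_forkable(w):
    """w: characteristic string as an iterable of 0 (honest slot) and 1
    (adversarial slot). Implements the state update of Eq. (5)."""
    lam, mu = 0, 0                      # m(empty string) = (0, 0)
    for bit in w:
        if bit == 1:
            lam, mu = lam + 1, mu + 1   # adversarial slot
        elif lam > mu == 0:
            lam, mu = lam - 1, 0        # honest slot, first case
        elif lam == 0:
            lam, mu = 0, mu - 1         # honest slot, second case
        else:
            lam, mu = lam - 1, mu - 1   # honest slot, otherwise
    return mu >= 0

print(is_forkable([1, 1, 0, 0, 1, 0]))
```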

Mathematical Models for the Ouroboros Protocol Based on Attacks …

113

Fig. 7 The matrix shows the probabilities that a random characteristic string w of length n ends up in state m(w) = (i, j). It is indexed by all possible λ and μ values that could be reached by a string of length n

Having such an algorithm, it is possible to calculate the overall probability of a fork for a string of particular length. It could be done by constructing a matrix of probabilities of all possible states (Fig. 7). The matrix could be calculated iteratively using the following rules (based on the algorithm 5): 0 p00,0 =1 and pi,j = 0, for i /= 0 or j /= 0, n−1 n−1 n =lam1 · q · pi−1, pi,j j−1 + mu1(1 − q)pi+1, j+1 n−1 n−1 + mu2(1 − q)pi+1, 0 + lam2(1 − q)p0, j+1

lam1 = 1 if i > 0, 0 otherwise;        mu1 = 1 if j ≠ 0, 0 otherwise;
lam2 = 1 if i = 0, 0 otherwise;        mu2 = 1 if j = 0, 0 otherwise;

where q is the fraction of the adversarial stake. Finally, the probability that an adversary with a fraction q of the stake would be able to create a fork of n slots can be defined as follows:

DS(q, n) = Σ_{i=0}^{n} Σ_{j=0}^{n} p^n_{i,j}                                  (6)
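The matrix computation behind Eq. (6) can equally be carried out in the forward ("push") direction, which avoids spelling out the indicator variables; the following is a minimal Python sketch (names are ours) that, under the reconstruction above, should roughly reproduce the general-adversary column of Table 2 for moderate n.

```python
from collections import defaultdict

def fork_probability_dp(q, n):
    """Fork probability DS(q, n): evolve the distribution over states
    (lambda, mu) of Eq. (5) slot by slot and sum the mass with mu >= 0."""
    dist = {(0, 0): 1.0}                        # m(eps) = (0, 0)
    for _ in range(n):
        nxt = defaultdict(float)
        for (lam, mu), p in dist.items():
            nxt[(lam + 1, mu + 1)] += p * q     # adversarial slot, probability q
            if lam > mu == 0:                   # honest-slot cases of Eq. (5)
                nxt[(lam - 1, 0)] += p * (1 - q)
            elif lam == 0:
                nxt[(0, mu - 1)] += p * (1 - q)
            else:
                nxt[(lam - 1, mu - 1)] += p * (1 - q)
        dist = nxt
    return sum(p for (lam, mu), p in dist.items() if mu >= 0)

def slots_for_security(q, target=0.999):
    """First n at which the fork probability drops below 1 - target
    (cf. Table 2); assumes q < 0.5 and is practical only for moderate n."""
    n = 1
    while fork_probability_dp(q, n) > 1 - target:
        n += 1
    return n
```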

Note that it is also possible to estimate the probability of a fork by simulating an attack directly. This can be done by generating random binary strings (taking into account the probability of an adversarial slot) and checking them with the algorithm of Eq. (5). The results conform to those obtained analytically with Eq. (6).

Covert Adversary. As previously stated, a covert adversary tries to keep the attack secret until he has created a branch of sufficient length. In this case, the adversarial behaviour is to refrain from publishing blocks in the honest chain (Fig. 8).

Fig. 8 A covert adversary tries to accumulate a sufficient amount of slots (depicted in red) to overcome the honest chain at some moment in the future

In the classical double-spend attack, it is assumed that an adversary has to create a fork of at least n blocks, where n is the number of confirmations that a user waits for before sending goods or providing a service. In this formulation, the attack with a covert adversary is basically close to the Bitcoin double-spend attack (see Sect. 2). Therefore, the probability of a fork after n blocks can easily be calculated using, for instance, the model of S. Nakomoto (see Sect. 3.1, Eq. (1)). Because of the deterministic nature of the block creation process in the Ouroboros protocol, it is more convenient to consider the security bounds as the number of slots that a user should wait to be sure (to some degree) that a fork cannot be created, as opposed to the number of blocks in the classical model [14]. In our model, for a successful attack, an adversary needs to create a fork of l slots (or longer). To do this, he needs to possess at least half of the slots at some point after slot l. The probability of this event consists of two components: the ability of the adversary to accumulate some slots before slot l, and the ability to catch up with the deficit (if any) after slot l. We assume that neither the honest users nor the adversary skip their slots, so there are no gaps. The number of slots that the adversary gets during a period of l slots is a random variable that follows a Binomial distribution. The probability of getting exactly m slots is the following:

P(m) = (l choose m) · q^m · p^{l−m}                                           (7)

where (l choose m) denotes the binomial coefficient; q is the relative amount of stake possessed by the adversary (which corresponds to the probability of a slot being under adversarial control); and p = 1 − q is the relative amount of stake possessed by the honest users. The probability of catching up from a deficit of z = n − m slots (where n = l − m is the number of honest slots) can be defined as a particular case of the Gambler's Ruin problem [15]:

C(z) = (q/p)^z    if q < p and z > 0,
       1          otherwise.


It follows that the probability of a successful attack where an adversary creates a fork of l slots is equal to:

DS(q, l) = Σ_{m=0}^{l} P(m) · C(l − 2m)
         = Σ_{m=0}^{[l/2]} (l choose m) · q^m (1 − q)^{l−m} · (q/(1 − q))^{l−2m} + Σ_{m=[l/2]+1}^{l} (l choose m) · q^m (1 − q)^{l−m}          (8)
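Equations (7) and (8) translate directly into a short calculation; the following sketch (names are ours) combines the Binomial slot count with the Gambler's Ruin catch-up term. With q = 0.1 it should give l = 11, in line with the first row of Table 2.

```python
from math import comb

def covert_fork_probability(q, l):
    """DS(q, l) of Eq. (8): a covert adversary with stake q builds a fork
    spanning l slots; z = l - 2m is his deficit after winning m of the l slots."""
    p = 1 - q
    total = 0.0
    for m in range(l + 1):
        p_m = comb(l, m) * q**m * p**(l - m)            # Eq. (7)
        z = l - 2 * m
        catch_up = (q / p)**z if (q < p and z > 0) else 1.0
        total += p_m * catch_up
    return total

def slots_for_security_covert(q, target=0.999):
    """First l at which the covert-fork probability drops below 1 - target
    (cf. the covert-adversary column of Table 2); assumes q < 0.5."""
    l = 1
    while covert_fork_probability(q, l) > 1 - target:
        l += 1
    return l
```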

5.3 Probabilities of a Fork

In order to get insights into the density of forks produced by the different types of adversaries and to compare them with other consensus protocols, we made calculations using the equations from Sect. 5.2. The results are given in Table 2. Because synchrony between time slots is assumed in the Ouroboros protocol, it does not make sense to consider the parameter k (time between blocks) as we did for the other consensus protocols.

Table 2  The number of slots that a user should wait to be 99.9% sure that his transaction would not be reverted by an adversary with the given stake

Adversarial stake   General adversary   Covert adversary
0.1                 15                  11
0.15                23                  17
0.2                 35                  25
0.25                55                  39
0.3                 94                  63
0.35                181                 115
0.4                 443                 265
0.45                1990                1077
0.46                3214                1687
0.47                5953                1992
0.48                14,157              2090
0.49                61,922              –


6 Protocols Comparison

This section provides a comparison between the different consensus protocols and adversarial models described in the previous sections. As a unified measure, we took the number of block confirmations (or time slots in the case of Ouroboros) needed to be sure that a given block cannot be removed from the blockchain with probability of at least 99.9% (in other words, the longest fork that an adversary with a certain hashing power/stake can create with probability of at least 0.1%). The chosen measure is relevant for real-world applications because it shows how long a user should wait before accepting a payment transaction, thus decreasing the possibility of the considered attacks to a sufficient level. The summarized results are presented in Table 3. It includes two models for Ouroboros (with general and covert adversaries), the classic Bitcoin double-spend attack, the Bitcoin splitting attack (including a hypothetical Fast Bitcoin with the block generation time reduced to one block per communication round, i.e. 12.6 s) and the GHOST splitting attack (both with 10 min and 12.6 s blocks).

To get further insight into the usability of the considered protocols, it is helpful to compare them by the average confirmation time. As different protocols have different times between blocks, this gives a more accurate picture of the security guarantees provided by the protocols against the different types of attacks. The time between two consecutive slots in the Ouroboros system is expected to be 20 s. The average time to mine a Bitcoin block is 10 min [2].

Table 3  The number of slots that a user should wait to be 99.9% sure that his transaction would not be reverted by an adversary with the given hashing power (or stake in the case of the Ouroboros protocol). Note that for Ouroboros the values represent the number of slots, while for the other protocols they represent the number of blocks

Adversarial stake   Ouroboros   Ouroboros   Bitcoin       Bitcoin     Fast Bitcoin   GHOST       Fast GHOST
(hashing power)     general     covert      (Rosenfeld)   splitting   splitting      splitting   splitting
0.1                 15          11          6             3           6              3           6
0.15                23          17          9             4           7              4           8
0.2                 35          25          13            4           8              6           11
0.25                55          39          20            5           9              9           19
0.3                 94          63          32            6           10             9           30
0.35                181         115         58            8           12             11          73
0.4                 443         265         133           9           14             12          185
0.45                1990        1077        539           14          18             13          509
0.46                3214        1687        844           –           –              –           –
0.47                5953        1992        1502          –           –              –           –
0.48                14,157      2090        3382          –           –              –           –
0.49                61,922      –           13,533        –           –              –           –


Table 4  The average confirmation time (in minutes) that guarantees, with probability of more than 99.9%, that a block would not be reverted from the blockchain

Adversarial stake      Ouroboros   Ouroboros   Bitcoin       Bitcoin     Fast Bitcoin   GHOST       Fast GHOST
(hashing power)        general     covert      (Rosenfeld)   splitting   splitting      splitting   splitting
Block generation time  20 s        20 s        10 min        10 min      12.6 s         10 min      12.6 s
0.1                    5           3.6         60            30          1.2            30          1.2
0.15                   7.6         5.6         90            40          1.4            40          1.6
0.2                    11.6        8.3         130           40          1.6            60          2.3
0.25                   18.3        13          200           50          1.8            90          4
0.3                    31.3        21          320           60          2.1            90          6.3
0.35                   60.3        38.3        580           80          2.5            110         15.3
0.4                    147.3       88.3        1330          90          2.9            120         38.8
0.45                   663.3       359         5390          140         3.7            130         106.9

During the analysis of the splitting attack, we also estimated the security bounds for Bitcoin with a reduced block generation time (12.6 s per block). The GHOST block generation times are the same as for Bitcoin. Table 4 and Fig. 9 show how long (in minutes) the confirmation period should be in order to reduce the probability of an attack to less than 0.1%. We can note that the Ouroboros protocol allows a block to be confirmed within 5 min against an adversary with 10% of the total resources, while Bitcoin needs 60 min to provide the same level of security. The splitting attack is more effective for systems with a short block generation time, but in the general case it is no better than the classical double-spend attack.
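For illustration, the figures in Table 4 follow from Table 3 simply by multiplying the confirmation count by the slot or block time quoted above; a small Python sketch with a few of the published values (the dictionary layout and labels are ours):

```python
# Conversion behind Table 4: waiting time = confirmations x block (or slot) time.
BLOCK_TIME_MIN = {
    "Ouroboros (general)": 20 / 60,      # 20-second slots
    "Ouroboros (covert)": 20 / 60,
    "Bitcoin (Rosenfeld)": 10.0,         # 10-minute blocks
    "Fast Bitcoin splitting": 12.6 / 60, # 12.6-second blocks
}

# Confirmations needed against a 10% adversary, taken from Table 3.
CONFIRMATIONS_AT_10_PERCENT = {
    "Ouroboros (general)": 15,
    "Ouroboros (covert)": 11,
    "Bitcoin (Rosenfeld)": 6,
    "Fast Bitcoin splitting": 6,
}

for name, n in CONFIRMATIONS_AT_10_PERCENT.items():
    print(f"{name}: {n * BLOCK_TIME_MIN[name]:.1f} min")
# e.g. Ouroboros (general): 5.0 min, Bitcoin (Rosenfeld): 60.0 min
```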

7 Conclusion

In this paper, we presented an analysis of different consensus protocols and adversarial models. A consensus protocol is a decentralized mechanism for reaching agreement, with economic rewards, in an online environment and without the help of any central authority or referee. It relies on market forces or voting processes rather than on the good will of trusted individuals. Adversarial models, in turn, are used in cybersecurity and economics applications to model the behaviour of an adversary. The main goal of this paper was to compare the well-known proof-of-work protocol that underlies Bitcoin with the new proof-of-stake algorithm introduced in Ouroboros.


Fig. 9 Comparison of the expected confirmation period (in minutes) for different protocols and adversarial models

We also had a look at the GHOST algorithm, which was originally intended to improve the Bitcoin consensus. As a measure of comparison, we considered the transaction confirmation time that allows us to be sure that the probability of a double-spend attack is less than 0.1%. Together with the known results for Bitcoin, we presented two models for two types of attacks on the Ouroboros protocol (for the general and covert adversaries). The models allow the calculation of the exact number of slots needed to achieve the required level of security. It was shown that the Ouroboros protocol achieves the required security level with a significantly shorter confirmation period than Bitcoin. We also examined a splitting attack that targets protocols with a short block generation time. Our simulations showed the possibility of this attack for the Bitcoin and GHOST protocols with both 10 min and 12.6 s blocks. The obtained results allow us to determine the security bounds for the Ouroboros system. This is important for real-world applications because it helps users figure out how long they should wait before accepting a transaction.

Future Work. Future work involves cutting-edge protocol developments in this field, with attention to scalability, latency, and reliability issues. One characteristic of the GHOST protocol is its low envelope cost, and thus its advantage over standard TCP. A blockchain, as a simple distributed datastore, enables nodes to store records independently without central coordination. Meeting the low-energy-utilization requirement will lower the economic and ecological demands, allowing a much larger deployment of transactions from large companies and institutions. To date, most of the parameters used in the traditional protocol were considered constant in Bitcoin, with the exception of Bmax. GHOST adopted other parameters as work-in-progress, such as MEXT and the MAX BLOCK SIZE*100 slow-growth factor. Most current Bitcoin researchers expect blockchain technologies to facilitate improved information interchange and proxy services for marketplaces, and both reference service providers highly anticipate that these promising financial instruments are attractive candidates for institutional investors.


References

1. Kiayias A, Russell A, David B, Oliynykov R (2017) Ouroboros: a provably secure proof-of-stake blockchain protocol. In: Cryptology ePrint Archive, Report 2016/889
2. Nakomoto S (2008) A peer-to-peer electronic cash system. [email protected], www.bitcoin.org
3. Sompolinsky Y, Zohar A (2013) Accelerating bitcoin's transaction processing. Fast money grows on trees, not chains. In: IACR Cryptology ePrint Archive, 2013/881
4. Rosenfeld M (2014) Analysis of hashrate-based double-spending. In: arXiv preprint arXiv:1402.2009
5. Pinzon C, Rocha C (2016) Double-spend attack models with time advantage for bitcoin. Electron Notes Theoret Comput Sci 329:79–103
6. Kiayias A, Panagiotakos G (2015) Speed-security tradeoffs in blockchain protocols. In: Cryptology ePrint Archive, Report 2015/1019
7. Barbian G, Mellentin F (2021) The cardano proof-of-stake protocol "ouroboros"
8. Neu J, Sridhar S, Yang L, Tse D, Alizadeh M (2021) Securing proof-of-stake nakamoto consensus under bandwidth constraint (longest chain consensus under bandwidth constraint). In: arXiv preprint arXiv:2111.12332
9. Garay J, Kiayias A, Leonardos N (2015) The bitcoin backbone protocol: analysis and applications. In: Advances in Cryptology—EUROCRYPT. 34th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Sofia, Bulgaria, Proceedings, Part II, pp 281–310
10. Decker C, Wattenhofer R (2013) Information propagation in the bitcoin network. In: IEEE International conference on peer-to-peer computing (P2P), Trento, Italy, pp 1–10
11. Aragon N, Blazy O, Deneuville J-C, Gaborit P, Zémor G (2022) Ouroboros: an efficient and provably secure KEM family. IEEE Trans Inf Theory 68(9):6233–6244
12. Byg RL (2022) Planetary politics: a manifesto: by Lorenzo Marsili, Cambridge, UK: Polity Press, 2021, 136 pp., $12.95 (paperback), ISBN: 978-1509544776
13. Schoenmakers B (1999) A simple publicly verifiable secret sharing scheme and its application to electronic voting. In: Advances in cryptology—CRYPTO 99. 19th Annual International Cryptology Conference, Santa Barbara, California, USA. Proceedings, vol 1666 of Lecture Notes in Computer Science, Springer, pp 148–164
14. Bala K, Kaur PD (2022) A novel game theory based reliable proof-of-stake consensus mechanism for blockchain. Trans Emerg Telecommun Technol 33(9):e4525
15. Feller W (1970) An introduction to probability theory and its applications. Wiley, New York

Hybrid Power Generation Forecasting Using an Intellectual Evolutionary Energy-Preserve Rate Clustering Technique

Julian L. Webber, Vellingiri Jayagopal, Abolfazl Mehbodniya, Sudhakar Sengan, Priya Velayutham, Rajasekar Rangasamy, and D. Stalin David

Abstract  The significant growth of Wind Power (WP) poses significant challenges, because the inconsistency and uncertainty of Wind Power Generation (WPG) make it difficult to ensure a stable power system. Combined solar energy and WP calculation is a good way to overcome the problem. Voltage and frequency fluctuations and harmonics in particular have a high impact on the power generation and distribution system; in the worst case, the power quality of the grid system is reduced. This can largely be solved with proper design, advanced rapid-response control facilities and optimization of the power system. A new type of multi-term Wind Energy (WE) forecasting system with an IEEE 30 bus model, based on digital weather forecasting and error-modification methods, is presented. An error assessment model based on the Evolutionary Energy-Preserve Rate Clustering (EEPRC) technique, which evaluates the probability ratio of the solar and wind energy prediction error, is proposed; it has the advantage of estimating from real wind farms and statistical Weather Forecast (WF) data and has proven to be an effective method for improving the predictive accuracy of multi-term WPG. The primary step of the Energy-Preserve Rate Clustering (EPRC) method is to calculate the uncertainty of the real-time data with respect to the predicted value. The computed values are helpful for increasing solar and WPG production. The proposed wind forecasting system is applied to coastal-area energy generation. It has also been demonstrated that wind speed and WPG, together with their probability ratio, can be evaluated with the EEPRC-based forecasting method, which is notable for its strong predictive capability for the complex properties of the controller function.

Keywords  Solar energy · Wind power · Grid forecasting · Energy-preserve rate · Clustering

J. L. Webber · A. Mehbodniya
Department of Electronics and Communication Engineering, Kuwait College of Science and Technology (KCST), Doha, Kuwait
e-mail: [email protected]
A. Mehbodniya
e-mail: [email protected]
V. Jayagopal
Department of Software & System Engineering, School of Information Technology & Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu 632014, India
e-mail: [email protected]
S. Sengan (B)
Department of Computer Science and Engineering, PSN College of Engineering and Technology, Tirunelveli, Tamil Nadu 627152, India
e-mail: [email protected]
P. Velayutham
Department of Computer Science and Engineering, Paavai Engineering College, Namakkal, Tamil Nadu 637018, India
e-mail: [email protected]
R. Rangasamy
Department of Computer Science and Engineering, GITAM School of Technology, GITAM University, Bengaluru 561203, Karnataka, India
e-mail: [email protected]
D. Stalin David
Department of Information Technology, Vel Tech Multi Tech Dr.Rangarajan Dr.Sakunthala Engineering College, Chennai 600072, Tamil Nadu, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_10

1 Introduction In renewable energy resources, WP has multiplied around the world. The WPG shows a significant fluctuation depending on wind speed. Therefore, the evaluation of the WPG is always related to certain uncertainties. Accurate wind forecasting in 2019–2022 (519 Giga watts) WP has generated globally. Also, in the prediction, 1000 Giga watts of power from WPG will increase globally. It is designed to support enormous power systems or microgrids within the scope of the intra-platform resource scheduling model, with a horizon of 5–10 min, which can effectively protect delivery system operators and progress power network control and management. The WP Forecasting technique is one of the most challenging Deals with the problem of wind flow in electrical systems. There is much examination on WP Prediction error. From the recent analysis of wind forecasting, the high penetration error rate will occur, even though similar control optimization like Support Vector Machine (SVM), Partial Swarm Optimization (PSO), and Artificial Neural Network (ANN) are executed to reduce the error rate. However, these previous techniques are subject to uncertainty in the WPG, which is enormous WP errors during the forecast; also, it is difficult to couple the generated power to the grid system. The uncertainty of the WP characteristics of the system problems during operation sends a large volume of WE to the grid system. The force of the wind depends on various weather conditions, especially the speed. WP is expected to have wind speeds based on Numerical Weather Forecast and Forecast (NWFP) models and their associated power curves, wind force, and speed. The system’s primary process to ensure stability is the WP System, which is the wind farm load forecast release power needed. Power load prediction is an


essential part of the energy network project. Based on the reasonable provisions for the Grid Action Plan, such as unit planning and control progression of the system, whether or not the wind flow-based load forecast is accurate, the energy network will directly impact the project and other management systems [1]. WP data is general time series data. The time series technique is considered as (hourly, weekly, and monthly) data are periodically monitored to enhance the power generation capacity of the wind plant. The information is continuously updated with time, and the value of a set of data becomes varied with time. Once the WP prediction model is recognized, the current value cannot be added to the model because the prediction model cannot evolve the process with the recent time series data with high accuracy; during the progression, the forecast model will increase the mathematical complexity. Numerous calculations are used to increase this monitoring data’s computational complexity and accuracy, including the realtime forecasting model in establishing the research. The Neuron classifier network is inspired by the EEPRC Technique, a method for examining the WPG in different periodic conditions, and the data is stored in the clustering space. The clustering process is classified into four layers: Data Prediction Layer (DPL), Data Storage Layer (DSL), Data Management Layer (DML), and Computational Layer (CL). These hidden layers are trained at every operation of the forecasting WPG with different time series. Also, the learning capability of neural networks is contingent on network model principles. In this process, sustaining primary information means adding or removing hidden layers; the operation of neurons is to transform network topology and learn new knowledge to create prediction accuracy so the predictive model can be improved [2]. The organization of the article is Sect. 1 is the introduction to the wind forecasting method. Section 2 is related works. Section 3 is the proposed AI model. Section 4 is the result and discussion about the work. Section 5 is the conclusion and future work of the paper.

2 Related Works The WPF is essential but challenging because of the randomness of the air. But recently, things like statistical and physical modelling, more accurate weather forecasts, a more complex operating kernel or more instruction inputs, and predicting how long something will last have been done in planning the process. Developing a network method for predicting WP Quantitation is proposed and rigorously evaluated in this analysis. More penetration and associated methods can be used for WP in one area by creating many first-to-many NWPs. Network for Correction and Power Mapping Wind and force interaction patterns are predictable and varied output range factors [3]. The grid’s reliability is significantly increased with the help of a precision WP Forecast. The procedure described in this system is ANN in wind speed and


power forecasts and historical data for WP plants. A new statistical short-term predictive model of WP, i.e. Analytical Relations Wind Plant and Forecasting Technology (ARWPT), is proposed. The suggested models collect weather data, the most critical WF parameters. It combines electric forecasts from three different NWP sources as well as the final predictions of generation hybrids. Compared with the suggested method, which also uses the Ordinary Least Squares (OLS) method, the information related to the wind time-domain power generation is essential for using WE [4]. The short-term multitime parameter method prediction of the Joint Probability Density Function (JPTF) for the windmill. This method works by making the prediction point of the WP the SVM is used, and the probability ratio of the SVM prediction error is assessed by the SVM, which accepts that a Gaussian distribution follows prediction errors. From the analysis of the previous technique, the different problems like probability function, prediction error, and accuracy are identified by using the EEPRC method, and the difficulties will be optimized [5–7].

3 Materials and Methods The specific analysis of the multi-term WP forecasting EEPRC models estimates WP curves. The proposed method reduces the sum of squared errors between observed and forecast values. The regression needs least-squares parameter estimation for specific functions previously defined, whilst an EEPRC technique is more determined by training data and learning algorithms. IEEE 30 bus loads in the system are executed for the analysis system to spontaneously regulate model parameters based on dynamic conditions according to time series [8]. Figure 1 shows the proposed block diagram.

Fig. 1 Framework of the proposed system model


3.1 Basis of Wind Power

Turbine size, wind speed and direction, position, and the dynamic performance and load distribution of parallel turbine generators [9] impact wind turbine power generation. The output characteristic of the wind turbine used in this work (p_W is the wind power) is as follows: the power generated by the wind = (air density × swept area × cube of velocity) / 2, Eq. (1) [10, 11]:

p_W = (1/2) · ρ · A_W · V^3                                                   (1)

where p_W = power in watts (W); ρ = air density (kg/m^3); A_W = area swept by the air (m^2); V = wind velocity (m/s). The power density, or specific power, at a particular site is expressed in terms of air density and wind speed in Eq. (2):

P = (1/2) · ρ · V^3 · A                                                       (2)

The total power produced by the system can be taken as the power compensated by the converter for an improved power supply. The power generation of the wind system can be expressed as in Eq. (3):

P_T = N_W · P_W                                                               (3)

where P_T = total volume of power generated; P_W = power generated by a wind turbine; N_W = number of wind turbines.
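As a worked illustration of Eqs. (1) and (3), the following minimal Python sketch computes single-turbine and farm power; the numeric values used in the example are assumptions, not data from this study.

```python
def wind_turbine_power(rho, swept_area, wind_speed):
    """Eq. (1): power extracted from the wind, p_W = 0.5 * rho * A_W * V^3 (watts)."""
    return 0.5 * rho * swept_area * wind_speed**3

def farm_power(n_turbines, turbine_power):
    """Eq. (3): total generation P_T = N_W * P_W."""
    return n_turbines * turbine_power

# Assumed example values: air density 1.225 kg/m^3, 2000 m^2 swept area, 8 m/s wind.
p_w = wind_turbine_power(1.225, 2000.0, 8.0)   # ~627 kW per turbine
p_t = farm_power(10, p_w)                      # ~6.27 MW for a 10-turbine farm
```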

3.1.1 Evolutionary Energy-Preserve Rate (EEPR)-Based MPPT System

The operating point of the WECS depends not only on the maximum extractable power but also on improving the WPG using a DC-DC converter, as shown in Fig. 2. With the MPPT concept, the WE generation relative to the speed of the air intercepted by the turbine can be enlarged [12]. The MPPT is designed to obtain maximum power from the wind generation and to optimize the generation voltage using a DC-DC converter. The steps of the wind-based MPPT algorithm are as follows:
Step 1: Collect the set of information from the wind system, including settings such as the network and the WP (voltage, current).
Step 2: Predict the wind-based power generation from field measurements and historical data.
Step 3: The speed-dependent voltage of the wind system is analysed and converted to the PWM of the converter, switching and maintaining the DC voltage [13].


Fig. 2 Wind-based MPPT system

Step 4: The change in the slope of the DC-link voltage is continuously monitored, together with the current, as it changes with the speed of the air.
Step 5: If the DC-link voltage slope is higher than the threshold Z_0, the reference inductor current Δi_L1^ref is updated with slope Z_1 over the speed-transition offset interval, where Δi_L1^ref is the current-determining step and Z_1 is adjusted according to whether the inductor current gradient increases or decreases.
Step 6: Eq. (4) compares and identifies the inductor current cycle:

Δi_L1 = i_L1(Z) − i_L1(Z − 1)                                                 (4)

Step 7: When p × Δi_L1 is greater than 0, the current is increased by the step size Δi_L1 = Z_1 × Δp, and when it is less than 0, it is reduced by the step size Δi_L1 = −Z_1 × Δp.
Step 8: The change in the reference inductor current is given in Eq. (5):

Δi_L1^ref(Z + 1) = Δi_L1^ref(Z) + Δi_L1                                       (5)

Step 9: At average wind speed the system works in the normal prediction mode, and the model is re-predicted upon sudden changes in wind speed.
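Because the printed steps are compact, the following is only a loose Python sketch of the perturb-style reference-current update they describe (Eqs. (4) and (5)); all names, and the use of Δp as the measured power change in the Step 7 sign test, are our assumptions rather than the authors' implementation.

```python
def mppt_update(i_l_prev, i_l_now, i_ref, delta_p, z1, slope, slope_threshold):
    """One iteration of the Step 4-8 update: if the DC-link voltage slope exceeds
    the threshold, perturb the reference inductor current in the direction that
    increased the extracted power (a hedged interpretation of the steps above)."""
    if abs(slope) <= slope_threshold:
        return i_ref                             # normal prediction mode (Step 9)
    delta_i = i_l_now - i_l_prev                 # Eq. (4): change in inductor current
    step = z1 * delta_p if delta_p * delta_i > 0 else -z1 * delta_p   # Step 7
    return i_ref + step                          # Eq. (5): accumulate the change
```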

3.1.2 Bi-Directional Converter-Based Battery Management

Figure 3 shows the control circuit for the recommended Bi-directional Converter (BDC)-based battery energy system. This converter circuit consists of an input capacitor, MOSFET switches, an output inductor and a capacitor, together with a battery. The diodes are the anti-parallel body diodes of the power MOSFETs. If the energy from the power supply is large enough, the circuit operates in charging mode to recover the excess energy and store it in the battery. When no charge is available, the circuit continues to function in discharge mode to provide the required energy to the load. Thus, whether or not the power supply is operating correctly, the BDC can provide the grid with consistent and reliable power [13, 14].


Fig. 3 Bi-directional converter-based battery management

3.2 Time Series Clustering of WPG

Wind turbines running normally generate large amounts of data, which can be viewed as time series for the training model, and self-improvement methods are often used to develop predictive models. However, inconsistent WP behaviour can alter the system generation over different time spans within the training models [15], and it is then problematic to obtain good results with such a training model. Therefore, clustering-based data mining applied to the training samples can be considered a good way to improve forecast accuracy.

3.2.1 Samples

WPGs over different respective periods have a powerful impact on the WFS, so a model with a similar wind time series is selected by the clustering system. Historical data makes it possible to forecast by the hour, day, and month. The proposed wind forecasting model considers the different time-series-based clusters in order to forecast with high accuracy, Eq. (6):

WP = WPS_min + WPS_Average + WPS_max + · · · + WPS_n                          (6)

(6)

Clustering Algorithm

Clustering is a form of unsupervised learning; there are no labels in the training data. This work utilizes the EEPRC technique to group related patterns into categories. Typical clustering algorithms include 'E'-clustering and Expectation Maximization (EM). Evolutionary (E)-clustering is used here; this choice is justified because evolutionary clustering can handle large data sets effectively. The algorithm is a partitioning-based clustering method, in which cluster membership is usually determined by a distance calculation. The core of E-clustering is the set of centre points, and the partitions are initially selected at random in terms of the distance between E and the focal point. Using the Euclidean distance, each data point is assigned to the closest focal point, and the centre of cluster E is then recomputed as the mean of its members:

WP_E = (1/N_E) · Σ_{i=1}^{N_E} x_i^E                                          (7)
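A minimal sketch of the partition-based E-clustering step described above (a k-means-style interpretation; the function name, NumPy usage and array layout are our assumptions):

```python
import numpy as np

def e_clustering(samples, k, iters=50, seed=0):
    """Assign each wind-power time-series sample to the nearest centre
    (Euclidean distance) and update each centre as the cluster mean of Eq. (7)."""
    rng = np.random.default_rng(seed)
    centres = samples[rng.choice(len(samples), size=k, replace=False)]
    for _ in range(iters):
        # nearest-centre assignment
        dists = np.linalg.norm(samples[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Eq. (7): each centre becomes the mean of its members
        centres = np.array([
            samples[labels == e].mean(axis=0) if np.any(labels == e) else centres[e]
            for e in range(k)
        ])
    return labels, centres
```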

3.3 The EEPRC-Based Classification Model

Only the EPRC is used for the predictive method; the time-series clustering of the predictions was performed in the previous section. For each user, a four-step clustering and association analysis is made across the layers DPL, DSL, DML, and CL, as shown in the layer graph of Fig. 4. The results for each load are then added to the graph threading of the DPL, DSL, DML, and CL layers. A large volume of data is produced by the normal operation of the wind turbine. The training models can be viewed as time series, and EEPRC methods are used to arrange the predictive models. However, the random behaviour of the wind may cause conflicts in the training samples, and it is then challenging to obtain the desired result when using such a training sample. Therefore, samples from special days are classified into the same category of training samples, which can be considered an excellent way to improve prediction accuracy. In Fig. 5, [P1, P2, P3 … PN] is the input variable, [Q1, Q2, Q3 … QN] is the weight between the source response and hidden layers, and [R1, R2, R3 … RN] is the weight between the hidden and output layers, with x(t) the estimated gain. The common sigmoid function, written as Eq. (8), is used as the transfer function to solve the nonlinear problem.

Fig. 4 EEPRC-based classification model


Fig. 5 Data prediction layer

Z(t) = f(x) = 1 / (1 + e^{−x})                                                (8)

Similarly, Eq. (9) can be formulated for the 'U' layer:

u = f_1( Σ_k u_k · z_k )                                                      (9)

With this, the forward-propagation step ends. The error signal 'e' is then created by the function of Eq. (10):

e = (1/2) · (Z_i(t) − u_i)^2                                                  (10)

The total system load is based on the predicted results of the total individual load.
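A minimal sketch of the forward pass through the data prediction layer of Fig. 5, using the sigmoid transfer function of Eq. (8) and the error signal of Eq. (10); the weight shapes and function names are our assumptions, not the authors' code.

```python
import numpy as np

def sigmoid(x):
    """Transfer function of Eq. (8)."""
    return 1.0 / (1.0 + np.exp(-x))

def dpl_forward(p, q_weights, r_weights):
    """Forward pass of the data prediction layer: inputs [P1..PN],
    input-to-hidden weights Q, hidden-to-output weights R (cf. Eq. (9))."""
    hidden = sigmoid(q_weights @ p)      # hidden-layer activations
    return sigmoid(r_weights @ hidden)   # output u

def half_squared_error(target, output):
    """Error signal of Eq. (10): e = 0.5 * (Z(t) - u)^2."""
    return 0.5 * (target - output) ** 2
```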

3.4 EEPRC Algorithm

Step 1: Set the network structure.
Step 2: Fix the threshold of forecast precision.
Step 3: Classify each hidden node with its weight indicator.
Step 4: Add the hidden time-weight changes to the influencing factors.
Step 5: Re-evaluate the distances.
Step 6: Analyse the performance.


4 Result and Discussion

To simulate the conceptual model, MATLAB software is used, and numerical values are used to verify the performance of the method. The simulation has been done on four groups of real "energy data". In this work, the proposed forecasting technique is simulated and compared to other techniques. This work proposes to solve the probabilistic error ratio based on the forecasting data for the WPG. The following details are analysed in the result section. The probability distribution of the WPG real-time data is analysed with the EEPRC method based on the WP and grid power systems. For the irregular distribution model, the WPG over different time series is analysed, together with its probability ratio, against the grid power system. The WPG is also analysed with the battery storage unit, and its probability ratio is calculated with respect to the grid power value. To evaluate the performance, two standard computational metrics are used: (a) the Root Mean Square Error (RMSE) and (b) the Mean Absolute Error (MAE). In Eqs. (11) and (12), P'_i is the actual power, p_i is the predicted power, and N is the number of test data points. The wind power model mentioned above and its power generation data are evaluated with the proposed EEPRC forecasting approach using Eqs. (11) and (12):

RMSE = sqrt( (1/N) · Σ_{i=1}^{N} (p_i − P'_i)^2 )                             (11)

MAE = (1/N) · Σ_{i=1}^{N} |p_i − P'_i|                                        (12)
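Equations (11) and (12) correspond to the following straightforward NumPy sketch (function names are ours):

```python
import numpy as np

def rmse(actual, predicted):
    """Eq. (11): root mean square error."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return np.sqrt(np.mean((predicted - actual) ** 2))

def mae(actual, predicted):
    """Eq. (12): mean absolute error."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return np.mean(np.abs(predicted - actual))
```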

Figure 6 represents the IEEE 30 bus single line diagram and shows a transmission line in a standard 80-node system. IEEE 30 has generated six buses: 1, 2, 5, 8, 11, and 13 branch converter transformer on-load tap. In buses 17, 20, and 24, reactive power sources are being considered. The minimum and series line data, bus data, maximum limit of generated data, and control variables are accepted. The transformer transparent piping system requires the upper and lower bus via various transformers. In this analysis, we ran 72 tests to determine how to deal with OPF’s different objective functions. Figure 7 shows that the hourly prediction data of the proposed WPG is evaluated with the predictive error ratio. In addition, the hourly power generation of the WP is computed with grid power, determining whether it is necessary or not to interconnect both power sources. If the error prediction ratio is below 0.20%, the WP is connected to the grid. From the survey, the average power generation WPF is 1545 kW with a probability ratio of 0.638%. The proposed EEPRC method-based forecasting system will precisely analyse the hourly based WPG. Figures 8 and 9 show the performance analysis of the WPG under different time durations; by comparing with the grid power generation, it is evaluated for forecasting system the compensation accessibility is implemented to stabilize the system. Figures 10 and 11 show the daily prediction data of the proposed WPG system,


Fig. 6 Hourly prediction data

Fig. 7 Daily predictive rate

which is compared with the real grid power data and evaluated with the predictive error ratio. The system’s unreliability in this regard is regarded as its inability to meet daily peak loads. The grid power generation is integrated whenever WP equals the grid power generation. Based on a weekday’s demand-based WPG system average ratio of 1516.14 kW. Furthermore, the obtained results are used to evaluate 0.688% of the predictions. The frequent power demand calculation is a must because the WPG’s performance is described by the weekly data. Figure 12 shows the WPG based on the weekly data with a correct prediction ratio. The proposed WP model, which is compared with the real grid power data, shows that the monthly prediction data shown in Fig. 13 shows that the predictive ratio is meagre. The examination result shows that the average monthly power is 1605.16 kW with a probability ratio of 0.67. Different weather conditions will produce different amounts of power, so the monthly WPG data are needed for validation. Figure 14


Fig. 8 Hourly WPG

Fig. 9 Weekly power demand prediction

Fig. 10 Monthly predictive rate


Fig. 11 Average predictive rate

Fig. 12 Monthly power generation

shows the monthly power generation data and the prediction ratio, which is vital for the stability forecasting system. Figure 15 shows the average power analysis of the system based on the grid power, and without grid power, the power rate efficiency is also measured in this system. Finally, based on the result, the error ratio is calculated. Figure 16 shows the battery-based energy management; during the initial stage of operation, the storage system is under charging conditions, so it cannot provide any energy to the grid system. When fully charged, it will deliver energy when WPG is low. From the analysis of the above, battery-based energy compensation during the mean ratio is evaluated; during the peak power generation, the battery


Fig. 13 Battery management chart

will compensate for the maximum operation time of 84.59 h, which is evaluated by the proposed EEPRC system. Fig. 14 Voltage magnitude of the bus system

Fig. 15 Active power losses


Fig. 16 Performance of WP with cost ratio

Based on the result of the analysis, the battery storage element will provide energy to the system when the grid power system is unavailable. Depending on the battery capacity, it will compensate for the 84.89 h of energy management in this forecasting system. The energy demand will be evaluated based on actual demand and predictive demand. The practical calculation of the total energy demand for WPG is calculated. Energy demand is a significant factor in the bus system, and the energy demand in the bus system during the differential period—the optimal power flow with IEEE 30 bus system load level and high load condition. Red and violet lines indicate voltage levels. Predictions and nominal values. The measured power losses were analysed using 100 iterations, and the power flow was controlled using the proposed EEPRC technique. In this system, the performance and cost ratio of the proposed grid power model is evaluated based on the parameters of the wind speed (m/s), wind speed (kph), and power density in (watts/m2 ); based on this data, the cost of the WP is predicted. The forecasting of the utility cost ratio for the respective watts; because of the different kinds of power produced in various periods, the cost ratio needs to be evaluated. The performance of the data is classified by hourly, weekly, and monthly analysis, and predicting 7 min. of the low root’s mean and absolute error can be accomplished by EEPRC.

5 Conclusion and Future Work WP forecasting uses the EEPRC approach in this work. This system mainly used a data cluster based on power conditions with different time series. The correlation data is used to compute the distance between different time clusters. Implementing the proposed EPRC method reduces complexity and also accurately analyses the probability ratio of the WP system, which is used to improve the stability and accuracy of grid-connected WP. To prove its effectiveness, the method is recommended to be tested based on accurate WPG data. From the result analysis hourly, the average power generation WPF is 1545 kW with a probability ratio of 0.638%. WPG system average ratio of 1516.14 kW on weekdays, also 0.688% of prediction. In the monthly demand-based WP system, the examination result shows that the average monthly power is 1605.16 kW with a probability ratio of 0.67. The RMSE is 3.26, and the


MAE is 3.26; these results show that the recommendations have significant benefits. The simulation test results show that the WP is highly efficient in synchrony with the grid power enhanced prognostic model. A WP load forecasting EEPRC algorithm with excellent utility. In the future, the forecasted WP data will be utilized for the WP compensation system. Also, in the future, hybrid solar and wind-based forecasting will be implemented and analysed for its combined power generation capability and load demand compensation ratio.

References 1. Aburiyana G, Aly H, Little T (2021) Direct net load forecasting using adaptive neuro-fuzzy inference system. In: IEEE Electrical power and energy conference (EPEC), pp 131–136 2. Dubey NK, Chawla MPS, Malviya L (2021) An artificial neural network-based forecasting strategy for estimating weather parameters: application for sizing stand-alone renewable power system. In: 10th IEEE International conference on communication systems and network technologies (CSNT), pp 279–284 3. Loka R, Parimi AM, Srinivas S (2022) Model predictive control design for fast frequency regulation in hybrid power system. In: 2nd International conference on power electronics & IoT applications in renewable energy and its control (PARC), pp 1–5 4. Mohandes B, Wahbah M, Moursi MSE, El-Fouly THM (2021) Renewable energy management system: optimum design and hourly dispatch. IEEE Trans Sustain Energ 12(3):1615–1628 5. Möws S, Wiegel B, Becker C () Day-ahead optimization of frequency containment reserve for renewable energies and storage. In: IEEE PES Innovative smart grid technologies Europe (ISGT Europe), pp 1–5 6. Pan W, Gao H (2021) A multi-source coordinated spinning reserve capacity optimization considering wind and photovoltaic power uncertainty. In: 6th Asia Conference on power and electrical engineering (ACPEE), pp 1346–1350 7. Radosavljevi´c J, Arsi´c N, Štatki´c S (2021) Dynamic economic dispatch considering WT and PV generation using hybrid PSOS-CGSA algorithm. In: 20th International symposium INFOTEHJAHORINA (INFOTEH), pp 1–6 8. Shah S, Koralewicz P, Gevorgian V, Liu H, Fu J (2021) Impedance methods for analyzing stability impacts of inverter-based resources: stability analysis tools for modern power systems. IEEE Electrification Mag 9(1):53–65 9. Wang Q, Hobbs WB, Tuohy A, Bello M, Ault DJ (2022) Evaluating potential benefits of flexible solar power generation in the southern company system. IEEE J Photovoltaics 12(1):152–160 10. Wu Z et al (2021) Real-time scheduling method based on deep reinforcement learning for a hybrid wind-solar-storage energy generation system. In: The 10th Renewable power generation conference (RPG), pp 409–413 11. Sudhakar S, Pandian SC (2012) Secure packet encryption and key exchange system in mobile ad hoc network. J Comput Sci 8(6):908–912 12. Sudhakar S, Pandian SC (2016) Hybrid cluster-based geographical routing protocol to mitigate malicious nodes in mobile ad hoc network. Int J Ad Hoc Ubiquitous Comput 21(4):224–236 13. Sudhakar S, Pandian SC (2015) Investigation of attribute aided data aggregation over dynamic routing in wireless sensor. J Eng Sci Technol 10(11):1465–1476


14. Sudhakar S, Pandian SC (2013) Trustworthy position-based routing to mitigate against the malicious attacks to signifies secured data packet using geographic routing protocol in MANET. WSEAS Trans Commun 12(11):584–603 15. Sudhakar S, Pandian SC (2013) A trust and co-operative nodes with affects of malicious attacks and measure the performance degradation on geographic aided routing in mobile ad hoc network. Life Sci J 10(4s):158–163 16. Priyadarshni AU, Sudhakar S (2015) Cluster based certificate revocation by cluster head in mobile ad-hoc network. Int J Appl Eng Res 10(20):16014–16018

Machine Learning-Based Indian Stock Market's Price Movement Prediction and Trend Analysis

Athira, Arya Raj, Achu Pushpan, and R. C. Jisha

Abstract  Stock markets play an essential role in the economy by allowing entrepreneurs to raise funds and businesses to grow their operations with the help of market finance. In this project, we design a web application tool that does both stock price prediction and trend prediction by applying some of the most important machine learning algorithms. We forecast the price of the stock by using the best-performing algorithm for the specified dataset along with buy or sell recommendations. For prediction, regression-type models such as linear regression, lasso regression, decision tree regression, ridge regression, Stochastic Gradient Descent (SGD), and Support Vector Regression (SVR) are used in this project. Moving Average Convergence Divergence (MACD) and Fibonacci retracements are also used to determine the uptrend and downtrend. We have found that linear regression performs well compared to the other algorithms.

Keywords  Decision tree regression · Fibonacci retracements · Lasso regression · Linear regression · Moving average convergence divergence (MACD) · Stochastic gradient descent · Stock market · Support vector regression

Athira (B) · A. Raj · A. Pushpan · R. C. Jisha Department of Computer Science and Applications, Amrita Vishwa Vidyapeetham, Amritapuri, India e-mail: [email protected] A. Raj e-mail: [email protected] A. Pushpan e-mail: [email protected] R. C. Jisha e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_11



1 Introduction The sector which plays an essential role in the economy of a country is Financial Sector. The stock market is one of the key requirements for a thriving financial industry. It includes day-trading, long-term investment, and profit-booking of company shares. The method of estimating the future value of a company’s financial stocks is known as stock market forecasting. It has a considerable impact on the Indian economy and the value of the Gross Domestic Product (GDP). The entire market value of all products and services produced in a particular year is termed as GDP. It also reveals us how a country’s economy do better to that of other countries. As a result, as the GDP rises, we may anticipate the country’s economy to grow as well. The stock price forecast based on historical data will tremendously assist consumers in understanding and be helpful in determining where and how to invest in order to prevent losing money. For statisticians and financial analysts, stock market forecasting is a challenging task. The main rationale for this projection is that it entails purchasing stocks that are expected to gain in value and selling those that are expected to fall in value [1]. A Trading volume or Volume of Trade is one of the most essential factors influencing the fluctuations in the stock market. The number of lots purchased and sold on a daily basis is referred to as stock trading volume. Despite the fact that the chance of a share’s value falling too far owing to market volatility is minimal, there are still hazards. These variations, which impacts on the volume of trade and market value too, are tough to anticipate. The impact of market changes on the rise or fall in risk for investors or even people’s behaviour in terms of capital savings or investment. So, anticipating stock market performance through several techniques and methodologies may help investors act with more confidence by taking into consideration the risks and volatility of an investment and understanding when to acquire at the lowest price and sell at the highest price [2]. The dataset for the proposed work is taken from the National Stock Exchange of India (NSEI) website. It has various parameters like Date, Open, High, Low, Close, Adjusted Close, Volume, and so on. The date on which the specific stock results are recorded is referred to as “date”. When a stock market commences, the “open” price is the inaugural price at which stocks trade, whilst the “high” price is the top price recorded on the trading day [3]. The parameter “close” refers to the stock price at the end of a trading day or session when a stock market closes, and the lowest price on the trading day is referred to as “low”. The parameter “adjusted close” or “adjusted closing price” is the price at the end of a trading day or period that has been modified for company activities like stock dividends or stock splits. It’s mostly used to look at how a stock’s price could fluctuate in the future. The term “volume” is the number of shares purchased and sold during a specific period of time. Figure 1 shows a sample dataset model of Infosys. In the proposed work, regression-type models such as Linear Regression, Lasso Regression, Decision Tree Regression, Ridge Regression, Stochastic Gradient Descent (SGD), and Support Vector Regression (SVR) are implemented. Fibonacci


Fig. 1 Sample dataset

Retracements, along with Moving Average Convergence Divergence (MACD) as a technical indicator, are used to tell whether to buy or sell stocks. The root mean square error (RMSE) value and the R2 score are applied for the evaluation. In mathematics, regression analysis is a quantitative approach for discovering and modelling the connections between variables. It can be also used to analyse which independent variables impact which dependent variables and to assess the types of associations which exist between them. It also defines the response variable as a function of one or more potential determinants. It is called as the response function which is seldom chosen in such a way that it predicts responses without inaccuracies. So, we use regression analysis to try to minimise mistakes and average them to zero or less. It may also be used to anticipate an event’s possibility of a dependent variable given unconnected factors—that is, the average value of the dependent variable, whilst the variables which are independent stay unchanged [4]. As for the algorithms, linear regression can be used to determine the state of the independent labels and dependent labels, which aids with inside the forecasting of values. It works for some of the exclusive independent labels. The linearity of the variables is an essential prerequisite for linear regression. Furthermore, the correlation values are useful in figuring out any dependency existence. It can also be used to make predictions. It’s a form of supervised learning model. Decision tree regression also comes under the supervised learning model. It’s a form of tree-based structure that’s utilised to anticipate the dependent variable’s numeric outcomes. It is also known as the MP5 algorithm. It recognises an object’s properties and trains a tree-based model to anticipate future data and generate meaningful continuous output. Support Vector regression is a type of supervised learning model. It is pretty similar to SVM apart from that it is a regression algorithm with one extra configurable parameter ∈ (epsilon). SVR has the distinct feature of attempting to minimise the generalised error bound instead of the actual training error in order to achieve high performance by fitting inside a certain threshold. The L1 regularisation strategy is used by the Least Absolute Shrinkage and Selection Operator Regression (LASSO) or Penalised Regression Method, whereas the L2 regularisation approach is used by Ridge Regression. Penalised and ridge regression, generally known as “L1 and L2 regularisation”, are two methods for reducing model complexity and avoiding overfitting. Ridge regression reduces multi-collinearity and also the complexity of the model by reducing the coefficients whereas the penalty regression method can aid


in feature selection as well as minimising over-fitting. Stochastic gradient descent (SGD) is a machine learning optimisation method that finds the values of parameters or coefficients of functions that minimise a cost function. It essentially performs a simple SGD learning method that can fit linear regression models with a number of penalties and loss functions. The technical indicators used are Moving Average Convergence Divergence (M ACD) and Fibonacci retracements. MACD is all about the convergence and divergence of two moving averages. Convergence happens when two moving averages advance in the same path, whereas divergence occurs when they move oppositely. A traditional MACD is built with a 12 day and a 26 day Exponential Moving Average (EMA). Both EMAs are calculated using closing prices. The convergence and divergence (CD) value is calculated by subtracting the 26 EMA from the 12 day EMA. The ‘MACD Line’ is a simple line graph that represents this. Fibonacci retracement levels are horizontal lines generated from the Fibonacci sequence that indicates where support and resistance might be foreseen. Each level is allotted a percentage. It shows how far a prior move has been retraced. 23.6% (Ratio-0.2316), 38.2% (R-0.382), 50% (R-0.5), and 61.8% (R-0.618) are the Fibonacci retracement levels. The indicator is beneficial because it may be established between any two significant price points, such as “high” and “low”. Between those two positions, the indicator will generate levels. The rest of the paper is structured as follows. In Sect. 2, reviews the existing works in this area. Section 3 describes the proposed model. Section 4 describes and methods and algorithm implemented. Section 5 describes the Result analysis. Section 6 describes the conclusion. Section 7 describes the Future scope.
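A minimal pandas sketch of the two indicators described above; `close` is assumed to be a pandas Series of daily closing prices, and the retracement ratios used are the standard 0.236/0.382/0.5/0.618 values.

```python
import pandas as pd

def macd_line(close, fast=12, slow=26):
    """MACD line as described above: 12-day EMA minus 26-day EMA of closing prices."""
    ema_fast = close.ewm(span=fast, adjust=False).mean()
    ema_slow = close.ewm(span=slow, adjust=False).mean()
    return ema_fast - ema_slow

def fibonacci_levels(high, low):
    """Fibonacci retracement levels between a swing high and a swing low
    (assumes an up-move from low to high is being retraced)."""
    diff = high - low
    ratios = [0.236, 0.382, 0.5, 0.618]          # standard retracement ratios
    return {f"{r:.1%}": high - r * diff for r in ratios}
```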

2 Related Work There are a lot of chances to occur uncertainty in the stock market trend and price. As a result, predicting it’s far extraordinarily hard. Stock markets give possibilities for investors to earn while also posing hazards. Scholars and professional investors have studied stock market timing strategies extensively, producing ideas and hypotheses along the way. The stock market has such a large impact on individual and national economies, forecasting the price of a stock and analysing the trend of the market is crucial in various circumstances and also the factors such as economics, politics, and the news which affect the stock market price prediction. Table 1 provides the comparison of different machine learning algorithms with regard to the related work. For as long as stock markets has incepted, a great deal of study has gone into constructing models that can forecast stock price movements and closing stock price. Regression methods are one type of machine learning technology. Legendre introduced the first kind of regression, the least squares technique, in 1805. Gauss developed the Gauss- Markov theorem in 1821, which is a further extension of the idea of least squares. Francis Galton coined the term “regression” to characterise a biological process in the nineteenth century [5]. Siew and Nordin [5] found a set of


Table 1  Comparison of different techniques

Technique | Advantage | Disadvantage | Parameters used/evaluation metrics
SMO regression | Performing better compared to other regressions | Gives better performance only in ordinal form | Closing price
MLP (multi-layer perceptron) | Performs well on the test dataset | Its functioning depends on the training quality | Foreign exchange (FEX) rates, silver and gold prices, interest rates, oil prices, news, and social media feeds
Lasso regression | Performs better than ridge | Performance is based on the dataset chosen | RMSE and MAPE
Random forest | Performs better with input data in which technical attributes are continuous values | Performance is least on trend deterministic data | Direction of stock and stock price index
CNN | Performs better compared to RNN and LSTM | Massive training data is required | Day stamp, time stamp, transaction id, stock price, volume

standardised ordinal data, the results of regression algorithms could be enhanced for predicting stock price patterns. They use two datasets, one with the original source data in real numbers and the other with the converted values from dataset 1 in ordinal form. The result on dataset 2 shows that using modified ordinal data improves the result. One of the regression approaches performed on dataset 2 exhibit greater rates of correlation coefficients with reduced margins of error when compared to dataset 1. Dataset 2 has stronger rates of correlation coefficients with reduced margins of error than dataset 1, with the exception of one regression methodology. The SMO Regression approach was one of the regression strategies employed on dataset 2 and produced an acceptable result. Usman et al. [6] use Multi- Layer Perceptron (MLP), Support Vector Machine (SVM), Radial Basis Function (RBF), and Single Layer Perceptron (SLP) to forecast the stock trend of the Karachi Stock Market and compare the results. Foreign exchange (FEX) rates, silver and gold prices, interest rates, oil prices, news, and social media feeds are amongst the characteristics employed in the model. Autoregressive Integrated Moving Average (ARIMA) and Simple Moving Average (SMA) are two classic statistical approaches that are utilised as input. The SVM method fared well on the training data set, whereas the MLP algorithm performed well on the test data set. The test set must be distinct from the examples used in the training set in order for the model to be evaluated on entirely fresh and unknown scenarios. As a result, MLP appears to be more effective in forecasting market performance. Roy et al. [7] use LASSO and Ridge Regression for predicting the stock price of the Goldman Sachs Group and compare the results. The performance of these techniques are contrasted using the root mean square error (RMSE) and mean absolute


percentage error (MAPE). It was discovered that the LASSO regression method's testing-set MAPE is lower than the Ridge regression method's, implying that LASSO is superior to Ridge. Milosevic [8] discusses a machine learning-assisted approach for predicting long-term equity movement. JRip, Support Vector Machines with Sequential Minimal Optimisation, C4.5 decision trees, Logistic Regression, Random Trees, Naive Bayes, Random Forest, and Bayesian Networks are used to train models. To begin, all of these algorithms were run through a tenfold cross-validation with all of the data. Indicators and price history are employed as attributes. Manual attribute selection was then carried out by eliminating attributes one by one and checking whether the algorithm's performance improved. With the selected features, random forests remained the highest-performing method. Patel et al. [9] examine and contrast four different prediction models, Support Vector Machine (SVM), Artificial Neural Network (ANN) [10, 11], Naive Bayes, and Random Forest, with two ways of supplying data to these models for anticipating the direction of stock movement and the stock price index for Indian stock markets. The first approach computes technical metrics from stock trading data, whereas the second employs trend deterministic data to represent these technical features, and then the precision is calculated. According to the findings, for the first method of input, in which technical attributes are given as continuous values, random forest surpasses the remaining prediction models in terms of competence in general. Sreelekshmy et al. [12] use RNN, LSTM, and CNN to present a deep learning formalisation for stock price prediction; the CNN architecture has been found to be the best model for detecting changes in trends. Kuttichira et al. [13] use Dynamic Mode Decomposition (DMD), under the assumption that the stock market is a dynamic system, to propose a price forecast. DMD generates forecasts for a larger number of businesses at once, and the interdependencies between the firms are taken into account while making these projections. Ashok and Prathibhamol [14] use ARIMA and LSTM for prediction. The ARIMA representations are well known for time series calculations. The paper sheds light on the common practice of using the ARIMA model when building stock price-based analytical work; in order to predict future movements of stock prices based on historical data, the work also focuses on the use of LSTM networks.

3 Proposed Model

The proposed model, a web application tool, focuses on forecasting the stock's closing price and recommending whether to purchase or sell the stock. The datasets are obtained from the National Stock Exchange (NSE) website and cover various sectors such as information technology (IT), pharmaceuticals, and healthcare, amongst others. The architecture diagram is shown in Fig. 2. After importing the dataset, we began data pre-processing with the data cleaning step: the dataset was checked for missing values and null values were eliminated. The data of


Infosys is visualised using a set of charts such as the price chart (Fig. 3), the annual profit triangle, and more. The data is normalised before applying machine learning techniques such as linear regression, decision tree regression, support vector regression (with various kernels such as linear, radial basis function (RBF), and poly), lasso regression, stochastic gradient descent (SGD) regression, and ridge regression. The RMSE and R2 score values are used to compare and analyse the findings. A recommendation to purchase or sell the stock is made using technical indicators such as the Moving Average Convergence Divergence (MACD) and Fibonacci retracement. Figure 4 shows the website's home page, with the "Introduction" part displayed above, where any dataset collected from the NSE site and the desired "Start date" and "End date" may be chosen. The stock chart is displayed based on the selected parameters. The same thing may be done under the "Stock Price Predictor" link, along with applying all of the

Fig. 2 Architectural diagram


Fig. 3 Price chart

Fig. 4 Home page of the website

models. A graph of every model's "actual vs predicted value" is displayed along with the RMSE and R2 score, which are reported in the Results section. In the "Stock Trend Predictor" section, the up/down movement is shown. This is the basic overview of the web application tool.
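As a rough illustration of the cleaning and normalisation steps described above, the short Python sketch below loads a single NSE price file and scales its closing price; the file name and column names are placeholders, not the authors' actual files or code.

```python
# Minimal sketch of the cleaning/normalisation step, assuming a CSV exported
# from the NSE site with 'Date' and 'Close' columns (placeholder names).
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv("INFY.csv", parse_dates=["Date"])   # hypothetical file name
df = df.dropna().sort_values("Date")                 # drop null/missing rows

scaler = MinMaxScaler()                              # scale prices into [0, 1]
df["Close_scaled"] = scaler.fit_transform(df[["Close"]])
print(df[["Date", "Close", "Close_scaled"]].tail())
```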

4 Methods and Algorithms

4.1 Technical Parameters

The technical parameters used in the project are described below. MACD is a very effective and widely applied technical indicator. It is a momentum trend indicator. As its full form suggests, MACD is all about the


convergence and divergence of two moving averages. Convergence takes place when the two moving averages advance in the same direction, whilst divergence takes place when they move oppositely. A typical MACD comprises a 12-day and a 26-day Exponential Moving Average (EMA), both calculated on closing prices. The following equation (Eq. 1) gives the convergence and divergence value:

MACD = 12-day EMA − 26-day EMA    (1)

This is represented by the 'MACD line,' a basic line graph. There is also a signal line, which is the 9-period exponential average of the MACD line. The MACD is interpreted using buy and sell signals. Fibonacci retracement is a technical indicator that helps identify emerging trends by comparing Fibonacci numbers and retracement. It aids in price prediction based on the trend. Fibonacci retracement levels are horizontal lines derived from the Fibonacci sequence that indicate possible resistance and support values. A percentage is assigned to each level; the percentage reflects how much of a previous move has been retraced. The Fibonacci retracement levels are 23.6% (R-0.236), 38.2% (R-0.382), 50% (R-0.5), and 61.8% (R-0.618). The equations for uptrend and downtrend retracements are:

UR = High − ((High − Low) ∗ %level)    (2)

DR = Low + ((High − Low) ∗ %level)    (3)

where UR is the uptrend retracement and DR is the downtrend retracement. For an uptrend, the difference between the highest and lowest price for the chosen period is multiplied by the percentage level and the product is subtracted from the highest price. For a downtrend, the same product is added to the lowest price.
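The following Python sketch illustrates how the two indicators defined in Eqs. (1)-(3) might be computed with pandas; the function names and the use of a pandas Series of closing prices are our own assumptions rather than the authors' implementation.

```python
# Illustrative MACD and Fibonacci retracement computations (Eqs. 1-3),
# assuming `close` is a pandas Series of daily closing prices.
import pandas as pd

def macd(close: pd.Series):
    ema12 = close.ewm(span=12, adjust=False).mean()      # 12-day EMA
    ema26 = close.ewm(span=26, adjust=False).mean()      # 26-day EMA
    macd_line = ema12 - ema26                            # Eq. (1)
    signal = macd_line.ewm(span=9, adjust=False).mean()  # 9-period signal line
    return macd_line, signal

def fibonacci_retracements(high: float, low: float):
    levels = [0.236, 0.382, 0.5, 0.618]
    ur = {p: high - (high - low) * p for p in levels}    # Eq. (2), uptrend
    dr = {p: low + (high - low) * p for p in levels}     # Eq. (3), downtrend
    return ur, dr
```

A buy signal can then be flagged when the MACD line crosses above its signal line and a sell signal when it crosses below, which is how the MACD is typically read for recommendations.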

4.2 Algorithms

Linear Regression. One of the most fundamental and extensively used machine learning algorithms is linear regression. It is a regression model that describes the relationship between variables using a straight line. To locate the line of best fit through the data, it estimates the regression coefficient(s) that minimise the model's overall error. It can be simple linear regression or multiple linear regression. The equation of simple linear regression is

y = b0 + b1 ∗ x1    (4)


where y is the dependent variable, b0 is the intercept (a constant), x1 is the independent variable, and b1 is the coefficient of x1.

Decision Tree Regression. The decision tree algorithm is a supervised learning method that can be used for both regression and classification; the regression variant is also known as the M5P algorithm. As the name suggests, this method uses a tree model to predict the target. If the target takes continuous values, decision tree regression is used; if it is a label, decision tree classification is used. A decision tree generates an output by asking a sequence of Yes/No (True/False) questions, which helps the model predict the output accurately. Decision tree classification uses information gain and entropy for splitting the nodes, which plays a significant part in the accuracy of the model, whereas regression uses the Mean Squared Error (MSE) for splitting. The questioning continues until a leaf node is reached.

Support Vector Regression. The support vector machine (SVM) can also be used for the prediction of categorical and continuous values. When it is used to predict continuous variables, it is called Support Vector Regression (SVR). It incorporates a collection of arithmetic operations known as the kernel, and SVR can use different kernel functions, for example linear, nonlinear, polynomial, radial basis function (RBF), and sigmoid. To tune SVR models, the parameters epsilon (which specifies a margin of tolerance within which errors are not penalised) and C (the penalty parameter reflecting the misclassification or error term) are set, and a grid search over C and epsilon is used to find the best-fitting parameters.

Lasso and Ridge Regression. Least Absolute Shrinkage and Selection Operator (LASSO) regression and Ridge regression, also known as L1 and L2 regularisation, are two strategies for minimising model complexity and preventing overfitting. LASSO uses the L1 regularisation approach, whereas Ridge regression implements the L2 approach. L2 does not produce sparse models or eliminate coefficients; instead, it minimises the squared sum of the coefficient values. As a result, Lasso regression is simpler to interpret than Ridge regression. Ridge regression lowers multi-collinearity as well as model complexity by shrinking coefficients. LASSO regression can help with feature selection while also reducing over-fitting, and when the set of features is smaller than the set of observations, the LASSO technique can offer sparse results. The Lasso regression objective is

Minimisation objective = LS Obj + λ ∗ (sum of the absolute values of the coefficients)    (5)

where LS Obj is the least squares objective, which is simply the linear regression objective without regularisation, and λ is the controlling element that determines the degree of regularisation. As the value of λ rises, so does the bias, but the variance reduces as the degree of shrinkage increases. Ridge regression's equation is

y = e + X ∗ B    (6)

where y symbolises the dependent variable, X signifies the independent variables, B denotes the regression coefficients to be obtained, and e represents the residual errors. When the lambda term is added to this equation, the variance that is not captured by the general model is taken into account. Once the data has been prepared and determined to be suitable for L2 regularisation, these steps can be carried out.

Stochastic Gradient Descent (SGD) is an optimisation technique that is not tied to any specific family of machine learning models. It fits linear classifiers and regressors, such as (linear) SVMs and logistic regression, under convex loss functions. The SGD regressor is a linear model fitted by SGD minimisation of a regularised empirical loss. The loss gradient is computed one observation at a time, and the model is updated along the way using a decreasing learning rate. A regulariser is also employed as a penalty.

4.3 Evaluation Criteria

R2 Score. The R2 score is a critical indicator for evaluating the performance of regression models. It takes a value between 0 and 1 and reveals how well the independent variables explain the variation in the model; the ideal model has an R2 score of 1. In the proposed model, linear regression gives the highest R2 score, followed by the tuned SVR model.

RMSE. The Root Mean Square Error (RMSE) measures the average distance between the model's predicted values and the original values in the dataset. It is always a positive number, and the lower the RMSE, the better a model fits the dataset. In the proposed model, linear regression shows the lowest RMSE value. These performance measures have been employed in several studies and are feasible techniques for evaluating a model's reliability for daily prediction.
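A minimal sketch of the comparison loop implied by Sects. 4.2 and 4.3 is shown below; synthetic data stands in for the scaled stock features, and the hyperparameter grid for SVR is illustrative only, not the one used by the authors.

```python
# Compare the regressors discussed above by RMSE and R2 on held-out data.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LinearRegression, Lasso, Ridge, SGDRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error, r2_score

X, y = make_regression(n_samples=300, n_features=5, noise=10, random_state=0)
X = StandardScaler().fit_transform(X)                 # scale features first
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "Linear": LinearRegression(),
    "DecisionTree": DecisionTreeRegressor(random_state=0),
    "Lasso": Lasso(alpha=0.01),
    "Ridge": Ridge(alpha=1.0),
    "SGD": SGDRegressor(max_iter=1000),
    "SVR(tuned)": GridSearchCV(SVR(kernel="linear"),  # grid over C and epsilon
                               {"C": [1, 10], "epsilon": [0.01, 0.1]}),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    rmse = np.sqrt(mean_squared_error(y_te, pred))
    print(f"{name}: RMSE={rmse:.3f}, R2={r2_score(y_te, pred):.3f}")
```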

5 Result Analysis

For the IT sector, we recommend when to buy or sell the stock (Fig. 5). The black triangle indicates a buy signal and the upside-down red triangle indicates a sell signal; this chart is drawn using Fibonacci retracement. For closing price prediction, the model with the higher R2 score and lower RMSE is the most accurate (Fig. 6). Across all sectors, linear regression predicts most accurately, with the highest R2 score and lowest RMSE value. The "Actual vs Predicted" graph of linear regression for all three sectors is shown in Fig. 7. Support vector regression (SVR) with tuned parameters comes next, followed by lasso and ridge regression. Stochastic gradient descent is not suitable for large and complex datasets.

150

Athira et al.

Fig. 5 Fibonacci retracement chart

Fig. 6 Result comparison

Decision tree regression is the worst-performing model. SVR with the other kernels is not considered further since they give inaccurate values; among them, the linear kernel performs better than the poly and RBF kernels.

6 Conclusion

During this epidemic, the stock market has become a source of income for a large number of people. Stock market traders and speculators want to expand their revenues by analysing marketplace data. The main goal of this proposed model is to find a better model for forecasting the closing price by analysing which one performs best based on the R2 score and the RMSE value. In this paper, we suggest the best method for predicting the closing price by comparing various machine learning algorithms, along with a recommendation to buy or sell the stock. By analysing three different sectors, namely IT, pharmaceuticals, and healthcare, we have come to the conclusion that linear regression gives better accuracy and performs better compared to the other machine learning models. The next best-performing model is SVR with tuned parameters and a linear kernel. Lasso regression follows, performing slightly better than ridge regression. The performance of stochastic gradient descent (SGD) regression depends on the size of the dataset.


Fig. 7 Linear regression comparison-IT, Banking and Pharma sectors

It is not that suitable for larger datasets. Decision tree regression is the weakest performing algorithm. It predicts the closing price accurately for small datasets. A recommendation on whether to sell or buy the stocks is made using technical indicators such as the MACD and Fibonacci Retracements.

7 Future Scope

Chart pattern analysis is planned as a future addition to analyse trend movement. Deep learning, a more advanced branch of machine learning, is also of interest: for closing price prediction, we plan to implement deep learning models such as ANN, LSTM, and RNN and compare them as well.


References

1. Nabipour M, Nayyeri P, Jabani HSS, Mosavi A (2020) Predicting stock market trends using machine learning and deep learning algorithms via continuous and binary data; a comparative analysis. IEEE Access 8:150199–150212. https://doi.org/10.1109/ACCESS.2020.3015966
2. Boone LG, Richardson PC (1999) Stock market fluctuations and consumption behaviour: some recent evidence
3. Kumar PN, Mohandas VP (2010) An analysis of existing artificial stock market models for representing Bombay stock exchange. Inter J Comp Sci Eng Tech (IJCSET) 2:20–26
4. Sharma A, Bhuriya D, Singh U (2017) Survey of stock market prediction using machine learning approach. In: 2017 International Conference of Electronics, Communication and Aerospace Technology (ICECA), pp 506–509
5. Siew HL, Nordin MJ (2012) Regression techniques for the prediction of stock price trend. In: 2012 International Conference on Statistics in Science, Business and Engineering (ICSSBE), pp 1–5
6. Usmani M, Adil SH, Raza K, Ali SSA (2016) Stock market prediction using machine learning techniques. In: 2016 3rd International Conference on Computer and Information Sciences (ICCOINS), pp 322–327
7. Roy SS, Mittal D, Basu A, Abraham A (2015) Stock market forecasting using LASSO linear regression model. Springer International Publishing Switzerland
8. Milosevic N (2018) Equity forecast: predicting long term stock price movement using machine learning. Papers 1603.00751, arXiv.org, revised Nov
9. Patel J, Shah S, Thakkar P, Kotecha K, Predicting stock and stock price index movement using trend deterministic data preparation and machine learning techniques
10. Gopalakrishnan EA, Menon VK, Soman KP (2018) NSE stock market prediction using deep-learning models. Proced Comp Sci 132:1351–1362
11. Nair BB, Kumar PN, Prasad SR, Singh LM, Vijayalakshmi K, Sai Ganesh R, Reshma J, Forecasting short-term stock prices using sentiment analysis and artificial neural networks. J Chem Pharm Sci
12. Sreelekshmy S, Vinayakumar R, Gopalakrishnan EA, Vijay KM, Soman KP, Stock price prediction using LSTM, RNN and CNN-sliding window model
13. Kuttichira DPL, Gopalakrishnan EA, Vijay KM, Soman KP (2017) Stock price prediction using dynamic mode decomposition. In: 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), pp 55–60
14. Ashok A, Prathibhamol CP (2021) Improved analysis of stock market prediction: (ARIMA-LSTM-SMP). In: 2021 4th Biennial International Conference on Nascent Technologies in Engineering (ICNTE)

Machine Learning-Based Mortality Prediction of COVID-19 Patients R. Ani , O. S. Deepa , M. Arundhathi, and J. Darsana

Abstract Coronavirus Disease 2019 (COVID-19) has impacted our world. Risk classification during hospitalization is essential for medical planning and allocation of resources, and hence for lowering death rates, especially in underdeveloped countries. Many patient characteristics that influence illness severity, such as pre-existing comorbidities, can be employed to improve this prediction. Finding biomarkers that help identify individuals who need prompt treatment and determine their mortality risk has thus become a pressing yet difficult task, and no advanced tool is available for this. As a result, our research aims to develop an ML-based predictive model as well as a decision support system that can predict mortality based on clinical and health characteristics. We used different ML algorithms such as Random Forest, Support Vector Machines, XGBoost Classifier, and Logistic Regression for survival rate prediction in covid patients. Two feature selection methods, Information Gain and SVM-RFE, were also incorporated. Accuracy was compared on 3 different datasets using the ML algorithms, and the important biomarkers that help in mortality prediction were identified. Keywords COVID-19 · Information gain · LR · ML algorithms · Random forest · SVM · SVM-RFE · XGBoost

R. Ani · M. Arundhathi · J. Darsana (B) Department of Computer Science and Applications, Amrita Vishwa Vidyapeetham, Amritapuri, India e-mail: [email protected] R. Ani e-mail: [email protected] M. Arundhathi e-mail: [email protected] O. S. Deepa Department of Mathematics, Amrita Vishwa Vidyapeetham, Coimbatore, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_12


1 Introduction

COVID-19 has spread rapidly over the world since its outbreak in Wuhan, China, and has become a global health calamity. Coronaviruses are not a recent occurrence: these RNA viruses have been around since the mid-sixties. They commonly resemble a cold but cause upper respiratory tract disease. Two forms of coronavirus had been identified well before the 2019 Wuhan outbreak. The severe acute respiratory syndrome coronavirus (SARS-CoV), which causes a severe respiratory infection, was discovered in 2003 in the southern Chinese province of Guangdong. In Saudi Arabia in 2012, a new Middle East respiratory syndrome coronavirus (MERS-CoV) was found. We are currently dealing with the new coronavirus, which was discovered in Wuhan in December of 2019; COVID-19 was identified on January 7, 2020. The rate of its spread accelerated from December 2019 to March 2020, when it was labeled a pandemic by the World Health Organization (WHO).

This condition affects a person's respiratory system, and some people, specifically those with a healthy immune system, will eventually recover without specific therapy. Elderly people, especially those with comorbidities including respiratory disease, cancer, diabetes, and cardiovascular illness, are much more exposed. COVID-19 is not merely a respiratory disease but a multisystemic one. It transmits through droplets dispersed into the air from infected patients via speaking, coughing, and sneezing, as well as through contact with contaminated objects or surfaces. According to the WHO, frequent handwashing, disinfection, social distancing, wearing masks, and not touching your face can help prevent infection. Fever, a dry cough, and exhaustion were the most prevalent symptoms, according to the WHO, while headaches, sore throat, diarrhea, conjunctivitis, loss of smell, and rashes were less common, and serious symptoms included breathing issues, chest discomfort, and loss of speech and mobility.

Various nations took various measures to combat the virus, including countrywide lockdowns, curfews, and travel restrictions, among other things. However, many infected individuals were unable to receive effective treatment due to late detection and the virus's peculiar and unidentified origin. Researchers from all across the world have been trying to figure out what caused it to spread, and have concentrated on establishing new methods for evaluating affected people at different stages of disease in order to determine significant correlations between a patient's clinical characteristics and the probability of succumbing to the disease. Machine learning, a subfield of AI, has been utilized successfully in a variety of case studies to draw such conclusions, and a large number of machine learning algorithms related to covid have been developed. COVID-19 has a broad variety of clinical manifestations, spanning from nonspecific or moderate infection to severe pneumonia needing special care in high-risk patients. Acute respiratory infection symptoms can be severe in some patients, and they can be worsened by acute respiratory distress syndrome and multiorgan failure, which can be fatal. Further research into these features in various settings is crucial for expanding our knowledge.


The influx of patients has thrown the entire healthcare system into disarray and presented substantial challenges. As a result, one of the most pressing challenges in COVID-19 management is precise and timely detection of high-risk individuals. Medical judgment and resource management can be aided by early risk classification; high-risk patients, for example, may be admitted to an ICU for careful observation and organ support. According to various studies, biomarkers have been identified that aid in the categorization of COVID-19 individuals with an elevated threat of serious disease and fatality by providing essential information about the patients. As a result, a predictive model based on ML as well as a decision support system that predicts mortality based on clinical and health characteristics is required. In our study, we used 3 datasets to analyze disease severity in covid patients with the help of demographic as well as clinical features. We used different ML algorithms such as Random Forest (RF), Support Vector Machines (SVM), XGBoost Classifier, and Logistic Regression (LR) for survival rate prediction in covid patients. Two feature selection methods, Information Gain and Support Vector Machine-Recursive Feature Elimination (SVM-RFE), were also incorporated. The rest of the paper is organized into 3 main parts: a related works section that gives a brief overview of the methods used by different authors, a methodology section where all the ML algorithms and feature selection methods we used are discussed, and a result section that reports the outcomes of our work.

2 Related Works

The findings from the relevant studies are addressed below. Machine learning is an area of AI that emphasizes creating algorithms that learn from past experience and improve without being explicitly programmed. The machine learning field has grown in popularity over the years as a means of solving a variety of real-world challenges. Machine learning techniques are of 3 types, namely supervised, unsupervised, and reinforcement learning. In supervised learning, the algorithm trains on a dataset with pre-defined labels; the two primary types of supervised learning are classification and regression. Unsupervised algorithms, on the other hand, train on un-labeled datasets, extracting features and identifying patterns; typical unsupervised approaches include clustering and dimensionality reduction of big, high-dimensional datasets. In reinforcement learning, the program learns by trial and error, using a reward and punishment mechanism during the training phase [1]. Multidimensional, complex, heterogeneous, and nonlinear data are found in electronic health records (EHRs). Machine learning can aid in the full utilization of clinical information in EHRs by facilitating querying and sophisticated decision-making. Furthermore, ML algorithms may be developed using huge amounts of patient EHRs, can capture incredibly complicated correlations between features, and


also outperform humans in challenging tasks like picture classification and detecting trends in historical data [1]. The author of [2] used correlation methods and combined statistical comparison with ML algorithms to investigate clinical datasets of COVID-19 patients with pre-existing outcomes. Student t-tests for continuous variables, Pearson correlations among various blood sample counts, and chi-square tests for categorical variables were used for the statistical analysis. The ML algorithms included extreme gradient boosting (XGBoost), random forest (RF), k-nearest neighbor (KNN), decision tree (DT), ANN-based deep learning sequential models, light gradient boosting machine (LGBM), gradient boosting machine (GBM), and support vector machine (SVM). To gauge feature relevance, Shapley Additive Explanation (SHAP) values were computed for each model. Immature granulocyte levels and lactate seemed to have the best predictive value among the blood indicators that can provide accurate details about the intensity of COVID-19 symptoms. In terms of disease severity prediction, hemoglobin, D-dimer, ferritin, procalcitonin, platelet, erythrocyte sedimentation rate, and brain natriuretic peptide levels all differed significantly from the normal control group. After evaluating the data with machine learning techniques, they discovered that all of the deployed models performed well in every assessment metric, with accuracy ratings exceeding 80%; RF was the best performer with an accuracy of almost 0.93. The author of [3] used patient EMR data for the development and validation of a prognostic model. Feature selection, categorization, and statistical approaches were used for missing value approximation. For prediction, logistic regression (LR), random forest (RF), extreme gradient boosting (XGBoost), and support vector machine (SVM) techniques were employed, and for feature selection, a recursive feature elimination approach was used. With AUC scores of 0.91–0.94, the XGBoost-based prediction models exhibited greater accuracy. The most predictive criteria for all four death prediction models studied were age and minimal oxygen saturation. The author of [4] employed five methods, namely the elastic net (EN) model, LR, partial least squares (PLS) regression, bagged flexible discriminant analysis (FDA), and RF. Despite slight differences in AUROC between the bagged FDA (0.899), LR (0.895), and RF (0.922), LR was chosen as the final model due to its higher interpretability and simplicity. AUROC was used to compare the models' performance. All five models chose 4 features, age, D-dimer level, lymphocyte count, and high-sensitivity C-reactive protein level, as the important predictors of mortality. The author of [5] used univariate and multivariate analysis to determine distinct early predictors of acute COVID-19, and multivariate LR analysis was utilized to build the prognostic model. The predictive value of the prognostic model and of each initial predictor was checked using the ROC curve. Age, lymphopenia, elevated neutrophil–lymphocyte ratio (NLR), hypertension, and hypoalbuminemia were found to be important indicators for serious COVID-19 patients in both univariate and multivariate analyses. The newly built prediction model had a specificity of 84.2 percent and a sensitivity of 90.5 percent for predicting serious COVID-19 cases, and an AUC of 0.92.


For risk stratification, the author of [6] used 2 neural network models, NNet1 with 9 predictors and NNet2 with 32 predictors. Predictors were picked using a genetic algorithm (GA), and the neural network models were built using cross-validation. Traditional LR models were compared with the final neural network models. The NNet1 model's accuracy was modest, whereas the NNet2 model's accuracy was the highest; the linear regression models were outperformed by NNet1 with an AUC of 0.806 and NNet2 with an AUC of 0.922. Non-survivors were substantially older than survivors, with greater levels of high-sensitivity troponin I, D-dimer, C-reactive protein, and α-hydroxybutyrate dehydrogenase, as well as a lower lymphocyte count. Hypersensitive troponin I, creatine kinase isoenzyme, and BNP were discovered to be new risk factors. The author of [7] used three SVM models built on invasive features, non-invasive features, and the combination of both groups. Data from clinical (9 features), demographic (2 features), and laboratory (26 features) sources were used to develop the SVM classifiers. Regularized Neighborhood Component Analysis (NCA) was used to analyze the main aspects for death prediction. Support vector machine-recursive feature elimination (SVM-RFE) is a feature assessment technique that recursively eliminates low-importance features using the SVM weights. For determining the significance of selected subsets of features and assessing overall significance in a robust and efficient framework, a sparse linear SVM was used. To analyze the predictive accuracy of non-invasive and invasive variables with reference to the outcome, they integrated a sparse regularization framework, the least absolute shrinkage and selection operator (Lasso), with a linear SVM. Findings suggested that non-invasive features are capable of predicting mortality in a way similar to the invasive model and comparable to the combined model. According to the sparsity analysis and SVM-RFE feature inspection results, the non-invasive model, when compared to the invasive model, offers good performance with fewer features, pointing to increased predictive information across different non-invasive features, namely age, cardiovascular disorders, and SPO2. Among laboratory features, PTT, BUN, and LDH exhibited the strongest death probability scores. The author of [8] used the XGBoost algorithm. "Time to event" was calculated utilizing the XGBoost algorithm and the Cox Proportional Hazards Model. Age, lymphocyte percent in DLC, INR, male gender, oxygen saturation below 90%, coronary artery disease, ferritin, chronic kidney disease, LDH, diabetes mellitus, respiratory distress, and respiratory rate > 24/min were revealed to be significant among the 63 clinical and laboratory criteria. The model's performance parameters were an accuracy of 96.89 and an AUCROC of 0.8685; the validation cohort had an accuracy of 0.93 and an AUC of 0.782. This was among the first studies on the Indian population that looked at the "time to event" at hospitalization and precisely predicted patient outcomes. The author of [9] used a logistic regression model to find age, lymphocyte count, D-dimer, CRP, and creatinine (ALDCC), data obtained at hospitalization, as the significant indicators of hospital mortality. The areas under the curve (AUCs) for the external and internal validation cohorts and the development cohort were 0.992, 0.999, and 0.987, respectively. Using death probability and the ALDCC score, all patients are


split into 3 risk groups: low (probability less than 5%), moderate (probability between 5 and 50%), and high (probability greater than 50%). The predictive model, ALDCC score, and nomogram would aid in the earlier detection of covid and non-covid individuals with increased death probability, allowing doctors to treat their patients more effectively. The author of [10] trained and compared the performance of multiple machine learning models (SVM, DT, NN, RF, LR, and XGBoost) to determine covid death risk based on blood test data. Age, LDH, high-sensitivity C-reactive protein (hsCRP), lymphocytes, and neutrophils help in the prediction of mortality with 96% accuracy. NN classification with XGBoost feature importance, the top-performing method, estimates predictions up to 16 days in advance with 90% accuracy. The author of [11] examined the performance of 18 ML algorithms for predicting covid patients' ICU admission and mortality. It was found that other models could not meet the performance showcased by ensemble-based models for predicting both 5-day ICU admission and 28-day fatality from COVID-19. O2 saturation, CRP, and LDH were essential for ICU admission, while lymphocyte (%) and neutrophil (%) were found to be the most significant variables in death prediction. The author of [12] identified the health risk and determined the death risk of covid patients by developing a prognostic model based on ML algorithms and AI. LR, DT, KNN, SVM, RF, and ANN were among the different ML algorithms employed, and the results show an overall accuracy of 89.98 percent in forecasting death rates. The most concerning signs and symptoms were also highlighted. Finally, a different covid patients' dataset was utilized for analyzing the accuracy of the generated model, and then, to conduct a thorough examination of the models and determine their specificity and sensitivity, a confusion matrix was employed. A prognostic model was developed by the author of [13] for COVID-19 ventilator assistance and fatality. To create and evaluate the predictive models, four standard ML techniques, 3 data balancing methodologies, and attribute selection were employed. The predictive model for ventilator assistance, with the best 20 attributes chosen with the relief algorithm from baseline clinical, radiological, and laboratory data using SVM and a random undersampling technique, had a balanced accuracy of 0.81 and an AUC of 0.87 in an independent test set. Using balanced RF with all features, the best model for the fatality endpoint had a balanced accuracy of 0.80 and an AUC of 0.83. With a balanced accuracy of 0.79 and an AUC of 0.85, severity annotations of chest X-rays alone outperformed age, gender, comorbidities, or complete blood count for ventilator support. Comorbidity alone had a balanced accuracy of 0.72 and an AUC of 0.80 for mortality, which is greater than algorithms that solely employ lab results, demographic data, or chest radiographs. For intubation and mortality, the prediction capacity of the combined data consistently beat that of each dataset separately. The author of [14] created a probability-based model that forecasts the number of beds occupied over a given timeframe; along with the probability-based models, the Poisson distribution, geometric Poisson distribution, and weighted Poisson distribution were utilized. The author of [15] categorized patients into 2 groups, covid and non-covid patients, and also explained the relevance of each attribute on


the outcome with the help of SHapley Additive exPlanation (SHAP). To anticipate the covid spread rate and to test accuracy, the author of [16] created prognostic models utilizing Long Short-Term Memory networks (LSTM) and the Auto-Regressive Integrated Moving Average (ARIMA) on the time-series data of the top-10 impacted Indian states. These models could be employed to assist the creation of methods for optimal healthcare resource allocation and management. Finally, various studies have demonstrated the value of ML algorithms, particularly in predictive modeling. Although multiple studies have been initiated to conduct estimation and prediction, more training and research on the COVID-19 findings using genuine clinical record datasets is still required. Table 1 provides a comparison of the different machine learning approaches used in the related work.

Table 1 Related works

Study | ML approaches | Sample size | Performance
Aktar et al. [2] | XGBoost, RF, KNN, DT, ANN-based deep learning sequential models, LGBM, GBM, and SVM | 2 datasets with 89 and 1945 patient records | AUCs of 0.88, 0.89, 0.84, 0.82, 0.82, 0.88, 0.89, and 0.84
Yadaw et al. [3] | LR, RF, XGBoost, SVM | 3841 patients | AUC of XGBoost (0.91–0.94)
Hu et al. [4] | Elastic net model, LR, partial least squares regression, bagged flexible discriminant analysis, and RF | 183 patients (115 survivors and 68 non-survivors) | AUROCs of bagged FDA (0.899), LR (0.895), and RF (0.922)
Hu et al. [5] | Univariate and multivariate LR | 40 patients (19 mild and 21 severe) | AUC of multivariate LR 0.92
Yu et al. [6] | NNet1, NNet2, SVM, and LR | 246 patients | AUCs of NNet1 (0.806), NNet2 (0.922), SVM (0.825), LR (0.743)
Alballa and Al-Turaiki [1] | LR, RF, SVM, XGBoost | 1270 patients | AUCs of LR (85–95%), SVM (81%), RF (81%), XGB (80–90%)
Mahdavi et al. [7] | SVM (invasive, non-invasive, and their combination) | 628 patients | Invasive (0.75), non-invasive (0.77), combination (0.80)
Kar et al. [8] | XGBoost, RF, LR | 2370 patients (1393 development cohort and 977 validation cohort) | Accuracies of XGBoost (0.97), RF (0.94), LR (0.96)
Rahman et al. [9] | RF, SVM, KNN, XGBoost, Extra-tree and LR | 375 patients | Accuracies of KNN (0.88), RF (0.89), XGB (0.87), SVM (0.86), extra-tree (0.89), LR (0.91)
Karthikeyan et al. [10] | SVM, DT, NN, RF, LR, and XGBoost | 2779 patients | SVM (95%), DT (91%), neural network (96%), RF (94%), LR (95%), and XGB (94%)
Subudhi et al. [11] | Ensemble, Gaussian process, linear, naïve Bayes, nearest neighbor, SVM, tree-based, discriminant analysis, and NN | 10,826 patients | Ensemble performed better than all other model types except naïve Bayes, tree-based, and discriminant analysis-based methods
Pourhomayoun et al. [12] | LR, DT, KNN, SVM, RF, and NN | 2,670,000 laboratory-confirmed covid patients | Accuracies of LR (87.91%), DT (86.87%), KNN (89.83%), SVM (89.02%), RF (87.93%), neural network (89.98%)
Aljouie et al. [13] | Linear SVM, LR, RF, and XGBoost | 5739 patients | Linear SVM (0.81), LR (0.75), RF (0.80) and XGB (0.73)

3 Methodology

3.1 Data Collection

For our study, we used 3 datasets: Dataset 1 from paper [13], Dataset 2 from paper [7], and Dataset 3 from paper [6], which were available from the related works discussed above. Dataset 1 [13] consists of confirmed COVID-19 patients from Riyadh's King Abdulaziz Medical City. From the first case, which occurred on April 2, 2020, through the last case, which occurred on June 18, 2020, there were 5739 patients in all, but an inclusion criterion was applied for determining the mortality endpoint. After checking against the inclusion criteria, the final dataset provided by the corresponding author of paper [13] was taken for our study; it consisted of 1513 rows of patient information and 34 columns of demographic and clinical features. Dataset 2 [7] contains EMR records from 628 Masih Daneshvari Hospital patients from February 20th to May 4th, 2020. Here also, the data underwent inclusion criteria, and only 492 patients met the requirements for identifying the


disease and severity analysis upon hospitalization. Therefore, Dataset 2 [7] consists of 492 rows of patient information and 37 columns of demographic and clinical features. Dataset 3 [6] consists of covid patients treated at Wuhan Third Hospital from January to February, 2020. It contained 246 rows of patient information and this dataset had the largest number of 110 demographic and clinical features among the 3 datasets we are using in this work.

3.2 Data Preprocessing

Data preprocessing was done on all 3 datasets. In Dataset 1 [13] and Dataset 3 [6], certain categorical variables were found. As ML techniques are often centered on mathematical equations that require numerical values, incorporating categorical data directly would cause issues. For instance, in Dataset 1 [13], we found 4 categorical variables, namely Case_ID, Gender, Vital_status, and Ventilation_support_status. These variables were encoded into numbers using the Scikit-learn library's LabelEncoder() class. Another important preprocessing task is identifying and accurately treating null and missing values; neglecting this can result in erroneous and inaccurate conclusions and inferences. Data normalization and standardization were done on all 3 datasets, using MinMaxScaler() and StandardScaler() from the Scikit-learn library, respectively. Dataset splitting is the next step in data preprocessing: all data required for the ML model must be divided into a training set and a test set. The method for splitting the dataset differs depending on its shape and size. Here, for all 3 datasets, ratios of 70:30 and 80:20 were used for different algorithms.
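A condensed sketch of the steps just described is given below; the file name is a placeholder, and treating Vital_status as the mortality label for Dataset 1 is our own assumption for illustration.

```python
# Pre-processing sketch: encoding, scaling, and splitting (assumed columns).
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler, StandardScaler
from sklearn.model_selection import train_test_split

df = pd.read_csv("dataset1.csv").dropna()       # hypothetical file; drop nulls

categorical = ["Case_ID", "Gender", "Vital_status", "Ventilation_support_status"]
for col in categorical:
    df[col] = LabelEncoder().fit_transform(df[col])   # encode categoricals

X = df.drop(columns=["Vital_status"])           # Vital_status assumed as label
y = df["Vital_status"]

X_norm = MinMaxScaler().fit_transform(X)        # normalization
X_std = StandardScaler().fit_transform(X)       # standardization

# 70:30 split (an 80:20 split was also used for some algorithms)
X_train, X_test, y_train, y_test = train_test_split(
    X_std, y, test_size=0.3, random_state=42)
```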

3.3 Study Design

This study aimed to estimate disease severity in covid patients utilizing demographic along with clinical features. For our research, we used 3 datasets that help us understand how such data can be used for clinical outcomes as well as for predicting the survival rate of patients with covid. ML algorithms such as Random Forest (RF), Support Vector Machines (SVM), XGBoost Classifier, and Logistic Regression (LR) were used, and accuracy was compared across the 3 datasets for the 4 ML algorithms. The methods used for selecting features were Information Gain and Support Vector Machine-Recursive Feature Elimination (SVM-RFE). The accuracy comparison was repeated after feature selection with the 4 ML algorithms to check whether the predictions changed after selecting the important features that help in determining disease severity in covid patients.


SVM: SVM stands for Support Vector Machines, a supervised ML algorithm for binary classification. Its purpose is to determine the optimum line for categorizing n-dimensional space into groups, so that additional data points can readily be placed into the correct category in the future. This optimal line is called a hyperplane, and support vectors are the extreme points that assist in creating the hyperplane; the ideal hyperplane is the one with the greatest margin. There are specific types of SVMs that can be used for particular machine learning problems. To distinguish the data, SVM employs a large margin and the kernel function as key notions. Selection of the right kernel function is critical in SVM, since it is crucial for higher dimensions. Because of its great accuracy, SVM is widely employed in classification [17]. Here we are using support vector classification (SVC); the SVC class performs binary classification on datasets.

Random Forest: This is a supervised machine learning algorithm that works on the concept of trees: many weak outputs are combined to produce a strong output. It is used for both classification and regression. It is a collection of decision trees that avoids the problem of overfitting [18]. It builds several DTs on different subgroups of the given dataset and averages the results to enhance the estimated accuracy [19]. The class with the maximum votes becomes the prediction of our model. Scikit-learn helps in creating and training the model: the RF classifier model from Scikit-learn is imported, instantiated, and fitted to the training data.

XGBoost: XGBoost, which stands for Extreme Gradient Boosting, is a competitive ML implementation of gradient boosted decision trees (GBDT) optimized for speed and performance. Decision trees are constructed sequentially in this approach. Regression, classification, ranking, and user-defined prediction problems can be solved using XGBoost. For building the classification model, the XGBClassifier class is loaded from XGBoost. XGBClassifier is one of the most powerful classification algorithms and frequently generates state-of-the-art predictions.

LR: Logistic regression (LR) is a supervised learning classification approach for predicting the likelihood of a target variable. It is employed when the dependent (target) variable is categorical. The target variable must be binary, with factor level 1 representing the intended outcome. The logistic function, often known as the sigmoid function, is the foundation of LR: it takes a linear equation as input and utilizes log odds and the logistic function to execute a binary classification task. The model is built using LogisticRegression() (Fig. 1).

Feature Selection (SVM-RFE): SVM-RFE's main goal is to calculate ranking weights for all features and sort them in accordance with the weight vectors as the foundation of classification. SVM-RFE is a backward feature-elimination iteration algorithm: the SVM is trained on the training dataset, different weights are assigned to different attributes, the feature with the lowest rank is removed, and the same steps are carried out again on the training dataset. Lastly, the precision is determined [20]. Scikit-learn's RFE class provides the RFE method. In order to use RFE, the RFE class is set up with the "estimator" parameter specifying the algorithm to use and the "n_features_to_select" argument specifying the number of features to select. To


Fig. 1 Model for predicting the mortality of patients with COVID-19

pick features, the class should be fit on a training dataset using the fit() function after it has been configured. After the class has been fitted, the "support_" attribute, which provides a True/False value for every input variable, is used to examine which input variables have been chosen.

Feature Selection (Information Gain): The Information Gain score was used to choose attributes. It ranges from 0 to 1, with 1 indicating the most relevant information and 0 indicating the least relevant information.

IG(S, a) = Entropy(S) − Entropy(S | a)    (1)

Here IG(S, a) denotes the information gain of dataset S for the variable a, Entropy(S) denotes the dataset's entropy, and Entropy(S | a) denotes the dataset's conditional entropy given the variable a. Entropy is the measure of impurity in the data, and its value ranges from 0 to 1:

Entropy = − Σi=1..n Pi log2 Pi    (2)

where Pi is the probability of class i.
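To make the workflow concrete, the hedged sketch below combines the two feature-selection methods with the four classifiers used in this study; synthetic data replaces the clinical datasets, the number of selected features is arbitrary, and information gain is approximated with scikit-learn's mutual information estimator rather than the authors' exact scoring code.

```python
# Feature selection (SVM-RFE and information gain) followed by the four
# classifiers; synthetic data stands in for the clinical datasets.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import RFE, mutual_info_classif
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

X, y = make_classification(n_samples=500, n_features=30, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# SVM-RFE: recursively drop the lowest-weighted features of a linear SVM
rfe = RFE(estimator=SVC(kernel="linear"), n_features_to_select=10).fit(X_tr, y_tr)
mask_rfe = rfe.support_                    # True/False for every input variable

# Information gain, approximated here by mutual information scores
scores = mutual_info_classif(X_tr, y_tr, random_state=0)
mask_ig = scores >= np.sort(scores)[-10]   # keep the 10 highest-scoring features

classifiers = {
    "SVM": SVC(),
    "RF": RandomForestClassifier(random_state=0),
    "XGBoost": XGBClassifier(eval_metric="logloss"),
    "LR": LogisticRegression(max_iter=1000),
}
for sel_name, mask in [("SVM-RFE", mask_rfe), ("InfoGain", mask_ig)]:
    for clf_name, clf in classifiers.items():
        clf.fit(X_tr[:, mask], y_tr)
        acc = accuracy_score(y_te, clf.predict(X_te[:, mask]))
        print(f"{sel_name:8s} {clf_name:8s} accuracy = {acc:.3f}")
```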


4 Result

Four ML algorithms were used in this research for predicting death probability in covid patients, utilizing a prognostic model centered on comorbidities, biochemical markers, and clinical and demographic predictors. We took 3 datasets and compared the accuracy of mortality prediction with and without feature selection, and we also found some important biomarkers that help determine high-risk patients. Out of the 3 datasets, Dataset 1 [13] performed very well: XGBoost outperformed the other 3 algorithms with an accuracy of 92.2%, while SVM, RF, and LR achieved an accuracy of 90.76%. After performing the feature selection methods on this dataset, no drastic change in accuracy was found. When the SVM-RFE method was applied, RF performed better than the other 3 algorithms with 91.09% accuracy, and when information gain was applied on Dataset 1 [13], XGBoost achieved a maximum accuracy of 92%. Age at diagnosis, gender, hypertension, chronic ischemic heart disease, coronary artery disease, RBC, Red cell Distribution Width (RDW), Mean Corpuscular Hemoglobin Concentration (MCHC), Hemoglobin, Hematocrit, and Mean Corpuscular Hemoglobin (MCH) were the important features found from Dataset 1 [13] that help in the timely prediction of death risk in covid patients. In Dataset 2 [7], XGBoost achieved a maximum accuracy of 77.78%. After performing the SVM-RFE feature selection method, both SVM and RF achieved an accuracy of 82.95%, and when information gain was applied on Dataset 2 [7], RF performed better than the other three algorithms with 75% accuracy. Age, Cardiovascular disease (CVD), WBC, absolute neutrophils, hematocrit (HCT), International Normalized Ratio (INR), PCO2, Base Excess (BE), Creatinine (Cr), lactate dehydrogenase (LDH), MCH, Platelet (PLT), Erythrocyte Sedimentation Rate (ESR), pH, and Blood Urea Nitrogen (BUN) were the important biomarkers found from Dataset 2 [7]. In Dataset 3 [6], SVM outperformed the other three algorithms with 86% accuracy, followed by XGBoost with 85.37% accuracy. After performing the SVM-RFE feature selection method, SVM was the best performer with the same accuracy as before, but when information gain was applied, RF achieved a maximum accuracy of 86% while the other algorithms showed a slight decrease in accuracy. Age, C-reactive protein, alpha-hydroxybutyrate dehydrogenase, neutrophil percentage, lactate dehydrogenase (LDH), urea nitrogen, total bilirubin, chlorine, percentage of lymphocytes, ratio of leukocyte to globulin, direct bilirubin, myoglobin, cystatin C, procalcitonin, and B-type natriuretic peptide (BNP) were the important biomarkers found from Dataset 3 [6] that help in the timely prediction of death risk in patients with covid (Tables 2, 3 and 4).


Table 2 Comparative performance shown by ML algorithms (XGBoost, SVM, RF, LR) on 3 datasets

Dataset | XGBoost (%) | SVM (%) | RF (%) | LR (%)
Dataset 1 | 92.2 | 90.76 | 90.76 | 90.76
Dataset 2 | 77.78 | 76.14 | 77.27 | 37.5
Dataset 3 | 85.37 | 86 | 82 | 18

Table 3 Comparative performance shown by ML algorithms (XGBoost, SVM, RF, LR) on 3 datasets after performing feature selection (SVM-RFE)

Dataset | XGBoost (%) | SVM (%) | RF (%) | LR (%)
Dataset 1 | 90.4 | 90.76 | 91.09 | 90.76
Dataset 2 | 79.17 | 82.95 | 82.95 | 37.5
Dataset 3 | 84.15 | 86 | 82 | 16.33

Table 4 Comparative performance shown by ML algorithms (XGBoost, SVM, RF, LR) on 3 datasets after performing feature selection (information gain)

Dataset | XGBoost (%) | SVM (%) | RF (%) | LR (%)
Dataset 1 | 92 | 90.76 | 90.1 | 90.76
Dataset 2 | 70.83 | 73.86 | 75 | 37.5
Dataset 3 | 80.49 | 84 | 86 | 16.33

5 Conclusion

Early detection of severity and mortality aids in prioritizing high-risk patients, ensuring that they receive the best care available and, ideally, improving their outcomes. It can also help to relieve pressure on healthcare systems, enhance decision-making, and make better use of limited resources. It is difficult to single out the best model at this point; because the epidemic unfolds in various stages, the machine learning algorithms must be kept up to date. A case study on covid patient mortality prediction using ML was performed, and important biomarkers that help in the timely prediction of death risk in patients with covid were identified.

References

1. Alballa N, Al-Turaiki I (2021) Machine learning approaches in COVID-19 diagnosis, mortality, and severity risk prediction: a review. Inform Med Unlocked 24:100564. https://doi.org/10.1016/j.imu.2021.100564
2. Aktar S, Ahamad MM, Rashed-Al-Mahfuz M et al (2021) Machine learning approach to predicting COVID-19 disease severity based on clinical blood test data: statistical analysis and model development. JMIR Med Inform 9:e25884. https://doi.org/10.2196/25884
3. Yadaw AS, Li Y-C, Bose S et al (2020) Clinical features of COVID-19 mortality: development and validation of a clinical prediction model. Lancet Digit Health 2:e516–e525. https://doi.org/10.1016/S2589-7500(20)30217-X
4. Hu C, Liu Z, Jiang Y et al (2021) Early prediction of mortality risk among patients with severe COVID-19, using machine learning. Int J Epidemiol 49:1918–1929. https://doi.org/10.1093/ije/dyaa171
5. Hu H, Du H, Li J et al (2020) Early prediction and identification for severe patients during the pandemic of COVID-19: a severe COVID-19 risk model constructed by multivariate logistic regression analysis. J Glob Health 10:020510. https://doi.org/10.7189/jogh.10.020510
6. Yu Y, Zhu C, Yang L et al (2020) Identification of risk factors for mortality associated with COVID-19. PeerJ 8:e9885. https://doi.org/10.7717/peerj.9885
7. Mahdavi M, Choubdar H, Zabeh E et al (2021) A machine learning based exploration of COVID-19 mortality risk. PLoS One 16:e0252384. https://doi.org/10.1371/journal.pone.0252384
8. Kar S, Chawla R, Haranath SP et al (2021) Multivariable mortality risk prediction using machine learning for COVID-19 patients at admission (AICOVID). Sci Rep 11:12801. https://doi.org/10.1038/s41598-021-92146-7
9. Rahman T, Al-Ishaq FA, Al-Mohannadi FS, et al (2021) Mortality prediction utilizing blood biomarkers to predict the severity of COVID-19 using machine learning technique. Diagnostics (Basel) 11:1582. https://doi.org/10.3390/diagnostics11091582
10. Karthikeyan A, Garg A, Vinod PK, Priyakumar UD (2021) Machine learning based clinical decision support system for early COVID-19 mortality prediction. Front Public Health 9:626697. https://doi.org/10.3389/fpubh.2021.626697
11. Subudhi S, Verma A, Patel AB et al (2021) Comparing machine learning algorithms for predicting ICU admission and mortality in COVID-19. NPJ Digit Med 4:87. https://doi.org/10.1038/s41746-021-00456-x
12. Pourhomayoun M, Shakibi M (2021) Predicting mortality risk in patients with COVID-19 using machine learning to help medical decision-making. Smart Health 20:100178. https://doi.org/10.1016/j.smhl.2020.100178
13. Aljouie AF, Almazroa A, Bokhari Y et al (2021) Early prediction of COVID-19 ventilation requirement and mortality from routinely collected baseline chest radiographs, laboratory, and clinical data with machine learning. J Multidiscip Healthc 14:2017–2033. https://doi.org/10.2147/JMDH.S322431
14. Ashok A, Gopika PT, Charishma G, et al (2021) Application of geometric Poisson distribution for COVID-19 in selected states of India. In: Lecture Notes in Mechanical Engineering. Springer Singapore, Singapore, pp 435–446
15. Nair AJ, Rasheed R, Maheeshma KM, et al (2019) An ensemble-based feature selection and classification of gene expression using support vector machine, K-nearest neighbor, decision tree. In: 2019 International Conference on Communication and Electronics Systems (ICCES). IEEE, pp 1618–1623
16. Kavitha KR, Rajendran GS, Varsha J (2016) A correlation based SVM-recursive multiple feature elimination classifier for breast cancer disease using microarray. In: 2016 International Conference on Advances in Computing, Communications and Informatics (ICACCI). IEEE, pp 2677–2683
17. Choudary MNS, Bommineni VB, Tarun G, et al (2021) Predicting covid-19 positive cases and analysis on the relevance of features using SHAP (SHapley additive exPlanation). In: 2021 Second International Conference on Electronics and Sustainable Communication Systems (ICESC). IEEE, pp 1892–1896
18. Jose C, Gopakumar G (2019) An improved random forest algorithm for classification in an imbalanced dataset. In: 2019 URSI Asia-Pacific Radio Science Conference (AP-RASC). IEEE, pp 1–4
19. Ani R, Krishna S, Anju N, et al (2017) IoT based patient monitoring and diagnostic prediction tool using ensemble classifier. In: 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI). IEEE, pp 1588–1593
20. Sankaran S, Sunku Mohan V, Seshadrinath M et al (2021) Predictive modeling of the spread of COVID-19: the case of India. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering. Springer International Publishing, Cham, pp 131–149

Smart Computer Monitoring System Using Neural Networks

Stephen Jeswinde Nuagah, Bontha Mamatha, B. Hyma, and H. Vijaya

Abstract Because the recognition performance of traditional monitoring and recognition algorithms cannot meet practical needs, a computer monitoring system based on the Internet of Things (IoT) and a neural network algorithm is proposed. First, the basic functions and development status of computer monitoring systems are analyzed. Then, building on the three-tier IoT architecture and its key technologies, and on the structure and characteristics of computer monitoring systems, a three-tier architecture for an IoT-based computer monitoring system (sensing layer, network layer, and application layer) is put forward. Finally, according to the requirements of a real intelligent monitoring system, the overall framework of the server is designed, and IoT- and neural-network-based human behavior recognition algorithms for intrusion detection and wandering detection are deployed on the identification server of the intelligent monitoring system. The results show that, after accelerated optimization, the system can support real-time recognition on more than 16 channels. Compared with traditional intrusion detection, the neural network algorithm can distinguish whether the intruding subject is a human body, giving better recognition performance and greater practical value.

Keywords Neural network · Behavior identification · Intrusion detection · Wandering detection · Internet of things

S. J. Nuagah Department of Electrical Engineering, Tamale Technical University, Tamale, Ghana e-mail: [email protected] B. Mamatha (B) K L University, Vaddeswaram, Vijayawada, India e-mail: [email protected] B. Mamatha · B. Hyma Department of CSE, Guru Nanak Institutions Technical Campus, Ibrahimpatnam, India H. Vijaya Department of CSE (CS/DS), Guru Nanak Institutions Technical Campus, Ibrahimpatnam, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_13


1 Introduction

Through the Web and sensing devices, sensor technology, computer technology, wireless communication, and the Internet are linked together, and on this basis a smart environmental monitoring system for indoor environments is developed using Internet of Things (IoT) technology [1, 2]. The Internet of Things is regarded as the third wave of the information industry after the computer and the Internet. Once the concept of the "Internet of Things" was put forward, it immediately attracted the attention of governments, enterprises, and academia; driven by the mutual promotion of demand and R&D, it has rapidly become popular all over the world, and many countries are investing heavily in in-depth research. The IoT is a new technology system formed by the integration of a number of information technologies [3]. It is a fairly new concept, defined as the integration of all network-connected devices that can be managed via the web and, as a result, provide real-time information and allow communication with the humans who use them [4, 5]. So far, however, there is no universally recognized definition of the Internet of Things. A broader explanation is that the IoT combines various types of sensing devices, such as wireless sensor network nodes, radio frequency identification (RFID) devices, infrared sensors, mobile phones, PDAs, the global positioning system (GPS), and laser scanners, with the Internet to form a huge network [6]. Several studies on IoT-based human health monitoring are now being undertaken throughout the world, and foreign nations were ahead of China in creating such systems [7, 8]. In this kind of network, objects can "feel and think" and can "communicate" with each other without human intervention. Its essence is to embed sensors and other devices into objects and network them so that they finally access the Internet, extending the human ability to perceive and control the external world by giving objects "wisdom" [9]. In IoT technology, smart sensing devices and the Internet are employed to provide effective solutions to problems faced by networks, public and private sector firms, and government organizations throughout the world. The IoT has ushered in a new era of data analysis that makes use of smart systems and clever devices for a wide range of applications [10, 11].

Both computer monitoring technology and IoT technology involve reading, transmitting, and processing data: the monitored objects are quantified in various ways, and the processed data are transmitted to the application part for processing. Compared with IoT technology, however, a computer monitoring system resembles only a small subset of IoT functionality. The similarity of the three processes of information reading, transmission, and processing therefore makes it natural to apply IoT technology to computer monitoring systems, so as to broaden the application scope of computer monitoring technology.

In today's society, owing to the brutal interference of some hegemonic countries and the global exploitation and oppression by big monopoly capital, many violent crimes and extremist forces have bred, which are the


public enemies of some global peace forces [12]. As a populous country, China has a dense flow of people in public places; once an accident occurs, it can easily escalate into a major crowd incident, such as the New Year's Eve stampede on the Bund in Shanghai on December 31, 2014. It is therefore highly desirable to add intelligent identification algorithms on top of existing monitoring systems, so that, provided the identification algorithms are sufficiently accurate, the system can be operated by only a small number of staff. At present, most recognition systems on the market suffer from slow recognition speed and poor recognition accuracy, and in particular lack robustness in complex and changeable environments; they often need multiple sets of parameters or manual optimization during deployment [13]. The extraordinary ability of smart devices to share information among themselves has expedited the growth of the Internet of Things, and the adoption of IoT in applications for remote monitoring of patients' criticality levels has increased its importance in the healthcare business [14, 15]. To solve this series of problems, it is necessary to develop an intelligent monitoring and recognition algorithm with a high recognition rate, strong robustness, and fast operation, and to build a complete intelligent monitoring system around it. Figure 1 shows the computer monitoring system.

Fig. 1 Computer monitoring system


2 Literature Review

Many scholars have studied remote equipment monitoring systems and have built remote monitoring systems for various engineering and electronic equipment on architectures based on the C/S or B/S mode; application software in related fields has also been developed. Some researchers pointed out that construction machinery is prone to failure because of factors such as a poor working environment [16]. Kowal and others proposed a remote monitoring system for road rollers based on GSM communication: an enterprise centralized control center received the data transmitted over the communication link and established a central enterprise information management system, which solved the problem of remote dynamic monitoring and management of equipment and construction quality [17]. Alfarozi and others realized real-time monitoring of construction machinery, which can ensure normal and efficient operation and allow faults to be located in time [18]. After studying remote monitoring technology, Huang and others proposed that remote monitoring is the premise and foundation of networked manufacturing; these technical means must be built on a remote fault diagnosis architecture for construction machinery so that the system can quickly and accurately locate equipment faults and handle them rapidly [19]. Zhang and others developed a remote monitoring system for construction machinery, put forward a technical scheme for its implementation, and finally realized remote monitoring of the working state of construction machinery, making remote dispatch and command of mechanized construction equipment possible [20]. Che and others proposed the neurocognitive machine, a multi-layer network structure that is a bionic model of the biological visual nervous system, allowing a mathematical model to approach human vision in recognizing objects [21]. Tong and others pointed out that, with the rapid development and popularization of Internet technology, the impact of remote monitoring systems on the management of electronic equipment is revolutionary; taking the remote monitoring system of mechanical equipment in the railway system as an example, they studied an optimal data acquisition mode based on the B/S mode, designed the data acquisition module and system framework, and analyzed the reliability and practicability of the remote monitoring system [22]. Wang and others pointed out that remote monitoring systems will become an essential element of equipment maintenance and after-sales service in the future; they also studied the key technologies adopted in such systems, the selection of network architecture, and the formulation of communication protocols, and looked forward to the application of remote monitoring systems in power, construction, and other related industries [23].

On the basis of this research, and aiming at the shortcomings of existing intelligent recognition algorithms, namely poor recognition performance, few types of recognizable objects and behaviors, and weak robustness in complex scenes, this paper proposes a


recognition algorithm based on a convolutional neural network that improves on the original recognition algorithm, so as to achieve better robustness and higher recognition accuracy, meet the application requirements of complex scenes, and adapt to multi-objective, multi-scale detection. With good results achieved, the work can make some academic contribution to the field of recognition algorithms.

3 Methodology

3.1 Overall Design of Computer Monitoring System

Based on the analysis of the system's functional requirements and the design principles, the system adopts a distributed control system (DCS) architecture. The core idea of DCS is centralized management with decentralized control. In this system, the sensing and control devices are scattered across the monitoring locations to monitor different objects; the communication network realizes data transmission between the sensing and control devices and the application software, and the application software realizes centralized management of the sensing and control devices [24]. The devices of the sensing and control layer perform the actual data acquisition and control of the corresponding equipment through wired, ISM-band wireless, and infrared links, and interact with the application-layer software over an RS-485/CAN bus. They can also send alarm information through a GSM module and use the Internet to realize remote control of the system. The overall functional composition of the system is shown in Fig. 2.

3.2 Monitoring Software Design

As the only window for operating users, the client must have a simple interface, simple operation, and clearly organized functions. The client of the software mainly provides the following functions:

(1) Display of real-time data. The client requests data from the lower computer module over the communication line using serial communication technology and obtains the real-time data.


Fig. 2 Overall function composition of the system

(2) Query and display of historical data. Matching query results can be obtained according to different query conditions and displayed as charts, histograms, and line charts.
(3) Export of data reports. After querying the data they need, users can export it to an Excel table for further analysis.
(4) Setting of system parameters. According to the working environment, the required types of sensing-layer modules are selected and the communication mode and protocol are determined; the operating parameters of the sensing-layer devices can also be modified.

All data operations of the client are based on the database module, which queries or modifies the contents of the corresponding database for different control commands [25]. Figure 3 shows the data interaction process between the client and the database.

3.3 Database Module Design

The database module is the core of the monitoring software, and its quality directly affects the quality of the whole system. According to the functional requirements of the system and the different data types and uses, the database can be divided into four types of tables: (1) data storage tables, (2) alarm data tables, (3) data report tables, and (4) working parameter configuration tables. Each type contains one or more tables. The database module design is shown in Table 1.


Fig. 3 Interaction process between client module and database module

Table 1 Controller data sheet

Field name         Data type   Length   Primary/foreign key   Field value constraint
Controller ID      Char        20       P                     Not null
Acquisition time   Char        20       F                     Not null
Data 1             Char        20       Null                  Null
Data 2             Char        20       Null                  Null
Data 3             Char        20       Null                  Null
Data 4             Char        20       Null                  Null
Data 5             Char        20       Null                  Null


3.4 Neural Network Model

However good the algorithms that are invented, the purpose is to realize artificial intelligence, yet no computing model seems able to surpass human neurons. The artificial neural network is a computational model inspired by this observation, and is simply called a neural network [26]. Like a human neural network, an artificial neural network can have many neurons. These neurons


Fig. 4 Schematic diagram of neurons

are connected with each other to form a multi-level network structure; an individual neuron is shown in Fig. 4. In Fig. 4 there are three input values, namely (x_1, x_2, x_3), and the output of the neuron is given by Eq. (1):

h_{W,b}(x) = f(W^{T} x) = f\left( \sum_{i=1}^{n} W_i x_i + b \right)    (1)

where x is the input signal, which in general can be a matrix or a vector, f is the activation function, W is the weight, and b is an offset (bias) parameter. At present, the most commonly used activation functions are nonlinear; the specific activation function is described below. Compared with a biological neural network, x can be regarded as the stimulus arriving at the neuron, the output is the stimulus transmitted to the next neuron, and W and b can be regarded as the characteristics of this neuron. Passing through the neuron thus processes the input signal [27].
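As a minimal illustration of Eq. (1), the following NumPy sketch computes the output of a single neuron for a three-dimensional input; the sigmoid activation and the concrete weight values are assumptions made for the example, not values taken from this system.

```python
import numpy as np

def sigmoid(z):
    # A common nonlinear activation f; the text only requires f to be nonlinear.
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(x, W, b, f=sigmoid):
    # Eq. (1): h_{W,b}(x) = f(W^T x) = f(sum_i W_i * x_i + b)
    return f(np.dot(W, x) + b)

# Three inputs (x1, x2, x3) as in Fig. 4; the weights and bias are illustrative.
x = np.array([0.5, -1.2, 3.0])
W = np.array([0.4, 0.1, -0.7])
b = 0.2
print(neuron_output(x, W, b))
```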

3.4.1 Error Back Propagation Algorithm

The error back propagation algorithm is hereinafter referred to as the BP algorithm. Its idea is to propagate the error between the output of the final layer of the neural network and the true result (i.e., the true label of the input data) backwards through the structure of the network. The backward pass can be regarded as the reverse of the whole forward pass: all input and output positions are reversed, as is the connection relationship between layers. The parameters of the whole neural network are then learned through cyclic iterative training. The training method of the BP algorithm is gradient descent: the error generated by each forward pass is used to correct the weights and biases in the network until the error of the whole network reaches a minimum, at which point the weights and biases are taken as the optimal solution.


Suppose there are m training samples {(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)})}, where x is the independent variable (the input value) of a sample and y is its label. Assume that gradient descent is used for supervised learning of the current neural network, that is, the network parameters W and b are adjusted through cyclic iteration. For a sample with input x and true label y, the loss function is given by Eq. (2):

J(W, b; x, y) = \frac{1}{2} \left\| h_{W,b}(x) - y \right\|^{2}    (2)

As can easily be seen from formula (2), this function measures the difference between the predicted value and the true value. However, a single sample is obviously not enough for practical problems with massive data, so the formula is extended to m samples, which are fed into the network at the same time as a batch. The loss function J then takes the form of formula (3). The overall loss function consists of two parts: the first part is the mean squared error term, and the second is a weight decay (attenuation) term that is added compared with formula (2). The weight decay term limits the amplitude of the weight changes and prevents overfitting; overfitting would make the model fit the original data too closely, lose robustness, and extract features poorly from new test samples [28, 29]. λ is a hyperparameter that balances the first and second terms of the formula.

J(W, b) = \frac{1}{m} \sum_{i=1}^{m} J(W, b; x^{(i)}, y^{(i)}) + \frac{\lambda}{2} \sum_{l=1}^{n_l - 1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \left( W_{ji}^{(l)} \right)^{2}    (3)

The purpose of the BP algorithm is to adjust the network parameters through multiple rounds of iterative training and finally minimize the whole loss function. As mentioned above, when the network starts running, all parameters are initialized; after that, gradient descent is used over many training iterations to minimize the loss value. The loss function J(W, b) is nonconvex (geometrically, it is a surface), so gradient descent may leave the loss stuck in a local minimum for a long time during the iterative process; that is, the point reached is not the global minimum. Although in practice such a local minimum can still let the neural network achieve good results, an optimization method can be used to break out of it. A common method is to add an impulse (momentum) term to the original descent gradient; this can carry the iteration out of the local-minimum region and on towards regions with smaller loss [30]. By differentiating the loss function, the update formula for the parameters W and b in each round is obtained as follows.


W_{ij}^{(l)} = W_{ij}^{(l)} - \alpha \frac{\partial}{\partial W_{ij}^{(l)}} J(W, b)    (4)

In the formula, l denotes the layer index; the parameters of layer l are the weights W_{ij}^{(l)} and the biases b_i^{(l)}, where i is the index of the neuron in layer l and j is the index of the connected neuron in layer l − 1. The loss function J is differentiated with respect to each of the two kinds of parameters in turn, and α is the learning rate that scales the correction applied in this iteration.
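The update of Eq. (4), extended with the impulse (momentum) term discussed above, can be sketched as follows; the toy quadratic loss and all numeric values are illustrative assumptions, not the network or data of this paper.

```python
import numpy as np

def sgd_momentum_step(W, grad, velocity, alpha=0.01, mu=0.9):
    # Eq. (4): W <- W - alpha * dJ/dW, extended with an impulse (momentum) term
    # that carries over a fraction mu of the previous update, which can help
    # the iteration escape shallow local minima as described in the text.
    velocity = mu * velocity - alpha * grad
    return W + velocity, velocity

# Toy example: minimize J(W) = 0.5 * ||W - target||^2, so dJ/dW = W - target.
target = np.array([1.0, -2.0])
W = np.zeros(2)
v = np.zeros(2)
for _ in range(200):
    grad = W - target
    W, v = sgd_momentum_step(W, grad, v)
print(W)  # approaches [1.0, -2.0]
```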

4 Results

The intelligent monitoring system in this paper uses three algorithms: an abnormal behavior detection algorithm, an intrusion detection algorithm, and a wandering detection algorithm. The three algorithms were tested in the corresponding simulation environments, which demonstrated their good accuracy, and the actual deployment of the system uses the same parameters as the experimental environment. Since there is currently no standard test procedure or database for quantitatively testing an intelligent monitoring system in a real environment, this paper defines its own test procedure and test data [31, 32]. Test equipment and deployment environment: two Hikvision mini PTZ infrared network cameras (RTSP protocol signals can be obtained from both), an i5 processor, 16 GB of memory, a GTX 780 GPU, and the Ubuntu 16.04 operating system. The two cameras are installed in two indoor environments, and three situations are tested: abnormal pedestrian behavior, indoor intrusion, and abnormal wandering. Each situation is tested 20 times in each indoor environment, the number of missed alarms is recorded, and the accuracy of the system is computed. Table 2 shows the test results.

In Fig. 5, the abnormal behavior alarm category covers three kinds of abnormal behavior: fighting, group running, and group scattering. Intrusion detection detects whether a pedestrian enters the monitored area and raises an alarm if so; wandering detection detects whether a pedestrian stays in the monitored area beyond a threshold and raises an alarm if the threshold is exceeded. Figure 5 shows that the probability of a missed alarm in pedestrian intrusion detection is very low, because the network's ability to extract pedestrian features is excellent and recognition almost never fails. It can also be found from the results that the scene has a certain impact on recognition accuracy: monitoring scenario 2 is more complex and has a wider viewing angle than scenario 1, with interference factors from the environment of the whole teaching and research office, such as tables, chairs, computers, and other items. In addition, during monitoring and identification it was found that the detection sensitivity of abnormal

Table 2 System speed test

Number of videos recognized in parallel   Total time for minute video calculation (s)   Total time for minute video calculation (s)
1     5.2      5.20
3     11.46    4.08
5     16.41    3.40
8     23.51    2.07
10    31.17    2.12
13    40.8     2.11
15    48.06    2.17
18    47.64    2.15
20    54.24    2.16

behavior detection at the edges and corners of the monitoring view is lower than in the central area, which leads to occasional omission of abnormal behavior in the corners of the monitored area. The reason is that most of the videos used to train the detection deep neural network show human actions in the central area of the frame, which makes the whole network somewhat more sensitive to abnormal behavior in the central area of the monitoring view. Randomly cropping the training videos during fine-tuning can alleviate this problem to a certain extent [33, 34].

Fig. 5 System accuracy test (y-axis: percentage; series: monitoring scenario 1 and monitoring scenario 2; categories: abnormal behavior detection, pedestrian wandering detection, pedestrian intrusion detection)


Based on the accuracy and real-time tests of the whole system, as well as the results of its deployment and use, the intelligent monitoring and identification system in this paper has the following main advantages:

1. The system can operate stably for a long time and can handle some abnormal conditions on its own, so that professionals do not need to watch its operating state continuously.
2. The system is highly robust. Even when deployed in different real monitoring environments, it does not need separate parameter adjustment; a recognition rate that meets alarm requirements is achieved with the same set of network parameters.
3. With a single GTX 970, the system can guarantee real-time computation of the various algorithms for 16 cameras, which fully supports deployment in larger monitoring systems while keeping the overall cost reasonable.
4. The whole system can be integrated with existing monitoring systems on the market, which greatly reduces the cost of upgrading and makes wider promotion of the system possible.
5. The overall architecture of the system is flexible: it provides both a web client and a desktop application client, which can satisfy most customer needs.
6. Expandability was considered during development, so the system can work with a single identification server or with multiple identification servers performing monitoring and identification tasks at the same time.

5 Conclusion

Through the analysis and study of existing computer monitoring systems combined with Internet of Things technology, this paper completes the design and development of IoT-based computer monitoring system software. The paper briefly introduces the development status of computer monitoring systems and IoT technology; then, following the three-tier IoT architecture and the structural characteristics of computer monitoring systems, it determines the architecture and overall implementation scheme of an IoT-based computer monitoring system. Using a neural network algorithm, it also designs and implements a complete intelligent monitoring and identification system based on the B/S architecture. The system not only has strong identification ability but also fully meets the practical needs of multi-channel real-time detection.


References 1. Garg RK, Bhola J, Soni SK (2021) Healthcare monitoring of mountaineers by low power wireless sensor networks. Inform Med Unlock 27:100775 2. Sriram GS (2022) Edge computing vs. cloud computing: an overview of big data challenges and opportunities for large enterprises. Inter Res J Modern Eng Tech Sci 4(1):1331–1337 3. George T, Ganesan V (2020) An effective technique for tuning the time delay system with pid controller-ant lion optimizer algorithm with ann technique. Int J COMADEM 23(1):39–48 4. Wang L, Kumar P, Makhatha ME, Jagota V (2021) Numerical simulation of air distribution for monitoring the central air conditioning in large atrium. Int J Syst Assu Eng Manag. https:// doi.org/10.1007/s13198-021-01420-4 5. Tang S, Shabaz M (2021) A new face image recognition algorithm based on cerebellum-basal Ganglia mechanism. J Healthcare Eng 3688881:11 p 6. Domashova JV, Emtseva SS, Fail VS, Gridin AS (2021) Selecting an optimal architecture of neural network using genetic algorithm. Procedia Comp Sci 190(14):263–273 7. Bhola J, Shabaz M, Dhiman G, Vimal S, Subbulakshmi P, Soni SK (2021) Performance evaluation of multilayer clustering network using distributed energy efficient clustering with enhanced threshold protocol. Wireless Personal Comm. https://doi.org/10.1007/s11277-021-08780-x 8. Sriram GS (2022) Resolving security and data concerns in cloud computing by utilizing a decentralized cloud computing option. Inter Res J Modern Eng Tech Sci 4(1):1269–1273 9. Nguyen VT, Pashchenko FF (2021) Development of an object recognition algorithm based on neural networks with using a hierarchical classifier. Procedia Comp Sci 184(12):438–444 10. Li H, Shabaz M, Castillejo-Melgarejo R (2021) Implementation of python data in online translation crawler website design. Inter J Syst Assu Eng Manag. https://doi.org/10.1007/s13198021-01215-7 11. Bukhari SNH, Jain A, Haq E, Khder MA, Neware R, Bhola J, Lari Najafi M (2021) Machine learning-based ensemble model for Zika Virus T-cell epitope prediction. J Healthcare Eng, 1–10 12. Pan JS, Shan J, Zheng SG, Chu SC, Chang CK (2021) Wind power prediction based on neural network with optimization of adaptive multi-group salp swarm algorithm. Clust Comput 24(3):2083–2098 13. Matayoshi J, Cosyn E, Uzun H (2021) Are we there yet? evaluating the effectiveness of a recurrent neural network-based stopping algorithm for an adaptive assessment. Int J Artif Intell Educ 31(2):304–336 14. Wang B, Yao X, Jiang Y, Sun C, Shabaz M (2021) Design of a real-time monitoring system for smoke and dust in thermal power plants based on improved genetic algorithm. J Health Eng 7212567:10 p. https://doi.org/10.1155/2021/7212567 15. Kumar A, Jagota V, Shawl RQ, Sharma V, Sargam K, Shabaz M, Khan MT, Rabani B, Gandhi S (2021) Wire EDM process parameter optimization for D2 steel. Mat Today: Proceed 37(2):2478–2482 16. Nguyen QH, Chou TY, Yeh ML, Hoang TV, Nguyen HD, Bui QT (2021) Henry’s gas solubility optimization algorithm in formulating deep neural network for landslide susceptibility assessment in mountainous areas. Environ Earth Sci 80(11):1–10 17. Kowal M, Ejmo M, Skobel M, Korbicz J, Monczak R (2020) Cell nuclei segmentation in cytological images using convolutional neural network and seeded watershed algorithm. J Digit Imaging 33(1):231–242 18. Alfarozi S, Pasupa K, Sugimoto M, Woraratpanya K (2020) Local sigmoid method: noniterative deterministic learning algorithm for automatic model construction of neural network. IEEE Access 99:1 19. 
Huang F, Zhang J, Zhou C, Wang Y, Huang J, Zhu L (2020) A deep learning algorithm using a fully connected sparse autoencoder neural network for landslide susceptibility prediction. Landslides 17(1):217–229


20. Zhang Y, Jin Z, Chen Y (2020) Hybrid teaching-learning-based optimization and neural network algorithm for engineering design optimization problems. Knowledge-Based Systems 187(Jan), 104836.1–104836.18 21. Che M, Wang X, Wei Y (2020) A unified self-stabilizing neural network algorithm for principal takagi component extraction. Neural Process Lett 51(1):591–610 22. Tong Y, Sun W (2021) The role of film and television big data in real-time image detection and processing in the internet of things era. J Real-Time Image Proc 18(4):1115–1127 23. Wang Y, Liu X, Wang Y, Wang H, Wang H, Zhang SL et al (2021) Flexible seaweed-like triboelectric nanogenerator as a wave energy harvester powering marine internet of things. ACS Nano 15(10):15700–15709 24. Din S, Paul A (2020) Erratum to ‘smart health monitoring and management system: toward autonomous wearable sensing for internet of things using big data analytics’. Future Generat Comp Syst 108(Jul), 1350–1359 25. Ahmed N, Hussain MI (2021) Scalable internet of things network design using multi-hop IEEE 802.11ah. Telecomm Syst 78(4):577–588 26. Kwon JH, Zhang X, Kim EJ (2021) Scalable wi-fi backscatter uplink multiple access for battery-free internet of things. IEEE Access 99:1 27. Zhao M (2021) Information iterative retrieval of internet of things communication terminal based on symmetric algorithm. Wireless Pers Commun 117(4):3469–3485 28. Ezzahoui I, Abdelouahid RA, Taji K, Marzak A (2021) Hydroponic and aquaponic farming: comparative study based on internet of things iot technologies. Procedia Comp Sci 191(4):499– 504 29. Cao D, Xue D, Ma Z, Mei H (2022) Xiuos: an open-source ubiquitous operating system for industrial internet of things. Science China Inf Sci 65(1):1–2 30. Liu J, Duan Y, Wu Y, Chen R, Chen G (2021) Information flow perception modeling and optimization of internet of things for cloud services. Futur Gener Comput Syst 115(8):671–679 31. Maheswar R, Jayarajan P, Sampathkumar A, Kanagachidambaresan GR, Hindia M, Tilwari V et al (2021) Cbpr: a cluster-based backpressure routing for the internet of things. Wireless Pers Commun 118(4):3167–3185 32. Bhushan B, Sahoo C, Sinha P, Khamparia A (2021) Unification of blockchain and internet of things (biot): requirements, working model, challenges and future directions. Wireless Netw 27(1):55–90 33. Surantha N, Atmaja P, David, Wicaksono M (2021) A review of wearable internet-of-things device for healthcare. Procedia Comp Sci 179(11):936–943 34. Liu Z (2021) Construction and verification of color fundus image retinal vessels segmentation algorithm under bp neural network. J Supercomput 77(7):7171–7183

Using Deep Learning to Perform Payload Classification

Jayesh Thakur and Kaushik Rane

Abstract Today, one of the most critical tasks for a network administrator is to manage the large volumes of traffic generated on the network; otherwise, the links become congested and data are eventually lost. The most common causes of poor performance are too many packets arriving at a given router interface and unbalanced traffic conditions. Network operators are also required to provide Quality of Service for different applications, which benefits the customer's network experience. Packet traffic is usually processed in one of two ways: an engineering approach based on manual effort, or a self-learning approach that leverages deep learning. This paper focuses on designing deep learning models for network classification and on producing packet-unit and flow-unit training datasets through network traffic preprocessing. To accomplish this, the authors classify network traffic at either the packet level or the flow level with good accuracy, experimenting with deep learning models including CNN, RNN, LSTM, and ResNet. The packet-level and flow-level detection procedures are described in detail and compared using F1-scores. The F1-score of the LSTM is higher than that of the RNN and of CNN + RNN, indicating that in the flow-unit learning model the LSTM alone can provide similar or better performance without more complex models.

Keywords CNN · Deep learning · LSTM · Model · Packet classification · RNN · ResNet · Tuning

J. Thakur · K. Rane (B) Department of Information Technology, Mumbai University, Mumbai, India e-mail: [email protected] J. Thakur e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_14


1 Introduction

Most state-of-the-art deep learning networks for packet classification can easily be retrained to solve flow classification problems. However, when such methods are applied to flow datasets, a problem arises from the different levels of abstraction between network packets and aggregated flows: a flow aggregates many packets, whereas classification batches are built from packets continuously delivered to processing queues, so the two do not align one-to-one. A large amount of responsibility is placed on implementing and configuring appropriate services, such as QoS policies, traffic types, and user rate limits for different cases, in order to optimize network performance. The simplest way to identify congested traffic and resolve these issues is with gauges, for example NetBTX, measurement tools like Networx, or other tools reported in the literature [1]. The most common causes of poor performance are too many packets arriving at a given router interface and unbalanced traffic conditions.

Research on classifying network traffic in order to provide smooth service per application program is being actively conducted. In general, rule-based and port-based network traffic classification methods are widely used; however, they are inefficient because they cannot handle hidden parts of the traffic. To achieve better classification accuracy, payload-based classification approaches have been developed, which inspect the contents of the packet, obtain the signature of the protocol, and match it against signatures in a database to identify a particular application or protocol. When packet classification relies on header information, there is a locality dependency problem, and if the header information changes, classification no longer works properly [2]. The payload-based classification method solves the locality dependency problem, and current payload-based methods provide the best classification accuracy; however, there is a practical difficulty in accessing the payload of raw packets, and payloads may be encrypted because of user privacy policies.

Advances in deep learning, such as backpropagation through time, offer an immense upside for network traffic classification, because making sense of traffic data manually is intensive and time-consuming. By combining an understanding of network traffic patterns with deep learning algorithms, it is now possible to classify different types of network traffic automatically. Our research focuses on self-developed data processing to generate training datasets: through preprocessing, each packet of network traffic is converted into an image and used as training data. A Convolutional Neural Network (CNN), Residual Network (ResNet), Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), and a CNN + RNN combination are trained to classify network traffic using the generated packet-unit and flow-unit training datasets. Because CNN and ResNet models are suitable for classifying image data, they are used to learn the imaged packet-unit datasets. The packet-unit and flow-unit datasets are then contrasted using the five deep learning models. The benefits of applying such algorithms to network traffic


monitoring are extensive: it can help the industry proactively assess threats and determine what patterns attackers follow and how they operate [3]. The rest of this paper is structured as follows. Section 2 presents the relevant literature and the inspiration for our study. Section 3 describes the data preprocessing. Section 4 explains the deep learning models for network traffic classification. The model tuning method is discussed in Sect. 5. Section 6 contains the results and analysis of the experiments, followed by the conclusion in Sect. 7.

2 Literature Review

Recently, many studies and technologies have addressed packet classification in networks. Existing work includes rule-based packet classification methods, and with the development of deep learning, packet classification using deep learning has been actively investigated; it classifies packets automatically without human intervention. In this section, we review rule-based packet classification research and packet classification studies that utilize deep learning.

2.1 Rule-Based Packet Classification

A rule-based classifier creates and executes a ruleset, and a traffic classifier follows this ruleset to classify network traffic. With rule-based traffic classification, network administrators usually designate one link as the main upstream path and associate with it all packets destined for addresses outside the subnetwork; on point-to-point links, they typically forward packets to particular endpoints based on protocol type or tunneling setup. The need for network classification arises in what are sometimes called "wide areas", in contrast to the "localized" area covered by private networks (intranets) that operate largely according to hard-coded, configurable rules. The rule-based packet classification method therefore has limitations. Since packets are classified using header information, a packet whose header does not match any rule cannot be classified, or is classified incorrectly. In addition, because rule-based classification relies on the IP and port information of the packet header, it has locality limitations: whenever a new network is connected or packets from a new network appear, new rules must be defined [4]. The work of Fangfan Li et al. was used to lower the dependency on packet header information: if the header of an incoming packet is similar to the header information found in the first packet, it is classified as a packet of the same type. Although the IP and port number of the packet are


used less, the method of classifying the subsequent packets by using the header information of the first packet also depends on the header information.
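To make the limitation concrete, a rule-based classifier of the kind described in this subsection can be sketched as a simple lookup on header fields; the rule table and field names below are hypothetical examples, not rules taken from the cited works.

```python
# Hypothetical protocol/port rules: any packet whose header does not match a
# rule is left unclassified, which is exactly the limitation discussed above.
RULES = {
    ("tcp", 80): "http",
    ("tcp", 443): "https",
    ("udp", 53): "dns",
}

def classify(packet):
    # packet is a dict of header fields; the payload is never inspected.
    key = (packet["proto"], packet["dst_port"])
    return RULES.get(key, "unknown")

print(classify({"proto": "tcp", "dst_port": 443, "src_ip": "10.0.0.1"}))   # https
print(classify({"proto": "tcp", "dst_port": 8080, "src_ip": "10.0.0.1"}))  # unknown
```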

2.2 Classify Packets Using Deep Learning

Deep learning is transforming the way machine learning algorithms are used to parse data, and a lot of progress has been made in a short span of time. This type of neural classification teaches computers to assign network traffic to established classes, and studies that perform packet classification with deep learning are being actively carried out. Network classification is important because it addresses the problem of false positives when distinguishing malware traffic from general traffic. Wei Wang et al. implement neural network models trained on malware data containing various types of malware. The SIGA special task force on cyber threat detection acknowledges that a good way to characterize malware accurately is simply to generate signatures from traffic samples, and that a representative distribution of the traffic data is necessary for generating meaningful patterns that are reliable and straightforward. One option would be to classify malware by filtering out particular IP datagrams or those generated by centralized networks; balancing such mechanisms takes time, but new algorithms such as CNNs and other deep learning models already offer promising solutions. Lopez-Martin et al. performed packet classification using a combined CNN and RNN deep learning model, feeding the learning data into the combined CNN/RNN model as input. In both of the above papers, the packet header information was added to the dataset used to train the deep learning model. In such cases, classification accuracy may be high because the header information partly characterizes the data. Therefore, in this paper, we perform learning only with the application-layer payload data, excluding the packet header information [5].

2.3 Payload-Based Traffic Classification

The basic idea of payload-based traffic classification is to categorize network packets passing through an interface by judging their payload, i.e., the packet contents and structure. By analogy, the goal of such a traffic classifier is to be able to tell that "Don't eat this fish that you caught" and "Don't eat fried foods" are different sentences without relying on the number of characters in them.


Haffner et al. reduced the amount of computation required to generate payload-based datasets, using NB, AdaBoost, and MaxEnt for traffic classification. Despite their cost, payload-based classification methods are used because they provide high accuracy and establish ground truths [6].

3 Data Preprocessing

This section introduces the dataset used for traffic classification and the preprocessing applied to the PCAP file provided by the Broadband Communication Research Group [7] for deep learning. In this paper, the traffic dataset provided by the Broadband Communication Research Group is divided into flow units and packet units, and the preprocessed packets are used as learning data.

3.1 Data Split

The given PCAP file is approximately 60 GB in size. Figure 1 is a graph showing the number of flows per application. The traffic data are labeled in three categories; thanks to the labeling file, accurate label information corresponding to the ground truth for traffic classification can be obtained and used with the deep learning models proposed in this paper. Based on the percentage of flows, eight applications were selected for data preparation: applications that carry traffic of an actual network and have more than 2000 flows. The application-layer payload of each packet inside the selected flows is filtered and extracted, and eight application-layer payload data files are generated from the extracted payload data.

Fig. 1 Number of data flows for 10 applications provided by Broad Band Communication Research Group


3.2 Learning Data Generation

This section describes the process of converting the application-layer payload data files created above into input data suitable for deep learning. The application payload data files are divided into packet units and flow units, and learning data are generated for each. A flow is identified by the 5-tuple of the packet header, and packets generated within 3600 s of the previous packet are bundled into the same flow [8]. Each packet is extracted from the application-layer payload data, and the payload is split into 4-bit groups, each forming one unit of the learning data; one data point therefore represents a number in the range 0–15. The values of one packet are collected as pixels and turned into image data as shown in Fig. 2, so that one image corresponds to one packet of learning data. Learning data are generated per packet and per flow for each application. For the packet-unit data, packets of the 8 applications are extracted at random: 10,000 random packets per application, giving a total of 80,000 learning samples. Each randomly extracted packet is resized according to its payload size, which is taken as 36 (6 × 6), 64 (8 × 8), 256 (16 × 16), or 1024 (32 × 32). Figure 2 shows the 16 × 16 payload data of an arbitrary packet from each extracted application [9]. For the flow-unit data, the first N packets of each selected flow are fetched, where N is the predetermined number of packets per flow; these packets undergo the same preprocessing as the packet-unit learning data. The flow-unit learning data comprise 2000 flows per application extracted from the application-layer payload data files, for a total of 16,000 flows [1].

Fig. 2 Pictures obtained from Broad Band Communication Research Group’s PCAP data


A single integer index is defined as the label representing an application. Each learning set therefore consists of two parts: the packet or flow data and the label data indicating the application.
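A minimal sketch of the preprocessing described in this section: each payload byte is split into two 4-bit values (0 to 15), truncated or zero-padded to N × N values, and paired with an integer label for the application. The function and variable names are ours, and the extraction of payloads from the PCAP files is not shown.

```python
import numpy as np

def payload_to_image(payload: bytes, n: int) -> np.ndarray:
    # Split each byte into two 4-bit values, so every data point lies in 0..15.
    nibbles = []
    for byte in payload:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    # Truncate or zero-pad to exactly n*n values, then reshape to an n x n "image".
    nibbles = (nibbles + [0] * (n * n))[: n * n]
    return np.array(nibbles, dtype=np.uint8).reshape(n, n)

# Example: a 16 x 16 packet image and an integer label for one of the 8 applications.
img = payload_to_image(b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n", 16)
label = 3  # hypothetical application index
print(img.shape, img.min(), img.max(), label)
```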

4 Deep Learning Models

This section describes the deep learning models used for network traffic classification. Network traffic classification is a relatively straightforward task for traditional machine learning when the data can easily be represented numerically; for deep learning the task becomes more involved, because the datasets may also contain textual information. Deep learning concerns models composed of multiple hidden layers of units, each processing information; it is a subset of machine learning that uses neural networks to learn from data.

A convolutional neural network (CNN) is a deep learning model widely used for image recognition; it is an artificial neural network whose connectivity pattern between neurons resembles the organization of the animal visual cortex. CNNs need less training data than many other models and generalize well, making them well suited to image recognition tasks [10]. Recurrent neural networks (RNNs) are deep learning models that process data sequentially; they are trained on sequences of input data and produce an output after each element. The Long Short-Term Memory (LSTM) is a recurrent architecture that can learn long-term dependencies in sequences; LSTMs are used in speech recognition, language modeling, and other applications with long-term dependencies in the input. A residual neural network (ResNet) is a variation of deep neural networks, trained with backpropagation, that adds shortcut connections; residual networks have been shown to outperform plain deep architectures when training is difficult and can also be applied to sparse datasets.

All of the above models are used in this paper to perform network classification. The deep learning models are built with Keras, and the ResNet and CNN + RNN models are constructed by combining Keras CNN and RNN components. CNN and ResNet are used for classification on the imaged packet-unit data generated through preprocessing.


Fig. 3 2-Layer CNN learning model architecture

4.1 Convolution Neural Network Architecture

Figure 3 shows the composition of the CNN architecture. CNNs are supervised learning models; they use a variety of pixel-related features, such as edges and textures, to achieve their objective. The layers are described below.

Input layer: the first layer of the CNN, which takes in the image. Here it takes the payload learning data: each packet is fed to the input layer as an N × N image (N = 6, 8, 16, 32).
Convolution layer: extracts features by sliding kernels over the input feature map. Here the features of each packet are convolved through the kernels of two convolution layers, and the output is generated through the filters and an activation function.
Pooling layer: reduces the spatial size of the representation by applying the same operation to small spatial regions; the neurons are downsampled by taking the maximum value in each region (max pooling).
Fully connected layer: produces an output vector equal to the input vector multiplied by a weight matrix. Here the fully connected layer produces the prediction over the 8 final classes through its activation [11].
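The two-convolution-layer architecture of Fig. 3 can be sketched in Keras as follows for 16 × 16 packet images, using the filter count and kernel size reported for that input size in Table 1; the ReLU activations in the hidden layers and the loss choice are our assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_packet_cnn(n=16, num_classes=8):
    # Input: one n x n "payload image" per packet, single channel.
    model = keras.Sequential([
        keras.Input(shape=(n, n, 1)),
        layers.Conv2D(256, kernel_size=5, padding="same", activation="relu"),
        layers.Conv2D(256, kernel_size=5, padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=2),
        layers.Flatten(),
        layers.Dense(num_classes, activation="softmax"),  # 8 application classes
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

build_packet_cnn().summary()
```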

4.2 Residual Network Architecture

Unlike a traditional CNN, ResNet uses a distinctive concept called the shortcut connection. A shortcut connection is added to the existing CNN structure and is connected directly, without any additional parameters: it adds the block input to the output of a newly added group of layers so that learning can still be performed. The newly added layers can therefore improve performance while preserving, as far as possible, the performance of the network already learned. As shown in Fig. 4, when one packet of learning data enters the input, it passes through one convolution layer and one maxpooling layer. The result of the


Fig. 4 ResNet learning model architecture

maxpooling layer is then used as the input of the first 3-convolution group, and the same maxpooling result is also used as the input of the newly added shortcut path. The output of the 3-convolution group and the output of the shortcut are then summed; the sum is passed through the ReLU function again and used as the input of the second 3-convolution group and its shortcut. After three more repetitions of the 3-convolution group process, the summed output is downsampled by an average pooling layer and the final classification result is produced. In terms of computation, nothing more than an addition is introduced. Deep networks can be optimized easily through shortcut connections, and accuracy can improve as depth increases.
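The shortcut connection can be written directly with the Keras functional API; the block below adds the input of a convolution group to the group's output and applies ReLU again, as described above. The filter counts and the optional 1 × 1 projection used when channel counts differ are our assumptions, not details taken from the paper.

```python
from tensorflow import keras
from tensorflow.keras import layers

def residual_block(x, filters=64):
    # Main path: a small group of convolutions (a "3-convolution group" in the text).
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    # Shortcut connection: the block input is added to the group output, then ReLU.
    shortcut = x
    if x.shape[-1] != filters:
        shortcut = layers.Conv2D(filters, 1, padding="same")(x)  # match channel count
    out = layers.Add()([y, shortcut])
    return layers.Activation("relu")(out)

inputs = keras.Input(shape=(16, 16, 1))
x = layers.Conv2D(64, 3, padding="same", activation="relu")(inputs)
x = layers.MaxPooling2D(2)(x)
x = residual_block(x)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(8, activation="softmax")(x)
model = keras.Model(inputs, outputs)
model.summary()
```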

4.3 Recurrent Neural Network Architecture

Recurrent neural networks are a type of artificial neural network distinguished by the fact that they allow loops in their processing. They have proved effective at modeling sequential data such as text, speech, and time series. The input layer is where the data are fed into the network; in this paper, the packets are fed to this layer. Convolution layers can be used to extract features from the input data, and pooling layers reduce the spatial dimensions of the representation. Fully connected layers connect all neurons in one layer with all neurons in another layer, or with themselves (recurrent). These layers can be seen in Fig. 5. Through the RNN cells, the output layer produces a number of units equal to the number of applications being learned.

4.4 Long Short-Term Memory Architecture

One of the most popular architectures for RNNs is called Long Short-Term Memory (LSTM).


Fig. 5 2-Layer RNN or LSTM learning model architecture

LSTMs have been used with great success in speech recognition, natural language processing, and machine translation, and have also been used to generate music and create art. Compared with the plain RNN model, the LSTM adds another feature, the cell state, which determines whether stored values are kept. This resolves the problem in plain RNNs that information is not retained as the distance between related elements of one input sequence grows, so that learning ability degrades. The LSTM is more persistent than the plain RNN because it keeps updating past information; the cell state is responsible for adding or deleting information. The architecture of the LSTM model is configured as shown in Fig. 5. The advantage of the LSTM is that each memory cell can be controlled individually and so can the result; however, memory contents may be overwritten, and its operation is slower than that of the conventional RNN. Therefore, it is composed of two layers, differing from the existing RNN model in the type of cell used.
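A two-layer LSTM in the shape of Fig. 5 for flow-unit input (here 30 packets per flow with 256 payload values per packet) might look as follows; the unit counts and dropout rate are assumptions, not the tuned values of Table 3.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_flow_lstm(packets_per_flow=30, payload_len=256, num_classes=8):
    model = keras.Sequential([
        keras.Input(shape=(packets_per_flow, payload_len)),  # one flow = a sequence of packets
        layers.LSTM(128, return_sequences=True),   # first LSTM layer keeps the sequence
        layers.LSTM(128),                          # second LSTM layer summarizes the flow
        layers.Dropout(0.2),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

build_flow_lstm().summary()
```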

4.5 CNN and RNN Combination Network Model Architecture

CNNs and RNNs are both machine learning models that have been used for a variety of tasks. CNNs are typically trained with supervised learning, while RNNs can also be trained in unsupervised or reinforcement learning settings. CNNs and RNNs can be combined in a single network architecture to create a system that processes sequences of data and generates outputs more accurately. In the combined model, the convolutional layers are responsible for encoding the features, while the LSTM layer is responsible for encoding the sequence. As shown in Fig. 6, when the flow-unit dataset is fed through the filters of the convolution layers, each packet image is compressed.


Fig. 6 CNN and RNN combination learning model architecture
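One way to realize the combination of Fig. 6 is to apply the same small CNN to every packet of a flow with TimeDistributed and feed the per-packet features to an LSTM; this wiring and all layer sizes are our assumptions rather than the exact model used in the paper.

```python
from tensorflow import keras
from tensorflow.keras import layers

packets_per_flow, n = 30, 16           # 30 packets per flow, 16 x 16 payload image each
inputs = keras.Input(shape=(packets_per_flow, n, n, 1))

# Convolution layers encode each packet image; TimeDistributed applies the
# same CNN to every packet in the flow.
x = layers.TimeDistributed(layers.Conv2D(32, 3, padding="same", activation="relu"))(inputs)
x = layers.TimeDistributed(layers.MaxPooling2D(2))(x)
x = layers.TimeDistributed(layers.Flatten())(x)

# The LSTM layer encodes the packet sequence of the whole flow.
x = layers.LSTM(128)(x)
outputs = layers.Dense(8, activation="softmax")(x)

model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
```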

5 Model Tuning

Model tuning is the process of adjusting the hyperparameters of a machine learning model to find the best configuration, often by means of a grid search or random search [12]. Hyperparameters are variables that control the behavior of a model and of its training algorithm; they are usually set before training begins and cannot be changed during training. Hyperparameter tuning can be used to improve performance on a task, or as part of an experimental design to compare different models or configurations. Some common hyperparameters are:

– learning rate: controls how large each weight-update step is;
– batch size: controls how many examples are processed at once;
– momentum: controls how much previous weight updates contribute to the current update.

In addition, the performance of the model may vary depending on the characteristics of the training datasets. We therefore use GridSearchCV, provided by Scikit-Learn, as the method for finding optimized hyperparameters. Grid search tunes a model by choosing a subset of the available parameter values and evaluating their combinations against each other to find the best trade-off. Without such hyperparameter tuning this task is computationally more expensive and difficult to accomplish, especially on big data, which is where deep learning excels.
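The grid search described above can be sketched as follows, assuming the classic keras.wrappers.scikit_learn.KerasClassifier interface (newer environments provide an equivalent wrapper in the scikeras package); the model builder and the small grid are illustrative and do not reproduce the full grids behind Tables 1, 2 and 3.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier  # or scikeras.wrappers

def create_model(optimizer="adam"):
    # A small CNN for 16 x 16 packet images, parameterized by the optimizer.
    model = keras.Sequential([
        keras.Input(shape=(16, 16, 1)),
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.Flatten(),
        layers.Dense(8, activation="softmax"),
    ])
    model.compile(optimizer=optimizer, loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Random placeholder data standing in for the preprocessed packet-unit dataset.
X = np.random.randint(0, 16, size=(200, 16, 16, 1)).astype("float32")
y = np.random.randint(0, 8, size=(200,))

clf = KerasClassifier(build_fn=create_model, epochs=2, verbose=0)
param_grid = {"batch_size": [10, 100], "optimizer": ["adam", "rmsprop"]}
grid = GridSearchCV(estimator=clf, param_grid=param_grid, cv=3)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```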

5.1 CNN and ResNet Model Tuning The learning dataset of CNN and ResNet consists of packet unit data. For each of the 8 applications, 10,000 packets were randomly organized into a single learning dataset, and the payload sizes of each packet were matched.


Table 1 The optimal CNN hyperparameter values found by our gridsearch

Payload size        | 6 × 6 (36)     | 8 × 8 (64) | 16 × 16 (256) | 32 × 32 (1024)
Filter              | 18             | 32         | 256           | 512
Kernel size         | 3 × 3          | 5 × 5      | 5 × 5         | 3 × 3
Kernel initializer  | Glorot Uniform | Uniform    | Uniform       | Uniform
Padding             | Same           | Same       | Same          | Same
Activation          | Softmax        | Softmax    | Softmax       | Softmax
Optimizer           | Adam           | Adam       | Adam          | Adam
Batch size          | 100            | 100        | 10            | 10

Table 2 The optimal ResNet hyperparameter values found by our gridsearch

Payload size        | 6 × 6 (36)     | 8 × 8 (64)     | 16 × 16 (256) | 32 × 32 (1024)
Filter              | 18             | 32             | 256           | 512
Kernel size         | 3 × 3          | 7 × 7          | 5 × 5         | 7 × 7
Kernel initializer  | Glorot Uniform | Glorot Uniform | Uniform       | Glorot Uniform
Padding             | Same           | Same           | Same          | Same
Activation          | Softmax        | Softmax        | Softmax       | Softmax
Optimizer           | Adam           | Adam           | Adam          | Rmsprop
Batch size          | 100            | 100            | 100           | 100

The payload size was divided into 36 (6 × 6), 64 (8 × 8), 256 (16 × 16), and 1024 (32 × 32) bytes to increase the size of the dataset, so the shapes of the total training datasets are (80,000, 6, 6, 1), (80,000, 8, 8, 1), (80,000, 16, 16, 1), and (80,000, 32, 32, 1). There are a total of 15 hyperparameters in the CNN model provided by Keras, of which four were selected: filter count, kernel size, kernel initializer, and padding. Padding is needed because the pixel values around the outer edge of the input data must be filled with a specific value. We perform the grid search by adding activation, optimizer, and batch size, which affect the whole training process, to these CNN hyperparameters. Tables 1 and 2 show the values found by the grid search for the selected hyperparameters.
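A minimal sketch of how a raw payload can be turned into one of these image-like inputs is shown below; the function name, the zero-padding of short payloads, and the normalization to [0, 1] are assumptions for illustration, not details stated by the authors.

```python
import numpy as np

def payload_to_image(payload: bytes, side: int) -> np.ndarray:
    """Truncate or zero-pad a packet payload to side*side bytes and
    reshape it into a (side, side, 1) array scaled to [0, 1]."""
    buf = np.frombuffer(payload[: side * side], dtype=np.uint8)
    buf = np.pad(buf, (0, side * side - buf.size))          # zero-pad short payloads
    return (buf.reshape(side, side, 1) / 255.0).astype(np.float32)

# Stacking 80,000 packets at the 32 x 32 setting yields shape (80000, 32, 32, 1):
# x_train = np.stack([payload_to_image(p, 32) for p in packet_payloads])
```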

5.2 RNN and LSTM Model Tuning The training datasets of the Simple RNN and LSTM consist of flow unit data. The payload size of each packet in a flow unit is 36, 64, 256, or 1024 bytes, as in the packet unit case. The flow unit training dataset collects the sequentially generated packets that share the same 5-tuple and fall within 3600 h.


Table 3 LSTM hyperparameter values

Payload size        | 6 × 6 (36)     | 8 × 8 (64)     | 16 × 16 (256) | 32 × 32 (1024)
Filter              | 18             | 32             | 256           | 512
Kernel size         | 3 × 3          | 7 × 7          | 5 × 5         | 7 × 7
Kernel initializer  | Glorot uniform | Glorot uniform | Uniform       | Glorot uniform
Padding             | Same           | Same           | Same          | Same
Activation          | Softmax        | Softmax        | Softmax       | Softmax
Optimizer           | Adam           | Adam           | Adam          | Rmsprop
Batch size          | 100            | 100            | 100           | 100

The input data of the RNN must match the number of packets included in a flow; therefore, the number of packets per flow is set to 30, 60, and 100. The final shapes of the training datasets follow the pattern (16,000, packets per flow, payload size), for example (16,000, 30, 36), (16,000, 30, 64), (16,000, 30, 256), and up to (16,000, 100, 1024). Keras exposes 20 hyperparameters in its RNN and LSTM models, and the hyperparameters of the RNN and LSTM are the same. We selected units, kernel initializer, recurrent initializer, and dropout. Units represents the dimensionality of the output space, kernel initializer initializes the input weight vectors of the RNN and LSTM, recurrent initializer initializes the weight vector of the recurrent state, and dropout is a number between 0 and 1 that drops the corresponding fraction of the units [13]. As with the CNN, we perform the grid search by adding activation, optimizer, and batch size. Table 3 shows the values over which the grid search was performed for the selected hyperparameters: units, kernel initializer, recurrent initializer, dropout, activation, optimizer, and batch size.
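A sketch of a parameterized builder exposing exactly these searched hyperparameters is given below; the default values, the 100 × 1024 input shape, and the eight-class output are illustrative assumptions rather than the tuned settings in Table 3.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

def build_lstm(units=128, kernel_initializer="glorot_uniform",
               recurrent_initializer="orthogonal", dropout=0.2,
               activation="softmax", optimizer="adam"):
    """LSTM builder exposing units, kernel/recurrent initializers, and dropout."""
    model = Sequential([
        LSTM(units,
             kernel_initializer=kernel_initializer,
             recurrent_initializer=recurrent_initializer,
             dropout=dropout,
             input_shape=(100, 1024)),   # 100 packets per flow, 1024-byte payloads
        Dense(8, activation=activation), # eight application classes
    ])
    model.compile(optimizer=optimizer, loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```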

6 Evaluation In this section, we compare packet-based application prediction and flow unit application prediction. CNN and ResNet are used for packet unit prediction. RNN and LSTM are used for flow unit prediction.

6.1 Experiments Environment The learning models are based on the CNN, RNN, and LSTM implementations supported by Keras 2.2.0. In the case of ResNet, the network model was constructed using the CNN layers supported by Keras (Fig. 7).


Fig. 7 Comparison of RNN, LSTM and CNN + RNN with the overall F1-score for each application

6.2 Performance Metrics In this paper, we use accuracy, precision, recall, and F1-score for the performance comparison of the CNN, ResNet, RNN, and LSTM models. These metrics take the unbalanced distribution of each application into account. The F1-score is the most important of these metrics; it is expressed as a value from 0 to 1, with 1 being the best value. The four metrics are built on the following definitions. First, a False Positive (FP) is a sample that is predicted to belong to the application but does not actually belong to it. Second, a False Negative (FN) is a sample that is not predicted to belong to the application but actually does. Third, a True Positive (TP) is a sample that is predicted to belong to the application and actually does. Finally, a True Negative (TN) is a sample that is neither predicted to belong to the application nor actually belongs to it. Accuracy, precision, recall, and F1-score are then defined as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (1)

Recall = TP / (TP + FN)    (2)

Precision = TP / (TP + FP)    (3)

F1-score = (2 × Precision × Recall) / (Precision + Recall)    (4)
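Equations (1)–(4) are defined per application class; the short sketch below computes them with Scikit-Learn, where the macro averaging over classes is an assumption about how the per-application scores are aggregated in the figures.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# y_true / y_pred hold the application label of each test packet or flow.
y_true = ["facebook", "wikipedia", "facebook", "youtube"]
y_pred = ["facebook", "facebook", "facebook", "youtube"]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)  # Eqs. (2)-(4), averaged over classes
print(accuracy, precision, recall, f1)
```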


Fig. 8 Comparison of CNN and ResNet with the overall F1-score for each application

6.3 Experiments Results The first experiment is a comparison of CNN and ResNet on the packet unit datasets. To compare CNN and ResNet, the total F1-score was compared while varying the payload size of the packet over 6 × 6 (36), 8 × 8 (64), 16 × 16 (256), and 32 × 32 (1024). Figure 8 compares the F1-score over all applications by payload size for CNN and ResNet. When the payload size of the packet is small, the overall F1-score of CNN is about 0.4 higher than that of ResNet. However, as the payload size increases, the training dataset size also increases, and the F1-score of ResNet becomes larger than that of CNN. The reason is that the ResNet learning model is complex, so small training datasets are not sufficient for it to learn well; conversely, the larger the payload size and the larger the training dataset, the better the more complicated ResNet model performs. The following experiment is a comparison of RNN, LSTM, and CNN + RNN using the flow unit datasets. Figure 7 shows that the overall F1-score increases as the payload size of the packet increases, and it also increases as the number of packets per flow increases. It is noticeable that the overall F1-score of LSTM is higher than the total F1-scores of RNN and CNN + RNN, which indicates that LSTM alone can provide similar or better performance without using more complex models for flow unit learning. Figure 9 is a graph comparing the F1-score results for each application when using the deep learning models for packet unit and flow unit classification. For CNN and ResNet, it shows the F1-score for each application when the payload size is 32 × 32 (1024); for RNN, LSTM, and CNN + RNN, it shows the F1-score for each application when the payload size is 1024 and the number of packets per flow is 100. The packet unit learning models CNN and ResNet show that the F1-scores on Facebook and Wikipedia are smaller than for the other applications; the Facebook and Wikipedia packets are very similar to each other, so their F1-scores are lower than those of the other applications' packets. In the case of RNN, the F1-score values for Facebook and Wikipedia are also low, as in the case of packet


Fig. 9 Comparison of packet unit classification and flow unit classification

unit classification. However, in the case of LSTM and CNN + RNN, the F1-scores of both applications are high. Thus, we can see that the LSTM and CNN + RNN learning models classify these applications well, and that the LSTM performs better by a subtle margin.

7 Conclusion AI can be used to solve a number of security challenges, and payload classification is one of the major use cases. Payload classification means detecting a payload and accurately identifying it against a set of knowledge assets such as MD5 or SHA-256 hashes, metadata, or, in some instances, script files. Deep learning models are being trained with human behavioral observation data, and model segment IDs are raised by observing malicious API calls. With trained deep learning networks, detection accuracy increased and misclassifications decreased by more than 93%. In this paper, we focused on designing deep learning models for network classification and on producing packet unit and flow unit training datasets using network traffic preprocessing. To this end, the authors worked on detecting network traffic and assigning it to either the flow or the packet classifier level with good accuracy. The deep neural network models used include CNN, RNN, LSTM, and ResNet. The detailed flow-level detection procedure is introduced, as well as how the packet-level and flow-level detection procedures should work, with F1-score comparisons. It is noticed that the


F1-score of LSTM is higher than the total F1-scores of RNN and CNN + RNN. The LSTM performs better because of the bidirectional GRUs, unlike the RNN, which has a unidirectional GRU at its last layer that had to be reversed during backpropagation; thus, given that there is no need for backward dynamics in the LSTM, the resulting speed appears to boost its performance. Further work includes finding ways to train a neural network on more complex scenarios while adjusting its weights correctly by paying attention to individual features. Changes can be made to these algorithms by incorporating unsupervised pre-training, retraining networks in an unsupervised manner using preprocessed datasets, or using partial supervision techniques. We can also improve how payloads move through the network: packet capture can convert anonymized payload information into XML to help manage the configuration of firewalls, Exchange servers, and other planning devices.

References
1. Liu J, Song X, Zhou Y, Peng X, Zhang Y, Liu P, Wu D, Zhu C (2022) Deep anomaly detection in packet payload. Neurocomputing 485:205–218
2. Liu Z, Fang Y, Huang C, Han J (2022) Graphxss: an efficient xss payload detection approach based on graph convolutional network. Comput Secur 114:102597
3. Zhou L, Zhu Y, Zong T, Xiang Y (2022) A feature selection-based method for ddos attack flow classification. Futur Gener Comput Syst 132:67–79
4. Izadi S, Ahmadi M, Nikbazm R (2022) Network traffic classification using convolutional neural network and ant-lion optimization. Comput Electr Eng 101:108024
5. Balachandran A, Amritha P (2022) Vpn network traffic classification using entropy estimation and time-related features. In: IOT with smart systems. Springer, pp 509–520
6. Wu Z, Dong Y-N, Jin J, Wei H-L, Xie G (2022) Multimedia traffic classification for imbalanced environment. IEEE Trans Netw Sci Eng 9(3):1838–1852
7. Paul BK, Ahmed K, Rani MT, Pradeep KS, Al-Zahrani FA (2022) Ultra-high negative dispersion compensating modified square shape photonic crystal fiber for optical broadband communication. Alexandria Eng J 61(4):2799–2806
8. Lin CY, Chen B, Lan W (2022) An efficient approach for encrypted traffic classification using CNN and bidirectional GRU. In: 2022 2nd international conference on consumer electronics and computer engineering (ICCECE). IEEE, pp 368–373
9. Yue G, Tian J, Zhu H, Zhang B et al (2022) Power grid industrial control system traffic classification based on two-dimensional convolutional neural network. In: International conference in communications, signal processing, and systems. Springer, pp 41–48
10. Fischell EM, Schmidt H (2015) Classification of underwater targets from autonomous underwater vehicle sampled bistatic acoustic scattered fields. J Acoust Soc Am 138(6):3773–3784
11. Zanero S (2005) Analyzing tcp traffic patterns using self-organizing maps. In: International conference on image analysis and processing. Springer, pp 83–90
12. Ritchie M (2021) Multi-frequency micro-doppler based classification of micro-drone payload weight. Front Signal Process
13. Choorod P, Weir G (2021) Tor traffic classification based on encrypted payload characteristics. In: 2021 national computing colleges conference (NCCC). IEEE, pp 1–6

Malicious Domain Detection Using Memory Augmented Deep Autoencoder Pavan Kartheek Rachabathuni, Hiranmayee Nandyala, G. Prasanthi, and Singamaneni Krishnapriya

Abstract An Advanced Persistent Threat (APT) is a sophisticated assault that obtains personal information by staying inside infected systems for an extended period of time. When APT assaults occur in a dynamic and sophisticated infrastructure like the cloud, traditional techniques of detection are difficult to apply. This research offers an autoencoder-based deep learning methodology for recognizing the probable phases in the APT lifecycle to address the limitations of previous approaches. Because APTs change their attacking methods, current intrusion detection techniques are unable to detect those attacks; present systems have low generalization capability, and attack samples are gathered only occasionally compared to regular traffic. To circumvent these issues, APT detection using an autoencoder has been extensively researched, and semi-supervised learning has been employed. This approach is based on the premise that reconstruction errors will be large for non-training data; however, an autoencoder commonly over-generalizes, and this assumption is easily violated. To tackle the autoencoder over-generalization problem, we present a detection approach for command and control communication in APTs based on a memory augmented deep autoencoder. The suggested model is trained so that the reconstruction of an abnormal sample resembles a normal sample, overcoming the generalization problem for such abnormal samples. Keywords Advanced Persistent Threat (APT) · Autoencoder · Command and control · Deep autoencoder

P. K. Rachabathuni Department of Information Engineering, University of Florence, 50132 Florence, Italy e-mail: [email protected] H. Nandyala · G. Prasanthi Department of CSE, Sri Vasavi Engineering College, Tadepalligudem, India e-mail: [email protected] S. Krishnapriya (B) Department of CSE (Cyber Security), Guru Nanak Institutions Technical Campus, Hyderabad, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_15


1 Introduction APT is one of the most rapidly rising information security risks to enterprises today. APT attacks are conducted by the most skilled and well-funded attackers, and the target is specific; the general objective behind APT attacks is data exfiltration. APT incidents included the loss of 9 GB of encrypted password data in the Adobe leak in 2013 and the compromise of 40 GB of Ashley Madison's database in 2015 [1, 2]. APT attacks are continual, targeted attacks on a single organization that occur in phases [3], and six components make up an APT attack [4]. The literature on APT addresses the subject of threat detection in these components. Numerous studies have been conducted to identify the main components of an APT attack, including the detection of malicious PDF files in phishing emails, the detection of malicious SSL certificates during Command and Control (C&C) communications, and the detection of data leakage in the final stage of an APT attack. During the attack process, the attacker needs to maintain a connection between the compromised infrastructure and the command and control server; from this server, the attacker sends updated tools and technique information, and if the attacker loses the connection, the attack objective cannot be accomplished. Many researchers have used machine learning techniques for command and control communication detection. For detecting targeted attacks like APT, we need to process the huge amount of traffic generated at multiple points of the network, and deep analysis is needed. Deep learning is widely used by researchers because it has substantial computational power and is able to mimic human behavior; deep learning was first used by Hinton, who presented the description of features [5]. Many studies use one-class learning, which is based on SVMs and autoencoders, to solve the imbalanced data problem; in one-class learning, the model is trained with only one class, which is called semi-supervised learning [6]. The author of [7] studied autoencoders and pointed out the over-generalization problem: even though only normal samples are used for training, the autoencoder also reconstructs suspicious samples well, which leads to performance degradation in autoencoder-based detection methods. In this paper, we propose a detection method based on a memory augmented deep autoencoder [8] to detect command and control communication during the APT life cycle and reconstruct the obtained samples. The proposed method contains an encoder, a decoder, and a memory module; the memory module stores the patterns of the normal inputs used during the training phase. The rest of the paper is organized as follows: Sect. 2 describes the literature survey, Sect. 3 gives background knowledge about autoencoders, Sect. 4 describes the proposed method, Sect. 5 presents the experiment details, and Sect. 6 gives the conclusion and future work.


2 Literature Survey The APT attacker needs a communication channel between the compromised host and their server for long-term access to the target network, called "Command and Control Communication." This is an essential step during the attack life cycle for attack progression. The executed malware builds this communication to expand the attack: the compromised system connects to the "Command and Control (C&C) server" for further instructions and downloads and executes additional malware to open a reverse backdoor, allowing the attacker to gain complete control over the compromised system while bypassing firewalls. Attackers need to maintain an active connection between the compromised infrastructure and the command and control server for further exploitation. The remote server is responsible for updating the malware, maintaining access, and bypassing the current security systems in the target network, and the communication needs to remain undetected. To achieve this, the attacker makes use of application layer protocols; encrypted channels, Web services, and removable media are also used for command and control communication [9]. Most APT attacks use outbound connections to evade current detection methods [10], and most APT campaigns use HTTP-based communication, since most organizations treat HTTP-based communication as permissible; the remaining protocols have unique features that are easily detectable [11, 12]. In [13], the author introduced a novel deep learning-based detection method for detecting malicious C&C communication based on DNS traffic. First, they collect the DNS traffic data and apply preprocessing to categorize the collected information; next, they evaluate the behavior of the processed data; and finally, they apply deep learning algorithms for the evaluation. The main limitation of the proposed detection method is that if an attacker uses a normal domain name, it classifies the normal domain as a malicious one. A method of monitoring and detecting APT attacks based on unknown domain features was introduced in [14]. The authors analyzed DNS logs and used different techniques for monitoring the behavior of domains; for this, they used 25 features and the Random Forest algorithm to detect the C&C channel. The main limitations of the work are that it cannot process a large amount of information and that it considers only a small number of features. In [15], a detection system for C&C domains based on domain graphs was proposed. The method explores the relationship between domain names and IP addresses and extends this to a mapping of C&C domain names for malicious C&C detection. The major limitation of the proposed work is that the authors did not consider the relevance of domain names: if the attacker's domain name is the same as a regular domain name, the proposed approach fails to detect the APT. Unsupervised detection of APT C&C communication channels using Web-request graphs was introduced in [16]. The authors analyzed HTTP-based network traffic to detect malicious C&C communication channels. The work is based on the key observation that APT attacks use HTTP-based communication, and its specific pattern differs from regular communication traffic.


Based on the dependencies between the Web requests, they constructed a Web-request graph and filtered the malicious requests. The authors tested the proposed methodology with 9 APT attacks. The limitation of the proposed work is that it is not suitable for analyzing a large amount of network traffic.

3 Auto Encoders An autoencoder is a type of unsupervised neural network used to learn efficient data codings. As shown in Fig. 1, an autoencoder (AE) has two networks: (1) an encoder network and (2) a decoder network. The encoder network maps the given input x to a lower-dimensional feature space, and the decoder network recovers the mapped data from the low-dimensional space. The parameters of the two networks of an AE with one hidden layer are learned with a reconstruction loss function:

z = σ(W_e x + b_e)    (1)

x̂ = σ(W_d z + b_d)    (2)

The weights and biases are denoted by W and b, and σ is the activation function, which can be either linear or non-linear; when it is linear, the autoencoder acts like PCA. Another variation of the autoencoder is the stacked AE, which has several hidden layers.
Fig. 1 Structure of Autoencoder


Fig. 2 Memory augmented autoencoder

3.1 Memory Augmented Deep Autoencoders The memory augmented autoencoder [17] has three modules, viz. the encoder, decoder, and memory modules, arranged as shown in Fig. 2: the encoder network is followed by the memory module and then the decoder network. The memory module receives its input from the encoder, produces an output, and delivers that output to the decoder; unlike a plain AE, there is no direct connection between the encoder and decoder networks. Here, z is the latent vector, which is used for retrieving data items from the memory; the retrieved items are combined and sent to the decoder. As a result, the over-generalization problem is solved because abnormal samples are reconstructed to resemble normal samples, which leads to good detection accuracy. Generally, the memory module consists of N real-valued vectors of dimension C, as shown in Fig. 3; as shown in the figure, C and z have the same dimension. The aggregated memory output ẑ is obtained by soft addressing with the weight vector w, as shown in Eq. (3), and w is obtained by a softmax over the similarities between z and the memory items, as shown in Eq. (4):

ẑ = wM = Σ_{i=1}^{N} w_i m_i    (3)

w_i = exp(d(z, m_i)) / Σ_{j=1}^{N} exp(d(z, m_j))    (4)

The memory module increases sparsity by applying a hard shrinkage operation to the soft addressing weights, as shown in Eqs. (5) and (6):

ŵ_i = h(w_i; λ) = { w_i, if w_i > λ; 0, otherwise }    (5)


Fig. 3 Memory module



ŵ_i = max(w_i − λ, 0) · w_i / (|w_i − λ| + ε)    (6)

The shrinkage operation is implemented with the ReLU activation function so that the otherwise discontinuous function can still be used in the backward pass. The shrinkage threshold λ is defined over the interval [1/N, 3/N]. The memory is defined in two variations, sparse and non-sparse; through this mechanism, reconstructed samples are kept close to normal samples, which leads to an increase in the reconstruction error of abnormal samples.
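A compact NumPy sketch of this addressing-and-shrinkage step is given below. The cosine similarity used for d(z, m_i) and the re-normalization of the shrunk weights are assumptions borrowed from the memory augmented autoencoder literature [8, 17], not details spelled out in the text above.

```python
import numpy as np

def memory_read(z, M, lam=0.02, eps=1e-12):
    """Soft addressing with hard shrinkage, in the spirit of Eqs. (3)-(6).
    z: (C,) latent vector, M: (N, C) memory matrix, lam: shrinkage threshold."""
    sims = M @ z / (np.linalg.norm(M, axis=1) * np.linalg.norm(z) + eps)  # d(z, m_i) as cosine similarity (assumed)
    w = np.exp(sims) / np.exp(sims).sum()                                 # softmax addressing weights, Eq. (4)
    w_hat = np.maximum(w - lam, 0.0) * w / (np.abs(w - lam) + eps)        # hard shrinkage, Eq. (6)
    w_hat = w_hat / (w_hat.sum() + eps)                                   # re-normalize the sparse weights (assumed)
    return w_hat @ M                                                      # aggregated memory output, Eq. (3)
```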

4 Proposed Method In this paper, we propose malicious domain detection based on a memory augmented deep autoencoder. The autoencoder is trained with only the normal class to minimize the reconstruction error, and the reconstruction error is used as the anomaly score, so this score needs to stay small for normal samples. The training objective consists of two losses, a reconstruction loss and an entropy loss. Given training data D = {x_i | i = 1, 2, 3, …, n} containing n samples, the reconstruction loss is defined as the distance between x_i and its reconstruction. The minimization of the error over every sample is defined through Eq. (7):

L_rec(x, x̂) = ‖x − x̂‖²    (7)


Sparsity of the generated addressing weights is promoted with the help of an entropy loss, which is minimized together with the shrinkage operation, as shown in Eq. (8):

L_entropy = Σ_{i=1}^{T} −ŵ_i · log(ŵ_i)    (8)

Finally, the overall loss for training the memory augmented autoencoder is defined through Eq. (9):

L = (1/T) Σ_{i=1}^{T} [ L_rec(x_i, x̂_i) + α · L_entropy(ŵ) ]    (9)

The parameter α is a hyperparameter that weights the relative importance of the two losses. During training, we used only the normal samples.
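As a small illustration, the per-sample training loss of Eq. (9) can be written as below; the value of α and the epsilon guard inside the logarithm are assumptions made for the sketch.

```python
import numpy as np

def memae_loss(x, x_hat, w_hat, alpha=0.0002, eps=1e-12):
    """Per-sample loss of Eq. (9): reconstruction error plus a weighted
    entropy of the sparse addressing weights (alpha is illustrative)."""
    rec = np.sum((x - x_hat) ** 2)                      # L_rec, Eq. (7)
    entropy = np.sum(-w_hat * np.log(w_hat + eps))      # L_entropy, Eq. (8)
    return rec + alpha * entropy
```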

5 Experiment Since the proposed method performs binary classification, a common evaluation method is the ROC curve, which shows the performance of the proposed model at all threshold values. The F1-score is used as an additional evaluation metric for the proposed method when selecting the threshold and during the testing phase. The evaluation parameters are calculated with the help of the confusion matrix shown in Table 1.

TPR = TP / (TP + FN)    (10)

FPR = FP / (FP + TN)    (11)

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (12)

Precision = TP / (TP + FP)    (13)

Table 1 Confusion matrix

                     | Actual positive | Actual negative
Predicted positive   | TP              | FP
Predicted negative   | FN              | TN


Table 2 Autoencoder model

Layer (type)           | Output shape | Param #
Input_23 (InputLayer)  | (None, 3)    | 0
Dense_89 (Dense)       | (None, 14)   | 56
Dense_90 (Dense)       | (None, 7)    | 105
Dense_91 (Dense)       | (None, 7)    | 56
Dense_92 (Dense)       | (None, 3)    | 24
Total params: 241, Trainable params: 241, Non-trainable params: 0

Recall = TP / (TP + FN)    (14)

F1-Score = 2 × (Precision × Recall) / (Precision + Recall)    (15)

The experiment is conducted using the autoencoder whose structure is shown in Table 2. The autoencoder employs four fully connected layers: the first two layers are dedicated to the encoder, while the latter two are dedicated to the decoder. The sigmoid activation function is used in the first and fourth layers, while the ReLU and tanh activation functions are used in the second and third layers, respectively. Only malicious applications were used to train the autoencoder; keeping the benign applications in the test set allows us to evaluate the autoencoder's performance. Eighty percent of the dataset is used for training, whereas twenty percent is preserved for testing. To evaluate the experiments, we used accuracy and F1-score. The experimental results, shown in Table 3, indicate the classification ability of the proposed model, and the F1-score indicates to what extent the proposed model is discriminative. During the experiment, the autoencoder was trained with a dataset [18] that contains both malicious and non-malicious domain information; the data is divided into 80% for training and 20% for testing. A minimal sketch of this architecture is given after Table 3. From the experimental results shown in Fig. 4, the proposed model gives high accuracy. For further evaluation of the model, we conducted the experiment with different train and test splits; the results are shown in Table 4. From Fig. 5, it is evident that the proposed model gives low training and testing loss and high accuracy; as the number of iterations increases, the model gives more accurate results with minimal loss.
Table 3 Detection results

Accuracy | F1-Score
99.32    | 96.4
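The minimal sketch below reproduces the layer structure of Table 2 with the activations described above (sigmoid in the first and fourth dense layers, ReLU and tanh in the second and third); the optimizer and the mean-squared-error reconstruction loss are assumptions, since the paper does not state them.

```python
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense

# Autoencoder of Table 2: 3 -> 14 -> 7 -> 7 -> 3, 241 trainable parameters.
inputs = Input(shape=(3,))                          # Input_23 (InputLayer)
x = Dense(14, activation="sigmoid")(inputs)         # Dense_89, encoder
x = Dense(7, activation="relu")(x)                  # Dense_90, encoder
x = Dense(7, activation="tanh")(x)                  # Dense_91, decoder
outputs = Dense(3, activation="sigmoid")(x)         # Dense_92, decoder

autoencoder = Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")   # optimizer/loss are assumptions
autoencoder.summary()                               # reports 241 trainable parameters
```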


Fig. 4 Experimental results


Table 4 Accuracy and F1-Score for different train and test data splits

Train and test data split (%) | Accuracy | F1-score
70, 30                        | 98.39    | 96.79
60, 40                        | 98.57    | 97.07
50, 50                        | 96.81    | 95.38

Fig. 5 Comparison of prediction results with other technique

6 Conclusion and Future Work In this study, we proposed a new detection model, based on a memory augmented autoencoder, for the malicious domains used during the command and control communication of an APT attack. The proposed method gives high accuracy. Since APT attacks combine various attack tools, the proposed method can also be used to detect more general attacks. For future work, the obtained results can be correlated with other APT detection modules for attack scenario reconstruction and prediction of the next possible attack step.

References
1. World's biggest data breaches & hacks [Updated]—Information is Beautiful. https://informationisbeautiful.net/2018/worlds-biggest-data-breaches-hacks-updated/ (accessed Jun. 17, 2022)
2. The 15 biggest data breaches of the 21st century | CSO Online. https://www.csoonline.com/article/2130877/the-biggest-data-breaches-of-the-21st-century.html (accessed Jun. 17, 2022)
3. Ghafir PVI (2014) Advanced persistent threat attack detection: an overview. https://www.researchgate.net/publication/305956804_Advanced_Persistent_Threat_Attack_Detection_An_Overview (accessed Jun. 17, 2022)
4. Giura P, Wang W (2012) A context-based detection framework for advanced persistent threats. In: Proc. 2012 ASE Int. Conf. Cyber Secur. CyberSecurity 2012, pp 69–74. https://doi.org/10.1109/CYBERSECURITY.2012.16


5. Hinton GE, Salakhutdinov RR (2006) Reducing the dimensionality of data with neural networks. Science 313(5786):504–507. https://doi.org/10.1126/SCIENCE.1127647/SUPPL_FILE/HINTON.SOM.PDF
6. Manevitz LM, Yousef M, Cristianini N, Shawe-Taylor J, Williamson B (2001) One-class SVMs for document classification. J Mach Learn Res 2:139–154
7. Min B, Yoo J, Kim S, Shin D, Shin D (2021) Network anomaly detection using memory-augmented deep autoencoder. IEEE Access 9:104695–104706. https://doi.org/10.1109/ACCESS.2021.3100087
8. Gong D et al (2019) Memorizing normality to detect anomaly: memory-augmented deep autoencoder for unsupervised anomaly detection. In: Proc IEEE Int Conf Comput Vis, October, pp 1705–1714. https://doi.org/10.48550/arxiv.1904.02639
9. Bishop M, Cheung S, Frank J, Hoagland J, Samorodin S, Wee C (2004) Internet security. https://doi.org/10.32870/pk.a6n11.280
10. Vukalović J, Delija D (2015) Advanced persistent threats—detection and defense. In: 2015 38th Int Conv Inf Commun Technol Electron Microelectron MIPRO 2015—Proc, May, pp 1324–1330. https://doi.org/10.1109/MIPRO.2015.7160480
11. Ussath M, Jaeger D, Cheng F, Meinel C (2016) Advanced persistent threats: behind the scenes
12. Wang X, Zheng K, Niu X, Wu B, Wu C (2016) Detection of command and control in advanced persistent threat based on independent access. In: IEEE Int Conf Commun ICC 2016, pp 1–6. https://doi.org/10.1109/ICC.2016.7511197
13. Zhang R, Sun W, Liu J, Li J, Lei G, Guo H (2020) Construction of two statistical anomaly features for small-sample APT attack traffic classification [Online]. Available: http://arxiv.org/abs/2010.13978
14. Yan G, Li Q, Guo D, Meng X (2020) Discovering suspicious APT behaviors by analyzing DNS activities. Sensors (Switzerland) 20(3):1–17. https://doi.org/10.3390/s20030731
15. Ma Z, Li Q (2019) A large-scale domain graph in information-centric IoT. IEEE Access 7:13917–13926
16. Lamprakis P, Dargenio R, Gugelmann D, Lenders V, Happe M, Vanbever L (2017) Unsupervised detection of APT C&C channels using web request graphs. In: Lecture Notes in Computer Science, vol 10327 LNCS, pp 366–387. https://doi.org/10.1007/978-3-319-60876-1_17
17. Gong D et al (2019) Memorizing normality to detect anomaly (MemAE, ICCV 2019). In: Proc IEEE Int Conf Comput Vis, October, pp 1705–1714
18. Marques C (2021) Benign and malicious domains based on DNS logs, vol 5. https://doi.org/10.17632/623SSHKDRZ.5

Graph Analysis Using Page Rank Algorithm to Find Influential Users D. Venkata Swetha Ramana , T. Anusha, V. SumaSree, C. R. Renuka, and Taiba Sana

Abstract In recent years, most people are using social media to be aware of friends’ activities, to do marketing, and for interpersonal communication. In these social media networks, not all the users are experts who impact other users. Users who influence others are identified as influential users. Identifying influential users is a major need of our task as they play a major role in broadcasting information. This work presents a novel approach to identifying influential users in a Twitter network. In this paper, friends and followers list for a particular Twitter account are extracted and analyzed for the most influential users among them, and also their topics of interest are extracted. Keywords Social media · Influential users · Visualization · Social media network analysis

1 Introduction Social media is an Internet-based interactive app that helps users to create, access, and share thoughts, views, and information through virtual networks. Social media has become a major part of our life for accessing, creating, and sharing information locally and globally regarding friends' activities, for marketing, interpersonal communication, and decision-making. The lifeblood of social media is user-generated content, which is the data created through online communications (tweets, posts, comments, captions, pictures, reels, stories, videos). Every 24 h, 2.5 quintillion bytes of data are generated. There are 3.484 billion social media users worldwide, of which 95% use Facebook and 84% use Twitter and Instagram [1]. Users of social media who inspire other users with their ideas and thoughts and have a wide number of friends and followers on the social network are called influencers. These influencers of social media can be used in marketing, advertisement, and spreading of information, ideas, and views.
D. V. S. Ramana (B) · T. Anusha · V. SumaSree · C. R. Renuka · T. Sana
Rao Bahadur Y Mahabaleswarappa Engineering College, Ballari, Karnataka, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_16


Fig. 1 How youth feel about social media influencers

Identifying influential users is a major need of our task, as they play a major role in broadcasting information. This work presents a novel approach to identifying influential users in a Twitter network using a hyperlink-induced topic search algorithm. The page rank algorithm is used by Google to rank Web pages for a particular search, to improve the search results, and to measure the importance of Web pages; the page rank of a Web page is calculated by counting the number and quality of links to that page. This paper uses a page rank algorithm to calculate the rank of a social media user, where the rank of a user is determined by counting the number of followers of that user. Our work assumes that the more followers a user has, the more influential the user is. Figure 1 explains how youth feel about social media influencers: 50–60% of the users between 18 and 34 years of age believe that influencers' opinions are honest, and more than 50% of the users consider that the influencers are knowledgeable about the topics they discuss and share the same passion or interest.

2 Problem Definition Identifying influential users is a major need of our task as these influencers of social media can be used in marketing, advertisement, and in spreading of information, ideas, and views. The toughest method of marking influential users is doing it manually. So, the proposed algorithm makes a small attempt to identify influential users using the page rank algorithm in Twitter social media by extracting a friendship network and visualizing the same.


3 Related Work In Scott et al. the Article [1] explains how important it is to analyze social networks and applications in different areas. Chien et al. in Article [2] analyze the relationships between the posts and reactions of users to a particular post. This article processes Facebook pages and user data for their social network analysis paper. Chatziasimidis et al. in Article [3] use GitHub API to access the GitHub data, construct graphs with users to analyze social networks, and detect communities. Identifying the most influential users is an important task in social network analysis. Alshahrani et al. in Article [4] do this by summarizing the interactions of the network. This article also analyzed how these influential users influence a lot of users and how certain cascade behavior spreads over the network. Article [5] speaks about how youth feel about social media influencers. Yuan et al. in the Article [6] explain how to maximize influence spread and how it affects rumor control and viral marketing. Yin et al. in Article [7] propose a method to analyze the propagation process of information in signed social networks. This method models the dynamics of user beliefs and altitudes about recommendations-based advertisements. This paper designed a signed page rank algorithm that selects seed nodes on the basis of positive and negative connections to maximize the influence in signed social networks. Alwan et al. in Article [8] propose a method for identifying influential users on Instagram based on user generated content (UGC). The method in this article finds whether a user is influential or not by combining different features extracted from the pictures on Instagram. The effectiveness of the method to identify influential users is validated with an extensive set of experiments. Mittal et al. in Article [9] focus on identifying and ranking topical influential nodes. This research proposes an algorithm, named ACRA-aggregation consensus rank algorithm, to generate the top K influencer list by combining different Twitter matrices. Bashari et al. in the Article [10] design a method using user-generated content to differentiate between influential and noninfluential users on Instagram. Alsaif et al. in Article [11] focus on the location of influential users, and influence is calculated based on the content generated by the social media users. Swetha et al. in Article [12] focus on the detection of inappropriate language in online content. It has been noticed that vulgar language is used in both humor and hate conversations. The identification of vulgar language is more difficult due to various circumstances. Ramana et al. in Article [13] online social media are important in today’s world. But, there are many negative consequences faced by social media users. This article describes an innovative process for developing generic online hate and network analyzer, in which both can help in overcoming obstacles.


Fig. 2 System architecture

4 Proposed Method Our method flows as shown in Fig. 2. In the first step, we obtain Twitter API keys and then access the data from Twitter (datasets); that is, we access the friends, followers, location, number of followers, number of friends, URL, user ID, and user name of the user account. We then draw graphs for followers and friends to show the influential users, and to show who the noninfluential users are, i.e., friends who are not followers. We collect data from Twitter and analyze it for friends and followers. We visualize the social network graph and a word cloud, which include the user account and the numbers of friends and followers. Finally, we analyze the influential users, the user-generated content, the social network graph, and the word cloud.

5 Implementation
Step 1: Create a Twitter developer account to obtain consumer keys and access tokens.
Step 2: Import Tweepy to access Twitter data.
Step 3: Collect data from the Twitter account.
Step 4: Import the required packages pandas, NumPy, matplotlib, and networkx.
Step 5: Accept user input to select the influencer of interest.
Step 6: Display the user name, location, follower count, user ID, and user URL for the influencer mentioned in the input.


Step 7: Use the networkx module to generate a friends graph, with nodes as the users and edges as the relationships between the users, scored in the style of the page rank algorithm.
• Initialize each hub score to 1.
• Update the hub score using the following formula:

Node score = (influencer follower count) / 100    (1)

The node score decides the size of the node, and each friend is connected to the influencer with an edge.
Step 8: Step 7 is repeated for followers to generate a followers graph.
Step 9: Extract the list of the most influential people, i.e., those with the largest node sizes, from both the friends graph and the followers graph.
Step 10: Generate a list of people who are friends but not followers.
Step 11: Collect all the hashtag tweets and store them in a CSV file.
Step 12: Collect the user-mentioned tweets and store them in a CSV file.
Step 13: Collect the tweets exchanged between these users and store them in a CSV file.
Step 14: Collect the tweets and generate a word cloud between the users to analyze the content discussed in the tweets.
In the first step, we need to create a Twitter account and then apply for a Twitter developer account. When a developer account is created, it provides keys such as consumer keys and access tokens, which are used to access the data. The dataset is collected from Twitter's APIs and is used to download an initial corpus of tweets. The training dataset of the model is built on this "raw" corpus with information such as user name, user ID, URL, location, followers, friends, number of followers, and number of friends, using the get_user() method of the Tweepy module in Python. We downloaded the information of 500 users from Twitter to work with. A sketch of Steps 7 and 8 appears below.
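A rough sketch of Steps 7 and 8 is shown below. The Tweepy v4 method names (OAuth1UserHandler, get_user, get_friends), the placeholder credentials, and the 50-friend cap are assumptions made for the example; the scoring of each node as its follower count divided by 100 follows Eq. (1).

```python
import tweepy
import networkx as nx
import matplotlib.pyplot as plt

# Placeholder credentials; real keys come from the Twitter developer account (Step 1).
auth = tweepy.OAuth1UserHandler("CONSUMER_KEY", "CONSUMER_SECRET",
                                "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

screen_name = "GeeksforGeeks"
center = api.get_user(screen_name=screen_name)

graph = nx.Graph()
graph.add_node(center.screen_name, score=center.followers_count / 100)   # Eq. (1)

# Step 7: connect each friend to the influencer; node size follows the score.
for friend in tweepy.Cursor(api.get_friends, screen_name=screen_name).items(50):
    graph.add_node(friend.screen_name, score=friend.followers_count / 100)
    graph.add_edge(center.screen_name, friend.screen_name)

sizes = [graph.nodes[n]["score"] for n in graph.nodes]
nx.draw(graph, with_labels=True, node_size=sizes)
plt.savefig("friends_graph.png")
```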

6 Results Importing the matplotlib.pyplot, networkx, and wordcloud modules in Python, the following figures are produced. Figure 3 is generated using add_node(), append(), title(), draw(), axis(), show(), and savefig() from the matplotlib and networkx modules of Python. Figure 3 shows the followers graph for the followers of a user account, GeeksforGeeks. The node size of each follower varies depending on the follower count: the larger the size of the node, the more influential the user. Here, the GeeksforGeeks node at the center is the largest node, so it is the most influential; among the followers of the center node, the Nihal Babu node is the largest, so it is the most influential follower. Figure 4 is generated using add_node(), append(), title(), draw(), axis(), show(), and savefig() from the matplotlib and networkx modules of Python. Figure 4 shows the friends graph of the user account GeeksforGeeks, where the friends of the user account, i.e.,


Fig. 3 Followers graph

have a larger node size and more influence than the user account GeeksforGeeks. The Forbes node is the largest in the friends graph, as it is the most influential user. Figure 5 is generated using WordCloud(), set(stopwords), title(), draw(), axis(), show(), and savefig() from the wordcloud, matplotlib, and networkx modules of Python. Figure 5 represents the word cloud for the text communications among the followers

Fig. 4 Friends graph


Fig. 5 Word cloud

and friends of the user GeeksforGeeks. The larger the font size of a word, the higher its frequency of appearance, and such words are considered the topics of interest.
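A minimal sketch of the word cloud generation is given below; the sample text and styling options are placeholders, with tweets_text standing in for the tweets collected in Steps 11–13.

```python
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt

# tweets_text stands in for the concatenated tweets collected earlier.
tweets_text = "python coding geeks data structures algorithms interview"

cloud = WordCloud(stopwords=set(STOPWORDS), background_color="white",
                  width=800, height=400).generate(tweets_text)
plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.title("Word cloud of tweets between users")
plt.savefig("word_cloud.png")
```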

7 Conclusion The importance of influential users in social media is growing day by day. Influential users play a major role in broadcasting, marketing, and so on. In this paper, influential users are extracted based on follower counts, and their topics of interest are derived from the content generated in their tweets. As part of this work, we generate graphs for influential followers and friends and a word cloud for the generated content. In this paper, data is collected only from Twitter; in future, this work can be extended to analyze influential users from the Facebook and Instagram social media networks with better visualization.

References 1. Scott J (1988) Social network analysis. Sociology 22(1):109–127 2. Chien OK, Hoong PK, Ho CC (2014) A comparative study of HITS vs PageRank algorithms for Twitter users analysis. In: 2014 International Conference on Computational Science and Technology (ICCST), August, pp 1–6. IEEE 3. Chatziasimidis F, Stamelos I (2015). Data collection and analysis of GitHub repositories and users. In: 2015 6th International Conference on Information, Intelligence, Systems and Applications (IISA), July, pp 1–6. IEEE 4. Alshahrani M, Zhu F, Zheng L, Mekouar S, Huang S (2018) Selection of top-k influential users based on radius-neighborhood degree, multi-hops distance and selection threshold. J Big Data 5(1):1–20


5. https://www.marketingcharts.com/charts/fullscreenshareablee-how-youth-feel-about-socialmedia-influencers-apr2018 6. Yuan J, Zhang R, Tang J, Hu R, Wang Z, Li H (2019) Efficient and effective influence maximization in large-scale social networks via two frameworks. Physica A 526:120966 7. Yin X, Hu X, Chen Y, Yuan X, Li B (2019) Signed-PageRank: an efficient influence maximization framework for signed social networks. IEEE Trans Knowl Data Eng 33(5):2208–2222 8. Alwan WH, Fazl-Ersi E, Vahedian A (2020) Identifying influential users on Instagram through visual content analysis. IEEE Access 8:169594–169603 9. Mittal D, Suthar P, Patil M, Pranaya PGS, Rana DP, Tidke B (2020) Social network influencer rank recommender using diverse features from tropical graph. Procedia Comp Sci 167:1861– 1871 10. Bashari B, Fazl-Ersi E (2020) Influential post identification on Instagram through caption and hashtag analysis. Measure Cont 53(3–4):409–415 11. Alsaif SA, Hidri A, Hidri MS (2021) Towards inferring influential Facebook users. Computers 10(5):62 12. Ramana VS, Shruthi D, Saleru SM, Kaveri K (2022) Detection of women profanity in Twitter. In: Innovations in computer science and engineering. Springer, Singapore, pp 289–297 13. Ramana D, Reddy TH (2022) Detection of online hate in social media platforms for Twitter data: a prefatory step. In: Evolution in computational intelligence. Springer, Singapore, pp 411–419

Hate Speech and Offensive Language Detection in Twitter Data Using Machine Learning Classifiers Seyed Muzaffar Ahmad Shah and Satwinder Singh

Abstract Social media is rapidly growing in popularity and has both advantages and disadvantages. Users posting their daily updates and opinions on social media may inadvertently hurt the feelings of others. Detecting hate speech and harmful information on social media is critical these days, lest it lead to calamity. In this research, machine learning classifiers such as Naïve Bayes, support vector machines, and logistic regression, along with the pre-trained models BERT and RoBERTa, developed by Google and Facebook, respectively, are used to detect hate speech and offensive content from Twitter data on a newly created dataset that includes tweets and articles/blogs. The sentiments were obtained using the VADER sentiment analyzer. The results show that the pre-trained classifiers outperformed the machine learning classifiers utilized in this study. Accuracy scores of 96% and 93% were achieved by BERT and RoBERTa, respectively, on the tweet dataset, whereas on the dataset of articles/blogs, accuracies of 97% and 98%, respectively, were achieved, outperforming the other classifiers used in this work. Further, the results show that neutral content is shared more in articles/blogs, hate content is shared roughly equally in tweets and articles/blogs, and offensive content is shared more in tweets than in articles/blogs. Keywords BERT · Hate speech · Offensive language · RoBERTa · Tweets · VADER

S. M. A. Shah · S. Singh (B)
Department of Computer Science and Technology, Central University of Punjab, Bathinda, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_17
1 Introduction Users flock to social media sites in greater numbers than any other Website. The most popular social media platforms are Twitter, Facebook, and Instagram. Users express themselves on these social media platforms by writing a post, sharing an image, or making a video. Twitter is a popular and easy-to-use microblogging Website that allows users to create and send brief messages of up to 140 characters, whose


character limit has been updated to 280 characters. Twitter allows a succession of tweets to form a complete story due to its message length constraint. Twitter has also evolved as a powerful tool for gathering and disseminating news, forecasting election results, and exchanging political events and ideas [1]. Statistics show that on Twitter alone, 500 million tweets are sent every day and approximately 200 billion tweets are sent per year, reflecting an exponential increase in social media usage. However, certain organizations frequently abuse these means to defame others, propagating unpleasant and hateful remarks directed at individuals and/or other groups [2]. With members from many walks of life, ethnicities, and beliefs, many people utilize these networks to spread violent content and abusive words directed at others [3]. As a result, it is critical to catch such content as soon as possible and prevent it from spreading further, where it is more dangerous [2].

2 Related Work Any expression form which the speaker tries to humiliate or encourage hatred to a particular group of persons on some basis which may include race, skin, color, disability, or gender is termed as hate speech. Whereas offensive language may be termed as something which embarrasses or irritates people as it is unpleasant and disrespectful to them. In the detection of hate and offensive content, various techniques have been used. Authors have proposed different methodologies to get more accurate results. A thorough review of the literature is done to get the most of the published work. Watanabe et al. [3] proposed an approach where hate expressions can be detected over Twitter. The study was based on patterns and unigrams which were automatically collected from a training set. The technique was used further in machine learning algorithms that compromised SVM, random forest, and J48graft. Both binary and ternary classifications were used in the study and reached an accuracy level of 87.4% to detect whether that particular tweet was offensive or not, and 78.4% accuracy was acquired in detecting whether the tweet was hateful, offensive, or clean on the dataset of 2010 Tweets. Souza et al. [4] This was the first investigation for the best machine learning model to detect offensive language in Twitter data. Linear SVM and Naïve Bayes algorithms were applied as the models used were trending algorithms during the research. An accuracy of 92% via the Naïve Bayes approach and 90% accuracy via the Linear SVM method was achieved. According to the work, for better results with the linear SVM, a balanced dataset is required, and also Naïve Bayes classifier is easy to implement and thus makes it a good working algorithm. The problems in SVM like standard deviation problems are very low in Naïve Bayes thus it provided a better result. Alfina et al. [5] The main objective behind the work was to detect hate speech in the Indonesian language. A new dataset was created, and topics included were hate in religion, gender inequality, race, and ethnicity. For feature extraction, the proposed


approach was based upon n-grams ranging from n = 1 to n = 4. Logistic regression, Naïve Bayes, SVM, Bayesian LR, and random forest decision tree were used for the classification purpose. F-measure of 93.5% was achieved by using the word n-gram feature through the random forest decision tree algorithm. Ketsbaia et al. [6] proposed an approach to take hate speech detection to a next level by using two datasets which were prepared by the University of Maryland and another dataset from the article which was published earlier by Davidson et al. [7]. Four machine learning algorithms were used including LR, linear SVC, Naïve Bayes, and Bernoulli Naïve Bayes. Out of these methods, linear SVC provided the best result, and Bernoulli Naïve Bayes had the lowest. Modha et al. [8] proposed an approach for detection and visualizing online aggression on social media platforms. The study categorized this aggression into three labels which were overtly, covertly, and non-aggressive content. Popular machine learning models were applied like SVM, LR, and deep learning approaches and achieved an F1 score of 64% on the Facebook English dataset and an F1 score of around 58% on the Twitter English dataset. Davidson et al. [7] used the crowd-sources lexicon of hate speech which helped the research to collect tweets which were containing hate keywords. The dataset was divided into three categories which were containing hate speech, just offensive language, and the last category contained none of the mentioned. A classifier was trained to distinguish this category. The results showed that the tweets falling under racist and homophobic sections are highly classified as hate speech. Tweets which did not contain any explicit keywords of hate speech were also very difficult to classify. Almeida et al. [9] proposed an approach to detect hate speech which was based on information theory like entropy and divergence for representation of documents. The work focused on the weight information of documents rather than their frequency. Results showed that the proposed method outperformed the previous approaches which used data representation like TF-IDF or unigrams both combined with text classifiers. Lastly, an F1 score of 86% was attained for identifying hate speech, 84% for offensive speech identification, and 96% for regular speech classes. The approach was nearly 2.27% faster than baseline approaches. Vigna et al. [10] introduced the first-ever hate speech classifier for Italian texts. The work focused on Facebook and considered the textual content of comments on public Italian pages. The authors proposed a variety of hate categories to differentiate between various hate speech contents. Machine learning algorithms used were SVM and LSTM and the dataset contained three-class and two-class labeling first composed of 3356 documents from where 2.8 k compromised of no hate content, 410 weak content, and 130 strong hate content. The other two-class dataset contained 3.5 k documents from that 2.7 were no hate content and 786 as hate content. In the strong, weak, and no hate classes, SVM scored 64% of accuracy while LSTM 60% of accuracy. While on hate and no hate classes, SVM scored 72% of accuracy and LSTM 75%. Gröndahl et al. [11] defined that hate speech detection models only work well on the same data they are trained on. The proposed method argued that for the successful detection of hate speech architecture of the model is as important as labeling and


type of data. Results further conveyed that all detection models are brittle against some constraints like typos and changing boundaries of words. The dataset used was from Twitter and Wikipedia. Zhang et al. [12] introduced a new method that was based on deep learning combined with the long–short-term memory and convolutional network. Dataset was compromised of hate speech focusing on religion and refugees. The method outperformed on 6 out of 7 datasets by 0.2–13.8 points in F1 score. Also, the automatic selection algorithm which drastically reduces the original feature spaces by over 90% and chooses other features from the dataset, and machine learning models resulted better if automatic features are selected like n-grams rather than traditional approaches. Chiril et al. [13] tackled hate speech detection via a multi-target approach for the first time. Autors have collected leveraged datasets to solve the problem of transferring knowledge from different datasets. The contribution from this work was threefold, first was to view the ability of detection models to get common properties via topic-based datasets. Another was to detect both (sexism, misogyny, racism) and hate speech targets. Lastly, they studied the usage of affective knowledge in semantically structured and semantic computing resources. The experiment was based upon the neural models which included multitask approaches. Plaza-Del-Arco et al. [14] analyzed hate speech on Spanish tweets which were against immigrants and women by performing classification experiments through different approaches. The research was based mostly on English content as there were few studies and other resources for other languages, like Spanish here. For the lexicon-based approach, linguistic resources were built and applied some patterns for the classification of the dataset. Results were comparable to the system for SemEval and HatEval tasks and other classifiers like decision tree. For machine learningrelated approaches, supervised classifiers were studied like DT, NB, LR, and SVM and more importantly use of n-grams. Koushik et al. [15] proposed an approach to automatically detect hate speech content in tweets through machine learning algorithms. BOW and TF-IDF approaches were used to feature extract things from tweets. A logistic regression classifier was applied for the classification of tweets into hate and non-hate speech. Dataset compromised of Twitter tweets provided from Kaggle. Of these, 70% were set for training purposes, and the remaining 30% for testing purposes. Though an accuracy of nearly 94% was achieved through both approaches, still it should have been better if the proposed method incorporated different techniques like adding linguistic features. Dorris et al. [16] proposed a defense system called HateDefender which detects hate speech and offensive content. It consisted of a defense model which was based upon neural network LSTM, and the explanation model was based upon LSTM’s gating signals. A dataset from the work of Davidson et al. [7] was used for the research. The proposed defense system was able to detect hate speech with an accuracy of 90.82% and offensive language detection accuracy by nearly 90%. Pitsilis et al. [17] proposed a detection scheme that was an ensemble of RNN classifiers. This incorporated a lot of features that were associated with the user’s


information, such as their tendency toward sexism and racism. A corpus of around 16 k tweets was used for this detection scheme. The main goal of the research was to distinguish sexist and racist tweets from neutral tweets, and the model achieved this with high accuracy. Badjatiya et al. [18] performed extensive experiments with deep learning approaches to obtain word embeddings. Over 16 k tweets were used, and the results showed that the deep learning approaches outperformed previously used char and word n-gram methods by an improvement of around 18 F1 points. Djuric et al. [19] proposed an approach that gathers low-dimensional representations of comments via neural language models and feeds them to classification algorithms as input. The work was able to resolve the issues of sparsity and high dimensionality, resulting in efficient detection of hate speech; the dataset comprised user comments gathered from Yahoo Finance. Nobata et al. [20] used machine learning models to detect hateful content in online user comments, and the results outperformed state-of-the-art deep learning methods. The dataset was created by the authors themselves and comprised comments posted on Yahoo Finance and Yahoo News. Gao and Huang [21] proposed a system in which the context information of a hate speech annotated corpus is preserved. Two systems were then formulated, one based on a logistic regression model and the other on a neural network model. The dataset used for the work consisted of 1528 Fox News comments. Results showed that the proposed approach outperformed state-of-the-art models by 3–4%, resulting in a 7% increase in F1 score. Roy et al. [22] developed an automatic system to overcome the issues of hate speech detection by using a deep convolutional neural network. The approach represented the tweet text with GloVe embedding vectors, which capture the tweet semantics. The dataset was gathered from Kaggle and consisted of 31,962 English tweets. The results outperformed existing models, achieving precision, recall, and F1 scores of 97%, 88%, and 92%, respectively. Alakrot et al. [23] collected data from YouTube comments consisting of both offensive and inoffensive comments posted in Arabic. The dataset was then used to train a support vector machine classifier with n-gram features, achieving an accuracy of 90.05%, higher than reported in previously published studies. Mossie and Wang [24] developed a model to detect hate speech in Amharic text on Facebook. The data consist of Facebook posts and comments in Amharic only. Word2Vec, TF-IDF, and n-grams were used to vectorize the texts, and labeling was done by human annotators. GBT and RF were compared with deep learning algorithms such as RNN-LSTM and RNN-GRU, and Word2Vec with RNN-GRU achieved the best performance, with an AUC of 97.85 and an accuracy of 92.56. Charitidis et al. [25] focused on countering hate content directed at social media accounts linked to journalists. A group of journalists first formulated a definition of hate speech from their point of view, and then a large set of tweets was collected related to


journalists in various languages. This dataset was later made publicly available in five different languages. The main goal of the study was to contribute a dataset to the community so that others could use it to attain better results in identifying offensive and hate content. When different models were compared, there were no large performance differences among them, although the sCNN model individually achieved the best performance. In Aziz et al. [26], the main task was to determine the boundary between hate speech and offensive language. The dataset used, consisting of Twitter tweets, had already been published by an existing author. Experiments were performed using n-gram and enhanced n-gram features to classify the tweet data into three classes: hate content, offensive content, and neutral tweets. The results revealed a good accuracy score of 91%, along with correspondingly high average precision, recall, and F1 scores.

3 Proposed Approach The main aim of this study is to detect hate speech and offensive content in Twitter tweets and in articles/blogs using machine learning classifiers. Before extracting data from Twitter, a query set was framed consisting of words belonging to the neutral, hate, and offensive categories. Most of the queries were collected from popular websites such as Hatebase.org and digitalspy.com, which contain words categorized by the intensity of their hate/slur meaning; these sources have been cited by various authors, for example [2, 9]. Although a word set was constructed from Hatebase.org and digitalspy.com and categorized into three different sections, the actual sentiment of the tweets and Web articles/blogs was still captured via VADER, a Python library, for data labeling.

3.1 Data Collection For this research work, two datasets have been used which were created from scratch. By using Twitter API, the work was able to extract data based on the search query. Search queries were grouped into three categories; every category contained a nearly equal number of search items.
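As a rough sketch of how such keyword-driven extraction can be done with the Tweepy library (this is not the authors' exact code; the credential placeholders, the reduced query lists, and the per-query tweet count are illustrative assumptions):

import tweepy

# Placeholder credentials -- illustrative only
auth = tweepy.OAuth1UserHandler("API_KEY", "API_SECRET", "ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

# Small subsets of the three query categories described in Sects. 3.2-3.4
query_sets = {"neutral": ["love", "joy", "honest"],
              "hate": ["hate", "toxic", "threat"],
              "offensive": ["idiot", "trash", "crap"]}

rows = []
for category, terms in query_sets.items():
    for term in terms:
        # "-filter:retweets" drops re-tweets at search time (see Sect. 3.6)
        cursor = tweepy.Cursor(api.search_tweets, q=f"{term} -filter:retweets",
                               lang="en", tweet_mode="extended")
        for status in cursor.items(200):          # illustrative per-query limit
            rows.append({"text": status.full_text, "category": category})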

3.2 Neutral Words These words were selected for the search query to get tweets and Web articles/blog links pertaining to neutral content: love, joy, honest, happy, good, goal, glad, generous, delight, cute, cheerful, calm, birthday, cool, excited, masterpiece, emotional, heart, celebrate, prayer, wisdom, talent, and respect.


3.3 Hate Words These words were selected for the search query to get tweets and Web articles/blog links pertaining to hate content: hate, vandalize, toxic, threat, terror, shit, sue, suicide, sorrow, silly, sexist, shocked, harm, fool, fascist, harass, bomb, bigot, awful, insult, irritate, misogyny, Nazi, racism, ridiculous, and scandal.

3.4 Offensive Words These words were selected for the search query to get tweets and Web articles/blog links pertaining to offensive content: asshole, bastard, motherfucker, black, trash, fuck, arsehole, cult, vagina, uncivilized, dickhead, wanker, bullshit, ape, nigga, punani, faggot, idiot, tosser, rascal, libtard, dick, bellend, bint, bollock, fanny, crap, twat. From Table 1, it is clear that 73,739 tweets were extracted from Twitter for the neutral query search, 63,403 tweets were captured with the hate keyword query search, and 69,430 tweets were extracted with the offensive keyword query search. In total, a dataset of 206,572 tweets was collected. Table 2 depicts the number of captured tweets that contain external Weblinks: 10,000 tweets from each class were collected, resulting in a dataset of 30,000 tweets containing Weblinks. Table 1 Overview of data extracted from Twitter without any Weblinks associated

Tweet set            Data extracted (tweets)
Neutral keywords     73,739
Hate keywords        63,403
Offensive keywords   69,430
Total tweets         206,572

Table 2 Overview of data extracted from Twitter with Weblinks

Tweet set containing links   Data extracted (tweets)
Neutral keywords             10,000
Hate keywords                10,000
Offensive keywords           10,000
Total                        30,000


3.5 Data Cleaning The data extracted from Twitter contained some noise, comprising unnecessary tweets, URLs, punctuation, and non-ASCII characters. The first task after extracting the data was to remove this redundant material. Next, all tweet text and article/blog data were normalized by converting them to lowercase so that the learning models do not treat lowercase and uppercase forms of the same word as different tokens. After that, stop words, which add no meaning to a sentence in the context of detecting hate speech and offensive language (examples being “a”, “to”, “at”, etc.), were removed. The NLTK toolkit, together with spaCy, was used for this cleaning, as both provide built-in functionality for these tasks.
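A minimal cleaning sketch along these lines, using NLTK's English stop word list (the authors combined NLTK with spaCy; the exact pipeline below is an assumption):

import re
import string
import nltk
from nltk.corpus import stopwords

nltk.download("stopwords", quiet=True)
STOP_WORDS = set(stopwords.words("english"))

def clean_text(text):
    text = text.lower()                                    # case normalization
    text = re.sub(r"http\S+|www\.\S+", " ", text)          # drop raw URLs
    text = text.encode("ascii", "ignore").decode()         # drop non-ASCII characters
    text = text.translate(str.maketrans("", "", string.punctuation))  # drop punctuation
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]   # drop stop words
    return " ".join(tokens)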

3.6 RT’s and @ Removal RT, or re-tweet, is the Twitter feature that lets users repost a tweet that was posted by another user. Since such tweets are just copies of other tweets, the Python module Tweepy was used to exclude tweets containing RT before extraction. Even after the tweet set was extracted, various unnecessary information remained within the tweets and needed to be removed. The @ symbol appears in tweets before the username of the account being referenced; as this is unnecessary for detecting hate speech, the @ symbols were removed together with the usernames.
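A hedged illustration of this step with regular expressions (a hypothetical helper, not the authors' code):

import re

def strip_rt_and_mentions(text):
    # Drop a leading "RT @user:" marker on any re-tweet that slipped through
    text = re.sub(r"^RT\s+@\w+:\s*", "", text)
    # Remove remaining @mentions together with the username
    return re.sub(r"@\w+", "", text).strip()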

3.7 Unnecessary URL Removal The tweet set contained URLs that are automatically shortened by Twitter to minimize tweet length. As this research had to work with the linked articles/blogs, these shortened links were expanded back to their full URLs, and tweet URLs pointing to websites from which no article/blog content was expected (for example, YouTube, Amazon, Spotify, and other such sites) were removed.
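One possible way to expand the shortened links and filter out excluded domains, assuming the requests library is used (the domain list below is an illustrative subset):

import requests
from urllib.parse import urlparse

EXCLUDED_DOMAINS = {"youtube.com", "amazon.com", "spotify.com"}   # illustrative subset

def resolve_and_filter(short_url):
    """Expand a shortened URL; return None if it points to an excluded site."""
    try:
        full_url = requests.head(short_url, allow_redirects=True, timeout=5).url
    except requests.RequestException:
        return None
    domain = urlparse(full_url).netloc.lower()
    if domain.startswith("www."):
        domain = domain[4:]
    return None if domain in EXCLUDED_DOMAINS else full_url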

3.8 Special Character and Stop Word Removal Special characters such as the comma (,), the exclamation mark (!), the pound sign (#), and the dollar sign ($), along with other non-ASCII characters, were removed so that the tweet set and the article/blog set are free from such noise. In addition, words like “a”, “the”, “are”, etc., which add no meaning to the tweet set, are termed stop words and were removed using built-in Python functionality.

Table 3 Data after pre-processing

Dataset          Data after pre-processing
Tweets           159,704 tweets
Articles/blogs   1980 external article/blog Weblinks

After the pre-processing stage, the data were further reduced; Table 3 shows the amount of data left, which comprises 159,704 tweets and 1980 external article/blog Weblinks.

3.9 Data Labeling Data labeling was done with a sentiment analyzer called VADER. VADER gives sentiment measurements of a text in four categories; its compound sentiment score ranges from −1 to +1, where −1 denotes negative content and +1 denotes positive content. As the classification in this work is ternary, the sentiment scores were mapped to three classes, with 0 denoting neutral content, 1 hate content, and 2 offensive content. The data sentiments were semi-automatically captured by VADER: compound scores greater than 0 and up to 1 were labeled as neutral content and assigned class 0, scores greater than −0.5 and up to 0 were labeled as hate content and assigned class 1, and scores from −1 up to −0.5 were labeled as offensive content and assigned class 2.
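A sketch of this labeling rule with the vaderSentiment package (the thresholds follow the description above; which VADER distribution the authors used is not stated, so this is an assumption):

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

def label_text(text):
    compound = analyzer.polarity_scores(text)["compound"]
    if compound > 0:        # (0, 1]      -> neutral
        return 0
    if compound > -0.5:     # (-0.5, 0]   -> hate
        return 1
    return 2                # [-1, -0.5]  -> offensive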

3.10 Word Cloud A word cloud shows the frequency of repeated words in a corpus. Word clouds were generated for the repeated words falling into the neutral, hate, and offensive categories of the tweet dataset, and a word cloud of all the data in the article/blog dataset was generated as well (Figs. 1, 2, 3 and 4).
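For example, with the wordcloud package the per-class clouds could be produced roughly as follows (the function and variable names are assumptions):

from wordcloud import WordCloud
import matplotlib.pyplot as plt

def plot_word_cloud(texts, title):
    cloud = WordCloud(width=800, height=400, background_color="white").generate(" ".join(texts))
    plt.imshow(cloud, interpolation="bilinear")
    plt.axis("off")
    plt.title(title)
    plt.show()

# e.g. plot_word_cloud(neutral_tweets, "Neutral tweets")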

3.11 Data Analysis For the data analysis, five classifiers were used: Naïve Bayes, support vector machine, logistic regression, BERT, and RoBERTa. Below are the basic implementation details of the classifiers used in this work.

Fig. 1 Word cloud of neutral tweets collected in dataset

Fig. 2 Word cloud of hate tweets collected in dataset

Fig. 3 Word cloud of offensive tweets collected in dataset

Fig. 4 Word cloud of article/blog dataset


3.11.1 Naïve Bayes

Naïve Bayes makes a certain assumption that every feature makes an equal and independent contribution to the result. In this work, multinomial Naïve Bayes was used for the classification process as this is mostly used for the classification of text. This classifier uses the word frequency for the prediction of the classes. Features were extracted with the help of the countVectorizer function available with the Sklearn Python module. No parameter tuning was done with this classification model.
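A minimal sketch of this setup with scikit-learn, assuming train_texts/train_labels and test_texts have already been prepared from the cleaned dataset:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Word-count features via CountVectorizer, multinomial NB with default parameters (no tuning)
nb_model = make_pipeline(CountVectorizer(), MultinomialNB())
nb_model.fit(train_texts, train_labels)
nb_predictions = nb_model.predict(test_texts)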

3.11.2 Logistic Regression

Multinomial type of logistic regression, available in Python, was applied for the classification of the text, and this applies for multiclass classification. Similarly, the countVectorizer function was used here for the conversion of text to vector form and for the feature extraction. Like in the case of the Naïve Bayes classifier, no parameter tuning was done here too.

3.11.3 Support Vector Machine

For the implementation part, the linearSVC function was used to perform the model task. It returns a good hyperplane which is useful in categorizing the data. In this model too, no parameter tuning was done. The results were considered from the base model only.
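The logistic regression and SVM baselines can be sketched in the same way (again assuming prepared train/test splits; the parameters shown are defaults, matching the no-tuning setup described above):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

baselines = {
    "Logistic regression": LogisticRegression(multi_class="multinomial", max_iter=1000),
    "Linear SVC": LinearSVC(),
}
for name, clf in baselines.items():
    model = make_pipeline(CountVectorizer(), clf)
    model.fit(train_texts, train_labels)
    print(name, accuracy_score(test_labels, model.predict(test_texts)))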

3.11.4 BERT

The BERT tokenizer of the base uncased version was used, with the maximum length of the tokenized text set to 512 and truncation enabled; the tokenizer adds the [CLS] token at the start of each sequence and [SEP] at the end. Tweets that were not in the English language were not considered. For class balancing, a random over-sampler was used, and for the text-to-vector conversion the work followed a one-hot encoding method, with three output neurons for the classification of the three different classes.
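A hedged sketch of this tokenization step with the Hugging Face transformers library (the batch variable list_of_tweets is assumed; the original implementation details may differ):

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# [CLS] and [SEP] are added automatically; sequences longer than 512 tokens are truncated
encoded = tokenizer(list_of_tweets, max_length=512, truncation=True,
                    padding="max_length", return_tensors="np")
input_ids, attention_mask = encoded["input_ids"], encoded["attention_mask"]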

3.11.5 RoBERTa

In the implementation part, the RoBERTa tokenizer was used in a similar way to the BERT tokenizer, with the token length set to 512 and truncation enabled. As with BERT, a one-hot encoding method was followed for the text-to-vector conversion, a random over-sampler was used for class balancing, and three output neurons were used for the classification of the three different classes (Fig. 5).

Fig. 5 Proposed architecture of the work

The proposed architecture depicts how the data are predicted as neutral, offensive, or hate content. As discussed earlier, a corpus of query words was selected, and related tweets were captured with and without links. Pre-processing was then performed, which included data cleaning and data labeling; data labeling was done by the VADER sentiment analyzer. Feature extraction and text-to-vector conversion were done with the CountVectorizer and OneHotEncoder functions available in Sklearn, a Python library. Then, 80% of the total data was selected as training data, and the classification models were applied: Naïve Bayes, support vector machine, logistic regression, and the pre-trained models BERT and RoBERTa.
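A compact sketch of this pipeline assembly for the classical models, with an 80/20 split and random over-sampling via imbalanced-learn (the random_state values, the stratified split, and applying the over-sampler to the classical models rather than only to BERT/RoBERTa are assumptions):

from imblearn.over_sampling import RandomOverSampler
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

X = CountVectorizer().fit_transform(texts)        # texts and labels prepared as in Sect. 3
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.2,
                                                    stratify=labels, random_state=42)
X_train, y_train = RandomOverSampler(random_state=42).fit_resample(X_train, y_train)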

4 Results and Discussion This section presents the results achieved after applying the classification models to the tweet and article/blog datasets. The sentiment scores of the tweets were calculated via the VADER sentiment analyzer and were quite impressive. Applying the machine learning and pre-trained models to the two datasets comprising the tweet set and the article/blog set, Google's BERT model outperformed the other classifiers used in this work on the tweet dataset, whereas Facebook's RoBERTa model achieved the best results on the article/blog dataset. Table 4 shows the comparison of evaluation metrics achieved on the tweet dataset, and Table 5 shows the comparison of evaluation metrics achieved on the article/blog dataset.


Table 4 Comparison of different models on the dataset comprising tweets

Sentiment analyzer: VADER

Learning classifier      Accuracy (%)   Tweet set    Precision (%)   Recall (%)   F1-score (%)
Naive Bayes              76             Neutral      95              71           81
                                        Hate         49              74           59
                                        Offensive    71              85           78
Support vector machine   77             Neutral      84              91           87
                                        Hate         44              27           34
                                        Offensive    75              79           77
Logistic regression      78             Neutral      86              90           88
                                        Hate         47              34           39
                                        Offensive    75              81           78
BERT                     96             Neutral      99              96           98
                                        Hate         86              92           89
                                        Offensive    96              98           97
RoBERTa                  93             Neutral      99              93           96
                                        Hate         77              88           82
                                        Offensive    93              97           95

Table 5 Comparison of different models on the dataset comprising articles/blogs

Sentiment analyzer: VADER

Learning algorithm       Accuracy (%)   Tweet set    Precision (%)   Recall (%)   F1-score (%)
Support vector machine   70             Neutral      74              88           81
                                        Hate         48              26           34
                                        Offensive    63              47           54
Logistic regression      70             Neutral      73              88           80
                                        Hate         58              26           36
                                        Offensive    59              45           51
Naive Bayes              84             Neutral      99              77           87
                                        Hate         61              98           75
                                        Offensive    76              95           85
BERT                     97             Neutral      97              99           98
                                        Hate         98              98           98
                                        Offensive    98              94           96
RoBERTa                  98             Neutral      100             98           99
                                        Hate         92              100          96
                                        Offensive    98              98           98


Figures 6a, b and 7a, b show the confusion matrices generated by the BERT and RoBERTa classifiers on the tweet and article/blog datasets, respectively. Figure 8 shows the comparison of sentiments in the tweet and article/blog datasets. From Fig. 8, it can be seen that hate content is shared roughly equally on Twitter and in Web articles/blogs; tweets contain a higher percentage of offensive content than articles/blogs, whereas neutral content is shared more in articles/blogs.
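Confusion matrices of this kind can be reproduced with scikit-learn, assuming y_test and y_pred hold the true and predicted class indices:

import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

cm = confusion_matrix(y_test, y_pred, labels=[0, 1, 2])
ConfusionMatrixDisplay(cm, display_labels=["Neutral", "Hate", "Offensive"]).plot()
plt.show()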

Fig. 6 a Confusion matrix generated by BERT model on tweets dataset b Confusion matrix generated by RoBERTa model on a tweets dataset

Fig. 7 a Confusion matrix generated by BERT model on article/blog dataset b Confusion matrix generated by RoBERTa model on article/blog dataset


Fig. 8 Sentiment comparison of tweet and article/blog dataset

5 Conclusion and Future Work The timely detection of hate speech and offensive content on social media is of utmost importance. In this work, various machine learning models, including the pre-trained models BERT and RoBERTa developed by Google and Facebook respectively, are implemented to detect hate speech and offensive content in Twitter data on a newly created dataset. The dataset was divided into two parts, one containing just tweets and the other containing Web articles/blogs. Most work on detecting hate and offensive tweets in Twitter data has focused on binary classification, but in this work ternary classification is performed, covering neutral, hateful, and offensive content. Data are captured from Twitter in the form of tweets and links leading to articles/blogs, and a comparison was made at the end to explore where hate and offensive content are spreading more. From the comparison results, it is observed that hate content is shared roughly equally on Twitter and in Web articles, whereas tweets contain a higher percentage of offensive content than articles/blogs, and neutral content is shared more in articles/blogs. Comparing the models, BERT performed best in detecting hate and offensive content on the tweet set, with an accuracy of 96%, against accuracies of 76%, 77%, 78%, and 93% for Naïve Bayes, SVM, logistic regression, and RoBERTa, respectively. On the article/blog dataset, RoBERTa scored higher than Naïve Bayes, SVM, logistic regression, and BERT, achieving an accuracy of 98%, which is well above the first three classifiers and slightly above BERT's accuracy. In future, the approach could incorporate more data from articles and blogs. During the pre-processing phase, the data were cleaned in such a way that the focus was


placed on text only, but more and more users share emojis, which carry sentiments of their own. The current study relies on the VADER sentiment analyzer; additional sentiment analyzers and comparative analyses could be explored in future work.

6 Contribution and Novelty This work provides the research community with a content analysis of hate, offensive, and neutral content. The knowledge learned by the BERT and RoBERTa models can also be considered off-the-shelf knowledge and can be transferred to new tasks related to hate and offensive speech. The work is based on a newly created dataset extracted from Twitter; the data consisted of tweets and tweet links, and from these links data were scraped from the articles/blogs using Web scraping techniques. The research used a generalized dataset rather than focusing on a particular domain. Machine learning classifiers were applied, including pre-trained classifiers from Google and Facebook, and a comparison was then made to get an overview of where hate and offensive speech spread more, whether on social media or in articles/blogs.

References 1. Ayo FE, Folorunso O, Ibharalu FT, Osinuga IA, Abayomi-Alli A (2021) A probabilistic clustering model for hate speech classification in twitter. Expert Syst Appl 173. https://doi.org/10. 1016/j.eswa.2021.114762 2. Kapil P, Ekbal A (2020) A deep neural network based multi-task learning approach to hate speech detection. Knowledge-Based Syst 210. https://doi.org/10.1016/j.knosys.2020.106458 3. Watanabe H, Bouazizi M, Ohtsuki T (2018) Hate speech on Twitter: a pragmatic approach to collect hateful and offensive expressions and perform hate speech detection. IEEE Access 6:13825–13835. https://doi.org/10.1109/ACCESS.2018.2806394 4. Souza A de, Abreu DC, Souza, GA (n.d.). Automatic offensive language detection from Twitter data using machine learning and feature selection of metadata. http://shura.shu.ac.uk/26018/, https://orcid.org/0000-0001-7461-7570 5. Alfina I, Mulia R, Fanany MI, Ekanata Y (2018) Hate speech detection in the Indonesian language: A dataset and preliminary study. In: 2017 International Conference on Advanced Computer Science and Information Systems, ICACSIS 2017, 2018-January, 233–237. https:// doi.org/10.1109/ICACSIS.2017.8355039 6. Ketsbaia L, Chen X (n.d.) Detection of hate Tweets using machine learning and deep learning. https://t.co/xUCcwoetmn 7. Davidson T, Warmsley D, Macy M, Weber I (2017) Automated hate speech detection and the problem of offensive language. www.aaai.org 8. Modha S, Majumder P, Mandl T, Mandalia C (2020). Detecting and visualizing hate speech in social media: A cyber Watchdog for surveillance. Expert Syst Appl, 161. https://doi.org/10. 1016/j.eswa.2020.113725


9. Almeida TG, Nakamura FG, Souza B, Nakamura EF (2017) Detecting hate, offensive, and regular speech in short comments. In: WebMedia 2017—Proceedings of the 23rd Brazillian Symposium on Multimedia and the Web, 225–228. https://doi.org/10.1145/3126858.3131576 10. Vigna F, Cimino A, Dell’orletta F, Petrocchi M, Tesconi M (n.d.) Hate me, hate me not: Hate speech detection on Facebook. https://curl.haxx.se 11. Gröndahl T, Pajola L, Juuti M, Conti M, Asokan N (2018) All you need is “love”: evading hate-speech detection. http://arxiv.org/abs/1808.09115 12. Zhang Z, Robinson D, Tepper J (2016) Hate speech detection using a convolution-LSTM based deep neural network. https://doi.org/10.475/123_4 13. Chiril P, Pamungkas EW, Benamara F, Moriceau V, Patti V (2022) Emotionally informed hate speech detection: a multi-target perspective. Cogn Comput 14(1):322–352. https://doi.org/10. 1007/s12559-021-09862-5 14. Plaza-Del-Arco FM, Molina-Gonzalez, MD, Urena-Lopez LA, Martin-Valdivia MT (2021) A multi-task learning approach to hate speech detection leveraging sentiment analysis. IEEE Access 9:112478–112489. https://doi.org/10.1109/ACCESS.2021.3103697 15. Koushik G, Rajeswari K, Muthusamy SK (2019) Automated hate speech detection on Twitter. In: Proceedings—2019 5th International Conference on Computing, Communication Control and Automation, ICCUBEA 201, September 19. https://doi.org/10.1109/ICCUBEA47591. 2019.9128428 16. Dorris W, Hu RR, Vishwamitra N, Luo F, Costello M (2020) Towards automatic detection and explanation of hate speech and offensive language. In: IWSPA 2020—Proceedings of the 6th International Workshop on Security and Privacy Analytics, 23–29. https://doi.org/10.1145/337 5708.3380312 17. Pitsilis GK, Ramampiaro H, Langseth H (2018) Effective hate-speech detection in Twitter data using recurrent neural networks. Appl Intell 48(12):4730–4742. https://doi.org/10.1007/s10 489-018-1242-y 18. Badjatiya P, Gupta S, Gupta M, Varma V (2017) Deep learning for hate speech detection in tweets. In: 26th International World Wide Web Conference 2017, WWW 2017 Companion, 759–760. https://doi.org/10.1145/3041021.3054223 19. Djuric N, Zhou J, Morris R, Grbovic M, Radosavljevic V, Bhamidipati N (2015) Hate speech detection with comment embeddings. In: WWW 2015 Companion—Proceedings of the 24th International Conference on World Wide Web, 29–30. https://doi.org/10.1145/2740908.274 2760 20. Nobata C, Tetreault J, Thomas A, Mehdad Y, Chang Y (2016) Abusive language detection in online user content. 25th International World Wide Web Conference. WWW 2016:145–153. https://doi.org/10.1145/2872427.2883062 21. Gao L, Huang R (2017) Detecting online hate speech using context aware models. http://arxiv. org/abs/1710.07395 22. Roy PK, Tripathy AK, Das TK, Gao XZ (2020) A framework for hate speech detection using deep convolutional neural network. IEEE Access 8:204951–204962. https://doi.org/10.1109/ ACCESS.2020.3037073 23. Alakrot A, Murray L, Nikolov NS (2018) Towards accurate detection of offensive language in online communication in Arabic. Procedia Comp Sci 142:315–320. https://doi.org/10.1016/j. procs.2018.10.491 24. Mossie Z, Wang JH (2020) Vulnerable community identification using hate speech detection on social media. Info Process Manag 57(3). https://doi.org/10.1016/j.ipm.2019.102087 25. Charitidis P, Doropoulos S, Vologiannidis S, Papastergiou I, Karakeva S (2019). Towards countering hate speech against journalists on social media. https://doi.org/10.1016/j.osnem. 2020.100071 26. 
Abdul Aziz NA, Aizaini Maarof M, Zainal A (2021). Hate speech and offensive language detection: a new feature set with filter-embedded combining feature selection. In: 2021 3rd International Cyber Resilience Conference, CRC 2021, January 29. https://doi.org/10.1109/ CRC50527.2021.9392486

Secrecy Rate Optimization for Energy Efficient Cognitive Relay Networks Md. Khorshed Alam and Trina Saha

Abstract We consider the secrecy of power-efficient transmission in an underlay cognitive relay network and propose a cooperative cognitive relay model in the presence of a primary user. Using multiple relays equipped with a single antenna, a secondary source and a secondary destination try to communicate with each other in the presence of an eavesdropper. We formulate an optimization problem to maximize the secrecy efficiency (SE) and energy efficiency (EE) of the proposed model, taking into consideration the data rate and transmit power of the cognitive communication as well as the interference caused to the primary user. The formulated problem is non-convex. Although techniques such as relaxation and randomization can be used to solve non-convex problems, both approaches can yield invalid solutions, so we introduce a rank-one constraint into the main problem as a penalty approach and solve the problem using an iterative active-set algorithm. We provide numerical and graphical results for the proposed scheme and compare its effectiveness with an existing MIMO cognitive radio network model. The results show that the individual secrecy rate at the destination in the proposed model is higher than in the existing MIMO model; however, the overall secrecy rate is higher in the MIMO model than in the proposed model. Keywords Cognitive relay network (CRN) · Energy efficiency (EE) · Multi-input–multi-output (MIMO) · Physical layer security (PLS) · Quality of service (QoS) · Secrecy efficiency (SE)

Md. K. Alam (B) · T. Saha Department of Computer Science & Engineering, State University of Bangladesh, Dhaka, Bangladesh e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_18


1 Introduction The typically available and usable spectrum is insufficient to serve a large number of secondary consumers. In an underlay cognitive radio network (CRN), cognitive radio enables secondary users (SUs) to share the licensed spectrum of primary users (PUs) [1]. Maintaining a minimal quality of service (QoS) with constrained transmit energy is a major design task faced by underlay CRNs for secondary transmissions, and it needs precise tuning of the transmit power of the SUs; by limiting the transmit power, the primary communication can be kept acceptable [2]. Because of the broadcast nature of wireless communication, secret messages transmitted in CRNs can be overheard and misused by eavesdroppers. To manage this hazard, several physical layer security (PLS) schemes have been introduced in CRNs to make data transmission secure and safe [3]. Energy efficiency (EE) has also emerged as a vital problem in CRNs because of fast-growing data rate requirements and increasing energy costs [4]. To balance the need for a high data rate with simple power usage in CRNs, the trade-off between secrecy efficiency (SE) and energy efficiency (EE) has been investigated [5]. Similarly, the authors investigated a cooperative model, like our proposed scheme, between the primary and secondary users to maximize the security of cognitive networks. Besides the security motive, energy efficiency has additionally been treated as a serious issue in CRNs with soaring data rate requirements and the growing cost of energy [6]. An energy-efficient power allocation scheme maximizing the overall throughput per unit of energy consumption in OFDM-based CNs was formulated and proposed in [7], and a joint power allocation and transmit beamforming (BF) algorithm was designed to optimize the energy efficiency of multiple-input–multiple-output (MIMO) CNs [8]. Motivated by the fact that relaying systems can substantially enhance the secure transmission performance of a cognitive network, a BF relaying system was proposed to obtain the most beneficial energy efficiency and secrecy in cognitive relay networks (CRNs) in the presence of an eavesdropper; in particular, a constrained optimization function was derived to maximize the SE and EE of the CRN under data rate, transmit power, and PU interference constraints [9]. In [10], a MIMO model is considered, and the optimal average secrecy and energy efficiency (SEE) is obtained via a penalty approach. As our problem is non-convex, it is not easy to solve with a regular iterative algorithm, so we propose an iterative active-set algorithm to obtain a better result. We were aware of the local minima problem that might harm the required result and therefore adopted the penalty approach in the solution technique; we deliberately avoided the relaxation and randomization methods as they can give infeasible results. In this paper, we first formulate the problem using multiple relays equipped with a single antenna and design all the required mathematical equations that help show the theoretical efficiency of the proposed model. Since multiple relays are available, the question arises of which relay is best for a particular communication. To


answer this question, keeping different selection criteria in mind, we present three available techniques for best relay selection. To solve the proposed non-convex problem, we propose an iterative active-set algorithm after completing the proper mathematical derivation. By applying the proposed algorithm, we generate simulation results and compare them with a reference MIMO model [10] for different numbers of relays, namely N = 4, N = 6, and N = 8.

2 Problem Formulation The proposed design is illustrated in Fig. 1. A Rayleigh flat-fading scenario is assumed, in which the independent and identically distributed (i.i.d.) source-to-relay, relay-to-PU, relay-to-ED, and relay-to-destination channel coefficients are designated as $\{h_{sr_i}\}_{i=1}^{N}$, $\{h_{rp_i}\}_{i=1}^{N}$, $\{h_{re_i}\}_{i=1}^{N}$, and $\{h_{rd_i}\}_{i=1}^{N}$, respectively, where $h_{rp}$ is considered the interference channel. The total source-to-destination communication takes place in two time phases. In the first phase, $SU_s$ transmits a signal $x_s$ satisfying $E[|x_s|^2] = 1$, and the signal received at the $i$th relay is expressed as follows:

$y_{r_i} = \sqrt{P_s}\, h_{sr_i} x_s + n_r$

Ps h sri xs + n r

Fig. 1 Multiple relay CRN with an eavesdropper

242

Md. K. Alam and T. Saha

Here, Ps is used to represent the amount of transmit power at S Us , h sri = √ ∼ vsri h ∼ vector) and corresponding sri · h sri and vsri are considered 1 × 1 fading channel ( path loss between SUs and SUr, respectively. n r ∼ CN 0 N , σr2 I N represents additive Gaussian noise (AWGN) vector. And the variance σr2 = ∇ f N0 , ∇ f denotes system bandwidth, and N 0 corresponds single-sided noise spectral density. For making quality of service (QoS) for PU uninterrupted, the interference from S Us represented by I p(1) is a must to keep below a pre-calculated threshold value I pth : | |2 I p(1) = Ps |h sp | ≤ I pth √ where h spi = vspi h ∼ spi denotes SUs −PU link fading channel coefficient. SUs maximum transmit power is limited to a predefined value Psmax , and we can define Ps like: { } I ph max Ps = min | |2 , Ps |h sp | In second transmission phase, yri is first multiplied by 1 × 1 BF weight matrix W r in SUri , SUri transmit power of ith relay can be expressed as follows: ( ) Pri = Ai2 Ps |h sri Wr |2 + N0 The signal ydi and yei received at the destination SU d and ED, respectively, from relay Ri is expressed as follows: √ ydi = yri' h rdi Wr + n d = √ yei =

yri' h rei Wr

Pri yri h rdi Wr + n d Ps h sri |2 + N0 Pri yri h rei Wr + n e Ps h sri |2 + N0

+ ne =

Ai is randomly selected amplification factor. End-to-end SNR γdi and γei of the ith relay link can be expressed as follows: | |2 Ps |h rHdi Wr h sri | γdi (Wr ) = ∥ ∥2 σ 2 ∥h H Wr ∥ + σ 2 r

r di

F

d

| |2 Ps |h rHei Wr h sri | γei (Wr ) = ∥ ∥2 σr2 ∥h rHei Wr ∥ F + σe2 The end-to-end SNR at the secondary destination and eavesdropper due to N relaying links is then given by

Secrecy Rate Optimization for Energy Efficient Cognitive Relay Networks

γ D (Wr ) =

N ∑

γdi (Wr ) =

i=1

N ∑ i=1

243

| H |2 Ps |h rdi Wr h sri | ∥ ∥2 σ 2 ∥h H Wr ∥ + σ 2 r

rdi

F

d

| H |2 N N ∑ ∑ Ps |h rei Wr h sri | γ E (Wr ) = γei (Wr ) = ∥ ∥2 2∥ H 2 ∥ i=1 i=1 σr h rei Wr F + σe The interference from SUr , denoted as I p2 (Wr ), to PU should be constrained as follows: I p2 (Wr ) =

N ∑

Ii (Wr ) =

i=1

N ∑

| |2 Pri |h rpi Wr | ≤ I pth

i=1

The available SR of the CRN can be described as follows: Rsec (Wr ) = [R D (Wr ) − R E (Wr )]+ where R D (Wr ) =

1 log2 (1 + γ D (Wr )) 2

R E (Wr ) =

1 log2 (1 + γ E (Wr )) 2

Therefore, the secrecy and energy efficiency (SEE), our formulated function which optimizes the available SR and the consumption of power considered CRN, is expressed by ηSEE =

Rsec (Wr ) (bit/Hz/Joule) Ptotal (Wr )

where ) ( N ∑ 1 (1) (1) (2) (2) Ptotal (Wr ) = Pri (Wr ) + PC,s + N PC,r + N PC,r + PC,d ζs Ps + ζr 2 i=1 The target is to maximize: max ηSEE = Wr

Rsec (Wr ) Ptotal (Wr )

s.t C1 : R D (Wr ) ≥ R min D

244

Md. K. Alam and T. Saha

{

I pth C2 : Ps = min | |2 , Psmax |h sp | C3 :

N ∑

}

Pri (Wr ) ≤ Prmax

i=1

C4 : I p2 (Wr ) ≤ I Pth

2.1 Best Relay Selection (RS1) For every ith relay, considering {hsri }, {hsp }, {hrpi }, {hrdi }, and {hrei }, we try to find: max ηSEEi = Wr

Rseci (Wr ) Ptotali (Wr )

The calculated maximum value of max η S EEi gives us the best ith relay. Wr

2.2 Best Relay Selection (RS2) For every ith relay, considering {hrdi } and {hrei }, we try to calculate-

max Wr

γdi (Wr ) = γei (Wr )

The calculated maximum value of max Wr

H Ps |h rdi Wr h sri |

∥ | ∥

2



2 +σd2 F 2 H Ps h rei Wr h sri 2 H σr2 h rei Wr F +σe2

σr2

H h rdi Wr

γdi (Wr ) γei (Wr )



|

gives us the best ith relay.

2.3 Best Relay Selection (RS3) For every ith relay, considering {hrdi }, we try to get| H |2 Wr h sri | Ps |h rdi max γdi (Wr ) = ∥ ∥2 Wr σ 2 ∥h H Wr ∥ + σ 2 r

rdi

F

d

Secrecy Rate Optimization for Energy Efficient Cognitive Relay Networks

245

The calculated maximum value of max γdi (Wr ) gives us the best ith relay. Wr

3 Proposed Model For solving the non-convex problem, we propose an iterative active-set algorithm. Algorithm 1: My Proposed Algorithm 1. ASA: maximize q(x) subject to Ax ≤ b 2. Set i = 0 and ρ = 0.001 3. Initialize an feasible point x 0 ; 4. The active-set A(x) at x is } {| A(x) = i |aiT x = [b]i 5. If x ∗ solves ASA, Arg max q(x) ≡ arg max q(x) subject to aiT x = [b]i for all i ∈ A(x∗) 6. Pick a subset W k of {1,…,m} and find xk+1 = arg max q(x) If xk+1 does not solve ASA, adjust W k to form W k+1 and repeat. 7. Ensure all iterates are feasible, i.e., A x ≤ b 8. If Wk ⊆ A(xk ) I. Ak xk = bk and Ak xk+1 = bk II. xk+1 = xk + sk , where sk = arg max q(xk + s)subject to Ak s = 0

4 Simulation Result This section provides numerical results to calculate the performance of the proposed SEE maximization scheme and the existing MIMO work [10]. For both the simulation problem, we assumed some predefined threshold value. The values are represented in the following Table 1. All simulation results are obtained by averaging over 1000 random channel realizations. Figures 2 and 3 show the energy efficiency scheme for our proposed cooperative model, MIMO [10], RS1, RS2, and RS3 for N = 4 and N = 8, respectively. we can see that the energy efficiency for all cases simulation result almost overlapping each other except RS3. In case of RS3, the energy efficiency gets fixed when N ≥ 6. Figures 4, 5, and 6 show the comparative analysis of secrecy rate maximization for our proposed cooperative model, MIMO [10], RS1, RS2, and RS3 for N = 4, N = 6, and N = 8, respectively. As we can see here, secrecy rate of MIMO model is the highest of all and in between our proposed schemes RS3 which gives the best result.

246 Table 1 Threshold values

Md. K. Alam and T. Saha Name

Symbol Value considered

System bandwidth

∇ f

Single sided noise spectral density N0

10 MHz −174 dBm/Hz

Inefficiency power factor at SUs

ζs

2.6

Inefficiency power factor at SUr

ζr

2.6

Circuit power consumption at SUs PC,s Circuit power consumption at SUr PC,r

25 dBm 25 dBm

Convergence tolerance

δ

10−3

Source transmit power

Psmax

30 dBm

Minimum secrecy rate

R min D I pth

1 bit/s/Hz

Interference threshold

−10 dB

Fig. 2 Average EE versus Prmax

From Figs. 7, 8, 9, 10, 11, 12, we tried to show the comparative analysis of the secrecy rate maximization scheme at destination and minimum rate at eavesdropper which was one of our prime objectives.

Secrecy Rate Optimization for Energy Efficient Cognitive Relay Networks

Fig. 3 Average EE versus Prmax

Fig. 4 Average SE versus Prmax

247

248

Fig. 5 Average SE versus Prmax

Fig. 6 Average SE versus Prmax

Md. K. Alam and T. Saha

Secrecy Rate Optimization for Energy Efficient Cognitive Relay Networks

Fig. 7 Average RD versus Prmax

Fig. 8 Average RE versus Prmax

249

250

Fig. 9 Average RD versus Prmax

Fig. 10 Average RE versus Prmax

Md. K. Alam and T. Saha

Secrecy Rate Optimization for Energy Efficient Cognitive Relay Networks

Fig. 11 Average RD versus Prmax

Fig. 12 Average RE versus Prmax

251

252

Md. K. Alam and T. Saha

5 Conclusion We have evaluated the SE and EE maximization problem in CRNs. As the originally formulated problem is non-convex and difficult to solve, we have used an iterative active-set algorithm to find our required result. Finally, numerical results have been given to show that the proposed scheme can significantly enhance both the security and energy efficiency of CRN. Though as compared with the previous MIMO model, SR maximization schemes is slight lower, but in practical scenario, it is thus demonstrating the superiority of our proposed scheme.

References 1. Haykin S et al (2005) Cognitive radio: brain-empowered wireless communications. IEEE J Sel Areas Commun 23(2):201–220 2. Chen D, Ji H, Li X (2011) Optimal distributed relay selection in underlay cognitive radio networks: an energy-efficient design approach. In: Wireless Communications and Networking Conference (WCNC), pp 1203–1207 3. Shu Z, Qian Y, Ci S (2013) On physical layer security for cognitive radio networks. IEEE Net 27(3):28–33 4. Hong X, Wang J, Wang C-X, Shi J (2014) Cognitive radio in 5g: a perspective on energy-spectral efficiency trade-off. IEEE Commun Mag 52(7):46–53 5. Haider F, Wang C-X, Haas H, Hepsaydir E, Ge X, Yuan D (2015) Spectral and energy efficiency analysis for cognitive radio networks. IEEE Trans Wireless Commun 14(6):2969–2980 6. Gabry F, Li N, Schrammar N, Girnyk M, Rasmussen LK, Skoglund M (2014) On the optimization of the secondary transmitter’s strategy in cognitive radio channels with secrecy. IEEE J Sel Areas Commun 32(3):451–463 7. Mao J, Xie G, Gao J, Liu Y (2013) Energy efficiency optimization for ofdm-based cognitive radio systems: A water-filling factor aided search method. IEEE Trans Wireless Comm 12(5):2366–2375 8. Zhang X, Li H, Lu Y, Zhou B (2015) Distributed energy efficiency optimization for mimo cognitive radio network. IEEE Commun Lett 19(5):847–850 9. Zou Y, Champagne B, Zhu W-P, Hanzo L (2015) Relay-selection improves the securityreliability trade-off in cognitive radio systems. IEEE Trans Comm 63(1):215–228 10. Ouyang J, Zhu W-P, Massicotte D, Lin M (2016) Energy efficient optimization for physical layer security in cognitive relay networks. In: Communications (ICC), 2016 IEEE International Conference on, IEEE, pp 1–6

Avian Influenza Prediction Using Machine Learning Maana Shori and Kriti Saroha

Abstract The avian influenza virus can be the cause of economic devastation due the impact on poultry, making it the cause for a potential pandemic. By predicting the disease, the consequences can be mitigated; so, the research explores various simple and ensemble classifiers, their performances are compared, and the one that is most efficient at predicting the avian influenza in Asian Countries is found. Facebook prophet is used to make prediction about the number of cases arising in future. The research also studies the impact of meteorological factors on the prediction of the disease. The results obtained from the research can be used as a tool to understand the causes for the aggravation of the avian influenza in future studies and thus, help mitigate its effects. Keywords Avian influenza · Ensemble classifiers · FB prophet · Machine learning

1 Introduction The Hong Kong SAR event in 1997 validated the pandemic potential of avian influenza (H5N1) and brought new insight into how a new pandemic virus could emerge. Previous to 1997, pigs were considered the mixing vessels for virus reassortment because of the fact that pigs possessed receptors on the cells of their respiratory tract, for each avian and human influenza viruses. However, from the Hong Kong SAR event, it could be concluded that since human beings could also be infected with the avian influenza virus, they too could serve as mixing vessels and act as carriers for the virus genes’ exchange. Through these findings, H5N1 was then seen to have pandemic potential. Highly pathogenic avian influenza (HPAI), formerly called fowl plague, is known to cause a broad set of symptoms ranging from being a mild illness to being highly

M. Shori (B) · K. Saroha Centre for Development of Advanced Computing, Noida, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_19

253

254

M. Shori and K. Saroha

contagious, and once introduced into the domestic poultry population has the potential to spread widely through contaminated insentient objects present in the environment, leading to a wide and swift spread of the disease. Thus, domestic poultry is extremely vulnerable to avian influenza where the mortality rates can reach up to 100%. Some of the symptoms in the infected poultry may include nervous system disorders, sneezing, diarrhea, coughing, edema of the head and sudden death. Researchers have made efforts in controlling and eradication of the HPAI disease, but the repeated losses to poultry have been a continuous threat to human lives. The different factors that contribute to the spread of the disease can be used to determine the establishment and the impact of the avian influenza virus outbreaks in future. Research on many environmental conditions such as land cover, agricultural factors, trading activities, poultry population, and their farming has found these to be important factors causing the introduction and spread of the disease occurrence [1–5]. The research work presented in this paper tries to explore the use of various simple and ensemble classifiers to provide a predictive model for avian influenza disease and also compare their performance. The work also uses the additional attributes from poultry data, patient symptoms data, and weather data in the dataset that was obtained from past researches [6–18] so as to study their impact on the prediction of the avian influenza disease. Also, the feature selection method is used to gain an insight of the important features and verify which features contribute the most toward the prediction of the disease. Further, this work used the Facebook’s prophet method to forecast the trend (weekly, monthly, and yearly) of avian influenza disease with an aim to early detect the spread of the disease. The paper is organized in the following sections; Sect. 2 presents the literature survey related to the domain, and Sect. 3 explains the models and methodology proposed and implemented. Section 4 presents a discussion on the results obtained, and Sect. 5 gives the conclusion and future work.

2 Related Work This section presents a brief introduction along with the background into avian influenza disease. Avian influenza outbreaks can start anytime anywhere due to the migratory birds, so the prediction of this pandemic is very essential. The literature survey presents a discussion on the work related to the prediction of diseases. Pilla Srinivas et al. [19] applied the Bayesian classifier to predict the severity of patient condition due to Swine Flu, which is a type of influenza virus. The authors use 14 sample records of different suspected patients and use these sample reports for creating the dataset. The Naive Bayesian classifier was used, which does not need large quantities of data to build the training set, and so was able to produce a fast outcome leading to rapid identification of the disease, which enables early treatment. Singh et al. [18] used different machine learning models, namely random forest, gradient boosting machine, and support vector regression to predict influenza. The

Avian Influenza Prediction Using Machine Learning

255

authors improved their forecast by including meteorological factors like precipitation, temperature, and humidity, which resulted in improved accuracy for forecasting influenza. Yousefinaghani et al. [20] mapped, executed, and verified a decision support framework that predicted and monitored the occurrence of the avian flu events. The authors collected data from Twitter, preprocessed the data, and extracted the rules and facts from it to form a knowledge base. Using the knowledge base questions, the degree of risk at various geographical places were explored by the authors. The authors concluded that through their proposed framework, prediction of high pathogenic viruses could be done more accurately as compared to low pathogenic viruses. Kane et al. [21] analyzed the time series structure of outbreak intensity in case of H5N1 for the country of Egypt using ARIMA and random forest time series models. Biswas et al. [22] studied the contribution of climatic factors in Bangladesh, concerning the incidence of H5N1 outbreaks. The authors, in their study, have used the ARIMA and SARIMA models for obtaining the relation of outbreak occurrences due to particular meteorological factors like cloud cover and average rainfall. Painuli et al. [23] used existing COVID-19 data to generate an estimate of positive cases to occur in near future. The authors used several machine learning approaches and in their research discussed the ones that had the best accuracies with the aim to predict the possibility of being infected by the virus and forecast the number of positive cases as well. Venkatesh et al. [24] predicted the disease based on some given symptoms and created a system using machine learning algorithms, namely decision tree, support vector machine (SVM), random forest, KNN, and Naive Bayes classifier. The dataset, which is used by the authors, included records of several patients who were individually diagnosed with 41 different diseases. Chauhan et al. [6] analyzed the performance of various classification algorithms, namely XGBoost, logistic regression, decision tree, KNN, random forest, Naive Bayes, and SVM. The algorithms were applied to a dataset obtained from UCI to find the most accurate algorithm that could predict a patient’s chances of developing heart disease. Tapak et al. [25] discussed the prediction of influenza-like illnesses and compared the accuracies obtained by applying different machine learning approaches, namely random forest, artificial neural network (ANN), and SVM. The study was carried out using a dataset that included weekly influenza cases from Iran. Naiyar et al. [26] forecasted the number of negative cases or the positive cases of dengue outbreak based on seven attributes and seven machine learning algorithms. It was concluded that LogitBoost ensemble model was the topmost performance classifier technique that had reached a classification accuracy of 92%. Kalipe et al. [27] used meteorological data of malarial cases, in contemplation of examining and forecasting the occurrence of malaria. The authors used various classifiers, namely extreme gradient boost (XGBoost), random forest, SVM, logistic regression, Naive Bayes, KNN, and ANN. The authors concluded that meteorological data could be used to predict malarial outbreaks, which can result in the saving of

256

M. Shori and K. Saroha

lives lost due to the disease. XGBoost algorithm was found to be particularly efficient for their study. Agrawal et al. [28] used ensemble and simple classifiers to predict patients’ chances of liver, heart, and diabetes diseases. The dataset for the study was taken from the University of California, Irvine’s Website. The authors obtained performance for all 3 datasets by using the machine learning model. The best accuracy was obtained for the liver dataset. In addition, by reducing the number of features, the authors got somewhat reduced accuracies but were all well within the acceptable ranges. Taj et al. [29] discussed the most widely used machine learning (ML) and deep learning (DL) models for understanding COVID-19 behavior by investigating time series data. A model for examining and forecasting COVID-19 by regional distribution was proposed by the authors. Herrick et al. [7] used the random forests classifier to develop a predictive plot for depicting avian influenza at a global scale. The authors described predictors and environmental factors as a possibility for the spread and infection of H5N1 in wild birds. The authors have identified the highest risk of the outbreak to be the northern regions in their study. The previous research works discussed above, used different machine learning classifiers for identification, evaluation, and prediction of diseases. It is observed that the comparison of the simple and ensemble classifiers for the prediction of avian influenza disease, in particular, has not been done before. In addition, for several parts of the Asian region, the study for the identification and evaluation of H5N1 has not been carried out using data that includes data on common backyard poultry like ducks, geeses, chickens, etc. Since ducks, geese, and chickens come in direct contact with humans, it is important to include their population for the prediction. Comparison in accuracies obtained before and after preprocessing of data rendered different results in [30]. In addition, change in values of accuracies with the number of features being considered was carried out in [31]. Thus, this paper presents the research work that includes attributes/features on common backyard poultry like ducks, geeses, chickens, etc., along with weather data to study the impact on the prediction accuracy.

3 Materials and Method This section presents the proposed approach and methodology used as follows: Figure 1 illustrates the proposed approach. The methodology used is discussed below.

Avian Influenza Prediction Using Machine Learning

257

Fig. 1 Block diagram of the proposed approach

3.1 Data Collection The dataset was collected from past researches [6–15], and patient symptoms of the avian influenza disease as well as poultry data were taken as additional attributes in the research to study the impact on prediction. The target variable has two categories, such as patient dead or recovered. The data consisted of 90 rows (individual patient records) and 37 columns (symptoms). The features of the dataset are patient Id, species (species due to which the person was infected (chickens, ducks, geese etc.)), serotype, gender, age, cough, dyspnea, sputum, rhinorrhea, vomiting, pneumonia, diarrhea, rash, myalgia, conjunctivitis, fever, body temperature, Systolic_BP, Diastolic_BP, respiratory_rate (breaths/min), ARDS, days_between_exposure_to_poultry /infection_and_onset_of_illness, days_since_onset_of_illness, Hemoglobin (g/dL), Leukocyte_count (per_mm3), Lymphocyte_count (per_mm3), neutrophil_count (per_mm3), platlet_count (per_mm3), serum_creatinine (µmol/liter), serum_glucose, oxygen_saturation_duration_receipt_40%_oxygen, day_of_illness_on_which_PCR_for_H5N1_performed, viral_culture, exposure_to_poultry_or_patient, dead/recovered_on_day_number, antiviral_therapy, outcome.

258

M. Shori and K. Saroha

Also, the data was updated with the weather conditions of the location where the cases were reported. Five additional attributes of weather data were also included as follows: Weather Data: Temperature_at_2_Meters_ (C), Relative_Humidity (%), Wind_Speed (m/s), Cloud_Amount (%), Precipitation_Corrected (mm/day). The weather data was included to study its correlation with other attributes and its impact on the prediction accuracy.

3.2 Data Preprocessing The following steps are performed as part of preprocessing of data: • Data integration: It combines important data from various sources. Data for this research is obtained from past researches appended with additional attributes like patient symptoms of the avian influenza disease. This was done with an aim to analyze the impact on the performance of classifiers. The weather data included the following attributes: cloud cover, precipitation, temperature, wind speed, and relative humidity. The weather data is obtained from NASA power’s ’data access viewer’ weather data Website, while the patient’s symptoms were combined from earlier studies. • Data cleaning: In this step, noisy data, inconsistency, and missing values in data are handled. To clean up the dataset, erroneous records are deleted. Standard scalar and min–max scalar are two preprocessing approaches. • Data reduction: Filtering of the data occurs with only the data required for analysis being evaluated. Only the fields that were mandatory were used. For the dataset’s attributes, the multiclass variable and binary classification were introduced. The binary classification was used to specify if the patient had survived after being infected from the avian influenza disease or not. If the patient did not recover from the disease, the value was set to 1, indicating that the patient died; otherwise, it was set to 0, indicating that the patient recovered. Medical records are converted into diagnosis values as part of the preprocessing of data. The results of data preprocessing for 90 patient records revealed that 38 records had a value of 1, indicating that the patient had died from the disease, while the remaining had a value of 0, indicating that the patient had been recovered. Two qualities related to age and gender were chosen from the 42 variables in the dataset to identify the patient’s personal information. With the exception of weather data, the remaining qualities were deemed relevant because they contain vital clinical details. Clinical data is essential for diagnosing and determining the severity of influenza infection.


3.3 Training and Testing After preprocessing the dataset, the data was randomly split into training and testing sets. The training dataset was used to train the model, while the test dataset was used to evaluate the performance of the final model. Two train-test split ratios were used, namely 70:30 and 60:40. In addition, K-fold cross validation was used as well.
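The splitting scheme can be sketched as follows, assuming the preprocessed arrays X and y from the previous step; the random seed and the number of folds are placeholders, since they are not stated in the text.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Two train-test ratios were used in the study: 70:30 and 60:40.
for test_size in (0.30, 0.40):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=42, stratify=y)
    clf = RandomForestClassifier(random_state=42).fit(X_train, y_train)
    print(f"test_size={test_size}: accuracy={clf.score(X_test, y_test):.4f}")

# K-fold cross validation (the fold count is a placeholder).
kf = KFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_val_score(RandomForestClassifier(random_state=42), X, y, cv=kf)
print("K-fold mean accuracy:", scores.mean().round(4))
```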

3.4 Classifier Application Simple and ensemble classifiers were then applied to the data. Several machine learning techniques were used in the study: logistic regression, decision tree, K-nearest neighbor, random forest, extra tree, AdaBoost, gradient boosting, quadratic discriminant analysis (QDA), support vector machine, neural network, stochastic gradient descent (SGD), and Naïve Bayes. The implementation was carried out using all of the above-mentioned ML techniques, and initially, all 42 attributes were used.
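The classifier comparison can be sketched with scikit-learn as below; the hyperparameters are library defaults rather than the settings used by the authors, and X_train, X_test, y_train, y_test are assumed to come from the split above.

```python
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import (RandomForestClassifier, ExtraTreesClassifier,
                              AdaBoostClassifier, GradientBoostingClassifier)
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

classifiers = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Decision tree": DecisionTreeClassifier(),
    "KNN": KNeighborsClassifier(),
    "Random forest": RandomForestClassifier(),
    "Extra tree": ExtraTreesClassifier(),
    "AdaBoost": AdaBoostClassifier(),
    "Gradient boosting": GradientBoostingClassifier(),
    "QDA": QuadraticDiscriminantAnalysis(),
    "SVC (rbf)": SVC(kernel="rbf"),
    "Neural network": MLPClassifier(max_iter=1000),
    "SGD": SGDClassifier(),
    "Naive Bayes": GaussianNB(),
}

# Train each model on the training split and record its test accuracy.
results = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, clf.predict(X_test))

for name, acc in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name:20s} {acc:.4f}")
```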

3.5 Performance Evaluation This step was performed to evaluate the models' performance. Accuracy, the ratio of correctly predicted samples to the total number of samples, was calculated using the formula Accuracy = (TP + TN)/(TP + TN + FP + FN), where TP, TN, FP, and FN are the true positive, true negative, false positive, and false negative counts, respectively. The accuracies obtained were compared, and the maximum accuracy was noted for all three evaluation settings, i.e., the 60:40 split, the 70:30 split, and K-fold cross validation.

3.6 Dimensionality Reduction The main purpose of this step was to increase computational efficiency and verify the change/improvement in the performance of the classifiers used. Dimensionality reduction also identifies the features of the dataset that are most relevant and useful. Feature selection was used to reduce model complexity and limit overfitting. The influence of the independent factors on the dependent variable, as well as the relationship between the dependent and independent variables, was investigated using the analysis of variance (ANOVA) test.


ANOVA test: The main objective of ANOVA (analysis of variance) is to determine whether two or more groups differ significantly in one or more attributes. During the implementation, the test was used to rank the features from most important to least important. Once the features were ranked, they were eliminated from the dataset one by one, and the accuracy was noted after each elimination. The feature count at which the maximum accuracy was obtained differed between the classifiers under consideration. This feature reduction was performed for all three evaluation settings: the 60:40 split, the 70:30 split, and K-fold cross validation.
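A sketch of the ANOVA-based ranking and one-by-one elimination is shown below; the random forest used to score each feature subset is only an example classifier, and the split variables are assumed from the earlier steps.

```python
import numpy as np
from sklearn.feature_selection import f_classif
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Rank features by the ANOVA F-statistic (most to least important).
f_scores, _ = f_classif(X_train, y_train)
ranked = np.argsort(f_scores)[::-1]          # column indices, best first

# Drop the least important feature one at a time and record the accuracy,
# keeping the feature count that gives the best test score.
best_acc, best_k = 0.0, X_train.shape[1]
for k in range(X_train.shape[1], 0, -1):
    cols = ranked[:k]
    clf = RandomForestClassifier(random_state=42)
    clf.fit(X_train[:, cols], y_train)
    acc = accuracy_score(y_test, clf.predict(X_test[:, cols]))
    if acc > best_acc:
        best_acc, best_k = acc, k

print(f"best accuracy {best_acc:.4f} with the top {best_k} ANOVA-ranked features")
```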

3.7 Pandemic Prediction Using Facebook Prophet Prophet is a time series forecasting technique that uses an additive model to fit non-linear trends with yearly, weekly, and daily seasonality effects. It works well with time series that have strong seasonal variation and several seasons of historical data. Prophet is tolerant of missing data and trend changes, and it usually handles outliers well. Prophet was used to forecast the trend of the pandemic based on the cases reported on particular dates in the past. For this time series, the trend, weekly, and yearly components were generated individually and utilized to estimate the future number of avian influenza cases. A separate dataset was used for the prediction of the future trend of the disease; it contained the date on which avian influenza cases were reported and the number of cases reported on that day. Using these two features, the future course of the pandemic was predicted. The results are discussed in the next section.
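A minimal Prophet sketch is given below. The reporting data is replaced by a synthetic two-column frame, and the forecast horizon is illustrative rather than the value used in the study.

```python
import pandas as pd
from prophet import Prophet   # older installations expose the same API as fbprophet

# Toy stand-in for the case-reporting dataset: 'ds' = date of reporting,
# 'y' = number of cases reported on that day.
cases = pd.DataFrame({
    "ds": pd.date_range("2014-01-01", periods=1200, freq="D"),
    "y": [10 + (i % 7) + (i % 365) / 50 for i in range(1200)],
})

model = Prophet(yearly_seasonality=True, weekly_seasonality=True)
model.fit(cases)

# Forecast the number of new cases for the coming years.
future = model.make_future_dataframe(periods=2 * 365)
forecast = model.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())

# Trend, weekly, and yearly components, as discussed with the components plot.
fig = model.plot_components(forecast)
```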

4 Results and Discussion The aim of this research is to predict whether a patient will develop avian influenza disease. The research was carried out using machine learning classification techniques, namely logistic regression, decision tree, Naïve Bayes, K-nearest neighbors, support vector machine, XGBoost, random forest, AdaBoost, extra tree, stochastic gradient descent, and quadratic discriminant analysis. 1. Data was collected, preprocessed, and divided into training and testing sets. The performance of various simple and ensemble classifiers was compared, and the best performance with the maximum accuracy was noted. 2. The features were ranked using the ANOVA test and then reduced to study the variation in the performance of the classifiers.


The accuracy score results of different classification techniques were noted for training and test datasets.

4.1 Performance Validation with the Weather Conditions in Addition to Patient Symptoms Data 1. The data used earlier was updated with the weather conditions of the location where the cases were reported. The weather data included the following attributes: cloud cover, precipitation, temperature, wind speed, and relative humidity. 2. This was done to evaluate whether there was a correlation between weather conditions and patient symptoms/data. 3. The updated dataset was again preprocessed and split into training and testing sets. The performance of various simple and ensemble classifiers was compared, and the best performance with the maximum accuracy was noted. 4. The features were again ranked using the ANOVA test and reduced to study the variation in the performance of the classifiers. The features were ranked in order of most to least important, with serum creatinine being the most important and species (poultry) the least important.

4.2 Summary of Results The results of implementing the research using multiple machine learning classifiers and different train–test splits are summarized below. Percentage accuracy scores for different algorithms are depicted in Tables 1, 2, and 3 for the splits 60:40, 70:30, and K-fold cross validation, respectively. Comparison of accuracy score of avian influenza disease prediction in proposed model with the previous works is given in Table 4.

4.2.1 For Train–Test Split = 60:40

The ratio of the training and testing data was 60:40. Table 1 shows the results obtained for this split. The third column represents the results obtained without including weather data, and the fourth column represents the accuracies obtained when the same data was reduced. The 5th column represents the accuracies obtained after the inclusion of the weather data, and the 6th column shows the accuracies obtained when the same dataset was reduced. Random forest classifier was observed to perform the best with an accuracy of 88.88%.


Table 1 Accuracies of classifiers for the 60:40 train–test split (feature reduction via ANOVA)

| S. no | Classifier          | Without weather data | Without weather data (reduced) | With weather data | With weather data (reduced) |
|-------|---------------------|----------------------|--------------------------------|-------------------|-----------------------------|
| 1     | Logistic regression | 83.33                | 77.77                          | 77.77             | 77.77                       |
| 2     | Decision tree       | 83.33                | 80.55                          | 80.55             | 75                          |
| 3     | Naive Bayes         | 75.00                | 75                             | 72.22             | 75                          |
| 4     | KNN                 | 69.44                | 75                             | 69.44             | 69.44                       |
| 5     | SVC (rbf)           | 58.33                | 69.44                          | 69.44             | 69.44                       |
| 6     | Random forest       | 86.11                | 88.88                          | 77.77             | 88.88                       |
| 7     | Extra tree          | 80.56                | 72.22                          | 75                | 72.22                       |
| 8     | AdaBoost            | 83.33                | 83.33                          | 86.11             | 83.33                       |
| 9     | Gradient boosting   | 80.56                | 77.77                          | 80.56             | 83.33                       |
| 10    | QDA                 | 58.33                | 69.44                          | 69.44             | 69.44                       |
| 11    | SGD                 | 52.77                | 69.44                          | 30.55             | 69.44                       |
| 12    | Neural network      | 58.33                | 52.77                          | 72.22             | 83.33                       |

4.2.2 For Train–Test Split = 70:30

The ratio of the training and testing data was 70:30. Table 2 shows the results obtained for this split. The third column represents the results obtained without including weather data, and the fourth column represents the accuracies obtained when the same data was reduced. The fifth column represents the accuracies obtained after the inclusion of the weather data, and the sixth column shows the accuracies obtained when the same dataset was reduced. Gradient boosting classifier was observed to perform the best with an accuracy of 96.30%. Figure 2 shows the graph between the classifiers (on y-axis) and accuracy scores (on x-axis) based on the results obtained, as shown in Table 2, where the dataset used does not include the weather data. Figure 3 depicts the graph for dataset that includes weather data. On comparing both figures, it is seen that classifiers have performed better using dataset that included weather data.


Table 2 Accuracies of classifiers for the 70:30 train–test split (feature reduction via ANOVA)

| S. no | Classifier          | Without weather data | Without weather data (reduced) | With weather data | With weather data (reduced) |
|-------|---------------------|----------------------|--------------------------------|-------------------|-----------------------------|
| 1     | Logistic regression | 85.18                | 86.11                          | 85.18             | 96.29                       |
| 2     | Decision tree       | 81.48                | 80.55                          | 92.59             | 88.89                       |
| 3     | Naive Bayes         | 70.37                | 81.48                          | 66.67             | 85.18                       |
| 4     | KNN                 | 81.48                | 80.55                          | 70.37             | 88.89                       |
| 5     | SVC (rbf)           | 62.96                | 70.37                          | 62.96             | 70.37                       |
| 6     | Random forest       | 88.88                | 96.29                          | 92.59             | 96.29                       |
| 7     | Extra tree          | 77.78                | 88.89                          | 81.48             | 92.59                       |
| 8     | AdaBoost            | 92.59                | 88.89                          | 85.18             | 92.59                       |
| 9     | Gradient boosting   | 88.89                | 92.59                          | 88.89             | 96.30                       |
| 10    | QDA                 | 81.48                | 72.00                          | 62.96             | 88.89                       |
| 11    | SGD                 | 62.96                | 86.11                          | 81.48             | 88.89                       |
| 12    | Neural network      | 55.56                | 86.11                          | 59.25             | 96.29                       |

4.2.3 For K-Fold Cross Validation

Table 3 shows the results obtained using K-fold cross validation. The third column represents the results obtained without including weather data, and the fourth column represents the accuracies obtained when the same data was reduced. The fifth column represents the accuracies obtained after the inclusion of the weather data, and the sixth column shows the accuracies obtained when the same dataset was reduced. Decision tree classifier was observed to perform the best with an accuracy of 82.22%.

4.2.4 Comparison with the Previous Work

The results obtained were compared with research done in the past; for this, the accuracies obtained in the present study were compared with the accuracies reported for each classifier. This was also done to find out whether the enlarged dataset (with additional attributes) helps in obtaining higher accuracy. Table 4 shows the comparison of the accuracies obtained during the implementation of the proposed approach and the accuracies obtained in previous studies.


Table 3 Accuracies of classifiers for K-fold cross validation (feature reduction via ANOVA)

| S. no | Classifier          | Without weather data | Without weather data (reduced) | With weather data | With weather data (reduced) |
|-------|---------------------|----------------------|--------------------------------|-------------------|-----------------------------|
| 1     | Logistic regression | 74.44                | 80.00                          | 74.44             | 80.00                       |
| 2     | Decision tree       | 74.44                | 76.66                          | 73.33             | 82.22                       |
| 3     | Naive Bayes         | 71.11                | 72.22                          | 71.11             | 72.22                       |
| 4     | KNN                 | 72.22                | 72.22                          | 72.22             | 72.22                       |
| 5     | SVC (rbf)           | 67.78                | 67.78                          | 66.67             | 67.78                       |
| 6     | Random forest       | 72.22                | 73.33                          | 71.11             | 73.34                       |
| 7     | Extra tree          | 71.11                | 74.45                          | 71.11             | 73.34                       |
| 8     | AdaBoost            | 68.89                | 76.66                          | 68.88             | 78.88                       |
| 9     | Gradient boosting   | 72.22                | 76.66                          | 72.22             | 75.55                       |
| 10    | QDA                 | 60                   | 64.45                          | 55.56             | 67.78                       |
| 11    | SGD                 | 74.46                | 72.22                          | 55.56             | 72.22                       |
| 12    | Neural network      | 74.44                | 80.00                          | 74.44             | 80.00                       |

Gradient boosting classifier was observed to perform best with an accuracy of 96.30%. Furthermore, the K-nearest neighbors, extra tree, gradient boosting, and stochastic gradient descent classifiers performed better in comparison with the previous studies.

4.3 Prediction/Forecast Using Facebook Prophet Further, the Prophet method was used in the research for predicting the future trend of the avian influenza disease. A separate dataset was used for this purpose; it contained the date on which avian influenza cases were reported and the number of cases reported on that day. Using these two features, the course of the pandemic was forecast. FB Prophet fits the series progressively and is robust to outliers during modeling and forecasting; it is also capable of dealing with missing data as well as shifts in the trend. The Prophet components plot, shown in Fig. 6, provides information on the fitted model: it depicts the model's trend, weekly, and yearly components, which are represented by the curves in the figure. Figure 4 plots the number of cases reported against the month in which they were reported. From the graph, a spike in the cases in September 2016 can be


Table 4 Comparison with past research

| S. no | Classifier          | Highest accuracy obtained in the proposed approach | Highest accuracy obtained in past research |
|-------|---------------------|----------------------------------------------------|--------------------------------------------|
| 1     | Logistic regression | 96.29 (reduced weather data, 70:30 split)          | 97                                         |
| 2     | Decision tree       | 88.89 (reduced weather data, 70:30 split)          | 96                                         |
| 3     | Naive Bayes         | 85.18 (reduced weather data, 70:30 split)          | 93                                         |
| 4     | KNN                 | 88.89 (reduced weather data, 70:30 split)          | 74                                         |
| 5     | SVM                 | 70.37 (reduced weather data, 70:30 split)          | 89.2                                       |
| 6     | Random forest       | 96.29 (reduced weather data, 70:30 split)          | 97                                         |
| 7     | Extra tree          | 92.59 (reduced weather data, 70:30 split)          |                                            |
| 8     | AdaBoost            | 92.59 (reduced weather data, 70:30 split)          | 96                                         |
| 9     | Gradient boosting   | 96.30 (reduced weather data, 70:30 split)          | 96                                         |
| 10    | QDA                 | 76.67 (reduced weather data, K-fold)               |                                            |
| 11    | SGD                 | 88.89 (reduced weather data, 70:30 split)          | 45                                         |
| 12    | Neural network      | 88.89 (reduced weather data, 70:30 split)          | 88.9                                       |

Fig. 2 Comparison of accuracies for data excluding weather data


Fig. 3 Comparison of accuracies for data including weather data

observed. Figure 5 depicts the Prophet prediction of the number of new cases for the next 6 years, where the x-axis is 'ds' (date-time of reporting) and the y-axis is 'y' (number of cases), which is the target. The black dots are the cases originally reported in the dataset used, the blue line is the predicted number of cases, and the shaded blue area is the variation based on the upper and lower limits of the cases that could possibly occur in future. Figure 6 represents the weekly trend in the number of cases per day in Asia [32, 33] and indicates how the cases increase in the coming years; the weekly forecast shows a peak on Wednesdays and the lowest values on Saturdays. According to the forecasts, the number of cases in the coming months will rise. The blue line in the trend graph represents the prediction, while the shaded blue area represents the upper and lower limits of the prediction. It is clear from the graph that the trend has the potential to increase and become steeper than what has been observed previously. The week component indicates that most cases occurred on Wednesdays and the least on Tuesdays in India. Also, from the yearly component, the month of July can be seen to have the highest number of cases, followed by September. Table 5 shows the evaluation metrics obtained for prediction horizons of 7–11 days for the Prophet model. The trajectories of time series regularly change abruptly; by default, Prophet recognizes these shifts and allows the trend to adjust accordingly.
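The horizon-wise metrics of Table 5 can be reproduced in spirit with Prophet's built-in diagnostics, as sketched below on synthetic data. The window sizes are placeholders, and the exact set of metric columns may vary slightly between Prophet versions.

```python
import pandas as pd
from prophet import Prophet
from prophet.diagnostics import cross_validation, performance_metrics

# Toy reporting dataset: 'ds' = reporting date, 'y' = cases reported that day.
cases = pd.DataFrame({"ds": pd.date_range("2014-01-01", periods=1200, freq="D"),
                      "y": [10 + (i % 7) + (i % 365) / 50 for i in range(1200)]})

model = Prophet(yearly_seasonality=True, weekly_seasonality=True).fit(cases)

# Rolling-origin evaluation; horizons of about a week, as in Table 5.
df_cv = cross_validation(model, initial="730 days", period="90 days", horizon="11 days")
metrics = performance_metrics(df_cv)
print(metrics[["horizon", "mse", "rmse", "mae", "mdape", "coverage"]])
```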


Fig. 4 Per date statistics for new cases in Asia

Fig. 5 Prediction through prophet for number of new cases


Fig. 6 Prophet component plot of the model

Table 5 Evaluation metrics for the Prophet model

|   | Horizon (days) | MSE          | RMSE        | MAE         | MDAPE      | Coverage |
|---|----------------|--------------|-------------|-------------|------------|----------|
| 0 | 7              | 2.514316e+07 | 5014.295247 | 3985.907591 | 218.495282 | 1.0      |
| 1 | 8              | 2.906901e+07 | 5291.568204 | 3985.907591 | 193.770371 | 0.993197 |
| 2 | 9              | 2.817328e+07 | 5307.850346 | 3666.247266 | 128.846914 | 0.993197 |
| 3 | 10             | 3.053064e+07 | 5525.453846 | 3971.829409 | 173.158610 | 0.993197 |
| 4 | 11             | 2.567750e+07 | 5067.397557 | 3501.344428 | 187.851751 | 0.993197 |

5 Conclusion Understanding how raw healthcare data on avian influenza is processed would aid in saving human lives in the long term and in the early discovery of irregularities in patient conditions. Machine learning techniques were used in the implementation to process


raw data and provide a predictive model for avian influenza disease. In the medical field, disease prediction is both difficult and critical; with such models, however, the mortality rate can be significantly reduced. Gradient boosting proved to be quite accurate, with an accuracy score of 96.30% for prediction of the disease. Furthermore, the feature selection method was utilized to gain a broader view of the important features in order to improve disease prediction performance. This research work compares the accuracy scores of quadratic discriminant analysis, AdaBoost, neural network, K-nearest neighbor, decision tree, random forest, extra tree, support vector machine, stochastic gradient descent, logistic regression, gradient boosting, and Naïve Bayes for predicting avian influenza disease. The inclusion of poultry data, patient symptoms, and weather data is an addition to what has been done in past research, made in an attempt to verify their contribution to the prediction of the disease. The attributes with high correlation were found to be the species of ducks and chickens and, among the weather attributes, temperature (in °C), cloud amount, and relative humidity. The classifiers performed better in terms of accuracy on the reduced dataset (after dimensionality reduction). Additionally, the proposed approach with the additional attributes of poultry data, patient symptoms, and weather data performed better across classifiers compared with the previous work. To improve the prediction statistics, future work can explore various hybrid combinations of machine learning algorithms. Other time series forecasting models can also be used to improve the forecasting of avian influenza and predict the number of cases based on the previously reported case counts.

References 1. World Health Organization (2005) Communicable diseases cluster, “Avian influenza: assessing the pandemic threat” 2. Subbalakshmi G, Ramesh K, Rao MC (2011) Decision support in heart disease prediction system using Naive Bayes. Indian J Comput Sci Eng (IJCSE) 2(2):170–176 3. Ibrahim N, Akhir NSM, Hassan FH (2017) Predictive analysis effectiveness in determining the epidemic disease infected area. AIP Conf Proc 1891(1):020064 4. Sajana T, Narasingarao MR (2018) An ensemble framework for classification of malaria disease. ARPN Journal of Engineering and Applied Sciences 13(9):3299–3307 5. To KK, Ng KH, Que TL, Chan JM, Tsang KY, Tsang AK, Chen H, Yuen KY (2012) Avian influenza A H5N1 virus: a continuous threat to humans. Emerg Microbes Infect 1:e25 6. ChauhanRaj H, NaikDaksh N, Halpati RA., Patel SJ, Prajapati AD (2020) Disease prediction using machine learning. Inter Res J Eng Tech 7(5) 7. Herrick KA, Huettmann F, Lindgren MA (2013) A global model of avian influenza prediction in wild birds: the importance of northern regions. Vet Res 44(1):1–9


8. Yuen KY, Chan PK, Peiris M, Tsang DN, Que TL, Shortridge KF, Cheung PT, To WK, Ho ET, Sung R, Cheng AF (1998) Clinical features and rapid viral diagnosis of human disease associated with avian influenza A H5N1 virus. Lancet 351:467–471 9. Centers for Disease Control and Prevention (1997) Isolation of avian influenza A(H5N1) viruses from humans–Hong Kong, May-December 1997. Morb Mortal Wkly Rep 46(50):1204–1207 10. Sun Y, Liu J (2015) H9N2 influenza virus in China: a cause of concern. Protein Cell 6:18–25 11. Hien TT, Liem NT, Dung NT, San LT, Mai PP, Chau NVV, Farrar J (2004) Avian influenza A (H5N1) in 10 patients in Vietnam. New England J Med 350(12):1179–1188 12. Ungchusak K, Auewarakul P, Dowell SF, Kitphati R, Auwanit W, Puthavathana P, Uiprasertkul M, Boonnak K, Pittayawonganon C, Cox NJ, Zaki SR, Thawatsupha P, Chittaganpitch M, Khontong R, Simmerman JM, Chunsutthiwat S (2005) Probable person-to-person transmission of avian influenza A (H5N1). N Engl J Med 352:333–340. https://doi.org/10.1056/NEJMoa 044021 13. Bridges CB, Katz JM, Seto WH, Chan PKS, Tsang DNC, Ho W (2000) Risk of influenza A (H5N1) infection among health-care workers exposed to patients with influenza A (H5N1). Hong Kong J Infect Dis 181:344–348 14. Chen Y, Liang W, Yang S, Wu N, Gao H, Sheng J, Yao H, Wo J, Fang Q, Cui D, Li Y, Yao X, Zhang Y, Wu H, Zheng S, Diao H, Xia S, Chan KH, Tsoi HW, Teng JL, Song W, Wang P, Lau SY, Zheng M, Chan JF, To KK, Chen H, Li L, Yuen KY (2013) Human infections with the emerging avian influenza A H7N9 virus from wet market poultry: clinical analysis and characterisation of viral genome. Lancet 381:1916–1925 15. Chakraborty A, Rahman M, Hossain MJ, Khan SU, Haider MS, Sultana R, Ali Rimi N, Islam MS, Haider N, Islam A et al (2017) Mild respiratory illness among young children caused by highly pathogenic avian influenza A (H5N1) virus infection in Dhaka, Bangladesh, 2011. J Infect Dis, 216(suppl_4):S520-s528 16. Kandun IN, Wibisono H, Sedyaningsih ER, Yusharmen, Hadisoedarsuno W, Purba W, Santoso H, Septiawati C (2000) Three Indonesian clusters of H5N1 virus infection in 2005. N Engl J Med 355:2186–2194 17. Goyal V, Yadav A, Mukherjee R (2022) Performance evaluation of machine learning and deep learning models for temperature prediction in poultry farming. In: 2022 10th International Conference on Emerging Trends in Engineering and Technology-Signal and Information Processing (ICETET-SIP-22), pp. 1–6. IEEE 18. Singh DE, Marinescu MC, Carretero J, Delgado-Sanz C, Gomez-Barroso D, Larrauri A (2020) Evaluating the impact of the weather conditions on the influenza propagation. BMC Infect Dis 20(1):1–14. https://doi.org/10.1186/s12879-020-04977-w 19. Srinivas P, Bhattacharyya D, Midhunchakkaravarthy D (2020) An artificial intelligent based system for efficient Swine Flu prediction using Naive Bayesian classifier. Inter J Current Res Rev 12:134–139 20. Yousefinaghani S, Dara RA, Poljak Z, Sharif S (2020) A decision support framework for prediction of avian influenza. Sci rep 10(1):19011 21. Kane MJ, Price N, Scotch M, Rabinowitz P (2014) Comparison of ARIMA and Random Forest time series models for prediction of avian influenza H5N1 outbreaks. BMC Bioinform 15(1):276 22. Biswas PK, Islam MZ, Debnath NC, Yamage M (2014) Modeling and roles of meteorological factors in outbreaks of highly pathogenic avian influenza H5N1. PLoS ONE 9:e98471. https:// doi.org/10.1371/journal.pone.0098471 23. 
Painuli D, Mishra D, Bhardwaj S, Aggarwal M (2020) Forecast and prediction of COVID-19 using machine learning. Data Sci COVID-19. https://doi.org/10.1016/B978-0-12-824536-1. 00027-7 24. Venkatesh K, Dhyanesh K, Prathyusha M, Naveen Teja CH (2021) Identification of disease prediction based on symptoms using machine learning. JAC: A J Comp Theory 14(6) 25. Tapak L, Hamidi O, Fathian M, Karami M (2019) Comparative evaluation of time series models for predicting influenza outbreaks: application of influenza-like illness data from sentinel sites of healthcare centers in Iran. BMC Res Notes 12(1):353


26. Iqbal N, Islam M (2019) Machine learning for dengue outbreak prediction: A performance evaluation of different prominent classifiers. Informatica 43(3) https://doi.org/10.31449/inf. v43i3.1548 27. Kalipe G, Gautham V, Behera RK (2018) Predicting Malarial Outbreak using Machine Learning and Deep Learning Approach: A Review and Analysis. Int Conf Inf Technol (ICIT) 28. Agrawal A, Agrawal H, Mittal S, Sharma M (2018) Disease Prediction Using Machine Learning. SSRN Electron J 29. Taj RM, El Mouden ZA, Jakimi A, Hajar M (2020) Towards using recurrent neural networks for predicting influenza-like illness: case study of covid-19 in Morocco. International J 9(5) 30. Khan MA, Abidi WUH, Al Ghamdi MA, Almotiri SH, Saqib S, Alyas T, Mahmood N (2021) Forecast the influenza pandemic using machine learning. Comp Mat Cont 66(1):331–357 31. Bloom E, Wit W (2005) Potential economic impact of an avian flu pandemic on Asia. ERD Policy Brief. No. 42 32. Shi Y, Wu K, Zhang M (2022) COVID-19 pandemic trend prediction in America using ARIMA model. In: 2022 International Conference on Big Data, Information and Computer Network (BDICN), pp. 72–79. IEEE 33. Mishra SR, Mathur P, Gupta AK, Baag S, Nagwanshi KK, Tailor S, Verma A (2021) Statistical analysis on the COVID-19 infection spread in United State of America: a prophet forecasting model. In: 2021 Sixth International Conference on Image Information Processing (ICIIP), vol 6, pp 523–528. IEEE

Prediction and Comparison of Diabetes with Logistic Regression, Naïve Bayes, Random Forest, and Support Vector Machine Sarthak Choudhary , Abhineet Kumar , and Sakshi Choudhary

Abstract Diabetes is a major metabolic disorder caused by an abnormally elevated blood glucose (sugar) concentration. Increased glucose levels can cause serious harm to the heart, kidneys, eyes, and blood vessels. According to the WHO, over 422 million people suffer from it, and every year the disease causes more than 1.5 million deaths; it is particularly prevalent in low- and middle-income countries. Modern machine learning (ML) techniques improve predictions and performance. This study applies ML classification algorithms to a diabetes dataset for reliably predicting diabetes using Python. Six ML methods are used, i.e., random forest, logistic regression, XGBoost, support vector machines, Naive Bayes, and KNN. In contrast to the other ML algorithms, random forest had the highest accuracy of 97.02%. Keywords Diabetes · Glucose prediction · K-nearest neighbors · Logistic regression · Machine learning · Naive Bayes · Random forest · SVM · XGBoost

1 Introduction Public health is critical for keeping people healthy and preventing the spread of potentially harmful illnesses. Governments spend a significant portion of gross domestic product (GDP) for the benefit of the general populace. Vaccinations and other factors have increased human life expectancy. However, chronic and hereditary illnesses have been on the increase for a long time. Diabetes is one of the most widespread illnesses [1–3]. In modern clinical practice, a battery of tests is utilized to collect the information required to diagnose diabetes, followed by an assessment. In medicine, S. Choudhary · S. Choudhary SRM Institute of Science and Technology, Uttar Pradesh, Ghaziabad, India e-mail: [email protected] S. Choudhary e-mail: [email protected] A. Kumar (B) University of Delhi, New Delhi, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_20


supervised and unsupervised machine learning approaches are utilized to address a broad variety of issues within the health care industry. For a number of chronic diseases, ML approaches assist researchers in discovering hidden patterns in medical datasets so that they may anticipate outcomes and avoid medical errors, reducing the expense of manually diagnosing illnesses that are difficult to explain [4, 5]. ML algorithms are trained using actual medical datasets containing a wide range of information. If diabetes is detected early enough, problems can be avoided: diagnosis at an early stage helps a patient adopt lifestyle changes and take the necessary measures to counter the progression before the condition worsens. If not diagnosed in time, it can lead to severe complications in the lungs, kidneys, heart, blood vessels, eyes, gums, and teeth, and if it becomes severe, it can even cause death. Therefore, screening for diabetes is imperative so that preventive measures and adequate medical treatment can be taken. ML approaches, which learn from data, can aid in detecting early indicators of the disease and are applied here to create an algorithm-based prediction system. Diabetes is among the most pressing issues in the contemporary world [6, 7]. In this project, the power of machine learning is leveraged to come up with the best model for predicting the presence of diabetes in people based on several attributes [8]. Different classification algorithms (SVM, logistic regression, KNN, random forest, Naive Bayes, and gradient boost) have been used; each algorithm yields a different accuracy score, and the algorithms are compared to determine the most accurate one [9].

2 Dataset The dataset was taken from Data [1]. It contains information about 2039 patients. The characteristics that are being considered are diabetes pedigree function, age, outcome, skin thickness, blood pressure, insulin, pregnancies, glucose, and BMI. Data has no missing or null values. These attributes are collectively used to determine whether the person is affected by diabetes or not. 70% of the rows are taken under the training dataset, while the remaining 30% comprises the test dataset. In the outcome column, 1 denotes that the person has diabetes, while 0 represents vice versa. The total number of diabetic patients in the data was 711, while the non-diabetic was 1328.

3 Methodology The goal of our research is to come up with a high-accuracy, effective algorithm for predicting diabetes. The algorithmic method suggested for this project is shown in Fig. 1. First, data is entered, then it is cleaned up and looked at. During the analysis, similar pieces of data are linked together [1] to make a neural network. Then, this


Fig. 1 Proposed methodology

information is used for training and testing, and then the performance is evaluated [10]. A confusion matrix is computed for each model and used to assess the accuracy of the different classifiers. Finally, the algorithm with the highest accuracy in predicting diabetes is obtained.

3.1 Logistic Regression The logit model, commonly known as logistic regression, is a widely used statistical model for classification and predictive analytics. On the basis of a collection of independent variables, logistic regression estimates the probability that a certain event, such as voting or not voting, will occur; the predicted probability is bounded between 0 and 1. In logistic regression, probabilities are converted using a logit transformation of the odds, i.e., the probability of success divided by the probability of failure; this is also known as the log odds or natural logarithm of the odds. The model is expressed using the following formula: p = 1/(1 + e^(−(a + bx))). In this equation, p is the predicted probability of the response variable, while x represents the independent variable. Estimation based


on maximum likelihood estimation (MLE) is the method often used in this model to determine the value of the beta parameter, or coefficient. This approach involves a series of iterations in which several beta values are tested in an attempt to find the one that best fits the log odds; logistic regression seeks the ideal parameter estimate by maximizing the log likelihood function evaluated at each iteration. After locating the optimal coefficient (or coefficients, if there is more than one independent variable), the conditional probability for each observation can be calculated and recorded as the predicted probability. If the predicted probability is less than 0.5, the predicted class is 0, and if it is greater than or equal to 0.5, the predicted class is 1. Once the model has been constructed, goodness of fit, i.e., how well the model predicts the dependent variable, is an essential factor to examine (Fig. 2).

Fig. 2 Dataset differentiation after logistic regression
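A minimal sketch of this procedure with scikit-learn is shown below; the data is a synthetic stand-in for the diabetes table (8 numeric attributes, binary outcome), not the authors' dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 2039 rows, 8 numeric attributes, binary outcome.
X, y = make_classification(n_samples=2039, n_features=8, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# predict_proba gives p = 1 / (1 + e^-(a + b.x)); a 0.5 threshold yields the class.
p = clf.predict_proba(X_test)[:, 1]
pred = (p >= 0.5).astype(int)
print("test accuracy:", (pred == y_test).mean())
```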


3.2 SVM Support vector machines (SVMs) are supervised machine learning models that may be used for two-class problems. An SVM model can classify new examples after being given tagged training data. SVMs are fast and perform well with relatively few samples (in the thousands), which makes the approach well suited to text categorization, where datasets typically comprise only a few thousand tagged instances. SVMs are used for classification, regression, and outlier detection, which are common ML jobs: they may be used to discover cancerous cells among millions of pictures or to predict driving routes with a well-fitted regression model. There are specialized variants, such as support vector regression (SVR) and support vector classification (SVC). SVMs differ from other classification algorithms because they maximize the margin, i.e., the distance between the decision boundary and the nearest data points of each class; they use this maximum-margin hyperplane as the decision boundary.

3.3 SVM Functioning A linear SVM classifier draws a straight line: all of the data points located on one side of the line belong to one category, while all of the data points located on the other side belong to a different category. In principle, there are infinitely many such lines. Unlike K-nearest neighbors, the linear SVM algorithm picks the optimum line to categorize the data points: it chooses the line that is furthest from the closest data points of each class. In two dimensions, the task is to split the data items by category so that none ends up on the wrong side, which means looking for the line that maximizes the separation from the nearest points of the two classes. These closest points are the support vectors of the line, and the line itself is the decision boundary. The decision boundary does not have to be a line: when more than two features are used, it becomes a hyperplane.
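A hedged sketch of a maximum-margin classifier on the same kind of synthetic data is given below; the RBF kernel and the C value are defaults, not values reported in the paper.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=2039, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

# Scaling matters because the margin is defined in terms of distances.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
svm.fit(X_train, y_train)
print("SVM test accuracy:", svm.score(X_test, y_test))
```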

3.4 KNN KNN is a non-parametric, supervised learning classifier that uses the proximity of data points to group them. It can be used for classification or regression, but more often than not it is employed as a classification strategy that assumes points with similar characteristics lie close to one another. For classification problems, the label most often found among the neighbors of a particular data point is used. In the literature, the term 'majority vote' is used more often than 'plurality voting', even though a true majority of more than 50% is only guaranteed when there are two categories.


In the KNN method, the number of neighbors examined to classify a particular query point is set by the k parameter. If k = 1, for example, the point will be assigned to the same class as the object closest to it. Different values of k can cause overfitting or underfitting, so choosing it is a matter of striking a balance: when the value of k is low, the bias is low but the variance can be high, and when the value of k is high, the bias can be high but the variance is low. The input data has a big impact on the choice of k; data with more outliers or noise will likely do better with larger values of k. Overall, it is best to choose an odd k to avoid classification ties. Cross-validation techniques may help in figuring out the right k for the dataset.
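The cross-validation search over odd values of k can be sketched as follows, again on synthetic data; the range of k values tried is illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=2039, n_features=8, random_state=0)

# Try odd k values (odd k avoids ties) and keep the one with the best CV score.
scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
          for k in range(1, 22, 2)}
best_k = max(scores, key=scores.get)
print("best k:", best_k, "cv accuracy:", round(scores[best_k], 4))
```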

3.5 Random Forest Random forest is one of the most well-known techniques for supervised machine learning and can be used for both classification and regression. It is based on the concept of ensemble learning, which refers to the procedure of combining a number of different classifiers in order to address a challenging problem and improve the model's overall performance. As the name implies, the approach builds multiple decision trees on different subsets of the supplied dataset and averages their predictions to improve accuracy. The random forest algorithm does not rely on a single decision tree but instead takes into account the prediction made by each tree in the forest and determines the final output from the prediction that received the most votes. The more trees there are in the forest, the higher the accuracy and the lower the risk of overfitting. In summary: i. it is able to deal with enormous, high-dimensional datasets, and ii. it increases the accuracy of the model and helps prevent overfitting (Fig. 3).
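A short random forest sketch on the same synthetic stand-in is given below; the number of trees is a placeholder rather than the value tuned in the study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2039, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

# An ensemble of decision trees; the majority vote of the trees is the prediction.
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_train, y_train)
print("test accuracy:", rf.score(X_test, y_test))
print("feature importances:", rf.feature_importances_.round(3))
```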

3.6 Naïve Bayes The Naive Bayes method uses supervised learning to figure out how to sort things. The theorem of Bayes is used. Its main use is classifying text, which needs a big set of training data. The Naive Bayes classifier is easy to understand and works well. It helps build fast models that can make predictions using machine learning. Because it is a probabilistic classifier, its predictions are also based on odds. The Naive Bayes algorithm is often used to filter spam, analyze how people feel about something, and sort publications. “Naive” means to think that one trait has nothing to do with anything else. Because of this, it is “Naive.” If you judge a fruit by its color, shape, and taste, an apple is red, round, and sweet. Each feature on its own is enough to tell


Fig. 3 Neural connections for multi-attributes in random forest

that something is an apple [11], without the need for the others. Bayes' theorem is the basis for the Naive Bayes technique and is stated as

P(A|B) = P(B|A) P(A) / P(B)

where P(A|B) is the probability of A given that B is true, P(B|A) is the probability of B given that A is true, and P(A) and P(B) are the independent probabilities of A and B.
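A minimal Gaussian Naive Bayes sketch on synthetic data follows; the Gaussian variant is an assumption suited to continuous attributes like those in the diabetes table.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=2039, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

# Bayes' theorem with the features assumed conditionally independent:
# P(class | x) is proportional to P(class) * product of P(x_i | class).
nb = GaussianNB().fit(X_train, y_train)
print("test accuracy:", nb.score(X_test, y_test))
```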

3.7 Gradient Boost One of the most effective algorithms that may be used for machine learning is referred to as the gradient boosting approach. We are aware that bias error and variance error are the two forms of errors that occur the most often in machine learning systems. The level of the model’s bias error may be minimized to the greatest extent feasible by using the boosting technique known as gradient boosting. The base estimator for the gradient boosting process cannot be provided, in contrast to the AdaBoosting approach, which provides this information. In the case of gradient boost, the default


Fig. 4 Box plot for outlier visualization

base estimator that is used is a decision stump. The n_estimators parameter of the gradient boosting approach can be tuned in the same manner as for AdaBoost; if no value is provided, it is automatically set to 100 [12]. Gradient boosting may be used to predict both continuous target variables (as a regressor) and categorical target variables (as a classifier). The cost function in regression is the mean square error (MSE), while the cost function in classification is the log loss (the logistic regression loss). Gradient boosting requires three things: i. a loss function to be optimized, ii. a weak learner to make predictions, and iii. an additive model that adds weak learners so that the loss function is made as small as possible (Fig. 4).
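A gradient boosting sketch on the same synthetic stand-in follows; max_depth=1 is set explicitly here so that the weak learners are decision stumps, as described above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2039, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

# Weak learners (stumps) are added one at a time, each fitted to the gradient
# of the log-loss of the current ensemble; n_estimators defaults to 100.
gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=1)
gb.fit(X_train, y_train)
print("test accuracy:", gb.score(X_test, y_test))
```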

4 Conclusion The medical word for diabetes is diabetes mellitus (DM), while the common name for the condition is diabetes. It is a term used to describe a group of metabolic illnesses that may be identified by abnormally high concentrations of sugar in the blood. Our hypothesis was tested using a system that is based on machine learning. The random forest technique outperformed the other machine learning algorithms used by this system, including logistic regression, support vector machines, KNN, random forests, and Naive Bayes. The results of our inquiry indicate that the proposed combination was successful in achieving an accuracy of 97.029% and a ROC curve of 95.69% for the dataset that was used. In addition to that, comparative research was conducted, and the following are the findings of that study (Table 1 and Fig. 5).


Table 1 ROC and accuracy table of different algorithms

| S. no | Method applied         | Accuracy | ROC curve |
|-------|------------------------|----------|-----------|
| 1     | Logistic regression    | 78.61    | 71.06     |
| 2     | Support vector machine | 79.20    | 71.99     |
| 3     | K-nearest neighbor     | 85.74    | 82.17     |
| 4     | Random forest          | 97.02    | 95.69     |
| 5     | Naïve Bayes            | 78.01    | 73.23     |

Fig. 5 Correlation matrix of multiple attribute

5 Results The end result of our project is a Boolean prediction, with the outcome labels taken from the main dataset. We examine the diabetes predictions and judge how accurate they are, how long they take to compute, how well they classify, and how often they are wrong. Random forest is our final algorithm for making predictions. By comparing the predictions on the test data with the actual labels, we were able to validate the approach. The accuracy of random forest was 97.02%, and its area under the ROC curve was 95.69% (Figs. 6, 7, and 8).


Fig. 6 Accuracy graph

Fig. 7 ROC accuracy graph

Fig. 8 Result of new data in random forest model


References 1. Muhammad LJ, Algehyne EA, Usman SS (2020) Predictive supervised machine learning models for diabetes mellitus. SN Comp Sci 1(5):240. https://doi.org/10.1007/s42979-020-002 50-8 2. Kharroubi AT, Darwish HM (2015) Diabetes mellitus: the epidemic of the century. World J Diabetes 6(6):850–867. https://doi.org/10.4239/wjd.v6.i6.850 3. Butt UM, Letchmunan S, Ali M, Hassan FH, Baqir A, Sherazi HHR (2021) Machine learning based diabetes classification and prediction for healthcare applications. J Health Eng. https:// doi.org/10.1155/2021/9930985 4. Rani KJ (2020) Diabetes prediction using machine learning. International J Sci Res Comp Sci Eng Info Tech, 294–305. https://doi.org/10.32628/CSEIT206463 5. Thyde DN, Mohebbi A, Bengtsson H, Jensen ML, Mørup M (2020) Machine learning-based adherence detection of type 2 diabetes patients on once-daily basal insulin injections. J Diabetes Sci Technol 15:98–108. https://doi.org/10.1177/1932296820912411 6. Xue J, Min F, Ma F (2020) Research on diabetes prediction method based on machine learning. J Phy: Conference Series. IOP Publishing Ltd. https://doi.org/10.1088/1742-6596/ 1684/1/012062 7. Kurasawa H, Hayashi K, Fujino A, Takasugi K, Haga T, Waki K, Noguchi T, Ohe K (2016) Machine-learning-based prediction of a missed scheduled clinical appointment by patients with diabetes. J Diabetes Sci Technol 10:730–736. https://doi.org/10.1177/1932296815614866 8. Sneha N, Gangil T (2019) Analysis of diabetes mellitus for early prediction using optimal features selection. J Big Data 6. https://doi.org/10.1186/s40537-019-0175-6 9. Sahoo J, Dash M, Pati A (2020) Diabetes prediction using machine learning classification algorithms. Int Res J Eng Tech (IRJET) 7(8), e-ISSN: 2395-0056 10. Shreya C, Manjula S (2021) Diabetic prediction using machine learning techniques. JETIR 8(7), July 11. Nahar A, Lala A, Sharma S, Professor A, Diabetes prediction using machine learning 12. Zou Q, Qu K, Luo Y, Yin D, Ju Y, Tang H (2018) Predicting diabetes mellitus with machine learning techniques. Front Genetics 9. https://doi.org/10.3389/fgene.2018.00515

Bone Cancer Detection Using Deep Learning Mansoor Habib Mazumder and Maheshwari Prasad Singh

Abstract There have been many registered cases of bone cancer or tumor across the world, including fatal cases. Detecting them at the early stages therefore becomes necessary and can save many lives. Today, tumors are detected using MRI scans, and the process involved is costly and tedious. Even in the era of machine learning and artificial intelligence, detecting tumors remains a tedious process, and setting up and training networks is very costly. Various segmentation techniques and convolutional neural networks can be applied to X-ray images to detect tumors at the early stages. If the tumor is detected and classified at the correct time, further complications can be prevented with the right medication given at the required time. In this paper, we present a modified architecture using a convolutional neural network and inception modules to classify the stages of bone cancers. The proposed model is trained on a dataset provided by the 'Stanford ML' group, which is publicly available for research, and achieved an accuracy of 92.68% on the testing dataset. Keywords Bone cancer · Convolutional neural networks · Inception module · X-ray · Segmentation

1 Introduction All over the world, a major health care challenge is musculoskeletal disorders which eventually lead to bone cancers and tumors. Musculoskeletal disorders involves a variety of bones, soft tissues, and joint abnormalities which adversely affects the quality of life of the patients. Thus, detecting the bone cancers becomes an utmost necessity at the early stages. Deep learning approach helps the radiologists diagnose the bone cancers at faster rates, and with better accuracy, as it is considered, there is M. H. Mazumder (B) · M. P. Singh National Institute of Technology, 800005, Patna, Bihar, India e-mail: [email protected] M. P. Singh e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_21


always five to eight human error rate involved. This can prove to be revolutionary in the field of clinical imaging data which is done by using manual methods in today’s world. There have been many registered cases of bone cancer or tumor across the world with fatal cases involving in deaths too due to bone cancer. Thus, detecting them at the early stages becomes necessary which can save many lives. In today’s world, tumor is detected using MRI scans, and the process involved is costly and tedious. Detecting the tumors is still a tedious process, and it is very costly for setting up to train networks. There are various segmentation techniques and convolutional neural networks that can be used using X-ray images to detect the tumors at the early stages. If the tumor is detected and classified at the correct time, then without any hesitance, further complication can be prevented with right medications given at the required time. Previously, the authors emphasized on the challenges that the machine learning algorithms faced due to the complex nature of clinical imaging data. They put forwarded the approaches needed to meet the data needs of a machine learning algorithm, properly reviewed de-identification of images, produce and enhance the quality of clinical imaging data, suitable architectures for computations and federated methods of learning. We presented a modified architecture using convolutional neural network and inception modules to classify the stages of the bone cancers. We have trained our model on a dataset provided by ‘Stanford ML’ group which is available publicly for research and achieved accuracy of 92.68% on the testing dataset. Sample images taken from the dataset are shown in Figs. 1 and 2, respectively. An overview of related work is given in Sect. 2. Section 3 describes the proposed solution in detail. Section 4 provides the dataset and the experimental results. Section 5 provides the conclusion and future scope of the paper. Fig. 1 X-ray scan of a patient with no tumor in shoulder


Fig. 2 X-ray scan of a patient with tumor in shoulder

2 Related Work Langlotz et al. [1] emphasized on the challenges that the machine learning algorithms faced due to the complex nature of clinical imaging data. They put forwarded the approaches needed to meet the data needs of a machine learning algorithm, properly reviewed de-identification of images, produce and enhance the quality of clinical imaging data, suitable architectures for computations and federated methods of learning. The authors mainly focused on how to impart clinical imaging data with machine learning techniques. Ebsin et al. [2] proposed a fully automated system to detect wrist fractures in post anterior and lateral radiographs. They have used random forest regression voting constrained local model (RFCLM) for locating the outline of the bones in radiographs. Also, CNN is used and trained on cropped patches of PA and LAT view rather than raw images. They have used five convolution and max pooling layers and two fully connected layers, having trained two CNNs one for each view their outputs are combined by averaging. They have attained average performance of AUC as 95% for PA view and 93% for LAT view and 96% from both views combined on a wrist dataset containing 1010 pairs of wrist radiographs. They have trained the CNN on cropped radiographic patches. This can lead to less regions for feature extraction. The trained model might give inaccurate results for larger datasets. Varma M et al. [3] have compared the performance of DenseNet-161, ResNet-101, and RestNet50. DenseNet model has outperformed ResNet models all of which were pre-trained on ImageNet dataset. The authors introduced gradient-based class activation maps to display regions of abnormalities. Then, they augmented the ImageNet with the MURA dataset and trained on the 161- layer DenseNet again which resulted


Table 1 Comparison of literature review

| Name of author   | Algorithm used                                   | Dataset                                        | Acc (%) |
|------------------|--------------------------------------------------|------------------------------------------------|---------|
| Ebsim et al.     | RFCLM, 5 layered CNN network                     | Wrist dataset containing 1010 images           | 0.96    |
| Varma et al.     | DenseNet-161, ResNet-101, ResNet-50, and GradCAM | ImageNet and MURA containing 50,000 images     | 0.87    |
| Rajpurkar et al. | ChexNext CNN algorithm                           | Chest X-ray 8 that consists of 420 images      | 0.88    |
| Yan et al.       | Weakly supervised deep learning algorithm        | Chest X-ray 14 that consists of 30,805 studies | 0.83    |

in better AUCROC values, highest being 0.87 on a training dataset of 5000 samples. In this paper, performance of already prevalent neural network structures has been compared. The prevalent structures showed variance in performance when different datasets were augmented, and sizes of datasets were shrinked. GradCAM worked well for visualizing abnormality in lower extremities, but for dense body pixels with complex bone structures, it might highlight incorrect regions. Rajpurkar et al. [4] created a CNN network called ChexNext to detect several diseases such as penumona, effusion, and pulmonary masses. They have used chest X-ray8 dataset that consists of 420 images. They have attained average value of AUC as 0.88. However, only frontal radiographs were used and thus can cause inaccuracy in results. Yan et al. [5] have proposed a weak supervised deep learning framework that has squeeze and excitation blocks, transfer of multi-maps, and maximum and minimum pooling. They have used chest X-ray 14 dataset that has over 30,805 studies and have attained an average AUC score of 0.83. All the above details are shown in Table 1. MacKinnon et al. [6] determined how well deep convolutional neural networks (CNNs) pre-trained on non-medical pictures may be employed for automated fracture identification on plain radiographs using transfer learning. Ee Lim et al. [7] describe a method for identifying femur and radius fractures by integrating multiple detection methods. For fracture identification, these approaches extract several types of characteristics. They include the neck-shaft angle, which is extracted especially for femur fracture identification, as well as Gabor texture, Markov random field texture, and intensity gradient, which are general features that may be used to detect fractures in a variety of bones. In Linder [8], random forest regression-voting was employed by the authors to swiftly generate high-quality response images. A regressor is used to cast votes for the ideal position of each point, rather than employing a generative or discriminative model to evaluate each pixel. When used in the constrained local model framework, they demonstrated that this results in rapid and accurate shape model matching. Thiagaraja [9] provided a fully automatic method for properly segmenting the proximal femur. A worldwide search using a detector yields a number of possible sites. Each model point is then refined using a statistical shape model in conjunction with local detectors. Olczak [10], the authors analyzed 256,000 radiographs of the wrist, hand, and ankle from Danderyd’s Hospital and classified


them into four categories: fracture, laterality, body component, and exam view. They then chose five open-source deep learning networks to modify for these images. Rajpurkar [11] introduced MURA, a huge collection of musculoskeletal radiographs with 40,561 pictures from 14,863 investigations that are manually categorized as normal or abnormal by radiologists. Russakovsky et al. [12] described the design of the benchmark dataset and the gains in object recognition. They have discussed the difficulties of collecting large-scale ground truth annotation, highlight key breakthroughs in categorical object recognition, provide a detailed analysis of the current state of large-scale image classification and object detection, and compare computer vision accuracy to human accuracy. All the architectures mentioned above are denser in nature and involve higher number of computations. From the current problem of how machine learning techniques are used to detect bone cancers, we have defined the problem statement in two phases. In first phase, minimize the number of computations in the network. In second phase, maximize the accuracy involved after setting up the network.

3 Proposed Solution 3.1 Architecture Firstly, input size of 50 X 50 is taken and passed to the first layer in the network. The first layer is 2D convolution layer which has 96 kernels with feedback 11 × 11. Strides taken are 4, and activation function used is ‘ReLU’. The second layer is 2D max pooling with filter size as 3 × 3 with strides taken as 2. After max pooling, inception module with 1 × 1 convolutions is performed. After that, normalization of the network is done. The output layers do not have a constrained bounded range, so normalization is used to limit the increase in values of output layer as they can grow high because of the unbounded nature of the activation function. The next layer is 2D convolution layer which has 256 kernels and filter size is 5 × 5, and activation function used is ‘ReLU’. The next layer is again 2D max pooling with filter size as 3 × 3 with strides taken as 2. Again, inception module of 1 × 1 convolution is performed after max pooling. After that, normalization is again performed on the network. The next two layers are again 2D convolution layers with kernel size as 384 and filter size 3 × 3. The activation function used is ‘ReLU’ in both of them. The next layer is again a 2D convolution layer with kernel size 256 and filter size 3 × 3, and activation function used is ‘ReLU’ again. We then again have a 2D max pooling layer with filter size as 3 × 3 and strides taken as 2. Then, again normalization of the network is done. Then, we have a fully connected layer which has 4096 number of neurons, and activation function used is ‘tanh’, with dropout rate considered as 50% so that the network does not get over trained. We then again have a fully connected layer which has 4096 number of neurons, and activation function used is ‘tanh’, with dropout rate considered as 50%. The last layer is again a fully connected layer


Fig. 3 Proposed architecture

with 14 neurons, and activation function is softmax. After that regression is used to compile the model with learning rate as 0.001 with optimizer as ‘momentum’, and loss function is categorical cross entropy. The proposed architecture is shown below in Fig. 3.
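The text appears to describe a TFLearn-style implementation (note the 'regression' compilation step). The Keras sketch below follows the stated layer sequence under explicit assumptions: single-channel 50 × 50 inputs, 'same' padding (padding is not stated in the text), batch normalization for the 'normalization' steps, a momentum value of 0.9, and an inception block simplified to a 1 × 1 convolution branch concatenated with a pooling branch.

```python
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers

def inception_1x1(x, filters=64):
    # The text only says "inception module with 1 x 1 convolutions"; the exact
    # composition of the block is an assumption.
    a = layers.Conv2D(filters, 1, padding="same", activation="relu")(x)
    b = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    return layers.Concatenate()([a, b])

inputs = layers.Input(shape=(50, 50, 1))            # grayscale X-ray patches (assumed)
x = layers.Conv2D(96, 11, strides=4, padding="same", activation="relu")(inputs)
x = layers.MaxPooling2D(3, strides=2, padding="same")(x)
x = inception_1x1(x)
x = layers.BatchNormalization()(x)                   # "normalization" in the text
x = layers.Conv2D(256, 5, padding="same", activation="relu")(x)
x = layers.MaxPooling2D(3, strides=2, padding="same")(x)
x = inception_1x1(x)
x = layers.BatchNormalization()(x)
x = layers.Conv2D(384, 3, padding="same", activation="relu")(x)
x = layers.Conv2D(384, 3, padding="same", activation="relu")(x)
x = layers.Conv2D(256, 3, padding="same", activation="relu")(x)
x = layers.MaxPooling2D(3, strides=2, padding="same")(x)
x = layers.BatchNormalization()(x)
x = layers.Flatten()(x)
x = layers.Dense(4096, activation="tanh")(x)
x = layers.Dropout(0.5)(x)
x = layers.Dense(4096, activation="tanh")(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(14, activation="softmax")(x)

model = models.Model(inputs, outputs)
model.compile(optimizer=optimizers.SGD(learning_rate=0.001, momentum=0.9),
              loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```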

3.2 Steps Involved in the CNN Algorithm for Pre-Processing the Input The following steps are involved to extract the features of the bone so that the efficiency of the input image and its resolution can be increased to a certain extent. Processing of the image for its classification is shown in Fig. 4. The steps involved in the process are (i) image pre-processing, (ii) selecting the features, (iii) extracting the features, (iv) classification and staging. The steps are shown in below Fig. 5. In order to extract the cancerous part from the x-ray image of the bone, pre-processing techniques such as removal of noise, enhancement of image, filtering, and histogram equalization are used. To help the process of diagnosis, pre-processing of image removes the redundant feature in the images while scanning. It also makes sure it does


not affect the detail of the image. Histogram equalization increases the contrast of the input image, while median filtering removes noise or glare, typically caused by insufficient lighting when the image was captured, by replacing each pixel value with the median of its neighbourhood. The image features are highlighted in the layers of the proposed network architecture and used while segmenting the images. Two types of pixels occur, differing in their density distribution: (i) denser body structures have higher pixel values, and (ii) the remaining body parts have comparatively lower-density pixels. An optimal threshold value is set as the segmentation threshold, based on the pixel values, to separate the denser and non-denser pixels. Once pre-processing and segmentation of the image are done, the pixels of the affected regions are chosen in the feature selection step so that they are highlighted and ready for feature extraction. After extraction, all the pixel values are sent for classification, and the classifier decides, based on the cell growth, whether the bone is cancerous or non-cancerous.
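A minimal OpenCV sketch of this kind of pre-processing and threshold-based segmentation is given below. It is illustrative only: the filter size, the use of Otsu's method to pick the segmentation threshold, and the 50 × 50 resize are assumptions, not values taken from the paper.

import cv2

def preprocess_xray(path, target_size=(50, 50)):
    # Load the radiograph in grayscale.
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    # Median filtering to suppress noise/glare (3x3 neighbourhood assumed).
    img = cv2.medianBlur(img, 3)
    # Histogram equalization to increase contrast.
    img = cv2.equalizeHist(img)
    # Threshold-based segmentation: Otsu's method is used here to choose the
    # threshold separating denser structures from the rest.
    _, mask = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Keep only the segmented region and resize for the network input.
    segmented = cv2.bitwise_and(img, img, mask=mask)
    return cv2.resize(segmented, target_size)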

Fig. 4 Processing the image for its classification

Fig. 5 Steps involved in the process

Table 2 Total number of studies in the dataset of different bone types

Bone type    No. of studies
Elbow        1912
Finger       2110
Hand         2185
Humerus      727
Forearm      1010
Shoulder     3015
Wrist        3697

4 Result and Discussion 4.1 Dataset The dataset used is the MURA dataset of musculoskeletal radiographs. It contains 40,561 images taken from 14,863 studies, and each study is manually labeled as either normal or abnormal by radiologists. Elbow consists of 1912 studies in total, of which 1186 are normal and 726 are abnormal. Finger consists of 2110 studies, of which 1372 are normal and 738 are abnormal. Hand consists of 2185 studies, of which 1598 are normal and 587 are abnormal. Humerus consists of 727 studies in total. Forearm consists of 1010 studies, of which 659 are normal and 351 are abnormal. Shoulder consists of 3015 studies, of which 1463 are normal and 1552 are abnormal. Wrist consists of 3697 studies, of which 2174 are normal and 1423 are abnormal. The dataset details are given in Table 2.

4.2 Experimental Results In our experiments, we have chosen several metrics such as F1-score, recall, precision, and support to evaluate the model. Mathematically, accuracy is written as follows:

Accuracy = (tn + tp) / (tn + tp + fn + fp)    (1)

where tn = true negative, tp = true positive, fn = false negative, and fp = false positive.


Table 3 Various algorithms and accuracy attained

Algorithm used                                      Dataset                                          Accuracy
RFCLM, 5-layered CNN network                        Wrist dataset containing 1010 images             0.96
DenseNet-161, ResNet-101, ResNet-50 and Grad-CAM    ImageNet and MURA containing 50,000 images       0.87
CheXNeXt CNN algorithm                              Chest X-ray 8 that consists of 420 images        0.88
Weakly supervised deep learning algorithm           Chest X-ray 14 that consists of 30,805 studies   0.83
Modified CNN algorithm with inception modules       MURA dataset consisting of 40,005 X-ray images   0.92

Table 4 Parameters and their values

Parameter            Value
Optimizer            Momentum
Learning rate        0.001
Epochs               100
Training accuracy    92.68%
Loss                 0.021
Testing accuracy     92.57%
F1-score             0.90

The ratio of correctly classified data to the entire data is called accuracy. For training, we used 70% of the images, 15% were used for validation, and the remaining 15% for testing. Optimal accuracy was reached in 100 epochs. We achieved 92.68% training accuracy and 92.57% test accuracy with a loss of 0.021. The experiments were performed on a system with 6 GB of RAM, and the model was trained using Anaconda3. The accuracies of the various algorithms are shown in Table 3. We used the momentum optimizer to reduce the overall loss. All the parameters used and their values are shown in Table 4. The four performance metrics used are precision, recall, F1-score, and support, and they are shown in Table 5. The training and validation loss are shown in Fig. 6, and the training and validation accuracy in Fig. 7.
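A brief scikit-learn sketch of the 70/15/15 split and the reported metrics is shown below; it assumes in-memory arrays X (images) and y (integer labels) and an already trained model that outputs class probabilities, and is only meant to make the evaluation protocol concrete.

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# 70% train, 15% validation, 15% test (second split takes half of the 30% hold-out).
X_train, X_hold, y_train, y_hold = train_test_split(X, y, test_size=0.30, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_hold, y_hold, test_size=0.50, random_state=42)

# After training, report accuracy and per-class precision/recall/F1/support.
y_pred = model.predict(X_test).argmax(axis=1)
print("Test accuracy:", accuracy_score(y_test, y_pred))
print(precision_recall_fscore_support(y_test, y_pred))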

4.3 Discussion In this study, we have demonstrated a modified architecture using convolutional neural networks and inception modules to classify the stages of the bone cancer. We have explored the usage of convolutional neural networks and successfully identified a range of abnormality from multiple types of radiographs. Convolutional neural


Table 5 Four performance metrics (per class)

Class   Precision   Recall   F1-score   Support
0       0.91        0.94     0.93       558
1       0.94        0.87     0.91       403
2       0.90        0.95     0.93       574
3       0.94        0.88     0.91       398
4       0.91        0.91     0.91       216
5       0.90        0.86     0.88       143
6       0.94        0.97     0.95       773
7       0.91        0.85     0.88       283
8       0.82        0.93     0.87       132
9       0.93        0.82     0.87       137
10      0.93        0.89     0.91       758
11      0.90        0.93     0.91       799
12      0.92        0.93     0.93       1050
13      0.92        0.81     0.91       776

Fig. 6 Graph showing the loss during training and validation

networks with various model architectures have been developed over the years, and in order to achieve higher accuracy rates, the networks become deeper and computationally very expensive. Our findings have a number of significant clinical ramifications. Firstly, the model created by our training procedure is not constrained to


Fig. 7 Graph showing the accuracy achieved during training and validation

a specific body part. Due to this, it is more applicable to real-world clinical workflow paradigms and may be used to prioritize radiographs of the lower extremities to increase operational effectiveness. This might make it possible for more seriously ill patients to receive diagnoses more quickly. Furthermore, the model quickly identifies examinations that would initially be termed normal, giving the radiologist additional time to focus on anomalous and complex cases. Rapidly interpreting the normal exams for the ordering doctor could also help more patients be seen at once. There are a few limitations to our study. Since it used data from a single institution, it is unknown how well our models will perform at other institutions with different imaging technology, methods, and patient populations. In conclusion, our research has shown that deep learning models can detect anomalies in radiographs of the lower extremities at levels of efficiency that are useful for clinical applications. With additional preclinical evaluation, these methods may eventually provide quick, automated triage of individuals with musculoskeletal problems.

5 Conclusion and Future Work In this paper, we classified the different stages of bone cancer using seven predefined classes. We trained the proposed model using our modified architecture involving convolutional neural network layers and inception modules, due to which our network is trained using fewer computations. Our model was evaluated on various performance metrics such as precision, recall, F1-score, and support. From


the results, by tuning the number of layers, kernel sizes, and filters, the proposed model achieved an accuracy of up to 92.68%. The work can be further extended by designing architectures that involve even fewer computations and achieve even higher accuracy scores.

References 1. Langlotz CP, Allen B, Erickson BJ, Kalpathy-Cramer J, Bigelow K, Cook TS, Flanders AE, Lungren MP, Mendelson DS, Rudie JD et al (2018) A roadmap for foundational research on artificial intelligence in medical imaging: from the 2018 nih/rsna/acr/the academy workshop. Radiology 291(3):781–791 2. Ebsim R, Naqvi J, Cootes TF (2018) Automatic detection of wrist fracturesfrom posteroanterior and lateral radiographs: a deep learning-based approach. In: International workshop on computational methods and clinical applications in musculoskeletal imaging, pp 114–125. Springer 3. Varma M, Mandy L, Gardner R, Dunnmon J, Khandwala N, Rajpurkar P, Long J, Beaulieu C, Shpanskaya K, Fei-Fei L et al (2019) Automated abnormality detection in lower extremity radiographs using deep learning. Nature Machine Intell 1(12):578–583 4. Rajpurkar P, Irvin J, Ball RL, Zhu K, Yang B, Mehta H, Duan T, Ding D, Bagul A, Langlotz CP et al (2018) Deep learning for chest radiograph diagnosis: a retrospective comparison of the chexnext algorithm to practicing radiologists. PLoS medicine 15(11):e1002686 5. Yan C, Yao J, Li R, Xu Z, Huang J (2018) Weakly superviseddeep learning for thoracic disease classification and localization on chest x-rays. In: Proceedings of the 2018 ACM international conference on bioinformatics, computational biology, and health informatics, pp 103–110 6. Kim DH, MacKinnon T (2018) Artificial intelligence in fracture detection: transfer learningfrom deep convolutional neural networks. Clin Radiol 73(5):439–445 7. Lim SE, Xing Y, Chen Y, Leow WK, Howe TS, Png MA (2004) Detection of femur and radius fractures in x-ray images. In: Proc. 2nd Int. Conf. on Advances in Medical Signal and Info. Proc, vol 65 8. Lindner C, Bromiley PA, Ionita MC, Cootes TF (2014) Robust and accurateshape model matching using random forest regression-voting. IEEE Trans Pattern Anal Mach Intell 37(9):1862–1874 9. Lindner C, Thiagarajah S, Wilkinson JM, Wallis GA, Cootes TF, arcOGEN Consortium et al (2013) Fully automatic segmentation of the proximal femur using random forest regression voting. IEEE Trans Med Imaging 32(8):1462–1472 10. Olczak J, Fahlberg N, Maki A, Razavian AS, Jilert A, Stark A, Sköldenberg O, Gordon M (2017) Artificial intelligence for analyzing orthopedic¨ trauma radiographs: deep learning algorithms—are they on par with humans for diagnosing fractures? Acta orthopaedica 88(6):581–586 11. Rajpurkar P, Irvin J, Bagul A, Ding D, Duan T, Mehta H, Yang B, Zhu K, Laird D, Ball RL et al (2017) Mura: large dataset for abnormality detection in musculoskeletal radiographs. arXiv preprint arXiv:1712.06957 12. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M et al (2015) Imagenet large scale visual recognition challenge. Int J Comput Vision 115(3):211–252

Energy and Buffer Size-Based Routing Protocol for Internet of Things Tariq Ahamed Ahanger, Chatti Subbalakshmi, and M. V. Narayana

Abstract The IoT is a developing technology that allows devices to communicate with one another without the need for human intervention. It is becoming more popular. Maximizing the lifespan of the network is a huge battle for the IoT. Routing is necessary for transferring data between nodes. The improper route selection causes the early battery drain in IoT network. In this paper, we propose an energy-aware routing protocol (E-RPL) for IoT. To determine the optimal parent node for data exchange, composite routing measures, namely residual energy (RER) and buffer size (BS), are used. The simulation is carried out with the help of the COOJA. The efficacy of E-RPL is contrasted with RPL and EL-RPL. Keywords Constrained devices · Energy efficiency · Internet of things · Routing protocol · Optimal path

1 Introduction IoT is a modern paradigm that is one form of advanced wireless communication technology. The general idea of IoT is to establish the communication between the physical objects at any time, any place and anywhere without human support. The physical objects are attached with resource constrained devices, namely sensors, RFID, smart phones, actuators, etc. [1, 2]. The general architecture of IoT is categorized into four types, namely three layers, four layers, five layers, and seven layers, respectively. The important core part of IoT is sensing part, network part, and application part. The sensing part contains sensors which are connected in the battery powered resources that able to generate the sensor data. The network part contains T. A. Ahanger College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University, Al-Kharj, Saudi Arabia e-mail: [email protected] C. Subbalakshmi (B) · M. V. Narayana Department of Computer Science & Engineering, Guru Nanak Institutions Technical Campus, Hyderabad, Telangana State, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_22


the wireless devices, namely Wi-Fi, ZigBee, and Bluetooth, that are able to collect the sensor data wirelessly and store it in a local database or cloud storage. Finally, the application layer contains the mobile or Web app that shows the data to the user. IoT applications include smart parking, smart forestry, and smart healthcare [3, 4]. In LLNs, routing plays an essential role in moving data from one place to another, and RPL is a well-established, standardized routing protocol for IoT. Recently, several researchers have focused on RPL with single routing metrics and RPL with multiple routing metrics [5–14]. The purpose of this paper is to propose an RPL protocol that adopts multiple routing metrics in order to extend the life of the network. This paper proposes E-RPL for IoT, which considers the RER and BS characteristics when deciding which parent to employ for data exchange. The paper is organized as follows: the background work is addressed in Sect. 2, the E-RPL protocol is proposed in Sect. 3, the results and discussion are given in Sect. 4, and Sect. 5 concludes the paper.

2 Related Work Here, we examine the different composite routing metrics based RPL. It provides the early convergence, loop freeness, and optimality. Bouzebiba and Lehsaini [15] proposed new protocol for Internet of medical of things (IMoT). It proposed a free bandwidth based RPL (FreeBW-RPL) protocol for IMoT. The FreeBW-RPL selects the maximum available bandwidth of node for exchange the data. In this comparison, the effectiveness of the FreeBW-RPL is compared to that of the RPL. The proposed protocol offers better efficiency than the RPL. However, the FreeBW-RPL is not considered the residual energy as routing parameter for selecting the best parent to the data transfer. Altwassi et al. [16] proposed BCA-RPL protocol in IoT. This method is taken into consideration as a load balancing and congestion avoidance strategy in order to choose the most appropriate node for the data exchange process. The simulation is carried out, and BCA-RPL is contrasted with RPL. The proposed BCA-RPL offers superior results over network life and latency. However, it is suitable and provides the better efficiency for certain extent. Vaziri and Haghighat [17] proposed Brad-RPL for IoT. The Brad-RPL has conducted in two ways. First one is that it is considered the metrics, namely delay, RER, ETX for picking optimal for exchanging the data. Also, the second one is that it is considered the routing metric as node traffic intensity to prevent the congestion among the network nodes. These two objective functions are executing alternatively based on the network condition. The efficacy of BCA-RPL is contrasted with RPL. The Brad-RPL offers superior effectiveness to the RPL. Haque et al. [18] suggested an energy efficient and effective route in IoT. The improvement of the network will be measured in terms of energy savings. In this work, it is used different routing metrics namely, RER, ETX, delay, and RSSI to

Energy and Buffer Size-Based Routing Protocol for Internet of Things

299

evaluate the network performance. The Contiki Cooja simulator is a tool to perform the simulation. It is compared with different metrics performances and also discussed the same. Safara et al. [19] proposed PriEnergy-RPL protocol for IoT. In the DODAG, the time slot is utilized to transport data between the nodes. Actually, this protocol is suitable for multimedia data transmission network. The multimedia data are audio and image data. The efficiency of PriEnergy-RPL is contrasted to the QRPL. The PriEnergy-RPL offers superior performance than QRPL. Hoghooghi and Javindan [20] proposed a new method for enhancing the mobility features in RPL. For the purpose of identifying the most appropriate parent for data exchange on a mobility platform, the routing metrics ETX, RSSI, and RER are recommended as part of the routing protocol. The proposed protocol provides the superior performance than RPL and mRPL.

3 The Proposed E-RPL Protocol In this paper, we propose E-RPL protocol for IoT. The E-RPL makes use of the RER and BS routing measures in order to determine if it is a viable parent for data exchange. In E-RPL, the rank calculation plays a major role to establish a route in the network. The rank is determined based on the rank of the parent, and the value of the rank increases. A. Residual Energy (RER) It displays the amount of energy that is currently accessible in the nodes. It is estimated based on the amount of energy that is now accessible from the original energy. The RER can be calculated by RER = Available Energy/Initial Energy

(1)

B. Buffer Size (BS) It is one of the routing metric that indicates the available memory in each node, and it is given in Eq. (2). BS = Available Buffer size/Total Buffer Size

(2)

C. Objective Function It is shown via the objective function (OF) how far the participant is away from the DODAG root. The OF is computed using Eq. (3). OF(RER, BS) = w1 ∗ RER + w2 ∗ BS

(3)

where w1 and w2 are weight values. The weight values are adjusted on various times. Finally, the weight of w1 and w2 is 0.5 and 0.5, respectively.

300

T. A. Ahanger et al.

D. Rank Calculation The rank reflects the distance between the root and the participant node. Equations (4) and (5) describe the rank computation. Rank(n) = parentRank(n) + RI

(4)

RI = OF(RER, BS) + MHRI

(5)

Algorithm 1 illustrates the process of E-RPL parent selection.

Algorithm 1: E-RPL Parent Selection
Input: Set of nodes N
Output: Optimal parent node
1: Calculate the node's RER using Eq. (1): RER = AvailableEnergy/InitialEnergy
2: Calculate the node's BS using Eq. (2): BS = AvailableBufferSize/TotalBufferSize
3: Generate the objective function using Eq. (3): OF(RER, BS) = w1 * RER + w2 * BS
4: Compute the rank using Eqs. (4) and (5): Rank(n) = parentRank(n) + RI, with RI = OF(RER, BS) + MHRI
Return the optimal parent node
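The parent-selection rule can be summarized with the short Python sketch below. It illustrates Eqs. (1)–(5) only and is not Contiki/RPL code: the node fields, the MHRI constant, and the choice of the candidate with the lowest resulting rank (as in standard RPL parent selection) are assumptions used to make the computation concrete.

W1, W2 = 0.5, 0.5          # weights from Eq. (3)
MHRI = 256                 # assumed MinHopRankIncrease-style constant

def objective(node):
    rer = node["available_energy"] / node["initial_energy"]   # Eq. (1)
    bs = node["available_buffer"] / node["total_buffer"]      # Eq. (2)
    return W1 * rer + W2 * bs                                  # Eq. (3)

def rank_through(parent):
    # Eqs. (4) and (5): the child's rank is the parent's rank plus a rank increase.
    return parent["rank"] + objective(parent) + MHRI

def select_parent(candidates):
    # Candidate giving the smallest resulting rank is chosen here,
    # following standard RPL parent selection (an assumption of this sketch).
    return min(candidates, key=rank_through)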

4 Result and Discussions The simulation is intended to compare the outcomes of E-RPL with those of RPL in order to assess its effectiveness. Thirty network nodes and one root node have been considered in this simulation, with a transmission rate of one packet per minute. The total network surface area is 400 * 400 m2, and the node type considered is Tmote Sky. Table 1 shows the simulation settings and parameters.

Table 1 Simulation setting and parameter

Parameter              Value
Number of nodes        30
Network area           400 * 400 m2
Simulation duration    1 h
OS                     Contiki 3.0
Type of node           Tmote sky
Full battery           1 J


Fig. 1 Various RPL protocol versus average number of parent changes

4.1 Number of Parent Changes This metric expresses the reliability of the links between the nodes in the network. Figure 1 shows the average number of parent changes for the various RPL protocols. It is noted that E-RPL has fewer parent changes than the other routing protocols, owing to the consideration of the RER and BS metrics for parent selection during route establishment.

4.2 Average Energy Consumption The quantity of energy used by the different network nodes is shown in Fig. 2. In comparison with EL-RPL and RPL, E-RPL consumes much less energy. This is because the RER and BS metrics were taken into account throughout the route construction. Fig. 2 Remaining energy for different network sizes


Fig. 3 Packet loss ratio for different network size

4.3 Average Packet Loss Ratio Figure 3 illustrates the packet loss ratio (PLR) for various network sizes. For a network size of 30, the PLR values in RPL, EL-RPL, and E-RPL are reported to be 2%, 1.5%, and 1%, respectively. It is noticed that the PLR rises as the number of network nodes grows.

4.4 Average End-To-End Delay Figure 4 illustrates the delay for a given hop count. For a hop count of six, the observed latency in RPL, EL-RPL, and E-RPL is 1300 ms, 1000 ms, and 900 ms, respectively. This is because the RER and BS metrics were taken into account throughout the route construction. Fig. 4 Number of hops versus end-to-end delays


5 Conclusion Maximizing network longevity is hard in IoT. Routing is critical for transmitting data from one location to another, and improper route selection causes early battery drain in an IoT network. This paper proposed E-RPL for IoT, in which the parameters RER and BS are taken into consideration while selecting the most suitable parent node for data exchange. E-RPL is evaluated in comparison with EL-RPL and RPL, and it increases network lifespan by 12% and packet delivery by 5%.

References 1. Zhao R, Wang X, Xia J, Fan L (2020) Deep reinforcement learning based mobile edge computing for intelligent internet of things. Phys Commun 43:101184 2. Sankar S, Srinivasan P (2020) Enhancing the mobility support in internet of things. Int J Fuzzy Syst Appl (IJFSA) 9(4):1–20 3. Malik PK, Sharma R, Singh R, Gehlot A, Satapathy SC, Alnumay WS, Pelusi D, Ghosh U, Nayak J (2021) Industrial internet of things and its applications in industry 4.0: state of the art. Comput Commun 166:125–139 4. Sennan S, Balasubramaniyam S, Luhach AK, Ramasubbareddy S, Chilamkurti N, Nam Y (2019) Energy and delay aware data aggregation in routing protocol for internet of things. Sensors 19(24):5486 5. Sankar S, Srinivasan P (2019) Fuzzy sets based cluster routing protocol for internet of things. Int J Fuzzy Syst Appl (IJFSA) 8(3):70–93 6. Sadeeq MA, Zeebaree S (2021) Energy management for internet of things via distributed systems. J Appl Sci Technol Trends 2(02):59–71 7. Stoyanova M, Nikoloudakis Y, Panagiotakis S, Pallis E, Markakis EK (2020) A survey on the internet of things (IoT) forensics: challenges, approaches, and open issues. IEEE Commun Surv Tutorials 22(2):1191–1221 8. Sankar S, Srinivasan P (2018) Multi-layer cluster based energy aware routing protocol for internet of things. Cybern Inf Technol 18(3):75–92 9. Muzammal SM, Murugesan RK, Jhanjhi NZ (2020) A comprehensive review on secure routing in internet of things: mitigation methods and trust-based approaches. IEEE Int Things J 10. Sankar S, Srinivasan P (2017) Composite metric based energy efficient routing protocol for internet of things. Int J Intell Eng Syst 10(5):278–286 11. Gali S, Venkatram N (2022) Cluster-based multi-context trust-aware routing for internet of things. In: Expert clouds and applications, pp 477–492. Springer, Singapore 12. Almusaylim ZA, Alhumam A, Jhanjhi NZ (2020) Proposing a secure RPL based internet of things routing protocol: a review. Ad Hoc Netw 101:102096 13. Sadrishojaei M, Navimipour NJ, Reshadi M, Hosseinzadeh M (2021) Clustered routing method in the internet of things using a moth-flame optimization algorithm. Int J Commun Syst 34(16):e4964 14. Sadrishojaei M, Navimipour NJ, Reshadi M, Hosseinzadeh M (2021) A new clustering-based routing method in the mobile internet of things using a krill herd algorithm. Cluster Comput, pp 1–11 15. Bouzebiba H, Lehsaini M (2020) FreeBW-RPL: a new rpl protocol objective function for internet of multimedia things. Wirel Pers Commun, pp 1–21 16. Altwassi HS, Pervez Z, Dahal K (2019) A burst and congestion-aware routing metric for RPL protocol in IoT network. In: 2019 13th international conference on software, knowledge, information management and applications (SKIMA), pp 1–6. IEEE


17. Vaziri BB, ToroghiHaghighat A (2020) Brad-OF: an enhanced energy-aware method for parent selection and congestion avoidance in RPL protocol. Wirel Pers Commun, pp 1–30 18. Haque KF, Abdelgawad A, Yanambaka P, Yelamarthi K (2020) An energy-efficient and reliable RPL for IoT 19. Safara F, Souri A, Baker T, Al Ridhawi I, Aloqaily M (2020) PriNergy: a priority-based energyefficient routing method for IoT systems. J Supercomput, pp 1–18 20. Hoghooghi S, Javidan R (2020) Proposing a new method for improving RPL to support mobility in the Internet of things. IET Netw 9(2):48–55

V-Shaped Binary Version of Whale Optimization Algorithm for Feature Selection Problem S. Hameetha Begum, C. Balasubramanyam, J. T. Thirukrishna, and G. Manoj

Abstract Feature selection, often known as FS, is often considered to be a component of the larger issue of global optimization. FS is being used to optimize and improve the quality of huge datasets. This is accomplished by selecting prominent features and minimizing duplicate data in order to provide satisfactory classification performance. Feature selection is an approach that is taken with the goal of reducing the complexity of the classification process while simultaneously improving its level of precision. This approach is significant in a variety of fields, including data mining, data processing, and pattern classification. The primary objective here is to devise a more accurate subset of all of the data that takes into account the relevant sample. In order to solve this issue, the BWO-V method, which is short for the binary form of the whale optimization (WO) technique, has been introduced. In order to transform the findings into binary, the BWO-V makes use of a function called the hyperbolic tan function. Validation of the BWO-V algorithm’s performance is carried out on five datasets taken from the repository at UCI. The quantitative and qualitative results both show that using BWO-V helps limit the number of features picked while also optimizing the classification accuracy with significantly less effort. Keywords Binary whale optimization (BWO) algorithm · Classification · Feature selection · Whale optimization (WO) algorithm

S. H. Begum (B) Department of Computing, Muscat College, Muscat, Sultanate of Oman e-mail: [email protected] C. Balasubramanyam · J. T. Thirukrishna Department of Computer Science and Engineering, Guru Nanak Institute of Technology, Hyderabad, Telangana, India G. Manoj Department of Information Science and Engineering, Dayananda Sagar Academy of Technology and Management Bangalore, Bangalore, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_23


1 Introduction Databases from practical systems such as business or medicine are high dimensional and contain an obsolete or massive amount of information. Such databases contain pointless information, which influences the productivity of machine learning techniques and, in some instances, hampers learning. Feature selection is an effective strategy used to find the significant set of attributes, overcome the problem of high dimensionality [1], identify the essential components, and delete unnecessary features [2]. Any machine learning technique that works on a set of attributes can then be used for classification. Numerous studies have considered the possibility that the problem of feature selection is actually an optimization process. As a result, the objective function of the optimization technique is formulated around the accuracy of the classifier, which can be improved by the features that are selected [3]. There are three basic categories of FS techniques: wrapper methods, filter methods, and hybrid methods. Wrapper approaches, in general, are those that incorporate classification strategies in order to check the pertinent attributes and select the best ones [4, 5]; wrapper methods typically rely on classification accuracy. Because the wrapper method performs better than the other strategies, it was chosen for this work. Binary feature selection issues may be tackled using a variety of approaches, including support vector machines (SVM), artificial neural networks (ANN), decision trees, K-nearest neighbor (KNN), and discriminant analysis. On the other hand, typical optimization algorithms face a number of stumbling blocks when trying to solve the problem of feature selection [6, 7]. Metaheuristic algorithms such as the Ant Lion Optimizer (ALO), the Gray Wolf Optimizer (GWO), Moth–Flame Optimization, the Whale Optimization (WO) Algorithm, the Crow Search Algorithm, the Lightning Search Algorithm, and Lévy flight schemes are commonly used in the research community for solving dynamic optimization problems [8–10]. In addition to these, a large number of search strategies, together referred to as binary optimization approaches, have been researched to address the FS optimization issue. The binary cuckoo search, binary bat algorithm, and binary flower pollination algorithm are some examples, and most of these techniques use the classification performance as the objective function. A binary differential evolution technique has been proposed to pick the required subset to train an SVM, and binary GWO and binary ALO have also been suggested. The salp swarm algorithm has been applied for feature selection on psychoactive chemical activities, and a binary form of particle swarm optimization has also been discussed.


There is no technique which can address all optimization issues, as the No Free Lunch theorem states: if an algorithm shows good performance on one class of problems, it does not necessarily show the same success on others. This is the motivation for the presented research, wherein the authors propose a new binary version, called BWO-V, of the whale optimization (WO) algorithm. The strengths of the proposed algorithm against the traditional methods are demonstrated by two aspects: (i) BWO-V addresses not only the reduction of the number of characteristics but also the choice of specific characteristics, and (ii) BWO-V uses the wrapper search strategy to pick essential features, so the selection depends primarily on classification accuracy rather than on a wide set of relevant features. The wrapper approach has been utilized so that correct information on the features is supplied while maintaining an adequate balance between exploration and exploitation. Meaningful search behaviour is therefore achieved by BWO-V, which permits the selection of a limited number of characteristics as a subset of the most significant features. The remaining parts of the paper are organized as follows: in Sect. 2, the WOA is discussed in detail; the binary version of the WO method, also known as BWO-V, is presented in Sect. 3 for the purpose of feature selection, and the findings of the BWO-V experiments are detailed; in Sect. 4, we conclude the work and discuss prospective future work.

Traditional Whale Optimization (WO) Algorithm The whale optimization (WO) method was created based on the behaviour of whales. The fascinating action of the humpback whale is its unique hunting process, called bubble-net feeding. In the traditional WO algorithm, the current best candidate solution is assumed to be near the target prey, and the other whales update their positions toward this best solution. Numerically, the WO algorithm imitates this social behaviour as follows:

D_i = |B · Y*(t) − Y(t)|    (1)

Y(t + 1) = Y*(t) − A · D_i    (2)

where t denotes the current iteration, Y*(t) denotes the best position vector found so far, Y(t) denotes the current position vector, and A and B are coefficient vectors given as follows:

A = 2 · a · r1 − a    (3)

B = 2 · r2    (4)

where r1 and r2 are random vectors with values in [0, 1], and a decreases linearly from 2 to 0 over the iterations. The algorithm consists of two phases, referred to as exploration and exploitation. During the exploration phase, the search agents move to look in a variety of different regions of the solution space, whereas during the exploitation phase the search agents move to improve the existing solutions. The exploitation stage is divided into two mechanisms, the shrinking encircling method and the spiral updating phase, during which the position of the agent is updated using Eq. (5):

Y(t + 1) = Y*(t) − A · D_i,                    if p < 0.5
Y(t + 1) = Y*(t) + D_i · e^(b·l) · cos(2πl),   if p ≥ 0.5    (5)

where l is a random value in [−1, 1], b is a constant, and p is a uniformly distributed random number. During the exploration stage, the value of A is chosen such that |A| ≥ 1, which lets the population migrate a significant distance away from the current best position; a randomly chosen agent Y_rand replaces the best solution in the update:

D_i = |B · Y_rand(t) − Y(t)|    (6)

Y(t + 1) = Y_rand(t) − A · D_i    (7)
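For illustration, a compact NumPy sketch of the position-update rules in Eqs. (1)–(7) is given below. It is a generic WOA step under stated assumptions (scalar coefficients per agent, b = 1, and the |A| ≥ 1 test used to switch to the random-agent exploration step), not the authors' implementation.

import numpy as np

def woa_step(positions, best, t, max_iter, b=1.0):
    """One whale optimization update over a population matrix (agents x dims)."""
    n, dim = positions.shape
    a = 2.0 - 2.0 * t / max_iter                 # decreases linearly from 2 to 0
    new_positions = positions.copy()
    for i in range(n):
        r1, r2 = np.random.rand(), np.random.rand()
        A = 2.0 * a * r1 - a                     # Eq. (3)
        B = 2.0 * r2                             # Eq. (4)
        p = np.random.rand()
        l = np.random.uniform(-1.0, 1.0)
        if p < 0.5:
            if abs(A) < 1.0:                     # exploitation: shrink toward the best agent
                D = np.abs(B * best - positions[i])          # Eq. (1)
                new_positions[i] = best - A * D              # Eq. (2)
            else:                                # exploration: move relative to a random agent
                rand_agent = positions[np.random.randint(n)]
                D = np.abs(B * rand_agent - positions[i])    # Eq. (6)
                new_positions[i] = rand_agent - A * D        # Eq. (7)
        else:                                    # spiral (bubble-net) update, Eq. (5)
            D = np.abs(best - positions[i])
            new_positions[i] = best + D * np.exp(b * l) * np.cos(2 * np.pi * l)
    return new_positions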

2 Binary Version of WO Algorithm In the traditional WO algorithm, the humpback whales move within a continuous solution space to change their positions; this is called the continuous domain. Feature selection, however, is restricted to {0, 1} values, so the continuous solutions have to be converted into their corresponding binary representations before feature selection problems can be solved.


Algorithm 1: Pseudocode of BWO-V Algorithm
Input: The population size Np and the maximum number of iterations ITmax
Output: Best position of the search agent
Initialize the values for Np and a
Calculate the fitness of each search agent
While t < ITmax do
    for each search agent do
        Calculate p, a, l, A, and B
        if p < 0.5 then
            if |A| < 1 then
                Update the position using Eq. (2)
            else (|A| ≥ 1)
                Select Yrand and update the position using Eq. (7)
            end if
        end if
        if p ≥ 0.5 then
            Update the position using Eq. (5)
        end if
        Update the binary position using Eq. (8)
    end for
    Update the best solution if a better one is found
    t = t + 1
end while

The binary version of the WO algorithm is applied so that the objectives may be determined and effective outcomes obtained. Using the V-shaped function, the transformation is executed by adding a transfer function to each dimension. Transfer functions give the probability that a position element will be switched between 0 and 1, which forces the population to move in a binary space [11–13]. The pseudocode of the BWO-V technique is shown in Algorithm 1. The hyperbolic tan function is used to update the position of the search agent in the V-shaped variant of the algorithm; the shape of this transfer function is shown in Fig. 1. A common V-shaped transfer function, and the corresponding binary position update, are represented mathematically as follows:

h^k = |tanh(x^k)|    (8)

X_i^d = sel_d^t,  if rand < S(X_i^k(t + 1))
X_i^d = org_d^t,  otherwise    (9)
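The sketch below shows one common way such a V-shaped rule is applied in code; it is an interpretation of Eqs. (8)–(9), assuming that when a random draw falls below the transfer value the corresponding bit is flipped, and that it is kept otherwise.

import numpy as np

def v_transfer(x):
    # Eq. (8): V-shaped transfer value in [0, 1).
    return np.abs(np.tanh(x))

def binarize(continuous_step, current_bits):
    """Map a continuous WOA update onto a binary feature-selection vector."""
    prob = v_transfer(continuous_step)
    flip = np.random.rand(*current_bits.shape) < prob
    # Flip the selected bits, keep the rest (one reading of Eq. (9)).
    return np.where(flip, 1 - current_bits, current_bits)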

If N is the total number of features in a feature vector, then the number of possible feature subsets is 2^N. This is a huge search space; therefore, an exhaustive search over all of them is impractical. Within such a context, the BWO-V algorithm is used as an adaptive search of the feature space, which allows for the generation of


Fig. 1 V-shaped transfer function

optimum features. The precision of the classification can be improved, along with minimizing the number of selected characteristics, by means of this search. Equation (10) gives the objective function that scores the position of each search agent:

F = α · γ_R(D) + β · |C − R| / |C|    (10)

where F denotes the objective function, R is the length of the selected feature subset, C is the total number of features, γ_R(D) denotes the classification accuracy, and α and β are complementary weights for the accuracy and the subset length, with β = 1 − α and α ∈ [0, 1]. Equation (10) may be recast as a minimization problem by considering the classification error rate instead of the accuracy:

F = α · E_R(D) + β · |R| / |C|    (11)

where E_R(D) refers to the classification error rate. The KNN classifier is used in this paper to ensure that the selected features are the most significant ones, while the BWO-V algorithm is the search technique that explores the feature space to optimize the feature evaluation given in Eq. (11).
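A wrapper-style fitness function of this form can be sketched as follows with scikit-learn. The KNN classifier (k = 5) and the weights α = 0.99, β = 0.01 are taken from the experimental setup reported later in the paper; the hold-out split used for the error estimate is an assumption.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

ALPHA, BETA = 0.99, 0.01   # weights from Table 1

def fitness(bits, X, y):
    """Eq. (11): weighted sum of KNN error rate and selected-subset ratio."""
    selected = np.flatnonzero(bits)
    if selected.size == 0:
        return 1.0                                  # penalize empty subsets
    X_sel = X[:, selected]
    X_tr, X_te, y_tr, y_te = train_test_split(X_sel, y, test_size=0.3, random_state=0)
    knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
    error = 1.0 - accuracy_score(y_te, knn.predict(X_te))
    return ALPHA * error + BETA * selected.size / X.shape[1]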


3 Results and Discussions A group of traditional approaches, namely GWO, ALO, and the traditional WO algorithm, are compared with the proposed BWO-V algorithm. The control parameters of the various algorithms are listed in Table 1. The simulation results are obtained on datasets taken from the UCI repository to give a valid assessment; the six databases used in the simulations are listed in Table 2. The databases were chosen to represent different types of problems in various situations. The entities of each database are split randomly into three separate subsets, namely training, testing, and validation. In the experiments, KNN is used, and the best value of K is equal to five. During the learning process, each whale position generates one feature subset, and the KNN classifier's output is reviewed and validated on the held-out subsets throughout the optimization procedure while the BWO-V algorithm directs the feature selection process. The results of 10 separate runs of the proposed BWO-V method on the FS problem are outlined in Table 3. The BWO-V algorithm has an average accuracy greater than 95% on the first four datasets, and it also achieves the best fitness value of zero on the Colon and Lung datasets; both of these results reflect the algorithm's performance. In addition, the chosen characteristics are fewer in

Table 1 Control parameters of all algorithms

S. No   Parameter                               Value
1       Search agents, Np                       10
2       Maximum number of iterations, ITmax     100
3       Dimension, dim                          Based on the features
4       Total number of runs                    10
5       Argument, α                             0.99
6       Argument, β                             0.01

Table 2 Datasets utilized for the validation

S. No   Name of dataset   Total number of features   Total number of samples
1       Leukemia          7129                       72
2       Lymphoma          4026                       96
3       Colon             2000                       62
4       Lung              3312                       203
5       ORL               1024                       400
6       Yale              1024                       165


Table 3 Results attained by the proposed BWO-V algorithm

S. No   Dataset    Average accuracy   Best fitness   Selected features   Computation time (Seconds)
1       Lymphoma   95.7368            0.0526         405                 6.58
2       Colon      100                0              484                 5.54
3       Lung       99.25              0              1096                7.12
4       ORL        96.25              0.0375         304                 4.56
5       Yale       75.7576            0.2424         278                 3.88

number across the board for the datasets, and the proposed BWO-V algorithm requires less processing time on the chosen datasets. The performance of the proposed BWO-V algorithm is evaluated and contrasted with that of GWO, WO, and ALO. According to the findings shown in Table 4, the BWO-V algorithm has an overall accuracy that is superior to that of every other algorithm on all datasets; the best outcome is shown in bold in all of the tables. Table 5 lists the features selected, on average, by the proposed BWO-V and the alternative approaches. On every dataset, the results obtained by the BWO-V method are superior to those obtained by GWO, WO, and ALO. When the results in Tables 4 and 5 are compared, a considerable difference can be seen in both the accuracy and the selected characteristics. Additionally, the proposed BWO-V method is more effective at selecting fewer characteristics while maintaining a high level of classification accuracy. Tables 6, 7, and 8 list the statistical measures for each of the datasets: the best fitness, the worst fitness, and the standard deviation (SD) obtained over the repeated runs of each method. Considering all of the statistical data, the proposed BWO-V algorithm performs much better. In addition to the statistical data, the convergence behaviour of the proposed BWO-V algorithm is shown to give the reader a better understanding (Fig. 2). The discussion makes it evident that the suggested algorithm is superior in its ability to handle the feature selection challenge.

4 Conclusion and Future Scope In this study, the binary form of the standard whale optimization, which has been given the term BWO-V, is provided in an effort to solve the problem of feature selection. When the WO algorithm is converted into its binary form, V-shaped conversion functions are the tools that are used. In order to evaluate the efficacy of the BWO-V

Colon

Lung

ORL

Yale

2

4

5

Lymphoma

1

3

Dataset

S. No

69.697

87.5

97.25

91.6667

84.2105

Grey Wolf Optimizer

66.6667

82.5

97.5

100

89.4737

Whale Optimizer

Table 4 Overall average rate of correct classifications achieved by all algorithms

63.6364

86.25

97.75

91.6667

73.6842

Ant Lion Optimizer

75.7576

96.25

99.25

100

94.7368

Binary Whale Optimizer- V Shaped

V-Shaped Binary Version of Whale Optimization Algorithm for Feature … 313

Colon

Lung

ORL

Yale

2

3

4

5

Dataset

Lymphoma

S. No

1

240

359

1620

993

1964

Grey Wolf Optimizer

Table 5 Average selected features of all algorithms Whale Optimizer

278

1024

1671

29

2004

Ant Lion Optimizer

467

1024

1478

484

1604

225

304

1096

414

405

Binary Whale Optimizer-V Shaped

314 S. H. Begum et al.

V-Shaped Binary Version of Whale Optimization Algorithm for Feature …

315

Table 6 Best fitness function of all algorithms S. No

Dataset

GWO

WO

ALO

BWO-V

1

Lymphoma

0.1579

0.1053

0.2632

0.0526

2

Colon

0.0833

0

0.0833

0

3

Lung

0.0250

0.0250

0.0250

0

4

ORL

0.1250

0.1750

0.1375

0.0375

5

Yale

0.3030

0.3333

0.3636

0.2424

Table 7 Worst fitness function of all algorithms S. No

Dataset

GWO

WO

ALO

BWO-V

1

Lymphoma

0.1579

0.1053

0.2632

0.1579

2

Colon

0.0833

0.4167

0.2500

0.0833

3

Lung

0.0569

0.0758

0.0500

0.0250

4

ORL

0.1875

0.2500

0.2125

0.1125

5

Yale

0.3939

0.3636

0.4545

0.3333

Table 8 SD of fitness function of all algorithms S. No

Dataset

GWO

WO

ALO

BWO-V

1

Lymphoma

4.1843e-16

0

2.7895e-16

0.0406

2

Colon

0

0.1897

0.0186

0.0315

3

Lung

0.0099

0.047

0.0082

0.0090

4

ORL

0.0120

0.0099

0.0181

0.0240

5

Yale

0.0374

0.0052

0.0242

0.0250

method, the simulations make use of five benchmark datasets retrieved from the UCI repository. These datasets are used to test a variety of aspects of the comparison algorithms. The outcomes of the tests revealed that the BWO-V algorithm, which was suggested, obtained superior results when compared to the GWO, the WO, and the ALO. In addition, the findings demonstrated that the BWO-V method required the least amount of time for computing and utilized the fewest number of well chosen characteristics while yet achieving the best classification precision. The BWO-V technique that was just introduced can, in addition to being used for the feature selection problem, be applied to a variety of other optimization problems.

316

S. H. Begum et al.

Fig. 2 Convergence curve of the BWO-V method that was proposed for each of the selected datasets

References 1. Neggaz N, Houssein EH, Hussain K (2020) An efficient henry gas solubility optimization for feature selection. Expert Syst Appl 152:113364 2. Yi JH, Deb S, Dong J, Alavi AH, Wang GG (2018) An improved NSGA-III algorithm with adaptive mutation operator for big data optimization problems. Future Gener Comput Syst 88:571–585 3. Sayed SAF, Nabil E, Badr A (2016) A binary clonal flower pollination algorithm for feature selection. Pattern Recogn Lett vol 7, 21–27

V-Shaped Binary Version of Whale Optimization Algorithm for Feature …

317

4. Liu H, Yu L (2005) Toward integrating feature selection algorithms for classification and clustering. IEEE Trans Knowl Data Eng 17:491–502 5. Khalid S, Khalil T, Nasreen S (2014) A survey of feature selection and feature extraction techniques in machine learning. In: Proc. of the science and information conference, London, UK, pp 372–378 6. Kumar KS, Suganthi N, Muppidi S, Kumar BS (2022) FSPBO-DQN: SeGAN based segmentation and fractional student psychology optimization enabled deep Q network for skin cancer detection in IoT applications. Artif Intell Med 102299, ISSN 0933–3657 7. Li F, Shankar A, Santhosh Kumar B (2021) Fog-internet of things-assisted multi-sensor intelligent monitoring model to analyse the physical health condition. Technol Health Care 29(6), pp 1319–1337 8. Jia L, Kumar BS, Parthasarathy R (2022) Research and application of artificial intelligence based integrated teaching-learning modular approach in colleges and universities. J Interconnection Netw. https://doi.org/10.1142/S0219265921430064 9. Kumar BS, Karthik S, Arunachalam VP (2019) Upkeeping secrecy in Information Extraction using ‘k’ division graph based postulates. Cluster Comput, SpringerLink, ISSN: 1386–7857. https://doi.org/10.1007/s10586-018-1705-2 vol 22, Supplement 1, pp 57–63, pp 1–7 10. Ganeshan R, Muppidi S, Thirupurasundari DR, Kumar BS (2022) Autoregressive-elephant herding optimization based generative adversarial network for copy-move forgery detection with Interval type-2 fuzzy clustering. Signal Process: Image Commun vol 108, 116756, ISSN 0923-5965, pp 1–10 11. Zhang Y, Qin G, Cheng L, Marimuthu K, Kumar BS (2021) Interactive smart educational system using AI for students in the higher education platform. J Multiple-Valued Logic Soft Comput 36. 12. Kumar BS, Ranjitham PK, Karthekk KR, Gokila J (2016). Survey on various small file handling strategies on Hadoop. In: 2016 international conference on communication and electronics systems (ICCES), (pp 1–4). IEEE 13. Prabu MK, Kumar BS, Karthik S (2015) Optimized scheduling for data anonymization in cloud using top down specialization. Int J Appl Eng Res (IJAER) 10(41):30546–30549

An Energy-Efficient Deep Neural Network Model for Photometric Redshift Estimation K. Shreevershith , Snigdha Sen , and G. B. Roopesh

Abstract In cosmological applications, redshift is a distance-measuring metric, and machine learning approaches have produced amazing results in this domain. Deep learning is a promising solution in redshift prediction due to a big set of cosmic data. An artificial neural network (ANN) model was implemented in this work with redshift values in the low (0–3) and high (0–7) ranges, where the data was extremely unbalanced and skewed in the high range. The true redshift value is translated into the logarithmic domain to account for the skewness in the redshift range distribution. Because redshift prediction is a regression job, the metrics used to evaluate model performance include mean absolute error (MAE), mean squared error (MSE), and R2. Furthermore, our experiment shows that by using a limited number of hidden layers, training time and carbon emissions can be minimized while still achieving sufficient performance. Keywords Deep learning · Photometric redshift · Sloan digital sky survey · Artificial neural network

1 Introduction In recent times, machine learning (ML) techniques are being used in many applications [1, 2], and in astronomy, more novel ML methods are designed for estimating astronomical constants to increase accuracy [3]. Using these techniques, observational astronomy has generated a greater amount of data. Analyzing and extracting the required information from this data requires the usage of cloud-based tools (ex: Apache Spark), which speeds up the process and performs better [4]. We present an approach to estimate photometric redshift using a deep learning (DL) method, artificial neural network (ANN). Although work in this field has been K. Shreevershith (B) · S. Sen · G. B. Roopesh Global Academy of Technology, Bengaluru, Karnataka, India e-mail: [email protected] S. Sen e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_24

319

320

K. Shreevershith et al.

done, most of them were correctly predicted when the data used was in the lowredshift range (z < 1) [5]. Very little work has been published on high-redshift ranged data using ML and DL methods [6]. Although few researchers have used deep convolutional neural network (DCNN) and image processing techniques to predict redshift ranging from 0 to 4, since it processes raw image input, the importance of certain feature classes cannot be resolved [7] and therefore is not considered in our study. In the proposed work, we will be using the ANN technique to predict redshift, and the entire redshift range covered (0 < z < 7) has been divided into two parts, one with a low-redshift range and the other with a high-redshift range. A few techniques used in this paper, like the logarithmic calculation of output values and multiple input features, are referred to in prior works [8], considering their huge impact on our work. We contribute mainly in the following three ways: 1. Developing a deep learning model for low as well as high-ranged redshift datasets. 2. The usage of a smaller number of hidden layers leads to lower computational costs and complexity. 3. Considering the energy efficiency of deep learning models and the calculation of carbon emissions.

1.1 Related Work Only the data from spectroscopic filters can be used to estimate galaxy distances accurately. Salvato et al. [9] provide an explanation for why photometric distance estimations have lower precision than spectroscopic ones despite being very easy to produce for all sources found in photometric data. To get around this problem, Gomes et al. [10] developed a unique method called Gaussian Processes for Photometric Redshift Estimation (GPz), which offers quick and precise photometric redshift calculations. Han et al. [11] used a classifier to divide the data samples into subsamples and a regressor with kNN and random forest as key algorithms to estimate the photometric redshift of each subsample. Snigdha et al. [12] proposed methods for processing astronomy data in a multi-node clustered environment for Apache Spark to handle bulk data and train models on a single node/machine utilizing the Lipschitzbased adaptive learning rate. In addition, Snigdha et al. [13] created a technique to analyze this data using a variety of ML algorithms and big data frameworks to help astronomers address significant problems in large astronomical datasets. In terms of DL, Collister et al. [14] showed that using 50,000 training samples with a redshift range of 0 to 0.7, ANN obtained an root mean squared error (RMSE) of 0.023, exceeding the traditional template fitting approaches in low-redshift data. The remaining section of the article is organized as follows. Section 2 explains the dataset description and input features. Sections 3 and 4 direct us toward the proposed model and implementation details. The results are discussed in Sect. 5. In Sect. 6, the potential future scope is discussed.

An Energy-Efficient Deep Neural Network Model for Photometric …

321

Fig. 1 Number of data samples in each redshift (SpecZ) range of the low-redshift dataset (left) and high-redshift dataset (right)

2 Data For this experiment, data has been acquired from SDSS (The Sloan Digital Sky Survey) (https://www.sdss.org/). We used two types of data: low-redshift data with redshift values ranging from 0 to 3 (0 < z < 3) and high-redshift data with redshift values ranging from 0 to 7 (0 < z < 7) For 0 < z < 3, we used only DR16 because the data samples obtained are sufficient for estimating accurate output). The highredshift data is based on SDSS’s 14th, 15th, and 16th data releases, which contain 604,935 samples. DR15 and DR16 have been merged for 3 < z < 5, and DR14, DR15, and DR16 have been merged for 5 < z < 7, to increase the count of training examples in that range. The merging of datasets results in more data as well as more training samples for getting better accurate estimates through the deep learning model. To deal with missing values, we implemented data imputation using the python function ‘SimpleImputer’ from the ‘sklearn’ Python package, which employs a statistical approach to fill in the gaps. Figure 1 depicts the redshift data distribution. In this work, model magnitudes and colors, fiber magnitudes, Petrosian magnitudes, band overlap magnitudes, and the mean magnitude of adjacent filters were used as input features. To achieve a uniform distribution of redshift values, we use the logarithm of the true redshift values as a target variable in our DNN model [8].

3 Proposed Model Considering the huge potential of ANN in complex dataset analysis, we propose ANN in this work. An ANN is a neural network consisting of input layers, hidden layers, and output layers. Each layer is a computational unit that takes inputs (input layer), applies an activation function, and passes them to the next layer. The normalization of data is done using the MinMaxScaler function that converts all the values between 0 and 1 and preserves the shape of the original distribution.
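As a concrete illustration of this setup, the sketch below builds the scaler and a small Keras MLP matching the configuration reported later in Table 1 (three hidden layers of 256, 250, and 245 ReLU units, a linear output, and the Adam optimizer). It is a minimal sketch under those assumptions rather than the exact training script; the MSE loss and the placeholder X_train array are assumptions.

import tensorflow as tf
from sklearn.preprocessing import MinMaxScaler

def build_redshift_mlp(n_features=30):
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(256, activation='relu', input_shape=(n_features,)),
        tf.keras.layers.Dense(250, activation='relu'),
        tf.keras.layers.Dense(245, activation='relu'),
        tf.keras.layers.Dense(1, activation='linear'),   # predicts the log-scaled redshift
    ])
    # MSE loss assumed for the regression task; MAE tracked as an additional metric.
    model.compile(optimizer='adam', loss='mse', metrics=['mae'])
    return model

# Features are rescaled to [0, 1] before training, as described above.
scaler = MinMaxScaler()
# X_train_scaled = scaler.fit_transform(X_train)   # X_train assumed to be the input feature array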

322

K. Shreevershith et al.

3.1 Feature Engineering In neural network regression, covering all the data points or more than required leads to overfitting. Feature selection with the PCA method solves the issue, but it reduces the dimensionality of data, which leads to the depletion of model performance [15]. To address this, we use the regularization technique for weight selection to reduce overfitting. Regularization is of two types: 1. L1 regularization (Lasso Regression). LASSO is also known as the Least Absolute Shrinkage and Selection Operator. The penalty term affixed to the loss function is the ‘Absolute Value of Magnitude’ of the coefficient. n 

⎛ ⎝Yi −

i=1

p  j=1

⎞2 Xij βj ⎠ + λ

p    βj  j=1

2. L2 regularization (Ridge regression). The penalty term affixed to the loss function here is the ‘squared magnitude’ of the coefficient. ⎛ ⎞2 p p n    ⎝Yi − ⎠ Xij βj + λ βj2 i=1

j=1

j=1

The main aim is to minimize the above cost function. When λ is large, heavier weights are penalized more acutely, but when λ is small, the regularization effect is reduced. The values of the weight matrix are driven down by adding the regularization component, which helps in reducing the overfitting issue in our DNN model.

3.2 Training and Validation The input dataset is divided into training and validation sets. The input data is shuffled initially, making the dataset more varied. The DNN model is a multi-layer perceptron consisting of one input layer, three hidden layers, and one output layer. The basic Python packages used in the ANN model are ‘Keras’, which is a deep learning library, and ‘livelossplot’, which displays a live training loss plot in a Jupyter Notebook for frameworks like Keras, etc. For building deep neural network architectures, we use the Keras library on top of TensorFlow, which provides out-of-the-box scalability for multiple machines (CPUs and GPUs). Table 1 displays the neural network configuration used in our model.

Table 1 Neural network configuration

Configuration variable              Value
Number of features                  30
Number of hidden layers             3
Hidden layer neurons                256, 250, 245
Output layer neurons                1
Hidden layer activation function    relu
Output layer activation function    linear
Batch size                          100
Optimizer                           adam
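A minimal Keras sketch of this configuration is shown below; it is reconstructed from Table 1 rather than taken from the authors' code, and the mean-squared-error training loss is an assumption consistent with the metrics reported in Sect. 5.

```python
# Hedged sketch of the Table 1 configuration: 3 hidden layers with 256/250/245
# relu units, a linear output, the Adam optimizer, and batch size 100.
from tensorflow.keras import Sequential, layers

model = Sequential([
    layers.Dense(256, activation='relu', input_shape=(30,)),  # 30 input features
    layers.Dense(250, activation='relu'),
    layers.Dense(245, activation='relu'),
    layers.Dense(1, activation='linear'),                     # single regression output (redshift)
])
model.compile(optimizer='adam', loss='mse', metrics=['mae'])  # 'mse' loss is an assumption
# model.fit(X_train, y_train, validation_data=(X_val, y_val), batch_size=100, epochs=...)
```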

4 Implementations The implementation of the work is as follows.

4.1 Step 1: Implementing the Initial Model with Low-Redshift Data In the beginning, the initial model was trained on the low-redshift data (0 < z < 3), with all features considered during training. It was a basic sequential MLP model with no regularizers or dropout. After fitting and predicting, the results were more accurate than those of the other ML algorithms.

4.2 Step 2: Implementing the Initial Model with High-Redshift Data After the satisfactory results obtained in the previous step, the same model was tried with the high-redshift data. The results no longer showed accurate estimates, which implies that the model must be changed to process the new data.

4.3 Step 3: Using High-Redshift Data, Put the Final/Proposed Model into Action Many trial-and-error experiments were carried out before arriving at this step. Using RFE and chi2 for feature selection, modifying the dataset, increasing and decreasing the number of neurons in the neural network layers, changing the number of NN layers, and changing the activation function are a few of the methods that were tried before the final model was proposed. The regularization and dropout methods showed better outputs than the other methods implemented.

5 Results and Discussion 5.1 Metrics The following metrics were used to evaluate the model.

Mean absolute error (MAE). It is the mean of the absolute differences between the actual and predicted values:

$$\mathrm{MAE} = \frac{1}{n}\sum\left|r - \hat{r}\right|$$

Mean squared error (MSE). It is the mean of the squared differences between the predicted and actual values; values close to 0 indicate a better fit:

$$\mathrm{MSE} = \frac{1}{n}\sum\left(r - \hat{r}\right)^{2}$$

MSE is more informative for this kind of model, where there are many outliers, and a model with a lower MSE is considered better.

R-squared or coefficient of determination (R2 metric). The coefficient of determination (COD) is a measure of how well the data samples fit the regression line; it ranges between 0 and 1 and denotes how much of the variance in the data is explained by the model:

$$\mathrm{COD} = \frac{\left[M\sum r\hat{r} - \sum r\,\sum \hat{r}\right]^{2}}{\left[M\sum r^{2} - \left(\sum r\right)^{2}\right]\left[M\sum \hat{r}^{2} - \left(\sum \hat{r}\right)^{2}\right]}$$

where M is the count of test samples, r the true values, and r̂ the corresponding predicted values in all the equations above.
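These metrics can be computed directly with scikit-learn, as in the minimal sketch below; the toy arrays stand in for the test-set redshifts, and note that sklearn's r2_score uses the 1 − SS_res/SS_tot definition, which can differ from the squared-correlation form of COD given above.

```python
# Minimal sketch of the three evaluation metrics with scikit-learn.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

r_true = np.array([0.12, 0.85, 2.40, 4.10])   # toy "SpecZ" values
r_pred = np.array([0.10, 0.90, 2.10, 4.60])   # toy model predictions

print('MAE:', mean_absolute_error(r_true, r_pred))
print('MSE:', mean_squared_error(r_true, r_pred))
print('R2 :', r2_score(r_true, r_pred))        # can be negative for very poor fits
```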


5.2 Results and Output A few machine learning algorithms were experimented with before moving to the ANN. The results obtained from the decision tree and kNN algorithms on the low-redshift data are shown in Fig. 2. Although the plots demonstrate that kNN is superior to the decision tree in terms of reduced scattering, they also indicate that the fit is insufficient. According to the R2 scores in Table 2, the model leaves 0.54 of the variance unexplained for the decision tree approach and 0.35 for the kNN algorithm, which is unsatisfactory. We changed to the ANN algorithm because the outcomes with these ML techniques were subpar. The results obtained from the implementation steps are shown below. Step 1. The ANN algorithm is implemented with the low-redshift data, and the metric results obtained are shown in Table 3. In comparison with the other ML techniques used, the ANN outcomes are significantly better. An ANN can represent complex, nonlinear relationships and generalize them, which enhances the prediction of yet-to-be-observed data.

Fig. 2 Graph plot of predicted redshift against test redshift values using decision tree (left) and kNN (right) algorithms

Table 2 Regression metric values obtained using the decision tree and kNN algorithms

            Decision tree    kNN
MAE         0.0210           0.0183
MSE         0.0029           0.0020
R2 Score    0.46             0.65

Table 3 Regression metric values obtained using the ANN algorithm

            ANN (Low-redshift data)
MAE         0.0172
MSE         0.0017
R2 Score    0.69


The MAE and MSE scores are likewise relatively low, and the R2 score in Table 3 is the highest of all the computed R2 scores. The majority of the data samples in Fig. 3 converge to the regression fit line, indicating that the model is generally well fitted, with acceptable results and precise estimates. This demonstrates that the original model is an accurate predictor and a strong fit for low-redshift data. Step 2. After finding satisfactory results in Step 1 with the ANN algorithm, we then used the same model to process the high-redshift data. The output obtained is shown in Fig. 4, which plots the MSE and MAE loss curves of the implemented model. Overfitting is observed, as the validation loss exceeds the training loss; this happens when the model is unable to generalize the data. The metric results of Step 2 are in Table 4; the method used in Step 2 produces minor errors. The goal of this paper is to improve results by using the same ANN model for low- and high-redshift data, tweaking the model and its parameters rather than introducing additional methods.
Fig. 3 Graph plot of predicted redshift against test redshift values using the ANN algorithm

Fig. 4 Graph plots of the variation of MSE and MAE versus the number of epochs for training and validation data


Table 4 Regression metric values obtained using the ANN algorithm on high-redshift data without using regularization and dropout approaches

            ANN (High-redshift data) (1)
MAE         0.2028
MSE         0.0545
R2 Score    0.11

Alternatively, more accurate results can be obtained by inspecting the output after each stage of the process and adjusting accordingly. Step 3. In this step, methods such as regularization, the addition of dropout layers, and rule-of-thumb changes to the number of nodes, the number of hidden layers, the activation functions, the batch size, and the metrics are experimented with to see whether the output of the model improves (a code sketch of this regularized configuration is given after Table 5). The results after these changes show improved output: Fig. 5 depicts the improvement, as the errors have been reduced, and Table 5 gives the metric results of Step 3. The R2 score of the improved model is still not a desirable result. In ordinary least squares (OLS), the R2 score measures how well the estimates predict the data, but it can sometimes be quite low owing to the type and variance of the data used.

Fig. 5 Graph plots of the variation of MSE and MAE versus the number of epochs for training and validation data using regularization and dropout approaches

Table 5 Regression metric values obtained using the ANN algorithm on high-redshift data using regularization

            ANN (High-redshift data) (2)
MAE         0.1576
MSE         0.0483
R2 Score    0.29
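A hedged sketch of the Step 3 configuration, combining the L2 penalty with dropout layers, is shown below; the dropout rates and penalty strength are illustrative values, not the ones tuned by the authors.

```python
# Hedged sketch of the Step 3 changes: the same MLP with L2 weight penalties and
# dropout layers added. Layer sizes match Table 1; rates are assumptions.
from tensorflow.keras import Sequential, layers, regularizers

model = Sequential([
    layers.Dense(256, activation='relu', input_shape=(30,),
                 kernel_regularizer=regularizers.l2(1e-4)),
    layers.Dropout(0.2),                       # randomly drops 20% of units during training
    layers.Dense(250, activation='relu', kernel_regularizer=regularizers.l2(1e-4)),
    layers.Dropout(0.2),
    layers.Dense(245, activation='relu', kernel_regularizer=regularizers.l2(1e-4)),
    layers.Dense(1, activation='linear'),
])
model.compile(optimizer='adam', loss='mse', metrics=['mae', 'mse'])
```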

Table 6 Results of high-redshift range

Algorithm              Value
Weak-gated expert      0.07
ExtraTreesRegressor    0.66
ANN                    0.04

5.3 Comparison with Other Methods In Table 6, we compare our results with previously reported high-redshift work. The values for the weak-gated expert and ExtraTreesRegressor are taken from [7] and [8], respectively.

5.4 Energy Consumption While predicting accurate redshifts, most authors overlook one critical aspect of deep learning: long training times, which result in considerable carbon emissions due to heavy CPU and memory utilization by the time the model is fully optimized. As responsible data scientists, we assessed our model's environmental impact while developing it. A few factors used to determine emissions are the location of the training server and the electrical grid used, the length of training time, and the hardware used. According to one survey, training an AI model can emit more carbon than five cars over their lifespans. To determine the environmental impact of training ML models, a tool called 'Machine Learning Emissions Calculator' [16] is utilized. We report 0.17 kg CO2-eq. per kWh (carbon dioxide equivalent per kilowatt-hour) and an execution duration of 38 min when implementing our model on Google Colaboratory with a GPU, with three hidden layers, using the nearest server. During the experiment, we also tried fewer neurons in each hidden layer (33, 50, 45) and reached the same MSE value with a much shorter training period (approximately 4 min), emitting roughly 0.01 kg of CO2. These findings show that our model is both energy efficient and environmentally benign: with a simpler neural network model, we were able to attain excellent results.
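The authors used the web-based calculator of [16]; as an alternative, a programmatic estimate can be logged around the training call with the codecarbon package, as in the hedged sketch below. This is not what the paper did, merely one way to automate the measurement.

```python
# Hedged sketch: estimating CO2-eq emissions of a training run with codecarbon,
# a different tool from the ML Emissions Calculator used in the paper.
from codecarbon import EmissionsTracker

tracker = EmissionsTracker()     # estimates emissions from measured energy use and grid location
tracker.start()
# model.fit(...)                 # place the training call here
emissions_kg = tracker.stop()    # returns the estimate in kg CO2-eq
print(f"Estimated emissions: {emissions_kg:.3f} kg CO2-eq")
```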

6 Future Enhancements This project can be scaled up in the future by running it on the cloud (for example, AWS, GCP, and so on) and on big data platforms, which allow faster and more accurate results with massive data. Cloud GPUs, which reduce processing time and provide faster outputs than local machines, can help overcome the limits of local CPU processing power. The required Python packages are automatically updated on cloud platforms, which also include a few standard ML tools and APIs that aid in project scaling.


To improve prediction results, we intend to use the Synthetic Minority Oversampling Technique (SMOTE) approach for dealing with skewed data.

7 Conclusion To summarize, the paper’s main goal is to successfully predict photometric redshift for low-redshift data, but there is room for improvement for high-redshift data because data is scarce, especially in the high-redshift zone (Red (r), near infrared (i), and infrared (z)). A future approach could be to combine multiple surveys to increase data in a high-redshift zone. DNN models can always be improved by using the appropriate architecture and hyperparameter tuning.

References 1. Khasnis NS, Snigdha S, Shubhangi SK (2021) A Machine Learning Approach for Sentiment Analysis to Nurture Mental Health Amidst COVID-19. In: Proceedings of the international conference on data science, machine learning and artificial intelligence 2. Sen S et al (2021) Analysis, visualization and prediction of COVID-19 pandemic spread using machine learning. Innovations in computer science and engineering. Springer, Singapore, 2021, pp 597–603 3. Sandeep VY, Sen S, Santosh K (2021) Analyzing and processing of astronomical images using deep learning techniques. In: 2021 IEEE international conference on electronics, computing and communication technologies (CONECCT), pp 01–06. https://doi.org/10.1109/CONECC T52877.2021.9622583. 4. Monisha R et al (2022) An approach toward design and implementation of distributed framework for astronomical big data processing. Intelligent systems. Springer, Singapore, 2022, pp 267–275 5. Tagliaferri R et al (2003) Neural networks for photometric redshifts evaluation. In: Apolloni B, Marinaro M, Tagliaferri R (eds) Neural Nets. WIRN 2003. Lecture notes in computer science, vol 2859. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45216-4_26 6. Laurino O, D’Abrusco R, Longo G, Riccio G (2011) Astroinformatics of galaxies and quasars: a new general method for photometric redshifts estimation. Mon Not R Astron Soc 418(4):2165– 2195. https://doi.org/10.1111/j.1365-2966.2011.19416.x 7. Scranton R, Connolly AJ, Szalay AS, Lupton RH, Johnston DE, Budavari T, Brinkman J, Fukugita M (2005) Photometric covariance in multi-band surveys: understanding the photometric error in the SDSS. arXiv: Astrophysics. 8. Reza M, Haque MA (2020) Photometric redshift estimation using ExtraTreesRegressor: galaxies and quasars from low to very high redshifts. Astrophys Space Sci 365(3): 2020. https://doi.org/10.1007/s10509-020-03758-w 9. Salvato M, Ilbert O, Hoyle B (2019) The many flavors of photometric redshifts. Nat Astron 3:212–222. https://doi.org/10.1038/s41550-018-0478-0 10. Gomes Z et al (2018) Improving photometric redshift estimation using GPz: size information, post processing, and improved photometry. Monthly Notices of the Royal Astronomical Society. 475. https://doi.org/10.1093/mnras/stx3187 11. Han B, Ding H-P, Zhang Y-X, Zhao Y-H (2016) Photometric redshift estimation for quasars by integration of KNN and SVM. 2016 Res Astron Astrophys 16 005. https://doi.org/10.1088/ 1674-4527/16/5/074


12. Snigdha S, Saha S, Chakraborty P, Pratap Singh K (2021) Implementation of neural network regression model for faster redshift analysis on cloud-based spark platform. In: International conference on industrial engineering and other applications of applied intelligent systems. Springer, Cham. https://doi.org/10.1007/978-3-030-79463-7_50 13. Sen S, Agarwal S, Chakraborty P et al (2022) Astronomical big data processing using machine learning: a comprehensive review. Exp Astron. https://doi.org/10.1007/s10686-021-09827-4 14. Collister AA, Lahav O (2004) ANNz: estimating photometric redshifts using artificial neural networks. Publ Astron Soc Pac 116:345–351. https://doi.org/10.1086/383254 15. Ismoilov N, Jang S-B (2018) A comparison of regularization techniques in deep neural networks. Symmetry 10:648. https://doi.org/10.3390/sym10110648 16. Lacoste A, Luccioni A, Schmidt V, Dandres T (2019) Quantifying the carbon emissions of machine learning. arXiv:1910.09700

Deep Learning-Based Diabetic Retinopathy Screening System Rajkumar Kalimuthu, Limbika Zangazanga, S. Jayanthi, and Ignatius A. Herman

Abstract Diabetic retinopathy (DR), a serious eye disease that can result in vision loss, occurs particularly in low- and middle-income economies. The objective of this research paper is to expound an automated screening system that can automatically identify the retina’s primary structural elements and other key characteristics of DR. This research work is intended to identify blood vessel leak areas as a precursor to diabetic retinopathy. The deep learning-based automated system is used for early detection of DR which can also assist an ophthalmologist in DR screening for providing timely treatment. The key objective of this proposed research work is to automate image classification as well as analyse retinal images for the presence of diabetic retinopathy symptoms, which are manifested as spots on the iris image brought on by blood vessel fluid leakage. Keywords Diabetic retinopathy · Retinal images · Fundus · Convolution neural network · Deep learning

1 Introduction Diabetic retinopathy represents vascular issues in diabetic individuals’ retinas. It is an eye problem linked to diabetes that is brought on by impaired glucose tolerance. Blood vessels in the cornea inflate due to the elevated blood sugar levels, which causes larger blood vessels and even vascular leakage, which results in black spots in iris image [1–3]. The retina really serves as the eye’s lens, sending a cognitive signal R. Kalimuthu · L. Zangazanga School of Computers Science and Information Technology, DMI St. John the Baptist University, Lilongwe, Malawi S. Jayanthi (B) Department of Information Technology, Guru Nanak Institute of Technology, Hyderabad, India e-mail: [email protected] I. A. Herman DMI—Group of Institutions, Lilongwe, Malawi e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_25


to the brain, which enables us to perceive objects as light travels through the iris, cornea, and internal space of the eye. Early identification of diabetes raises the possibility of averting cataracts and blurred vision, because vision is typically lost gradually. DR screening using an automated procedure provides an efficient way to evaluate retinal images. Automated DR screening is one of many applications that have benefited from the advancement of deep neural network (DNN) technologies in recent years [4, 5] across a variety of medical disciplines. DNNs are trained using retinal data obtained through dilated pupils to guarantee maximum diagnostic precision, and the algorithms grade the photographs according to common DR severity measures. One or more of these automated algorithms are already in use in screening procedures that have received regulatory approval, and numerous vendors have incorporated such algorithms into their small cameras for fast offsite classification of the retinal data obtained [6, 7]. There are several reports summarising the diagnostic efficacy of techniques for interpreting retinal data acquired by portable cameras on non-mydriatic pupils [8, 9]. Further, there have been no instances of automated assessment being used in DR screening and treatment in India. In this research work, we assessed the efficiency of the proposed algorithm in classifying a patient's eye into three outcomes: referable, non-referable, and ungradable DR. Software-supported diagnostic systems and optometrists use fundus scanning as a reliable diagnostic imaging resource to detect many retinal illnesses. The bright "optic nerve head and ophthalmic cup" and the dark "macula lutea and fovea centralis" regions are clearly differentiated in photographs of the central and peripheral regions of the fundus. The clinical features of DR can also be seen on fundus imaging, such as "blister aneurysms, haemorrhages, exudates, and cotton wool spots" [8, 5] (Fig. 1). This research work uses the appearance of blood vessel leak spots as an indication of diabetic retinopathy, even though further testing techniques are needed in the real world. The captured image is processed using unsharp masking, which involves working with the colour channels of the RGB image. To improve the segmentation of capillaries and retinal structures, the green section of the retinal image is utilised: the green image is converted to a green entropy map and the red image to a grey entropy map. The learning process is then carried out using convolutional neural networks, by comparing the current retinal image with the images in the system's datasets. Automatic solutions for DR recognition are more effective than traditional diagnostics, as well as cost effective and time saving; traditional diagnostics carry a higher chance of inaccuracy and demand more work than automatic techniques. This research provides a thorough assessment of recent automated approaches for detecting diabetic retinopathy and suggests a deep learning-based method to diagnose DR.


Fig. 1 Fundus images (1) blister aneurysms (2) haemorrhages (3) exudates (4) cotton wool spots [8]

2 Related Work A number of studies carried out by different institutions have also focused on diabetic retinopathy [1, 7]. Buddhist Tzu Chi General Hospital uses a bi-channel convolutional neural network (CNN) to implement deep learning; this investigation employed a CNN for extracting the features of referable DR [1]. The study develops a bi-channel CNN that interprets the entropy of the luminance (grey level) and the green region of the images together, with four convolutional layers having 5 × 5 kernels and 128, 64, 64, and 32 filters in the successive layers. To avoid overfitting, max pooling, a linear activation function, and dropout are utilised [10]. The fully connected layers are statistically integrated to assess the referable DR detection result after the channels have been flattened. The Adam algorithm and the cross-entropy loss function are used to train the network for this referable DR detection method, using TensorFlow with Python at a learning rate of 0.0001 [1]. The summary of related works on diabetic retinopathy is given below (Table 1).
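The bi-channel design described above can be sketched roughly as follows; this is a hedged reconstruction from the description in [1], not the original implementation, and the 128 × 128 input size, dropout rate, and sigmoid output are assumptions.

```python
# Hedged sketch of a bi-channel CNN: one branch for the grey-level entropy image
# and one for the green-channel entropy image, four 5x5 conv layers with
# 128/64/64/32 filters, max pooling, dropout, and a merged classifier head.
from tensorflow.keras import Input, Model, layers
from tensorflow.keras.optimizers import Adam

def branch(inp):
    x = inp
    for filters in (128, 64, 64, 32):
        x = layers.Conv2D(filters, (5, 5), activation='relu', padding='same')(x)
        x = layers.MaxPooling2D()(x)
    x = layers.Flatten()(x)
    return layers.Dropout(0.5)(x)                 # dropout rate is an assumption

grey_in = Input(shape=(128, 128, 1))              # assumed input size
green_in = Input(shape=(128, 128, 1))
merged = layers.concatenate([branch(grey_in), branch(green_in)])
out = layers.Dense(1, activation='sigmoid')(merged)   # referable vs non-referable DR

model = Model([grey_in, green_in], out)
model.compile(optimizer=Adam(learning_rate=1e-4),      # learning rate from the text
              loss='binary_crossentropy', metrics=['accuracy'])
```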

Table 1 Summary of related works—diabetic retinopathy

1. "Detection of diabetic retinopathy using bi-channel convolutional neural network" [1]. Methods: convolutional neural networks and image processing algorithm. Method description: unsharp masking is used to improve image quality as a pre-processing step prior to estimating the entropy of retinal images. Merits: benefits in automated retinal image analysis. Demerits: it is complex.

2. "Comparison of smartphone-based-retinal imaging systems for diabetic retinopathy detection using deep learning" [2]. Methods: smartphone-based retinal imaging systems, deep learning. Method description: the mobile-phone-based retinal imaging technology, iNview, can capture images of the cornea by sending pulses of light from the equipment; it receives a 50° retinal view for checking the complete posterior pole in an image. Merits: technologically and economically feasible, portable retinal imaging systems. Demerits: these approaches are semi-automated, therefore some professional oversight is necessary to determine whether a retinal problem exists.

3. "Diagnosis of diabetic retinopathy using machine learning" [11]. Methods: machine learning, image processing. Method description: it merges inverting and range of pixels with morphologically based dark lesion classification. Merits: even from a poor-resolution colour fundus image, basic features like retinal exudates and microaneurysms can be extracted. Demerits: when converting a binary image, this method is likewise ineffective for thresholds that remain constant.

4. "Great expectations and challenges of artificial intelligence in the screening of diabetic retinopathy" [12]. Methods: artificial intelligence. Method description: convolutional neural network-based deep learning methodology. Merits: enhances cost efficiency in strong environments while increasing accessibility in low-resource environments. Demerits: lack of clinical research comparing deep learning-based AI systems with human graders in randomised supervised trials.

Table 1 (continued)

5. "Convolutional neural networks for diabetic retinopathy" [13]. Methods: deep learning. Method description: training using convolutional neural networks (CNNs). Merits: a trained CNN has the advantage of enabling real-time classification whenever a new image is received, as it can categorise hundreds of photographs per minute. Demerits: computational complexity.

6. "Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs" [4]. Methods: CNN and deep learning. Method description: deep neural networks are trained with large amounts of data. Merits: consistency of interpretation, because each time a machine interprets a particular image, it will make the same prediction. Demerits: the network learns to recognise an image of a healthy eye without any trouble.

7. "Detecting diabetic retinopathy using embedded computer vision" [8]. Methods: image processing. Method description: machine learning algorithm. Merits: detecting DR at a cheap cost for the medically poor society. Demerits: low accuracy.

8. "Comparative analysis of deep learning methods of detection of diabetic retinopathy" [6]. Methods: CNN and deep learning. Method description: evaluated the diabetic retinopathy deep learning algorithm by using datasets. Merits: low computational requirements. Demerits: the proposed algorithm might not perform well for images with minute observations that the majority of ophthalmologists would overlook.

9. "Diabetic retinopathy detection through deep learning techniques" [5]. Methods: image processing, neural networks. Method description: autoencoder, sparse coding, and CNN. Merits: considerably cut down on the amount of time needed to make diagnoses, reducing ophthalmologists' time and money while enabling prompt patient treatment. Demerits: the amount of data required to train DL systems, as DL requires a lot of data.

Table 1 (continued)

10. "Automated invasive ductal carcinoma detection based using deep transfer learning with whole-slide images" [7]. Methods: deep learning, DenseNet. Method description: automated IDC detection task. Merits: best balanced accuracy.

11. "Predicting optical coherence tomography-derived diabetic macular oedema grades from fundus photographs using deep learning" [14]. Methods: deep learning. Method description: 3D and 2D images produced via fundus photography. Merits: accurate. Demerits: complex.


3 Problem Definition and Existing System This research aims to identify and mitigate this disorder among residents of remote locations where medical screening is challenging to carry out. At present, doctors with extensive training travel to rural locations to obtain the necessary images, evaluate the photographs, and make the necessary diagnoses [10, 11, 12]. Currently, medical personnel take time to test patients manually, patients wait in very long lines to get an eye test, people cannot afford frequent eye check-ups in hospitals, and the systems in use require a trained doctor or medical specialist to operate the instruments. Due to human error, the manual eye test is not very effective [13].

4 Methodology This concept is implemented using deep learning and image processing. The image is taken from a fundus camera and processed using unsharp masking, which involves working with the colour channels of the RGB image. Blood vessel and inner retinal segmentation are pre-processed using the green portion of the retinal picture: the green image is converted to a green entropy map and the red image to a grey entropy map [4]. Once this is done, the learning process is carried out using convolutional neural networks [6]. The second method used is the deep learning-based CNN; in this approach, we compare the dataset with the results obtained from the patient and then use pattern matching to find the appropriate result.
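The pre-processing described here can be sketched roughly as follows; this is an illustrative reconstruction rather than the authors' code, the file name 'fundus.png' is a placeholder, and the unsharp-masking weights and entropy neighbourhood size are assumed values.

```python
# Hedged sketch: unsharp masking of a fundus image, green-channel extraction,
# and local entropy maps of the grey and green images.
import cv2
import numpy as np
from skimage.filters.rank import entropy
from skimage.morphology import disk

img = cv2.imread('fundus.png')                     # BGR image from the fundus camera (placeholder path)
blur = cv2.GaussianBlur(img, (0, 0), sigmaX=3)
sharp = cv2.addWeighted(img, 1.5, blur, -0.5, 0)   # unsharp masking

green = sharp[:, :, 1]                             # green channel (index 1 in BGR order)
grey = cv2.cvtColor(sharp, cv2.COLOR_BGR2GRAY)

green_entropy = entropy(green, disk(5))            # local entropy of the green image
grey_entropy = entropy(grey, disk(5))              # local entropy of the grey image
```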

5 System Architecture The methodology of this system involves image processing. The interface allows the user to upload a picture from the fundus camera. The image is then converted to RGB; only the red and green channels are used, and the system converts the image into grey entropy and green entropy maps, which are then displayed in the interface. The system architecture for the diabetic retinopathy screening system is illustrated in Fig. 2.

6 Results The system analyses the image and checks whether the eye retina has a defect. If the retina has no defect, it reports that the eye is normal; if it has a defect, it reports that as well.

Fig. 2 System architecture for diabetic retinopathy screening system (interface → fundus camera image → RGB conversion → grey entropy and green entropy → learning against the datasets → results stored in the database)

The defective eye is then analysed to determine the stage of the eye defect. After the system detects an abnormal eye, it suggests the corresponding treatment and stores the data in the database for the next test. Screenshots of retinal scanning and optic disk detection are shown in Figs. 3 and 4. There are numerous key performance indicators that may be used to compare the effectiveness of the two strategies; accuracy, sensitivity, and specificity are the metrics frequently used to evaluate the efficiency of the approaches. Specificity is the computed percentage of normal images that are classified as normal, whereas sensitivity is the computed percentage of abnormal images that are classified as abnormal [13].
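As a minimal illustration of how these indicators can be computed from a binary confusion matrix (the label arrays below are toy placeholders, not the study's data):

```python
# Hedged sketch: sensitivity, specificity, and accuracy from a confusion matrix.
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # 1 = DR (abnormal), 0 = normal
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)      # abnormal images classified as abnormal
specificity = tn / (tn + fp)      # normal images classified as normal
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(sensitivity, specificity, accuracy)
```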

Fig. 3 Screenshot of retinal scanning


Fig. 4 Screenshot of optic disk detection

The percentage of correctly classified photographs is known as accuracy. The bar chart displaying the sensitivity, specificity, and accuracy is shown in Fig. 5. These results emphasise the efficiency of the deep learning-based CNN, since its sensitivity, accuracy, and specificity values are high compared with those of the conventional CNN.
Fig. 5 Comparison of CNN and deep learning-based CNN


7 Conclusion Long-term diabetes can cause diabetic retinopathy, in which the retina is harmed as a result of blood vessel fluid leakage. Usually, the blood vessels, exudates, haemorrhages, micro-aneurysms, and retinal thickness are used to assess the stages of diabetic retinopathy. Currently, an ophthalmologist uses an optical system to view the blood vessels and the retina to determine the stage of diabetic retinal disease. Digital imaging is now a technology that can be used to screen for diabetic retinopathy, and automatic analysis technologies may be used to analyse the digital photographs. It provides excellent, long-lasting records of the retina's appearance that can be used to track treatment response or improvement over time and be reviewed by an ophthalmologist. An automated technique for detecting diabetic retinopathy can lower the cost of grading, making the screening process affordable.

References 1. Varadarajan AV, Bavishi P, Ruamviboonsuk P (2020) Predicting optical coherence tomographyderived diabetic macular edema grades from fundus photographs using deep learning. Nature Commun vol 11 130, January 2020 2. Bhatia K, Arora S (2016) Diagnosis of diabetic retinopathy using machine learning. IEEE https://doi.org/10.1109/NGCT.2016.7877439, October 2016 3. Karakaya M (2019) Comparison of smartphone-based retinal imaging systems for diabetic retinopathy detection using deep learning. BMC Bioinf vol 5, march 2019 4. Hacisoftaoglu RE (2016) Convolutional neural networks for diabetic retinopathy. Inf Med vol 2, July 2016 5. Tan CS, Chew MC, Lim LW, Sadda SR (2016) Advances in retinal imaging for diabetic retinopathy and diabetic macular edema. Indian J Ophthalmol 64(1):76–83. https://doi.org/ 10.4103/0301-4738.178145. PMID: 26953028; PMCID: PMC4821126 6. Kaur T, Singh J (2017) Diabetic retinopathy detection system (DRDS): a novel guibased approach for diabetic retinopathy detection semanticscholar, ID: 2870235, 2017 7. Alyoubi WL, Shalash WM, Abulkhair MF (2020) Diabetic retinopathy detection through deep learning techniques. Inf Med, June 2020. 8. Abbood SH, Hamed HNA, Rahim MSM, Rehman A, Saba T, Bahaj SA (2022) Hybrid retinal image enhancement algorithm for diabetic retinopathy diagnostic using deep learning model. IEEE Access 10:73079–73086. https://doi.org/10.1109/ACCESS.2022.3189374 9. Pao SI, Lin HZ, Chien KH, Tai MC, Chen JT, Lin GM (2020) Detection of diabetic retinopathy using bichannel convolutional neural network Hindwawi Journal of Ophthalmology, vol 2020, Article ID 9139713 10. Zhao M, Jiang Y (2020) Great expectations and challenges of artificial intelligence in the screening of diabetic retinopathy The Royal College of Ophthalmologists. Eye 34:418–419, July 2019 11. Faust O, Acharya R (2019) Algorithms for the automated detection of diabetic retinopathy using digital fundus images. J Med Syst https://doi.org/10.1007/s10916-010-9454-7 12. Vora P, Shrestha S (2020) Detecting diabetic retinopathy using embedded computer vision vol 10. https://doi.org/10.3390/app10207274, October 2020 13. Patel P, Sharm KJ (2016) Diabetic retinopathy detection system: review researchgate. https:// doi.org/10.13140/RG.2.1.2974.1040, April 2016


14. Gadekallu TR, Khare N, Bhattacharya S, Singh S, Maddikunta PK, Ra IH, Alazab M (2020) Early detection of diabetic retinopathy using PCA-firefly based deep learning model. MDPI 2, February 2020

Artificial Intelligence-Based Data Analytics Techniques in Medical Imaging Prasanalakshmi Balaji, Prasun Chakrabarti, and Bui Thanh Hung

Abstract In the medical field, AI is used for machine learning (ML) models to examine medical data and reveal insights for improving the health results of patients. Generally, medical imaging (MI) also called radiology which reconstructs many images in the body part for diagnostics and treatment. The main challenging task in MI is proper diagnosis and accurate detection of diseases. In this survey, MI-based Artificial Intelligence (AI), Big Data Analytics (BDA), ML, and Deep Learning (DL) were discussed. Also investigate the techniques and algorithms used in AIbased BDA in MI such as supervised learning, ML, unsupervised learning, and DL. Furthermore, the merits, demerits, advantages, disadvantages, and performance analysis of MI are discussed. The main contribution of the work is to analyze and surveys the AI-based BDA techniques and identifies the issues in medical imaging also provide proper solution for the issues. Finally, the performance assessment of the relevant topic is examined and provides the problem definition. Keywords Artificial intelligence · Big data analytics · Diagnosis · Detection · Features extraction · Medical imaging · Radiology · Preprocessing · Machine and deep learning

P. Balaji (B) Faculty of Information Technology, Post-Doctoral Researcher, Artificial Intelligence Laboratory, Ton Duc Thang University, Ho Chi Minh, Vietnam e-mail: [email protected] P. Chakrabarti Deputy Provost, ITM SLS Baroda University, Vadodara Gujarat, India e-mail: [email protected] B. T. Hung Faculty of Information Technology, Artificial Intelligence Laboratory, Ton Duc Thang University, Ho Chi Minh, Vietnam e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_26


1 Introduction Generally, MI is also called radiology which reconstructs many images in the body part for diagnostics and treatment. Moreover, the procedure of MI contains noninvasive tests which permit the doctors or specialists to diagnose injuries and diseases without being disturbing. Furthermore, computed tomography (CT), ultrasound, magnetic resonance imaging (MRI), and nuclear medical imaging like positron emission tomography (PET) are used in MI systems. Additionally, MI is the process of imaging the inner part of the body for medical interventions and clinical analysis. It is helpful to the doctors for identifying the disease in an early stage which leads to enhanced outcomes for the patients. Subsequently, AI denotes the machine or system which mimics human intelligence for performing tasks and iteratively enhances the performance based on the collected information. Furthermore, the use of AI in diagnostic MI is undergoing wide-ranging evaluation, and it indicates impressive sensitivity and accuracy in the identification and detection of abnormality images and improves tissue-based detection. In the medical field, AI is used for ML models to examine the medical data and also reveal insights for improving the health results of patients. Presently, the most public role of AI in MI is to support imaging analysis and provides clinical decisions. Thus, the clinical decisions offer the decisions related to mental health, treatment, medications, and so on. Consequently, AI is used in MI for analyzing X-ray, MRI, and CT scans for finding lesions. The BDA in MI is shown in Fig. 1. BDA is the data science also called Big Data (BD) that is rapidly increasing in every engineering and science field. It is the process of investigating and examining enormous and mixed data which are useful for making business decisions and organizations. Initially, Bio-Medical (BM) data are collected using certain IoT sensors and wearable devices are trained and tested for the system. Moreover, BD is needed in several organizations that are compacted with many organizations through huge quantities of precise information that involves useful information around difficulties like biology, cybersecurity, fraud detection, national intelligence, marketing, medical informatics, and astronomy. Numerous talented ML methods were used for BDA with depiction learning, DL kernel-based learning, transfer learning, distributed learning, parallel learning, and active learning. Additionally, BDA demands novel and sophisticated framework-based ML and DL methods for handling the data in a realtime application through high accuracy and high productivity. In this paper, several methods correlated to AI-based BDA in medical imaging are briefly described, and the function, algorithm, dataset used, advantages, and disadvantages are discussed. The search target of the review is to analyze the techniques and identify the problem for enhancing the medical imaging system. Also, examine the future scope of the MI-based AI using BDA. The main contributions of the research article are, • Analyze the medical imaging based on existing techniques for identifying the problem and how these apply to BDA. • Identify the advantages and disadvantages for examine the future scope of the models


Fig. 1 BDA in medical imaging (data images → preprocessing → ROI definition → feature extraction and feature selection → classification → diagnosis delivered to physicians)

• Medical imaging techniques using DL and ML were analyzed and discussed in this research. • Finally, the performance metrics of each paper were examined, and the future scope for enhancing the performance of MI in BDA was analyzed.

2 Literature Survey In this section, we describe the numerous techniques designed for medical imaging systems using AI and BDA, and detail the existing techniques' advantages, disadvantages, merits, demerits, publication years, and so on.


2.1 AI-Based BDA in Medical Imaging Patricia et al. [1] designed a coherent basic standard model for enhancing the communication of doctors with patients and examination subjects. Furthermore, the rising utilization of BD and AI strategies requests a reconsideration of these standards’ possible problems like epistemology, encompassing protection, informed assent, secrecy, data proprietorship, and imbalances. Generally, patients contain deep assessments of these problems. In that capacity, the local area of radiology pioneers, informatics, and ethicists includes the discussion about the suitable method for managing these problems and assist with driving the way in creating abilities. Moreover, AI applications are presently traversing numerous different fields from financial matters to amusement, to assembling, and medication. Issam et al. [2] summarize AI in radiological sciences, featuring, with models, its amazing accomplishments and impact on re-forming the act of MI and radiotherapy in the detection system, analysis, anticipation, and choice help. The advancement of data sciences used for data board, incorporation, mining, characterization, and sifting. Additionally, hypothesis strategies and replicas from AI were varying the healthcare scene in experimental and local area settings and also proactively shown talented outcomes in different applications in health care including, coordinated health data frameworks, patient schooling, geocoding health data, virtual entertainment analytics, plague and syndrome observation, versatile health, and MI. Additionally, David et al. [3] proposed a health intelligence framework that utilizes instruments and strategies from AI and data science for giving better bits of knowledge, decreasing burn-through time, and speeding up, administration efficiencies, level of exactness, and efficiency in medication and health care. Richardson et al. [4] design a large part of the new energy about AI from the medical writing has rotated around the capacity of AI techniques to perceive life structures and recognize pathology on medical pictures, at times and the degree of master doctors. In any case, AI is utilized to settle an extensive scope of non-interpretive issues which are pertinent to radiologists also their patients. Mobile health (m-health) is defined as the health monitoring system utilizing cell phones and patient checking gadgets. That is frequently considered a significant leap forward in innovation. As of late, AI and BDA were applied inside m-health to give a sustainable healthcare framework. Alotaibi et al. [5] proposed AI and the BDA technique to develop the m-health framework. Different AI-based systems of BD are designed for the source of data, methods utilized, and the applications examined. The designed model investigates the utilization of AI and BDA to offer bits of knowledge to the clients and empower them to utilize the assets, particularly for the difficulties in m-health. The introduced developed framework directs the improvement of methods involving the blend of AI and the BD to switch the care of m-health data. This publication presents a special issue on simulation and synthesis in MI. Jerry et al. [6] characterized up until these points ambiguous terms of re-enactment and blend in MI. Additionally, momentarily examine the synergistic significance of robotic and phenomenological replicas of MI generation. At last, present the 12 papers distributed in this problem casing together unthinking and phenomenological MI generation. 
Moreover, a rich choice of

Artificial Intelligence-Based Data Analytics Techniques in Medical … Medical sensor and wearable devices

347

Trigger alert

AI and BDA IoT data collection

Automated connectivity

Smart phone

EHR

Collect and record data

Action taken by doctors

Mobile care monitor

Process and analyze data

Trigger alert to the patient

Fig. 2 AI-based BDA in medical imaging

documents shelters applications in retinopathy, cardiology, oncology, neurosciences, and histopathology. Additionally, shields standard demonstrative MI modalities and finishes up the publication through an individual view of the field and features a few existing difficulties and future exploration open doors. Thus, the AI-based BDA in medical imaging is shown in Fig. 2. The exact division of MIs has an important stage in molding during radiotherapy arranging. Moreover, CT and MRI become the most involved radiographic procedures for analysis, clinical investigations, and treatment arranging. Aggarwal et al. [7] proposed subtleties of computerized division techniques, explicitly examined with regards to CT and MRI. AI depicts the utilization of computational strategies to impersonate human intelligence. Moreover, health care includes enormous medical datasets being utilized for anticipating a determination, distinguishing new illness genotypes or aggregates, or guiding therapy methodologies. Harmless imaging stays a foundation for the determination, risk definition, and nourishment of patients with cardiovascular sickness. Simultaneously, AI is worked with each phase of the imaging system, from obtaining and recreation to division, estimation, understanding, and resulting clinical pathways. In this paper, Andrew et al. [8] developed AI methods and their ongoing applications present in cardiovascular imaging and also examine the forthcoming job of AI. Furthermore, cardiovascular medication is prepared for adaptable AI applications that decipher immense measures of clinical and imaging data in more prominent profundity than at any time. Additionally, AI frameworks can further develop the work process and give reproducible and detached measurable outcomes that may illuminate clinical choices. Within a reasonable timeframe, AI

348

P. Balaji et al.

might work behind the scenes on heart pictures, examination software, and routine clinical announcing. Consequently, gathering data and empowering continuous analysis are explained. The summary of the comparison of AI-based BDA in medical imaging is detailed in Table 1. Moreover, Embase and PubMed MEDLINE databases are looked for distinguishing unique examination articles distributed between January 2018 to August 2018 that researched the exhibition of AI systems and investigate the medical pictures for giving symptomatic choices. Frequently, Wook et al. [9] are assessed for deciding on outside approval as opposed to inward approval, and in the event of outer approval, consequently, gathered the data for approval, with an indicative companion plan rather than a demonstrative case–control plan, from numerous foundations, and in an imminent way. Moreover, essential systemic elements are suggested for clinical approval of AI execution in true practice. Thus, the examinations of the designed model satisfied the above models have been distinguished. Table 1 Summary of AI-based BDA Year

Author

2019 Kim et al. [1]

Technique

Advantage

Disadvantage

Design AI system for diagnostic analysis in MI

• Improve diagnosis • High accuracy

• Error rate is high

• Detect the health status effectively • Visual content

• The delay rate is high • High error rate

2020 Alotaibi, Sultan Refa Applications of AI [2] and BDA in p-health 2018 Patricia, et al. [3]

Design AI and • Classify extensive • TPR is low predictive analytics networks because of error • Improve prediction performance

2020 El Naqa, Issam, et al. AI reshaping the [4] preparation of radiological sciences

• Identify reshaped • High cost image • High execution • High accuracy and time robustness • Overfitting problem

2018 Martin Michalowski, AI based • Exploring certain et al. [5] personalized health kinds of extracted and transforms features in the MI • Attaining better population prediction and automatic healthcare detection

• Data complexity • Less reliable and sensitivity

2021 Michael L. et al. [6]

Non-interpretive uses of AI in radiology

• High precision and • High error rate recall • Low TPR

2018 Frangi et al. [7]

MI simulation and synthesis

• 92.7% in ROC • Low cost

• Lack of security • High attack rate

Artificial Intelligence-Based Data Analytics Techniques in Medical …

349

The distributing diaries are grouped into medical and non-medical diary gatherings. Additionally, the outcomes were validated between medical and non-medical diaries. Of 516 qualified distributed examinations, just 6% (31 investigations) completed outer approval. From the 31 investigations, everyone contains three plan highlights: demonstrative associate plan, different establishment considerations, and planned data assortment for outer approval. No tremendous distinction was found between medical and non-medical diaries. Virtually, every one of the investigations distributed the review period which assessed the exhibition of AI system for analytic examination of medical pictures were planned in specialized attainability studies and didn’t contain any plan. The utilization of AI with MI examination is analytic grouping and understanding assignments, also numerous extra significant applications that support translation errands, quality, security, and functional effectiveness. Laurent et al. [10] design an effective execution of AI work process applications that is dependent upon a powerful and incorporated informatics framework.

2.2 AI-Based BDA in Medical Imaging Using Machine Learning Techniques The new transformations in advanced mechanics, AI, Internet of Things (IoT), wearable gadgets, ML, and BDA have uncovered capable conceivable outcomes in BM and healthcare innovations that might be reached out to the terahertz (THz) range. Moreover, Banerjee et al. [11] designed the current part to manage hypothetical, systemic, and observational ideas through broad models connected with the use of these advancements in THz health care. The use of THz in arising health care is apparent, through different striking applications such as security, material spectroscopy, MI, science, detecting, medication, drugs, and interchanges. Furthermore, telerobotic medical procedures and IoT-based robotics are talked about momentarily setting. Moreover, subtleties on different improvements in the IoT and accessible wearable, and brilliant gadgets focusing on BM and healthcare applications, that is equipped for data assortment and examination through standard conventions of machine intelligence to forecast the health connected problems. Finally, the extension and development of these advances regard the THz healthcare framework and BM examination. Banerjee et al. [12] proposed BDA for BM pictures that examine new strategies utilized for handling and offers. Moreover, contend to adjust and expand connected work methods in BD software, utilizing Hadoop and Spark systems. Furthermore, give an ideal and effective design for the BM picture examination. Additionally, the designed model accordingly gives a wide outline of BDA to mechanize the BM picture. Moreover, the work process of the designed model and framework for all progression is proposed. Two designs for picture arrangement are proposed. It utilizes the Hadoop structure to plan the first and the Spark system for the second. The proposed Spark engineering permits us to foster fitting and effective strategies to use an enormous number of pictures for grouping that is

350

P. Balaji et al.

redone concerning one another. The proposed structures are more finished, simpler, and versatile. The acquired Spark design is the absolute most complete because it works with the execution of calculations with its implanted libraries. AI is ready to have an authentic effect on the medication. Clinical choice help is the significant region where AI can expand the clinician’s capacity for gathering, comprehending, and making surmising on a mind-boggling capacity of patient data to arrive at the ideal clinical choice. Moreover, Faiq et al. [14] proposed advancements in MI analysis, like data computations and Radiomics, for example, ML that extended how we might interpret sickness processes and their administration. The designed model audits the most significant ideas of AI as material for cutting-edge imaging-based clinical choice emotionally supportive networks. In 2000, the industry examiner endeavored to portray BD as three like variability, velocity, and volume. Moreover, Amirhessam et al. [15] designed new advances of Hadoop that are presently possible for storing and utilizing incredibly enormous volumes of data which comes under phenomenal speed. The BDA is demonstrated and valuable in different fields, for example, sports, promotion, health care, science, genomic succession data, and MI. Finally, the designed model explains the outline of BDA in MI approaches by considering the significance of contemporary ML strategies like DL. Moreover, AI-based BDA using ML is shown in Fig. 3. The new upsets on the Internet of Things (IoT) and BDA have opened hopeful conceivable outcomes in BM and healthcare innovations. Amit et al. [16] proposed the headway of telerobotic medical procedure, and the significance of the Internet of Robotic Things (IoRT) was examined quickly. In conclusion, the last extension and

Segmented medical images

Hand crafted feature extraction

Predictive modeling

Prediction

Fig. 3 AI-based BDA in medical imaging using ML

Feature preprocessing and normalization

Feature selection

Artificial Intelligence-Based Data Analytics Techniques in Medical …

351

advancement of these innovations in the healthcare framework and BM exploration, beginning with customized drug planning for designating drug conveyance and past it, are portrayed. Amit et al. [17] designed an ML framework that distinguishes the best mix of the picture highlights to classify the picture or register some measurement for the given picture area. Few strategies are utilized, each with various qualities and shortcomings. The open-source variants of most of these ML techniques mark them humble to attempt and apply to pictures. A few measurements for estimating the presence of a system exist; in any case, one should know about the conceivably related traps which bring misdirecting measurements. Maria et al. [18] developed a sense of the essentials of ML as well as subfields of administered learning, unaided learning, support learning, and DL. The outline of momentum ML applications in rheumatology essentially regulated learning strategies for e-determination, illness recognition, and MI investigation. The gigantic outcome of ML systems at picture acknowledgment undertakings lately converges with a period of decisively expanded utilization of EHRs and demonstrative imaging. Justin et al. [19] developed the ML system for MI investigation, focusing on CNN, and underscoring clinical parts of the field. Tchito et al. [20] developed the characterization work process because of the optimal algorithm, ML, and DL that are drawn from the writing. Thus, the designed model extract venture during the grouping system is introduced and modified by the remaining strides of the proposed work process. Alhasan et al. [21] proposed the patterns of present-day healthcare innovations and AI during the COVID-19 emergency to characterize the ideas and clinical job of AI from the relief of COVID-19, research, and connect the adequacy of AI-empowered innovation in MI during COVID-19 as well as decide benefits, disadvantages, and difficulties of AI during COVID-19 pandemic. Thus, the designed model applied a precise survey approach involving a thought research convention. Computerized intercessions upgraded the reactions to COVID-19, amplified the job of MI during the COVID-19 emergency, and also presented healthcare professionals with the chance of contactless consideration. Moreover, Gandomi et al. [22] imagination in the laid designed model dwells in the strategies, surveys, and exploratory procedures which present an exceptional incentive for helpful applications. In any case, there is another clarification: functional applications need analysts, researchers, and architects for identifying answers for the BD issues predictably by flow advancements and respond to the requests from the distant future. To that end, the searchers should use and foster AI and ML strategies for a particular need. Some merits and demerits of AI-based BDA using ML techniques are discussed in Table 2.

352

P. Balaji et al.

Table 2 Summary of ML techniques in medical imaging

[11] Banerjee et al. (2020). Method: MI, AI, IoT in terahertz healthcare. Dataset: biomedical image dataset. Merits: detect patient abnormality; identify potential risk. Demerits: no creativity; implementation cost is high.
[12] Kouanou et al. (2018). Method: optimal workflow for BM image analysis. Dataset: biomedical image dataset. Merits: affected region detection; improve efficiency. Demerits: privacy concerns; bias and complexity.
[13] Hameed et al. (2021). Method: AI with ML and data science developments. Dataset: X-ray and CT images. Merits: high availability; minimize execution time. Demerits: make human lazy; less quality of data.
[14] Shaikh et al. (2021). Method: AI-based clinical decision support schemes by advanced MI and radiomics. Dataset: patient data. Merits: detect cardiovascular diseases; enhance tissue-based detection. Demerits: need human surveillance; security risk.
[15] Tahmassebi et al. (2019). Method: BDA in MI using ML. Dataset: biomedical data. Merits: better decision making; high accuracy; enhance sensitivity. Demerits: less availability; high error rate.
[16] Amit et al. (2021). Method: IoT and BDA for BM and health care technologies. Dataset: patient healthcare data. Merits: improve network; innovative; cost optimization; less security risk. Demerits: lack of commitment and patience; high cost.
[17] Erickson et al. (2017). Method: ML for MI. Dataset: BM data. Merits: error reduction; informed patient care; offer contextual relevance. Demerits: problem to detect disease; require more time.
[18] Maria et al. (2020). Method: applied ML and AI in rheumatology. Dataset: medical dataset. Merits: increased productivity; enhance classification; less execution time. Demerits: error rate is high; poor diagnosis performance.
[19] Justin et al. (2017). Method: ML applications in MI analysis. Dataset: EHRs. Merits: low cost; high sensitivity; high robustness; less error. Demerits: overlook social variables; possibility of inaccuracies.
[20] Tchito Tchapga et al. (2021). Method: ML-based BM image classification in a BD architecture. Dataset: BM image data. Merits: enhance efficiency; abnormality detection; identify injury and fracture. Demerits: high execution time; low sensitivity; misclassification problems.
[21] Alhasan et al. (2021). Method: digital imaging and AI applications during COVID-19. Dataset: COVID-19 CT images. Merits: high data quality; high availability. Demerits: overfitting; high noise and error.
[22] Gandomi et al. (2022). Method: ML technologies for BDA. Dataset: patient medical data. Merits: high accuracy and efficiency; better decision making. Demerits: poor segmentation and detection.

2.3 AI-Based BDA in Medical Imaging Using Deep Learning Techniques

AI systems, especially DL, have shown remarkable progress in image-recognition tasks. Hosny et al. [23] described how Convolutional Neural Networks (CNNs) and variants such as autoencoders have found myriad applications in the medical image analysis field, pushing it forward at a rapid pace. AI techniques have succeeded in automatically recognizing complex patterns in imaging data and in providing quantitative, rather than qualitative, assessments of radiographic characteristics. Practical implementation of such methodology requires sound data-handling practices, beginning with data collection and management and ending with the data analysis strategies. Kortesniemi et al. [24] emphasized that data quality control and validation are essential for DL applications to deliver reliable further analysis, classification, interpretation, and probabilistic and predictive modeling from heterogeneous big data; the challenges include analytics that address both cross-sectional and longitudinal perspectives. Kim et al. [25] noted that AI, especially DL models, is expected to further improve the diagnostic workflow. AI is particularly valuable for three main clinical tasks in breast ultrasonography: localization and segmentation, classification, and prediction. Their work provides an up-to-date overview of AI applications in breast ultrasonography, with a discussion of methodological considerations and the development of AI models. Chan et al. [26] examined the potential of applying DL-based MI analysis to Computer-Aided Diagnosis (CAD), thereby providing decision support to clinicians and improving the accuracy and efficiency of various diagnostic tasks. Notwithstanding the optimism of this new era of ML, the development and deployment of CAD or AI tools in clinical practice face many challenges. The work discusses some of these problems and the efforts needed to build robust DL-based CAD tools and to integrate them into the clinical workflow, thereby progressing toward the goal of providing reliable intelligent aids for patient care. Motivated by the recent success of applying DL methods to medical image processing, Guo et al. [27] designed an algorithmic architecture for supervised multimodal image analysis with cross-modality fusion at the classifier level, the feature-learning level, and the decision level; the model provides empirical guidance for the design and use of multimodal image analysis. The process of DL in MI is shown in Fig. 4. Wang et al. [28] provided an overview of current and potential applications of AI strategies in pathology image analysis, with an emphasis on lung cancer. They outlined the challenges of lung cancer pathology images, discussed recent DL advances that may affect computational pathology in lung cancer, and summarized current uses of DL systems in lung cancer diagnosis and prognosis. Tang et al. [29] observed that AI is rapidly moving from an exploratory phase toward implementation in many fields, including medicine; the combination of improved availability of large datasets, increasing computing power, and advances in learning algorithms has produced major performance breakthroughs in the development of AI applications.

Fig. 4 Deep learning model for medical imaging (medical data → data augmentation → data preprocessing → deep learning with CNN feature extraction → training and testing → prediction)
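As an illustration of the generic pipeline in Fig. 4, the following minimal sketch builds a small CNN classifier with TensorFlow/Keras. It is not taken from any of the surveyed works; the directory paths, image size, class count, and network depth are assumptions chosen only to make the example runnable.

```python
# Minimal sketch of the Fig. 4 pipeline: load medical images, augment them,
# train a small CNN, and predict. Paths, image size, and class count are
# illustrative assumptions, not values from the surveyed studies.
import tensorflow as tf

IMG_SIZE, NUM_CLASSES = (128, 128), 2  # assumed

# Data loading and preprocessing (expects one subfolder per class label)
train_ds = tf.keras.utils.image_dataset_from_directory(
    "medical_images/train", image_size=IMG_SIZE, batch_size=32)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "medical_images/val", image_size=IMG_SIZE, batch_size=32)

# Data augmentation and normalization, followed by CNN feature extraction
model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 255),
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Training and testing (validation), then prediction on new data
model.fit(train_ds, validation_data=val_ds, epochs=10)
predictions = model.predict(val_ds)
```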


Panagiotis et al. [30] presented the fundamentals of radiomics, including feature extraction, deep neural networks (DNNs) in image analysis, and the main interpretability techniques that help enable explainable AI. Disease detection from various imaging modalities has transformed the field of medicine over recent decades and continues to develop at a rapid pace, even though the outcomes and clinical adoption of methods for the computational and statistical analysis of images have lagged behind improvements in image acquisition. In particular, developers and computer scientists have observed that the information contained in MI, when viewed in image-based vector spaces, is large yet very sparse. Duncan et al. [31] discussed groundbreaking techniques that are now pervasive in the field; data-driven methods such as classification through image reconstruction and image analysis using DL are gaining momentum. With the enormous influx of multimodality data, the role of data analytics in health informatics is growing quickly. Ravì et al. [32] reviewed DL as a technique built on the foundation of artificial neural networks (ANNs), which has emerged recently as a powerful tool for ML and promises to reshape the future of AI. Rapid improvements in computational power, fast data storage, and parallelization have contributed to its rapid uptake, alongside its predictive power and capacity for automatic feature generation. The review centers on key applications of DL in translational bioinformatics, MI, pervasive sensing, medical informatics, and public health. The overall comparisons of the DL techniques in medical imaging are detailed in Table 3. Michael et al. [33] proposed six recommendations, called the 6Rs, to further the development of AI projects in the BM space, particularly clinical health care, and to improve communication between AI researchers and medical specialists: AI uses computers to emulate cognitive functions of the human mind, permitting inferences to be drawn from typically large datasets. Classical ML (e.g., SVM) and DL (e.g., CNN) are two commonly used AI methods both outside and inside the field of medicine; these methods have been used to evaluate MI for automated detection and segmentation, classification tasks, and image reconstruction. Dillman et al. [34] reviewed recent literature describing current and emerging AI strategies applied to abdominal imaging using MRI and CT and suggested potential future applications of AI in the pediatric population. Choi et al. [35] noted that AI performance has improved thanks to enhanced BD processing capability, improvements in computing power through parallel processing units, and new frameworks for deep neural networks, which are proving successful and drawing interest from many domains, including computer vision, speech recognition, and natural language processing. Recent investigations of this technology bode well for medical and healthcare applications, particularly endoscopic imaging, and the work provides perspectives on the history, developments, applications, and challenges of DL technology.


Table 3 Overall comparison of DL methods in medical imaging

1. Hosny et al. [23]. Technique: AI in radiology. Advantages: affected region detection; improve efficiency; more useful. Disadvantages: privacy concerns; bias and complexity.
2. Mika et al. [24]. Technique: BD and DL in MI and the medical physics profession. Advantages: high availability; less error; minimize execution time. Disadvantages: make human lazy; less quality of data.
3. Kim et al. [25]. Technique: AI in breast ultrasonography. Advantages: detect cardiovascular diseases; enhance tissue-based detection. Disadvantages: need human surveillance; security risk.
4. Chan et al. [26]. Technique: DL in MI analysis. Advantages: better decision making; high accuracy; enhance sensitivity. Disadvantages: less availability; high error rate.
5. Guo et al. [27]. Technique: DL-based image segmentation on multimodal MI. Advantages: better improvement in diagnosis; high accuracy. Disadvantages: error rate is high.
6. Wang et al. [28]. Technique: AI in lung cancer pathology image investigation. Advantages: detect the health status effectively; abnormality detection; identify injury and fracture. Disadvantages: the delay rate is high; high error rate.
7. Tang et al. [29]. Technique: Canadian Association of Radiologists white paper on AI in radiology. Advantages: identify wide connections and kernel sizes; enhance the performance of prediction. Disadvantages: TPR is low because of error.
8. Papadimitroulas et al. [30]. Technique: DL in oncology radiomics and data harmonisation. Advantages: identify reshaped image; high accuracy; high robustness. Disadvantages: high cost; high execution time; overfitting problem.
9. Duncan et al. [31]. Technique: BM imaging and analysis in BD and DL. Advantages: exploring certain kinds of extracted features in the MI; attaining better prediction. Disadvantages: data complexity; less reliable; less sensitivity.
10. Van Hartskamp et al. [33]. Technique: AI in clinical health care applications. Advantages: 92.7% in ROC; easy access; low cost. Disadvantages: lack of security; high attack rate.
11. Dillman et al. [34]. Technique: AI applications for pediatric abdominal imaging. Advantages: enhance segmentation results. Disadvantages: high complexity.
12. Choi et al. [35]. Technique: CNN technology in endoscopic imaging. Advantages: enhance efficiency; less execution time; reduce error. Disadvantages: high cost and vast data; low robustness; misclassification.

Many techniques have been developed to enhance the performance of MI with AI and BDA using ML and DL, but they still suffer from low efficiency, limited robustness, diagnostic difficulties, high cost, false detection, low detection accuracy, and residual error. Common issues across the existing techniques include high execution time, low data quality, problems in diagnosis, and high cost.

References

1. Balthazar P et al (2018) Protecting your patients' interests in the era of big data, artificial intelligence, and predictive analytics. J American College Radiol 15(3):580–586
2. El Naqa I et al (2020) Artificial intelligence: reshaping the practice of radiological sciences in the 21st century. British J Radiol 93(1106):20190855
3. Shaban-Nejad A, Michalowski M, Buckeridge DL (2018) Health intelligence: how artificial intelligence transforms population and personalized health. NPJ Digital Med 1(1):1–2
4. Richardson ML et al (2021) Noninterpretive uses of artificial intelligence in radiology. Acad Radiol 28(9):1225–1235
5. Alotaibi SR (2020) Applications of artificial intelligence and big data analytics in m-health: a healthcare system perspective. J Healthcare Eng 2020
6. Frangi AF, Tsaftaris SA, Prince JL (2018) Simulation and synthesis in medical imaging. IEEE Trans Med Imaging 37(3):673–679
7. Sharma N, Aggarwal LM (2010) Automated medical image segmentation techniques. J Med Phys/Assoc Med Phys India 35(1):3
8. Lin A et al (2020) Artificial intelligence: improving the efficiency of cardiovascular imaging. Expert Rev Med Devices 17(6):565–577
9. Kim DW et al (2019) Design characteristics of studies reporting the performance of artificial intelligence algorithms for diagnostic analysis of medical images: results from recently published papers. Korean J Radiol 20(3):405–410
10. Letourneau-Guillon L et al (2020) Artificial intelligence applications for workflow, process optimization and predictive analytics. Neuroimaging Clinics 30(4):e1–e15
11. Banerjee A, Chakraborty C, Rathi Sr M (2020) Medical imaging, artificial intelligence, Internet of things, wearable devices in terahertz healthcare technologies. Terahertz biomedical and healthcare technologies. Elsevier, 145–165
12. Kouanou AT et al (2018) An optimal big data workflow for biomedical image analysis. Inf Med Unlocked 11:68–74
13. Hameed BMZ et al (2021) Engineering and clinical use of artificial intelligence (AI) with machine learning and data science advancements: radiology leading the way for future. Ther Adv Urology 13:17562872211044880
14. Shaikh F et al (2021) Artificial intelligence-based clinical decision support systems using advanced medical imaging and radiomics. Curr Prob Diagn Radiol 50(2):262–267


15. Tahmassebi A et al (2019) Big data analytics in medical imaging using deep learning. Big data: learning, analytics, and applications, vol 10989. International Society for Optics and Photonics
16. Amit B et al (2020) Emerging trends in IoT and big data analytics for biomedical and health care technologies. Handbook of data science approaches for biomedical engineering. Academic Press, 121–152
17. Erickson BJ et al (2017) Machine learning for medical imaging. Radiographics 37(2):505–515
18. Hügle M et al (2020) Applied machine learning and artificial intelligence in rheumatology. Rheumatol Adv Pract 4(1):rkaa005
19. Ker J et al (2017) Deep learning applications in medical image analysis. IEEE Access 6:9375–9389
20. Tchito TC et al (2021) Biomedical image classification in a big data architecture using machine learning algorithms. J Healthc Eng 2021
21. Alhasan M, Hasaneen M (2021) Digital imaging, technologies and artificial intelligence applications during COVID-19 pandemic. Comput Med Imaging Graph 91:101933
22. Gandomi AH, Chen F, Abualigah L (2022) Machine learning technologies for big data analytics. Electronics 11(3):421
23. Hosny A et al (2018) Artificial intelligence in radiology. Nat Rev Cancer 18(8):500–510
24. Kortesniemi M et al (2018) The European Federation of Organisations for Medical Physics (EFOMP) white paper: big data and deep learning in medical imaging and in relation to medical physics profession. Physica Medica 56:90–93
25. Kim J et al (2021) Artificial intelligence in breast ultrasonography. Ultrasonography 40(2):183
26. Chan H-P et al (2020) Deep learning in medical image analysis. Deep Learning in Medical Image Analysis, 3–21
27. Guo Z et al (2019) Deep learning-based image segmentation on multimodal medical imaging. IEEE Trans Radiat Plasma Med Sci 3(2):162–169
28. Wang S et al (2019) Artificial intelligence in lung cancer pathology image analysis. Cancers 11(11):1673
29. Tang A et al (2018) Canadian Association of Radiologists white paper on artificial intelligence in radiology. Canadian Assoc Radiol J 69(2):120–135
30. Papadimitroulas P et al (2021) Artificial intelligence: deep learning in oncological radiomics and challenges of interpretability and data harmonization. Physica Medica 83:108–121
31. Duncan JS, Insana MF, Ayache N (2019) Biomedical imaging and analysis in the age of big data and deep learning [scanning the issue]. Proc IEEE 108(1):3–10
32. Ravì D et al (2016) Deep learning for health informatics. IEEE J Biomed Health Inf 21(1):4–21
33. Van Hartskamp M et al (2019) Artificial intelligence in clinical health care applications. Interact J Med Res 8(2):e12100
34. Dillman JR et al (2021) Current and emerging artificial intelligence applications for pediatric abdominal imaging. Pediatr Radiol, pp 1–10
35. Choi J et al (2020) Convolutional neural network technology in endoscopic imaging: artificial intelligence for endoscopy. Clin Endosc 53(2):117

Ensuring Data Protection Using Machine Learning Integrating with Blockchain Technology

Princy Diwan, Brijesh Khandelwal, and Bhupesh Kumar Dewangan

Abstract In recent years, blockchain technology has emerged as one of the most disruptive and trending technologies. The decentralized ledger underlying blockchain strengthens information security and confidentiality, and its consensus mechanism ensures that information is secured and legitimate. In this paper, we review the research on combining blockchain and machine learning technologies and show that they can collaborate efficiently and effectively. Machine learning is a general term that covers a range of methods, including classical machine learning, deep learning, and reinforcement learning; these methods are the core technology for big data analysis. As a distributed, append-only ledger system, a blockchain is a natural tool for sharing and handling massive data from varied sources through the incorporation of smart contracts. More specifically, blockchain can protect data security and promote data sharing. It also permits various parties to utilize distributed computing power, for instance from IoT devices, for developing on-time prediction models from varied sources of data. Blockchain systems can also generate large amounts of useful data from different sources, so interdisciplinary research on combining the two technologies has great potential, and the combination of machine learning and blockchain can give highly precise results.

Keywords Blockchain · Machine learning · Smart contract · Data analytics

P. Diwan (B) · B. Khandelwal Department of Computer Science and Engineering, Amity School of Engineering, Amity University, Raipur, India e-mail: [email protected] P. Diwan · B. K. Dewangan Department of Computer Science and Engineering, School of Engineering, O. P. Jindal University, Raigarh 496109, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_27


1 Introduction

Data security is a major concern today, so we propose an architecture to protect healthcare data. Because raw healthcare data requires strong privacy and security, blockchain technology is a good choice for data preservation. A blockchain is an ever-growing, peer-to-peer-connected chain of immutable blocks that are secured cryptographically. Blockchain was introduced by Satoshi Nakamoto [1] in 2008 as the public transaction ledger of the Bitcoin cryptocurrency. By providing a trustable mechanism without relying on a trusted organization, blockchain addressed the Byzantine Generals' problem [2]. The double-spending problem [3, 4] was also resolved in a purely peer-to-peer distributed network with no financial institutions involved. The network timestamps transactions with SHA-256 [5], creating a record that cannot be updated without redoing the Proof of Work (i.e., mining), which requires large amounts of computing power [2, 6]. The longest chain of solved challenges proves both the order of the transactions that have taken place and that they came from the largest pool of computing power. It becomes ever more difficult to attack the blockchain system without overpowering the rest of the network, which is not realistic in practice, as more computing power is added to the blockchain every day. People consequently regard data stored in the blockchain as trustworthy and immutable. To solve trust and data security issues, this study suggests that machine learning algorithms should employ the immutable data supplied by blockchain [2, 6]. The Ethereum blockchain supports a fully functional, Turing-complete [6] programming language that can be used to create so-called smart contracts. Smart contracts are automated programs that enable users to build and run complicated structures on top of the Ethereum blockchain by encoding arbitrary state-transition functions. Ethereum has opened the door to the great efforts that the blockchain ecosystem has witnessed. The rest of the paper is organized as follows: Sect. 1.1 covers the background technology behind the model, Sect. 2 covers the methodology, Sects. 2.1 and 2.2 describe smart contracts and the collaboration of machine learning and blockchain technology, Sect. 3 covers the implementation, Sect. 4 is divided into two sub-Sects. 4.1 and 4.2, which cover the advantages of the proposed architecture over existing approaches and the remaining challenges, and Sect. 5 concludes the paper.

1.1 Background

Privacy preservation was incomplete before blockchains. Blockchain technology unifies user identities, which are otherwise dispersed among user-managed identity providers. Identity solutions are therefore undergoing a paradigm change in which users decide who is told (from the user's perspective) about their sensitive personal information, rather than trusting identity providers to handle their personal


Fig. 1 Basic structure of blockchain technology (each block holds the current block hash, the previous block hash, data, a nonce, and a timestamp, linking the blocks into a chain)

data. Furthermore, selective disclosure of sensitive data is made possible by the implementation of zero-knowledge proofs on blockchains [7, 8]. Blockchain-based identity management systems eliminate unauthorized disclosure to third parties and have desirable properties, including immutability, neutrality, and secure timestamping, that can be used to establish trust. Liu et al. [9] give an example of how to build up trust and credibility using personalized Ethereum tokens. These decentralized and distributed schemes, however, still face many difficulties in creating a reputation framework or feedback mechanism for aggregating trust relationships between parties, including subjects and service providers. Such systems need to be restructured in a decentralized fashion as separately deployed decentralized applications, known as DApps. Blockchain gives us the flexibility to build a scalable distributed system in a trust-free environment; however, the output of blockchain systems still needs to be evaluated and quantified. With blockchain continually heating up in academia, several researchers have proposed evaluation frameworks for blockchain systems. Gervais et al. [10, 11] examined Proof of Work-based blockchain systems with varying operating parameters, such as Bitcoin, Litecoin, and Ethereum. Dinh et al. [12] proposed an assessment framework for private blockchain systems, in which they analyze the consensus, data model, execution, and application layers of blockchain systems [11]. Structurally, a blockchain is a secured linkage of nodes in which each block stores its own hash value together with the hash value of the previous block. Each block in the blockchain also contains a timestamp, a difficulty value, and a Merkle root, as shown in Fig. 1.
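To make the block structure of Fig. 1 concrete, the following minimal Python sketch links blocks by SHA-256 hashes and runs a toy Proof-of-Work search for a nonce. It is an illustration only, not the authors' implementation; the field layout and the difficulty target are assumptions for demonstration.

```python
# Minimal sketch of the block structure in Fig. 1: each block stores its own
# hash, the previous block's hash, data, a nonce, and a timestamp.
# The difficulty target and field layout are illustrative assumptions.
import hashlib
import json
import time

def block_hash(block: dict) -> str:
    """SHA-256 over the block contents, excluding the stored hash itself."""
    payload = {k: v for k, v in block.items() if k != "current_hash"}
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

def mine_block(previous_hash: str, data: str, difficulty: int = 4) -> dict:
    """Toy Proof of Work: find a nonce so the hash starts with `difficulty` zeros."""
    block = {"previous_hash": previous_hash, "data": data,
             "timestamp": time.time(), "nonce": 0}
    while True:
        h = block_hash(block)
        if h.startswith("0" * difficulty):
            block["current_hash"] = h
            return block
        block["nonce"] += 1

def is_valid_chain(chain: list) -> bool:
    """Re-check the hash links; tampering with any block breaks the chain."""
    for prev, curr in zip(chain, chain[1:]):
        if curr["previous_hash"] != prev["current_hash"]:
            return False
        if block_hash(curr) != curr["current_hash"]:
            return False
    return True

# Usage: build a small chain like the one sketched in Fig. 1.
genesis = mine_block("0" * 64, "genesis")
chain = [genesis, mine_block(genesis["current_hash"], "patient record hash #1")]
chain.append(mine_block(chain[-1]["current_hash"], "patient record hash #2"))
print(is_valid_chain(chain))  # True
```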

2 Material and Methodology

The proposed architecture is a combination of machine learning and blockchain technology. Here, we want to combine the security aspects of blockchain into the basic

Fig. 2 Flowchart for proposed architecture (healthcare data → data mining using regression trees → raw data set → basic blockchain model → deep learning-based training data → machine learning model → advanced blockchain model)

machine learning model. We aim to protect both the trained dataset of the machine learning model and the raw data that is going to be trained. A flowchart of the techniques implemented in the proposed architecture is shown in Fig. 2.

2.1 Smart Contract and Security

A smart contract removes the need for a trusted party by faithfully carrying out the message dissemination, money transfers, and computation specified in a given contract, though without a guarantee of privacy. It is replicated transparently across the Internet and thus made publicly available on the blockchain. Despite the appeal of smart-contract technology, it is still in its infancy for many of the more exciting decentralized applications. One major challenge at present is that executing a smart contract imposes a computational burden on every verifying node. If smart-contract results are to be recorded during blockchain mining, where they form the latest block output and may also affect the validity of future blocks, honest miners must run the contract code to verify the correctness of the outcome. If this program is computationally intensive, rational adversarial nodes may either skip these verification steps or omit the result and propose new blocks anyway. Because honest miners are required to execute this software during mining in order to produce the latest block, adversarial nodes gain an advantage in the race to propose new blocks, since honest nodes cannot propose blocks before the smart contract has been fully executed; this is the verifier's dilemma [13]. A further fundamental problem is that randomized computation cannot be permitted in smart contracts: with randomization, even honest nodes running the same software on the same input do not produce matching outputs, which defeats a core goal of blockchain, namely confirming agreement among all honest nodes. These smart-contract limits seriously hinder large-scale use cases, especially where machine learning programs are applied. For example, stochastic gradient descent [14] is both computationally costly and inherently randomized, so simply allowing coded machine learning to auto-execute in a decentralized implementation is not enough; instead, start-ups typically outsource their machine learning tasks [14].
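The determinism issue can be illustrated with a small Python sketch (a toy example, not part of the authors' system): two honest nodes re-executing the same randomized training step on the same input reach different results, so they cannot agree on a single output to record on-chain unless the randomness is made deterministic, for instance by fixing a shared seed.

```python
# Toy illustration of why randomized computation breaks smart-contract
# consensus: two honest nodes run the same stochastic update on the same
# input but disagree, unless the randomness is seeded deterministically.
import random

def sgd_step(weights, data, lr=0.1, seed=None):
    """One stochastic gradient-style update on a randomly sampled point."""
    rng = random.Random(seed)           # seed=None -> node-local randomness
    x, y = rng.choice(data)             # random sample: source of divergence
    pred = weights[0] * x + weights[1]
    err = pred - y
    return (weights[0] - lr * err * x, weights[1] - lr * err)

data = [(1.0, 2.0), (2.0, 3.9), (3.0, 6.1), (4.0, 8.2), (5.0, 9.8), (6.0, 12.1)]
w0 = (0.0, 0.0)

node_a = sgd_step(w0, data)             # unseeded: nodes usually diverge
node_b = sgd_step(w0, data)
print(node_a == node_b)                 # often False -> no consensus

node_a = sgd_step(w0, data, seed=42)    # shared seed: deterministic
node_b = sgd_step(w0, data, seed=42)
print(node_a == node_b)                 # True -> nodes agree
```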

2.2 The Collaboration of Machine Learning and Blockchain Technology

Nowadays, machine learning is very common, and people use it tens of thousands of times a day without knowing it. Machine learning concerns the ability of computers to research, reason, and act without human interference, and it is one branch of artificial intelligence (AI). Machine learning gives computers the ability to learn without being explicitly programmed; the main goal is to produce effective algorithms that accept data, make predictions, and use statistical analysis to update their outputs. Machine learning can also evaluate a significant amount of information to make data-driven decisions [7]. A highly communicative network of blockchain-based smart applications requires layer-specific controls for security issues. At the network level, security issues such as malicious packets can occur, along with system-layer issues such as malware [15]. Malicious packets at the network layer aim to place a false consensus on the network. A firewall can naively address this problem by ensuring that packets satisfy predefined security requirements [16]; however, attacks with previously unseen patterns gradually become advanced enough to evade a firewall. To resolve the problem, packet header data is analyzed in real time against historical data using machine learning models [17], which allows new patterns to be found and the models to be adapted. Machine learning methods are also most frequently used to classify malware at endpoints such as servers, cell phones, or workstations. In addition, several blockchain applications, such as UAVs [18] and trust building between data exchangers in smart grids (SG) [19], have been implemented as blockchain-based smart applications. At the same time, information protection is crucial in any intelligent application. Blockchain technology offers data protection, while machine learning techniques can predict untrustworthy nodes from past behavior. Compared with a regular blockchain setup [18], UAVs have substantially different configuration values, involving satellite communication and separate ground stations; UAVs use blockchain technology to securely store related data and protect the vehicle integrity graph. The proposed model addresses the latest research on the adoption of machine learning in the blockchain in the following pages. Machine learning models are very much in demand, and businesses with access to good machine learning models benefit from increased performance and new capabilities. Since this form of technology is in such demand and talent resources are scarce, it makes sense for machine learning models to be put on the market. Since machine learning is just software and training, and interaction with physical structures is not needed, it is natural to use blockchain for communication between users and for payment with cryptocurrency.
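As a toy illustration of the packet-header analysis mentioned above, the following Python sketch trains a classifier on historical header records and applies it to a new packet in real time. The feature set, thresholds, and data are invented for demonstration; this is not the firewall or the model from [16, 17].

```python
# Toy packet-header classifier: learn from historical header records and
# flag suspicious new packets. Features and data are illustrative only.
from sklearn.ensemble import RandomForestClassifier

# Historical records: [packet_size, ttl, dst_port, syn_flag]; label 1 = malicious
X_hist = [
    [1500, 64, 443, 0], [60, 255, 23, 1], [800, 64, 80, 0],
    [40, 250, 3389, 1], [1200, 64, 443, 0], [52, 249, 22, 1],
]
y_hist = [0, 1, 0, 1, 0, 1]

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_hist, y_hist)

# Real-time check on an incoming packet's header fields.
incoming = [[48, 252, 23, 1]]
print("malicious" if model.predict(incoming)[0] == 1 else "benign")
```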

3 Implementation

The proposed architecture is designed using two blockchain models: the first is a basic blockchain model, and the other is an integrated blockchain, which incorporates a smart contract for greater security and preservation. Integrating blockchain with machine learning applies machine learning capabilities to blockchain-based applications to make them more intelligent, and the protection of the distributed ledger is also enhanced by machine learning. Additionally, machine learning can reduce the time needed to reach consensus by creating better routes for data sharing. We have proposed a basic architecture for machine learning-based adoption in the blockchain, shown in Fig. 3. Here, the raw data is collected using various data collection methods; preprocessing is then performed in the second phase (data analytics and real-time analytics); the preprocessed data is securely stored in a basic blockchain model (which prevents data loss); after that, only the required data goes through the learning process of the machine learning model. The result of the model (the trained dataset) is also stored in an integrated blockchain model. We use the term 'integrated' because the final blockchain model is integrated with the smart contract already discussed in this paper. The intelligent framework frequently gathers data from a variety of sources, including sensors, smart devices, and IoT devices, for use in various smart applications. Data gathered by the devices is processed as part of the intelligent applications, with the blockchain as a core component. Machine learning can then be used to interpret and model the data of these applications (data mining and real-time analytics). Machine learning models can also store knowledge and datasets on the blockchain-based network, which eliminates data errors such as replication, missing data, mistakes, and noise. Because blockchains concentrate on the knowledge itself, problems relating to the machine learning models are kept separate. Instead of the entire dataset, machine learning models can also be served by particular chain segments. Custom models, such as fraud prevention and fraud detection, could be used for different applications.
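The data flow just described can be sketched in a few lines of Python. This is an illustration under simplified, assumed components: the hash-linked chain helper and the scikit-learn regression tree are stand-ins chosen for the example, not the authors' implementation. Raw and preprocessed data are committed to a basic chain, a model is trained, and the trained model's fingerprint is committed to the integrated chain.

```python
# Illustrative data flow for the proposed architecture (assumed components):
# raw data -> preprocessing -> basic blockchain -> ML training ->
# integrated blockchain holding the trained-model fingerprint.
import hashlib
import pickle

from sklearn.tree import DecisionTreeRegressor  # stand-in "regression tree"

def fingerprint(obj) -> str:
    """Content hash used to anchor datasets and models on a chain."""
    return hashlib.sha256(pickle.dumps(obj)).hexdigest()

def append_block(chain: list, payload_hash: str) -> None:
    """Append a hash-linked block: (prev_hash, payload_hash, block_hash)."""
    prev_hash = chain[-1][2] if chain else "0" * 64
    block_hash = hashlib.sha256((prev_hash + payload_hash).encode()).hexdigest()
    chain.append((prev_hash, payload_hash, block_hash))

# 1. Raw healthcare-style records (toy values) and simple preprocessing.
raw_records = [{"age": 63, "bp": 145, "risk": 1.0},
               {"age": 40, "bp": 120, "risk": 0.2},
               {"age": 55, "bp": 138, "risk": 0.7}]
X = [[r["age"], r["bp"]] for r in raw_records]
y = [r["risk"] for r in raw_records]

# 2. Commit the raw and preprocessed data to the basic chain (prevents loss).
basic_chain: list = []
append_block(basic_chain, fingerprint(raw_records))
append_block(basic_chain, fingerprint((X, y)))

# 3. Train the ML model on the committed data.
model = DecisionTreeRegressor(max_depth=3).fit(X, y)

# 4. Commit the trained model's fingerprint to the integrated chain, where a
#    smart contract would govern access in a real deployment.
integrated_chain: list = []
append_block(integrated_chain, fingerprint(model))

print(len(basic_chain), len(integrated_chain))  # 2 1
```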

Fig. 3 Proposed blockchain-based smart adoption system architecture (data analytics and real-time analytics → blockchain → machine learning model → integrated blockchain with smart contract)

4 Results and Outcomes

In present scenarios, data security is the biggest concern, and we have proposed an architecture for data security. The proposed architecture secures data for the machine learning model as well as data from the smart applications that work with the same machine learning model. Three types of data are secured: first, the raw dataset that is going to be trained in the machine learning model; second, the trained dataset that is going to be used in various smart applications; and last, the data generated by smart applications that use the same machine learning model, as shown in Fig. 4. Apart from data security, further benefits arise when using blockchain with machine learning, such as fraud detection.

4.1 Advantages of Proposed Architecture Over Existing Approaches

Existing approaches focus on preserving only trained datasets, but our proposed architecture also preserves raw datasets, which results in less data loss. Authenticating a user as legitimate for requesting or carrying out any transaction in the blockchain network is one of the fundamental benefits of

Fig. 4 Types of data secured using the proposed architecture (raw data set, trained data set, and data generated by smart applications)


blockchain with machine learning technology, because blockchain offers a high level of data and transaction protection and confidence. Blockchain can embed publicly trained machine learning templates in smart contracts to check that the previously accepted terms and conditions are met. Blockchain also helps to enforce an incentive scheme in a trustworthy way, which encourages users to provide information; these wider datasets help boost the efficiency of the machine learning model. In the peer-to-peer ecosystem of blockchain, off-chain versions can be streamlined locally on a single computer without extra cost. Users can contribute valuable data, these contributions are continuously measured, and users can receive rewards. Non-editable smart contracts can be tested by various tools with different hardware configurations, and machine learning models are not cut off from their potential and deliver results exactly as they should. Payments are handled on the blockchain environment in real time with confidence.

4.2 Challenges to Machine Learning and Blockchain

Every technique has its own pros and cons. We have already discussed the advantages of combining blockchain with machine learning, but some basic challenges also arise from combining these technologies. The first is storage: machine learning algorithms perform better with larger datasets, yet the growth of data on blockchain platforms degrades their performance; in other words, blockchain technology does not cope well with large datasets. Most technical processes and transactions can produce a significant volume of data that requires additional time for training and computation, thus increasing the latency of traditional machine learning models. Machine learning and blockchain also both face scalability problems in terms of computing and communication costs: many machine learning algorithms need more data, which increases processing and communication costs, and likewise blockchain scales poorly as the number of users and network nodes grows, because the number of transactions increases as well. Last but not least, this architecture faces vulnerabilities, and there are open problems in combining machine learning and blockchain (BC) to improve safety and privacy, given the rising number of risks, malware, malicious code, and so on. Training a machine learning model takes time, and malicious traffic can only be recognized once a qualified model exists. On the other hand, blockchain can ensure the immutability of data and trace its transformations, but it cannot help with data that is corrupted before it reaches the blockchain. Apart from the above, BC is vulnerable to privacy-evasion concerns, because the stored data is open to the public and readable by everyone. The use of a private BC is one solution to these challenges, but it would restrict access to the large amounts of data needed for efficient machine learning.


5 Conclusion

Blockchain and machine learning are among the most pioneering recent developments. Numerous applications, including smart cities, UAVs, SG, and data trading, will be supported by the distributed ledger [7]. In this study, we have discussed blockchain and machine learning and how they are used in smart applications. The proposed architecture can be used to design a machine learning and blockchain-based data analysis system, and we conclude that it offers a better approach to data security. With the help of this architecture, we can preserve both the final data, namely the trained dataset, and the preprocessed data, which provides additional data security and prevents data loss. We have integrated our blockchain model with a smart contract, which can be designed according to the requirements of the existing technology. Several benefits of the proposed architecture over existing models have been highlighted as outcomes, and certain fundamental research issues that may arise when machine learning is used in blockchain-based systems have also been examined and require solutions.

References

1. Nakamoto S (2019) Bitcoin: a peer-to-peer electronic cash system. Manubot
2. Lamport L, Robert S, Marshall P (2019) The Byzantine generals problem. In: Concurrency: the works of Leslie Lamport, pp 203–226
3. Osipkov I, Eugene YV, Nicholas H, Yongdae K (2007) Combating double-spending using cooperative P2P systems. In: 27th international conference on distributed computing systems (ICDCS'07), pp 41–41. IEEE
4. Hoepman J (2007) Distributed double spending prevention. In: International workshop on security protocols, pp 152–165. Springer, Berlin, Heidelberg
5. Gilbert H, Helena H (2003) Security analysis of SHA-256 and sisters. In: International workshop on selected areas in cryptography, pp 175–193. Springer, Berlin, Heidelberg
6. Hodges A (2012) Alan Turing: the enigma. Random House
7. Tanwar S, Qasim B, Pruthvi P, Aparna K, Pradeep KS, Wei-Chiang H (2019) Machine learning adoption in blockchain-based smart applications: the challenges, and a way forward. IEEE Access 8:474–488
8. Hardjono T, Alex P (2019) Verifiable anonymous identities and access control in permissioned blockchains. arXiv preprint arXiv:1903.04584
9. Liu Y, Zheng Z, Guibing G, Xingwei W, Zhenhua T, Shuang W (2017) An identity management system based on blockchain. In: 2017 15th annual conference on privacy, security, and trust (PST), pp 44–4409. IEEE
10. Gervais A, Ghassan OK, Karl W, Vasileios G, Hubert R, Srdjan C (2016) On the security and performance of proof of work blockchains. In: Proceedings of 2016 ACM SIGSAC conference on computer and communications security, pp 3–16
11. Lu Y, Qiang T, Guiling W (2018) On enabling machine learning tasks atop public blockchains: a crowdsourcing approach. In: 2018 IEEE international conference on data mining workshops (ICDMW), pp 81–88. IEEE
12. Dinh TT, Ji W, Gang C, Rui L, Beng CO, Kian-LT (2017) Blockbench: a framework for analyzing private blockchains. In: Proceedings of the 2017 ACM international conference on management of data, pp 1085–1100


13. Luu L, Jason T, Raghav K, Prateek S (2015) Demystifying incentives in the consensus computer. In: Proceedings of 22nd ACM SIGSAC conference on computer and communications security, pp 706–719
14. LeCun Y, Yoshua B, Geoffrey H (2015) Deep learning. Nature 521(7553):436–444
15. Namanya AP, Andrea C, Irfan UA, Jules PD (2018) The world of malware: an overview. In: 2018 IEEE 6th international conference on future internet of things and cloud (FiCloud), pp 420–427
16. Krit S, Elbachir H (2017) Overview of firewalls: types and policies: managing windows embedded firewall programmatically. In: 2017 international conference on engineering & MIS (ICEMIS), pp 1–7
17. Betarte G, Eduardo G, Rodrigo M, Alvaro P (2018) Improving web application firewalls through anomaly detection. In: 17th IEEE international conference on machine learning and applications (ICMLA), pp 779–784. IEEE
18. Kuzmin A, Evgeny Z (2018) Blockchain-based structures for a secure and operate network of semi-autonomous unmanned aerial vehicles. In: 2018 IEEE international conference on service operations and logistics, and informatics (SOLI), pp 32–37. IEEE
19. Pop C, Tudor C, Marcel A, Ionut A, Ioan S, Massimo B (2018) Blockchain based decentralized management of demand response programs in smart energy grids. Sensors 18(1):162

Evaluation and Language Training of Multinational Enterprises Employees by Deep Learning in Cloud Manufacturing Resources

Arodh Lal Karn, Julian L. Webber, Abolfazl Mehbodniya, D. Stalin David, Balu Subramaniam, Rajasekar Rangasamy, and Sudhakar Sengan

Abstract Alongside other growth trends and the sustainable development of the globalized economy, low manufacturing utilization rates, production resource imbalances, and impaired coordination functions are growing problems. To tackle these difficulties, organizations must determine how to share remote and heterogeneous production resources securely. Previous work provided a network-based language learning framework for cloud-hosted spoken-language learning. The application offers single computer users and

A. L. Karn Department of Financial and Actuarial Mathematics, School of Mathematics and Physics, Xi’an Jiaotong-Liverpool University, Suzhou 215123, Jiangsu, China e-mail: [email protected] J. L. Webber · A. Mehbodniya Department of Electronics and Communication Engineering, Kuwait College of Science And Technology (KCST), Kuwait, Kuwait e-mail: [email protected] A. Mehbodniya e-mail: [email protected] D. Stalin David Department of Information Technology, Vel Tech Multi Tech Dr. Rangarajan Dr.Sakunthala Engineering College, Chennai, Tamil Nadu 600072, India e-mail: [email protected] B. Subramaniam Department of Computer Science and Engineering, Paavai Engineering College, Namakkal, Tamil Nadu 637018, India e-mail: [email protected] R. Rangasamy Department of Computer Science and Engineering, GITAM School of Technology, GITAM University, Bengaluru, Karnataka 561203, India e-mail: [email protected] S. Sengan (B) Department of Computer Science and Engineering, PSN College of Engineering and Technology, Tirunelveli, Tamil Nadu 627152, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_28


Internet users economical and easy oral learning and large-scale oral assessment technologies. To select Cloud Manufacturing (CM) resources that fulfill a service production enterprise's requirements, five first-level metrics (time, cost, availability, reliability, security) and thirteen second-level metrics are used. Deep Learning (DL) is utilized to calculate the index weights, the final total score, and the membership degree of each indicator at every level; the membership degree determines it and quantifies Cloud Manufacturing Resource Selection (CMRS). This model shows that manufacturing RS can involve DL. Finally, it is shown to be feasible to select production resources using a fuzzy comprehensive assessment.

Keywords Deep learning · Cloud manufacturing · Service evaluation · English training · Employees of multinational enterprises

1 Introduction

This work is committed to a training technique focused on English instruction in professional educational institutions. In this technique, trainees must prepare and train in English according to their interest in acquiring demanding professional activities and knowledge. English learning helps learners understand English as a training method focused on communication for professional practice and for employees' professional activities. Figure 1, shown as a direct English framework, reflects the latest thinking in language learning. Part of a language course must be taught concisely; a learner can learn to say a great deal in a relatively short time and yet understand little [1]. Before that, all students require a few years of exposure to another language. It is recognized that employees of multinational organizations expand their skills, knowledge, and abilities through daily work. Nonetheless, when learning outside the conventional learning framework, fully comprehending the information that has been given does not guarantee being appropriately assessed.

Fig. 1 Flow diagram of ET


For students whose English level is low, the course offers simple additional exercises; given limited teaching time, teachers can adapt the conventional course in their own way to try to achieve better outcomes. This kind of personal-computer technology, combined with a variety of face-to-face activities, enriches students' vocabulary and informal usage, and can monitor and record sessions, analyze performance, and summarize the learning data. For students with weak oral English and comprehension skills, short-term language courses, language clubs, groups, chat rooms, meetings, lectures, and similar simple learning activities help motivate them to improve their language and support learning outcomes [2]. Consequently, after graduating from school, having people to talk with helps them grow into better professionals. However, it is debatable whether informal training is as effective as formal training, because it represents what typically happens in the lives of many individuals, whereas formal training usually delivers measurable achievement and positive outcomes. It is therefore important to improve non-formal education; attention is required from all the educational institutions where formal training takes place. This work is committed to a training technique focused on English instruction in professional educational institutions. In this technique (Fig. 2), trainees must undertake foreign-language training based on their interest in acquiring demanding professional activities and knowledge. It is a practically oriented approach to assistance in a foreign language; English is mastered as the medium of professional practice and communication for the students' professional activity [3]. The rest of the paper is organized as follows: Sect. 1 introduces how English learning supports English as a training method focused on communication for professional practice and employees' professional activities; Sect. 2 covers related works; Sect. 3 describes English Training (ET) and the solutions offered to employees of multinational enterprise companies through on-demand, pay-per-use, or mixed-delivery services; Sect. 4 presents results showing that the recently proposed technique, a specific procedure for improving on the benchmark, results from a joint training strategy of self-discipline and a secondary view; and Sect. 5 summarizes how this article's CM platform selection converts the conceptual AI-CM assets into a specific quantitative score.

Fig. 2 RS evaluation and ET

2 Related Works

To address the optimization problem of CMRS, an optimization model has been established that completes the task at the lowest cost and in the shortest time; a CMRS approach combining the bat algorithm with cellular automata has also been proposed. Considering the hot topics of current research, standards, and network organization techniques for manufacturing resource discovery, a modern manufacturing model for the CM industry, combined with a method for realizing the discovery mechanism of cloud production resources, has been proposed. First, a general four-layer framework and its classification characteristics are given; for CMRS in a CM environment, manufacturing resources are analyzed with OWL-S semantics for cloud production, and, combined with Semantic Web Services (SWS), a framework based on SWS discovery and production-resource matching using a similarity algorithm has been proposed. In the CM environment, an assortment of manufacturing resources is coordinated through a network, giving clients excellent, accessible, protected, and dependable service; resource allocation in the CM environment matches manufacturing tasks with resource providers. With independent third parties, service orientation, efficient and low-power distributed computing, the IoT, and the advancement of CM, a new networked model for large-scale collaborative production of complex manufacturing has emerged. Static and dynamic performance are defined as two attributes of manufacturing resources and cloud tools in the CM framework; intelligent production equipment is essential to clarify these two, together with the proposed authentication technique for the device, the data collection strategy, and a data transmission protocol. Cloud manufacturing has become the hallmark of the new networked manufacturing service model, and the key to this production technique is exploiting the variety of available production resources and capabilities to the fullest. Based on an analysis of the type and architecture of manufacturing resources, a concrete strategy is arranged to convey the concept together with its proxy model in cloud computing. To migrate resources to the CM, the tight coupling between physical production resources and application services must be resolved. To help build an environment for virtual CM services, a study of the general manufacturing industry examines the characteristics of a support network, and a hierarchical virtual model of manufacturing resources has been proposed for the manufacturing environment. The design and production of a cross-regional distribution model for CM shortens the manufacturing cycle, further improves procurement, and improves the availability of manufacturing design capacity; the significance of such research is that it reduces the cost for an enterprise. Recently, well-designed ontologies, which eliminate semantic heterogeneity between services, have been applied in a unified architecture for the service model's resources [4]. However, the effectiveness of an ontology-based model depends on experts' skill in ontology engineering, and it is hard to capture the cloud's dynamic changes after the ontology is embedded. Cloud computing and the IoT are the buzzwords after Web 2.0: cloud computing, distributed processing, and a large-scale model for delivering applications as services over the Internet are ongoing trends in Information Technology (IT), while the IoT, the Internet, and the future network of the Internet refer to a dynamic global network infrastructure that integrates physical and virtual 'things' seamlessly into the information network [5]. E-learning is a common business education program around the world; English e-learning can improve students' motivation and reduce learning costs. One study investigated whether learning English through an online website works for hardware companies in Taiwan, examining e-learning education and English training courses from the perspective of employee satisfaction; subjects with task experience and knowledge of the subject evaluated the courses more highly than those without, and further analysis showed that positions requiring production skills significantly affect the subject's self-assessment. In traditional industrial production, new employees are trained face-to-face by experienced engineers; with the rapid development of IT, e-learning has become an essential alternative for education, and an e-learning system designed along these lines has been implemented. Another line of work describes a framework for using analytics to respond to active employees' voluntary turnover: the withdrawal of critical employees means loss of productivity, delays, or missed deadlines [6]. Since significant losses due to replacement costs can occur, retaining excellent service matters to the organization; by actively identifying the talent at highest risk of voluntary departure, the organization can take appropriate action in time, thereby avoiding financial and knowledge loss. With the advancement of distributed computing, manufacturing is moving from conventional setups increasingly toward the cloud [7]. Notable features of cloud manufacturing are that the resources specified for particular manufacturing services are consistently complex and heterogeneous; cloud manufacturing has become a collaborative production process across the network, in which ultra-large-scale production services and diverse production resources are represented by a large number of performance metrics for cloud services and cloud-environment delivery [8].

3 Materials and Methods

English is the most widely used language for interoperability among many nations in the current global situation. There is a need to study English text-data filtering algorithms first, since many people save their important documents in English as a second language, and even some large-scale organizations use English as an international medium. Anyone filtering text data and classifying information knows that text-data processing technology includes filtering and other methods; this connects text data with computer processing and filtering strategies, and it is possible to re-process the text based on the content of the data [9]. Regular service contracts govern the provision of services between service providers and companies; examples of potential service-provider companies are consultants, personal advisers, law firms, store designers, and investment banks. For employees of multinational enterprises who want to understand the background of fuzzy logic and how it might be helpful, this section introduces the fundamental concepts of fuzzy logic and its business role, along with some basic knowledge of services. A service provider supplies solutions and services to end-users and organizations; this broad term includes all English Training (ET) and the solutions that multinational enterprise companies offer through on-demand, pay-per-use, or mixed-delivery services.

3.1 Services Analysis

Service analysis reports are designed to check service evaluation [10]. ET is carried out within a given time frame to analyze detailed information about the valuation of these services and the ET itself; the details analyzed include ET, revenue categories, and service receipts. The services analysis for employees of multinational enterprise companies aims to use CM-RS to define enterprise views related to business concepts and features; this has nothing to do with record requirements or implementation notes. Business processes are analyzed to identify service candidates from the new activities representing CM-RS and new candidate services. The output of this stage is, for each identified service, the candidate and the behavior of a new custom function over a custom data concept that allows it to interact with other services. Business analytics planning is an important starting point for service analytics: this vital communication and project-commitment document ensures proper analytical techniques, project-team coordination, requirements, and timeline usage. The analysis plan serves as the CM-RS for the actual project [11].

3.2 Cloud Provider

Cloud Service Providers (CSPs) offer their products as purchasable, self-configuring options for their customers. Customers can pay for cloud services on a subscription-based model, with payments made on a monthly or quarterly schedule. By customizing their products, some CSPs differentiate themselves to meet the needs of vertical markets: their cloud-based services provide industry-specific functionality and help users meet specific regulatory requirements. For example, some medical cloud products for medical services, storage of personal health information, and maintenance and backup have been released [12, 13].

3.3 Load Balancer (LB)

In computing, load balancing (LB) refers to assigning tasks to a group of resources in order to increase overall performance. LB techniques avoid overloading some compute nodes while other compute nodes sit idle, and they can optimize the response time of each task. LB is a subject of research in the field of parallel computing. There are two main approaches: static algorithms, which are usually more general and practical but do not consider the current state of the different machines, and dynamic algorithms, which require the exchange of state information between the computing devices at the risk of losing efficiency through that overhead [14, 15].
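The static/dynamic distinction can be illustrated with a small Python sketch (an illustration only, not taken from the paper): a static round-robin policy ignores node state, while a dynamic least-loaded policy relies on the exchanged load information.

from itertools import cycle

def static_round_robin(tasks, nodes):
    # Static policy: assign tasks in a fixed rotation, ignoring current node load.
    assignment = {n: [] for n in nodes}
    for task, node in zip(tasks, cycle(nodes)):
        assignment[node].append(task)
    return assignment

def dynamic_least_loaded(tasks, nodes):
    # Dynamic policy: always send the next task to the currently least-loaded node,
    # which requires up-to-date load information from every node.
    load = {n: 0 for n in nodes}
    assignment = {n: [] for n in nodes}
    for task, cost in tasks:
        node = min(load, key=load.get)
        assignment[node].append(task)
        load[node] += cost
    return assignment

print(static_round_robin(["t1", "t2", "t3", "t4"], ["n1", "n2"]))
print(dynamic_least_loaded([("t1", 5), ("t2", 1), ("t3", 1), ("t4", 1)], ["n1", "n2"]))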

3.4 Resource Scheduler Using Fuzzy Logic

Resource scheduling (RS) is often a software program that uses a virtualized view of the available resources to control the load on a computing environment. Companies recognize that they must adjust each of their allocations to their portfolio efficiently and economically. However, this is easier said than done and may be required at any time, generally with a specific amount of different resources. Therefore, resources are often misallocated, causing an imbalance, and well-balanced schedules for these resources tend to be the exception rather than the norm. This is what project resource management software is expected to provide, and good RS can make a company. A fuzzy comprehensive evaluation strategy is based on the rules of fuzzy mathematics. It turns subjective assessment into quantitative evaluation based on membership theory. It is a comprehensive assessment framework and is suitable for handling a variety of uncertain problems [16, 17]. Step 1: Create a coefficient set. Step 2: Assume a set of service evaluation and ET metrics. Step 3: Establish a set of weights for the various factors. Step 4: Establish a fuzzy relation matrix. Step 5: Draw the overall conclusion on the service evaluation and ET index from the results obtained by the comprehensive assessment model for RS, as a hierarchical fuzzy subset.
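A minimal numeric sketch of these five steps is given below; the factor names, weights, and membership values are illustrative assumptions, not values taken from the paper.

import numpy as np

# Steps 1-2: factor set (service evaluation and ET metrics) and comment/coefficient set.
factors = ["service quality", "ET effectiveness", "cost"]      # assumed factors
grades = ["poor", "fair", "good", "excellent"]                  # assumed comment set

# Step 3: weight vector over the factors (sums to 1).
W = np.array([0.5, 0.3, 0.2])

# Step 4: fuzzy relation matrix R; row i gives the membership of factor i in each grade.
R = np.array([
    [0.1, 0.2, 0.4, 0.3],
    [0.0, 0.3, 0.5, 0.2],
    [0.2, 0.4, 0.3, 0.1],
])

# Step 5: composite evaluation B = W . R, normalized into a fuzzy subset over the grades.
B = W @ R
B = B / B.sum()
print(dict(zip(grades, B.round(3))), "->", grades[int(B.argmax())])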



3.5 Cloud Allocator

Resource allocation (RA) and RS are among the most critical challenges in cloud computing. RA ensures that the user's requirements are met for each incoming user request while also meeting the specific goals of the CSP. These goals can include optimizing energy consumption and cost. Based on resource information about similar resources, discovered usage and monitoring data, request information, and the CSP's goals, the scheduler applies RA or RS methods. The scheduler either performs an initial, static RA or continuously guarantees static and dynamic RA, manages the optimized resources, and coordinates pending requests as soon as they arrive.

4 Result and Discussion

Spoken English utterances do not provide adequate information about sentence breaks, so the English training data is handled through a subset of three different feature functions that use a number of different prosodic cues. One prosodic feature group (P1) includes a representative character that depends on the energy contrast between frame intervals. The prosodic feature group P2, on the other hand, captures representative functions such as pauses, vowels, rhyme duration, energy, and a substantial proportion of pitch-based characteristics; the baseline technique uses vocabulary only, the average performance of the P1-only setting depends on the model and the English learning data, and further results are obtained for the pairwise combination of these functional groups. These results indicate that the recently proposed technique, which combines a joint training procedure with a secondary view, significantly improves on the benchmark.

Figure 3 compares the previous methods, the support vector machine algorithm and the proportional-integral-derivative algorithm, with the proposed approach for service evaluation and ET of multinational enterprise employees in CM-Resource Selection (RS). The proposed fuzzy logic algorithm used in CM-RS clearly improves the service evaluation and ET.

Fig. 3 Comparison of CM-RS

Figure 4 shows, for each bar, the frequency with which a production cloud resource was selected among the ten resource selection functions, based on service evaluation and ET training. CM offers a choice of support based on the service evaluation and the division of ET training. Each region that characterizes it has a frequency of use determined by the number of CM resource selections made when choosing each resource for training. The value of the support of the selection function depends on the service evaluation and the ET practice.

Fig. 4 RS based on ET

Figure 5 shows that the different data arrangements are independent of one another. In this case, the height and length of the bars represent the measured value and the frequency with which the cloud generates a selection of resources based on service evaluation and English training. Below, an example of a service evaluation can be seen in which most of the cloud relies on the statistics given by the choice of production resources, the service evaluation, and the English training (Fig. 6).

Fig. 5 CM-RS based on service

Fig. 6 Performance of multinational enterprises

Table 1 shows the performance index and explains the resulting value of the reduced plan's objective function. Maximizing profit must be the goal of the production plan. However, when only a few variables are needed to express another objective function, it will have a similar effect and can substitute for it. For instance, if there is a sufficient order backlog to consume practically all of the available capacity, the plan can focus on minimizing the delayed delivery of these orders. The quantity that can be flexibly considered is the amount of capacity, and the goal is reasonably to satisfy demand with minimal additional overtime costs. Since the due date is a typical driving force for determining the production plan, as a rule most efforts are not directed at reducing the cost of missing the deadline. In certain systems, a certain amount of delay is acceptable provided it does not exceed a specified limit; in other systems, delayed delivery can be considered excessively costly, for example when the parts produced in that period would cause the assembly plant to shut down. Some of the common performance indicators are as follows: lateness (maximum/average), total cost, machine utilization, set-up time and costs (total/average), inventory costs, and profit contribution.

Table 2 shows that there are many kinds of schedule generation techniques. A detailed service valuation should select the appropriate technique to optimize the chosen objective function. Many solution technologies apply to planning problems; some possible methods are evaluation and planning rules, the critical path method, and mathematical optimization. The first technique is regularly used in project planning, where priorities are the main factor. The critical path approach is based on identifying manufacturing tasks with little or no slack time available; one definition of the critical path is the longest total processing time for a sequence of operations. Mathematical optimization has many recent variants, as well as technologies that guarantee optimal solutions.

Table 1 ET for employees of multinational enterprises

Outcome            Releases sequence   RS     Resources sequence
Classic training   0.25                0.25   0.25
Open training      0.44                0.41   0.03
Flow               0.41                0.20   0.34
Manufacturing      0.45                0.20   0.22
Transfer           0.43                0.15   0.24

Table 2 Services evaluation employees of multinational enterprises

Outcome            Evaluation rules   Critical path   Optimization
Classic training   0.25               0.40            0.25
Open training      0.30               0.35            0.28
Flow               0.42               0.48            0.40
Manufacturing      0.55               0.59            0.50
Transfer           0.58               0.66            0.53

5 Conclusion and Future Work

As times progress, an ever-increasing number of people can enjoy the benefits of information, which has brought about rapid social change and organizational innovation. With the enormous amount of information and short-message network technology, information-based technology and computer technology have produced a qualitative change in data transmission. By introducing a machine learning algorithm and feature extraction algorithms, the data parameters of the test framework are improved, as are the classification efficiency and accuracy of the classifier. Service evaluation and English training are significant in providing cloud manufacturing resources. The production enterprise should not select only a single CM resource platform, because different platforms reflect the effectiveness and efficiency of the supply of CM resources differently. Considering the characteristics of the CM production platform, manufacturers can select the cloud in which to produce or purchase, choose reliable evaluation indices, and develop an evaluation model for the cloud manufacturing resources; a comprehensive review of the machine learning results then supports this quantitative index framework. Finally, a case study is used to show how this article's CM platform selection and the conceptual AI-CM resources can be converted into a specific quantitative score. Furthermore, selecting suitable CM resources according to their conditions provides an effective reference for enterprises' production.

References 1. Curry C, O’Shea JD (2012) The implementation of a story telling chatbot. Adv Intell Syst Res 1(1):45 2. Huang L, Wu C (2020) Selection approach of cloud manufacturing resource for manufacturing enterprises based on trust evaluation. In: Prognostics and health management conference, pp 309–313 3. Huang Y (2022) Mixed training mode of business English cloud classroom based on mobile app with shared SDK. In: International conference on electronics and renewable systems, pp 792–795 4. Liu Y (2020) College English Teaching Reform Driven by Big Data. In: 2nd international conference on machine learning, big data and business intelligence, pp 300–304 5. Ma L, Guo Y, Jia W, Lang H (2021) Research on the construction and application of college students’ digital health management platform. In: International conference on information technology and contemporary sports, pp 624–627



6. Moor JH (2001) The status and future of the turing test. Mind Mach 11(1):77–93 7. Povinský M, Melicherˇcík M, Siládi V (2019) A Chatbot based on deep neural network and public cloud services with TJBot interface. In: IEEE 15th international scientific conference on informatics 8. Priyadarshni AU, Sudhakar S (2015) Cluster based certificate revocation by cluster head in mobile ad-hoc network. Int J Appl Eng Res 10(20):16014–16018 9. Shi Y, Luo L, Guang H (2019) Research on scheduling of cloud manufacturing resources based on bat algorithm and cellular automata. In: IEEE international conference on smart manufacturing, industrial & logistics engineering, pp 174–177 10. Sudhakar S, Chenthur Pandian S (2016) Hybrid cluster-based geographical routing protocol to mitigate malicious nodes in mobile ad hoc network. Int J Ad Hoc Ubiquitous Comput 21(4):224–236 11. Sudhakar S, Chenthur Pandian S (2012) Secure packet encryption and key exchange system in mobile ad hoc network. J Comput Sci 8(6):908–912 12. Sudhakar S, Chenthur Pandian S (2013) A Trust and co-operative nodes with affects of malicious attacks and measure the performance degradation on geographic aided routing in mobile ad hoc network. Life Sci J 10(4s):158–163 13. Sudhakar S, Chenthur Pandian S (2015) Investigation of attribute aided data aggregation over dynamic routing in wireless sensor. J Eng Sci Technol 10(11):1465–1476 14. Sudhakar S, Chenthur Pandian S (2013) Trustworthy position-based routing to mitigate against the malicious attacks to signifies secured data packet using geographic routing protocol in MANET. WSEAS Transac Commun 12(11):584–603 15. Villanueva J, Gallardo E (2020) A pilot study on subtitling tasks and projects for intercultural awareness and queer pedagogy. XV Conferencia Latinoamericana de Tecnologias de Aprendizaje, pp 1–8 16. Wang X, Lu M, Wang Y (2020) Workload optimization and energy consumption reduction strategy of private cloud in manufacturing industry. In: IEEE 11th international conference on software engineering and service science, pp 440–444 17. Zhen Y (2021) Research and implementation of industrial production resource virtualization based on cloud manufacturing environment. In: International conference on electronic information technology and smart agriculture, pp 198–202

Development of a Cognitive Question Answering System to Learn Concepts for Placement Assistance R. Dhana Lakshmi , Abirami Murugappan, and M. Srivani

Abstract Cognitive assistants help humans and enhance their capability to solve a large range of complex tasks. A cognitive assistant has been developed as a pedagogical assistant. This work aims to improve learning capabilities and helps to identify learning preferences. A cognitive assistant can hold conversations with users in natural language to help the user solve a complex problem. The proposed system has been implemented to assist as a personal agent for students to learn the Python programming language. The steps are user capability level identification, construction of the assertion graph, QA analyzer, question analyzer and primary search analysis, hypothesis generation, evidence identification and evidence scorers, final evidence identification, user answer validation, and resource generation. The cognitive assistant facilitates natural interactions with the students, and it applies human reasoning skills to judge the students' ability and train them further. Natural conversation is employed in question-answering systems. This increases users' satisfaction and easily engages them with the system. A social dialogue and question-answering system has achieved significantly higher learning gains than a non-interactive online course. The result of the system is evaluated using the confidence-weighted score and expert judgments. Keywords Cognitive assistants · Evidence extraction · Human reasoning skills · Question answering systems

R. D. Lakshmi (B) · A. Murugappan · M. Srivani Department of Information Science and Technology, College of Engineering Gunidy, Anna University, Chennai, India e-mail: [email protected] A. Murugappan e-mail: [email protected] M. Srivani e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_29




1 Introduction

Cognitive technologies aim at providing new kinds of computational power to process and provide insight from huge amounts of data. A feature of cognitive systems is that they learn from arriving data and from their communications with humans. Conversational question-answering systems range from systems that simply do their best to maintain a discussion to personal assistants that understand users' requests and perform tasks on their behalf. Understanding natural language involves grammar and semantics to recognize important words from the input [1]. The challenges are to reduce the maintenance cost of the system services while extending them to hold huge knowledge bases. Another challenge is the social dialogue in the system. Social conversations [2] play an important role in launching the social interaction with users in order to help humans and computers collaborate. The contribution of the system consists in proposing a modular architecture that allows conversational agents to be adapted by combining question-answering and social dialogue capabilities independently. Watson [3] solutions provide a combination of the following key characteristics:
• Understand questions through natural language.
• Generate and evaluate hypotheses, answers, and supporting evidence.
The rest of the paper is organized as follows: Sect. 2 describes the work concerning the cognitive question-answering system, Sect. 3 explains the system architecture, Sect. 4 describes the methodology of the cognitive question-answering system, Sect. 5 presents the discussion and results, and Sect. 6 concludes the work by suggesting possible future work.

2 Background and Related Work

Le and Wartschinski [4] developed a pedagogical agent which aims to improve the reasoning abilities and decision-making capability of the users. The LIZA cognitive assistant could test persons, improve their reasoning skills, and show significantly higher learning gains. The advantage of the system is that it supports humans and enhances their capabilities for complex tasks. The limitation is that long-term effects could not be examined by reassessing the learner's performance after weeks, months, or even years. Coronado et al. [5] implemented a modular cognitive agent for question answering and social dialogue improvement for a particular domain. The advantage of the system is that it increases users' satisfaction and makes them engage easily with the system. The limitation is that question-answering systems and personal assistants do not generate a social dialogue. Lally et al. [6] proposed a question-answering system based on Watson. The system uses scenario-based questions, and the answer with proper evidence is contained in the evidence or confidence source. The main challenge is that a question-answering system should not only return the correct answer but also give the correct explanation to the user. Zheng et al. [7] observe that providing effective and convenient query techniques for end-users is an urgent and important task. Keywords are simple but have very limited expressive ability. A huge challenge is how to understand the question clearly in order to translate the unstructured question into a structured query.

2.1 Limitation of the Existing System
• Student's reassessment performance has not been analyzed.
• No natural language interaction between student and system.
• Correct answer is retrieved but not the optimal solution for a particular question.
• No evidence is extracted to prove the retrieved answer is correct.
• Validation score for the correct answer has not been examined.

3 System Architecture

The proposed system acts as a cognitive question-answering system to improve the domain knowledge of the student and also to assist the student in developing their knowledge for placement training. Figures 1 and 2 depict the system architecture of the cognitive question-answering system. The cognitive question-answering system consists of four modules, namely, question analyzer and primary search analysis, hypothesis generation, evidence identification and evidence scorers, and final confidence merging and ranking and resource generation. The input of the proposed system is the user response (answer) to the question. The first step is to calculate the user's knowledge about the domain. Several questions are asked to test the user's level of understanding of the domain. The Wordnet-wup similarity algorithm [8] calculates the sentence similarity between the user response and the answer to the question. Based on the calculated similarity, the user's level of understanding is assessed. The construction of an assertion graph is used to represent the concepts in the domain of Python. The QA analyzer asks several questions to the user, and the answer collector stores the user's answers to the respective questions. These questions are passed to the question analyzer. The question analyzer derives the LAT [9], which is a clue to identify the answer. The process is done by keyword extraction, relation identification, and LAT generation. The LAT shows what type the answer should be. It contains a source that is related to the question asked by the proposed system. The hypothesis generation module is used to create a candidate answer, or hypothesis, from the derived answer source. The process is done by NER, keyword diffusion, and context detection.


Fig. 1 System architecture for cognitive question answering system

Fig. 2 System architecture for cognitive question answering system




The evidence identification module derives the evidence source from which evidence is extracted for each candidate answer. The evidence determines whether the particular candidate answer is the optimal answer or not. The process is done using the knowledge base. The evidence scorer module is used to calculate the score for each candidate answer's evidence, called the confidence score. The process is done by a semantic relationship between evidence and LAT, a sequence matching algorithm between evidence and LAT, and a passage support weighting term algorithm between evidence and LAT. The final confidence score merging and ranking module [10] is used to rank the evidence scores for each candidate answer. The process is done by the multinomial logistic regression model. The resource generator module is used to evaluate the user's answer against an optimal answer. The process is done by the semantic WordNet similarity algorithm. If the user's answer is wrong, this module gives justification, such as study resources, to the user.

4 Proposed Methodology

The proposed framework consists of the following modules.

4.1 User Capability Level Identification

The user capability level identification module consists of knowledge analysis, answer evaluation using Wordnet-wup similarity, and answer validation. The system begins its conversation with the user with a greeting and asks task questions (domain fundamentals) to analyze the user's domain knowledge. These task questions are mapped to the domain topic, an answer (i.e., a description of one of the domain's concepts), and a justification (provided only for wrongly answered questions). These are stored in the knowledge base. Answer evaluation is the main step, where the user's answers and the appropriate answers to the questions are evaluated by semantic similarity.
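A minimal sketch of this answer-evaluation step is shown below, assuming NLTK with its WordNet and tokenizer data is installed; the 0.7 acceptance threshold and the max-averaging scheme are illustrative assumptions rather than the authors' exact settings.

from itertools import product
from nltk.corpus import wordnet as wn
from nltk.tokenize import word_tokenize

def wup_sentence_similarity(user_answer, reference_answer):
    # Average of the best Wu-Palmer similarity found for each word pair.
    scores = []
    for w1, w2 in product(word_tokenize(user_answer.lower()),
                          word_tokenize(reference_answer.lower())):
        syns1, syns2 = wn.synsets(w1), wn.synsets(w2)
        pair_scores = [s1.wup_similarity(s2) or 0.0
                       for s1, s2 in product(syns1, syns2)]
        if pair_scores:
            scores.append(max(pair_scores))
    return sum(scores) / len(scores) if scores else 0.0

# Hypothetical acceptance rule: treat the answer as acceptable above 0.7.
ratio = wup_sentence_similarity("a list is mutable", "lists are mutable sequences")
print("similarity:", round(ratio, 2), "accepted:", ratio > 0.7)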

4.2 Construction of Assertion Graph

The construction of the assertion graph consists of sentence extraction, entity tagging, and learner reassessment. Initially, the domain concepts, i.e., the Python concepts, are derived from Python books. The assertion graph is generated by using the sentence tokenization algorithm, the entity tagging algorithm, the node prioritization step, and the dependency graph algorithm. The dynamic study plan is generated by using the assertion graph together with the user's capability level. Questions are asked to verify whether the user has studied the dynamic study plan or not.

4.3 QA Analyzer

The QA analyzer asks punch-line questions to the user and collects the respective user's answers. The QA analyzer considers both the user capability level and the assertion graph to identify the domain concepts that are unknown to the user. The QA analyzer then asks punch-line questions corresponding to the unknown concepts to validate the user's learning phase.

4.4 Question Analyzer and Primary Search Analysis

The cognitive question-answering system asks punch-line questions to the user, and these questions are analyzed by the question analyzer. The question analyzer generates the LAT for each question. The LAT is a clue to that question's answer and defines what type of answer is needed for that question. The primary search generates the answer source [11] for a particular question by using a web scraping algorithm. The following Algorithm 1 describes the steps involved in question analysis and primary search analysis.

Algorithm 1 Question Analyzer and Primary Search Analysis
– Input: List of Hypotheses
– Output: Inference for Each Punch-Line Question
1: Extract the Lexical Answer Type which has the highest score
2: for each triple in the annotated text do
3: continue
4: print the triple (Subject, Object, and Relation)
5: end for
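As an illustration only, the question-analysis step could be approximated in Python with spaCy's dependency parse, assuming the small English model is installed; treating the first noun chunk as the LAT and the verb-governed subject/object pairs as triples is an assumption of this sketch, not the authors' exact scoring rule.

import spacy

nlp = spacy.load("en_core_web_sm")

def analyze_question(question):
    doc = nlp(question)
    # Approximate the Lexical Answer Type (LAT) with the first noun chunk.
    lat = next((chunk.text for chunk in doc.noun_chunks), None)
    triples = []
    for token in doc:
        if token.pos_ == "VERB":
            subj = [c.text for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
            obj = [c.text for c in token.children if c.dep_ in ("dobj", "pobj", "attr")]
            if subj and obj:
                triples.append((subj[0], token.lemma_, obj[0]))
    return lat, triples

print(analyze_question("Which Python method converts a string to lowercase?"))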

4.5 Hypothesis Generation

This process deals with generating the candidate answers (hypotheses) from the generated answer source. The candidate answers are numerous possible answers, consisting of both correct and incorrect answers. The candidate answers are generated by using a context detection algorithm, which detects the background information of the answer source. The following Algorithm 2 describes the steps involved in hypothesis generation.



Algorithm 2 Hypothesis Generation
– Input: Answer Source
– Output: Candidate Answer
1: Use TextBlob, for easy access to the Answer Source, and a set of 10 patterns, using NER
2: for each line with POS tags in the bucket and each pattern in patterns do
3: Extract only the noun-tagged (NN) words in the line using keyword diffusion
4: Store the noun-tagged words as candidate answers in ans using context detection
5: end for
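A rough Python sketch of this candidate-answer extraction is shown below; it keeps the noun-tagged words from the answer source using TextBlob, as in Algorithm 2, but the ten NER patterns mentioned there are not reproduced and the helper name is hypothetical.

from textblob import TextBlob

def generate_candidates(answer_source):
    blob = TextBlob(answer_source)
    # Keep noun-tagged (NN*) words as candidate answers.
    return sorted({word for word, tag in blob.tags if tag.startswith("NN")})

print(generate_candidates("A Python string is an immutable sequence of characters."))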

4.6 Evidence Identification and Evidence Scorer

The evidence identification module is used to derive evidence sources from the corpus, where the relevant evidence is extracted for each hypothesis. This evidence is necessary to identify whether the hypothesis is an optimal answer or not. The evidence scorer module is used to calculate the score, or confidence score, for the evidence. The following Algorithm 3 describes the steps involved in evidence identification and evidence scoring.

4.7 Final Evidence Identification and User Answer Validation and Resource Generator

The input of this module is the evidence score, and the output is the user's knowledge level. The process deals with identifying the proper evidence by its score. An optimal candidate answer is generated by considering the maximum evidence score. The system asks the particular question to the user, and the user's response and the final optimal answer are given to the semantic similarity algorithm for validation. The validation process indicates whether the user has acquired enough knowledge of the domain or not. It also provides study resources as justification for the questions that the user answered wrongly.

5 Discussion and Results

The cognitive question-answering system takes the user's answer as input and produces the user's capability level as output, along with the optimal answer to the question and the corresponding inference. The user capability level identification consists of a greeting, some small task questions, the user's answers, and the similarity between the user's answer and the actual answer, which provides a similarity ratio.



Algorithm 3 Evidence Identification and Evidence Scorer
– Input: Candidate Answer
– Output: Evidence Score (Confidence Score)
1: Use PyPDF2 to extract the evidence source and extract all hypotheses
2: Use the Smith-Waterman sequence matching algorithm (evidence, hypothesis)
3: for i in length(a) + 1 and j in length(b) + 1 do
4: match stores H[i - 1, j - 1] + ascore if a[i - 1] == b[j - 1] else 0
5: delete stores H[1:i, j].max() - gcost if i ? 1 else 0, and the same for insert
6: end for
7: Use the semantic-WordNet similarity algorithm (evidence, hypothesis)
8: for words in the word tokenization of the statement do
9: if words not in stop words then append the words to the sentence
10: end for
11: for each sense in the product of the WordNet synsets of the words do
12: Calculate the WordNet wup similarity between senses
13: end for
14: Use text-to-vector cosine similarity (evidence, hypothesis)
15: numerator is the sum of vec[x] multiplied by vec2[x] for x in the intersection
16: denominator is the square root of the sum; store the similarity in total
17: if total is greater than the threshold, print "evidence is similar"
18: if total is equal to the threshold, print "evidence is somewhat similar"
19: otherwise print "evidence is not similar"
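The text-to-vector cosine similarity scorer from Algorithm 3 can be sketched as follows; the 0.5 threshold and the example strings are assumptions for illustration only.

import math
import re
from collections import Counter

def text_to_vector(text):
    # Bag-of-words term-frequency vector.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine_similarity(a, b):
    va, vb = text_to_vector(a), text_to_vector(b)
    common = set(va) & set(vb)
    numerator = sum(va[w] * vb[w] for w in common)
    denominator = math.sqrt(sum(v * v for v in va.values())) * \
                  math.sqrt(sum(v * v for v in vb.values()))
    return numerator / denominator if denominator else 0.0

evidence = "Python strings are immutable sequences of Unicode characters"
hypothesis = "a string is an immutable sequence"
score = cosine_similarity(evidence, hypothesis)
print("similar" if score > 0.5 else "not similar", round(score, 2))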

The user capability level identification (Fig. 3) consists of small task questions and the answers provided by the user; if the similarity ratio is higher than the threshold value, the user moves to the next topic, otherwise not. The construction of the assertion graph uses the list of concepts or topics in Python strings that were not correctly answered by the user. The user has to study these topics, and the system checks whether the user has studied them or not. Figure 4 shows the assertion graph for Python string concepts. The question analyzer (Fig. 5) consists of task questions asked to the user after the self-learning process. The task question is processed in the question analyzer module, which provides the LAT. The primary search (Fig. 6) consists of an answer source generated by the web scraping algorithm, which extracts only the context relevant to the LAT. The hypothesis generation produces numerous candidate answers. The evidence identification extracts evidence from the corpus set.

Fig. 3 User capability level


Fig. 4 Assertion graph-python string

Fig. 5 Question analyzer

Fig. 6 Primary search




Fig. 7 Evidence simplification

Fig. 8 Evidence source

The extracted evidence consists of a great deal of information about the hypothesis, but only the information specific to the hypothesis becomes proper evidence for it. The evidence simplification (Fig. 7) is a summary of the evidence identification step. The evidence is important for identifying whether the candidate answer is the optimal answer or not. Based on the evidence simplification, the evidence for each hypothesis is analyzed and reconstructed as a subject-verb-object triplet graph. The scoring algorithms are the Smith-Waterman sequence matching algorithm, the semantic relationship identification algorithm, and the text-to-vector cosine similarity algorithm. The evidence score applies these scoring algorithms to validate the evidence against the respective LAT type. Figure 8 shows the evidence for a particular hypothesis and its validation by the scoring algorithms. The final evidence and user-answer validation associates the respective evidence with an optimal answer to a question by considering the maximum evidence score, and the optimal answer is then compared with the user's answer to validate it.

5.1 Performance Evaluation

To improve the result of the system, the performance evaluation is done using the confidence-weighted score. This metric evaluates the accuracy of the system and its confidence in producing the top answer to a particular question. Figure 9 shows the graphical representation of the confidence-weighted score. For each user, the confidences are calculated by considering their answers, calculating the similarity ratio, and ranking the list of hypotheses.



Fig. 9 Confidence weighted score

To improve the system performance, expert judgment is also evaluated using the logistic regression algorithm. Expert judgment is the domain expert's suggestion about the optimal answer. The domain experts cross-verify the optimal answer and express their suggestion in the form of percentages. The domain experts are experts in machine learning, text mining, and artificial intelligence. Figure 10 shows the accuracy of the system along with the classification report. A goal for future work is to further improve the system accuracy based on this judgment.

Fig. 10 Expert judgment



6 Conclusion and Future Work

The proposed framework deals with the development of a cognitive question-answering system to learn concepts for placement assistance, to generate a dynamic study plan, and to provide proper answers to the questions that were wrongly answered by the user, together with an appropriate inference for each answer. This enables the system to understand the user's domain knowledge through a number of basic questions in the domain on which the user has to develop their knowledge. The questions asked in this system are only factoid questions. In future work, the level of user understanding of the domain is to be improved by extending the questioning style to programmatic questions, and the hypothesis generation is to be improved by adding more answer sources.

References
1. Ansari A, Maknojia M, Shaikh A (2016) Intelligent question answering system based on artificial neural network. In: Proceedings of 2016 IEEE international conference on engineering and technology (ICETECH)
2. Raimo B et al (2017) Performance of natural language classifiers in a question answering system. In: Proceedings of IBM journal of research and development, pp 14–1
3. Ying C, Elenee Argentinis JD, Weber G (2016) IBM Watson: how cognitive computing can be applied to big data challenges in life sciences research. In: Proceedings of clinical therapeutics 38(4), pp 688–701
4. Le NT, Wartschinski L (2018) A cognitive assistant for improving human reasoning skills. In: The journal of human-computer studies, vol 117, pp 45–54
5. Coronado M, Iglesias CA, Carrera A, Mardomingo A (2018) A cognitive assistant for learning java featuring social dialogue. In: Proceedings of human-computer studies, vol 117, pp 55–67
6. Lally A, Bagchi S, Barborak MA, Buchanan DW, Chu-Carroll J, Ferrucci DA, Patwardhan S (2017) WatsonPaths: scenario-based question answering and inference over unstructured information. In: Proceedings of AI magazine 38, pp 59–76
7. Zheng W, Cheng H, Yu JX, Zou L, Zhao K (2019) Interactive natural language question answering over knowledge graphs. In: Proceedings of information sciences 481, pp 141–159
8. Dhana Lakshmi R, Abirami S, Srivani M (2021) Development of a cognitive assistant to learn concepts for placement assistance. In: Proceedings of the first international conference on advanced scientific innovation in science, engineering and technology, ICASISET
9. Lally A et al (2012) Question analysis: how Watson reads a clue. In: Proceedings of IBM journal of research and development, pp 2–1
10. Ding S et al (2016) Using conditional random fields to extract context answers of questions from online forums. In: Proceedings of ACL-08: HLT, 2008; Kollia I, Siolas G, Using the IBM Watson cognitive system in educational contexts. In: Proceedings of 2016 IEEE symposium series on computational intelligence (SSCI)
11. Ambar D (2016) A novel extension for automatic keyword extraction. In: Proceedings of International Journal of Advanced Research in Computer Science and Software Engineering

Cervical Cancer Prediction Using Optimized Meta-Learning P. Dhivya, M. Karthiga, A. Indirani, and T. Nagamani

Abstract Cervical cancer is the fourth most common cancer among women worldwide. A meta-learning model is used to select the best algorithm to predict cervical cancer from cervigram pictures. Initially, a comparative study of state-of-the-art approaches is carried out. The performance of logistic regression (LR), gradient boosting classifier, random forest (RF), neural network (NN), and voting classifier is observed for automatic cervix identification and cervical tumor classification. In this work, an optimized meta-learning (OML) algorithm selects the optimal model for the supplied dataset for classification. The proposed OML technique is used to perform hyperparameter tuning on the models and predict the best one, with the highest accuracy (99.39%) and reduced elapsed time (1.2 s). Keywords Cervical cancer · Interactive dichotomize · Logistic regression · Neural network · Boosting classifier

1 Introduction Cervical cancer is death-causing disease among girls in third-world nations, compared to other genital cancers. According to estimates, about 57,000 new instances of this cancer are diagnosed each year, with 80% of them occurring in under developed nations. Furthermore, this malignancy is responsible for 77% of female mortality [1]. Cervical cancer is reported to be less common in Iran than in P. Dhivya (B) · M. Karthiga · A. Indirani Bannari Amman Institute of Technology, Sathyamangalam, Erode, India e-mail: [email protected] M. Karthiga e-mail: [email protected] A. Indirani e-mail: [email protected] T. Nagamani Kongu Engineering College, Erode, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_30




other nations. Each year, around 11,000 new occurrences of invasive cervical cancer are detected. Cervical cancer kills roughly 4000 women in the United States and 300,000 women globally each year, despite the fact that it is the most preventable type of cancer [2]. Cervical cancer mortality rates in the United States fell by 74% from 1955 to 1992 as a result of greater screening and early identification with the Pap test. The average age at the time of diagnosis is 48 years. Between the ages of 20 and 40, about 15% of women acquire cervical cancer [3, 4]. Hand-crafted traits could be used to classify cervical cancer, according to Bhargava et al., who used the proposed features to train three different classifiers, including a support vector machine model, in their classification solution. Only 66 cervical pictures were used to train these classifiers [5]. A growing number of scholars have started to apply modern skills to medical research, such as cancer research, pharmaceutical development, computer-assisted diagnostics, and image processing, as a result of technological advancements. Vamathevan et al. recently used machine learning for target verification and biomarker identification, among other things. Most researchers use deep learning techniques in the field of medical image processing. Furthermore, machine learning-based disease prediction has a promising future in medicine [6]. Since cervical cancer is generally avoidable, early identification is the most effective method for lowering the disease's global burden. However, due to a lack of information, lack of access to medical centers, and the high cost of surgeries, vulnerable patient groups in underdeveloped countries cannot afford to undergo frequent tests. This work presents a novel ensemble approach for predicting the chance of cervical cancer [7, 8]. Cervical cancer is hard to detect and analyze due to the complexity of the disease factors and the difficulty in forecasting prognosis. This paper employs a sophisticated data-driven machine learning approach [6]. The definitive diagnosis of mainstream ovarian cancer detection methods is estimated using a multi-layer perceptron model, and the findings revealed a close connection between the risk variables and uterine cancer [9, 10]. Using the K-nearest-neighbors technique with other standard models can increase performance. The bulk of the generated classifiers are verified on consistently fragmented pictures using available techniques. The bulk of current algorithms attain an efficiency of around 94.8% on an unstructured Pap-smear dataset segregated using digital photography technology [11, 12]. The method solves the issue of categorization by combining strategies and applying supplementary tree classifiers to reduce the number of parameters. The percentage of data used for training is rising, and some algorithms have reached 100% accuracy, precision, recall, and F1-score [13, 14]. Although certain techniques, such as logistic regression with L1 regularization, achieve 100% accuracy, they are excessively time consuming in comparison to others that achieve 99% accuracy with less CPU time. The cost-effectiveness of CPU time is also examined [15, 16].



2 Proposed Work

In the proposed work, the dataset is initially collected from the Kaggle repository, with 35 features and 860 instances [12]. The dataset contains null values, which are replaced with the mean value. After that, the performance of state-of-the-art methods is observed. In the proposed work, feature selection is done by applying optimized meta-learning (OML) [17]. The working flow of the entire project is given in Fig. 1, and the steps involved in the proposed methodology are given in Phases 1 and 2.

Phase 1:
• Initialize the dataset with 35 features.
• The relationship between the characteristics can be discovered using Pearson correlation [8].

Fig. 1 Working flow of the model (collect cervical cancer dataset → apply data preprocessing → replace null values with mean → remove the features with the same correlation → train the model with gradient boosting, voting classifier, deep learning model, and grid search → test the model → tumor classification)



Table 1 Comparative analysis of various models

S. No   Name of the model               Training accuracy (%)   Testing accuracy (%)   Response time (in seconds)
1       Logistic regression             94.16                   95.34                  2.8
2       Grid search CV                  96.64                   97.09                  5.1
3       Random forest                   97.66                   95.93                  3.9
4       Gradient boosting classifier    95.27                   94.23                  4.2
5       Voting classifier               95.38                   92.31                  4.4
6       Optimized meta-learning         99.39                   99.87                  1.2

• Sort the attributes based on correlation and rank them [8].
• Identify the attributes with the same correlation and with minimal difference [8].
• Remove any one of the features based on the relationship [8].

Phase 2:
• Create a subset of features from the dataset.
• Apply the traditional classifier techniques such as grid search, logistic regression, random forest, voting, and gradient boosting classifier [18, 19].
• Analyze the performance based on accuracy and response time.

The proposed OML technique is used to perform hyperparameter tuning on the models and predict the best one based on accuracy and elapsed time [20]. Figure 1 shows how the classification model proceeds across all phases of the classification process.
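A small sketch of the Phase 1 preprocessing and correlation-based filtering is given below, assuming the data sits in a pandas DataFrame; the column names and the 0.95 cut-off are illustrative assumptions, not the authors' exact settings.

import numpy as np
import pandas as pd

def drop_correlated_features(df, threshold=0.95):
    filled = df.fillna(df.mean())                      # replace null values with the mean
    corr = filled.corr(method="pearson").abs()         # Pearson correlation between attributes
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    # Drop one feature from each highly correlated pair.
    to_drop = [col for col in upper.columns if (upper[col] >= threshold).any()]
    return filled.drop(columns=to_drop)

df = pd.DataFrame({"age": [20, 30, 40, 50],
                   "age_copy": [20, 30, 41, 50],       # nearly identical to "age"
                   "num_pregnancies": [0, 2, 1, 3]})
print(drop_correlated_features(df).columns.tolist())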

3 Results and Discussion

The comparative analysis of the traditional models with the proposed work is given in Table 1. The receiver operating characteristic (ROC) curves of the traditional machine learning algorithms are shown in Fig. 2. Figure 3 illustrates the confusion matrices of the various models. A deep learning model is proposed with three layers, namely an input, a hidden, and an output layer. Initially, eighty epochs with a learning rate of 0.2 are considered, with the rectified linear unit (ReLU) activation function and the Adam optimizer [12, 13]. The artificial neural network performance is given in Figs. 4 and 5.



Fig. 2 ROC curve of state-of-the-art model

The performance of the proposed classifier is observed based on accuracy and loss and is given in Figs. 6 and 7. The figures show the analysis of the proposed optimized meta-learning model, with an accuracy of 99.39% during training and 99.87% during testing. From Fig. 7, the loss of the model is below 1%, which is very low compared with the other models. This indicates a low chance of overfitting and underfitting.
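A minimal Keras sketch consistent with this description follows; the placeholder data, hidden-layer width, and validation split are assumptions, while the 80 epochs, ReLU activation, Adam optimizer, and 0.2 learning rate mirror the text.

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

X = np.random.rand(200, 30)                 # placeholder features (assumed 30 retained attributes)
y = np.random.randint(0, 2, size=(200,))    # placeholder benign/malignant labels

model = keras.Sequential([
    layers.Input(shape=(X.shape[1],)),
    layers.Dense(16, activation="relu"),    # hidden layer with ReLU activation
    layers.Dense(1, activation="sigmoid"),  # output layer for binary classification
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.2),
              loss="binary_crossentropy",
              metrics=["accuracy"])
history = model.fit(X, y, epochs=80, validation_split=0.2, verbose=0)
print("final training accuracy:", history.history["accuracy"][-1])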

4 Conclusions

The performance of various state-of-the-art models was analyzed for benign and malignant classification. The proposed optimized meta-learning (OML) model is used to select the best model, and hyperparameter tuning is used to select the optimized parameters inside the models [20]. Initially, the performance of the models is observed without attribute selection. The proposed method is an ensemble learning technique consisting of parameter tuning, meta-learning, and Pearson correlation. The proposed technique produces an accuracy of 99.39% on training data and 99.87% on testing data, with a reduced response time of 1.2 s for the highest epoch in a neural network. In future work, the proposed algorithm will be applied to various medical datasets to analyze its performance, and it will be enhanced to handle images after feature extraction.



Fig. 3 Confusion matrices of the various models: (a) LR, (b) Grid Search CV, (c) RF, (d) Gradient Boosting, (e) Voting classifier, (f) proposed classifier

Fig. 4 Accuracy of ANN with 80 epochs

Fig. 5 Loss of ANN with 80 epochs

Fig. 6 Accuracy of proposed classifier




Fig. 7 Loss of proposed classifier

References 1. Lu J, Song E, Ghoneim A, Alrashoud M (2020) Machine learning for assisting cervical cancer diagnosis: an ensemble approach. Future Gener Comput Syst 106:199–205 2. Alyafeai Z, Ghouti L (2020) A fully-automated deep learning pipeline for cervical cancer classification. Expert Syst Appl 1(141):112951 3. Yang W, Gou X, Xu T, Yi X, Jiang M (2019) Cervical cancer risk prediction model and analysis of risk factors based on machine learning. In: Proceedings of the 2019 11th international conference on bioinformatics and biomedical technology, pp 50–54 4. Mitra P, Mitra S, Pal SK (2000) Staging of cervical cancer with soft computing. IEEE Transac Biomed Eng 47(7):934–40 5. Bandyopadhyay H, Nasipuri M (2020) Segmentation of pap smear images for cervical cancer detection. In: 2020 IEEE Calcutta conference (CALCON), pp. 30–33. IEEE 6. William W, Ware A, Basaza-Ejiri AH, Obungoloch J (2018) A review of image analysis and machine learning techniques for automated cervical cancer screening from pap-smear images. Comput Methods Programs Biomed 164:15–22 7. Singh SK, Goyal A (2020) Performance analysis of machine learning algorithms for cervical cancer detection. Int J Healthc Inf Syst Inf (IJHISI) 15(2):1–21 8. Dhivya P, Bazilabanu A, Ponniah T (2021) Machine learning model for breast cancer data analysis using triplet feature selection algorithm. IETE J Res 9. Asadi F, Salehnasab C, Ajori L (2020) Supervised algorithms of machine learning for the prediction of cervical cancer. J Biomed Phys Eng 10(4):513 10. Vijayakumar T (2019) Neural network analysis for tumor investigation and cancer prediction. J Electron 1(02):89–98 11. Pandian AP (2019) Identification and classification of cancer cells using capsule network with pathological images. J Artif Intell 1(01):37–44 12. Lilhore UK, Poongodi M, Kaur A, Simaiya S, Algarni AD, Elmannai H, Vijayakumar V, Tunze GB, Hamdi M (2022) Hybrid model for detection of cervical cancer using causal analysis and machine learning techniques. Comput Math Methods Med 13. Ceylan Z, Pekel E (2017) Comparison of multi-label classification methods for prediagnosis of cervical cancer. Graph Models 21:22 14. Chu R, Zhang Y, Qiao X, Xie L, Chen W, Zhao Y, Xu Y, Yuan Z, Liu X, Yin A, Wang Z (2021) Risk stratification of early-stage cervical cancer with intermediate-risk factors: model development and validation based on machine learning algorithm. Oncologist 26(12):e2217– e2226



15. Vilalta R, Drissi Y (2002) A perspective view and survey of meta-learning. Artif Intell Rev 18(2):77–95 16. Hospedales T, Antoniou A, Micaelli P, Storkey A., 2020. Meta-learning in neural networks: A survey. arXiv preprint arXiv:2004.05439. 17. Finn C, Abbeel P, Levine S (2017) Model-agnostic meta-learning for fast adaptation of deep networks. In: International conference on machine learning, pp 1126–1135 PMLR 18. Yang L, Shami A (2020) On hyperparameter optimization of machine learning algorithms: theory and practice. Neurocomputing 415:295–316 19. Falkner S, Klein A, Hutter F (2018) BOHB: Robust and efficient hyperparameter optimization at scale. In: International conference on machine learning, pp 1437–1446. PMLR 20. Smithson SC, Yang G, Gross WJ, Meyer BH (2016) Neural networks designing neural networks: multi-objective hyper-parameter optimization. In: 2016 IEEE/ACM international conference on computer-aided design (ICCAD), pp 1–8. IEEE

An Ensemble Deep Closest Count and Density Peak Clustering Technique for Intrusion Detection System for Cloud Computing B. Sudharkar, V. B. Narsimha, and G. Narsimha

Abstract With the rapid development of cloud applications, data sharing, and processing, cloud users are increasingly encouraged to rely on remote storage in cloud-based data centers, and the amount of data created by applications is growing at an alarming rate. On the other hand, customers have found it challenging to adapt to data sharing and data processing applications. Outlier detection may be used in a broad range of areas, but it is a challenging job. This article analyzes the use of a clustering-based K-means method with three closest cluster counts (K − 1, K, and K + 1) and a radius calculation for each cluster. A density peak clustering technique that enhances the clustering of intrusion attacks is also analyzed. An anomaly intrusion detection system created using the original density peak clustering method was found to provide accurate findings that can be utilized by the intrusion detection system's data mining module. We propose a novel improvement to the DPC algorithm by modifying the calculation of the local density based on the cutoff distance parameter rather than the cosine similarity parameter, to enhance the selection of the peak outlier points. The second goal is to improve the clustering of high-dimensional, nonlinear, dynamic, indivisible network traffic data while simultaneously decreasing the amount of noise by using the Gaussian kernel measure as a distance metric instead of the Euclidean distance. As a result, this article proposes a contextual deep clustering technique, called the DeepDCP-CC algorithm, that ensembles the merits of the density peak clustering technique and the three closest cluster counts, together with a thorough investigation of Euclidean distance measurements to find the optimal number of clusters for enhanced intrusion detection performance. After the study, it was discovered that this work displays around 90% detection accuracy for the entire system and B. Sudharkar (B) Department of Computer Science and Engineering, JNTUH, Hyderabad, Telangana, India e-mail: [email protected] V. B. Narsimha Department of CSE, University College of Engineering, OU, Hyderabad, Telangana, India G. Narsimha JNTUHUCES, Sultanpur, Telangana, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_31




nearly 100% detection accuracy for selected clusters. As a result of this work, a novel algorithm to reduce the features of attack characteristics in order to justify the gaps in data processing frameworks is proposed, as is an algorithm to derive a strong rule engine from analyzing the attack characteristics in order to detect newer attacks with a significant reduction in processing time. Keywords Density peak clustering · Gaussian kernel measure · DeepDCP-CC · Outlier · Clustering

1 Introduction

As cloud-based data centers came to hold sensitive client data, application owners and service providers came under pressure, and consumers had issues with trust in the overall architecture. Conventional centralized environments, where intrusion detection is restricted to network status monitoring and application characteristics analysis, were the target of the specific legacy programs. In an attempt to map standard security algorithms into the data processing space, several research initiatives were undertaken; however, the efforts were heavily criticized due to the absence of an in-depth examination of the security concerns affecting data processing applications. Thousands of attackers have attempted to compromise Internet-based services and the information generated by these services in recent years, as the number of services, the infrastructure, and the information offered by these services have increased dramatically. In addition, the services must often make themselves visible on the client's screen to make service functionality available to end-users. Taking advantage of these circumstances, intruders conduct attacks against Internet-based services. Many parallel research initiatives have attempted to discover intrusions by automating the detection process based on the characteristics of the attacks, which has proven to be unsuccessful. Although these frameworks scan systems for potentially harmful activity, they can also produce false alarms when the system is not monitored closely. As a result, when organizations initially deploy their IDS products, they must make modifications to them. This requires appropriately configuring the intrusion detection frameworks to distinguish between regular traffic on the system and malicious activity on the system (Fig. 1).



Fig. 1 Proposed DeepDCP-CC

rate as these skills. Consequently, the computing restriction has changed from sending data to a massive supercomputer to transferring the application to numerous smaller PCs on which the data is stored. Using N’s example, the program is executed where the data is preserved in an inexactly connected and highly dispersed architecture. Chand and colleagues are among those who have made contributions to this study. A relational database management system (RDBMS), on the other hand, would give access to data as a single data processing warehouse that was built on the foundation of effective and tightly integrated frameworks. In most organizations, the structured query language (SQL) is the default method of accessing databases, as it enables relatively straightforward access to data at many levels inside organizations. A similar quantity of SQL is commonly found in low-level software developers and abnormal state business auditors, all of whom seek to learn or understand it. However, this kind of data exchange has its limits, and it cannot cope with the massive growth of static, non-evolving data. Even though they have typically been late to the game, there are now more than a hundred NoSQL approaches that have fundamental expertise in administering a wide range of multimodal data types (ranging from organized to unorganized) and the ability to comprehend specific problems. The Map-Reduce concept, built on a highly adapted design that uses low-cost hardware, governs most of these systems. It eliminates the need for a production component to store and process data, which reduces the cost of the component. In addition, the fact that transmission is significantly more expensive than capacity and processing assets means that it is far less expensive to replicate and distribute information via computer networks. This section of the research goes into further detail into the characteristics of intrusion assaults. (a) Infrastructure security: Jobs submitted to the data processing area are processed in a distributed design, ensuring they are not compromised. The tasks

406

B. Sudharkar et al.

that are submitted are separated into two categories: mappers and reducers. Mappers are responsible for mapping the data, and reducers are responsible for reducing it. Reducer workloads are sent to slave nodes, where they can be processed. In this scenario, faulty slave nodes may be responsible for invasions. The number of connection requests from slave nodes to the master node surpassing the number of reducers is the most important characteristic that may be used to identify an intrusion from infrastructure security. (b) Privacy of data: It is very critical when it comes to data analysis processes, and the confidentiality of the data is critical. If the cloud provides sensitive information to slave nodes, it may be at greater risk of violating the law. Since slave nodes are often under the authority of third-party users, maintaining data privacy is a difficult task to do. A distinction may be made between attacks on network protocols and attacks on application data since a particular network protocol can be chosen for a specific type of application data. Any breach of the network protocol categories may result in changes to the arrival of intrusions and alterations to the network protocol types. In order to minimize the probability of an attack occurring, it is also essential to verify the validation of client login credentials at the virtual machines or reduction nodes. (c) Data Integrity Management: This includes the distribution of data processing programs across a network, and it is responsible for handling massive quantities of data that must be read, processed, and updated continuously. Data integrity is routinely endangered throughout the procedure because of network intrusion assaults that often occur throughout the updating process. In order to detect the assaults, it is conceivable that examining the service request types coming from the reducer or slave nodes would be helpful. The recognition that certain service types are prohibited for specified data or information during update operations and that this may result in data integrity loss due to the restriction is commonplace in the IT industry. Any client operating on the reducer nodes that are properly built must abstain from starting any services that may compromise the integrity of the data. Two characteristics may be used to detect intrusion attacks: they are reversible and utilize improper service types. An outlier is a data outlier in statistical research that deviates significantly from the rest of the data. Identifying the outlier among many data outliers is one of the most challenging problems in data mining methods, and the concept of outlier detection was developed to help solve this difficulty. When outliers are identified, the likelihood of delivering ineffective results based on incorrect data is reduced, and the likelihood of engaging in erroneous behavior is reduced. Outlier detection is the process of removing items from the analysis that vary the most from the dataset that was provided. Outliers are data that has been incorrectly recorded during the data collection process. Depending on the circumstances, outliers may be classified as either inaccurate or accurate information. Among the most successful approaches for outlier detection are distribution-based, depth-based, clustering, and distancebased methods. These techniques may be divided into the following categories: clustering-based, density-based methods, depth-based, and distribution-based.


The proposed system is a hybrid of two techniques: a clustering-based and a distance-based approach. The K-means technique is used to cluster the data with three closest cluster counts, and a distance-based approach is used to detect outliers from each of the three clusterings. In the proposed work we therefore locate each candidate outlier by combining a clustering method with a distance-based function. For each clustering, points whose distance-based score exceeds the average radius of their cluster are identified as outliers. Afterwards, the outlier subsets produced by the three clusterings are combined (by taking their intersection) to obtain the final set of outliers used in this work. Section 2 reviews contemporary and related results. Section 3 presents the proposed density peak clustering technique with three closest cluster counts, called DeepDCP-CC. A detailed analysis of the results of the proposed algorithms is given in Sect. 4, and Sect. 5 concludes the paper.
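
As a minimal, hedged sketch of the three-closest-cluster-count idea, the following Python code uses scikit-learn's KMeans and flags, for each clustering, the points farther from their centroid than the cluster's average radius, then intersects the three candidate sets. The function names, the definition of the radius, and the toy data are illustrative assumptions rather than the authors' exact implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def outliers_for_k(X, k, random_state=0):
    """Cluster X into k groups; flag points farther from their centroid
    than the cluster's average radius (a distance-based criterion)."""
    km = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X)
    labels, centers = km.labels_, km.cluster_centers_
    dist = np.linalg.norm(X - centers[labels], axis=1)   # point-to-centroid distance
    flagged = set()
    for c in range(k):
        members = np.where(labels == c)[0]
        radius = dist[members].mean()                    # average cluster radius
        flagged.update(members[dist[members] > radius].tolist())
    return flagged

def ensemble_outliers(X, k):
    """Intersect the outlier candidates found with K-1, K and K+1 clusters."""
    candidate_sets = [outliers_for_k(X, kk) for kk in (k - 1, k, k + 1)]
    return set.intersection(*candidate_sets)

# Example usage on random data with one injected anomaly
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, (200, 4)), [[8, 8, 8, 8]]])
print(ensemble_outliers(X, k=3))
```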

2 Related Work

Many authors have proposed a wide range of methods for detecting outliers, each with its own advantages and disadvantages. The author of [1] was the first to propose a distance-based method for identifying outliers, which was later widely adopted by the scientific community. In that formulation, an object in a dataset is a DB(p, dist)-outlier if at least a fraction p of the objects in the dataset lie at a distance greater than dist from it. This idea has since been extended and is currently used in a wide variety of statistical outlier tests. The author of [2] suggested widening the criterion by introducing an outlier score, so that outliers are assigned significance ratings depending on how anomalous they are. A clustering-based outlier detection technique for practical data mining is proposed in [3]: with the help of an improved K-means algorithm, it clusters the datasets and identifies outliers using a weighted-centre approach [4]. Outliers are then detected with a threshold value derived from the lowest and highest values observed in a specific cluster. Methods such as clustering [5] are used to find groups of data; these techniques are designed to improve the performance of clustering algorithms rather than the detection of outliers. Although anomalies are considered by algorithms such as CLARANS, DBSCAN, BIRCH, and CURE [6, 7], they are only taken into account to the extent that they do not interfere with the clustering process itself. Because the treatment of anomalies is partly subjective, it is tied to the clusters discovered by the algorithms used. Detection based on separation distance is more objective and is not affected by how the clusters in the input dataset are split. Earlier work studied anomalies only in terms of detection; the work in [8] additionally attempts to provide deliberate learning by explaining why a discovered anomaly differs from the rest of the data.

3 Density Peak Clustering Technique and Three Closest Cluster Counts (DeepDCP-CC)

IDS in cloud computing: the IDSs currently in effect in the cloud have certain limitations, some of which are listed below.
1. Network traffic is unevenly distributed in real-world network environments: attack records occur far less often than normal records. Classification techniques are biased in favour of the records that occur more often in the dataset, so unbalanced network traffic has a substantial impact on the detection performance of most common classification techniques and must be balanced. The imbalance also significantly lowers the detection accuracy for the minority attack records.
2. For in-network IDS classes, the data mining model must recognize the non-spherical shape of the network data. Conventional data mining methods such as k-means clustering have been shown to be inadequate for identifying such groupings, so finding a suitable proactive strategy for classifying these attacks is essential.
3. In the original DPC, the Euclidean distance metric is the primary clustering measure. When the data have many dimensions, however, the Euclidean distance metric may lead to misclassifications because of the complexity of the dataset.
4. In real-world networks, the structure and operating environment change continuously, so new threats appear that were not included in the training dataset.
5. For this reason, the vast majority of supervised IDS algorithms consistently perform below expectations.

To identify outliers, we use the following procedure. First, the data are clustered with the K-means technique using three closest cluster counts derived from an initial value K, namely K − 1, K, and K + 1. For each clustering, the radius of every cluster is calculated, and points whose distance from the cluster centre exceeds the cluster radius are flagged as candidate outliers.


After identifying candidate outliers for each individual clustering, we take the intersection of these distinct candidate sets, and the points in the intersection are considered genuine outliers. The phases of the process are as follows.

(1) The K-means technique is used to divide the dataset into K clusters, and the radius of each cluster is calculated once the data have been divided.
(2) The whole dataset is then clustered with K-means using the Gaussian kernel

$K(\vec{x}, \vec{x}') = \exp\!\left(-\frac{\lVert \vec{x} - \vec{x}' \rVert^{2}}{2\sigma^{2}}\right), \quad \sigma > 0$    (1)

as the similarity measure.
(3) The distance between each point in a cluster and the centroid of the cluster it belongs to is computed with the kernel distance

$d_{i,j} = \sqrt{2\,\bigl(1 - K(\vec{x}, \vec{x}')\bigr)}$    (2)

and a point is flagged as an outlier when its distance from the cluster centre exceeds the radius of the cluster.
(4) Steps 1 and 2 are then repeated with two additional cluster counts, K − 1 and K + 1, and the local density of each point is computed as

$P_i = \sum_{j \in S} \exp\!\left(-\frac{d_{i,j}^{2}}{C_d^{2}}\right)$    (3)

where $C_d$ is the cutoff distance.

In Algorithm 1, the deep density peak clustering technique with three closest clusterings (DeepDCP-CC algorithm) obtains the initial cluster centres for K − 1 by removing one cluster-centre seed (the centre of the cluster with the fewest elements) from the existing k centres. The cosine similarity between two points $x_i$ and $x_j$ with n components is

$C_s(x_i, x_j) = \frac{\sum_{k=1}^{n} x_{ik}\, x_{jk}}{\sqrt{\sum_{k=1}^{n} x_{ik}^{2}}\;\sqrt{\sum_{k=1}^{n} x_{jk}^{2}}}$    (4)


When the (K + 1)- and (K − 1)-means runs begin to converge, this seeding may be of assistance in the early stages. The average cosine similarity over all N points is

$A_S = \frac{1}{N} \sum_{i=1}^{N} C_s(i)$    (5)

The final set of outliers is obtained simply by taking the intersection of the outlier subsets found in each of the three clusterings.

Algorithm 1: Deep density peak clustering technique and three closest clustering (DeepDCP-CC algorithm)
1. Initialize DeepDCP-CC: IDS_k ← K-Means(k, Density, Cntds) with a random Cntds value  \\ Three Closest Clustering phase
2. For each cluster C_j in IDS_k, compute Radius ← radius(C_j)
3. For each cluster point cp_i, do
4.   If Distance(cp_i, C_j) ≤ Radius, then prune cp_i
5.   Else insert cp_i into Cl1
6. End if; calculate the mean V_0 of the existing centroids (if any)
7. Cntds1 ← Cntds ∪ {V_0}
8. Set IDS_{k+1} ← K-Means(k + 1, Density, Cntds1)
9. Repeat steps 2 through 8 to obtain Cl2
10. \\ Deep Density Peak Clustering phase
11. min ← count(C_1)
12. For each cluster C_2 to C_j, do
13.   If count(C_j) < min, then
14.     update min
15. If the cluster threshold condition holds, then delete the cluster C_j with the lowest count, whose centroid is V_j
16. Cntds2 ← Cntds − {V_j}
17. Set IDS_{k−1} ← K-Means(k − 1, Density, Cntds2)
18. Repeat steps 2 through 16 to obtain Cl3
19. Repeat to obtain Cl3, Cl4, Cl5, Cl6, and Cl7 until all instances have been processed

$P_i = \sum_{j \in S} \exp\!\left(-\frac{d_{i,j}^{2}}{A_S^{2}}\right)$    (6)

$\wp(x_i) = \begin{cases} \min\limits_{j:\, P_j > P_i} d_{i,j}, & \text{if a point of higher density exists} \\ \max\limits_{j} d_{i,j}, & \text{otherwise} \end{cases}$    (7)


Algorithm 2: Deep density peak clustering phase
1. Compute the distance d_{i,j} between data points x_i and x_j using the Gaussian kernel distance (Eq. 2)
2. Compute the cosine similarity C_s between data points x_i and x_j, and then compute the average cosine similarity A_S over all data points in the dataset (Eq. 5)
3. Compute the adjusted local density P_i of each data point x_i with an exponential kernel, using the average cosine similarity as the cutoff threshold (Eq. 6), and save the results
4. Compute the separation distance \wp_i (δ_i) of each data point x_i (Eq. 7)
5. Compute the decision value of each point, $\gamma_i = P_i \cdot \delta_i$    (8)
6. \gamma_i is the quantity used to rank candidate centroids
7. Sort the values \gamma_i in descending order and take the points with the highest \gamma_i as the peaks (centroids)
8. Finally, take the remaining (non-peak) points and assign each one to the cluster of its nearest neighbour, based on the distance between the point and the cluster centre; features are min-max normalized as $x_i' = \frac{x_i - \min(x_i)}{\max(x_i) - \min(x_i)}$    (9)
9. Return the groupings of subsets, where each subset is a cluster
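
The following is a compact Python sketch of this density-peak phase, assuming the standard Rodriguez-Laio definitions of the local density and separation distance [2]; the kernel scale sigma, the use of the mean pairwise cosine similarity as the cutoff, the number of peaks, and the simplified assignment of every non-peak point to its nearest peak are illustrative assumptions, not values or rules taken from the paper.

```python
import numpy as np

def density_peak_phase(X, sigma=1.0, n_peaks=3):
    # Gaussian-kernel distance between all pairs (Eqs. 1-2)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq / (2 * sigma ** 2))
    d = np.sqrt(np.clip(2 * (1 - K), 0, None))

    # Average cosine similarity used as the cutoff threshold (Eqs. 4-5)
    norms = np.linalg.norm(X, axis=1, keepdims=True) + 1e-12
    a_s = ((X @ X.T) / (norms * norms.T)).mean()

    # Adjusted local density with an exponential kernel (Eq. 6)
    rho = np.exp(-(d ** 2) / (a_s ** 2)).sum(axis=1)

    # Separation distance delta (Eq. 7)
    n = len(X)
    delta = np.empty(n)
    for i in range(n):
        higher = np.where(rho > rho[i])[0]
        delta[i] = d[i, higher].min() if len(higher) else d[i].max()

    # Decision value gamma = rho * delta; highest values are the peaks (Eq. 8)
    gamma = rho * delta
    peaks = np.argsort(gamma)[::-1][:n_peaks]

    # Assign every other point to the nearest peak (simplified assignment step)
    labels = peaks[np.argmin(d[:, peaks], axis=1)]
    labels[peaks] = peaks
    return labels, peaks

labels, peaks = density_peak_phase(np.random.rand(100, 4))
print("peak indices:", peaks)
```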

In summary, the EDPC algorithm comprises the following steps:
1. Formulate a kernel function based on the scale parameter σ specified earlier and compute the Gaussian kernel distance to measure how far apart any two points in the training data are from one another.
2. Instead of using a cutoff distance parameter, compute the average cosine similarity over all data points and use it as the threshold.
3. With the aid of a decision graph, compute the adjusted local density based on the average cosine similarity and then calculate the separation distance to locate the peak points.
4. Select as peaks the points with both the maximum local density and the most significant separation distance (the largest \gamma_i). Then take the remaining (non-peak) points and assign each of them to the cluster of its nearest neighbour.

The quality of the resulting clustering is assessed with the silhouette index and the cohesion:

$\text{Silhouette index} = \frac{x(i) - y(i)}{\max\{x(i),\, y(i)\}}$    (10)

$\text{Cohesion} = \sum_{i=1}^{K} \operatorname{mid}(c_i, c)^{2}$    (11)

Although DPC is a powerful clustering approach, it has certain structural limitations. Estimating the local density of all data points depends on the cutoff distance (Cd) parameter, which determines the neighbourhood radius of each point, and the value of Cd affects the quality of the clustering results. We investigate Cd with 2% and with less than 2% of the data in the neighbourhood. In the original DPC, the Cd parameter affects the accuracy results but is not sensitive to the number of peaks; in the modified DPC, Cd does not affect the accuracy results but is sensitive to the number of peaks. All data samples are plotted by local density and separation distance, and points with both high values are chosen as peak points (cluster centres) with the assistance of a decision graph. When this procedure yields only a single peak point, however, the clustering quality is significantly degraded, as the reported peak numbers show. Agreement with the reference labelling is measured with the Rand index:

$\text{Rand index} = \frac{a + d}{a + b + c + d}$    (12)
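
The clustering quality measures referenced above can be computed directly; the sketch below uses scikit-learn's implementations (silhouette_score and the pair-counting rand_score, the latter available in recent scikit-learn releases), which follow the same definitions as Eqs. (10) and (12) up to notation. The random data and ground-truth labels are placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, rand_score  # rand_score needs scikit-learn >= 0.24

X = np.random.rand(300, 5)
true_labels = np.random.randint(0, 3, size=300)      # stand-in for ground-truth classes

pred_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Eq. (10): mean silhouette over all points (cohesion vs. separation)
print("silhouette index:", silhouette_score(X, pred_labels))

# Eq. (12): Rand index = (a + d) / (a + b + c + d) over all point pairs
print("Rand index:", rand_score(true_labels, pred_labels))
```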

4 Results Analysis

The symbol Cntds denotes the set of centroids used in Algorithm 1, it denotes the number of iterations needed for convergence, and k and n denote the total number of clusters and the total number of data points, respectively. Like the K-means algorithm [6], the method therefore has a time complexity of the order of it · k · n. The proposed approach reuses the centroid seeds from the initial K-means convergence in the (K + 1)- and (K − 1)-means runs, which reduces the number of iterations in the later stages of the process. If K-means were instead run three times independently, far more iterations would be required for convergence each time; compared with three independent runs at it · k · n each, the proposed technique requires less than three times the computation. Since it and k are both small, the overall time complexity is effectively O(n). To demonstrate the method, we ran the whole procedure on several datasets with varying K-values. For example, on the KDD CUP 99 dataset we applied the K − 1, K, and K + 1 clusterings and computed the outlier density of each class:

$\text{Outlier density of a class} = \frac{\text{Number of instances of the normal class}}{\text{Total number of instances in the class}}$    (13)

We identified the K-value that produces the best outcome among the three candidate K-values (3, 5, 20). For a given K-value, such as K = 3, the dataset is clustered with K − 1, K, and K + 1 clusters, and the intersection of the outliers identified in the three clusterings is taken; the same procedure was repeated for K = 5 and K = 20. The final results are calculated with the formula given in Eq. (13), using the KDD CUP 99 dataset (version 2.1).

Table 1 Results for KDD CUP 99 dataset

  Value of K | % Outliers from KDD CUP 99 normal | % Outliers from KDD CUP 99 anomaly
  K = 3      | 20.00                             | 10.00
  K = 5      | 18.00                             | 18.00
  K = 20     | 36.00                             | 32.00

These kinds of data are included in the data collection (https://archive.ics.uci.edu/ml/datasets/KDD CUP 99), which is available online and contains the KDD CUP 99 Normal and KDD CUP 99 anomaly classes, each with a large number of instances. Each instance has its own set of cloud attack characteristics (infrastructure, data security, data integrity, and responsiveness). The KDD CUP 99 Normal class may be differentiated from the anomaly class using a linear discriminant analysis method, and the Normal class is therefore expected to account for the vast majority of outliers. Table 1 shows that with a large K-value, most of the outliers are obtained from the KDD CUP 99 Normal class.
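
As a hedged illustration of how such an experiment could be reproduced, scikit-learn ships a fetcher for the KDD Cup 99 data; the subset chosen below and the binary normal/anomaly relabelling are assumptions for demonstration, not the exact preprocessing used for Table 1.

```python
from sklearn.datasets import fetch_kddcup99
import numpy as np

# Download the 10% "SA" subset (mostly normal traffic plus assorted attacks)
data = fetch_kddcup99(subset="SA", percent10=True)
X, y = data.data, data.target

# Collapse the labels into the two classes discussed in the text
is_normal = np.array([label == b"normal." for label in y])
print("normal instances:", is_normal.sum(), "anomalous instances:", (~is_normal).sum())
```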

5 Conclusion

We proposed a new technique for detecting outliers within clusters. In this paper, we applied K-means clustering with three neighbouring cluster counts, combined with a distance-based approach, to identify anomalies, and compared it with the traditional K-means clustering methodology. When the two techniques are integrated, the system outperforms existing frameworks. In contrast to the traditional K-means computation, which uses a single cluster count, three separate nearest cluster counts are used to detect anomalies more reliably. Anomaly detection with our method is comparable to or better than existing strategies, although several potential directions were left out of scope for future work.

References

1. Aggarwal CC et al (1999) Fast algorithms for projected clustering. In: ACM SIGMOD conference proceedings
2. Rodriguez A, Laio A (2014) Clustering by fast search and find of density peaks. Science 344(6191):1492–1496
3. Lin J (2019) Accelerating density peak clustering algorithm, pp 1–18
4. Loureiro A, Torgo L (2004) Outlier detection using clustering methods: a data cleaning application, ed. by C. Soares
5. Aggrwal S, Kaur P (2013) Survey of partition based clustering algorithm used for outlier detection. Int J Adv Res Eng Technol 1(5):57–62


6. Parmar M, Wang D, Tan A, Miao C, Jiang J, Zhou Y. A novel density peak clustering algorithm based on squared residual error
7. Pamula R, Deka JK, Nandi S (2011) An outlier detection method based on clustering. In: Second international conference on emerging application of information technology
8. Li L, Zhang H, Peng H, Yang Y (2018) Nearest neighbors based density peaks approach to intrusion detection. Chaos Solitons Fractals 110:33–40
9. Zhang T, Ramakrishnan R, Livny M (1996) Birch: an efficient data clustering method for very large databases. SIGMOD Rec 25(2):103–114

A Soft Computing Based Approach for Pixel Labelling on 2D Images Using Fine Tuned R-CNN Nedumaran Arappal, Ajeet Singh, and D. Saidulu

Abstract Nowadays Artificial Intelligence (AI) is an integral component of the core business model of several organisations and a crucial strategic component of many industries’ strategies such as—healthcare, finance, retail education etc. on a global scale. Machine Learning (ML) is a technology to achieve AI. Machine learning methods utilize a variety of statistical, A vast, unstructured, and complicated dataset may be analysed using probabilistic and optimization approaches to learn from the past and find relevant trends. Automated oriented text categorization, network intervention detection, spam email filtering, credit card deceit detection, consumer purchase behaviour detection, and manufacturing process optimization are just a few of the many uses for these algorithms, image categorization, object recognition and disease modelling. Allocating a different class oriented label to every pixel of a particular image is certainly one of the significant steps in constructing complex automated systems like driver-less vehicles/drones, human-friendly oriented robotic machines, robot-assisted medical procedure, and savvy military systems. Pixel labeling’s computational purpose is to give each individual pixel in an imagery a class label. It is critical long-range (pixel) label dependencies in a model must be captured pictures to provide strong visual coherence and high class accuracy. This may be accomplished in a feed-forward architecture by considering a suitably big input to the context patch surrounding each individual pixel to be labelled. We present a strategy in this chapter that consists of an ideally finetuned recurrent convolutional neural network that mainly allows us to incorporate a broad input context without restricting the model’s capacity. Our solution does not rely on any such segmentation techniques or N. Arappal School of Electrical and Computer Engineering, Kombolcha Institute of Technology, Wollo University, Dessie, Ethiopia e-mail: [email protected] A. Singh (B) School of Computer and Information Sciences, University of Hyderabad, Hyderabad, Telangana 500034, India e-mail: [email protected] D. Saidulu Department of Information Technology, Guru Nanak Institutions Technical Campus, Hyderabad, Telangana, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_61

task-specific characteristics, in contrast to the majority of conventional methods. The system is trained end-to-end on raw pixels and models complicated spatial dependencies at low inference cost. Thanks to the built-in recurrence, the system detects and corrects its own errors as the context size grows. On the Stanford Background Dataset, our method achieves best-in-class results while being considerably faster at test time. The methodology presented in this chapter has empirical scope for societal improvement, modernization, and progress.

Keywords Semantic segmentation · Pixel-wise labelling · R-CNN · Machine learning · Statistical measures

1 Introduction

Pixel-wise labelling of an image is termed semantic segmentation. In other words, semantic segmentation is the process of allocating a class name to each pixel in a picture; it can be thought of as image classification at the pixel level. Boosted by the remarkable capacity of convolutional neural networks (CNNs) to produce semantic, high-level, and hierarchical image features, a large number of deep-learning-based 2D semantic segmentation methods have been developed over the past decade. In reviewing the development of the field, we order the methodologies chronologically into three principal periods: the pre- and early deep learning period, the fully convolutional period, and the post-FCN period. We also examine the solutions put forward for the key issues of the field, such as fine-grained localization and scale invariance. Knowledge representation and reasoning is one of the most prominent approaches to this problem area; in his research, Z. Pawlak introduced several related methodologies [1–3], such as information systems, rough sets, and rough membership functions.

1.1 Post-fully Convolutional Networks

The past six years have seen a dramatic expansion in worldwide interest in semantic segmentation. Nearly all subsequent methodologies for semantic segmentation have followed the idea of FCNs; it would therefore not be wrong to say that fully connected layers effectively ceased to be used for this problem after FCNs appeared. At the same time, the idea of FCNs opened new opportunities to further improve deep semantic segmentation architectures. In general, the primary drawbacks of FCNs can be summarized as an inefficient loss of label localization within the feature hierarchy, a failure to handle global context
information, and the absence of a mechanism for multiscale processing. Accordingly, most subsequent studies have primarily aimed at solving these issues through the proposal of different architectures or methods. For the rest of this paper, we discuss these issues under the heading of fine-grained localization. Before presenting a rundown of state-of-the-art post-FCN strategies, we therefore focus on this categorization and inspect the various methodologies that target these fundamental issues. We also examine scale invariance specifically in the semantic segmentation setting, and finish with object-detection-based methodologies, a further family of approaches that address semantic segmentation while simultaneously identifying object instances. Some example architectures are U-Net, SegNet, DeepMask [4], ParseNet [5], Convolutional CRFs [6], Dense CRF [7], CRF-as-RNN [8], Graph LSTM networks [9], DAG-RNN [10], Regions with CNN features (R-CNN) [11], You-Only-Look-Once (YOLO) [12], Mask-YOLO [13], YOLACT [14], etc.

1.2 Contribution Highlights The main contribution highlights in this research paper are described as follows: • We provide a method based on an ideally fine-tuned recurrent convolutional neural network that enables us to take into account a vast input frame of reference without restricting the model’s capacity. • Our method does not rely on any segmentation techniques or task-specific characteristics, in contrast to the majority of the main conventional methods. The system models complicated spatial relationships at minimal inference cost after being trained end-to-end utilizing raw pixel data. The system recognises and fixes its own mistakes as the context size grows due to the built-in repetition.

1.3 Structure of the Paper The remaining paper is structured as follows: State of the art review and evolution of methods in this domain are summarized in Sect. 2. Our modeling pipeline, experimental evaluation and obtained results then are given in Sect. 3. Finally, Sect. 4 then concludes the paper.


2 Related Work

In this section, we present some of the state-of-the-art approaches used for semantic segmentation. In Table 1, we list several semantic segmentation techniques, each with a concise summary of the basic idea behind the proposed solution, the problem type it aims to solve (such as object, instance, or part segmentation), and whether it includes a refinement step. Considering the studies of the post-FCN period, the principal open issue in the field remains the efficient incorporation of global context into localization, for which there is still no off-the-shelf solution. A summary of the state-of-the-art methodologies is given in Table 1.

3 Our Modeling Pipeline and Experimental Evaluation Our adopted modeling pipeline in terms of simulation set-up, dataset used, architecture, methodology and obtained results are presented in this section.

3.1 Simulation Set-Up and Dataset

To assess techniques for geometric and semantic scene interpretation, Gould et al. [27] developed a new dataset called the Stanford Background Dataset. The collection includes 715 photos selected from the LabelMe, MSRC, PASCAL VOC, and Geometric Context public databases. To meet the selection criteria, the photographs had to depict outdoor scenes, be approximately 320 by 240 pixels, contain at least one foreground object, and have the horizon located within the image.

3.2 Methodology and Obtained Results

3.2.1 Architecture

The architecture and its different components are shown in Fig. 1. It mainly consists of three major components: convolution layers, a Region Proposal Network (RPN), and category and bounding-box prediction.

1. Component 1 ⏵ Convolution Layers: In these layers we train filters to extract the most appropriate features of the image. For instance, if we train the filters to extract suitable features for a human face, they will learn, through training, the shapes and shadings that exist only in a typical human face.


Table 1 State-of-the-art approaches summary

GCN [15]: Fed by an underlying ResNet-based encoder, GCN uses large kernels to fuse high- and low-level features in a multiscale manner, followed by a convolutional boundary refinement (BR) module
DFN [16]: Consists of two sub-networks: a Smooth Network and a Border Network
MSCI [17]: Aggregates features from different scales via connections between long short-term memory (LSTM) chains
DeepLab.v3+ [18]: Improved version of DeepLab.v3, using an encoder-decoder structure with dilated (atrous) convolutions
HPN [19]: A convolutional 'appearance feature encoder' is followed by a 'contextual feature encoder' consisting of LSTMs, which produces super-pixel features fed to a Softmax classification layer
EncNet [20]: A fully connected design for extracting context is fed by dense feature maps and followed by a convolutional prediction layer
PSANet [21]: Pixels between two convolutional structures are connected through a self-adaptively learned attention map to provide global context
ExFuse [22]: Improved version of GCN for feature fusion, which brings more semantic information into low-level features and more spatial detail into high-level features
EMANet152 [23]: A novel attention module between bi-modal CNN structures converts input feature maps into output feature maps, thereby providing global context
KSAC [24]: Allows kernels with different receptive fields to share the same weights, enabling communication between branches and feature augmentation within the network
CFNet [25]: Using the distribution of co-occurrent features for a given target in an image, a fine-grained spatially invariant representation is learned and CFNet is built
SDN [26]: Consists of multiple shallow deconvolutional networks, called SDN units, stacked one after another to integrate contextual information and guarantee fine recovery of localized information

Convolution networks are generally composed of convolution layers, pooling layers, and a final part, typically fully connected, that is used for the task at hand, such as classification or knowledge discovery. We compute the convolution by sliding the filter across the input image; the result is a two-dimensional matrix called a feature map. Pooling reduces the number of features in the feature map by discarding pixels with low values. Finally, the fully connected layer is used to categorize those features.


Fig. 1 Architecture: different components

2. Component 2 ⏵ Region Proposal Network (RPN): The RPN is a small neural network that slides over the final feature map of the convolution layers to detect the presence of objects and predict their bounding boxes.
3. Component 3 ⏵ Categories and Bounding Boxes Prediction: Finally, another fully connected neural network takes the regions proposed by the RPN as input and predicts the object class (classification) and the bounding boxes (regression). A sketch of how these three components fit together is given below.
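
The three components above map directly onto the structure of a standard two-stage detector. The sketch below uses torchvision's Faster R-CNN purely as an illustration of that structure (backbone convolution layers, RPN, and ROI heads); the class count and the choice of this particular model are assumptions, since the chapter does not name its implementation library.

```python
import torch
import torchvision

# A two-stage detector with the three components discussed above.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(num_classes=9)

print(type(model.backbone).__name__)   # Component 1: convolution layers (ResNet-50 + FPN)
print(type(model.rpn).__name__)        # Component 2: region proposal network
print(type(model.roi_heads).__name__)  # Component 3: class and bounding-box prediction

model.eval()
with torch.no_grad():
    predictions = model([torch.rand(3, 240, 320)])   # one dummy 320x240 RGB image
print(predictions[0].keys())           # boxes, labels, scores
```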

3.2.2 Feature Extraction

• Pixel Location: The pixel-location feature we employ is the X and Y coordinate of each pixel. The assumption behind using pixel location is that, in an outdoor scene photograph, a foreground object will most likely be distributed along the horizontal axis.
• RGB Colour: In an image of a landscape, the RGB colour of each individual pixel is an important element for discerning foreground objects.
• Surface Detection: We also classify pixels by surface: for each pixel we check whether it belongs to a horizontal surface, a vertical surface, or neither, and label the pixel accordingly (a small extraction sketch follows this list).
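
A minimal NumPy sketch of these per-pixel features is shown below; the normalisation and the omission of the surface channel are assumptions, since the chapter does not specify how horizontal and vertical surfaces are detected.

```python
import numpy as np

def pixel_features(image):
    """Stack (x, y, R, G, B) for every pixel of an H x W x 3 uint8 image."""
    h, w, _ = image.shape
    ys, xs = np.mgrid[0:h, 0:w]                                   # pixel-location features
    rgb = image.reshape(h * w, 3).astype(np.float32) / 255.0      # colour features
    coords = np.stack([xs.ravel() / w, ys.ravel() / h], axis=1)   # normalised X, Y
    return np.hstack([coords, rgb])                               # shape: (H*W, 5)

features = pixel_features(np.zeros((240, 320, 3), dtype=np.uint8))
print(features.shape)   # (76800, 5)
```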

3.2.3 Training and Testing the Model

To train this architecture, we use Stochastic Gradient Descent (SGD) to optimize the convolution-layer filters, the associated RPN weights, and the weights of the final fully connected layers (a training sketch is given below). After the experimental evaluation, the input image instances and the corresponding output instances are shown in Figs. 2, 3, 4, 5 and 6.
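
A hedged sketch of one SGD training step is shown below, reusing the torchvision detector from the earlier sketch; the learning rate, momentum, and the toy one-box target are illustrative values only, not the chapter's actual hyperparameters.

```python
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(num_classes=9)
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9, weight_decay=5e-4)

model.train()
images = [torch.rand(3, 240, 320)]                                # one 320x240 image
targets = [{"boxes": torch.tensor([[30.0, 40.0, 120.0, 200.0]]),  # xmin, ymin, xmax, ymax
            "labels": torch.tensor([1])}]

loss_dict = model(images, targets)       # RPN + ROI-head losses
loss = sum(loss_dict.values())
optimizer.zero_grad()
loss.backward()
optimizer.step()
print({k: float(v) for k, v in loss_dict.items()})
```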


Fig. 2 Input image instance and corresponding output instances (Exp-1)

Fig. 3 Input image instance and corresponding output instances (Exp-2)

Fig. 4 Input image instance and corresponding output instances (Exp-3)

Fig. 5 Input image instance and corresponding output instances (Exp-4)

Fig. 6 Input image instance and corresponding output instances (Exp-5)


Table 2 (*L = Layer) 2-layer and 3-layer model categorization accuracy for the respective input image instances over all five experiments

  Image instance | 2-L accuracy (%) | 3-L accuracy (%)
  Img_1          | 86.98            | 86.56
  Img_2          | 86.74            | 84.24
  Img_3          | 77.42            | 74.58
  Img_4          | 82.01            | 73.43
  Img_5          | 61.03            | 69.92

Table 3 Comparative analysis

  Method            | Best accuracy obtained (%)
  Xia et al. [23]   | 77.41
  Proposed approach | 86.98

Table 2 presents the two-layer model accuracy and three-layer model accuracy in categorization for respective input image instances.

3.2.4 Comparative Analysis

Table 3 compares our proposed approach with the method of Xia et al. [23]. The comparison is made in terms of the best accuracy obtained in the pixel-labelling task on the input image dataset.

4 Conclusion

We propose an efficient approach consisting of an optimally fine-tuned recurrent convolutional neural network that allows us to take a vast input context into account without restricting the model's capacity. We gave particular attention to the key technical difficulties of the 2D semantic segmentation problem, the deep-learning-based solutions that have been proposed for them, and how these solutions evolved as they shaped progress in the domain. In this regard, we observed that the fine-grained localization of pixel labels is clearly the central challenge of the problem.


5 Future Scope

In future work, we aim to apply more effective modelling procedures to various geospatial images and to obtain higher pixel-labelling accuracy. We will also analyse the scalability and efficiency of the proposed framework.

References 1. Pawlak Z (1981) Information systems, theoretical foundations. Inf Syst 6(3):205–218 2. Pawlak Z (1982) Rough sets, algebraic and topological approach. ICS PAS Reports (482) 3. Pawlak Z, Skowron A (1993) Rough membership functions: a tool for reasoning with uncertainty, algebraic methods in logic and in computer science, vol 28. Banach Center Publications, Institute of Mathematics, Polish Academy of Sciences, Warszawa 4. Pinheiro PO, Collobert R, Dollar P (2015) Learning to segment object candidates. In: Cortes C, Lawrence ND, Lee DD, Sugiyama M, Garnett R (eds) Advances in neural information processing systems, vol 28. Curran Associates, Inc., pp 1990–1998 5. Liu W, Rabinovich A, Berg AC (2015) Parsenet: looking wider to see better 6. Marvin TT, Teichmann, Cipolla R (2018) Convolutional crfs for semantic segmentation. CoRR, abs/1805.04777 7. Krähenbühl P, Koltun V (2011) Efficient inference in fully connected crfs with gaussian edge potentials. In: Advances in neural information processing systems, pp 109–117 8. Zheng S, Jayasumana S, Romera-Paredes B, Vineet V, Su Z, Du D, Huang C, Torr PHS (2015) Conditional random fields as recurrent neural networks. In: 2015 IEEE international conference on computer vision (ICCV), pp 1529–1537 9. Xiaodan L, Xiaohui S, Jiashi F, Liang L, Shuicheng Y (2016) Semantic object parsing with graph lstm. In: Leibe B, Matas J, Sebe N, Welling M (eds) Computer vision—ECCV 2016, pp 125–143 10. Shuai B, Zuo Z, Wang B, Wang G (2016) Dag-recurrent neural networks for scene labeling. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 3620–3629 11. Darrell T, Malik J, Girshick RB, Donahue J (2013) Rich feature hierarchies for accurate object detection and semantic segmentation. CoRR, abs/1311.2524 12. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 779–788 13. Sun J Mask-yolo: efficient instance-level segmentation network based on yolo-v2, https:// ansleliu.github.io/MaskYOLO.html 14. Bolya D, Zhou C, Xiao F, Lee YJ (2019) YOLACT: real-time instance segmentation. CoRR, abs/1904.02689 15. Peng C, Zhang X, Yu G, Luo G, Sun J (2017) Large kernel matters-improve semantic segmentation by global convolutional network. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), pp 1743–1751 16. Yu C, Wang J, Peng C, Gao C, Yu G, Sang N (2018) Learning a discriminative feature network for semantic segmentation. In: 2018 IEEE/CVF conference on computer vision and pattern recognition, pp 1857–1866 17. Lin D, Ji Y, Lischinski D, Cohen-Or D, Huang H (2018) Multi-scale context intertwining for semantic segmentation. In: The European conference on computer vision (ECCV) 18. Chen L-C, Zhu Y, Papandreou G, Schroff F, Adam H (2018) Encoder-decoder with atrous separable convolution for semantic image segmentation. CoRR, abs/1802.02611 19. Shi H, Li H, Meng F, Wu Q, Xu L, Ngan KN (2018) Hierarchical parsing net: semantic scene parsing from global scene to objects. IEEE Trans Multimed 20(10):2670–2682


20. Zhang H, Dana K, Shi J, Zhang Z, Wang X, Tyagi A, Agrawal A (2018) Context encoding for semantic segmentation. In: The IEEE conference on computer vision and pattern recognition (CVPR) 21. Zhao H, Zhang Y, Liu S, Shi J, Loy CC, Lin D, Jia J (2018) Psanet: point-wise spatial attention network for scene parsing. In: Ferrari V, Hebert M, Sminchisescu C, Weiss Y (eds) Computer vision—ECCV 2018. Springer International Publishing, Cham, pp 270–286 22. Zhang Z, Zhang X, Peng C, Xue X, Sun J (2018) ExFuse: enhancing feature fusion for semantic segmentation. In: Computer vision—ECCV 2018. ECCV 2018. Lecture notes in computer science, vol 11214. Springer, Cham 23. Xia L, Zhisheng Z, Wu J, Lin Z, Liu H, Yang Y (2019) Expectation-maximization attention networks for semantic segmentation 24. Huang Y, Wang Q, Jia W, He X (2019) See more than once—kernel-sharing atrous convolution for semantic segmentation 25. Zhang H, Zhang H, Wang C, Xie J (2019) Co-occurrent features in semantic segmentation. In: The IEEE conference on computer vision and pattern recognition (CVPR) 26. Fu J, Liu J, Wang Y, Zhou J, Wang C, Lu H (2017) Stacked deconvolutional network for semantic segmentation. IEEE Trans Image Process (99) 27. Gould S, Fulton R, Koller D (2009) Decomposing a scene into geometric and semantically consistent regions. In: Proceedings of international conference on computer vision

Design of Concurrent Engineering Systems for Global Product Development Using Artificial Intelligence Arodh Lal Karn, Abolfazl Mehbodniya, Julian L. Webber, Vellingiri Jayagopal, D. Stalin David, Rajasekar Rangasamy, and Sudhakar Sengan Abstract Competition and concurrent engineering (CE) businesses are growing. Faster product creation, quality improvement, manufacturing process adoption, and reduced client demand cost are vital for corporate success. Implement concurrent engineering to make high-quality goods. CE is a parallel, systematic method to design. The chief executive officer (CEO’s) approach impacted how many firms build products. You may use the fundamental ideas of concurrent engineering conversation and the failure mode and effects analysis (FMEA) tool. To provide A. L. Karn Department of Financial and Actuarial Mathematics, School of Mathematics and Physics, Xi’an Jiaotong-Liverpool University, Suzhou 215123, Jiangsu, China e-mail: [email protected] A. Mehbodniya · J. L. Webber Department of Electronics and Communication Engineering, Kuwait College of Science And Technology (KCST), Kuwait City, Kuwait e-mail: [email protected] J. L. Webber e-mail: [email protected] V. Jayagopal Department of Software and System Engineering, School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu 632014, India e-mail: [email protected] D. Stalin David Department of Information Technology, Vel Tech Multi Tech Dr. Rangarajan Dr. Sakunthala Engineering College, Chennai, Tamil Nadu 600072, India e-mail: [email protected] R. Rangasamy Department of Computer Science and Engineering, GITAM School of Technology, GITAM University, Bengaluru, Karnataka 561203, India e-mail: [email protected] S. Sengan (B) Department of Computer Science and Engineering, PSN College of Engineering and Technology, Tirunelveli, Tamil Nadu 627152, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_32

a smooth and efficient CE environment, conduct the product development process from market research to detailed design. Concurrent engineering is a novel product development (NPD) approach. Concurrency and cross-functional integration separate CE from traditional product development. NPD continues through developing ideas, validation, proof-of-concept validation, Minimum Viable Product development, pre-launch, and post-launch. In other words, product discovery should continue throughout its life span. Each organization’s growth operations operate simultaneously. Keywords Concurrent engineering · New product development · Integration and concurrency · Failure modes and effects analysis

1 Introduction Simultaneous designing, taking the all-out period of the item life cycle into account, to create items, for example, agents of outside associations, is a method of joint work by cross-utilitarian groups. Simultaneous concurrent engineering (CE) is consistently guaranteed benefits since being viewed as a superior method to grow new items [1]. Its three fundamental components of simultaneous designing were coordinated effort, cycles, and data innovation. CE has been viewed as a significant factor in deciding the organization’s accomplishments. CE is to lessen the deferral and requires overlaying the improvement phase of new items. It will be accomplished by actualizing the guideline of good CE and decreasing deferral. The rule of this CE is that a temporary new item improvement measure has been considered the primary key, and Japanese organizations presented it. Be that as it may, you cannot, in every case, demonstrate the appropriation of CE. The prevalence of CE is to develop, and the application is undeniably different to characterize the CE; its fundamental standard has gotten progressively equivocal [2]. CE technique, there is sometimes a likelihood that you cannot give the guaranteed level and look extravagant along these lines; you need a transient asset. CE is to diminish costs, abbreviate the hour of item advancement measure and can improve the nature of the item. On the off chance that the entirety of the plan exercises is acted in equal, choices are made between various gatherings to incorporate. In any case, CE is simultaneous and does not imply that the item improvement measure crafts the entirety of the exercises. In organizational structure (OS), we mainly play a regulatory role in the company. This includes a reduction of the complexity of the construction company, and its members are divided into different groups 15. To minimize randomness and unpredictability of behaviour and organizational behaviour (OB) through structural elements and power consumption, place all members assigned responsibilities and work [3]. A structured way of action is defined in the general method of OB; the system of a standard code of conduct of a position different from the people of different individuals will perform the OB. Taking compelling special of the implementation process of the OS into account, to be a framework of OB,

it should be based on strategy, but it also should follow. Likewise, the activities of the one-of-a-kind delegate and the gathering must be changed to ensure the corporate targets’ ground-breaking mission to achieve the expected level to address their issues. To make the fundamental changes that are dependable on the association’s endeavours and framework, we are to portray the mysticism of the relationship, achieve the various levelled meta-model of learning, and separate its structure, it and the indispensable objectives of the association [4]. This way, legitimate issues have been especially regarded, and you should be a common concern. You may need to know how stable an association affiliation is. Structure assessment and adjustments of the complicated arrangement of the rethink of the progressive corporate structure introduced in this article will review the plan of the proposed adventure metadata model and courses of action. Investigation outcomes of using a CE in the current thing progression measure, we consider the issue in the thing improvement measure, have proposed a couple of changes in some CE guidelines and devices of the current thing improvement measure. The rest of the paper is, Sect. 1 is the introduction about concurrent engineering and the impact of AI. Section 2 is related work with existing research contributions. Section 3 is the proposed CE model. Section 4 is the result and discussions of the proposed model; people who checked out thing progression, the start of measurable looking over, low down arrangement in the long run-you can fathom the pattern of thing improvement in the mass CE atmosphere. Section 5 is the conclusion and future work about applying the concepts and tools of CE in the PD process, and designers can make a product more effectively and efficiently.

2 Related Works CE is to decrease the expense and time to showcase, as a definitive objective of improving item quality, consumer loyalty “to deliver a simultaneous plan, including support, coordinated item, and related cycles” it is characterized as. As per research findings, there are eight essential components in the simultaneous design. They are partitioned into two angles. (a) human and the board, group building, including the administration and traditional way of thinking, including specialized, (b) specialized angles, the planned standard, the advancement of less improvement time and correspondence organizes and time to advertise, fewer plan changes, hardly any imperfections, adjust and scrap, the high calibre and the arrival of resources: a portion of the potential favourable position is, you need to actualize a to have CE that can be of the organization, for instance, it very well may be expected. The new company’s organizational structure was improved through structural analysis [5]. For this reason, we will evaluate the organization of the company, structure analysis and simplify for the diagnosis, and use a standard approach to implement the complex framework. Several methods, such as the Galois lattice social network analysis (SNA), have been used in this domain, including the Q-analysis method, to capture the algebraic topology’s organizational structure and catch the interest. Its advantages will

be communication indicating the structure. Q-in the analysis, we have to diagnose the actual OS of the company; let us compare an actual bill and it. Author, the eccentricity of such a system, we use different metrics like complexity and flow. Indicators of these measures ensure synchronization between the 1 of the formal OS and an emergency to changes in cognition in cause business processes [6]. In addition, about 40% of the standard implementation of CE has been increased in the three companies. (1) Support from senior management to promote the corporate culture (2) provides training and education: any company as a process for achieving the CE, some of the factors, such as the following, employees of all levels should ensure the success of its implementation, interdisciplinary team of effective project management. Generally speaking, there are seven steps to achieving CE within an organization. These steps are as follows: (a) develop a strategy for top management; (b) the company’s existing conditions, such as using specific rating tools such as benchmarks, surveys, and performance indicators. Evaluate; (c) create a supportive corporate culture. Relevance to raise’s CE law awareness provides CE implementation training; (d) improves prioritization based on evaluation results; (e) plan is a milestone responsible for all participants/set goals and analyse CE projects change required resources; (f) implement improvements; (g) implement support. Many applications, such as the dynamics of the modelling and business process mining of research and their companies of enterprise architecture of such organizations, in order to explore the knowledge of the organization and enterprise architecture, for the same purpose, but there is also organization culture will be in the collective action of human beings is a part of the organization research. To track the employee of experience with their knowledge, keep up with the work tasks of the time of the information database, how to use the results of the analysis of the re-designed OS by the research department of international business machines (IBM) [7], comparison of strategic information system and deployment of comprehensive meta-model intelligent tissue, restructuring structural analysis framework. Soft computing techniques, it is very relevant to this work. Image processing, application of artificial intelligence techniques, the prediction of the artificial neural network, and optimization techniques are used in various soft computing techniques in various areas in real-life applications [8].

3 Materials and Methods

CE is a significant idea in new product development. It is the method used to deliver a product on time while maintaining the highest quality, the lowest cost, and the greatest customer satisfaction. In conventional product development, activities such as market research, product design specification, conceptual design, detailed design, manufacture, and sales are executed sequentially, and each stage is typically completed in full before the next one begins. This can increase the time and cost of the product development process. CE, in contrast, integrates all design activities within a single framework and provides a systematic approach to handling design changes [9]. Figure 1 gives an overview of the CE tools used to guarantee better product quality and of how they are used. In the development of products and their processes, costs rise rapidly when necessary modifications are implemented slowly, so short iteration cycles and intensive activity in the early stages of development are essential. Today, only companies that can offer the right product, of the right quality, adapted to customer requirements and delivered at an appropriate time and price, can compete successfully in the market.

Fig. 1 Proposed CE model

3.1 Enterprise Modelling The enterprise model is to design better companies, their performance analysis and management; in the present study, we play an essential role in its business: model, organizations, information, and human resources. The existing modelling technology initially finds standardization and standardization efforts. The company is concerned about the OS [10]. On the other hand, a learning organization is a skilled acquisition, creation, and transfer, and there is knowledge and conversion of the holding, knowledge will be to improve performance and innovative products and services. These activities of all are, on average, intelligence and learning ability, rely on the

interpersonal interaction of its members. In other words, the organization of intelligence is not simply being able to equate with human intelligence. So how do you imagine the corporate organization to adapt to competitive action in the smart? We, the primary and comprehensive approach, structural analysis based on the algebraic topology, we believe helps to organize the intellectual paradigm. In the graylog information model (GIM) grey model [11], companies, the physical system, consists of a decision-making and information system. Available, physical, decision-making, and information view: companies, you can use the four views for the explanation. Further, computer integrated manufacturing open system architecture (CIMOSA) classification defines the model-based enterprise engineering methods to produce the operations for general and specific functions [12].

3.2 Product Development Process with CE CE is a critical thought in the domain of new thing improvement. This, while keeping up the best, least cost, and most of the customer steadfastness, is the method used to make an ideal thing. In the conventional thing improvement, such measurable reviewing, thing plan specifics, hypothetical arrangement, point-by-point design, manufacture, work out, for instance, bargains are executed sequentially, the example is before you run any of the going with, each it is to completed 100% of the stage. It has been made in the change stage after endless the thing progression measure. Like this, this can add to an extension in the time and cost required for the thing improvement measure. CE is an efficient strategy to fuse the aggregate of the arrangement activities and gives a structure to finishing the distinction in a plan. The thing improvement measure is the path towards changing over the necessities of customers in the arrangement and collecting of the thing. It gives a guide during the progression pattern of the plan, activities, and cycles fundamental to achieving the unforeseen development and inevitable aftereffects of creation. The central target of the thing headway measure, life cycle costs, enhance, is to keep the idea of things and grow purchaser dependability, most extraordinary versatility and least of activity to a base. A method for measuring the CE environment throughout the product development cycle and layout. Regardless, once a tremendous amount of the thing progression measure, the CE atmosphere, has been made, it has been proposed in the composition. It has been executed a massive part of the things originators and designers, eminent, and there are different things headway models set up [13, 14].

3.3 Failure Mode and Effects Analysis

Failure mode and effects analysis (FMEA) identifies every potential failure mode of a given product or process, together with its causes, and is designed to understand the influence of each failure on the system or on the end user. The identified failure modes, their causes and effects, and the associated risks are evaluated and prioritized.


Corrective measures are then identified and taken to address the most serious concerns. In the case of a process, FMEA improves the process control plan; in the case of a design, FMEA improves the test and verification plan, identifying and preventing safety problems and minimizing losses in product performance. The product design and the manufacturing process are reviewed, the critical product and process characteristics are identified, preventive maintenance plans are developed for in-service machinery and equipment, and online diagnostic techniques are supported. The result is a production process built to meet design requirements with minimum downtime, scrap, and rework. The analysis covers work in manufacturing and processing, transportation, receiving parts and materials, putting them into storage, setting up conveyor systems, inspecting and adjusting equipment, and labelling [15–17].
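
The prioritization step can be made concrete with the risk priority number (RPN = severity x occurrence x detection) used in standard FMEA worksheets; RPN itself is an assumption here, since the chapter does not name its scoring scheme, and the failure modes and ratings below are invented placeholders rather than data from this study.

```python
# Minimal FMEA prioritization sketch: rank failure modes by risk priority number.
failure_modes = [
    # (failure mode, severity 1-10, occurrence 1-10, detection 1-10) -- placeholder ratings
    ("conveyor misalignment",   6, 4, 3),
    ("wrong label applied",     8, 3, 5),
    ("part damaged in storage", 5, 2, 7),
]

def rpn(severity, occurrence, detection):
    return severity * occurrence * detection

ranked = sorted(failure_modes, key=lambda fm: rpn(*fm[1:]), reverse=True)
for name, s, o, d in ranked:
    print(f"{name:25s} RPN = {rpn(s, o, d)}")
```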

4 Result and Discussion CE gadget is readied and must be realized to convey an admirable quality. Ordinarily, all arrangement is organized and repeated simultaneously in this model, yet it is equivalent to restricting unnecessary emphasis in the structure design stage. From this proposed model, people who checked out thing progression, the start of measurable looking over, low down arrangement in the long run-you can fathom the pattern of thing improvement in the mass CE atmosphere. Table 1 shows the simulation parameters that consider the failure mode and effect of analysis based on concurrent engineering tools.

4.1 Global Industrial Growth Analysis
CE compresses the conventional sequential product development (PD) process because it allows many PD stages to proceed simultaneously; CE is therefore a way to shorten the PD time. Interdisciplinary teams give professionals working on the same PD project the opportunity to exchange more information about the PD simultaneously, reducing the overall industry analysis time. Table 2 shows the industrial-level global PD increasing over the years.

Table 1 Simulation parameters

Parameters       Tools
Simulation tool  .Net
Model            Failure mode and effects analysis
Method           Concurrent engineering

Table 2 Summary of annual PD

Years  Global PD (%)  Years  Global PD (%)
2012   52             2017   69
2013   61             2018   74
2014   41             2019   81
2015   50             2020   85
2016   45             2021   90

Fig. 2 Analysis of PD

Figure 2 describes the global PD analysis, comparing the years based on industrial enterprises. In 2021, the PD level increased relative to the previous years.

4.2 Grade of Failure Modes
FMEA is a crucial preventive manufacturing engineering tool designed to keep failures and defects from reaching the customer. It provides the design team with a systematic method for finding the effect of a failure before determining its cause in the design. FMEA checks all failure modes that may occur across the PD system. Table 3 shows the grade levels based on the failure modes and effects for the industries. Figure 3 describes the failure modes, and the grade of the effects is based on the performance.

Table 3 Grade of failure modes and effects

Grade  Failure modes (%)  Effects (%)
10     30                 32
20     40                 35
30     41                 25
40     54                 62
50     69                 70

Fig. 3 Grade of failure modes

5 Conclusion and Future Work
CE is critical in the PD process. By applying the concepts and tools of CE in the PD process, designers can develop a product more effectively and efficiently. Companies that have adopted CE tools in their PD benefit greatly, in particular by reducing the costs incurred, shortening the PD process time, improving product quality, and meeting customer requirements. Moreover, some of the uncertainty in the design can be reduced, and the product can be designed through a more transparent process. Pressure on PD time has forced organizations to change from sequential to parallel PD. The fundamental element of parallel PD is teamwork; therefore, the formation and structure of the management team gain particular importance. In addition, CE tools are essential for developing modern industrial production.

References 1. Choong LM, Cheng WK (2021) Machine learning in failure analysis of optical transceiver manufacturing process. Int Conf Comput Inf Sci 2021:160–162 2. Colombo W, Karnouskos S, Shi Y, Yin S, Kaynak O (2016) Industrial cyber-physical systems. Scan Issue Proc IEEE 104(5):899–903


3. Duan C, Ding Z, Wu Z, Wang X, Li C, Wang X (2019) Failure analysis of the effect of hydrogen on GaAs device. IEEE 26th international symposium on physical and failure analysis of integrated circuits, pp 1–5 4. Fuse J, Sunaoshi T, Kanemura T, Nara Y, Kageyama A, Mizuno T (2018) A case study of a short failure analysis by voltage applied EBAC. IEEE international symposium on the physical and failure analysis of integrated circuits, pp 1–6 5. Haihang W, Qidi W, Caihong Z, Junwei Y (1996) Research on teamwork-based organization environment and enterprise cultures for concurrent engineering. In: Proceedings of the IEEE international conference on industrial technology, pp 339–342 6. Hoseynabadi HA, Oraee H, Tavner PJ (2010) Failure modes and effects analysis (FMEA) for wind turbines. Int J Electr Power Energy Syst 32(7):817–824 7. Papazoglou MP, Heuvel WJVD, Mascolo J (2015) Reference architecture and knowledge-based structures for smart manufacturing networks. IEEE Softw 32:61–69 8. Papazoglou MP (2018) Metaprogramming environment for industry 4.0. In: 6th International conference on enterprise systems, pp 1–8 9. Priyadarshni AU, Sudhakar S (2015) Cluster based certificate revocation by cluster head in mobile ad-hoc network. Int J Appl Eng Res 10(20):16014–16018 10. Qattan NA, Al-Bahi AM, Kada B (2021) Failure modes and effects analysis of t-56 turboprop engine turbine. Annual reliability and maintainability symposium, pp 1–5 11. Sudhakar S, Chenthur Pandian S (2016) Hybrid cluster-based geographical routing protocol to mitigate malicious nodes in mobile ad hoc network. Int J Ad Hoc Ubiquitous Comput 21(4):224–236 12. Sudhakar S, Chenthur Pandian S (2012) Secure packet encryption and key exchange system in mobile ad hoc network. J Comput Sci 8(6):908–912 13. Sudhakar S, Chenthur Pandian S (2013) A trust and co-operative nodes with affects of malicious attacks and measure the performance degradation on geographic aided routing in mobile ad hoc network. Life Sci J 10(4s):158–163 14. Sudhakar S, Chenthur Pandian S (2015) Investigation of attribute aided data aggregation over dynamic routing in wireless sensor. J Eng Sci Technol 10(11):1465–1476 15. Sudhakar S, Chenthur Pandian S (2013) Trustworthy position-based routing to mitigate against the malicious attacks to signifies secured data packet using geographic routing protocol in MANET. WSEAS Trans Commun 12(11):584–603 16. Wu LJ, Tsai CT, Chang PL (2011) Dynamic open innovation model of Research and development management for enterprise globalization. In: 1st International technology management conference, pp 827–835 17. Yang CH, Vyatkin V, Pang C (2014) Distributed automation: a survey and an approach. IEEE Trans Syst Man Cybern: Syst 44(3)

Comparison of Public and Critics Opinion About the Taliban Government Over Afghanistan Through Sentiment Analysis Md Majid Reza, Satwinder Singh, Harish Kundra, and Md Rashid Reza

Abstract The usage of social media has increased exponentially these days. People worldwide are sharing their opinions on different platforms such as Twitter, personal blogs, Facebook, and other similar platforms. Twitter has grown in popularity as a platform for people to express their thoughts and opinions on many different topics. The data from Twitter about the Taliban has been examined in this research work, and various machine learning algorithms have been applied including SVM, LR, and random forest. Text sentiments have been captured via TextBlob. Among the machine learning models applied, SVM outperformed all other models and achieved an accuracy score of around 94% on the tweet dataset and logistic regression outperformed other models with an accuracy score of 83% on the news article dataset. Keywords Hashtags · Machine learning · NLP · Sentiment analysis · Tweets · TextBlob

1 Introduction People are using social media more frequently these days. People from all over the world are sharing their own opinions on social media platforms like Twitter, Facebook, personal blogs, and other similar platforms. Social media (SM) platform like Twitter has become a top social site in popularity where users share their thoughts

M. M. Reza · S. Singh (B) Department of Computer Science and Technology, Central University of Punjab, Bathinda, Punjab, India e-mail: [email protected] H. Kundra Guru Nanak Institutions Technical Campus, Hyderabad, Telengana, India e-mail: [email protected] M. R. Reza Department of Computer Science and Information Technology, Mahatma Gandhi Central University, Motihari, Bihar, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_33


and opinions on a number of topics. SM data analysis can assist businesses, governments, security organizations, and the environment in understanding people's problems, suggestions, and criticisms in order to find the best solution to their issues. The data from Twitter about the Taliban is examined in this research work. Many research papers have been written on the topic, but none of them relates to public opinion analysis based on Twitter tweets about the Taliban. Various machine learning-based techniques have been used to analyze Twitter data. A Twitter streaming API was also used to collect tweets. Sentiment analysis is a great way to find out the sentiments of the public around the world, especially public opinions about a particular topic, product, or idea. The rest of the paper is organized as follows: Sect. 2 discusses related work, Sect. 3 gives the proposed methodology used in the experimental work, Sect. 4 presents the results and discussion, and finally, Sect. 5 concludes with the conclusion and future work of the current study.

2 Related Work
Users from all over the world post a huge amount of information, which may be text, images, and videos, on social apps like Twitter, which results in an exponential increase in the amount of data uploaded on social media. A literature survey has been done on the past work and is summarized below according to the papers reviewed. The study [1] analyzed user sentiments on Twitter data using sentiment analysis techniques. The dataset contained nearly 600 negative tweets and the same number of positive tweets. The SVM classifier was used for the classification of the tweets and achieved an accuracy of 80%. For future work, the authors suggested working with hybrid classification approaches with Naïve Bayes and KNN to increase the accuracy of the model. The study [2] combined both lexicon and machine learning approaches for the detection of supporters of terrorists on Twitter. 96,679 public tweets ranging from May 22, 2017, to October 31, 2017, were collected as the dataset for the approach. For the lexicon approach, a training dataset was built and classified as positive, negative, and neutral, but for the machine learning stage, only positive and negative labeled tweets were considered for the classification model. Among the labeled tweets, positive ones were regarded as non-terrorist-supporting tweets, while negative ones were regarded as terrorist-supporting tweets. The proposed approach was able to achieve an accuracy score of 94.8% with a 95.9% F1-score. The study [3] proposed a framework to analyze content related to terrorism. The main focus of their work was to classify tweets into two categories, extremist and non-extremist. The textual data was comprised of tweets from Twitter. Results were encouraging and provided a way to research further in the said area. An accuracy score of 90% was achieved with the proposed approach. The study [4] proposed an approach to get the sentiments of a leading tourist platform, TripAdvisor. The approach used a hybrid method composed of lexical-based and machine learning methods. For the lexical-based approach, SenticNet


was used to get the label of each review. For the machine learning-based methods, ensemble algorithms were used; a few examples were logistic regression and decision tree. The results depicted that the random forest model provided a good accuracy score of 98%. The study [5] was based on sentiment analysis to cover the situation in Afghanistan under the Taliban regime. The main objective behind the study was to get the sentiment and volume analysis of tweets. For the sentiment analysis part, a hybrid approach was used with machine learning models, including some deep learning models like CNN and GRU. For the volume analysis part, five countries were chosen where a large number of tweets were generated. SVM produced amazing results with an accuracy score of 97% with the proposed hybrid features. The work [6] gathered a dataset from Twitter via the Twitter API. Then, the data was preprocessed by NLP techniques. Out of Naïve Bayes and SVM, Naïve Bayes performed well with the classification and obtained an accuracy score of 88.5% and a precision score of 92.94%. From the results, it was depicted that Naïve Bayes did not perform well on the basis of recall score, whereas SVM's recall score was quite good. The study [7] proposed a mechanism to get the sentiments of tweets in Turkish. The authors followed two methods for this, one being the polarity lexicon and the other being AI. In the method, a dictionary of words was introduced that matches the words in the tweets. The classification was divided into three classes, positive, negative, and neutral, which were obtained from the results after matching with the tweet set. The machine learning models used for classification purposes were SVM and random forest. On the raw data, RF performed better with an accuracy score of 88%, whereas SVM performed better on cleaned data with an accuracy score of 76%. According to the study, the PL method's performance grows continuously from a score of 45% to 57% when the data is cleaned from raw data. In the study [8], sentiment analysis was done on Twitter data on the topic "Afghanistan people's views about the situation of their country." The study performed the experiment on a bulk Twitter dataset consisting of nearly 2.09 lac tweets, and after the preprocessing stage, nearly 68.9 thousand were selected. From the total data, 34% of the tweets were classified as negative, 35.1% as normal, and the rest as positive. Peace, Afghan, and Secure were a few of the words from the positive word list, whereas kill, Taliban, Attack, and Bomber were a few of the negative words. The work [9] performed a study on the classification of sentiments using term frequency-inverse document frequency and word-to-vector embeddings. A public dataset of Rotten Tomatoes and TripAdvisor was used. The main contribution of the paper was using ensemble models with machine learning models instead of implementing them alone. NB, kNN, SVM, and LR were used in the study for sentiment analysis, with stacking and voting as ensembles. From the results, voting yielded better results than stacking among the ensemble learning algorithms. Between TF-IDF and W2V, W2V performed well. In the future, the authors suggested evaluating the performance of various models, most importantly BERT, FastText, and GloVe, together with ensemble and single ML methods.


According to the study [10], sentiment analysis can be performed on television programs by just gathering opinion data from audiences via Twitter. Various tools were used for the collection, cleaning, and sentiment mining of the data. This could help in producing good feedback to the producers and thus would help producers find negative changes in the viewers' perception of the show. The study [11] examined various machine learning algorithms like Naïve Bayes (NB), decision tree (DT), support vector machine (SVM), and random forests (RF). Through the evaluation of different algorithms, it was observed that, out of the models examined, the random forest algorithm using twenty random trees had the highest performance on this dataset. From January 2018 to March 2018, tweets posted by the Taliban and the Afghan Government were gathered. Nearly 952 tweets from both parties were checked against the four media outlets and by the civilian protection advocacy group. The authors found information discrepancies between the parties and the media. The authors [12] aimed to review and integrate findings from the various convergent and divergent fields of work designed to throw light on the ways in which social media usage makes political protest easier. The long-term goal of the research program on the effect of social media practice on collective action was to leverage new methods, including machine learning algorithms, to observe informational and motivational resources as they permeate through social media networks and to highlight their precise role. The study [13] suggested that semantic knowledge of tweet content is required in the analysis for the purpose of predictive ability in terms of extreme sub-groups. This is significant because a lot of Twitter tweets hold similar words but differ in context or even express opposing viewpoints. The framework was implemented in three steps: (a) data collection and labeling, (b) data preprocessing and exploratory data analysis (EDA), and (c) classification model development and evaluation of the data. After the attack on the Kunduz Madrassa by Afghan forces, tweets were collected from the region of Afghanistan from April 2, 2018, to April 8, 2018. The study collected filtered data from 3380 tweets. For reduction of the high-dimensional data space into a low-dimensional space, an EDA using principal component analysis (PCA) was performed on the tweet dataset. When compared to other classification models, a support vector machine (SVM) had an average accuracy of 84%, according to the analysis. The work [14] combined quantitative and qualitative analysis to investigate the reporting of the Taliban conflict in Pakistani media and found the reports escalatory and elitist from a peace journalism perspective. For this study, the dataset was collected through discourse analysis, content analysis, and interviews with journalists and analysts. The study offered some suggestions for improvement: the media needs to consider the conflict from a humanitarian perspective. The study [15] observed that web scraping is a highly useful tool in the information age, and Python is the best language for implementing it due to its quick learning curve and powerful syntax.


3 Proposed Approach
The main aim of this study is to get sentiments from tweets and news articles using machine learning algorithms and also to compare their sentiments. As per the literature survey and to the best of the authors' knowledge, the study did not find any efficient way to get facts about the opinion of people on the change from the democratic government to the Taliban-led government. The study reveals that data collected from Twitter had limitations for cross-checking the effects of critics' reviews on the general public. A proper survey has not been done on public opinions about the Taliban Government and the role of media houses (print and electronic media). This research work will find the relationship between the sentiment analysis of critics' views and public views through media house articles and Twitter data, respectively.

3.1 Data Collection We used Tweepy, an open-source Python package to extract tweets from hashtags, and NewsAPI to extract news articles by category from different news publishers. With the help of snscrape Python library, 245,205 tweets from hashtags (10 popular hashtags) related to Taliban (Government of Afghanistan) tweets were collected. The data for this research work has been collected for the duration of August 6, 2021, to May 11, 2022.
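A minimal collection sketch with snscrape is shown below; the single hashtag, date filters, and tweet attribute names are assumptions for illustration (the chapter lists 10 popular hashtags without enumerating them all, and snscrape's tweet attributes vary across versions, with newer releases using `rawContent` instead of `content`).

```python
import snscrape.modules.twitter as sntwitter
import pandas as pd

# "#Taliban" is one plausible hashtag out of the ten used in the study; the
# date window matches the collection period stated above.
query = "#Taliban since:2021-08-06 until:2022-05-11"

rows = []
for i, tweet in enumerate(sntwitter.TwitterSearchScraper(query).get_items()):
    if i >= 1000:   # small cap for the sketch; the study gathered 245,205 tweets
        break
    rows.append({"date": tweet.date, "user": tweet.user.username, "text": tweet.content})

pd.DataFrame(rows).to_csv("taliban_tweets_sample.csv", index=False)
```

Tweepy (with Twitter API credentials) and NewsAPI can be driven in the same loop-and-store pattern for tweets and category-filtered news articles, respectively.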

3.2 Data Cleaning The collected tweets list had some duplicated tweets. Text cleaning is nothing more than filtering the extracted data before analyzing it. It helped to identify and remove non-textual content and content not relevant to the field of study from the dataset. To remove these unnecessary things, NLTK toolkit has been used which contains inbuilt functionalities to work with these things.

3.3 Unnecessary Data Removal
One of the most significant steps in any machine learning project is data preprocessing. It includes cleaning and formatting the data before using it in a machine learning algorithm. For NLP, the preprocessing steps comprised the following tasks:
• Removing Twitter handles
• Removing URLs
• Removing punctuation
• Removing stop words and stemming

Table 1 Web scraping news articles from valid links

Total collected links           100,006
After removing duplicate links   98,377
Valid links                      34,273
Collected news articles          10,050 (news articles)

In this study, 245,198 tweets were collected from Twitter hashtags, and after preprocessing these tweets, we obtained 185,657 final tweets. In addition, 24,089 news articles were collected, of which 10,050 preprocessed news articles were retained for further evaluation.
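A minimal cleaning sketch covering the steps listed above is given below, assuming regular expressions plus NLTK; the exact cleaning rules used in the study are not spelled out, so this is illustrative only.

```python
import re
import string
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

nltk.download("stopwords", quiet=True)
stop_words = set(stopwords.words("english"))
stemmer = PorterStemmer()

def clean_tweet(text):
    text = re.sub(r"@\w+", " ", text)                  # remove Twitter handles
    text = re.sub(r"http\S+|www\.\S+", " ", text)      # remove URLs
    text = text.translate(str.maketrans("", "", string.punctuation))  # remove punctuation
    tokens = [t.lower() for t in text.split() if t.lower() not in stop_words]
    return " ".join(stemmer.stem(t) for t in tokens)   # stop-word removal + stemming

print(clean_tweet("@user Kabul airport reopened today, see https://example.com #Taliban"))
```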

3.4 Web Scraping News Articles
Web scraping is the process of crawling the Internet with automated bots and extracting data. The bots gather information by first reducing the targeted site to its most basic form, HTML text, and then scanning through it to gather data based on some predefined parameters. To scrape news articles, links were collected from tweets. Table 1 gives the data collected from Twitter.
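A hedged sketch of such a scraper is shown below, assuming requests and BeautifulSoup with a placeholder URL; the chapter does not name its exact scraping stack beyond Python, so this is only one reasonable implementation.

```python
import requests
from bs4 import BeautifulSoup

def scrape_article(url):
    """Fetch a news page and return its visible paragraph text."""
    resp = requests.get(url, timeout=10, headers={"User-Agent": "Mozilla/5.0"})
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]
    return " ".join(paragraphs)

# Example call with a placeholder URL harvested from a tweet:
# article_text = scrape_article("https://example.com/news/afghanistan-story")
```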

3.5 Data Labeling
Data labeling was done by a sentiment analyzer called TextBlob. TextBlob gives two properties (polarity and subjectivity) for a given piece of text. We trained our model with a bag of words or lexicons and tested it on the statements being analyzed. TextBlob computes a score in the range from − 1 to + 1; therefore, the scores were divided into three classes: scores from − 1 to below 0 were treated as negative sentiment, scores equal to 0 were treated as neutral sentiment, and scores above 0 up to + 1 were treated as positive sentiment.
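The labeling rule above can be expressed directly with TextBlob's polarity property; the snippet below is a minimal sketch of that rule, with example sentences chosen purely for illustration.

```python
from textblob import TextBlob

def label_sentiment(text):
    """Map TextBlob polarity in [-1, +1] to the three classes described above."""
    polarity = TextBlob(text).sentiment.polarity
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"

print(label_sentiment("The humanitarian aid was excellent and welcomed"))
print(label_sentiment("Civilians were killed in the brutal attack"))
```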

3.6 Word Cloud
Word clouds are popular for visualizing qualitative information because they are easy to apply; the word clouds for tweets and news articles are given below (Figs. 1 and 2). From the neutral tweets figure, ordered, worked, Afghan, brothers, interviews, Spokesperson, etc., are some frequent words from the word cloud. On the other hand, excellent control, face covering, aspirations, correcting, humanitarian,


Fig. 1 Word cloud for positive words

Fig. 2 Word cloud for neutral words

moderate, and supporting are a few frequent words from word cloud for positive tweets (Figs. 3 and 4). Fig. 3 Words for negative tweets


Fig. 4 Words for neutral news article

From the above word cloud, frequent words of negative tweets are unofficially, killing, sanctioned, criminal, revenge, etc. And, published, Taliban, colleges, caretaker, policy, Ukraine, etc., are a few frequent words from neutral news articles (Figs. 5 and 6). Fig. 5 Words for negative news articles

Fig. 6 Words for positive news articles


For positive news articles, Afghanistan, Taliban, ordered, united, ambassador, natural, etc., are a few frequent words from a word cloud. Afghan, blamed, freezes, smuggling, defense, extremist, claimed, customs, Taliban, etc., are words for negative news articles.
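Word clouds like those in Figs. 1–6 can be generated with the wordcloud Python package; the sketch below is illustrative, and the text fed to it is a small stand-in for the full concatenated per-class tweet text.

```python
from wordcloud import WordCloud

# In practice this is the concatenation of all tweets labeled positive; a tiny
# literal stands in here so the sketch runs on its own.
positive_text = "peace afghan secure humanitarian aid moderate supporting aspirations"

wc = WordCloud(width=800, height=400, background_color="white", max_words=100)
wc.generate(positive_text)
wc.to_file("positive_wordcloud.png")
```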

3.7 Special Character and Stop Words Removal
Stop words in natural language are those words that have very little meaning, such as "is," "an," "the," and so on. Different search engines and other enterprise indexing platforms frequently filter out stop words when retrieving database results for user queries. Stop words are to be removed from the data as they add no meaning to it. We can use several language-processing Python libraries such as spaCy, NLTK, TextBlob, Gensim, etc. If we require full control over the stop words that we want to remove from our dataset, we can write our own custom script.
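A minimal custom stop-word script of the kind suggested above might look as follows; the extra domain words added to NLTK's English list are assumptions for illustration.

```python
import nltk
from nltk.corpus import stopwords

nltk.download("stopwords", quiet=True)

# Domain-specific additions are examples only; adjust to the dataset at hand.
custom_stopwords = set(stopwords.words("english")) | {"rt", "amp", "via"}

def remove_stopwords(text):
    return " ".join(w for w in text.split() if w.lower() not in custom_stopwords)

print(remove_stopwords("RT via the spokesperson an interview was ordered"))
```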

3.8 Proposed Architecture
Figure 7 shows the proposed architecture of the work. The figure depicts how the data was gathered from Twitter and how labeling and feature extraction were later done. Finally, machine learning models were applied to get the desired output.

Fig. 7 Architecture of the proposed work


4 Results and Discussion
In this section, the results of the various experimental runs that were carried out after applying some machine learning models to the dataset are described. The sentiment scores of the tweets were calculated in Python with the help of TextBlob, and this achieved good results. The work implemented the three most popular and useful ML algorithms to achieve good results. SVM and LR achieved good results, and both models showed accuracies very close to each other, while the RF model achieved only moderate results on both datasets. Tables 2 and 3 show the comparison of different accuracy measures calculated with machine learning algorithms on the collected tweets and news articles datasets, respectively.

Table 2 Comparison of different accuracy measures calculated with machine learning algorithms on the collected tweets dataset (sentiment analyzer: TextBlob)

Algorithms           Accuracy (%)  Tweet sentiments  Precision (%)  Recall (%)  F1-score (%)
Logistic regression  92.88         Neutral           92             97          95
                                   Positive          94             93          93
                                   Negative          93             88          90
SVM                  93.98         Neutral           94             98          96
                                   Positive          95             94          94
                                   Negative          92             90          91
Random forest        79.35         Neutral           75             91          81
                                   Positive          79             87          83
                                   Negative          91             52          66

Table 3 Comparison of different accuracy measures calculated with machine learning algorithms on the collected article/blog dataset (sentiment analyzer: TextBlob)

Algorithms           Accuracy (%)  News articles sentiments  Precision (%)  Recall (%)  F1-score (%)
Logistic regression  83            Neutral                   71             65          68
                                   Positive                  85             94          89
                                   Negative                  77             56          65
SVM                  81.70         Neutral                   71             71          71
                                   Positive                  86             91          88
                                   Negative                  69             56          62
Random forest        78.12         Neutral                   61             79          69
                                   Positive                  79             96          87
                                   Negative                  84             24          37


From Tables 2 and 3 above, SVM achieved the highest accuracy of 93.98% for tweets, and logistic regression achieved the highest accuracy of 83% for news articles (Fig. 8). In the SVM model, the result for tweets is excellent with an accuracy of 93.98%, while the accuracy of logistic regression is good for news articles at 83%. Random forest produced lower results for both tweets and news articles (Figs. 9 and 10).
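For reference, a hedged sketch of such a pipeline with scikit-learn is shown below; the chapter does not state its exact feature extraction, so a TF-IDF representation is assumed, and the tiny toy corpus (with resubstitution scoring) only exercises the API rather than reproducing the reported train/test evaluation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report

# Toy stand-ins; in the study these are the preprocessed tweets/news texts
# and their TextBlob-derived labels, split into train and test sets.
texts = ["peace returned to kabul", "aid convoys were welcomed",
         "civilians were killed in the blast", "schools remain closed by force",
         "a press conference was held today", "the spokesperson gave an interview"]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(texts)

models = [("Logistic regression", LogisticRegression(max_iter=1000)),
          ("SVM", LinearSVC()),
          ("Random forest", RandomForestClassifier(n_estimators=100, random_state=42))]

for name, model in models:
    model.fit(X, labels)
    preds = model.predict(X)   # resubstitution only; the study used a held-out split
    print(name, accuracy_score(labels, preds))
    print(classification_report(labels, preds, zero_division=0))
```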

Fig. 8 Comparison of different models for tweets and news articles accuracy

Fig. 9 Confusion matrix of logistic regression for tweets


Fig. 10 ROC AUC score: 0.9445861889285351

From the above figure, the true positive rate is very high, with a ROC AUC score of approximately 0.94 and a low false-positive rate. This means our logistic regression model for tweets shows high classification accuracy (Figs. 11, 12, 13 and 14). In the figures mentioned above, the true positive rate is good, with a ROC AUC score of approximately 0.78 and a low false-positive rate. This means our logistic regression model for news articles shows moderate classification accuracy.
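The confusion matrices and ROC AUC scores reported in Figs. 9–20 can be computed with scikit-learn as sketched below; the class ordering, toy labels, and probability matrix are illustrative assumptions, and in practice they come from the fitted classifier on the test split.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

classes = ["negative", "neutral", "positive"]

# Toy stand-ins so the snippet runs on its own.
y_test = np.array(["positive", "neutral", "negative", "positive"])
y_pred = np.array(["positive", "neutral", "positive", "positive"])
y_proba = np.array([[0.10, 0.20, 0.70],
                    [0.20, 0.60, 0.20],
                    [0.30, 0.20, 0.50],
                    [0.05, 0.15, 0.80]])   # per-class probabilities, columns in `classes` order

print(confusion_matrix(y_test, y_pred, labels=classes))
# One-vs-rest ROC AUC (macro average) for the three-class problem.
print(roc_auc_score(y_test, y_proba, multi_class="ovr", labels=classes))
```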

Fig. 11 Confusion matrix of logistic regression for news


Fig. 12 ROC AUC score: 0.7825344140590618

Fig. 13 Confusion matrix of SVM for tweets

From the above figure, the true positive rate is very high, with a ROC AUC score of approximately 0.95 and a very low false-positive rate, which means our SVM model for tweets shows excellent classification accuracy (Figs. 15 and 16).


Fig. 14 ROC AUC score: 0.9531175474462

Fig. 15 Confusion matrix of SVM for news articles

In the figure mentioned above, the true positive rate is good, with a ROC AUC score of approximately 0.79 and a low false-positive rate. This means our SVM model for news articles shows moderate classification accuracy (Figs. 17 and 18).


Fig. 16 ROC AUC score: 0.7891636946832367

Fig. 17 Confusion matrix of random forest for tweets

From the above figures, the true positive rates are good, with a ROC AUC score of approximately 0.83 and a low false-positive rate. This means our random forest model for tweets shows good classification accuracy (Figs. 19 and 20).


Fig. 18 ROC AUC score: 0.8289257610760736

Fig. 19 Confusion matrix of random forest for news articles

The figure mentioned above shows that the true positive rates are fair, with a ROC AUC score of approximately 0.72 and a low false-positive rate. This means our random forest model for news articles has moderate classification accuracy (Figs. 21 and 22). By applying the text analyzer TextBlob to the preprocessed tweets, 42.5% of tweets were positive, 33.3% neutral, and 25.3% negative. Here, in


Fig. 20 ROC AUC score: 0.7278118535039703

Fig. 21 Pie chart of tweets sentiments using TextBlob

news articles related to the Taliban-led Government, we observed surprisingly 71.4% of media houses were in favor of the Taliban and supported the new government of the Taliban based on the dataset collected from the tweets link. And 22.2% of news articles have opposite views on the Taliban-led government and the rest 6.4% had a neutral opinion.


Fig. 22 Pie chart of article/blog sentiments using TextBlob

4.1 Comparison of Sentiments of Tweets and News Articles
We used the Python library TextBlob, which works very well as a sentiment analyzer for NLP. In Fig. 23, the orange column represents the tweets' sentiment percentages and the blue column represents the news articles' sentiment percentages. The comparison shows that positive sentiment toward the Taliban-led government was higher in news articles than in the tweets collected from popular hashtags: 71.36% of media house content (news articles) versus 41.48% of public opinions (tweets) were in favor of the Taliban-led government (Fig. 23). On the other hand, the comparison shows that negative sentiment in tweets, at 25.31%, was higher than in news articles, at 22.19%. Neutral sentiment was also substantial in Twitter tweets at 33.20% but very low in news articles at 6.45%.

Fig. 23 Comparison of tweets and news article sentiment percentage


5 Conclusion and Future Work
In this work, we applied our sentiment analysis approach with three different ML models to a dataset of 185,657 tweets and 10,050 news articles and examined public opinion by analyzing the sentiments of users around the world about the Taliban-led Afghanistan Government from August 6, 2021, to May 11, 2022. We generated the dataset for these models with the help of the Tweepy and snscrape Python libraries, using tweets and validated links from Twitter. We analyzed the data and found a surprising contrast between critics' opinions and public opinion. In tweets, 41.48% of user tweets had positive sentiment toward the Taliban-led Afghanistan Government and 25.31% had negative sentiment, whereas 33.20% of tweets were neutral. In the news articles, 71.36% of the data showed positive content, 6.4% were neutral, and 22.19% of the data were against the new Taliban-led Afghanistan government. In the future, deep learning methods could be incorporated, which may result in higher scores. Other sentiment analyzers are also available and could be incorporated for a better accuracy score.

References 1. Tyagi P, Tripathi R (n.d.) A review towards the sentiment analysis techniques for the analysis of twitter data. https://ssrn.com/abstract=3349569 2. Fadel I, Öz C (2020) A sentiment analysis model for terrorist attacks reviews on twitter. Sakarya Univ J Sci https://doi.org/10.16984/saufenbilder.711612 3. Ahmad S, Asghar MZ, Alotaibi FM, Awan I (2019) Detection and classification of social mediabased extremist affiliations using sentiment analysis techniques. Human-Centric Comput Inf Sci 9(1) https://doi.org/10.1186/s13673-019-0185-6 4. Murni TH, Fahrurozi A, Sari I, Lestari DP, Zen RIM (2019) Hybrid method for sentiment analysis using homogeneous ensemble classifier. In: 2019 2nd international conference of computer and informatics engineering (IC2IE), pp 232–236. https://doi.org/10.1109/IC2IE4 7452.2019.8940896 5. Lee E, Rustam F, Ashraf I, Washington PB, Narra M, Shafique R (2022) Inquest of current situation in Afghanistan under Taliban rule using sentiment analysis and volume analysis. IEEE Access 10:10333–10348. https://doi.org/10.1109/ACCESS.2022.3144659 6. Hussein DJ, Rashad MN, Mirza KI, Hussein DL (2022) Machine learning approach to sentiment analysis in data mining. Passer J Passer 4:71–77. https://doi.org/10.24271/psr.43 7. Shehu HA, Tokat S, Sharif MH, Uyaver S (2019) Sentiment analysis of Turkish twitter data. AIP Conf Proc 2183. https://doi.org/10.1063/1.5136197 8. Kamyab M, Tao R, Mohammadi MH, Rasool A (2018) Sentiment analysis on twitter: a text mining approach to the Afghanistan status reviews. ACM Int Conf Proc Ser 14–19. https://doi. org/10.1145/3293663.3293687 9. Ba¸sarslan MS, Kayaalp F (2022) Sentiment analysis with ensemble and machine learning methods in multi-domain and dataset. Turk J Eng. https://doi.org/10.31127/tuje.1079698 10. Wagh B, Shinde JV, Wankhade NR (2016) Sentimental analysis on twitter data using Naive Bayes. IJARCCE 5(12):316–319. https://doi.org/10.17148/ijarcce.2016.51273


11. Bahar HM (2020) Social media and disinformation in war propaganda: how Afghan government and the Taliban use twitter. Media Asia 47(1–2):34–46. https://doi.org/10.1080/01296612. 2020.1822634 12. Jost JT, Barberá P, Bonneau R, Langer M, Metzger M, Nagler J, Sterling J, Tucker JA (2018) How social media facilitates political protest: information, motivation, and social networks. Polit Psychol 39:85–118. https://doi.org/10.1111/pops.12478 13. Sharif W, Mumtaz S, Shafiq Z, Riaz O, Ali T, Husnain M, Choi GS (2019) An empirical approach for extreme behavior identification through tweets using machine learning. Appl Sci (Switzerland) 9(18). https://doi.org/10.3390/app9183723 14. Hussain S (2016) Media coverage of Taliban: is peace journalism the solution? Asia Pacific Media Educator 26(1):31–46. https://doi.org/10.1177/1326365X16640340 15. Khder MA (2021) Web scraping or web crawling: state of art, techniques, approaches and application. Int J Advan Soft Comput Appl 13(3):144–168. https://doi.org/10.15849/ijasca. 211128.11

Vehicle Object Detection Using Deep Learning-Based Anchor-Free Detector Mansi Verma and Maheshwari Prasad Singh

Abstract Detection of road objects is crucial for autonomous vehicles, and it needs constant improvement. There are certain challenges in this area of computer vision such as extreme weather conditions, blurred images. These challenges limit the object detection performance. In this paper, we analyze and enhance the approach called CenterNet, which is an anchor-free detector. Dataset used in this paper comprises of real-time data, which is grouped in different ways to accommodate the range of realworld scenarios. The experimental results indicate that the suggested method can greatly increase object identification performance in a variety of traffic situations. Keywords Anchor-free detectors · Deep learning · Object detection · Traffic objects

1 Introduction
Object detection is regarded as a fundamental task that enables the system to distinguish things, because it is crucial in interpreting and absorbing image contexts. Object detection localizes and classifies the various objects. Detecting road objects helps in the safety of driving and increases travel efficiency by reducing accidents. However, there are many challenges in the object detection domain, such as illumination conditions and viewpoint variation. Object detection is one of the critical responsibilities since it assists the vehicle in detecting impediments and determining the vehicle's future course. As a result, extremely accurate object detection methods are required. With the effective use of deep learning methods, object detection performance has increased dramatically in recent years. Deep learning in object detection helps in

M. Verma (B) · M. P. Singh National Institute of Technology Patna, Patna, India e-mail: [email protected] M. P. Singh e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_34


extracting features and obtaining the best results. For object detection, deep neural networks [1, 2] play a significant role. Object detection can be categorized into two parts: anchor-based detectors and anchor-free detectors. Anchor-based detectors comprise two subcategories: single-stage detectors and two-stage detectors. SSD [3] and YOLO [4] are examples of single-stage detectors, and RCNN [5], Fast RCNN [6], and Faster RCNN [7] are examples of two-stage detectors. The main drawback of these detectors is that they miss oddly shaped objects and require post-processing. Also, their detection performance is significantly influenced by the anchors' hyperparameters, such as size, number of anchors, and the choice of the optimal anchor. There is currently no efficient way of automatically adjusting these hyperparameters; they must be calibrated manually. To overcome the limitations of anchor-based detectors and to improve the flexibility of the detectors, anchor-free detectors were introduced. Anchor-free detectors avoid the use of anchor boxes and use keypoints to create the bounding box. CornerNet [8], ExtremeNet [9], and CenterNet [10] are examples of anchor-free detectors. Convolutional neural networks (CNNs) depend on a huge collection of data to achieve optimal accuracy. The recent introduction of immensely popular state-of-the-art datasets has contributed to the advancement of enhanced object detectors capable of detecting a wide range of object classes. The ILSVRC ImageNet dataset [11] contains 200 classes, the PASCAL VOC 2007 dataset [12] contains about 20 classes, and the COCO dataset [13] contains over 80 classes. As a result, a dataset containing a significant number of images is required for the complete application of the CNNs under discussion. The annotation, gathering, and retrieval of this kind of dataset, however, is a significant hurdle in the entire object detection workflow. In addition to being tiresome, tedious, and slow, there may be instances where the availability of data is limited. As a result, the key task of dataset creation is critical for improving the performance of models trained with deep convolutional neural networks. The objective of this research is to prepare a custom dataset for object detection. A sample image of the dataset is given in Fig. 1. An overview of related work is provided in Sect. 2. Section 3 describes the CenterNet detection model in detail. Section 4 provides an examination of the implementation details of the proposed method. Section 5 provides the conclusion and future scope of the paper.

2 Related Work
Numerous approaches have been proposed for object detection tasks. The authors of [14] propose the atrous spatial pyramid pooling (ASPP) method, which includes the space-to-depth technique to optimize the typical down-sampling procedure. They extracted features from several scales in order to increase detection efficiency while minimizing computing cost and parameter count. The efficacy of the proposed method was tested using a driving dataset (BDD100K) [15]. The limitation was that it was able to efficiently identify smaller objects.


Fig. 1 Sample image

FII CenterNet [16] incorporates an anchor-free technique that provides foreground information, which eliminates the interference of complicated background information in traffic scenarios. The foreground region proposal (FRP) network divides the foreground using segmentation; FRP is the region proposal network for anchor-free detectors. The proposed model was tested using the KITTI dataset [17] and the PASCAL VOC 2007 dataset. The authors of [18] present a soft weighted average ensemble object detector, which is created using two different object detection methods. The limitations and advantages of algorithms such as YOLOv3 [19] and Faster RCNN [7] are taken into account, and an ensemble method is created. The proposed method was analyzed on the KITTI dataset [17]. The authors of [20] proposed that bounding ellipses should be used instead of bounding boxes. The testing was performed on the open dataset UA-DETRAC [21]. Images in the dataset are those of a surveillance camera, which is a limitation as the method works only on aerial images. Tinier-YOLO [22] is an anchor-based approach that reduces the model size. In this proposed model, the object detection performance is enhanced by including a passthrough layer. The passthrough layer helps provide fine features from the front layers even though the model size is reduced. The datasets used for this experiment are COCO and PASCAL VOC. This approach does not provide better results on the PASCAL VOC dataset. The mentioned studies are analyzed on popular datasets like the COCO, KITTI, BDD100K, and PASCAL VOC datasets, whereas a real-time dataset contains never-before-seen images, on which the performance of the object detection model can be analyzed accurately. Some approaches proposed in the mentioned studies had certain limitations with respect to the dataset. Therefore, a custom real-time dataset is generated and then analyzed with a deep learning-based anchor-free detector.


Fig. 2 Images captured under several conditions: a blurred and b low illumination

3 Problem Definition
Detecting road objects for autonomous driving is a crucial task, and there are many challenges in this area of computer vision such as blurred images, low illumination, variation in lighting, and complicated backgrounds. In this paper, a real-time dataset is used in which the images have been captured under a variety of situations, as seen in Fig. 2. Generally, anchor-based detectors are fast compared with anchor-free detectors; however, the anchors' hyper-parameters increase the computation cost. Anchor-free methods provide flexibility to the detectors, and they do not depend on predefined anchors. CenterNet achieves reasonably high detection results because a keypoint estimation network is used. However, it has two major issues that must be solved: while hourglass-104 achieves acceptable performance results, the huge module results in a high processing cost, significantly restricting its real-time use in self-driving vehicles. As a result, an enhanced way to resolve these issues is proposed.

4 Proposed Solution To address the mentioned problem definition, an anchor-free-based object detection model is used. The method described in this study for identifying vehicle objects is divided into two parts. The creation of the dataset is included in the first part. Real-time images are collected and utilized as input data for neural network training. The trained neural network is then used to detect various vehicle objects, producing bounding box coordinates and associated confidence score as output. In the second stage, we analyze and enhance the performance of CenterNet detection model.


Fig. 3 Augmented images: a brightness transformation and b flipped image

4.1 Dataset Creation
Images of roadside vehicles were acquired using a digital camera. The location where the images were acquired lies at latitude 25°37' 11'' N and longitude 85°10' 43'' E, Ashok Rajpath, Patna, Bihar, India. The original dimensions of the images were 3000 × 4000 pixels. Data augmentation methods are used, and the resulting images are then annotated.
Image Augmentation. The dataset originally contained 50 images; these images were then enlarged to 500 images with the help of data augmentation methods, which helps in enhancing the richness of the real-time dataset. The ImageDataGenerator class of Keras [23] is used for the image augmentation methods. The images were preprocessed using the Keras ImageDataGenerator in terms of size, brightness, rotation, and horizontal flip, and the dataset was augmented as shown in Fig. 3.
Image Annotation. Bounding boxes were drawn by manually annotating the objects in the input images with the help of the LabelImg software. In order to train the object detection model, XML-format label files were generated corresponding to the input images. The real-time dataset was divided into 80% training and 20% testing. The LabelImg software helps to easily annotate images and generates the required XML files, which is helpful for creating a custom dataset.
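A minimal augmentation sketch with Keras' ImageDataGenerator, mirroring the transforms described above, is given below; the file and directory names are placeholders, and the parameter values are assumptions since the chapter does not list them.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator, load_img, img_to_array

# Augmentation parameters are illustrative; the chapter mentions size,
# brightness, rotation and horizontal flips without giving exact values.
datagen = ImageDataGenerator(rotation_range=15,
                             brightness_range=(0.7, 1.3),
                             horizontal_flip=True)

# "raw_images/vehicle_001.jpg" and "augmented/" are placeholder paths; the
# output directory must already exist.
img = img_to_array(load_img("raw_images/vehicle_001.jpg", target_size=(512, 512)))
batch = np.expand_dims(img, axis=0)            # flow() expects a batch dimension

for i, _ in enumerate(datagen.flow(batch, batch_size=1, save_to_dir="augmented",
                                   save_prefix="veh", save_format="jpg")):
    if i >= 9:                                  # write ten augmented variants, then stop
        break
```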

4.2 Object Detection Model The CenterNet model is an anchor-free object detection model that depends on key point estimation. Objects are considered as a single point in CenterNet and heatmaps



Fig. 4 Detection model of CenterNet

are used to estimate object centers. The heatmap is generated with a Gaussian kernel and a fully convolutional network, and the predicted centers are calculated by looking at the peak values of the heatmap. Using the center localization, object properties such as size and shape can be regressed simply without any prior anchor, and therefore this detection model outperforms most models. As shown in Fig. 4, the architecture of the CenterNet model includes a pre-processing module, which consists of a convolutional module and pooling, an hourglass backbone, and an output divided into three layers, namely heatmap, offset, and object properties. The convolutional module in the architecture contains Conv layers, batch normalization, and the ReLU activation function. In order to predict the properties and keypoints of the objects, a keypoint estimation network (lite-hourglass) was included as the backbone. As discussed above, an object's center is represented through only one keypoint. The heatmap label H ∈ [0, 1]^{X×Y×C} is represented by a Gaussian kernel as given in Eq. (1), where the heatmap keypoint coordinates are given by (X, Y), the total number of object classes is given by C, the ground-truth keypoint coordinates are (X_gt, Y_gt), and σ is an object-size-adaptive standard deviation:

$$
H_{xyc} = \exp\left(-\frac{(X - X_{gt})^2 + (Y - Y_{gt})^2}{2\sigma^2}\right) \qquad (1)
$$
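As a concrete illustration of Eq. (1), the NumPy sketch below splats one ground-truth center onto a per-class heatmap; the grid size, class index, center, and σ are arbitrary example values, and the σ heuristic is simplified relative to the radius rule used by the original CenterNet.

```python
import numpy as np

def draw_gaussian(heatmap, center, sigma):
    """Write exp(-d^2 / (2*sigma^2)) around `center`, keeping the max per pixel."""
    h, w = heatmap.shape
    ys, xs = np.ogrid[:h, :w]
    cx, cy = center
    g = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
    np.maximum(heatmap, g, out=heatmap)
    return heatmap

num_classes, H, W = 3, 128, 128
heatmaps = np.zeros((num_classes, H, W))

# One example object of class index 1 with center (x=64, y=40) and a sigma
# derived from its size in the full pipeline.
draw_gaussian(heatmaps[1], center=(64, 40), sigma=6.0)
print(heatmaps[1].max())   # 1.0 exactly at the object center
```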

The object properties S ∈ R^{X×Y×2} include object height and width, which are saved in the remaining channels according to the keypoint's location. S_k ∈ (W, H) represents the size of object k, and the object's height and width are denoted by W and H; the estimated object size is denoted by Ŝ. When restoring the heatmap to the full-size prediction, offsets O ∈ R^{X×Y×2} are employed to adjust the center point location. The offset prediction is the same across classes. O_k ∈ (W_O, H_O) represents the offset of object k, and the coordinate offsets of the object center's height and width are denoted by W_O and H_O. The predicted offsets are represented by Ô.

$$
\text{Loss} = L_H + \lambda_{size} L_{size} + \lambda_{off} L_{off} \qquad (2)
$$

Heatmap loss, offset loss, and object size loss are the respective losses of CenterNet detection model. The mentioned loss functions along with their meaning are represented in Table 1. The loss function of heatmap is primarily based on focal loss. L1 loss is used to train both offset and object size loss functions. As a result, the overall loss is given by Eq. (2) (Fig. 5).


Table 1 Loss equations of heatmap loss, offset loss, and object size loss

$$
L_H = \frac{-1}{N}\sum_{xyc}
\begin{cases}
\left(1-\hat{H}_{xyc}\right)^{\alpha}\log\left(\hat{H}_{xyc}\right), & \text{if } H_{xyc} = 1\\
\left(1-H_{xyc}\right)^{\beta}\left(\hat{H}_{xyc}\right)^{\alpha}\log\left(1-\hat{H}_{xyc}\right), & \text{otherwise}
\end{cases}
$$

$$
L_{off} = \frac{1}{N}\sum_{k=1}^{N}\left|\hat{O}_{k}-O_{k}\right|
\qquad
L_{size} = \frac{1}{N}\sum_{k=1}^{N}\left|\hat{S}_{k}-S_{k}\right|
$$
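A hedged PyTorch sketch of these three losses is given below; the α = 2, β = 4, λ_size = 0.1, λ_off = 1 values follow the common CenterNet convention rather than anything stated in this chapter, and the tensors are random stand-ins.

```python
import torch

def heatmap_focal_loss(pred, gt, alpha=2, beta=4, eps=1e-6):
    """Penalty-reduced pixel-wise focal loss over heatmaps of shape (B, C, H, W)."""
    pos = gt.eq(1).float()
    neg = 1.0 - pos
    pos_loss = ((1 - pred) ** alpha) * torch.log(pred + eps) * pos
    neg_loss = ((1 - gt) ** beta) * (pred ** alpha) * torch.log(1 - pred + eps) * neg
    num_pos = pos.sum().clamp(min=1)
    return -(pos_loss.sum() + neg_loss.sum()) / num_pos

def l1_regression_loss(pred, target):
    """Shared L1 form of the offset (L_off) and size (L_size) losses."""
    return torch.abs(pred - target).mean()

# Random stand-in tensors, just to exercise the functions.
pred_hm = torch.rand(2, 3, 128, 128).clamp(1e-4, 1 - 1e-4)
gt_hm = torch.zeros(2, 3, 128, 128)
gt_hm[0, 1, 40, 64] = 1.0                      # one ground-truth center
size_pred, size_gt = torch.rand(5, 2), torch.rand(5, 2)
off_pred, off_gt = torch.rand(5, 2), torch.rand(5, 2)

total = (heatmap_focal_loss(pred_hm, gt_hm)
         + 0.1 * l1_regression_loss(size_pred, size_gt)   # lambda_size
         + 1.0 * l1_regression_loss(off_pred, off_gt))    # lambda_off
print(total.item())
```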

Fig. 5 Detection results of vehicles using CenterNet detection model: a real images and b images with detected results


5 Experiment and Result Analysis
Our method is evaluated on real-time data images. The goal of this research is to detect persons, vehicles, and various road objects in the gathered traffic environment photos. The dataset also implements some data augmentation methods in order to provide enhanced results. The dataset used in this research contains 500 real-time images. All of the collected data was divided into three sets: a train set that contains 70% (350 images) of the images, a validation set that contains 10% (50 images) of the images, and a test set that contains 20% (100 images) of the images. The CenterNet model was trained and tested in the PyTorch deep learning environment using a graphics processing unit. The research was carried out on Google Colaboratory using a Tesla K80 GPU, and PyTorch was employed for our proposed method. The Google Colaboratory research tool allows developers to generate and run Python code straight from their browser. Various deep learning tasks are performed on Google Colab, as it is very helpful: it is a hosted Jupyter notebook for which no installation is needed, and it offers a free version along with Google's GPU and TPU processing resources. For evaluating our proposed approach, the COCO API was used. The COCO API is the official toolkit included with the COCO dataset; it includes an eval component and uses average recall (AR) and average precision (AP) as the primary performance metrics. The results of these experiments for all evaluated methods were converted into the toolkit's required format. An intersection over union (IoU) criterion of 0.5 was used to calculate AP. In the data augmentation step, input images were resized to 512 × 512 pixels in order to match the input required by the keypoint estimation backbone of the lite-hourglass framework. To further observe the training process, the batch size was set to 8 and the maximum number of epochs was set to 20. Other factors such as momentum and the initial learning rate follow the CenterNet [10] model's default parameters. Once the training parameters were specified, the object detection model was trained; the Adam optimization method was used to optimize the network weights iteratively based on the training data in order to reduce the training loss. The detection results when using the proposed anchor-free detection model are shown in Table 2. The results shown in the pictures have bounding boxes on the respective objects along with their confidence scores. The results also show that our proposed solution of an anchor-free detector can provide satisfactory performance. Situations like low illumination and blurred images are a challenge, but the proposed model provided better results. In conclusion, the performance results support the efficacy and performance of our proposed ASPP-CenterNet. Our suggested method's detection advancements can be ascribed to the following factors: In automated driving fields, we proposed the anchor-free way to replace the old anchor-based approach. It effectively predicts

Table 2 Performance of AP and AR on the custom dataset

Metric                 IoU        Area    MaxDets  Accuracy
Average precision (%)  0.50:0.95  All     100      47.5
Average precision (%)  0.50       All     100      84.6
Average precision (%)  0.75       All     100      42.4
Average precision (%)  0.50:0.95  Small   100      15.5
Average precision (%)  0.50:0.95  Medium  100      33.6
Average precision (%)  0.50:0.95  Large   100      48.0
Average recall (%)     0.50:0.95  All     1        46.7
Average recall (%)     0.50:0.95  All     10       46.7
Average recall (%)     0.50:0.95  All     100      56.3
Average recall (%)     0.50:0.95  Small   100      56.7
Average recall (%)     0.50:0.95  Medium  100      46.2

items using object center points, and a high-precision heatmap enhances detection accuracy. The pooling module collects fine-grained characteristics from multiple scales while incurring little computational expense. A lite-hourglass backbone is used instead of the hourglass backbone, which improves the road object detection performance.
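For reference, the COCO-API evaluation that produces AP/AR tables such as Table 2 can be run as sketched below; the annotation and results file paths are placeholders, and detections are assumed to be exported in the standard COCO results JSON format.

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Placeholder paths: a COCO-format ground-truth annotation file and a
# COCO-format detections JSON produced by the detector.
coco_gt = COCO("annotations/test_ground_truth.json")
coco_dt = coco_gt.loadRes("results/centernet_detections.json")

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()   # prints AP/AR at the IoU thresholds, areas and MaxDets shown in Table 2
```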

6 Conclusion
In this research paper, a custom real-time dataset was generated, and then the performance was analyzed and enhanced using the anchor-free approach for vehicle object detection, namely CenterNet [10]. The object detection model achieves an average precision (AP) of 83.6% and an average recall (AR) of 56.4% on the test data. The paper makes the following contributions: (1) a real-time dataset of various vehicle objects is generated, and pre-processing is performed in terms of size and brightness; (2) the CenterNet object detection model using deep learning helps in detecting the objects on the real-time dataset. As a result, it is necessary to make sure that the real-time dataset has a good balance of scene diversity, variation, and noise. We assess the effectiveness of these major approaches during the creation of the real-time


dataset and examine their transferability using the object detection model. Furthermore, total performance can be increased by using an ensemble of multiple networks based on their pros and cons. Segmentation can also be considered while detecting objects.

References 1. Boukerche A, Hou Z (2021) Object detection using deep learning methods in traffic scenarios. ACM Comput Surv 54(2) 2. Ayyagari MR, Dargan S, Kumar M (2020) A survey of deep learning and its applications: a new paradigm to machine learning. Arch Computat Methods 27:1071–1092 3. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, Berg AC (2016) Ssd: single shot multibox detector. In: Leibe B, Matas J, Sebe N, Welling M (eds) Computer vision—ECCV 2016. Springer International Publishing, Cham, pp 21–37 4. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 779–788 5. Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 580–587 6. Girshick R (2015) Fast r-cnn. In: 2015 IEEE international conference on computer vision (ICCV), pp 1440–1448 7. Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: towards real-time object detection with region proposal networks. Advan Neural Inf Process Syst 28 8. Law H, Deng J (2018) Cornernet: detecting objects as paired keypoints. In: Proceedings of the European conference on computer vision (ECCV), pp 734–750 9. Zhou X, Zhuo J, Krahenbuhl P (2019) Bottom-up object detection by grouping extreme and center points. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 850–859 10. Zhou X, Wang D, Krähenbühl P (2019) Objects as points. arXiv preprint arXiv:1904.07850 11. Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L (2009) Imagenet: a large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition, pp 248–255 12. Everingham M, Van Gool L, Williams CKI, Winn J, Zisserman A (2010) The pascal visual object classes (voc) challenge. Int J Comput Vision 88(2):303–338 13. Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft coco: common objects in context. In: Fleet D, Pajdla T, Schiele B, Tuytelaars T (eds) Computer vision—ECCV 2014. Springer International Publishing, Cham, pp 740–755. 14. Li G, Xie H, Yan W, Chang Y, Xingda Q (2020) Detection of road objects with small appearance in images for autonomous driving in various traffic situations using a deep learning based approach. IEEE Access 8:211164–211172 15. Yu F, Chen H, Wang X, Xian W, Chen Y, Liu F, Madhavan V, Darrell T (2020) Bdd100k: a diverse driving dataset for heterogeneous multitask learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2636–2645 16. Fan S, Zhu F, Chen S, Zhang H, Tian B, Lv Y, Wang FY (2021) Fii-centernet: an anchorfree detector with foreground attention for traffic object detection. IEEE Trans Veh Technol 70(1):121–132 17. Geiger A, Lenz P, Urtasun R (2012) Are we ready for autonomous driving? The kitti vision benchmark suite. In: Conference on computer vision and pattern recognition (CVPR)


18. Wang H, Yu Y, Cai Y, Chen X, Chen L, Li Y (2021) Softweighted-average ensemble vehicle detection method based on single-stage and twostage deep learning models. IEEE Trans Intell Veh 6(1):100–109 19. Redmon J, Farhadi A (2018) Yolov3: an incremental improvement. arXiv preprint arXiv:1804. 02767 20. Yu B, Shin J, Kim G, Roh S, Sohn K (2021) Non-anchor-based vehicle detection for traffic surveillance using bounding ellipses. IEEE Access 9:123061–123074 21. Wen L, Du D, Cai Z, Lei Z, Chang M-C, Qi H, Lim J, Yang M-H, Lyu S (2020) UA-DETRAC: a new benchmark and protocol for multi-object detection and tracking. Comput Vis Image Underst 22. Fang W, Wang L, Ren P (2020) Tinier-yolo: a real-time object detection method for constrained environments. IEEE Access 8:1935–1944 23. Chollet F, et al (2015) Keras

An Advanced Approach to Detect Plant Diseases by the Use of CNN Based Image Processing Sovan Bhattacharya, Ayan Banerjee, Saikat Ray, Samik Mandal, and Debkanta Chakraborty

Abstract Deep learning delivers state-of-the-art accuracy in many important applications such as object detection and speech recognition. Recently, this technology has shown the value of its applications to agriculture and boosted growth in the field, becoming an efficient tool for solving real problems in smart agriculture. In this article, CNN-based models (with ReLU activations) and Mask R-CNN are used to good effect for detecting plant diseases. An image dataset from agricultural land is fed to the deep learning model, which is trained to detect the type of illness affecting each area. Real-time recordings are collected with the help of modified drones that capture images of the field along with their longitude and latitude. It is not always possible for farmers to identify the correct illness on the spot, let alone manually. Therefore, this system sends regular notifications about field conditions to the respective farmers and suggests that they take the necessary measures at the appropriate time. Keywords CNN · Digital image processing · TensorFlow · SqueezeNet · Mask R-CNN · Plant diseases

A. Banerjee, S. Ray, S. Mandal and D. Chakraborty contributed equally to this work. S. Bhattacharya (B) · A. Banerjee · S. Ray · S. Mandal · D. Chakraborty Department of CSE, Dr. B. C. Roy Engineering College, Jemua Road, Durgapur 713206, West Bengal, India e-mail: [email protected] D. Chakraborty e-mail: [email protected] S. Bhattacharya Department of CSE, NIT Durgapur, Mahatma Gandhi Road, Durgapur 713209, West Bengal, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_62


1 Introduction India is a developing country whose economy depends heavily on agriculture; about 70% of Indians depend on agriculture directly, so crops play a vital role and the sector needs close attention. Although many measures have already been taken, crop failure remains a common issue every year in some part of the country. According to a survey of total yield, nearly 20% of the world's crop yield is lost to different weeds and diseases every year, and losses caused by pests can reach as high as 60-70%. This massive crop loss, much of it caused by pests, forces farmers to use enormous amounts of pesticides. Pest infection of plants is therefore one of the most important problems today. In this article, we attempt to detect diseased crops in an agricultural field using image processing with a convolutional neural network (CNN). Figure 1 depicts the overall CNN pipeline: first, images of different plants or their leaves are given as input; then pooling and feature extraction are performed on the input image; finally, the features are analysed by the CNN model and classification is carried out based on the trained dataset. Other authors [1] mainly used SqueezeNet and AlexNet models, but we used a CNN because it gave us the desired output quality. The focus of this paper is on interpreting images for pest detection. In contrast to the existing work, we observe that some articles are based on different improvements of ML models [2] using CNN and SVM to detect plant diseases. Deep learning methods have also been used with UAV devices [3]. YOLOv5, a PyTorch-based detector, has been used for the accurate categorisation and identification of rubber tree powdery mildew and anthracnose under natural light conditions [4]. In this article, we develop a new CNN-based model that detects diseased plants and weeds automatically and more accurately.

Fig. 1 Framework CNN models


Motivation and Objective We first examined the motivation of the existing articles. Their main aim was to detect plant diseases using ML and DL techniques [2, 3]. The main objective of this article is to detect plant diseases and weeds in agricultural fields with a faster and lower-cost approach. One of our main motives is to automate this tedious task into a simpler, more biologically informed one using only a CNN, which very few existing works do. Finally, a newly developed ML model can give better predictions; improving the false-detection behaviour derived from the F1 confidence scores and feature maps, and making the weed-tracking capability more efficient than R-CNN in a robotics-oriented smart model, remain key goals for improving the model. Challenges and Contribution Our main objective in this article is to build a new machine-learning-based technique that predicts plant diseases and weeds in an agricultural field. It is assumed that stacks of 3 × 3 receptive fields contain a greater amount of non-linearity, making the decision function more discriminative. A Softmax function was added to the last layer of the backbone network to map the output scores into a probability range. The real challenge is to make the model work from a real-world point of view, which we believe we have achieved. We performed the analysis with training sizes of 80, 70, 60, and 50% of the dataset. Our proposed model can automatically and accurately determine whether a plant is pest-infected or diseased from its leaves. The article is arranged as follows: Introduction, followed by Research Review, Dataset Preparation, Methodology, Experiment and Results, and finally Conclusions and Future Scope.

2 Research Review Mohanty et al. [1] proposed a model in which they identified 14 crop species and 26 plant diseases from the PlantVillage dataset using AlexNet and GoogLeNet. Chen et al. [4] identified rice and maize leaf diseases; about 500 images of rice and 466 images of maize were used, among which 13,689 images were of infected leaves, and the image generation method proposed in that article can improve the usefulness of the CNN model. Fuentes et al. [5] detected diseased and pest-infected plants using images taken manually with a camera and a small number of samples, so the precision was lower in practical applications. Brahimi et al. [6] classified nine diseases found in tomato leaves, training AlexNet and GoogLeNet on 14,828 collected images. Mishra et al. [7] identified two corn leaf diseases, rust and northern leaf blight, using the PlantVillage dataset and some real-time images, on which they implemented a deep convolutional neural network (DCNN). Gurjar et al. [8] studied plant extracts as alternative control measures to synthetic fungicides for managing plant diseases. Nutter et al. [9] reviewed different


diseases, as well as the methods, precautions, and aids that are nowadays available to enhance the accuracy and precision of visually assessed disease severity data. Poornappriya et al. [10] introduced a new approach to detect rice plant disease with the help of artificial intelligence. Aftab et al. [11] made use of IoT devices such as the Raspberry Pi, captured real-time images of the fields using a connected camera, and then applied image processing techniques to identify different kinds of plant diseases in farmland. Venkataramana et al. [12] proposed a deep learning integration approach that combines SVM and CNN algorithms to detect abnormalities in brinjal plants. Rao et al. [13] proposed automated detection of plant diseases through three-fold work: bilinear CNNs (BiCNNs), an enhanced VGG, and finally a real-world application. Sun et al. [14] proposed a new CNN-based method to overcome the limitations of manual feature extraction in digital image processing. Kumar et al. [15] proposed a time-delay mathematical model to analyse plant diseases using 3-D plots.

3 Dataset Preparation We have used the "New Plant Disease Dataset" from Kaggle (https://www.kaggle.com/datasets/vipoooool/new-plant-diseases-dataset), which has an upvotes-per-post ratio of more than 0.66 and has been downloaded by 25,128 users to date. The dataset consists of 87,867 RGB images of diseased and healthy crop leaves classified into 38 different classes. We read 70,294 images across the 38 classes from the training directory and 17,572 images across the same 38 classes from the validation directory. Each class label is either a crop disease or a healthy crop, and we predict the crop disease from the given image of the plant leaf. Table 1 gives an overview of the images.
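As an illustrative sketch only, a directory-structured dataset of this kind could be loaded with Keras as shown below; the directory paths and image size are assumptions, since the paper does not state them.

```python
# Minimal sketch: loading the 38-class leaf dataset with Keras.
# The directory paths and image size below are assumptions, not values from the paper.
import tensorflow as tf

IMG_SIZE = (128, 128)   # assumed input resolution
BATCH = 32

train_ds = tf.keras.utils.image_dataset_from_directory(
    "new-plant-diseases-dataset/train",    # assumed path to the training directory
    image_size=IMG_SIZE, batch_size=BATCH, label_mode="categorical")
valid_ds = tf.keras.utils.image_dataset_from_directory(
    "new-plant-diseases-dataset/valid",    # assumed path to the validation directory
    image_size=IMG_SIZE, batch_size=BATCH, label_mode="categorical")

print(len(train_ds.class_names))  # expected to report the 38 classes
```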

4 Methodology In this section, we discuss the different steps used in this article to reach the targeted results. Figure 2 shows the three steps, which are explained below. • Step 1: Image Capturing The images from the newly available plant disease dataset are collected; it contains images of different categories of diseased plants, classified into their appropriate classes. In the future, real-time images can be captured using an Arduino or Raspberry Pi connected to an IP camera. • Step 2: Filtration of the Image The training and testing images are collected. The unwanted or irrelevant images must be removed from the folders, otherwise


Table 1 Statistics of the New Plant Disease Dataset

| Plant name | No. of categories (disease) | No. of images of diseased plant | Category of the healthy plant | No. of images (healthy) |
| Apple      | 3 | 5763   | 1 | 2008 |
| Blueberry  | 0 | 0      | 1 | 1816 |
| Cherry     | 1 | 1683   | 1 | 1826 |
| Corn       | 3 | 5457   | 1 | 1859 |
| Grape      | 3 | 5530   | 1 | 1692 |
| Peach      | 1 | 1838   | 2 | 1728 |
| Pepper     | 1 | 1913   | 1 | 1988 |
| Potato     | 2 | 3878   | 1 | 1824 |
| Raspberry  | 0 | 0      | 1 | 1781 |
| Soyabean   | 0 | 0      | 1 | 2022 |
| Strawberry | 1 | 1774   | 1 | 1824 |
| Tomato     | 9 | 18,380 | 1 | 1926 |

Fig. 2 Steps of methodology

the training of the model will not be up to the mark. This would lead to incorrect validation and testing results and reduced accuracy. Hence, the irrelevant or erroneous images are manually eliminated and the remainder prepared for the next step. • Step 3: Image Identification Based on Deep Learning (DL) The essential idea behind deep learning is that data characteristics are extracted by numerous hidden layers, each of which can be compared to a perceptron, using neural networks for data analysis and feature learning. The CNN model used here consists of an input layer, convolution layers, pooling layers, a fully connected layer, and an output layer. The neurons of a convolution layer and the neurons of the following pooling layer are connected, but there is no need for a complete (dense) connection between them.
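To make Step 3 concrete, the sketch below shows one plausible Keras layout of the input/convolution/pooling/fully connected/output structure described above; the exact layer sizes are assumptions, since the paper does not list them.

```python
# Minimal sketch of the CNN structure described in Step 3 (layer sizes are assumed).
from tensorflow.keras import layers, models

NUM_CLASSES = 38  # classes in the New Plant Disease Dataset

model = models.Sequential([
    layers.Rescaling(1.0 / 255, input_shape=(128, 128, 3)),  # input layer + normalisation
    layers.Conv2D(32, (3, 3), activation="relu"),            # convolution layer (3x3 receptive field)
    layers.MaxPooling2D((2, 2)),                              # pooling layer
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),                     # fully connected layer
    layers.Dense(NUM_CLASSES, activation="softmax"),          # output layer with Softmax
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```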


5 Experiment and Results In this section, we discuss the different experiments performed with our model to determine the accuracy, precision, recall, and F1-score of the proposed approach.

5.1 Evaluation Metric Evaluation metrics are used to assess the quality of models and to compare algorithms. The F1-score measures an experiment's accuracy and is calculated from the precision (P) and recall (R) of the experiment, where precision is the proportion of objects labelled positive by the classifier that are actually positive, and recall is the proportion of all truly positive objects that the algorithm finds. The computation of accuracy, precision, recall, and F1-score is given in Eqs. (1), (2), (3), and (4), respectively.

Accuracy (A) = (TP + TN) / (TP + TN + FP + FN)    (1)

P = TP / (TP + FP)    (2)

R = TP / (TP + FN)    (3)

F1-score = (2 · P · R) / (P + R)    (4)
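As a brief illustration (not from the paper), these four metrics can be computed from a set of true and predicted labels with scikit-learn; the label arrays below are placeholders.

```python
# Sketch: computing accuracy, precision, recall and F1-score (Eqs. 1-4) with scikit-learn.
# y_true and y_pred are placeholder arrays standing in for the validation labels and model predictions.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [0, 2, 1, 1, 0, 2, 1]   # placeholder ground-truth class labels
y_pred = [0, 2, 1, 0, 0, 2, 2]   # placeholder predicted class labels

print("Accuracy :", accuracy_score(y_true, y_pred))
# 'macro' averages the per-class scores, one common choice for multi-class problems.
print("Precision:", precision_score(y_true, y_pred, average="macro"))
print("Recall   :", recall_score(y_true, y_pred, average="macro"))
print("F1-score :", f1_score(y_true, y_pred, average="macro"))
```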

5.2 Variation of Training Ratio with Fixed Data Size Here we executed the model with varying training ratios to study the resulting scores, which are presented in Fig. 3. The training size is varied over 0.8, 0.7, 0.6, and 0.5 of the available data in the dataset. The accuracy, precision, recall, and F1-score are presented in Table 2. We observed that the accuracy for the Peach and Cherry classes is higher than for the Tomato and Potato classes. Another observation of this experiment is that the accuracy for the Peach and Cherry classes decreases as the training size increases, whereas for the Tomato and Potato classes the accuracy increases with the training size. Precision, recall, and F1-score follow the same pattern.

Table 2 Accuracy (A), precision (P), recall (R) and F1-score for different training ratios with fixed data size

| Plant | A-50 | F1-50 | P-50 | R-50 | A-60 | F1-60 | P-60 | R-60 | A-70 | F1-70 | P-70 | R-70 | A-80 | F1-80 | P-80 | R-80 |
| Tomato | 0.72 | 0.71 | 0.77 | 0.72 | 0.75 | 0.75 | 0.80 | 0.75 | 0.75 | 0.78 | 0.80 | 0.78 | 0.75 | 0.75 | 0.80 | 0.75 |
| Potato | 0.82 | 0.82 | 0.87 | 0.81 | 0.81 | 0.81 | 0.85 | 0.80 | 0.83 | 0.83 | 0.86 | 0.84 | 0.85 | 0.85 | 0.88 | 0.85 |
| Grapes | 0.86 | 0.87 | 0.88 | 0.87 | 0.89 | 0.90 | 0.91 | 0.90 | 0.83 | 0.84 | 0.86 | 0.84 | 0.81 | 0.82 | 0.88 | 0.82 |
| Pepper | 0.82 | 0.81 | 0.86 | 0.81 | 0.97 | 0.97 | 0.97 | 0.97 | 0.82 | 0.83 | 0.83 | 0.82 | 0.83 | 0.83 | 0.84 | 0.83 |
| Corn | 0.84 | 0.84 | 0.843 | 0.83 | 0.87 | 0.86 | 0.88 | 0.86 | 0.93 | 0.93 | 0.93 | 0.93 | 0.85 | 0.85 | 0.86 | 0.85 |
| Apple | 0.87 | 0.88 | 0.89 | 0.88 | 0.86 | 0.87 | 0.87 | 0.87 | 0.84 | 0.84 | 0.87 | 0.84 | 0.87 | 0.87 | 0.89 | 0.87 |
| Peach | 0.83 | 0.82 | 0.87 | 0.83 | 0.96 | 0.96 | 0.96 | 0.96 | 0.88 | 0.88 | 0.88 | 0.88 | 0.90 | 0.90 | 0.92 | 0.89 |
| Strawberry | 0.90 | 0.90 | 0.90 | 0.90 | 0.93 | 0.93 | 0.93 | 0.93 | 0.91 | 0.91 | 0.92 | 0.92 | 0.88 | 0.89 | 0.90 | 0.88 |
| Cherry | 0.96 | 0.96 | 0.96 | 0.96 | 0.95 | 0.96 | 0.79 | 0.987 | 0.87 | 0.87 | 0.88 | 0.89 | 0.90 | 0.90 | 0.91 | 0.90 |

(Column suffixes -50, -60, -70 and -80 denote the training ratio in per cent.)


Fig. 3 Accuracy, precision, recall and F1-score of varying training ratio of all the plants with fixed training data size (four panels: Accuracy, Precision, Recall, F1-Score; each panel plots the score value per plant for training ratios 50, 60, 70 and 80)

5.3 Variation of Data Size with Fixed Training Ratio In this section, the data size of the experiment is varied while the training ratio is kept fixed. Figure 4 depicts these conditions clearly. We considered four data sizes, 32, 64, 90, and 128, and fixed the training ratio at 80%. The accuracy, precision, recall, and F1-score are presented in Table 3. We observed that the accuracy for the Peach and Cherry classes is higher than for the Tomato and Potato classes. Another observation of this experiment is that the accuracy for the Peach and Cherry classes increases with the data size, whereas for the Tomato and Potato classes the accuracy decreases as the data size increases. Recall, precision, and F1-score follow the same trends.

5.4 Variation of Epoch with Fixed Training Ratio and Fixed Data Size In this section, the training ratio and the data size are both kept fixed while the number of epochs of our CNN model is varied. The accuracy score is recorded for each epoch and presented as a line plot in Fig. 5.

Table 3 Accuracy (A), precision (P), recall (R) and F1-score for different data sizes with fixed training ratio

| Plant | A-32 | F1-32 | P-32 | R-32 | A-64 | F1-64 | P-64 | R-64 | A-90 | F1-90 | P-90 | R-90 | A-128 | F1-128 | P-128 | R-128 |
| Tomato | 0.76 | 0.75 | 0.81 | 0.76 | 0.78 | 0.78 | 0.81 | 0.78 | 0.76 | 0.75 | 0.81 | 0.76 | 0.72 | 0.71 | 0.77 | 0.72 |
| Potato | 0.85 | 0.85 | 0.89 | 0.85 | 0.83 | 0.83 | 0.87 | 0.84 | 0.81 | 0.81 | 0.85 | 0.80 | 0.82 | 0.82 | 0.87 | 0.82 |
| Grapes | 0.82 | 0.82 | 0.88 | 0.82 | 0.84 | 0.84 | 0.86 | 0.84 | 0.90 | 0.90 | 0.91 | 0.90 | 0.87 | 0.87 | 0.88 | 0.87 |
| Pepper | 0.84 | 0.83 | 0.84 | 0.83 | 0.83 | 0.83 | 0.83 | 0.83 | 0.97 | 0.97 | 0.97 | 0.97 | 0.82 | 0.81 | 0.87 | 0.82 |
| Corn | 0.85 | 0.85 | 0.86 | 0.86 | 0.93 | 0.93 | 0.93 | 0.94 | 0.87 | 0.86 | 0.88 | 0.86 | 0.84 | 0.84 | 0.84 | 0.84 |
| Apple | 0.87 | 0.87 | 0.89 | 0.87 | 0.84 | 0.84 | 0.88 | 0.84 | 0.86 | 0.87 | 0.87 | 0.87 | 0.87 | 0.88 | 0.89 | 0.88 |
| Peach | 0.90 | 0.90 | 0.92 | 0.90 | 0.88 | 0.88 | 0.89 | 0.88 | 0.96 | 0.96 | 0.96 | 0.96 | 0.83 | 0.82 | 0.87 | 0.83 |
| Strawberry | 0.88 | 0.89 | 0.90 | 0.88 | 0.91 | 0.91 | 0.93 | 0.93 | 0.93 | 0.93 | 0.94 | 0.93 | 0.90 | 0.90 | 0.90 | 0.90 |
| Cherry | 0.97 | 0.96 | 0.94 | 0.94 | 0.99 | 0.99 | 0.97 | 0.96 | 0.99 | 0.99 | 0.97 | 0.97 | 1.00 | 1.00 | 1.00 | 1.00 |

(Column suffixes -32, -64, -90 and -128 denote the data size.)


Fig. 4 Accuracy, precision, recall and F1-score of varying training data size of all the plants (four panels: Accuracy, Precision, Recall, F1-Score; each panel plots the score value per plant for data sizes 32, 64, 90 and 128)

Fig. 5 Epoch versus accuracy with fixed data size and training ratio in different plants

Among this set of nine plants, Cherry (olive line) and Strawberry (green line) show constant behaviour across every epoch.
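For illustration only, accuracy-versus-epoch curves like those in Fig. 5 can be obtained from the Keras training history; the epoch count below is an assumed setting and the model and datasets are the ones from the earlier sketches.

```python
# Sketch: recording accuracy per epoch for a fixed data size and training ratio.
# 'model', 'train_ds' and 'valid_ds' come from the earlier sketches; 20 epochs is an assumed setting.
import matplotlib.pyplot as plt

history = model.fit(train_ds, validation_data=valid_ds, epochs=20)

plt.plot(history.history["accuracy"], label="training accuracy")
plt.plot(history.history["val_accuracy"], label="validation accuracy")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.legend()
plt.show()
```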


6 Conclusions and Future Scope In this article, we have evaluated images of diseased plants using both image processing and a deep learning CNN framework. The better-performing configuration was used for our model, which offers better prospects for farmers. We can conclude that, by applying image processing and deep learning networks, we can build better low-cost models and frameworks. Many approaches have been tried, but most researchers rely on the PlantVillage dataset, whereas real-world pest patterns may not always match the given training dataset. In our CNN, gradient descent is applied so that the ReLU and Softmax layers give the desired output. This model has a large scope for future deployment at scale, as it is economical and easy to interact with. A future variant of this model can focus on deploying the module on mobile devices as a mobile application and applying it in a more realistic setting.

References 1. Mohanty SP, Hughes DP, Salathé M (2016) Using deep learning for image-based plant disease detection. Front Plant Sci 7:1419 2. Turkoglu M, Yanikoğlu B, Hanbay D (2022) Plantdiseasenet: convolutional neural network ensemble for plant disease and pest detection. Signal Image Video Process 16(2):301–309 3. Liang D, Liu W, Zhao L, Zong S, Luo Y (2022) An improved convolutional neural network for plant disease detection using unmanned aerial vehicle images. Nature Environ Pollut Technol 21(2):899–908 4. Chen Z, Wu R, Lin Y, Li C, Chen S, Yuan Z, Chen S, Zou X (2022) Plant disease recognition model based on improved yolov5. Agronomy 12(2):365 5. Fuentes A, Yoon S, Kim SC, Park DS (2019) A robust deep-learning-based detector for real-time tomato plant diseases and pests recognition. Sens Agric 1(17):153 6. Brahimi M, Boukhalfa K, Moussaoui A (2017) Deep learning for tomato diseases: classification and symptoms visualization. Appl Artif Intell 31(4):299–315 7. Mishra S, Sachan R, Rajpal D (2020) Deep convolutional neural network based detection system for real-time corn plant disease recognition. Proc Comput Sci 167:2003–2010 8. Gurjar MS, Ali S, Akhtar M, Singh KS (2012) Efficacy of plant extracts in plant disease management 9. Nutter FW, Esker PD, Netto RAC (2006) Disease assessment concepts and the advancements made in improving the accuracy and precision of plant disease data. Eur J Plant Pathol 115(1):95–103 10. Poornappriya T, Gopinath R (2022) Rice plant disease identification using artificial intelligence approaches 11. Aftab S, Lal C, Beejal SK, Fatima A (2022) Raspberry pi (python AI) for plant disease detection. Int J Cur Res Rev 14(03):36 12. Venkataramana A, Kumar KS, Suganthi N, Rajeswari R (2022) Prediction of brinjal plant disease using support vector machine and convolutional neural network algorithm based on deep learning. J Mobile Multimed 771–788 13. Rao DS, Ch RB, Kiran VS, Rajasekhar N, Srinivas K, Akshay PS, Mohan GS, Bharadwaj BL (2022) Plant disease classification using deep bilinear CNN. Intel Autom Soft Comput 31(1):161–176


14. Sun X, Li G, Qu P, Xie X, Pan X, Zhang W (2022) Research on plant disease identification based on cnn. Cognit Robot 2:155–163. https://doi.org/10.1016/j.cogr.2022.07.001 15. Kumar P, Baleanu D, Erturk VS, Inc M, Govindaraj V (2022) A delayed plant disease model with caputo fractional derivatives. Adv Continuous Discrete Models 2022(1):1–22

Certain Investigations of MEMS for Optimised Sensor Coverage Abolfazl Mehbodniya, Muruganantham Rajamanickam, Julian L. Webber, D. Stalin David, Devi Mani, Rajasekar Rangasamy, and Sudhakar Sengan

Abstract In wireless sensor networks (WSN), area coverage is a non-trivial challenge due to a lack of knowledge about the relatively limited sensor node (SN) set needed to cover an area of study, as well as the constraints of energy reserve, control, and communication ranges. WSN performance in terms of area-coverage optimisation is a critical issue for the successful operation of any WSN. Research on WSN holes is reviewed in terms of its current state of development and the relative merits and drawbacks of the various solutions proposed to combat various types of holes. A. Mehbodniya · J. L. Webber Department of Electronics and Communication Engineering, Kuwait College of Science and Technology (KCST), Kuwait City, Kuwait e-mail: [email protected] J. L. Webber e-mail: [email protected] M. Rajamanickam Department of Information Technology, TKR College of Engineering and Technology, Telangana 500097, India e-mail: [email protected] D. Stalin David Department of Information Technology, Vel Tech Multi Tech Dr. Rangarajan Dr. Sakunthala Engineering College, Chennai, Tamil Nadu 600072, India e-mail: [email protected] D. Mani Department Of Computer Science, College of Science and Arts (Female), Sarat Abidah Campus, King Khalid University, Asir - Abha, Kingdom of Saudi Arabia e-mail: [email protected] R. Rangasamy Department of Computer Science and Engineering, GITAM School of Technology, GITAM University, Bengaluru, Karnataka 561203, India e-mail: [email protected] S. Sengan (B) Department of Computer Science and Engineering, PSN College of Engineering and Technology, Tirunelveli, Tamil Nadu 627152, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_35


Coverage holes can be detected by nodes working together autonomously in this paper's distributed algorithms. The reviewed research papers are organised by node type, distribution process, data transfer and sensing range, complete coverage tracking, and global positioning system (GPS) approach. Keywords Wireless sensor network · IoT · MEMS · Coverage · FoI

1 Introduction A WSN is a geographically dispersed network of wirelessly interconnected devices. Sensor nodes (SNs) and at least one sink node, referred to as the base station, are the elements of a WSN [1]. MEMS technology makes WSN nodes cheap, small, and able to do more than one thing. They are used in road transport and infrastructure facilities and can monitor forest areas for animals or fires. All in all, WSNs can include hundreds or even thousands of sensors. High QoS can only be achieved by utilising a large number of highly efficient nodes in a distributed system. Given the random distribution of sensors, ensuring network connectivity at low power consumption is a core challenge in a WSN. Coverage detection is a long-standing issue in WSNs that has received much attention recently. However, most previous studies have focused on only one type of redundancy, either sensing or communication, and not the other, and they discuss coordinating a single task while maintaining network connectivity concurrently [2]. This study indicates that the transmission range must be at least twice the sensing range in order for a slightly curved area to be completely covered by mobile nodes. The coverage problem itself is a hot research topic: positioning a set number of SNs with predetermined sensing ranges in an optimal pattern so as to maximise the area they cover while minimising coverage gaps. Coverage maximisation methodologies have been proposed in several ways. In the research presented in this article, the solution to this problem is pursued using two computational intelligence algorithms. We present algorithms for the energy-efficient detection of coverage holes in WSNs, operating in a distributed manner. The proposed hole detection techniques can be combined in the light of a theoretical study of the intersections of the sensors' sensing discs. First, the nodes bordering a coverage hole are identified. This is done by obtaining the nodes' locations, the overlap of their sensing coverage, and the radius of the non-overlapping region. The coverage hole can then be discovered by cooperating with adjacent nodes [3, 4]. A. Sensing Unit: In a WSN, an SN is a sensing device that interacts with real objects. There are two types of sensors: active and passive. An active sensor, which also serves as a monitoring device, measures the signal level that it emits and that is reflected by the target; the sensor module requires an additional power source to generate these signals. A passive sensor measures the signal power emitted by the physical environment and does not need an external source to generate signals. Passive sensor networks use less power


than active ones [5]. Figure 1 shows the operation of passive and active SNs. SN signals are digital or analogue; digital and analogue SNs produce binary and continuous signals, respectively (Fig. 2). B. Communication Unit: The SN's communication unit transmits and receives data and controls packets. Two-way communication in a WSN can occur via two different communication methods between nodes. The WSN literature [6] has shown that multi-hop communication can save energy in large-scale WSNs. C. Computing Unit: In an SN, the devices are interconnected to a processing unit that runs the operating system. For specific functions, WSN SNs employ personal computers, embedded devices, high-speed programmable logic controllers, communication processors, and electrical software devices [7]. The rest of the paper is organised as follows: Sect. 2 reviews related work, Sect. 3 discusses coverage and its classification, Sect. 4 classifies coverage problems, and Sect. 5 presents the conclusion and future work.

Fig. 1 Passive and active SN

Fig. 2 Components of SN on WSN


2 Related Works The detection and healing of holes in mobile WSNs is a challenging problem. The primary issues are determining the region of interest (ROI) boundary, detecting coverage holes and measuring their characteristic features, deciding on suitable target locations, and allocating mobile nodes to those locations. To address these issues, a solution known as "hole detection and repair" has been suggested [8]. This distributed and localised algorithm has two phases: it first identifies boundary nodes and holes, and then performs hole healing. A lightweight localised protocol is used over the Gabriel graph in the first phase, and the hole area is healed in the second. When it comes to dealing with holes, HEALS provides a solution that is both cost-effective and accurate. Coverage holes can also be characterised for this network technology: coverage hole detection and coverage description are the two phases of that scheme, which additionally spotlights the problem areas inside the holes. Its graphical description can be used to heal coverage gaps; the simulation results predict coverage gaps, and the graphical gaps can be used to repair actual ones. The computational complexity of the proposed method is O(bn), where b represents the number of neighbour sensors of each node and n is the number of SNs in the network. The coverage issue is significant in WSNs, and mobility is exploited in hybrid WSNs to improve area coverage. When designing a hole healing algorithm, the main goal of using mobile SNs is to heal coverage holes: first, determine the existence and size of a coverage hole [9, 10]; second, assess where to relocate base stations to repair coverage holes. The authors use a triangle-oriented diagram to solve this problem, employing the circumcircle and incircle of the triangles. Compared with the Voronoi diagram, this diagram is easier to build and requires fewer calculations. Researchers have also developed a virtual force-directed particle swarm optimisation (VFPSO) algorithm. VFPSO is a self-organising algorithm for improving coverage in WSNs with both mobile and stationary nodes. Attractive and repulsive forces are implemented in the algorithm, employing the virtual motion routes of the sensing devices and their rate of travel. PSO and virtual forces (VF) are used together, so that particle velocities are updated according to both the historical local and global optimal solutions and the virtual forces acting on the SNs [11-13].

3 Coverage and Classification of Coverage Coverage: The connectivity of a WSN is essential. Connectivity refers to the network's ability to communicate with the data sink; data aggregated by a node cannot be processed if no path is available from the SN to the sink. The transmission range specifies how far nodes can send data to each other, while the sensing range refers to the area that a node is capable of monitoring. Although the two ranges are often similar, they are generally distinct [14].
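As a simple illustration (not from the paper), the Boolean disc model distinguishes the two ranges: a point is covered if it lies within some node's sensing radius, and two nodes can communicate if they are within transmission range of each other. The radii and node positions below are placeholders.

```python
# Sketch of the Boolean disc model: sensing range vs. transmission (communication) range.
# Node positions and radii are placeholder values for illustration.
import math

SENSING_RANGE = 10.0        # assumed sensing radius
TRANSMISSION_RANGE = 20.0   # assumed transmission radius (often taken >= 2 * sensing range)

nodes = [(0.0, 0.0), (15.0, 0.0), (15.0, 12.0)]  # placeholder SN coordinates

def is_covered(point, sensors, r_s=SENSING_RANGE):
    """A point is covered if at least one sensor is within sensing range of it."""
    return any(math.dist(point, s) <= r_s for s in sensors)

def are_connected(a, b, r_t=TRANSMISSION_RANGE):
    """Two nodes are connected if they are within transmission range of each other."""
    return math.dist(a, b) <= r_t

print(is_covered((5.0, 5.0), nodes))       # covered by the node at (0, 0)
print(are_connected(nodes[0], nodes[1]))   # True: distance 15 <= 20
```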


Area Coverage: Area coverage, also known as blanket coverage, monitors the FoI with the collection of nodes deployed in the WSN. Figure 3 shows a WSN tracking a particular FoI, with the circles showing the SNs' detection ranges [15]. Coverage of the Target: Target coverage (TC), also widely known as point coverage, focuses attention on specified targets within the FoI. In the TC scenario shown, three SNs monitor the FoI's five targets: TC1, TC3, and TC4 are each covered by a single SN, while the remaining targets, TC2 and TC5, are covered by two devices simultaneously. Figure 4 shows how TC saves energy by monitoring only the target locations within the FoI [16].

Fig. 3 Target coverage

Fig. 4 Coverage of FoI a area coverage b target


Barrier Coverage (BC): Intrusions can be detected with the help of BC, since barrier crossings are detected by SNs. The accuracy with which FoI events are detected determines whether a barrier is weak or strong. A weak barrier has gaps; it only ensures that targets moving in the same direction are followed, and sensors may be unable to detect complex paths. With strong BC, the SNs must notice or detect every intruder [17].

4 Coverage Problem Classification 4.1 Discovery of Network Coverage SN locations and the network structure are among the attributes used to create this infrastructure. In many cases, deterministic SN distribution is impractical, and covering a random area with SNs is much more difficult. Deterministic network coverage can be illustrated by the art gallery problem studied in computational geometry [18]. In this problem, the room is represented by a polygon and the guards (sensors) by points, and the challenge is to place guards so that at least one guard watches every part of the room. The coverage goal is achieved because all guard positions are predetermined [19].

4.2 Randomly Selected Coverage In random network coverage, the exact opposite of deterministic network coverage, there is no predetermined information about sensor locations or topologies. The topology and locations of the system change over time; changes in time and geography can, for example, affect the location of combat targets. Since it examines the problem of planar coverage, the geometric technique is widely used for random deployment. In static random coverage, nodes are placed close together in order to provide coverage, whereas mobile nodes use their mobility feature to reposition themselves at the best sites, so that as little energy as possible is used to sustain coverage in the randomly deployed system.

4.3 Goal of Coverage This section goes over the distinct solutions presented for total BC. There are various types of BC strategies based on coverage and other considerations, as shown below.


Coverage of the Territory With territory coverage, every point within the territory's boundaries is covered by its BC. There are three broad classes of area-coverage algorithms, each with its own requirements: one-coverage, k-coverage, and linked (connected) coverage.

4.4 Algorithms for Coverage

Step 1.  BC measure:
Step 2.  Range coverage (RC) = 0
Step 3.  For (RC = 1) Do
Step 4.    For (Node = 1) Do
Step 5.      If (Node = RC)
Step 6.        RC = RC + 1
Step 7.      End
Step 8.    End
Step 9.  End
Step 10. return BC
Step 11. End

4.5 Algorithms for k-Coverage

Step 1.  K-coverage assessment:
Step 2.  Maximum = 0
Step 3.  BC Range = 0
Step 4.  For (POI → k-coverage) Do
Step 5.    For (K = k-coverage of POI) Do
Step 6.      d = Node distance
Step 7.      If (Node BC /= Target node)
Step 8.        BC Range = BC Range + d
Step 9.      End
Step 10.     End
Step 11.   End
Step 12. return Coverage range
Step 13. End
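For illustration only, the idea behind the k-coverage assessment above can be sketched in Python as counting, for each point of interest (PoI), how many sensors contain it in their sensing disc; the positions, sensing radius, and k below are placeholder assumptions.

```python
# Sketch: checking k-coverage of a set of points of interest under the Boolean disc model.
# Sensor positions, PoI positions, sensing radius and k are placeholder values.
import math

def covering_sensors(poi, sensors, sensing_range):
    """Return the sensors whose sensing disc contains the point of interest."""
    return [s for s in sensors if math.dist(poi, s) <= sensing_range]

def is_k_covered(pois, sensors, sensing_range, k):
    """Every PoI must lie inside the sensing range of at least k distinct sensors."""
    return all(len(covering_sensors(p, sensors, sensing_range)) >= k for p in pois)

sensors = [(0, 0), (8, 0), (4, 6), (4, -6)]   # placeholder sensor coordinates
pois = [(4, 0), (2, 2)]                        # placeholder points of interest
print(is_k_covered(pois, sensors, sensing_range=7.0, k=2))
```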

4.6 Point Coverage Each point pi in the set of points P is represented by its coordinates (Xi, Yi). The WSN lifetime can be extended while


providing target point coverage with the method described here. The SNs are divided into different set covers; when a set cover is activated, it covers every target point of the sensor network in that round. The active set cover collects data that is then routed to an access point for further examination. The authors suggest a greedy method for solving the problem in addition to linear programming.

4.7 Creating a Barrier All routes bi of a sensor S in area A are depicted on map C, and the belt region B intersects each bi. The belt zone B is an extended, narrow area with little breathing room (LBCP). Barrier coverage alone cannot be used to determine whether the SNs cover the entire strip. If the target moves, this strategy assumes it will travel only a short distance across the belt zone, and if the belt zone changes, the SNs will be notified. Local BC can cover the entire belt if the belt area is large enough. The authors devised a reliable model for analysing frequency in thin strips of a region of constant size. Based on BC and network connectivity in a belt-shaped region, this method can provide reliable measurements: the SNs act as a barrier, allowing variations in target position to be detected. Modelling validates the outcomes, and simulation results demonstrate the accuracy of the technique for small areas.

4.8 Protocols for Coverage-Aware Deployment Coverage-aware deployment protocols select the sensors that best serve coverage (Table 1). One subproblem of a deployment protocol is the coverage hole problem, which entails locating areas that are not monitored by sensors; to resolve it, mobile sensors move around to fill in sensing gaps and expand the area they cover. In the maximum covering sensor deployment problem (MCSDP), the goal is to determine how many sensors are required to achieve the highest coverage. Deployment is a problem with multiple competing objectives and is NP-hard. PSODA, a PSO-based deployment strategy, addresses the deterministic aspect of the deployment problem in WSNs; the main goal of its optimisation is to cut down the number of sensors in the MCSDP while keeping coverage barriers for all of the targeted cells, and the method produces a single-linked barrier. MobiBar uses an ideal disc model for sensing and communication. The mobile sink coverage optimisation and link stability estimation routing (MSCOLER) protocol is based on a mobile sink (MS) to (a) recover network coverage and (b) avoid transmission faults. Table 1 summarises how MSCOLER moves the mobile sensors near the coverage holes during the first cycle, using a grid-based firefly simulated annealing. Figure 5 shows that firefly simulated annealing (FSA), one way of solving the problem, finds the best places for mobile sensors to fill coverage gaps.


Table 1 Comparison of coverage protocols

| Coverage protocol | Target | Model of sensing | Location awareness | Distribution of protocol | Characteristics |
| FSA | Support k-BC | Boolean | NA | | |
| LSER | Support MCSDP/point coverage | Boolean | NA | | |
| RSSI | Full coverage | Boolean | Yes | | |
| ETX | k-coverage | Elfes | Yes | | |
| LQI | Target k-coverage | Boolean | NA | | |

Fig. 5 Assessment of coverage protocols

5 Conclusion and Future Work The research presented in this article focuses on optimising the area coverage of WSNs. In the distributed detection scheme proposed in this article, sensors store location data from their one-hop neighbours and use that data to discover coverage holes without the assistance of a sink.


article’s proposed distributed detection scheme for those holes. One of the main focuses of the study is on how WSN QoS is affected by automated coverage control. Also, coverage and connectivity are intimately associated with holing clustering and hole area best estimate. On the other hand, SN must be dispersed throughout the area to avoid coverage gaps. We believe the proposed algorithm is used to select holes in the most straightforward and time-efficient manner, specially designed for sensors with resource-constrained devices and energy. Furthermore, we will recommend a communication protocol for filling coverage gaps by moving a few sensors with a high sensing overlap with their neighbours without interfering with conventional communication.

References 1. Alzenad M, El-Keyi A, Lagum F, Yanikomeroglu H (2017) 3D placement of an unmanned aerial vehicle base station (UAV-BS) for energy-efficient maximal coverage. IEEE Wirel Commun Lett 6(4):434–437 2. Dorling K, Heinrichs J, Messier GG, Magierowski S (2017) Vehicle routing problems for drone delivery. IEEE Trans Syst Man Cybern Syst 47(1):70–85 3. Kalantari E, Yanikomeroglu H, Yongaçoglu A (2016) On the number and 3D placement of drone base stations in wireless cellular networks. In: Proceeding IEEE 84th vehicular technology conference, pp 1–6 4. Kwon T, Cioffi JM (2013) Random deployment of data collectors for serving randomly-located sensors. IEEE Trans Wirel Commun 12(6):2556–2565 5. Lagum F, Bor-Yaliniz I, Yanikomeroglu H (2018) Strategic densification with UAV-BSs in cellular networks. IEEE Wirel Commun Lett 7(3):384–387 6. Liang L, Xu L, Cao B, Jia J (2018) A cluster-based congestion-mitigating access scheme for massive M2M communications in internet of things. IEEE Internet Things J 5(3):2200–2211 7. Lin TM, Lee CH, Cheng JP, Chen WT (2014) PRADA: prioritized random access with dynamic access barring for MTC in 3GPP LTE-A networks. IEEE Trans Veh Technol 63(5):2467–2472 8. Moussa HG, Zhuang W (2020) Energy- and delay-aware two-hop noma-enabled massive cellular IoT communications. IEEE Internet Things J 7(1):558–569 9. Moussa HG, Zhuang W (2019) RACH performance analysis for large-scale cellular IoT applications. IEEE Internet Things J 6(2):3364–3372 10. Novlan TD, Dhillon HS, Andrews JG (2013) Analytical modeling of uplink cellular networks. IEEE Trans Wirel Commun 12(6):2669–2679 11. Pan C, Ren H, Deng Y, Elkashlan M, Nallanathan A (2019) Joint blocklength and location optimization for URLLC-enabled UAV relay systems. IEEE Commun Lett 23(3):498–501 12. Savkin AV, Huang H (2021) Range-based reactive deployment of autonomous drones for optimal coverage in disaster areas. IEEE Trans Syst, Man, Cybern: Syst 51(7):4606–4610 13. Yang Z (2018) Joint altitude beamwidth location and bandwidth optimization for UAV-enabled communications. IEEE Commun Lett 22(8):1716–1719 14. Sudhakar S, Chenthur Pandian S (2012) Secure packet encryption and key exchange system in mobile ad hoc network. J Comput Sci 8(6):908–912 15. Sudhakar S, Chenthur Pandian S (2016) Hybrid cluster-based geographical routing protocol to mitigate malicious nodes in mobile ad hoc network. Int J Ad Hoc Ubiquitous Comput 21(4):224–236 16. Priyadarshni AU, Sudhakar S (2015) Cluster based certificate revocation by cluster head in mobile ad-hoc network. Int J Appl Eng Res 10(20):16014–16018


17. Sudhakar S, Pandian SC (2015) Investigation of attribute aided data aggregation over dynamic routing in wireless sensor. J Eng Sci Technol 10(11):1465–1476 18. Sudhakar S, Pandian SC (2013) Trustworthy position-based routing to mitigate against the malicious attacks to signifies secured data packet using geographic routing protocol in MANET. WSEAS Trans Commun 12(11):584–603 19. Sudhakar S, Chenthur Pandian S (2013) A trust and co-operative nodes with affects of malicious attacks and measure the performance degradation on geographic aided routing in mobile ad hoc network. Life Sci J 10(4s):158–163

Voice-Based Intelligent Virtual Assistant for Windows K. M. Bhargav, Akash Bhat, Snigdha Sen, A. Vamsi Kalyan Reddy, and S. D. Ashrith

Abstract An average individual can speak about 140 words per minute but type only 40. Voice communication is the most convenient and efficient way to interact, and the majority of individuals prefer speech-based communication. A voice assistant for Windows is a software program or application that any Windows user can operate with their voice simply by saying what they want. In most cases, personal assistants respond to questions and carry out tasks through a natural language user interface; the user's voice is the key input. It is a digital assistant that employs voice recognition, speech synthesis, natural language processing, and classification algorithms to provide service via an interface. Existing systems are incapable of executing custom commands, and we attempted to address this flaw in our project. The main job of an intelligent assistant is to classify the input into a task. In this paper, we have used the Naive Bayes classification algorithm for task classification, which gave the best accuracy compared with KNN, logistic regression, random forest, decision tree, and SVM. The assistant simplifies the user's work by allowing them to perform a task with just one voice command. Keywords Speech recognition · Natural language processing · Machine learning · Voice assistant · Task classification · Image processing

K. M. Bhargav (B) · A. Bhat · S. Sen · A. V. K. Reddy · S. D. Ashrith Department of CSE, Global Academy of Technology, Bengaluru, India e-mail: [email protected] S. Sen e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_36

1 Introduction Artificial intelligence is developing and spreading quickly in every field, since it can understand human language, convert it into a command, process it, and provide services to humans. The usage of virtual assistants is growing quickly too. It is a time in which machines are learning to communicate with humans,


rather than humans learning to communicate with machines: by examining users' activities, inquiring about their hobbies and routines, and so on, the machine attempts to become an intelligent personal assistant to the user [1]. A virtual assistant is a software program or application that any Windows user can operate with their voice simply by saying what they want; the assistant then responds through a program-generated voice. Voice assistants identify the human voice and react through built-in speech [2]. This voice assistant collects audio from the microphone, turns it into text using a speech recognition system, and then forwards the response text to the Google Text-to-Speech (GTTS) API. The GTTS engine converts the given textual content into an English audio file, which is then played using the Python playsound package [3]. Users can use just their voice to ask questions, control media playback, and conduct other fundamental tasks, such as sending an email, scanning an image to extract and copy its textual content, opening or closing an application, manipulating files, and so on. Imagine you encounter an image containing text that is not selectable or copyable but that you would like to copy; with this personal assistant you can do it with your voice alone. Voice assistants can return useful information by listening for certain phrases and filtering out ambient noise from the user's commands, often termed intents [4]. The important thing in a voice assistant is to define a set of tasks that can be performed and wait for the matching command from the user to execute a particular task [5]. In this paper, we have tried to come up with a strategy to build an assistant that helps users by executing some basic tasks: it listens to the user and provides services like a personal assistant. Cortana is an excellent example of a desktop voice assistant, which performs some web searches and shows the generated results to the user. Most of us today want a program such as a voice assistant to listen to us and accomplish certain activities, such as opening a specific application, or to anticipate our needs and respond accordingly.

2 Literature Survey Several researchers have proposed different methods for building a voice assistant; most of them are broadly similar but use different strategies. In [2], a survey is conducted on learning-based human-machine dialogue systems, with a focus on the various dialogue models. It starts by introducing the core method of creating a dialogue model and the dialogue models' characteristics and classifications, explains several typical models, and evaluates the benefits and drawbacks of each. The commonly used databases and evaluation metrics of these dialogue models are then examined in detail. Two methods were used to build a dialogue model: a data-driven and a non-data-driven model. In [6], the main principle used is automatic speech recognition (ASR), which captures the speech and converts it into a waveform that is then given to acoustic analysis. Acoustic analysis is performed


on three levels: acoustic modeling, pronunciation modeling, and language modeling. The output of acoustic modeling is then given to an encoder, which generates the text. The work in [7] addresses the gaps in building chatbots and their development and also differentiates chatbots across different domains; overall, it concludes that chatbots can be divided into three groups, all of which aim at refining the accessibility and reliability of chatbots. Speech recognition technology is gaining popularity and is being used in more applications nowadays [6]. In [7], the mel frequency cepstral coefficient (MFCC) technique is used for speech recognition. The task of the MFCC algorithm is to transform voice data into a feature vector, so the input for the MFCC method is a speech signal and the output is a feature vector. It works by transforming framed speech signals with the fast Fourier transform, scaling the resulting spectrum with mel-scale filtering, passing the mel frequency spectrum through a logarithmic function, and sending the result to the discrete cosine transform, whose cepstral coefficients (and their derivatives) form the feature vector. In [8], the research proposes SecureNLP, which addresses the problem of insecure NLP data that is stored and used for processing. SecureNLP is built with a recurrent neural network and is used for protecting NLP data and maintaining privacy in multi-party computation; however, multi-party multiplication may be computationally expensive for some lightweight, low-cost systems. In [9], a usability test was performed along with debriefing discussions with voice assistant users to measure and analyze task usage and its reasons. It recommends that voice assistant designers deal with tasks separately to understand usability issues, and it also discusses task-independent factors that can affect the usage of voice assistants. In [10], a recurrent neural network is used to suppress the noise and enhance the recorded audio before the text-to-speech conversion. The model is trained on different noisy recordings, clean recordings, and then both; this mainly enhances the recorded waveform and improves the quality of the data using an RNN, but the model did not perform well for mixed noisy recordings. Hence, in this paper, after considering all this research, we tried to come up with a strategy to build an efficient voice assistant with more and more features.

3 System Design In Fig. 1, we have shown the design of our proposed system. We can summarize Fig. 1 as below:
1. Speech recognition and extraction of text from speech.
2. Cleaning the extracted text using NLP.
3. Training the classification model using the sample command dataset.
4. Classifying the cleaned text into a task.


Fig. 1 System design

5. Executing the identified task. There are five main steps in our proposed voice assistant design. The important step is the creation of a classification model. This model acts as the brain of this design as it identifies the task to be performed as commanded by the user.

4 Implementation The proposed strategy begins with the user supplying vocal input to the voice assistant via a microphone, which is then processed and analyzed by the assistant. The main task was to create a dataset for training the classification model. We first made a list of the tasks to be included in the assistant. Then, for each task, we listed the possible ways the user might ask the assistant to execute it, and we assigned a task number to each task. Our dataset was thus created with the possible commands in the first column and the corresponding task numbers in the second, which acted as the target column for training the classification model. Initially, all the required Python packages were installed, and then the following procedure was followed.


4.1 Dataset Creation The assistant should perform the same task even when it is commanded or asked differently. Hence, we created a dataset with all the possible ways a task can be commanded and their corresponding task number, which acts as the target variable here. On average, there are about six to eight different commands or phrasings for each task.
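As an illustration (not from the paper), such a command dataset could be laid out as two columns; the example phrasings and task numbers below are hypothetical.

```python
# Sketch of the command dataset layout: possible phrasings and their task numbers.
# The phrasings and task numbers are hypothetical examples, not the authors' actual dataset.
import pandas as pd

rows = [
    ("open paint", 1),
    ("i want to draw something", 1),
    ("can you open the painting application", 1),
    ("copy the text from this image", 2),
    ("extract the text on my screen", 2),
    ("what is the weather today", 3),
]
dataset = pd.DataFrame(rows, columns=["command", "task"])  # 'task' is the target column
dataset.to_csv("commands.csv", index=False)                # assumed file name
print(dataset.head())
```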

4.2 Speech Recognition When the user says the trigger word followed by the command for a certain task, the assistant is triggered [11]. The user's voice is recorded using Python's speech recognition modules, and the recorded voice is used as input. Speech-to-text conversion is applied to transform the captured voice into text, and the transformed text is the output. The program converts speech input to text using Google's online speech recognition service, while spoken responses are generated with GTTS: the assistant receives the input as voice, extracts the most likely text from it, and responds to the user with generated speech.
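A minimal sketch of this capture-recognise-respond loop, assuming the speech_recognition, gTTS, and playsound packages mentioned in the paper, is shown below; the trigger handling and the audio file name are simplifications.

```python
# Sketch: capture a voice command, convert it to text, and reply with synthesised speech.
# Assumes the speech_recognition, gTTS and playsound packages; the reply text is a placeholder.
import speech_recognition as sr
from gtts import gTTS
from playsound import playsound

recognizer = sr.Recognizer()

def listen():
    """Record one phrase from the microphone and return the recognised text."""
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)   # filter out some background noise
        audio = recognizer.listen(source)
    try:
        return recognizer.recognize_google(audio)      # Google's online speech recognition
    except sr.UnknownValueError:
        return ""

def speak(text):
    """Convert the reply text to an English audio file and play it."""
    gTTS(text=text, lang="en").save("reply.mp3")
    playsound("reply.mp3")

command = listen()
print("Heard:", command)
speak("You said " + command if command else "Sorry, I did not catch that")
```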

4.3 Text Cleaning This module outputs cleaned, computer-understandable text. Natural language processing is used here for tokenizing, stemming, lemmatization, and cleaning of the text, using the nltk Python library. The text may contain noise that reduces the accuracy of the result, so the noise must be removed; NLP cleans the text by removing noisy words from it.
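For illustration, a text-cleaning step along these lines could look as follows with nltk; the stop-word handling and lemmatizer choice are assumptions, since the paper does not list them.

```python
# Sketch: tokenizing, removing stop words and lemmatizing a command with nltk.
# The exact cleaning steps are assumed; the paper only names tokenizing, stemming and lemmatization.
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)

lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words("english"))

def clean_text(command):
    """Lower-case, tokenize, drop stop words and non-alphabetic tokens, then lemmatize."""
    tokens = nltk.word_tokenize(command.lower())
    return " ".join(lemmatizer.lemmatize(t) for t in tokens
                    if t.isalpha() and t not in stop_words)

print(clean_text("Can you open the paint application"))  # e.g. 'open paint application'
```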

4.4 Task or Command Classification After generating clean text, the main job of the assistant is to identify what task the user expects [12]. For this purpose, we created a text classification model that classifies the cleaned text given to it into a matching task. Initially, a dataset is created with the possible text and the corresponding command or task. We used the Naive Bayes classifier algorithm as it gave the best accuracy of 97.6% compared with five other classification algorithms: logistic regression (95.34%), KNN (90.69%), support vector machine (76.74%), decision tree (72.5%), and random forest classifier (90.69%), as can be observed in Fig. 2. As observed,


Fig. 2 Algorithm accuracy comparison

logistic regression and random forest classifier also gave good accuracy. The assistant should identify the correct task or command requested by the user; here, a classification algorithm is used to map the text to a command or task. The cleaned text is given to the model, which classifies it into the best matching command or task. After the task or command is identified, it is executed.
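A minimal sketch of this classification step with scikit-learn follows; the vectorizer choice and the use of the hypothetical commands.csv file from the earlier sketch are assumptions.

```python
# Sketch: training a Naive Bayes command classifier and comparing it with another model.
# Uses the hypothetical commands.csv from the dataset sketch; the vectorizer choice is assumed.
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

data = pd.read_csv("commands.csv")                      # columns: command, task
X_train, X_test, y_train, y_test = train_test_split(
    data["command"], data["task"], test_size=0.2, random_state=42)

for name, clf in [("Naive Bayes", MultinomialNB()),
                  ("Logistic regression", LogisticRegression(max_iter=1000))]:
    model = make_pipeline(CountVectorizer(), clf)       # bag-of-words features + classifier
    model.fit(X_train, y_train)
    print(name, "accuracy:", model.score(X_test, y_test))

# The chosen model can then map a new cleaned command to a task number:
nb = make_pipeline(CountVectorizer(), MultinomialNB()).fit(data["command"], data["task"])
print(nb.predict(["open paint"]))
```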

4.5 Execution of Tasks The task can be asking a question, commanding the assistant to open an application, playing a particular song or video, searching the web for something such as the meaning of a word, or performing other basic functions such as opening the camera, scanning an image to extract and copy text from it, closing or opening an application, file operations, browsing, jokes, facts, news, weather information, Google searches, and so on. The input here is the identified task from the task identification module. For example, if the user encounters an image or anything on the screen that contains textual content which cannot be copied but which the user wants to copy, then with this assistant the user can accomplish that task without any effort, just by saying what they want. In this example, the flow is directed to the image processing module. In this module, the image is processed and cleaned using the OpenCV library: the noise and unwanted data in the image are removed. Then, to extract the text from the processed image, a Python OCR tool called Pytesseract (Python-tesseract) is used [13]. This OCR tool extracts the text from the cleaned image, and the extracted text is copied to the clipboard, or the assistant can read it out if required. Figure 3 shows text displayed in Notepad that was extracted from an image displayed on the screen; this task was triggered when the user asked the assistant to copy the text from the displayed image.
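A small sketch of this image-to-clipboard step is shown below, assuming OpenCV, pytesseract, and the pyperclip package for clipboard access; the screenshot source and the thresholding settings are assumptions.

```python
# Sketch: grab the screen, clean it with OpenCV, extract text with pytesseract,
# and copy the result to the clipboard. Screenshot source and preprocessing are assumed.
import cv2
import numpy as np
import pytesseract
import pyperclip
from PIL import ImageGrab

screenshot = ImageGrab.grab()                                   # capture the current screen
image = cv2.cvtColor(np.array(screenshot), cv2.COLOR_RGB2GRAY)  # convert to grayscale
image = cv2.threshold(image, 0, 255,
                      cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]   # remove noise via Otsu threshold

text = pytesseract.image_to_string(image)                       # OCR on the cleaned image
pyperclip.copy(text)                                            # "text copied to clipboard"
print(text)
```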


Fig. 3 Extracted text from an image

After the completion of the given task, the flow is directed back to the main module, where the assistant acknowledges that the task is completed and is ready to take another command; here the assistant says, "text copied to clipboard". In this way, the assigned task is completed with ease just by saying what the user wants.

5 Results The assistant was built and tested module-wise. The assistant was able to recognize the user's voice and identify the task almost all the time, but for one or two tasks with similar commands it could not identify the exact task. For example, if the user just says "Google", the assistant classifies it as "Google Search", while the user might be expecting "Google Chrome". Overall, the assistant performed well. We used a speech recognition library for recognizing the voice and converting it to text. Then, we used NLP for cleaning the text and extracting the root words. After that, the cleaned text is classified into the best matching task by a classification model, built using the Naïve Bayes classifier as it gave the best accuracy (97.6%) compared with the other five classifiers. Finally, the identified task is executed. The existing system (i.e., Cortana) does not offer as many features as this assistant and, more importantly, existing systems perform tasks only when exact commands are given, otherwise they just return web results, whereas our assistant identifies a task even when asked differently. For example, when the user wants to open the paint application, a few ways the user can command the assistant are "open paint", "I want to draw something", "can you open the painting application", and


“I want to sketch”. Our assistant identifies all these different commands as requests to open the Paint application. Hence, there is no need to ask questions in a rigid and particular manner; the user only needs to be familiar with the basic rules of English grammar and with the assistant. Software or any application that mainly depends on user interaction needs a user-friendly interface, so we built an interface for our assistant, as shown in Fig. 4. An assistant cannot be listening all the time, because if it does, it starts processing everything it hears and might execute tasks the user did not ask for. Hence, the assistant needs to be triggered when the user wishes to ask it something. For our assistant, we came up with the trigger word “cricket”. The user can trigger the assistant by saying the trigger word, even in variations such as “hey cricket” or “hello cricket”; the user can also trigger the assistant by pressing the trigger button shown in Fig. 4. In case the microphone in the device has problems, we have also provided a text entry field where the user can type any command. Overall, the assistant was able to listen to the user when required, process the request, and provide the correct result or execute the correct task as the user expected almost all of the time.

Fig. 4 Interface for the assistant


The assistant can be triggered in three ways: by saying the trigger word “cricket” (which can be changed), by pressing the listen button, or by typing in the entry field shown in Fig. 4. When one of these is triggered, the other two are disabled. The results and responses are displayed on the screen. The assistant was tested in all possible ways to give a user-friendly experience, and good results were observed.

6 Conclusion

To conclude, we have utilized speech recognition, NLP, a classification algorithm, and other Python tools to build a user-friendly voice assistant. In this paper, we have implemented the basic functionality of a voice assistant along with many features or tasks that can be performed with the help of voice commands, such as copying text from an image by voice, and it can execute custom commands that are not present in other assistants. The main part of the project was to identify the task when the user asks something. For this part, we have made use of the Naïve Bayes classifier, which classifies user commands into the best-matching task. We also tried other classification algorithms, but the Naïve Bayes algorithm gave the best accuracy among them. Other systems just show web results if the user does not give the exact predefined command, so we tried to build an assistant that is flexible and can identify and perform tasks even when asked differently. The idea is to give users a simple and quick way to get answers or results for their commands. It has the potential to make daily human activities simpler, faster, and less time-consuming. We have therefore tried to build an assistant that gives the user a friendly experience through an interface and responds with maximum accuracy. In the future, the assistant can be made more intelligent by making it frame sentences or responses on its own while interacting with the user, by considering the user’s emotion or sentiment, intent, and expectations. The assistant could also analyze the user’s work pattern while using the system and offer help accordingly. It can further be made more efficient by improving its response speed while interacting with the user. With all this work, a more user-friendly and efficient assistant can be built so that the user can finish their work effortlessly and with ease.

References 1. Lalit Kumar (2020) Desktop voice assistant using natural language processing (NLP). Int J Modern Trends Sci Technol (IJMTST), 6(12):332–335 2. Cui F, Cui Q, Song Y (2021) A survey on learning-based approaches for modeling and classification of human-machine dialog systems. IEEE Trans Neural Networks Learn Syst 32(4):1418–1432 3. Subhash S, Srivatsa PN, Siddesh S, Ullas A, Santhosh B (2020) Artificial intelligence-based voice assistant. In: 2020 fourth world conference on smart trends in systems, security and sustainability (WorldS4). IEEE, pp 593–596


4. Janssen A, Passlick J, Cardona DR, Breitner MH (2020) Virtual assistance in any context: a taxonomy of design elements for domain-specific chatbots. Bus Inf Syst Eng 62:211–225 5. Saibaba CMH, Waris SF, Raju SH, Sarma V, Jadala VC, Prasad C (2021) Intelligent voice assistant by using OpenCV approach. In: 2021 second international conference on electronics and sustainable communication systems (ICESC), pp. 1586–1593 6. Kim T-K (2020) Short research on voice control system based on artificial intelligence assistant. In: 2020 international conference on electronics, information, and communication (ICEIC), pp 1–2 7. Jyothi CR, Beracah P, Gowtham M (2021) Voice assistant in accessing real world applications. J Eng Sci (JES) 12(7):333–335 8. Feng Q, He D, Liu Z, Wang H, Choo KR (2020) SecureNLP: a system for multi-party privacypreserving natural language processing. IEEE Trans Inf Forensics Secur 15:3709–3721 9. Motta I, Quaresma M (2021) Understanding task differences to leverage the usability and adoption of voice assistants (VAs). In: Design, user experience, and usability: design for contemporary technological environments. Springer International Publishing, https://doi.org/10.1007/ 978-3-030-78227-6_35 10. Valentini-Botinhao C, Yamagishi J (2018) Speech enhancement of noisy and reverberant speech for text-to-speech. IEEE/ACM Trans Audio, Speech, Lang Proc 26(8):1420–1433 11. Verlekar SP, Naik S, Bisht V, Roy P, Honnavar A (2021) SVAP the windows personal handyman assistant. Int Res J Eng Technol (IRJET) 08(05):3614–3617 12. Shin H, Paek J (2018) Automatic task classification via support vector machine and crowdsourcing. Mobile Inf Syst 2018:6920679, 9 13. Dhanawade A, Drode A, Johnson G, Rao A, Upadhya S (2020) Open CV based information extraction from cheques. In: 2020 fourth international conference on computing methodologies and communication (ICCMC), pp 93–97

Cryptocurrency Trading Bot with Sentimental Analysis and Backtracking Using Predictive ML Abhishek Srinivas Murthy, A. Akshay, and Bhagyashri R. Hanji

Abstract Algorithmic trading is a process of converting a trading strategy into computer code which buys and sells the shares or performs trades in an automated, fast, and accurate way. Sentiment analysis is a powerful social media tool that enables us to understand its users. It is an important factor because emotions and attitudes toward a topic can become actionable pieces of information useful in understanding market trends, saving time and effort by the means of automation. Bringing together the art of sentimental analysis of social media and backtracking of historical price data, the paper, coupled with state-of-the-art APIs from leading crypto exchanges, is set to predict the best options and place trade orders, taking into account variables set by the user such as STOP LOSS and risk profiles. Combining the historical data from Binance APIs, coupled with sentimental analysis of Twitter tweets, our work aims at delivering highly accurate trade orders and executing them in real time without any human intervention. Keywords Algo trading · Application programing interface (API) · Long-short term memory (LSTM) · Naïve Bayes · Natural language processing (NLP) · RNN

1 Introduction

With the ever-growing trend of cryptocurrencies and decentralized blockchain architecture, algo-trading has revolutionized how markets function, and crypto markets are no exception. A blockchain is an open, distributed ledger that records transactions in code. In practice, it is like a checkbook that is distributed across countless computers around the world. Blockchain is a technology that has risen greatly over the years and has many applications in various sectors of computer science. One of the many applications that make use of blockchain is cryptocurrency exchanges.

A. Srinivas Murthy · A. Akshay · B. R. Hanji (B) Computer Science and Engineering, Global Academy of Technology, Bangalore, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_37


Cryptocurrency is a digital form of money that functions based on a technique called cryptography; this technique transforms information into codes that cannot be broken into. These digitized currencies are managed and watched over by a distributed ledger called the blockchain. There are a variety of cryptocurrencies in circulation whose security and transaction details are maintained by these blockchains. With blockchain being a trusted methodology, the software is used as a log of the transactions that are made on it. For every transaction that is made, every copy of the blockchain is updated simultaneously with the new information, keeping all records accurate. Fraud is very unlikely with the use of blockchain technology, as each transaction that is made is checked against other users’ records for validation. A cryptocurrency is a digital medium of exchange. Unlike many other currencies that are printed and whose value is controlled and overseen by a centralized organization (the RBI, for example, looks after money matters regarding the Indian rupee), cryptocurrencies are decentralized and are managed by a large number of users over the internet. The applications of cryptocurrency are vast but limited in terms of the services that accept it, as it is still a growing trend; it can nevertheless be used to buy regular goods and services from merchants who accept these currencies. Most investors see crypto as an asset, similar to stocks and valuable metals such as gold. It is advisable to know the risk involved in the market, along with a fair share of research, to fully understand the working of the system. The availability of crypto is immense, and it can be purchased in secure ways such as peer-to-peer networks and on many popular and well-established crypto exchanges such as Binance and Coinbase. Payment using cryptocurrency depends on what one is trying to buy; one will need a digital wallet or cryptocurrency wallet, which is a software program that interacts with the blockchain and allows users to send and receive cryptocurrency. The popularity of cryptocurrency comes from the trading of coins on exchanges or peer-to-peer, which makes the crypto market highly volatile and fast-paced, making algorithmic trading ideal.

2 Related Work

In the study done in [1], “Karunya Rathan, Somarouthu Venkat Sai, Tubati Sai Manikanta. Crypto-Currency price prediction using Decision Tree and Regression Technique”, the authors first identify the trends that happen on a given day and how these trends influence the Bitcoin price. A dataset up to the current date, with open, high, low, and close price details of various coins, is considered, with Bitcoin taken as the ideal choice. Different machine learning algorithms are applied to the dataset to predict the price of Bitcoin, with the aim of finding the algorithm that gives the best accuracy. The experiment is evaluated using decision tree and regression model methods, and the accuracy is compared for both. The experimental results show that the linear regression method outperforms the decision tree. The work is suited as a reference since it considers multiple methods to predict values; however, it does not take market sentiment or real-time data into consideration. In paper [2], the authors, “Mohammad J. Hamayel, Amani YousefOwda. A Novel Cryptocurrency Price Prediction Model Using GRU, LSTM and bi-LSTM Machine Learning Algorithms”, make use of three types of recurrent neural network (RNN) algorithms, namely GRU, LSTM, and bi-LSTM, to predict three cryptocurrencies: Bitcoin (BTC), Litecoin (LTC), and Ethereum (ETH). The work done using these models shows excellent predictions, though it depends mainly on recent data. In paper [3], from the authors, “Connor Lamon, Eric Nielsen, Eric Redondo. Cryptocurrency Price Prediction Using News and Social Media Sentiment”, the authors analyse the ability of news and social media data to predict price changes for cryptocurrencies. The news and social media data are collected on a daily basis and labelled based on the actual price change one day in the future for each coin, rather than the usual sentiment-analysis choice of positive or negative sentiment. As opposed to other methods, market sentiment is taken into consideration here, but it is the only factor used for the prediction of prices, while the backtracking data of the coins in play is ignored. In paper [4], the authors, “Medha Mathur, Satyam Mhadalekar, Sahil Mhatre, Vanita Mane. Algorithmic Trading Bot”, aim to create a trading bot built around characterized sets of trading guidelines. These guidelines, passed on to a given program, are reliant on factors such as timing, value, amount, or any mathematical model. Aside from the profitable openings and low losses that are available to traders through orthodox methods, the paper aims to prove that algo-trading renders the market more liquid and makes trading more precise by accounting for the factor of human emotions in trading. The work makes use of the moving average methodology to predict prices and reflects them in a live market simulation. Although backtracking is taken into consideration, market sentiment is ignored.

3 Literature Review Summary

In short, after reviewing the above-mentioned papers, it was found that real-time market sentiment analysis combined with backtracking of historical data to predict market prices is absent; automated bot trading using complex AI/ML models instead of conventional stop loss and moving averages is absent; only select, well-established coins are analyzed; and automatic prediction and execution of trades is absent (Table 1).

504

A. Srinivas Murthy et al.

Table 1 Summary of literature survey

Title of the paper: “Crypto-currency price prediction using decision tree and regression techniques (2019)”
Advantages: • Implemented more than one machine learning algorithm to predict the value • The experiment observes that linear regression outperforms decision tree
Disadvantages: • Market sentiment is not taken into consideration • Real-time data is not considered

Title of the paper: “A novel cryptocurrency price prediction model using GRU, LSTM and bi-LSTM machine learning algorithms (2021)”
Advantages: • New non-conventional algorithms are explored • Backtracking is taken into consideration
Disadvantages: • LSTM and bi-LSTM consider only recent data • Market sentiment is not taken into consideration

Title of the paper: “Cryptocurrency price prediction using news and social media sentiment (2017)”
Advantages: • Market sentiment is taken into consideration
Disadvantages: • Limited coins are evaluated • Backtracking of data is ignored

Title of the paper: “Algorithmic Trading Bot (2021)”
Advantages: • Backtracking is taken into consideration • Automated trading process • Live market simulation
Disadvantages: • Market sentiment is not taken into consideration • Price prediction is done by moving average

4 Methodology

The objective of the work is to scrape recent social media tweets and determine the polarity of the posts as positive, negative, or neutral; to query historical data using APIs; and to predict prices using backtracking machine learning algorithms. The aim is to combine both the sentiment-analysis and backtracking scores to determine the top crypto coins to trade, by strategically placing buy orders and performing sell operations if the price crosses the stop-loss threshold on the crypto exchange. The overall design and use-case scenarios are shown in Figs. 1 and 2, respectively.

A. Sentimental Analysis: Sentimental analysis is responsible for computing the weights based on the sentiment of the market. It first fetches the top-performing coins from the Binance API, then analyses the sentiment of each coin by fetching recent tweets relevant to the coin. Once the sentiment score is computed, it assigns weights to the list of coins. The method makes use of the Naïve Bayes algorithm to perform the NLP operation on the gathered tweets.

B. Binance Coin Performance Analyzer: Binance exchange APIs are used to fetch the recent top trending coins along with their historical data; the API sends the list of the top 10 coins to the above-mentioned sentimental analysis. Once the weights are received, the historical scores are computed.


Fig. 1 System architecture


Fig. 2 Use case diagram

The final weighted list is then sent to the trading module to perform trade operations on the exchange platform. The methodology makes use of the long short-term memory algorithm to perform historical analysis of the trending coins that are received from the Binance API.

C. Binance Trading Module: The Binance trading module is responsible for collecting the weighted list of coins based on the performance analyzer and the sentiment analyzer, and for placing orders on the exchange using the Binance trading APIs. The placed orders are monitored for success rate and logged into a file.

D. Naïve Bayes Algorithm: Naïve Bayes is a supervised machine learning algorithm that is popularly used in text classification and other classification-based problems. The algorithm is based on Bayes’ theorem and is used to determine the probability of a given hypothesis. It is often used in NLP tasks such as sentiment analysis.

E. Long Short-Term Memory (LSTM): Each connection contains a weight that can be changed. Input nodes, output nodes, and hidden nodes are all examples of nodes, each having a function of their own. Recurrent neural networks (RNNs) use feedback as a type of memory to handle sequential data; as a result, the model’s previous inputs leave an imprint. LSTM takes this concept a step further by incorporating both short- and long-term memory components. As a result, the LSTM is an excellent tool for anything

Cryptocurrency Trading Bot with Sentimental Analysis …

507

involving a sequence. The LSTM model can be used for a variety of tasks, including time-series forecasting. Furthermore, the inclusion of gates prevents the gradient from vanishing or exploding.

F. Cumulative Returns: A very important statistic to consider while trading is the cumulative return: the overall change in the invested price over a given period of time, regardless of how long that period is. The cumulative return of assets that do not pay interest can easily be figured out by calculating the profit or loss made over the original price.

Figure 1 depicts the system architecture. The program is divided into three primary modules. The first module performs sentiment analysis, where the Twitter API is used to fetch tweets and the analysis is performed on the fetched tweets. The second module is the Binance backtracking module, which is connected to a timer that triggers the running of the program; the backtracking module fetches the latest historical data and determines the top-performing coins. The third module is the trading module, which takes the results from both the sentiment-analysis module and the backtracking module to place trades on the exchange.
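As an illustration of the backtracking step described above, the following is a minimal Keras sketch of an LSTM that learns to predict the next hourly return from a short window of past returns; the window length, layer sizes, and training setup are illustrative assumptions, not the exact configuration used here.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

WINDOW = 24  # assumed look-back of 24 hourly returns

def make_windows(returns):
    """Turn a 1-D series of returns into (samples, WINDOW, 1) inputs and next-step targets."""
    x, y = [], []
    for i in range(len(returns) - WINDOW):
        x.append(returns[i:i + WINDOW])
        y.append(returns[i + WINDOW])
    return np.array(x)[..., np.newaxis], np.array(y)

# Hypothetical hourly returns; in practice these come from Binance kline data.
returns = np.random.normal(0, 0.01, size=500)
x_train, y_train = make_windows(returns)

model = Sequential([
    LSTM(32, input_shape=(WINDOW, 1)),   # sequence model over the look-back window
    Dense(1),                            # predicted next-hour return
])
model.compile(optimizer="adam", loss="mse")
model.fit(x_train, y_train, epochs=5, batch_size=32, verbose=0)

next_return = model.predict(x_train[-1:], verbose=0)[0, 0]
print("predicted next-hour return:", next_return)
```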

5 Results

This section describes the screens of the “Cryptocurrency trading bot with Sentiment Analysis and Backtracking using predictive ML”. The snapshots are shown below for each module. Figure 3 depicts the prediction of the top-performing coins, obtained by considering the sentiment analysis of tweets and the real-time coin performance data fetched from Binance using the Binance API as on June 14, 2022.

Fig. 3 List of top-performing coins

Figure 4 shows a plot of the sentiment analysis performed on the tweets for the top-performing coins fed into the system by the Binance Coin Performance Analyzer module. The x-axis depicts the polarity of the tweets, with a range of −1 to +1, and the y-axis depicts the subjectivity of the tweets, ranging from 0 to 1 (Fig. 5).
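A minimal sketch of how such polarity and subjectivity scores can be computed per tweet is shown below, using TextBlob's built-in sentiment scores as a stand-in for the Naïve Bayes sentiment step described in the methodology; the example tweets are hypothetical.

```python
from textblob import TextBlob

# Hypothetical tweets mentioning a coin; in the real system these come from the Twitter API.
tweets = [
    "WAVES is breaking out, looking very bullish today!",
    "Not convinced by this pump, feels like it will dump soon.",
]

for tweet in tweets:
    sentiment = TextBlob(tweet).sentiment
    # polarity in [-1, +1], subjectivity in [0, 1], matching the axes of the plot above
    print(f"polarity={sentiment.polarity:+.2f}  subjectivity={sentiment.subjectivity:.2f}  {tweet}")

# A simple per-coin weight could be the mean polarity across its recent tweets.
mean_polarity = sum(TextBlob(t).sentiment.polarity for t in tweets) / len(tweets)
print("mean polarity used as sentiment weight:", round(mean_polarity, 3))
```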

Fig. 4 Sentiment analysis plot

Fig. 5 Predicted graph of WAVEUSDT


The above figure shows the coin prices for WAVEUSDT as captured on June 14th, 2022. The line in red represents the historical data of the last 24 h, and the line in green is the prediction of cumulative returns. It is observed from the sharp spike in the green line that the LSTM algorithm predicts this coin to be suitable for trading with reasonable returns (Fig. 6). The figure shows the coin prices for AKROUSDT as captured on June 14th, 2022. The line in red represents the historical data of the last 24 h, and the line in green is the prediction of cumulative returns. It is observed from the sharp dip in the green line that the LSTM algorithm predicts that this coin is moving sideways in the market and that the next hour shows poor performance (Fig. 7).

Fig. 6 Predicted graph of AKROUSDT

Fig. 7 Predicted graph of EPXUDT


Fig. 8 Predicted graph of INJUSDT

The figure shows the coin prices for EXPUSDT as captured on June 14th, 2022. The line in red represents the historical data of the last 24 h, and the line in green is the prediction of cumulative returns. It is observed that, even with positive predicted cumulative returns for the next hour, this coin does not seem to break the resistance line (Fig. 8). The figure shows the coin prices for INJUSDT as captured on June 14th, 2022. The line in red represents the historical data of the last 24 h, and the line in green is the prediction of cumulative returns. It is observed from the sharp dip in the green line that the LSTM algorithm predicts that this coin has a high probability of loss in the market and that the next hour shows poor performance (Fig. 9). The image shows the logs of trades that have been performed on the predicted coins, with BUY/SELL order details and any errors that may occur during runtime. Each order describes fields such as order ID, client ID, transaction time, and order quantity.

Fig. 9 Telegram Bot displaying trade logs

6 Conclusion

On running the script for a few hours, the bot showed marginal positive profits. Coin dust, trade commissions, and API usage fees have not been accounted for, resulting in slower cumulative returns. Sentiment analysis of Twitter usually shows negative sentiment toward the top-performing coins, hinting that upcoming and well-performing altcoins do not always correlate with market sentiment. The model can be improved by taking more coins into account and using a weighted distribution of funds; however, that would require higher working capital. It was established through our study that algo-trading is indeed promising; however, the decision-making engine is a machine learning black box, and it is very hard to justify the reasons behind the decisions taken by the program. Combining sentiment analysis with backtracking of data shows only marginal returns when implemented in simple form. Complex feature considerations such as automatic stop-loss calculation, sentiment recalculation, and accounting for API delays may increase the performance and, in turn, the returns. There is massive scope for future enhancements. Performance-optimizing methods such as multi-threading, parallel processing, and sub-routine optimization can be implemented. Taking into consideration the API cost and overheads, API call timeouts, soft limits, and better step calculation will improve the system. Deploying the code using a cron job in place of a timed delay in the execution of the looping function will also improve its performance.


A web dashboard coupled with cloud deployment and a graphical representation of metrics with a user interface to directly withdraw and deposit funds will add to the user experience.

References 1. Karunya Rathan, Somarouthu Venkat Sai, Tubati Sai Manikanta. Crypto-currency price prediction using decision tree and regression technique. Proceedings of the third international conference on trends in electronics and informatics (ICOEI 2019) IEEE Xplore part number: CFP19J32-ART, ISBN: 978-1-5386-943 2. Hamayel MJ, Owda AY (2021) A novel cryptocurrency price prediction model using GRU, LSTM and bi-LSTM machine learning algorithms. Algorithms. AI 2021 2:477–496 3. Lamon C, Nielsen E, Redondo E (2017) Cryptocurrency price prediction using news and social media sentiment 4. Mathur M, Mhadalekar S, Mhatre S, Mane V. Algorithmic trading bot. ITM web conference volume 40, 2021 international conference on automation, computing and communication 2021 (ICACC-2021)

Abhishek Srinivas Murthy is currently pursuing a Bachelor of Engineering in Computer Science and Engineering, Global Academy of Technology, Bangalore, India. They have published papers in national and international conferences. Their area of interest lies in Network Security, Cryptography, Machine Learning, Big Data, Automation, and Deep Learning. A. Akshay is currently pursuing a Bachelor of Engineering in Computer Science and Engineering, Global Academy of Technology, Bangalore, India. They have published papers in national and international conferences. Their area of interest lies in Network Security, Cryptography, Machine Learning, Big Data, Automation, and Deep Learning. Bhagyashri R. Hanji is currently working as Professor and Head, Computer Science and Engineering, Global Academy of Technology, Bengaluru. She has published several papers in international/national conferences and journals. She has served in various capacities as reviewer, Technical Program Committee member, Advisory Board member, and editor for various journals and conferences. Her areas of interest lie in Machine Learning, Deep Learning, Cryptography, and Information and Network Security.

Efficient Pseudo-Random Number Generator Using Number-Theoretic Transform Anupama Arjun Pandit , Atul Kumar , and Arun Mishra

Abstract In the finite field of integers, the number-theoretic transform (NTT) is a specific variant of the discrete Fourier transform (DFT). NTT is an essential method that permits efficient computation; therefore, NTT could be used in the construction of a lattice-based pseudo-random number generator. To construct a lattice-based pseudo-random number generator (PRNG), multiplication of high-degree polynomials is required, and NTT could be used for fast multiplication as it performs point-wise multiplication. In this paper, we examine the algorithmic properties and performance of NTT for fast multiplication by implementing a pseudo-random number generator using NTT. Keywords Number-theoretic transform (NTT) · Pseudo-random number generator (PRNG) · Discrete Fourier transform (DFT) · Chinese remainder theorem (CRT) · Lattice-based cryptography

1 Introduction

To generate pseudo-random numbers in the era of cutting-edge technology, an efficient pseudo-random number generator [1] is necessary. These numbers can be employed by encryption techniques [2], digital signature methods [3], and as a nonce to guarantee that various runs of the same protocol are distinct [4]. In the near future, a lattice-based [5] pseudo-random number generator will become necessary, since it is resistant to quantum attacks. At present, no quantum algorithm exists that can break lattice-based algorithms; hence, these generators are secure against quantum attacks [6]. Lattice-based algorithms involve multiplication of high-degree polynomials, and school-method multiplication is not efficient as it takes O(n²) time [7]. To mitigate this inefficiency, the number-theoretic transform (NTT) [8] could be used. To eliminate arithmetic operations with large numbers, the Chinese remainder theorem (CRT) [9] is applied. CRT turns a series of coefficients into numerous series of residues, each with a distinct modulus, when a polynomial is treated as a series of coefficients. Each series is the same length as the polynomial's degree, and the number of series is the same as the number of primes required to uniquely represent a huge integer. NTT is a specific variant of the discrete Fourier transform (DFT) [11] in the finite field of integers. When converting a series of complex numbers of length n, DFT employs the powers of the nth root of unity as twiddle factors (e^{−2πi/n}). NTT, on the other hand, performs modular arithmetic operations in an integer space, using the powers of the nth root of unity modulo a prime number as twiddle factors (γ_2n^i). NTT's goal is to multiply two polynomials so that the coefficients of the resulting polynomial may be computed with respect to a specific modulus. NTT has the advantage of having no precision errors, because all computations are done in integers. NTT has a significant limitation in that it can only be performed with a prime modulus of the form 2a · b + 1, where a and b are arbitrary constants; therefore, for an arbitrary modulus, CRT would be used. In the present work, we compare the performance of a pseudo-random number generator with and without making use of the number-theoretic transform.

Organisation of the paper: this paper is organized as follows. Sect. 2 presents preliminaries which address the fundamentals of the pseudo-random number generator, the Chinese remainder theorem, the discrete Fourier transform, the twiddle factor, and lattice-based cryptography. In Sect. 3, we outline our proposed methodology to generate an efficient pseudo-random number generator. In Sect. 4, we go through the implementation details of the pseudo-random number generator. We present the results of our implementation in Sect. 5. Finally, Sect. 6 presents our conclusion and future scope.

A. A. Pandit · A. Kumar · A. Mishra (B) Computer Science and Engineering, Defence Institute of Advanced Technology, Pune, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_38

2 Preliminaries

2.1 Pseudo-Random Number Generator

As Fig. 1 shows, a pseudo-random number generator (PRNG) [1] takes a seed from an entity such as a process or user, performs numerous mathematical operations on the seed, and generates a random number that may be used as the initial value/seed for the succeeding iteration. As a PRNG contains a finite number of states, it is almost certain that its output will repeat at some point in the future; therefore, it is also known as a pseudo-RNG or deterministic RNG. The cryptographic applications where PRNGs are employed are as follows: (i) key exchange protocols, (ii) digital signature algorithms, and (iii) nonces to make sure that every execution of the same protocol is distinct.


Fig. 1 Pseudo-random number generator (initial value/seed → deterministic algorithm (DA) → pseudo-random bitstream)

2.2 Chinese Remainder Theorem

Suppose q_1, q_2, ..., q_k are positive integers which are pairwise relatively prime and greater than one, and suppose z_1, z_2, ..., z_k are any integers. Then there exists an integer solution z such that z ≡ z_j (mod q_j) for each j = 1, 2, 3, ..., k [10]:

z ≡ z_1 (mod q_1)
z ≡ z_2 (mod q_2)
z ≡ z_3 (mod q_3)
...
z ≡ z_k (mod q_k)

The ability of the Chinese remainder theorem (CRT) to build a residue number system is one of its most appealing features. That is, operations on the “small” values z_j are equivalent to operations on the “large” value z. The CRT, for example, can be used to optimize computations modulo N = p · q in the RSA algorithm: all computations can be performed on the smaller z_j values instead of the large z values, specifically by setting z_1 ≡ z (mod p) and z_2 ≡ z (mod q).
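To make the reconstruction direction concrete, below is a minimal sketch of CRT recombination for pairwise-coprime moduli; it is a generic illustration rather than the paper's implementation.

```python
from math import prod

def crt(residues, moduli):
    """Recover z (mod prod(moduli)) from z mod q_j for pairwise-coprime moduli q_j."""
    q = prod(moduli)
    z = 0
    for z_j, q_j in zip(residues, moduli):
        m_j = q // q_j                       # product of the other moduli
        z += z_j * m_j * pow(m_j, -1, q_j)   # pow(..., -1, q_j) is the modular inverse
    return z % q

# Example: z ≡ 2 (mod 3), z ≡ 3 (mod 5), z ≡ 2 (mod 7)  ->  z = 23
print(crt([2, 3, 2], [3, 5, 7]))
```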

2.3 Discrete Fourier Transform

Any signal can be decomposed into a sum of complex sine and cosine waves by using the Fourier transform. The discrete Fourier transform (DFT) [11] is a function that enables switching from one domain to another, and the fast Fourier transform (FFT) is an algorithm for computing the DFT in an efficient manner. Assume the input polynomial is Z(x) = z_0 + z_1 x + ··· + z_{n−1} x^{n−1}; the goal is to evaluate Z at all nth roots of unity. A divide-and-conquer strategy is employed. To begin, the polynomial Z is decomposed into Z_even and Z_odd, where Z_even(x) = z_0 + z_2 x + ··· and Z_odd(x) = z_1 + z_3 x + ···. Then Z(x) = Z_even(x²) + x · Z_odd(x²). Z_even and Z_odd can be evaluated recursively at all the (n/2)-th roots of unity to evaluate the polynomial Z. The (n/2)-th roots of unity are 1, ζ, ζ², ..., ζ^{n/2−1}, where ζ = ω². The steps of FFT(Z, n) are as follows:

(i) Compute Z_even and Z_odd.
(ii) Compute FFT(Z_even, n/2) and FFT(Z_odd, n/2).
(iii) For 0 ≤ k < n/2: Z(ω^k) = Z_even(ω^{2k}) + ω^k · Z_odd(ω^{2k}) = Z_even(ζ^k) + ω^k · Z_odd(ζ^k).
(iv) For n/2 ≤ k < n: Z(ω^k) = Z_even(ω^{2k}) + ω^k · Z_odd(ω^{2k}) = Z_even(ζ^{k−n/2}) + ω^k · Z_odd(ζ^{k−n/2}).

The evaluations Z(1), Z(ω), ..., Z(ω^{n−1}) are related to the coefficients of the unique polynomial Z by the matrix equation

[Z(1), Z(ω), ..., Z(ω^{n−1})]^T = M · [z_0, z_1, ..., z_{n−1}]^T,

where M is the n × n Fourier matrix with entries M_{jk} = ω^{jk} (j, k = 0, 1, ..., n−1); its first row and first column are all ones and its last row is 1, ω^{n−1}, ω^{2(n−1)}, ..., ω^{(n−1)(n−1)}. The inverse of M is required to compute the coefficients z from the evaluations; it has entries

(M^{−1})_{jk} = (1/n) · ω^{−jk}.

DFT has a time complexity of O(n²), whereas FFT has a complexity of O(n log n). The most popular fast Fourier transform (FFT) algorithm is the Cooley–Tukey algorithm [12]. It recursively defines the DFT of a composite size N = N_1 · N_2 in terms of N_1 smaller DFTs of size N_2 to reduce the time complexity for highly composite N to O(N log N). Another FFT algorithm is the Gentleman–Sande [13] algorithm.
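The recursion in steps (i)–(iv) can be written down directly; the short sketch below is a generic radix-2 illustration (n must be a power of two) rather than the authors' code.

```python
import cmath

def fft(z):
    """Recursive radix-2 FFT: evaluate the polynomial with coefficients z
    at the n-th roots of unity (steps (i)-(iv) above); n must be a power of 2."""
    n = len(z)
    if n == 1:
        return z[:]                      # a degree-0 polynomial is its own evaluation
    even = fft(z[0::2])                  # Z_even evaluated at the (n/2)-th roots of unity
    odd = fft(z[1::2])                   # Z_odd  evaluated at the (n/2)-th roots of unity
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)        # twiddle factor omega^k
        out[k] = even[k] + w * odd[k]                # 0 <= k < n/2
        out[k + n // 2] = even[k] - w * odd[k]       # n/2 <= k < n, since omega^{k+n/2} = -omega^k
    return out

print(fft([1, 2, 3, 4]))
```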

where the n × n matrix is signified as the Fourier matrix M. The inverse of the matrix M would be required to compute Z from the evaluation. The following is the inverse of the matrix M. ⎤ ⎡ 1 1 1 1 ⎢ 1 ω−1 ω−2 ω−(n−1) ⎥ ⎥ ⎢ ⎥ ⎢ 1⎢ · · · ... ⎥ −1 M = ⎢ ⎥ ⎥ · · · n⎢ · ⎥ ⎢ ⎦ ⎣· · · · −(n−1) −2(n−1) −(n−1)(n−1) ω ω 1ω   DFT has an order of time complexity of O n 2 , whereas FFT has an order of O(n log n). The most popular Fast Fourier Transform (FFT) algorithm is the Cooley−T ukey algorithm [12]. It recursively defines the DFT of a random composite size N = N1 · N2 in terms of N1 smaller DFTs of sizes N2 to reduce the time complexity for highly composite N to O(N log N ). Another FFT algorithm is the Gentleman-Sande [13] algorithm.

2.4 Twiddle Factor Any of the trigonometric constant coefficients multiplied by the data in the Fast Fourier Transform (FFT) process is referred to as a twiddle factor [14]. Twiddle factors are expressed mathematically as: ωnN = e−i2πn/N = cos(2π n/N ) −

Efficient Pseudo-Random Number Generator Using Number-Theoretic …

517

isin(2π n/N ), where n = 0, 1, 2 . . . , N − 1. The twiddle factor is a rotating vector quantity that rotates in steps of N samples. The twiddle factor is used in DFT and Inverse DFT (IDFT) computations to minimize computational complexity. Because the twiddle factor has cyclic features and is periodic, the twiddle factor’s values repeat every N − cycles.

2.5 Lattice-Based Cryptography The security of lattice-based cryptographic systems is derived from gauged contumacy of lattice problems like shortest vector problem (SVP) [15]. A lattice [16] is a collection of a uniformly spaced grid of points in n-dimensional space. In other words, lattice is a set of vectors generated by n-linearly independent vectors b1 , b2 , · · · , bn ∈ R n .

L(b1 , b2 , · · · , bn ) =

n

xi bi : xi ∈ Z

i=1

Here b1 , b2 , · · · , bn are basis vectors of the lattice (Fig. 2). In lattice-based cryptography, vector is another decorative name for a point and a tuple of numbers are called the coordinates of the vector, whereas a basis is a linearly independent collection of vectors that can be used to generate any point in the uniformly spaced grid of the lattice. Short bases are more practical at the time of solving hard lattice problems such as shortest vector problem (SVP) [15], closest vector problem (CVP) [17], and shortest independent vector problem (SIVP) [18]. • Shortest Vector Problem: Determine the shortest vector u closest to the origin for a lattice L. In other words, find out a vector u for a given basis B = b1 , b2 , · · · , bn ∈ Zn×n which satisfies the following condition: u = λ1 (L(B)) where u ∈ L(B)\{0}

Fig. 2 Lattice representation by two possible bases


• Closest Vector Problem: Determine the lattice vector u closest to a given target point t that is not a lattice point. In other words, for a given vector t ∈ Z^n and a given basis B = b_1, b_2, ..., b_n ∈ Z^{n×n}, find a vector u ∈ L(B) which satisfies ‖u − t‖ = dist(L(B), t).
• Shortest Independent Vectors Problem (SIVP): Determine a set of linearly independent, short lattice vectors u_1, u_2, ..., u_n. In other words, for a given basis B = b_1, b_2, ..., b_n ∈ Z^{n×n}, find linearly independent vectors u_1, u_2, ..., u_n ∈ L(B) which satisfy ‖u_i‖ ≤ λ_n for i ∈ [n].

Lattice-based cryptography [19] is assumed to be resistant against quantum attacks, as it uses very complex, high-dimensional geometric structures. The security of a lattice-based cryptosystem is derived from the hardness of lattice problems. Learning with errors (LWE) [20] and learning with rounding (LWR) [21] are the most advanced lattice-based cryptosystems; a variant of LWE is ring learning with errors (RLWE). LWE could be used for the construction of a non-deterministic pseudo-random number generator, while LWR could be used for the construction of a deterministic pseudo-random number generator.

• Learning with Errors (LWE): The LWE [20] problem is to find a secret vector s by solving the noisy linear equations given below. If the error vector e = e_1, e_2, ..., e_m were not introduced into the following system of linear equations, then it could be rewritten as s = A^{−1}b and easily solved using Gaussian elimination, where A = (a_11, ..., a_mn) is a matrix, s = s_1, s_2, ..., s_n is the secret, b = b_1, b_2, ..., b_m, and q is a prime modulus. Since errors are added to the equations and the number of equations is large, it is not possible to solve them in polynomial time; this reflects the hardness of the problem. Since no quantum algorithm exists which could solve these equations in polynomial time, the LWE algorithm is quantum-safe [21].

a_11 · s_1 + a_12 · s_2 + ··· + a_1n · s_n + e_1 = b_1 mod q
a_21 · s_1 + a_22 · s_2 + ··· + a_2n · s_n + e_2 = b_2 mod q
...
a_m1 · s_1 + a_m2 · s_2 + ··· + a_mn · s_n + e_m = b_m mod q

The steps of the LWE algorithm are as follows:

i. Key Generation: public key = {A, b = s · A + e}, secret key = s
ii. Encryption: Enc(public key, bit) = {ciphertext preamble u = A · x, ciphertext u′ = b · x + bit · ⌊q/2⌋}, where x ∈ {0, 1}^n
iii. Decryption: Dec(secret key, u, u′) = 0 if u′ − s · u mod q ∈ (−q/4, q/4], and 1 if u′ − s · u mod q ∈ (q/4, 3q/4]
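The following toy numpy sketch mirrors the key-generation/encryption/decryption steps above for a single bit; the parameters (n, m, q and the error range) are small illustrative assumptions and are far too weak for real security.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, q = 8, 32, 257          # toy LWE parameters (assumed; not secure)

# Key generation: public key (A, b = s*A + e mod q), secret key s
s = rng.integers(0, q, size=n)
A = rng.integers(0, q, size=(n, m))
e = rng.integers(-1, 2, size=m)           # small error terms in {-1, 0, 1}
b = (s @ A + e) % q

def encrypt(bit):
    x = rng.integers(0, 2, size=m)        # random 0/1 selection vector
    u = (A @ x) % q                       # ciphertext preamble
    u2 = (b @ x + bit * (q // 2)) % q     # ciphertext body
    return u, u2

def decrypt(u, u2):
    d = (u2 - s @ u) % q                  # equals e.x + bit*floor(q/2) mod q
    return 0 if (d < q // 4 or d > 3 * q // 4) else 1

for bit in (0, 1):
    u, u2 = encrypt(bit)
    print(bit, "->", decrypt(u, u2))      # both bits should decrypt correctly
```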

3 Proposed Methodology to Generate Efficient Pseudo-Random Number Generator

To construct a series of random numbers, a PRNG requires a cryptographically secure initial value or seed. If it is not secure, an attacker may reproduce the whole random number sequence, since the output of the PRNG is determined only by the seed. It is not required for the seed to be unique, but if it is reused, there is a risk of attack. The following are two important aspects of the seed:
• The seed should be protected as cryptographic content (e.g. a key).
• A cryptographically secure source should be used to generate the seed.
A random number generator might be used to distribute the RNG seeds for each operation to eliminate duplicate seeds. The learning with rounding (LWR) [21] method could be used for this. Its implementation is also highly efficient, as it is built on lattice-based cryptography, which is effective in the post-quantum era and comes with very solid security proofs based on worst-case hardness. It is also considered quantum-resistant, because no attack on lattice-based cryptography has been discovered so far.

– Learning with Rounding (LWR): The derandomized version of LWE is called LWR. In LWR, a slight error in A · s ∈ Z_q may be used to mask its true value, which is then scaled by p/q for some p < q. If q > p and b is a deterministic rounded version of A · s, the LWR problem is hard. As a result, in this case, b = ⌊A · s⌉_p = ⌊(p/q) · A · s⌉; for the implementation, we usually take the floor of this value. The result is thereby reduced from mod q to mod p. Other operations, such as encryption and decryption, stay unchanged. LWR is at least as hard as LWE for the relevant parameters, and a worst-case guarantee is provided for LWR [21]. The advantage of LWR is that it eliminates the heavy error-sampling process, requiring fewer random bits.
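A minimal sketch of this deterministic rounding step is shown below; the moduli p and q and the dimension are illustrative assumptions.

```python
import numpy as np

q, p, n = 3329, 1024, 8            # assumed moduli and dimension for illustration
rng = np.random.default_rng(1)

A = rng.integers(0, q, size=(n, n))
s = rng.integers(0, q, size=n)

# LWR replaces the sampled error of LWE by deterministic rounding:
# b = floor((p/q) * (A @ s mod q)) mod p
b = ((A @ s) % q) * p // q
print(b)   # entries are now in Z_p, derived from A and s without any error sampling
```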


Because of its deterministic nature, LWR-based cryptosystems can be employed in symmetric cryptography. For our PRNG, we employ the LWR one-way function to construct a cipher LWR vector that cannot be reversed without the secret information. As LWR uses large multiplications such as A · s in the key generation step, NTT can be used to speed up the multiplication.

Suppose x̄_i = x_i · γ_2N^i, ȳ_i = y_i · γ_2N^i and z̄_i = z_i · γ_2N^i, where γ_2N = √(ω_N). To compute z = x · y over Z_q[x]/⟨x^N + 1⟩, the negative wrapped convolution (NWC) technique [22] is performed as z̄ = INTT_N(NTT_N(x̄) ⊙ NTT_N(ȳ)), where the symbol ⊙ denotes point-wise multiplication. If q ≡ 1 (mod 2N) is satisfied by the prime q, then ω_N and its square root γ_2N exist, as required by the NWC technique. This approach, however, has a "scramble" that includes pre-processing before NTT and post-processing after inverse NTT (INTT), as discussed below. The scaled vectors x̄ and ȳ are subjected to the classic N-point NTT; the scaled vector z̄ is produced after the classic N-point INTT, and the final result z may be retrieved by computing z_i = z̄_i · γ_2N^{−i}. The NWC technique [22] eliminates the explicit reduction and doubling of NTT/INTT, but it does require the coefficients to be scaled with γ_2N^i before NTT and the results to be scaled with γ_2N^{−i} after INTT.

The NTT algorithm without pre-processing [23] is as follows. Suppose the vectors v and V denote (v_0, v_1, ..., v_{N−1}) and (V_0, V_1, ..., V_{N−1}), respectively, where v_i, V_i ∈ Z_q, and suppose ω_N is a primitive Nth root of unity in Z_q with γ_2N = √(ω_N).
Input: v, N, q, γ_2N^i for i = 0, 1, 2, ..., N − 1. Output: V = NTT(v).
(i) V ← scramble(v)
(ii) for l = 1 to log2 N do
(iii)   r ← 2^l
(iv)   for m = 0 to r/2 − 1 do
(v)     ω = γ_2N^{(2m+1)N/r}
(vi)    for n = 0 to N/r − 1 do
(vii)      f = V_{nr+m}
(viii)     g = ω · V_{nr+m+r/2} mod q
(ix)       V_{nr+m} = (f + g) mod q
(x)        V_{nr+m+r/2} = (f − g) mod q
(xi)    end for
(xii)  end for
(xiii) end for
(xiv) return V

The inverse NTT (INTT) algorithm without post-processing [23] is as follows. Suppose the vectors v and V denote (v_0, v_1, ..., v_{N−1}) and (V_0, V_1, ..., V_{N−1}), respectively, where v_i, V_i ∈ Z_q, and suppose ω_N is a primitive Nth root of unity in Z_q with γ_2N = √(ω_N).
Input: v, N, q, γ_2N^{−i} for i = 0, 1, 2, ..., N − 1. Output: V = INTT(v).
(i) for l = 1 to log2 N do
(ii)   r ← 2^l
(iii)  for m = 0 to r/2 − 1 do
(iv)    ω = γ_2N^{−(2m+1)N/r}
(v)     for n = 0 to N/r − 1 do
(vi)      f = V_{nr+m}
(vii)     g = V_{nr+m+r/2} mod q
(viii)    V_{nr+m} = (f + g)/2 mod q
(ix)      V_{nr+m+r/2} = ((f − g)/2) · ω mod q
(x)     end for
(xi)   end for
(xii) end for
(xiii) V ← scramble(v)
(xiv) return V

The pre-processing denotes the coefficient-wise multiplication of v_i by γ_2N^i before NTT. The post-processing denotes the final scaling by N^{−1} in the classic INTT and the coefficient-wise multiplication by γ_2N^{−i} after the classic INTT.
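To illustrate the overall flow (scale by γ, forward NTT, point-wise multiply, inverse NTT, unscale by γ⁻¹), below is a compact Python sketch of negacyclic polynomial multiplication; it follows a standard recursive formulation rather than the exact scramble-based routine listed above, and the parameters N, q, and γ are illustrative.

```python
def ntt(a, omega, q):
    """Evaluate polynomial a at the powers of omega modulo q (recursive radix-2 NTT)."""
    n = len(a)
    if n == 1:
        return a[:]
    even = ntt(a[0::2], omega * omega % q, q)
    odd = ntt(a[1::2], omega * omega % q, q)
    out, w = [0] * n, 1
    for k in range(n // 2):
        t = w * odd[k] % q
        out[k] = (even[k] + t) % q
        out[k + n // 2] = (even[k] - t) % q     # omega^{n/2} = -1 mod q
        w = w * omega % q
    return out

def intt(A, omega, q):
    """Inverse NTT: forward NTT with omega^{-1}, then scale by n^{-1} mod q."""
    n = len(A)
    a = ntt(A, pow(omega, -1, q), q)
    inv_n = pow(n, -1, q)
    return [x * inv_n % q for x in a]

def negacyclic_mul(x, y, q, gamma):
    """Multiply x and y in Z_q[t]/(t^N + 1) via the NWC trick, gamma being a 2N-th root of unity."""
    omega = gamma * gamma % q
    xs = [xi * pow(gamma, i, q) % q for i, xi in enumerate(x)]   # pre-scale by gamma^i
    ys = [yi * pow(gamma, i, q) % q for i, yi in enumerate(y)]
    zs = [a * b % q for a, b in zip(ntt(xs, omega, q), ntt(ys, omega, q))]  # point-wise product
    z = intt(zs, omega, q)
    return [zi * pow(gamma, -i, q) % q for i, zi in enumerate(z)]           # post-scale by gamma^{-i}

# Tiny example: N = 4, q = 17 (q ≡ 1 mod 2N), gamma = 2 is a primitive 8th root of unity mod 17.
# Expected output [14, 3, 5, 7] = (1 + 2t + 3t^2 + 4t^3)(1 + t) mod (t^4 + 1) over Z_17.
print(negacyclic_mul([1, 2, 3, 4], [1, 1, 0, 0], 17, 2))
```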

4 Implementation

The flow chart of the pseudo-random number generator is shown in Fig. 3. The key generation step of the learning with rounding (LWR) algorithm takes a random 128-bit initial value and generates the public key. The encryption step of LWR then uses the public key and encrypts the input bit to produce a cipher vector of length 256. The binary form of the generated cipher vector is used as a secure seed by a linear feedback shift register (LFSR) [24], which then generates the pseudo-random number stream. We have implemented the PRNG in the 'C' language using the Visual Studio 2019 IDE. As can be seen in the key generation step of LWE, we multiply the matrix A and the secret vector s, i.e. s · A. Similarly, in the encryption step we perform a matrix–vector multiplication A · x and a vector–vector multiplication b · x. Since the construction of the PRNG requires multiplication of high-degree polynomials, and we use polynomials of degree 256, we have used NTT for fast multiplication.
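The seeding-plus-LFSR stage can be sketched as follows; the register width and tap positions are illustrative assumptions (the paper's C implementation and its chosen feedback polynomial are not reproduced here).

```python
def lfsr_stream(seed_bits, taps=(0, 1, 21, 31), nbits=64):
    """Fibonacci LFSR: shift out one bit per step, feeding back the XOR of the tapped bits.

    seed_bits: list of 0/1 values (e.g. the binary form of the LWR cipher vector).
    taps:      illustrative tap positions for the feedback polynomial.
    """
    state = list(seed_bits)
    out = []
    for _ in range(nbits):
        feedback = 0
        for t in taps:
            feedback ^= state[t]
        out.append(state[0])            # output the bit being shifted out
        state = state[1:] + [feedback]  # shift and append the feedback bit
    return out

# Hypothetical 32-bit seed; the real seed derived from the cipher vector is 256 bits long.
seed = [int(b) for b in format(0xA5A5C3C3, "032b")]
print("".join(str(b) for b in lfsr_stream(seed)))
```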


Fig. 3 Flow chart (initial value (128 bit) → key generation using LWR → public key → encryption using LWR → cipher vector → linear feedback shift register (LFSR) → random number stream)

5 Result

We keep splitting the given two polynomials until only degree-0 polynomials remain. After completing the transformation, we may perform all operations on the polynomials "point-wise", which has O(n) complexity. We recursed along a tree with log2 n "layers", each of which performs n multiplications with some twiddle factor to combine the polynomials. As a result, the transformation has O(n log n) complexity. Figure 4 shows the execution time of polynomial multiplication for degrees 4, 8, 12, and 16, respectively. We can easily observe that polynomial multiplication using NTT is quite fast compared to polynomial multiplication using the school method. Therefore, NTT can play an important role in the construction of an efficient pseudo-random number generator.

Fig. 4 Execution time of polynomial multiplication with variant degrees

6 Conclusion and Future Scope

In this paper, we have discussed a quantum-safe pseudo-random number generator. We have found that no algorithm exists, classical or quantum, that can break the lattice-based learning with rounding scheme; therefore, it is quantum-safe. We also discussed that NTT can be used for fast multiplication, as NTT performs the multiplication in O(n log n) time while the school method performs the same multiplication in O(n²) time. Hence, we proposed that NTT be used in the construction of an efficient pseudo-random number generator. Furthermore, in the future, multiplication using NTT could be enhanced by identifying concurrent processes for parallel execution.

References 1. Cang S, Kang Z, Wang Z (2021) Pseudo-random number generator based on a generalized conservative Sprott—a system. Nonlinear Dyn 104(1):827–844 2. Rivest RL, Shamir A, Adleman L (1978) A method for obtaining digital signatures and publickey cryptosystems. Commun ACM 21(2):120–126 3. Kuppuswamy P, Appa PM, Al-Khalidi DSQ (2012) A new efficient digital signature scheme algorithm based on block cipher. IOSR J Comput Eng (IOSRJCE) 7(1):47–52 4. Køien GM (2015) A brief survey of nonces and nonce usage. In: SECURWARE international conference on emerging security information, systems and technologies 5. Chi DP, Choi JW, San Kim J, Kim T (2015) Lattice based cryptography for beginners. Cryptology ePrint archive 6. Regev O (2010) The learning with errors problem. Invited Surv CCC 7(30):11 7. Ilter MB, Murat CENK (2017) Efficient big integer multiplication in cryptography. Int J Inf Secur Sci 6(4):70–78 8. Creutzburg R, Tasche M (1986) Number-theoretic transforms of prescribed length. Math Comput 47(176):693–701 9. Piazza N (2018) The Chinese remainder theorem 10. Alhassan EA, Tian K, Abban OJ, Ohiemi IE, Adjabui M, Armah G, Agyemang S (2021) On some algebraic properties of the Chinese remainder theorem with applications to real life 11. Sandeep S (2020) Fast integer multiplication, pp 1–3


12. Cooley JW, Tukey JW (1965) An algorithm for the machine calculation of complex Fourier series. Math Comput 19(90):297–301 13. Schupp S (2003) Lifting a butterfly—a component-based FFT. Sci Program 11(4):291–307 14. Yamini Gayathri T (2017) An efficient multi precision floating point complex multiplier unit in FFT. Int J Eng Res V6(06). https://doi.org/10.17577/ijertv6is060395 15. Peikert C (2013) SVP, gram-schmidt. LLL 1:1–5 16. Micciancio D, Regev O (2009) Lattice-based cryptography. Post Quantum Crypt 015848:147– 191. https://doi.org/10.1007/978-3-540-88702-75 17. Micciancio D (2003) University of California, SD. Closest vector problem, p 3. https://doi.org/ 10.1007/978-1-4615-0897-7 18. Micciancio D (2008) Efficient reductions among lattice problems. In: Proceedings of the annual ACM-SIAM symposium on discrete algorithms, pp 84–93 19. Chi DP, Choi JW, Kim JS, Kim T (2015) Lattice based cryptography for beginners. EPrint 20. Regev O (2009) On lattices, learning with errors, random linear codes, and cryptography. J ACM 56(6):1–37. https://doi.org/10.1145/1568318.1568324 21. Banerjee A, Peikert C, Rosen A (April 2012) Pseudorandom functions and lattices. In: Annual international conference on the theory and applications of cryptographic techniques. Springer, Berlin, Heidelberg, pp 719–737 22. Chen DD, Mentens N, Vercauteren F, Roy SS, Cheung RC, Pao D, Verbauwhede I (2014) Highspeed polynomial multiplication architecture for ring-LWE an SHE cryptosystems. IEEE Trans Circuits Syst I:Regul Pap 62(1):157–166 23. Zhang N, Yang B, Chen C, Yin S, Wei S, Liu L (2020) Highly efficient architecture of NewHopeNIST on FPGA using low-complexity NTT/INTT. IACR Trans Cryptographic Hardware Embed Syst 49–72 24. Hassan S, Bokhari MU (2019) Design of pseudo random number generator using linear feedback shift register. Int J Eng Adv Technol

Medical Images Analysis for Segmentation and Classification Using DNN Abolfazl Mehbodniya, Satheesh Narayanasami, Julian L. Webber, Amarendra Kothalanka, Sudhakar Sengan, Rajasekar Rangasamy, and D. Stalin David

Abstract The healthcare business is unique in comparison to other fields. It is a high-priority area, and people have high expectations about the quality of treatment and services regardless of the cost. Despite accounting for a significant portion of the total cost, it has not met the public's expectations. Most of the time, medical experts are responsible for providing their interpretations of medical data. The interpretation of an image by a human expert is rather limited because of its subjectivity, the complexity of the image, the wide variance that occurs between different interpreters, and the fact that interpreters may get tired. Deep Learning (DL) has succeeded in many other real-world applications, and it now provides exciting results for Medical Imaging (MI) with a high degree of accuracy; it is being hailed as one of the essential methods for the future development of health-related applications. We discuss the most recent developments in DL architectures and how to optimise them so that they may be utilised for the segmentation and classification of MI. In the last part, we discuss the difficulties associated with using DL-based approaches for MI and the open research questions. Keywords Medical image analysis · Convolution neural network · Health care · Image segmentation

A. Mehbodniya · J. L. Webber Department of Electronics and Communication Engineering, Kuwait College of Science and Technology (KCST), Doha, Kuwait e-mail: [email protected] J. L. Webber e-mail: [email protected] S. Narayanasami Department of Computer Science and Engineering, St. Martin's Engineering College, Secunderabad, Telangana 500100, India e-mail: [email protected] A. Kothalanka Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Guntur, Vaddeswaram, Andhra Pradesh 522302, India e-mail: [email protected] S. Sengan (B) Department of Computer Science and Engineering, PSN College of Engineering and Technology, Tirunelveli, Tamil Nadu 627152, India e-mail: [email protected] R. Rangasamy Department of Computer Science and Engineering, GITAM School of Technology, GITAM University, Bengaluru, Karnataka 561203, India e-mail: [email protected] D. Stalin David Department of Information Technology, Vel Tech Multi Tech Dr. Rangarajan Dr. Sakunthala Engineering College, Chennai, Tamil Nadu 600072, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_39

1 Introduction

While healthcare data was once scarce, it has now become relatively massive (moving toward big data) due to the significant progress in image acquisition technologies, which makes image processing challenging. This rapid expansion of medical pictures and modalities necessitates enormous and time-consuming efforts by medical experts, which are subjective, prone to human error, and may differ significantly between experts. ML techniques could be used to automate the diagnosis process, but most ML techniques cannot handle very complicated problems [1]. The successful marriage of high-speed computing with the potential of ML gives the ability to cope with vast amounts of medical image data and produce detailed and well-organized findings [2]. For example, DL will assist in the selection and extraction of features and the construction of new ones. Furthermore, it will not only diagnose the disease but also assess the analytical targets and provide actionable prediction models to assist physicians efficiently. In the last few years, ML and artificial intelligence (AI) have advanced rapidly. The processing of MIs, CAD, MI interpretation, fusion, and registration have benefited from ML and AI techniques. Image retrieval and analysis, image-guided therapy, and ML techniques gather information from images and represent it effectively and efficiently. Doctors can use ML and AI to diagnose and predict illness risks more accurately and quickly, allowing them to prevent diseases before they occur. These strategies improve doctors' and researchers' abilities to understand how to assess the genetic changes that lead to disease. SVM, NN, KNN, and other non-DL algorithms are used in these techniques, as well as DL algorithms such as CNN, RNN, LSTM, ELM, and GAN [3]. The earlier algorithms require time to fine-tune features, which relies on domain experts, whereas the newer algorithms can process images in their raw form: data is fed into them so they can learn and improve independently. In order to train these algorithms to extract various features, depictions, and relevant data from images, many images demonstrating the positive behaviour of the data are used. Even though automated disease detection using traditional methods has been around for decades, new developments in ML have led to a resurgence of DL [4].


The rest of the paper is organized as follows: Sect. 1 is the introduction to medical image segmentation; Sect. 2 reviews related work; Sect. 3 presents the proposed model for medical image segmentation using AI; Sect. 4 presents the results and discussion; and Sect. 5 gives the conclusion and future work of this research.

2 Literature Review

Medical image segmentation (MIS) has widely adopted deep convolutional neural networks (DCNNs) over the past decade [5]. Since inductive biases are inherent in convolutional architectures, these methodologies cannot account for long-range dependencies within an image. Recently proposed Transformer-based architectures, with their self-attention mechanism, encode long-range dependencies and learn highly expressive representations; applying Transformer-based network models to MIS tasks is therefore worth investigating. However, most Transformer-based network models for computer vision applications require large-scale datasets for training, and MI datasets have fewer samples than CV datasets, making it challenging to train transformers for healthcare. The Gated Axial-Attention model addresses this by adding control mechanisms to the self-attention module, extending the existing architectures [6]. Disease diagnosis and treatment planning also rely on medical image segmentation [7]. The U-Net architecture has become the de facto standard for many MIS tasks and has been enormously successful. However, since convolutional operations are inherently local, U-Net has difficulty accurately modelling long-range dependencies. Transformers, which have emerged as alternative models with inherent global self-attention mechanisms, can suffer from limited localization ability due to a lack of low-level detail. TransUNet, which combines the strengths of Transformers and U-Net, has been proposed as a strong alternative for MIS: the Transformer encodes tokenized image patches from a CNN feature map as the input sequence for extracting global contexts. In the field of CV, DL has grown tremendously in popularity, and CNNs have revolutionized image segmentation, particularly for MI. U-Net is the dominant approach to the MIS task in this regard; generally speaking, U-Net is excellent at segmenting multimodal MI, as well as some particularly challenging cases [8]. In addition, it has been discovered that the classical U-Net architecture has several limitations. Researchers developed a CNN model with a residual module to replace the encoder and decoder's skip connection, and built DC-UNet on top of this more effective CNN model as a possible successor to the U-Net structure. When this model was tested on three datasets with challenging cases, the performance improved by 2.922, 1.51, and 12.56% [9–11].


3 Proposed Methodology Many image diagnosis tasks require an initial search for abnormalities and the quantification of measurements and changes over time. ML-based automated image analysis tools are critical enablers for improving the quality of MI diagnosis and interpretation (Fig. 1); they make it possible to obtain findings more quickly and efficiently. DL is one of the most widely used techniques and provides the highest levels of precision in the field, ushering in a new era of Medical Image Analysis (MIA) [9]. AI solutions in healthcare range from cancer screening to disease scanning to personalised, treatment-specific suggestions. Today, physicians have access to many data sources, including X-rays, CT and MRI scans, pathology images, and genomic sequences; however, the tools necessary to turn this data into information are still lacking. The following paragraphs describe the most recent DL applications in MIA [12, 13].

3.1 Data Collection Dataset size and quality are the two most essential factors in the classification accuracy of DL classifiers. The lack of readily available datasets in MI has been one of the most significant obstacles to DL's success. Building sizeable MI datasets is difficult because annotating the data requires a great deal of time from medical experts and, to avoid human error, multiple expert opinions. For this work, the dataset was collected from different diagnostic centres and hospital pathology departments [14].

3.2 Data Pre-Processing For ML, we need data, and lots of it: the more we have, the better our model. ML algorithms are data-hungry, but there is a catch: they need data in a specific format. In

Fig. 1 Proposed framework


the real world, several terabytes of data are generated by multiple sources, yet none of it can be put to immediate use. Data can be found as audio, video, images, text, charts, and logs. For ML algorithms to produce valuable results, the data must first be cleaned and formatted. Data pre-processing is the process of cleaning raw data so that it can be used for ML tasks; it is the first step of an ML project and generally the most time-consuming phase. In the database sense, it also includes methodically decomposing tables to eliminate data repetition and undesirable characteristics such as Insertion, Update, and Deletion Anomalies (IUDA), with the primary goal of reducing the amount of redundant information in related tables [15]. For the images themselves, we normalise each image by 255: because pixel values range from 0 to 255, dividing by 255 maps them into the 0–1 range. For the labels, an array of integers or strings representing the values that categorical (discrete) features take on is the input to a one-hot encoder (also known as "one-of-K" or "dummy" encoding). Each category is represented by a binary column in the sparse matrix or dense array returned by this transformer (depending on the sparse parameter). By default, the encoder derives the categories from the feature's unique values, although they can also be defined manually. This encoding is required for categorical data to be used with many scikit-learn estimators, such as linear models and SVMs with the standard kernels [16].
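As a minimal sketch of the two pre-processing steps described above (pixel normalisation to the 0–1 range and one-hot encoding of categorical labels), the snippet below uses NumPy and scikit-learn. The array names and dummy data are placeholders introduced here for illustration, not part of the original work.

```python
# Minimal pre-processing sketch (variable names and dummy data are placeholders).
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# images: uint8 array of shape (n_samples, height, width, channels), pixel range 0-255
images = np.random.randint(0, 256, size=(4, 64, 64, 3), dtype=np.uint8)
labels = np.array([["infected"], ["uninfected"], ["infected"], ["uninfected"]])

# Normalise pixels into the 0-1 range by dividing by 255
images = images.astype("float32") / 255.0

# One-hot ("one-of-K") encode the categorical class labels
encoder = OneHotEncoder()
labels_onehot = encoder.fit_transform(labels).toarray()
```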

3.3 Model Architecture The proposed architecture was built using a CNN and applied to various disease datasets, including malaria, retinopathy, BT, and Breast Cancer (BC) [17]. The proposed model has one input layer, seven Hidden Layers (HL), and one output layer. The input layer takes the shape of the image, which ranges from 50 to 224 pixels per side for the different datasets. The first HL consists of a Convolution Layer (CL) with 32 filters, Kernel Size (KS) 3 × 3, and Activation Function (AF) 'ReLU', followed by a 2 × 2 Maxpool Layer (ML), batch normalisation, and dropout. The second HL consists of a CL with 64 filters, KS 3 × 3, AF 'ReLU', followed by a 2 × 2 ML, batch normalisation, and dropout. The third HL consists of a CL with 128 filters, KS 3 × 3, AF 'ReLU', followed by a 2 × 2 ML, batch normalisation, and dropout. The fourth HL consists of a CL with 256 filters, KS 3 × 3, AF 'ReLU', followed by a 2 × 2 ML. Because this fourth CL produces a 4-D output while a Dense Layer (DL) requires a 2-D input, a flatten layer is placed between the CL and the DL to convert the 4-D output to 2-D. The fifth HL is a DL with 256 units, the sixth HL is a DL with 128 units, and the seventh HL is a DL with 64 units followed by a dropout. Lastly, the output layer comprises two units with a sigmoid function that predict whether a sample is infected or not [18].
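For concreteness, a Keras reconstruction of the described layer stack is sketched below. The input size (here 64 × 64 × 3) and the dropout rate are assumptions, since the text only specifies filter counts, kernel sizes, pooling, and dense-layer widths; this is an illustrative sketch, not the authors' exact code.

```python
# Illustrative Keras sketch of the described CNN (input size and dropout rate are assumed).
from tensorflow.keras import layers, models

def build_model(input_shape=(64, 64, 3), dropout_rate=0.25):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        # Hidden layers 1-3: Conv -> MaxPool -> BatchNorm -> Dropout
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.BatchNormalization(),
        layers.Dropout(dropout_rate),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.BatchNormalization(),
        layers.Dropout(dropout_rate),
        layers.Conv2D(128, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.BatchNormalization(),
        layers.Dropout(dropout_rate),
        # Hidden layer 4: Conv -> MaxPool, then flatten the 4-D maps to 2-D
        layers.Conv2D(256, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        # Hidden layers 5-7: dense layers of 256, 128, and 64 units, then dropout
        layers.Dense(256, activation="relu"),
        layers.Dense(128, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Dropout(dropout_rate),
        # Output: two units with sigmoid, predicting infected vs. uninfected
        layers.Dense(2, activation="sigmoid"),
    ])
    return model
```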


Fig. 2 Proposed architecture

Fig. 3 Model testing

3.4 Model Training The model was trained on various datasets covering diseases such as malaria, retinopathy, Brain Tumor (BT), and BC (Figs. 2 and 3). The model was trained with the following parameters:
• Number of epochs: 20
• Batch size: 128
• Validation split: 0.3
• Optimizer: 'Adam'
• Loss: binary cross-entropy
• Metrics: accuracy.
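The listed hyper-parameters map directly onto Keras compile and fit calls, as sketched below. The dummy arrays only stand in for the real image datasets, and `build_model` refers to the illustrative architecture sketched in Sect. 3.3.

```python
# Training configuration sketch using the listed hyper-parameters (dummy data as placeholders).
import numpy as np
from tensorflow.keras.utils import to_categorical

x_train = np.random.rand(256, 64, 64, 3).astype("float32")          # placeholder images
y_train = to_categorical(np.random.randint(0, 2, size=256), 2)      # placeholder one-hot labels

model = build_model()
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
history = model.fit(x_train, y_train,
                    epochs=20,
                    batch_size=128,
                    validation_split=0.3)
```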

3.5 Model Testing The test image is loaded and converted into a NumPy array. The input image is resized according to the input requirements of the model on which it was trained. Because the CNN model requires 4-D data while the input image is 3-D, the image is reshaped to 4-D. Once reshaped, the image is normalised by 255 and converted to float32, and it is then sent to the model for prediction. The model identifies the features associated with infected or uninfected cases, and finally the probability of the predicted label is obtained.
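The testing steps above (resize, add a batch dimension to obtain 4-D data, scale by 255, cast to float32, and predict) can be sketched as follows. The file name and target size are placeholders, and `model` is the trained network from the previous subsections.

```python
# Test-time prediction sketch (file name and target size are placeholders).
import numpy as np
from tensorflow.keras.preprocessing import image

img = image.load_img("test_image.png", target_size=(64, 64))  # resize to the model's input size
x = image.img_to_array(img)              # 3-D array (height, width, channels)
x = np.expand_dims(x, axis=0)            # reshape 3-D -> 4-D (batch of one)
x = x.astype("float32") / 255.0          # normalise by 255 and cast to float32
probs = model.predict(x)                 # class probabilities for infected / uninfected
predicted_class = int(np.argmax(probs, axis=1)[0])
```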


4 Results and Discussion The proposed segmentation technique described in this section is tested on medical images of various skin lesions. Images are used for both training and testing in the proposed method; only the testing images are used to calculate the classification accuracy, because the classifiers are trained solely on the training images. Three metrics are used to evaluate the proposed method for finding cancer in medical images: sensitivity, specificity, and accuracy, computed from the true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). The model was trained using the two approaches described in Sect. 3. Several metrics have been used to evaluate the model: the AUC-ROC curve, F1 score, confusion matrix (CM), and classification report. As shown in Table 1, the accuracies of the various models vary depending on the number of layers used. On the malaria dataset, the proposed CNN correctly predicted 3999 out of 4157 infected samples and 3905 out of 4210 uninfected samples, giving a training accuracy of 98.23% and a test accuracy of 95.6%. On the pneumonia dataset, the proposed CNN correctly predicted 20,488 out of 22,000 infected samples and 25,810 out of 27,000 uninfected samples, giving a training accuracy of 100% and a test accuracy of 99.5%. On the retinopathy dataset, the proposed CNN correctly predicted 1918 out of 2130 infected samples and 1650 out of 1700 uninfected samples, giving a training accuracy of 98.8% and a test accuracy of 95.6%. On the BC dataset, the proposed CNN correctly predicted 3999 out of 4157 infected samples and 3905 out of 4210 uninfected samples, giving a training accuracy of 86.9% and a test accuracy of 92.2% [19, 20].

Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)
Accuracy = (TN + TP) / (TN + TP + FN + FP)     (15)
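A small helper that computes the three metrics of Eq. (15) from confusion-matrix counts is shown below; the example call uses the malaria test counts reported in the text.

```python
# Sensitivity, specificity, and accuracy from confusion-matrix counts, as in Eq. (15).
def evaluate_counts(tp, tn, fp, fn):
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tn + tp) / (tn + tp + fn + fp)
    return sensitivity, specificity, accuracy

# Example: malaria test split (3999/4157 infected and 3905/4210 uninfected correctly predicted)
print(evaluate_counts(tp=3999, tn=3905, fp=305, fn=158))
```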

4.1 Comparative Analysis The goal of this section is to compare our proposed method to three benchmark DL methods: ResNet50, Xception, and DenseNet121.

Table 1 Accuracy

Architecture             Data Set Disease   Validation (%)   Test (%)   Cross-validation (%)
Proposed CNN             BT                 95.75            95.54      89.05
Proposed CNN             Retinopathy        87.60            87.21      75.65
Proposed CNN + VGG 16    BC                 88.06            88.13      78.82
Proposed CNN             Pneumonia          59.47            59.47      55.73

Table 2 Comparison of DL with the proposed model

Benchmark methods               Sensitivity (%)   Specificity (%)   Accuracy (%)
Xception                        97.91             99.19             99.15
CNN                             95.41             98.48             97.48
ResNet50                        94.53             99.67             98.67
DenseNet121                     91.68             97.57             95.78
Proposed architecture (DCNN)    98.69             99.89             99.87

ResNet50: Consisting of convolutional, maxpool, batch normalisation, activation, and dense layers, this 178-layer DCNN architecture is used for many computer vision tasks. Its design makes it possible to train DCNNs of 150+ layers quickly and easily; before ResNet, DCNN training was difficult because of vanishing gradients. The Add operation creates a skip connection by adding an earlier layer's output to a later layer's input. Xception: In addition to 36 convolutional layers for extracting features, Xception ("Extreme Inception") also includes pooling layers such as maxpool and global average pooling. These layers are organised into 14 modules containing 3 × 3 convolutional filters with strides of two. Batch normalisation follows the convolutional layers, and the ReLU AF is applied throughout the model to introduce nonlinearity in each layer's output. At the end of the architecture, after a global average pooling layer, there is a dense network and LR. DenseNet121: DenseNet121 differs from other architectures because it is built from dense blocks. A dense block comprises layers that concatenate the features of the previous layers. Dense blocks are separated by transition layers consisting of a 1 × 1 convolutional filter and a 2 × 2 average pooling layer with a stride of 2. Zero padding, convolution, batch normalisation, and activation layers make up each dense block, which concludes with a concatenation layer combining the outputs of the layers that came before it. The classification head consists of a 0.2 dropout layer followed by a 1-unit DL with sigmoid activation. The results of the proposed method and the comparison methods are shown in Table 2, and Fig. 4 depicts the combined experimental outcomes of the previously used and newly developed approaches.
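All three benchmark backbones are available in `tensorflow.keras.applications`. The sketch below shows one generic way to attach a binary classification head (the 0.2 dropout and 1-unit sigmoid layer mentioned above) to each pretrained backbone; the input size, pooling choice, and the decision to freeze the backbone are assumptions for illustration, not details taken from the paper.

```python
# Benchmark backbones for comparison (input size, pooling, and freezing are assumptions).
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50, Xception, DenseNet121

def build_benchmark(backbone_fn, input_shape=(224, 224, 3)):
    backbone = backbone_fn(include_top=False, weights="imagenet",
                           input_shape=input_shape, pooling="avg")
    backbone.trainable = False  # freeze pretrained ImageNet weights (assumed)
    model = models.Sequential([
        backbone,
        layers.Dropout(0.2),
        layers.Dense(1, activation="sigmoid"),  # binary infected/uninfected output
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

benchmarks = {fn.__name__: build_benchmark(fn)
              for fn in (ResNet50, Xception, DenseNet121)}
```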

5 Conclusion and Future Work DL has become increasingly crucial in automating our daily lives over the last few years, delivering significant improvements over traditional machine learning algorithms. Based on this impressive performance, many researchers believe that DL-based applications will replace humans in daily activities within the next 15 years.


Fig. 4 Results of DL with a proposed model

The adoption of DL in healthcare, and especially in medical imaging, is slow compared to other real-world problem domains, and we have highlighted the open research issues. Many large research organisations are developing DL-based solutions for medical images, and we hope ML will soon replace humans in most medical applications, especially diagnosis. Despite its growth, however, DL should not be considered the only solution. The unavailability of annotated datasets constitutes a significant barrier; this issue can still be resolved if good training data can be accessed without negatively affecting the performance of the DL algorithm. The healthcare applications of big data also remain largely unexplored, and more advanced and powerful DL methods are needed to deal with patient records because of their sensitivity and associated challenges.

References 1. Bir P, Balas VE (2020) A review on medical image analysis with convolutional neural networks. In: IEEE international conference on computing, power and communication technologies, pp 870–876 2. Gao J, Jiang Q, Zhou B, Chen D (2019) Convolutional neural networks for computer-aided detection or diagnosis in medical image analysis: an overview. Math Biosci Eng 16(6):6536 3. Huang J, Wei Y, Liang D (2021) A deep neural network for fusion with medical image pair. In: IEEE 2nd international conference on information technology, big data and artificial intelligence, pp 553–557 4. Jafari M, Auer D, Francis S, Garibaldi J, Chen X (2020) DRU-Net: an efficient deep convolutional neural network for medical image segmentation. In: IEEE 17th international symposium on biomedical imaging, pp 1144–1148 5. Nisa SQ, Ismail AR, Ali MABM, Khan MS (2020) Medical image analysis using deep learning: a review. In: IEEE 7th international conference on engineering technologies and applied sciences, pp 1–3 6. Niu L, Xiong G, Shen Z, Pan Z, Chen S, Dong X (2021) Face image-based automatic diagnosis by deep neural networks. In: IEEE 16th conference on industrial electronics and applications, pp 1352–1357


7. Noothout JMH (2020) Deep learning-based regression and classification for automatic landmark localization in medical images. IEEE Trans Med Imaging 39(12):4011–4022 8. Pinckaers H, Van Ginneken B, Litjens G (2022) Streaming convolutional neural networks for end-to-end learning with multi-megapixel images. IEEE Trans Pattern Anal Mach Intell 44(3):1581–1590 9. Qi Y, Guo Y, Wang Y (2021) Image quality enhancement using a deep neural network for plane wave medical ultrasound imaging. IEEE Trans Ultrason Ferroelectr Freq Control 68(4):926– 934 10. Sahiner AP, Hadjiiski LM, Wang X, Drukker K, Cha KH (2019) Deep learning in medical imaging and radiation therapy. Med Phys 46(1):e1-E36 11. Sengupta S, Basak S, Saikia P, Paul S, Tsalavoutis V, Atiah F (2020) A review of deep learning with special emphasis on architectures applications and recent trends. Knowl-Based Syst 105596 12. Spirito D (2021) Reconstructing undersampled photoacoustic microscopy images using deep learning. IEEE Trans Med Imaging 40(2):562–570 13. Wang W (2020) Medical image classification using deep learning. In: Deep learning in healthcare. Intelligent systems reference library. Springer, Cham, p 171 14. Zhou X (2020) A comprehensive review for breast histopathology image analysis using classical and deep neural networks. IEEE Access 8:90931–90956 15. Sudhakar S, Chenthur Pandian S (2012) Secure packet encryption and key exchange system in mobile ad hoc network. J Comput Sci 8(6):908–912 16. Sudhakar S, Chenthur Pandian S (2016) Hybrid cluster-based geographical routing protocol to mitigate malicious nodes in mobile ad hoc network. Int J Ad Hoc Ubiquitous Comput 21(4):224–236 17. Priyadarshni AU, Sudhakar S (2015) Cluster based certificate revocation by cluster head in mobile ad-hoc network. Int J Appl Eng Res 10(20):16014–16018 18. Sudhakar S, Chenthur Pandian S (2015) Investigation of attribute aided data aggregation over dynamic routing in wireless sensor. J Eng Sci Technol 10(11):1465–1476 19. Sudhakar S, Chenthur Pandian S (2013) Trustworthy position-based routing to mitigate against the malicious attacks to signifies secured data packet using geographic routing protocol in MANET. WSEAS Trans Commun 12(11):584–603 20. Sudhakar S, Chenthur Pandian S (2013) A trust and co-operative nodes with affects of malicious attacks and measure the performance degradation on geographic aided routing in mobile ad hoc network. Life Sci J 10(4s):158–163

Robotic Basket and Intelligent Stacking Supported Automated Hygienic Shopping System Rishiraj Jagdish Tripathi, Aakanksha Tripathi, and Geeta Tripathi

Abstract The recent COVID-19 pandemic has re-emphasized the need for a hygienic shopping system, and by introducing automation such hygiene can be assured with little human intervention. This paper presents an idea for an automated shopping system using intelligent stacking and a Robotic Basket. A display will be designed for the user to view the items available. The user may enter the shopping list into the system, pay the bill, and get a token. This data will be sent to the centralized management system, and a Robotic Basket that is free at that moment will take the command and do the further job. The Robotic Basket will be able to perform certain specific functions: it will move through the stacks of items and collect the products on its own. Once the product list, along with the quantity of each item, is given by the customer, the basket will arrive at the reception area flashing the token number, with no human intervention in collecting the products from the shopping centre. The smart system will also verify that the number of products requested by the customer equals the number of products collected by the basket. When the job is executed, the basket will flash and announce the token number, the customer will receive the products, and the basket's job is done. The basket will contain a pedal-operated sanitizer dispenser. After the customer collects the shopped items, the basket will enter the queue of waiting baskets; at the entrance of the queue, a sanitizer sprayer will sanitize the newly arriving basket. This will be very beneficial for a hands-free shopping experience, especially for elderly people, pregnant women, and specially abled people. Keywords Robotic basket · Intelligent stacking · Hygienic system · Shop management system

R. J. Tripathi (B) University of Hertfordshire, Hatfield, UK e-mail: [email protected] A. Tripathi Maharashtra Institute of Technology, Aurangabad, MS, India G. Tripathi Guru Nanak Institute Technical Campus, Hyderabad, TS, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_41


1 Introduction In countries like India, the population is enormously high, and the situation gets worse when something like the recent COVID-19 pandemic is encountered. Using technological advancements to cater for such situations, preventing problems rather than controlling their after-effects, is the need of the hour. Shopping management systems are available in a variety of flavours. As per the studies, shopping is not only a necessity for acquiring items needed for survival or maintaining the status quo; some psychological studies have also shown that it is one of the activities that produces happiness. Though numerous ways of online shopping have emerged, and the convenience of purchasing anything from anywhere is just a button click away, shopping in person, looking for products and bringing them home oneself, gives a different level of satisfaction. During lockdown, it was realized that a safe and hygienic shopping system is what people are looking for. The design of one such system is presented in this paper; it can provide a hygienic system for shopping with the added functionality of smooth maintenance of the overall system.

2 Related Work Several factors, such as waiting in a queue for trolleys and pushing the trolley manually, have led to work in this area to provide better solutions. Some of the works in this line are robotic trolleys that follow the owner's instructions or that lead or follow the owner, as given in papers [1–3]. However, they do not give any idea of use in automated shopping; they basically help travellers avoid carrying heavy baggage even over small distances. Even though dragging or pushing the trolley is automated, the items on the shopping list still have to be picked up and put in the cart by the buyer. The ideas proposed by the authors in papers [4–9] relate to the use of various types of sensors and to how the trolley can be moved in different directions using remote instructions or the processing of captured signals; again, the buyer has to move along with the trolley to pick up and drop the items in the cart. The authors of paper [10] proposed an idea for better service at airports by having a modified version of the trolleys carrying luggage. Though this is a different domain, it contains a few features of trolley automation, which are not sufficient for use in a shopping system. Further work on automated shopping systems has addressed automated bill generation. In papers [11–13], the authors give an idea where buyers move their trolleys in the shopping mall and, as they select a product, a scanner on the trolley reads the code on the product (RFID in some cases) and adds it to the bill, so the buyers can avoid the long queue at the checkout counter for scanning products and paying the bill. In papers [14, 15], the authors presented the idea of a robotic assistant trolley that can list the important constituents of a product after reading its code and can suggest whether it should be added to the cart, along with


automated billing. In most of these papers, the shopping trolleys are moved around by the buyers, and the products are also picked up by the buyers and placed in the trolleys. In other cases, such as Amazon, Flipkart, and similar applications, the entire shopping system is online: customers buy online, and the products are delivered to the given address, but the feel of going out for shopping is missed.

2.1 Research Gap The online shopping system is best suited to customers who are willing to get products delivered to their doorstep. However, there is plenty of evidence of differences between the product shown while ordering and the actual product delivered; replacement or refund policies are often complicated, so users may find it difficult to be confident in this way of shopping. Moreover, a customer may at some time feel like visiting the shop and buying the desired product in person. This may involve customers visiting shops, picking up the desired items from the shelf, putting them in the trolley and, if they change their mind, placing them back on the shelf, which means that several customers may visit the shop and touch the same product before someone finally buys it. The new norm after COVID-19 is to always be careful about hygiene as a precautionary measure. Another issue is that there should always be someone to check that there are sufficient items on the shelf and, if any item's quantity falls below the threshold value, to place an order for it. Finally, at the end of the day, the billing has to be checked, the shop, trolleys, and other items have to be sanitized as far as possible, and the trolleys have to be arranged in their proper places. A watch must also be kept so that no buyer picks up any unbilled item. If vending-machine-based shopping systems are used, each buyer still has to put money in the vending machine and use the panel to select items and quantities from the given list, which is again a threat to hygiene according to the new norms. So, a solution is required wherein buyers can see the available products along with their details, place the order as a shopping list, pay digitally, get a token, collect the items, check them, and then take them. The trolleys move on their own, collecting items for each customer according to the shopping list fed to them, ensuring that the shopped items are handed over to the authenticated buyer, and getting sanitized automatically while returning to rejoin the trolley queue. In such a system, the buyer will have the feeling of safe shopping without the fear of contracting anything that causes contagious disease.

2.2 Objectives The objectives of the overall implementation are to provide an automated trolley that collects items according to the submitted list, and better management of items


in stacks. The automated stacking will efficiently drop items into the basket as per the received list. An alert will be sent to the registered number when any item on a stack falls below the threshold count; this number may further be used to initiate inventory purchasing. The overall system will ensure that the modules communicate with each other wirelessly. The trolley and the overall shop will have automated spraying to ensure hygiene. Overall, the system will make shopping a better experience for customers and maintenance a better experience for owners.

3 Methodology and Design The idea of the automated shopping system, with its automated basket cart and stacking similar to vending machines, is shown in brief by the flowchart in Fig. 1. The customer has to place the shopping order by looking at the items and their details displayed on a screen, pay the bill, and get a token. An alert with the flashing token number is given at the end, when the basket is ready with the order. There is no need for the customer to pick up the items and place them in the basket; the basket does it automatically. After successful payment, all baskets in the queue will receive the

Fig. 1 Flowchart of overall system of automated cart basket and stacks as in vending machines


list and token number. Each basket will check whether it is the first basket in the queue; if yes, it will move for shopping, and the other baskets will delete the recently received shopping list and token. The basket, while moving automatically, will collect all the listed items. The communication between the automated baskets (104), the stacks (106), and the centralized controlling system (102) is shown in Fig. 2. Figure 3 shows the various modules, a–e, of the overall control system. The side view of the basket trolley is shown in Fig. 4 and its top view, with the various dashboard functions, in Fig. 5. The dashboard unit on the basket will display the token number and have a scanner to scan the customer's token. The circuitry of push buttons that open and close the drawers will start working only if the scanned customer token matches the token stored in the basket; this ensures the basket is handed over to the proper customer. The items will be placed in the proper compartments of the basket in an organized way: grains; milk, oil, or other liquid packets; bottles of different varieties; chips, biscuits, or other items that may get crushed; small miscellaneous items like chocolates, erasers, and pencils; and a section for toiletries. This will make it easier to collect the shopped items. The drawers 1, 2, and 3 of the basket can be opened by pressing the corresponding buttons 1, 2, and 3 on the dashboard of the basket. A pair of hooks on the basket will help hold the bag for placing the shopped items. A pedal-operated sanitizer dispenser will be attached to the basket so that the customer can sanitize their hands before collecting the shopped items. The customer will press a button labelled 'Done!' after collecting all the items from the basket. This acts as a command for the basket to join the basket queue at the rear end automatically.

Fig. 2 Centralized controlling mechanism (102), connecting baskets (104), and the stacks (106)


Fig. 3 Various modules in centralized controlling mechanism (102)

Fig. 4 Side view of the trolley basket showing various components



Fig. 5 Top view of the trolley basket showing various components and dashboard

So, there will be no need to manually fetch the basket and put it in the queue. Before the empty basket joins the queue, it will be sanitized by the automatic spraying system, which ensures hygiene. A counter for the quantity of each item will be initialized in the basket after receiving the shopping list. Every item will be given a unique code, which makes it easy for the basket to track each item and its quantity as per the shopping list. Items of a similar category can be placed in a stack group, in a vending-machine-like structure, as shown in Figs. 6 and 7 in top and front view, respectively, for each item. Each group will have an identification number. Each stack will have a number matching the unique code given to the item and a mechanism that slides a single unit of the item into the basket. Stacks will be mounted as shown in Figs. 8 and 9, in side and front views. The line-follower mechanism in the basket will make it move on the proper track. A prototype at the basic level was implemented, as shown in Fig. 10. If the shopping list contains some quantity of one or more items from a stack group, the basket will stop in front of that stack group and open the corresponding drawer. It will send a message to the corresponding stack to activate the mechanism that slides out the item in the required quantity. With every slide-out, the counter is reduced by one, and the process continues until the counter becomes zero. This is repeated for the stacks of the group in sequence. After all items have been received from a stack group, the drawer is closed, and the basket moves to the next group.
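The item-code and counter bookkeeping described above can be summarised by the simulation-style sketch below. The class and method names are illustrative only and do not correspond to any actual firmware of the prototype.

```python
# Illustrative simulation of the basket's per-item counters (names are hypothetical).
class Stack:
    def __init__(self, item_code, quantity):
        self.item_code = item_code
        self.quantity = quantity

    def slide_out_one(self):
        # Mechanism drops one unit of the item into the basket drawer.
        self.quantity -= 1


class RoboticBasket:
    def __init__(self, token):
        self.token = token
        self.counters = {}                 # unique item code -> remaining quantity

    def load_shopping_list(self, shopping_list):
        # shopping_list: {item_code: quantity} received after successful payment.
        self.counters = dict(shopping_list)

    def collect_from_stack(self, stack):
        # Stop at the stack and request slide-outs until the counter reaches zero.
        code = stack.item_code
        while self.counters.get(code, 0) > 0 and stack.quantity > 0:
            stack.slide_out_one()
            self.counters[code] -= 1

    def is_done(self):
        return all(qty == 0 for qty in self.counters.values())
```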

Fig. 6 Top view of one item dispenser

Fig. 7 Front view of one item dispenser



Fig. 8 Side view of stack containing individual item dispensers of similar category

Fig. 9 Front view of stack containing individual item dispensers of similar category

If any of the items on a stack falls below a threshold value, an alert with the unique item code will be sent to the saved mobile number of the shopkeeper using the concept of IoT. This will also update the data on the display board near the order-placing unit to make customers aware of the availability of the item. This process continues for all the stack groups until all the counters initialized from the shopping list are zero.
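A low-stock alert along the lines described (threshold check plus a notification to the shopkeeper's registered number and an update of the display board) could be sketched as below; `send_sms` is a placeholder for whatever IoT notification channel is actually used and is purely hypothetical, as is the threshold value.

```python
# Hypothetical low-stock alert; send_sms stands in for the real IoT notification channel.
LOW_STOCK_THRESHOLD = 5  # assumed threshold, not specified in the paper

def check_stock_and_alert(stack, shopkeeper_number, display_board, send_sms):
    if stack.quantity < LOW_STOCK_THRESHOLD:
        message = f"Item {stack.item_code} is low in stock ({stack.quantity} left)"
        send_sms(shopkeeper_number, message)              # alert the registered mobile number
        display_board[stack.item_code] = stack.quantity   # update availability shown to customers
```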


Fig. 10 Basic prototype of the trolley basket

Once the list is completed, an alert will be flashed on the dashboard of the basket as the basket moves to the collection counter for the customer. This system will not require the shopkeeper or any supporting staff to remain present all the time. The mechanism on the basket will require power to drive the corresponding electronic circuitry. This may be done with DC batteries of the requisite power, or an arrangement of solar panels and charging units can be made; if solar panels and charging units are used, the baskets may be charged during off hours and made available during working hours. The communication between the different units installed on the basket, the stack groups, the stacks, the automated sanitizer spraying system, and the shopping-list order-placing unit will be done using Bluetooth, Wi-Fi, or a similar technology. A display containing the list of available items and their details will be placed near the order-placing units to help customers prepare a proper shopping list.

4 Impact of the Proposed System The system will be an automated shopping system that requires no physical intervention from the customer to select items and no concern about social distancing and hygiene. It will be beneficial for shopkeepers too, as alerts are received from time to time to help manage the overall operation. The likely impact on the various components and stakeholders of the system is discussed as follows.


4.1 For Shopkeeper This intelligent shopping system with minimum human intervention will save human effort. Efficient management of customers and items is another advantage for shop owners. Crowds are easier to manage because an intelligent basket automatically collects the shopping items. Space in the shop is utilized better, since only the trolleys move along the lines. The buying process becomes faster because the intelligent basket cart, which is expected to work faster than human beings, performs it. Automated sanitization of the basket while rejoining the basket queue maintains hygiene, during COVID and in general. Easy management of the shopping list and billing system is an added benefit for shopkeepers. Almost full automation requires minimal intervention from the shopkeeper, who can then devote time to planning the growth of the business. Automated alerts to the shopkeeper's registered phone number may be set for when the quantity of any item falls below the threshold, which spares the shopkeeper from personally checking the quantity of items on the stack and makes it easier to keep track of accounts and inventory. The data can also be analyzed using AI and ML techniques for further business decisions.

4.2 For Customer Intelligent shopping with minimum human intervention saves customers' effort while shopping. An intelligent basket cart automatically collects the shopping items, so there is no need to place items in the basket manually. The customer receives the basket with the ordered items quickly, as the process of moving the basket and placing the required items in it is automated. The items are placed in different compartments according to their categories, which helps the customer receive the items intact and in an organized way and makes it easier to cross-verify that nothing is missing. Shopping becomes easier for elderly and physically challenged people. Before collecting the shopped items, the customer has to scan the token, which ensures correct, authenticated handover of the shopped items.

4.3 For Society The use of technology to ease the customer's shopping experience will have a positive societal impact. Such a system, if implemented, will ensure the safety of customers through proper hygiene measures, thereby reducing risk. There will also be fewer conflict cases, as the basket hands over the shopped items only after verifying an authentic token.


5 Conclusion The proposed idea has multifold benefits if implemented in practice. Though the system seems complex and costly to implement compared to a regular, non-automated shopping system, it has several added benefits worth the investment. It not only makes use of technology for the service of people but also ensures the well-being of society by taking proper precautions. Improvements to this system can be incorporated by analyzing the data, if it is saved over a period such as daily, quarterly, or yearly, using various techniques of AI, machine learning, and data analytics.

References 1. McHale BC, Winkle DC, Atchley MD, Chakrobartty S, High DR (2019) Shopping facility assistance system and method having a motorized transport unit that selectively leads or follows a user within a shopping facility. In: Google patents 2. Dehigaspege L, Liyanage M, Liyanage N, Marzook M, Dhammearatchi L (2017) Follow me multifunctional automated trolley. Int J Eng Res Technol (IJERT) 6(7):84–90 3. Ng YL, Lim CS, Danapalasingam KA, Tan MLP, Tan CW (2015) Automatic human guided shopping trolley with smart shopping system. Jurnal Teknologi 73(3). https://doi.org/10.11113/ jt.v73.4246 4. Rupanagudi SR et al (2015) A novel video processing based cost effective smart trolley system for supermarkets using FPGA. In: 2015 international conference on communication, information & computing technology (ICCICT), pp 1–6. https://doi.org/10.1109/ICCICT.2015.704 5723 5. Wang Y-C, Yang C-C (2016) 3S-cart: a lightweight, interactive sensor based cart for smart shopping in super markets. IEEE Sens J 16(17):6774–6781. https://doi.org/10.1109/JSEN. 2016.2586101 6. Suryanto ED (2018) Design of automatic mobile trolley using ultrasonic sensors. J Phys Conf Ser 1007(1):012058. https://doi.org/10.1088/1742-6596/1007/1/012058 7. Islam MM, Lam A, Fukuda H, Kobayashi Y, Kuno Y (2019) A person-following shopping support robot based on human pose skeleton data and Lidar sensor. In: International conference on intelligent computing, pp 9–19. https://doi.org/10.1007/978-3-030-26766-7_2 8. Rawashdeh NA, Haddad RM, Jadallah OA, To’ma AE (2017) A person-following robotic cart controlled via a smartphone application: design and evaluation. Int Conf Res Educ Mechatron (REM) 2017:1–5. https://doi.org/10.1109/REM.2017.8075245 9. Gunawan AA et al (2019) Development of smart trolley system based on android smartphone sensors. Proc Comput Sci 157:629–637. https://doi.org/10.1016/j.procs.2019.08.225 10. Novalina SD, Panjaitan A, Simanjuntak RP, Tamba BY (2019) Design of automatization of trolley cover based on microcontroller AT Mega 8535 as a learning media at ATKP Medan, vol 648. In: IOP conference series: materials science and engineering, the 2019 international conference on information technology and engineering management, 27–29 June 2019 11. Chandrasekar P, Sangeetha T (2014) Smart shopping cart with automatic billing system through RFID and ZigBee. In: International conference on information communication and embedded systems (ICICES2014), pp 1–4. https://doi.org/10.1109/ICICES.2014.7033996 12. Yewatkar A, Inamdar F, Singh R, Ayushya AB (2016) Smart cart with automatic billing, product information, product recommendation using RFID & Zigbee with anti-theft. In: 7th international conference on communication, computing and virtualization, pp 793–800. https:// doi.org/10.1016/j.procs.2016.03.107


13. Dhianeswar R, Gowtham M, Sumathi S (2018) Smart trolley with automatic master follower and billing system. In: International conference on computer networks, big data and IoT, pp 778–791. https://doi.org/10.1007/978-3-030-24643-3_92 14. Bertacchini F, Bilotta E, Pantano PB (2017) Shopping with a robotic companion. Comput Hum Behav 77:382–395. https://doi.org/10.1016/j.chb.2017.02.064 15. Onozato T, Tamura H, Kambayashi Y, Katayama S (2010) A control system for the robot shopping cart. In: 2010 IRAST international congress on computer applications and computational science (CACS 2010), pp 907–910. https://doi.org/10.1007/978-3-642-23851-2_29

Traffic Sign Detection—A Module in Autonomous Vehicles I. Amrita

and Bhagyashri R. Hanji

Abstract Autonomous driving vehicles have made great advances in the modern world of research; Tesla is one example of an autonomous driving car. Self-driving cars are designed to be capable of handling any situation that a human driver can handle. There are many applications of autonomous driving vehicles; one example is giving physically challenged persons the safety and independence of travelling by car. Without the need for a driver, autonomous cars have proven to be efficient and to reduce road traffic accidents. The complete operation of these cars relies on multiple modules that perform different tasks, and one such module is traffic signboard detection. The present work aims to detect traffic signboards using the German Traffic Sign Recognition Benchmark (GTSRB) dataset obtained from Kaggle. A Convolutional Neural Network architecture is used to solve the problem of classifying traffic signs. The class names are converted from text to speech using Google Text to Speech (gTTS). Unit test cases are written to validate the outputs against the inputs of each module. The accuracy obtained by the Convolutional Neural Network is 98%. This work could be extended with the detection of other obstacles encountered along the way. The work provides accurate classification of signboards, which is important for following traffic rules. Keywords Accuracy · Autonomous vehicle · Classification · Signboard · Text to speech (TTS)

I. Amrita (B) · B. R. Hanji Department of Computer Science, Global Academy of Technology, Bangalore, India e-mail: [email protected] B. R. Hanji e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_42

1 Introduction A huge number of deaths occur due to road traffic accidents caused by driver negligence. There may be multiple reasons, such as intoxication of the driver, high


speed, violation of traffic rules, distractions like phone calls, animals, driver negligence, and much more. According to the National Crime Records Bureau, 36 in every 100 road accidents are fatal, which is the highest count of the past 20 years. There is a need to reduce this number by applying and following certain precautionary rules. The main idea behind self-driving cars is to ensure safe driving by following such rules. Autonomous vehicles are trained in such a way that the car feels as if it were driven by an expert. The pre-trained algorithms help in detecting traffic signs, obstacles, and the speed and direction of other vehicles passing nearby, in adjusting to changes in the weather, and in delivering safety for the passengers in the car. Multiple companies are leading the field of self-driving cars; the implementation and algorithms used might differ slightly from one company to the other, but safety is provided under all circumstances. There are different algorithms for every module, tracked at every point in time by the sensors, ensuring that the accuracies are maintained at all times. In recent times, advances in deep learning have proven very useful for real-world problems; its accuracy has been demonstrated in the fields of medicine, agriculture, robotics, testing, driverless vehicles, and much more. The organization of the paper is as follows: Sect. 1 introduces the present work. Section 2 provides information on similar works. Section 3 gives the dataset description. The methodology is described in Sect. 4, followed by Results and Discussion in Sect. 5. Section 6 gives the conclusion and future work. Finally, the list of references is attached at the end, followed by the details of the authors.

2 Related Works Liu et al. proposed a cascade saccade model for traffic sign detection using the Chinese Traffic Sign Benchmark dataset and obtained results more than 6% better than standard performance; the proposed model can also be used to solve other real-time problems [1]. Shen et al. derived an adaptive pyramid convolution and found a significant increase in results, which could help solve real-world traffic problems with small images [2]. Wan et al. proposed a pruned YOLO v3 architecture that achieved an accuracy of 94% on a Chinese traffic database [3]. Rodriguez et al. combined R-CNN and YOLO v3 for classification on a Mexican dataset, found it to be a strong combination, and achieved 99.7% accuracy [4]. Sudha et al. compared Random gradient succession with momentum (RGSM) with a CNN architecture and found that a weighted CNN gave an accuracy of 99.85% [5]. Lopez et al. conducted performance tests on GPU, TPU, and CPU and found that the combination of SSD and MobileNet in a TPU environment gave the best result [6]. Yogesh et al. used the CARLA simulator database to evaluate the efficacy of the YOLO algorithm [7].


Haque et al. proposed a thin yet deep convolutional network architecture and found that the model outperforms other models even with five times fewer parameters [8]. Dewi et al. demonstrated YOLO v3 and v4, compared the results, and found that v4 is better [9]; they also used synthetic data generated by DCGAN networks to build a more complex dataset [10]. Bi et al. improved VGG for efficiency enhancement and proposed a new model that increases efficiency by 30% [11]. A 2D-shallow/3D hybrid CNN model was proposed by Bayoudh et al. for comparison with a normal CNN model; the proposed model showed a considerable increase in efficiency in terms of accuracy [12]. Yazdan et al. addressed the failure to recognize images by overcoming the effects of scale and rotation and were able to improve the accuracy of the algorithms by 4% [13]. Qin et al. used conventional deep learning methods for the detection of traffic signboard images [14]. The YOLO v4-tiny algorithm was used for signboard recognition by Wang et al. [15]. Hasan et al. used Support Vector Machines (SVM) and CNN for classifying traffic signboards, achieving accuracies of 96% and 94%, respectively [16]. Zaibi et al. used LeNet for traffic signboard classification [17]. Wei et al. used attention and path-aggregation methods for detecting traffic signboard images in a Chinese database [18]. Video traffic detection was performed using driving simulation on a traffic database [19]. Residual Networks were used by Devi et al. for traffic signboard classification and detection [20]. Applications of deep learning are found not only in the detection of traffic signs but also in various other fields; the recent study of the COVID-19 pandemic spread by Snigdha et al. is one such example of the applications of machine learning and deep learning [21]. Looking at all the literature referred to in this work, it is found that the majority of the studies use the YOLO algorithm for their analysis, as can be seen in Fig. 1. Hence, it was decided to work with a Convolutional Neural Network for the computation in this work.

3 Description of Dataset The dataset used for the present work consists of the traffic signboard images of the GTSRB-German Traffic Sign Recognition Benchmark (https://www.kaggle.com/datasets/meowmeowmeowmeowmeow/gtsrb-german-traffic-sign) published on Kaggle. The data was originally published by the INI Benchmark website, which hosted the multi-class image classification challenge held at IJCNN 2011, and the dataset was subsequently made public. The dataset consists of more than 50,000 images in total, split into meta, training, and testing data. The metadata consists of 42 images, and the testing data consists of 12,631 images in total. The training set consists of 43 classes with class labels 0–42; each class has its own folder containing the related images. The classification is done using this dataset. Examples of the data are shown in Fig. 2.


Fig. 1 Comparing methodologies used in the literatures referred


Fig. 2 Examples of signboards in the dataset

4 Methodology The current work, traffic sign detection for autonomous vehicles, consists of two modules: the recognition module and the speech module. The dataset is passed into the recognition module, the value returned by the recognition module is passed as a parameter to the speech module, and the speech is initiated; the image detected by the recognition module is thus converted into speech. Figure 3 represents the workflow across the various modules. The description of each module is given below. i. Sign Recognition Module: This module performs the recognition of the traffic signboard images. There are three steps involved: data preprocessing and augmentation come first, and the image classification is then done by


Fig. 3 Workflow diagram

a Convolutional Neural Network (CNN), and finally the class name is returned by the module. The data preprocessing involves ImageDataGenerator, which applies shear and rotation to the images (from 0 to 360°) to produce multiple images per epoch and hence increase the training data. The Convolutional Neural Network classifies the images into their respective classes based on the training data, and the test set is used for testing the accuracy of the model. There is no need to split the data into training and testing sets, since the dataset provided already contains this split. Finally, the returned value is the class name corresponding to the test image. ii. Speech Recognition Module: The speech module uses Google Text to Speech (gTTS). The result returned from the recognition module is converted into speech, which helps the driver feel safe while driving; it can also help a blind person travel in such a car without the help of any driver. Every module is written and tested with unit test code for every unit. The unit testing is done using the pytest module in Python. The images are tested against their class values, and the speech module is tested in such a way that the text input for the speech


recognition module can only be one of the class names. However, other inputs could be considered for the conversion of text to speech. The implementation in this paper is done only for image recognition.
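A minimal sketch of the speech module and a pytest-style unit test in the spirit described above is given below (gTTS for text-to-speech, pytest to check that only class names are accepted as speech input). The class-name subset and file name are placeholders, not the project's actual configuration.

```python
# Speech module sketch using gTTS, plus a pytest unit test (class names are placeholders).
from gtts import gTTS
import pytest

CLASS_NAMES = ["Speed limit (20km/h)", "Stop", "Yield"]  # illustrative subset of the 43 classes

def announce(class_name, out_file="prediction.mp3"):
    # Reject anything that is not a known traffic-sign class name.
    if class_name not in CLASS_NAMES:
        raise ValueError("Speech input must be a known traffic-sign class name")
    tts = gTTS(text=class_name, lang="en")
    tts.save(out_file)          # write the spoken announcement to an audio file
    return out_file

def test_announce_rejects_unknown_text():
    # Unit test: the speech module only accepts class names as input.
    with pytest.raises(ValueError):
        announce("not a traffic sign")
```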

5 Results and Discussion The results obtained in this work are presented below. The training accuracy of the CNN algorithm is 94.68% and the testing accuracy is 98.19%. The accuracy curves are shown in Fig. 4 and the loss curves in Fig. 5. Table 1 reports the accuracy and loss values obtained in the experimentation of this work. A log file is generated at each step; these logs are evaluated against the testing and training test cases, and comparisons are made at each step. If the result or any image is found to be malicious, it is rejected.

Fig. 4 Accuracy graph

Fig. 5 Loss graph

Table 1 Accuracies and loss function values

Classification result   Accuracy (%)   Loss
Training                98.19          0.0678
Validation              94.68          0.0236

Table 2 Comparing the results obtained with the literatures referred

Literatures                    Accuracy (%)
M. Sudha et al.                99.85
Hasan et al.                   94
CNN used in this literature    98.19

From the accuracy and loss graphs, it is clear that accuracy increases with the number of epochs while loss decreases. The results obtained in the current work are compared with literature that uses CNNs and are tabulated in Table 2. The Convolutional Neural Network is used in this work for the image classification because of the ease of writing test cases against each module. The Google Text to Speech API is readily available in the Python library, hence the use of gTTS.

6 Conclusion and Future Work Traffic signboard classification is one of the classic applications of deep learning. The availability of the Keras and TensorFlow packages in Python is a boon for programmers, reducing the computation and programming effort, and many other libraries are useful for many other applications. In this work, the prediction was done with the help of a CNN and achieved 98% accuracy. The drawback of this work is that the implementation is limited to image classification using deep learning; there are many other methods for classification and segmentation of images and even live video, such as the You Only Look Once (YOLO) algorithms. In the future, various other modules of autonomous driving cars are to be implemented.

References 1. Liu Z et al (2021) Cascade saccade machine learning network with hierarchical classes for traffic sign detection. Sustain Cities Soc 67:102700 2. Shen L et al (2021) Group multi-scale attention pyramid network for traffic sign detection. Neurocomputing 452:1–14 3. Wan J et al (2021) An efficient small traffic sign detection method based on YOLOv3. J Signal Process Syst 93(8):899–911


4. Rodríguez RC et al (2022) Mexican traffic sign detection and classification using deep learning. Expert Syst Appl 202:117247 5. Sudha M (2021) Traffic sign detection and recognition using RGSM and a novel feature extraction method. Peer-to-Peer Netw Appl 14(4):2026–2037 6. Lopez-Montiel M et al (2021) Evaluation method of deep learning-based embedded systems for traffic sign detection. IEEE Access 9:101217–101238 7. Valeja Y et al (2021) Traffic sign detection using Clara and Yolo in python. In: 2021 7th international conference on advanced computing and communication systems (ICACCS), vol 1. IEEE 8. Haque WA et al (2021) DeepThin: a novel lightweight CNN architecture for traffic sign recognition without GPU requirements. Expert Syst Appl 168:114481 9. Dewi C et al (2021) Yolo V4 for advanced traffic sign recognition with synthetic training data generated by various GAN. IEEE Access 9:97228–97242 10. Dewi C et al (2021) Synthetic data generation using DCGAN for improved traffic sign recognition. Neural Comput Appl:1–16 11. Bi Z et al (2021) Improved VGG model-based efficient traffic sign recognition for safe driving in 5G scenarios. Int J Mach Learn Cybern 12(11):3069–3080 12. BBayoudh K, Fayçal H, Mtibaa A (2021) Transfer learning based hybrid 2D-3D CNN for traffic sign recognition and semantic road detection applied in advanced driver assistance systems. Appl Intell 51(1):124–142 13. Yazdan R, Varshosaz M (2021) Improving traffic sign recognition results in urban areas by overcoming the impact of scale and rotation. ISPRS J Photogramm Remote Sens 171:18–35 14. Qin Z, Yan WQ (2021) Traffic-sign recognition using deep learning. In: International symposium on geometry and vision. Springer, Cham 15. Wang L et al (2021) An improved light-weight traffic sign recognition algorithm based on YOLOv4-tiny. IEEE Access 9:124963–124971 16. Hasan N, Anzum T, Jahan N (2021) Traffic sign recognition system (TSRS): SVM and convolutional neural network. In: Inventive communication and computational technologies. Springer, Singapore, pp 69–79 17. Zaibi A, Ladgham A, Sakly A (2021) A lightweight model for traffic sign classification based on enhanced LeNet-5 network. J Sens 2021 18. Wei H et al (2022) MTSDet: multi-scale traffic sign detection with attention and path aggregation. Appl Intell:1–13 19. Zhang T et al (2022) An efficient framework of developing video-based driving simulation for traffic sign evaluation. J Safety Res 81:101–109 20. Kiruthika Devi S, Subalalitha CN (2022) A deep learning-based residual network model for traffic sign detection and classification. In: Ubiquitous intelligent systems. Springer, Singapore, pp 71–83 21. Sen S et al (2021) Analysis, visualization and prediction of COVID-19 pandemic spread using machine learning. In: Innovations in computer science and engineering. Springer, Singapore, pp 597–603

Ms. I. Amrita is currently pursuing a Master of Technology in Computer Science and Engineering at the Global Academy of Technology, Bangalore, India. She is also undergoing an internship at Bosch Global Software Technologies, Bangalore. She completed her Bachelor of Technology in Computer Science and Engineering at the Global Academy of Technology, Bangalore. She has published several papers in international conferences. Her areas of interest lie in Machine Learning, Big Data, Automation, and Deep Learning.


Dr. Bhagyashri R. Hanji is currently working as Professor and Head, Computer Science and Engineering, Global Academy of Technology, Bengaluru. She has published several papers in international/national conferences and journals. She has served in various capacities as Reviewer, Technical Program Committee Member, and Advisory Board Member for various journals and conferences. Her areas of interest lie in Machine Learning, Deep Learning, Cryptography, and Information and Network Security.

A Deep Learning Model for Arabic Fake News Detection Based on Transformers Ahmed Binmahdfoudh

Abstract The spread of Fake News (FN) on social media is accelerating. This study examines articles from a multimodal perspective that includes both the text and the related image. Several studies were conducted in order to complete this work: the first allows us to automatically distinguish between various forms of media on social networks, while the second, with the help of a comparison with an earlier image, enables the identification and localization of text modifications. A unique system could be created by drawing on predictions made by other researchers and combining their findings. Our BERT-CNN model outperforms the best state-of-the-art methods: it reaches an accuracy of 79.1% and outperforms the CNN (62.2%), BERT-CNN (71.6%), CNN-LSTM (75.3%), and multilingual BERT (78%). Keywords Arabic language · Deep learning · Fake news · Natural language processing

A. Binmahdfoudh (B) Department of Computer Engineering, College of Computers and Information Technology, Taif University, Taif, Saudi Arabia e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_43

1 Introduction Social networks (SN) have been fighting against misinformation and FN on their platforms [1], or rather trying to fight this false information, since the volume is substantial and reproaches of inaction or of unsuitable or insufficient action are frequent [2]. The recent flagging of tweets from Donald Trump's account in November by Twitter Inc. and the deletion of his profile in January show that moderating or even deleting controversial NC and the accounts of their authors on social media is a complex challenge, and a prerogative that many consider should not be the sole responsibility of the companies managing these platforms. For several years, in parallel with the initiatives of the media, journalists, and associations, the editors of SN platforms, in order to fight against FN [3] and misinformation, have been working on algorithms, setting up war rooms for elections,

559

560

A. Binmahdfoudh

developing functionalities for alerting, and making moderators work. Thus, Twitter regularly cleans up accounts that create or propagate false news, while Facebook considers rating its users based on the FN they report or displaying banners on questionable information [4]. In 2017, these same platforms initiated partnerships with certain journalists to work on fact checking. However, criticism of the means committed and of the attitude of certain SN towards this fight is frequent, and contributing journalists have shared their disappointment since 2018. To prevent users from being taken in by clickbait, this research study seeks a method to identify and filter out Arabic FN sites. Finding such solutions is important because they will benefit both the technology companies tackling the issue and the people who read the news. In this paper, our solution applies Machine Learning (ML) and Deep Learning (DL) techniques to FND in Arabic media. The paper's contributions are as follows: • We present ArabFand, a tool for Arabic Fake News Detection (FND) based on BERT-CNN. • The results of the ArabFand model are compared to those of other models already in use. The paper is structured as follows: an analysis of the available research on FND is presented in Sect. 2. In Sect. 3, ArabFand and the stages needed to construct our suggested model are introduced. Section 4 discusses our model's experimental studies. The assessment is shown in Sect. 5. We discuss the outcomes in Sect. 6 and underline some upcoming projects.

2 Related Works Rapid and explosive social media advancement in recent years has contributed to a significant rise in the number of FN items [5, 6]. These days, FN is annoying, invasive, distracting, and ubiquitous. Consequently, it is necessary to create a cost-effective detection system for FN. Fake News: false stories spread on the Internet or other social media, designed to influence political opinions or simply meant as a joke. They are characterized by the following: Volume: The volume of FN is enormous, since anyone can quickly write FN on the Internet without any verification procedure. Variety: There are many sources of FN: rumours, satirical news, fake reviews, fake posters, conspiracy models, fake statements by legislators, etc. Speed: FN appears very quickly, which complicates its detection by countermeasure systems. Despite the existing research in FND, the investigation carried out in this field for the Arabic language is restricted and limited. FN is based on four main components: inventor/broadcaster, target victim, News Content (NC), and social context [7].


Creator/Spreader: Creators of online FN may or may not be human. Targeted victims: the victims of FN can be users of social media or other recent platforms. Depending on the objectives of the news, the targets may be students, voters, parents, the public, the elderly, etc. News Content: The body of the news is referred to as NC. Both physical NC (such as the title, body text, and multimedia) and intangible NC (e.g. purpose, sentiment, topics) are present. Social Context: How news spreads on the Internet is shown by the social context. The user, network, and broadcast pattern analyses all fall under the category of social context analysis. There must be a clear understanding of who is behind the FN and why it is so widely circulated in the social realms. It is possible for FN's creator or spreader to be either human or not. Human: The only carriers of FN on social media are not social bots or cyborgs; these automated accounts have been programmed to disseminate false information to undermine the trustworthiness of the online social community. Non-human: The most prevalent non-human creators of FN are social robots or cyborgs. These are computer programs that mimic human behaviour, creating content and interacting with humans via social media to spread rumours, spam, and malware. Physical and intangible news components are both present in every news item. Physical NC: Physical FN content contains news headlines, news bodies, images, videos, hashtags, mention signals, emojis, etc. Due to their specific meanings and features, these components are essential for detecting FN. Non-physical NC: Non-physical content pertains to the opinions, emotions, attitudes, and sentiments that news producers wish to convey. For instance, millions of reviews are posted every day on online shopping websites; unfavourable fake reviews constitute a significant issue for both brands and online shoppers because they not only have an impact on the decision-making process during manufacturing but can also quickly ruin a brand's reputation. The term "social context" describes the entire activity system and social setting in which news dissemination takes place [8]. Interactive media technologies are increasingly dominating news-sharing methods today. Social: Online users can not only learn about trends but also share their stories and advocate. If they share this knowledge and these exchanges within close social clusters, and the members of these groups share the same ideas, their influence can be amplified. This makes it easier to spread FN. According to the authors of [9], several truthiness performance indicators can be classified into two major groups: linguistic landmark approaches (ML) and network analysis approaches (FND). A straightforward method for detecting FN using a naive Bayesian classifier is presented by the authors of [10]. This method is evaluated using a set of data that was taken directly from Facebook news posts. They assert that they can achieve an accuracy of 74%. By contrasting two dissimilar feature extraction techniques and six diverse classification methods, the authors of [11] propose a FND model that makes use of N-gram analysis and ML methods. The results of the experiments
demonstrate that the TF-IDF feature extraction method yields the best performance. The accuracy of the Linear Support Vector Machine (LSVM) classifier was 0.92. According to [12], SN users can verify the truthfulness of the information and define what constitutes a "true" or "false" statement. The methods by which they are validated, the function of the media, and what to anticipate from investigators and social institutions are also covered. According to various modalities, the authors of [9] suggest several strategies and cue types (text, image, and social information). The cost of combining and merging these methods to assess and confirm shared information is also covered. FND models [13] are the next topic for discussion: NC Models: These models can be classified into the following: Knowledge-Based Model: The knowledge-based approach aims to use sources to verify facts in NC. Style-Based Model: FN editors use specific writing styles to make a story appeal like a real news story, such as neutral words, describing events with facts, and better writing quality (considering pinned words, punctuation, and sentence length). • Social Context Models: SN provides additional research resources to complement and enhance topical context models. Social context models are also used for rumour detection and identification of fake NC on Facebook. These patterns are a function of position and spread: 1. Position-based: It is a process that can determine the feelings the user expresses: in favour, against, or neutral. There are two ways to represent the user's position, explicit or implicit. Explicit positions are where readers give direct expressions, such as thumbs up or down. Implicit positions are positions where sentiments are extracted from social media posts. 2. Spread-based: It is a process that can determine the relationship between relevant events and posts. There are two categories: homogeneous propagation, which contains a single entity, such as a message or an event, and heterogeneous propagation, which contains several entities simultaneously. The news targets current real situations and complete stories, covering different issues (e.g. criminology, health, sport, politics, etc.). Here are some impacts of FN on different domains [5]: (a) in online media, FN helps build false momentum accounts (e.g. recovery of maximum clicks and members); (b) it influences political propaganda, affecting voting decisions in elections; (c) it influences the financial markets, where millions could be lost.

3 Proposed Model: ArabFand Our system is based on the use of DL for FND. The system takes as input a raw base of news items and their characteristics and transforms it into a base of features usable by the learning phase. Cleaning, filtering, and encoding operations are carried out by this transformation, which is referred to as preprocessing. The preprocessed base is then divided into a training portion and a testing portion.


3.1 Preprocessing Our goal is to extract the best features to detect FN. We start with the preprocessing of the data from the raw data set, which is subdivided into three types, the textual, the categorical, and the numerical data, representing, respectively, the text of the news, the source of the news with its author, and finally the date and the sentiment conveyed by the news. • Textual Data: Corresponds to a preprocessed version of the author's original text. • Cleaning: Eliminates stop words such as a, about, am, you, are… and special characters such as !?,:,; and any non-useful information; in our case, these are the digits in the text. – Stemming: Consists of transforming valid words into roots; for example, the words actor, acting, and re-enact are all transformed into the same root act. – Encoding: A Bag of Words (BoW) and an N-gram are combined, and the result is then subjected to the TF-IDF method in order to convert all the words in the comment into a numerical vector. – BoW: In this template, the text is represented as a vector containing its words, regardless of their order, but keeping the multiplicity. This technique is mainly used to calculate different measures that characterize the text, for example, the most repeated word in all the documents, i.e. the data set called corpus. But the problem here is that a word repeated throughout the corpus does not necessarily mean that it is important or characterizes a specific document. To solve this problem, the frequency of the term is weighted by the document's importance in the corpus. In our system, we have used the TF-IDF method [10]. A statistical technique called Term Frequency-Inverse Document Frequency (TF-IDF) is used to evaluate how effective a term is in a corpus of documents. The weight increases with the number of times a word appears in the document. This method is used in search engines to measure the significance of a document to a request [12]. Several calculation formulas have been proposed for this method. In our work, we used the following formula: TF(t) = n/k, where n is the number of occurrences of term 't' in the document and k is the number of terms in the document, keeping the multiplicity of each term. IDF(t) = D/Dt, where D is the number of documents and Dt is the number of documents citing the term. N-gram: in this technique, the text is represented as a vector containing blocks of words, taking into account their order and keeping the multiplicity [12].
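To make the preprocessing chain concrete, the following is a minimal sketch assuming scikit-learn and NLTK are available; the stop-word list, stemmer, and toy corpus are illustrative stand-ins (an English stemmer is used to match the chapter's own example words, whereas an Arabic stemmer would be substituted in practice), and scikit-learn's TfidfVectorizer applies a smoothed logarithmic IDF rather than the plain ratio given above. This is not the authors' actual implementation.

```python
import re

from nltk.corpus import stopwords                # requires nltk.download("stopwords")
from nltk.stem import PorterStemmer              # English stemmer; an Arabic stemmer would replace it
from sklearn.feature_extraction.text import TfidfVectorizer

stop_words = set(stopwords.words("english"))
stemmer = PorterStemmer()


def preprocess(text):
    # Cleaning: lowercase, drop digits, punctuation and other non-useful characters
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    # Stop-word removal and stemming of the remaining tokens
    tokens = [stemmer.stem(tok) for tok in text.split() if tok not in stop_words]
    return " ".join(tokens)


corpus = [
    "The actor is acting in a re-enacted scene about 3 events.",
    "Shocking claim spreads online without any verification!",
]
cleaned = [preprocess(doc) for doc in corpus]

# Encoding: bag of words / word n-grams weighted by TF-IDF
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
features = vectorizer.fit_transform(cleaned)
print(features.shape)    # (number of documents, vocabulary size)
```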

3.2 Classification Classification combines two modules, training and validation, each using a part of the feature database, which is divided into a training base and a test base. The training module uses the training base to produce a decision model, while the validation
module uses the test base to measure the decision performance of the provided model. • Training: We have chosen the Support Vector Machine (SVM) algorithm, covered earlier in the chapter, to train our model for two reasons: • because it gives the best results for Text Mining [9]; • to use the decision function value as a confidence level for news classification. When the decision function has a positive value, it simultaneously characterizes news that can be accepted and the degree to which it is accurate; when it has a negative value, it characterizes news that is false and the degree to which it is false. The training results in a model or pattern, representing the analysis of the data and their transformation into useful information by establishing relationships between them. Several metrics are used to estimate the quality of the model based on the following values: – VP: positive examples classified correctly. – FP: misclassified positive examples. Validation: Measures the model's ability to recognize new examples. To do this, some examples are set aside from the outset for testing the model. Avoiding over-fitting, i.e. testing the model on the same data it was trained on, is the primary benefit of using this method. The subdivision is carried out not arbitrarily but under a specific sampling scheme [13]: • Holdout Method: The data set of size n is subdivided into two parts, the first, generally 60% or more, for learning, and the second, 40%, for testing. • K-Fold Cross-Validation: The data set is split into m parts, m − 1 parts for training and one for testing. This operation is repeated m times, and we obtain a recognition rate each time. In the end, we calculate the average and standard deviation of these rates to estimate the model's performance. • Leave-One-Out Cross-Validation: The data set is subdivided into m parts such that m = k, where k represents the total number of examples in the database. At each operation, the learning is done on k − 1 examples and the test on the remaining example. This is a particular case of cross-validation. • Parameter Revision: It aims to improve the model by tuning or adjusting the parameters of the SVM and by changing the variant of cross-validation or the value of k in the case of k-fold cross-validation. There are many parameters of the SVM, but the most important are: • Cost: This parameter controls how strongly the SVM optimization avoids misclassifying the training data. For high values of C, the optimization will choose a smaller-margin hyper-plane. Equally, a small value of C will lead the optimization to search for a larger-margin separating hyper-plane. • Gamma: With low values denoting "far" and high values denoting "near", this parameter specifies how much influence a single training example has. • Degree: This parameter gives the kernel degree. • Epsilon: This parameter determines the tolerance of the termination criterion; it is the minimum tolerated error rate.
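As an illustration of this training-and-validation setup, here is a minimal sketch using scikit-learn; the toy corpus, the chosen kernel, and the concrete parameter values are assumptions for demonstration, not the chapter's actual configuration (in scikit-learn the termination tolerance corresponding to the "epsilon" parameter above is exposed as tol).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Tiny illustrative corpus (1 = real news, 0 = fake news); the chapter's Kaggle data would be used instead
texts = ["official report confirms the budget figures",
         "shocking miracle cure doctors hate revealed",
         "parliament passes the new education bill",
         "celebrity secretly replaced by clone says source",
         "central bank announces interest rate decision",
         "aliens endorse candidate in leaked video",
         "court publishes full ruling on the appeal",
         "one weird trick erases all your debt overnight"]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

# C, gamma and degree map to the Cost/Gamma/Degree parameters; tol plays the role of "epsilon"
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      SVC(kernel="rbf", C=1.0, gamma="scale", tol=1e-3))

# Holdout method: roughly 60% training / 40% test
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.4, random_state=0, stratify=labels)
model.fit(X_train, y_train)
print("holdout accuracy:", model.score(X_test, y_test))

# The signed decision-function value doubles as a confidence level for each classified item
print("confidence values:", model.decision_function(X_test[:2]))

# K-fold cross-validation (here k = 4): mean and standard deviation of the fold scores
scores = cross_val_score(model, texts, labels, cv=4)
print("k-fold accuracy: %.2f +/- %.2f" % (scores.mean(), scores.std()))
```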


4 Evaluation In this section, we present the data sets used for FND and introduce our findings on this corpus.

4.1 Data Sets Like any other ML system, we need a data set to train the system on in order to use its algorithms to predict the class of the data in the test database. Each piece of data in the raw FN data set we used and prepared was converted into a feature vector and saved in a CSV file. After that, the file is split into two sections, one for training and the other for testing and validation. Following validation, we moved on to the use stage, which entails determining the type (true/false) of newly encountered unlabelled data. We built our own database because, to our knowledge and according to our research, there is no publicly available FN data set that contains enough characteristics for both FN and real news. The "Getting Real about FN" database, which contains FN, and the "All the News" database, which contains real news, were combined. These databases were acquired from the Kaggle website, which hosts competitions in data science: companies present data science problems and reward data scientists for achieving the best results; Anthony Goldbloom founded the business in 2010. 1. Getting Real About FN: Text and metadata extracted from 244 online sites flagged as fraudulent by Daniel Sieradski's BS Detector Chrome extension, using the webhose.io API. 12,999 SN posts are included in this database, which is organized into 20 columns of various types, such as categorical, numerical, and textual data. 2. All The News: A collection of text and metadata culled from a variety of newspaper sources. The data was extracted using Beautiful Soup and stored in SQLite. Ten columns of text and metadata, including categorical, numerical, and textual columns, make up this database. Two available sets of unlabelled data were used for pretraining our suggested model. Social media-related information is present in all data sets: the two data sets shown in Table 1 were used for all experiments. Preprocessing was performed on the distinct data sets by removing links, emoji symbols, and punctuation. For the various models, including BERT, CNN, CNN-LSTM, and multilingual-BERT, the data were split into two groups: 80% for training and 20% for testing. For this reason, we used our model BERT as an embedding method and combined it with a CNN.
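As an illustration of using BERT purely as an embedding layer feeding a CNN classifier, here is a minimal PyTorch/transformers sketch; the checkpoint name, filter sizes, and sample sentences are assumptions for demonstration (running it also requires downloading the pretrained weights), not the authors' released configuration.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

# Assumed Arabic BERT checkpoint; the chapter does not state which pretrained weights were used
bert_name = "aubmindlab/bert-base-arabertv02"
tokenizer = AutoTokenizer.from_pretrained(bert_name)
bert = AutoModel.from_pretrained(bert_name)


class BertCNN(nn.Module):
    def __init__(self, bert, n_filters=100, kernel_sizes=(2, 3, 4), n_classes=2):
        super().__init__()
        self.bert = bert
        hidden = bert.config.hidden_size                     # 768 for a BERT-base encoder
        self.convs = nn.ModuleList(
            [nn.Conv1d(hidden, n_filters, k) for k in kernel_sizes])
        self.fc = nn.Linear(n_filters * len(kernel_sizes), n_classes)

    def forward(self, input_ids, attention_mask):
        # BERT used only as an embedding method, as described in the chapter
        emb = self.bert(input_ids=input_ids,
                        attention_mask=attention_mask).last_hidden_state
        emb = emb.transpose(1, 2)                            # (batch, hidden, seq_len) for Conv1d
        pooled = [torch.relu(conv(emb)).max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))             # logits: fake vs. real


model = BertCNN(bert)
batch = tokenizer(["خبر عاجل غير مؤكد", "تقرير رسمي موثق"],
                  padding=True, return_tensors="pt")
logits = model(batch["input_ids"], batch["attention_mask"])
print(logits.shape)   # torch.Size([2, 2])
```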


Table 1 Comprehensive understanding of the assessment training data sets

Data set  Size (k)  Classes  Positive labels  Negative labels
Data 1    15        2        7500             7500
Data 2    5         2        5000             5000

4.2 Results Every model is trained with a batch size of 16 and an epoch count of 5. On the DATA1 data set, tests were also run to confirm earlier classification outcomes. Table 2 summarizes the results of the FN classification for DATA1. Our proposed model, BERT-CNN, outperforms the CNN (62.2%), BERT-CNN (71.6%), CNN-LSTM (75.3%), and multilingual BERT, with an accuracy of 79.1%. With an accuracy of 80.6%, a precision of 83.9%, a recall of 80.6%, and an F1-measure of 81.1%, BERT-CNN embeddings combined with CNN show state-of-the-art results and outperform the other models for all evaluation procedures. The outcomes of the various proposed models on the TSAC data set are given in Table 2. Our proposed models, CNN combined with LSTM, multilingual-BERT, and BERT combined with CNN, outperform existing research results. Furthermore, we can see that the BERT-CNN-CNN model has the highest accuracy with 90.8%, compared to 89.2% for BERT-CNN, 88.7% for the multilingual BERT model, 81.6% for CNN-LSTM, 63.8% for CNN, and 51% for the BERT-CNN model. The F1 metric results bear this out as well. The CNN-LSTM model had the highest performance for the precision measure, scoring 95.9%, followed by TunBERT-CNN-CNN (91.3%), TunBERT-CNN (90.6%), Multilingual-BERT (90.4%), CNN (36.6%), and BERT-CNN (90.4%). With a value of 90.8%, TunBERT-CNN-CNN demonstrated the best recall performance, ahead of CNN (82.3%), CNN-LSTM (75.3%), multilingual-BERT (87.7%), TunBERT-CNN (88.4%), and BERT-CNN (0%). Table 2 DATA1 results

Proposed models

Model  A     Model              A      P      R       F1
SVM    0.77  BERT-CNN           0.531  0      0       0
MLP    0.78  CNN                0.657  0.376  0.8123  0.510
             CNN-LSTM           0.836  0.983  0.772   0.841
             Multilingual BERT  0.887  0.904  0.877   0.890
             BERT               0.892  0.906  0.884   0.854
             BERT-CNN           0.908  0.913  0.908   0.910
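Table 2 reports accuracy (A), precision (P), recall (R), and the F1-measure for each model. As a reminder of how these columns are computed, here is a minimal scikit-learn sketch on toy label vectors; the arrays are illustrative only and do not reproduce the chapter's DATA1/DATA2 experiments.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Toy ground-truth and predicted labels (1 = real, 0 = fake)
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print("A  =", accuracy_score(y_true, y_pred))
print("P  =", precision_score(y_true, y_pred))
print("R  =", recall_score(y_true, y_pred))
print("F1 =", f1_score(y_true, y_pred))
```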


5 Conclusion and Future Work To find the traits and methods that best identify FN, we presented in this study a method of FND using the SVM. The field of FN, its effects, and its detection techniques were first studied. After that, we developed and implemented a solution based on the use of cleaning approaches, stemming, encoding by N-gram, a BoW, and TF-IDF for preprocessing the raw texts of the news, followed by the extraction of additional characteristics that identify FN. In the end, we built a model that allows the classification of new information by applying the BERT-CNN model to our feature database. Our results on two well-known data sets are on par with the best state-of-the-art models. Future work will involve the application of BERT-CNN to a wide assortment of NLP tasks, including language identification, next-sentence prediction, and question-answering for reading comprehension.

References
1. Maftei A, Holman A-C, Merlici I-A (2022) Using fake news as means of cyber-bullying: the link with compulsive internet use and online moral disengagement. Comput Human Behav 127:107032
2. Rohera D et al (2022) A taxonomy of fake news classification techniques: survey and implementation aspects. IEEE Access 10:30367–30394
3. Wang X, Chao F, Yu G, Zhang K (2022) Factors influencing fake news rebuttal acceptance during the COVID-19 pandemic and the moderating effect of cognitive ability. Comput Human Behav 130:107174
4. Bodaghi A, Oliveira J (2022) The theatre of fake news spreading, who plays which role? A study on real graphs of spreading on Twitter. Expert Syst Appl 189:116110
5. Zhang X, Ghorbani AA (2020) An overview of online fake news: characterization, detection, and discussion. Inf Process Manage 57(2):102025
6. Giachanou A, Ghanem B, Ríssola EA, Rosso P, Crestani F, Oberski D (2022) The impact of psycholinguistic patterns in discriminating between fake news spreaders and fact-checkers. Data Knowl Eng 138:101960
7. Conroy NJ, Rubin VL, Chen Y (2015) Automatic deception detection: methods for finding fake news. Proc Assoc Inf Sci Technol 52(1):1–4
8. Granik M, Mesyura V (2017) Fake news detection using Naive Bayes classifier. In: 2017 IEEE first Ukraine conference on electrical and computer engineering (UKRCON). IEEE, pp 900–903
9. Ahmed H, Traore I, Saad S (2017) Detection of online fake news using n-gram analysis and machine learning techniques. In: International conference on intelligent, secure, and dependable systems in distributed and cloud environments, pp 127–138
10. Maigrot C, Kijak E, Claveau V (2018) Fusion of machine learning for the detection of fake news in social media. Doc Numer 21(3):55–80
11. Ahmed S, Hinkelmann K, Corradini F (2019) Combining machine learning with knowledge engineering to detect fake news in social networks-a survey. In: Proceedings of the AAAI 2019, Spring symposium, 12, 2019
12. Aggarwal CC, Zhai CX (2012) Mining text data. Springer Science Business Media
13. Payam R, Lei T, Huan L (2009) Cross-validation. Encyclopaedia of database systems, pp 532–538

Virtual Lab Simulator for Software Engineering Experiment Report to Evaluate Student Assessment Sushama A. Deshmukh

and Geeta Tripathi

Abstract A virtual laboratory is considered one of the safe online platforms for learners to practice. Using the virtual laboratory, authentic experiments are performed to evaluate one's own ideas and acquire the results. Thus, virtual learning assists in boosting the performance of learners by providing confidence through blended learning. Hence, this research introduces a virtual lab simulator for software engineering students to develop quality software through the software development life cycle (SDLC) phases, together with a student assessment evaluation technique. In the existing virtual labs analyzed, the facility of generating a real-time user performance evaluation analysis is not available. With the proposed virtual lab simulation, the assessment evaluation can also be downloaded as a PDF for submission to the National Board of Accreditation (NBA) and the National Assessment and Accreditation Council (NAAC). It also provides a detailed, real-time performance evaluation report of the user in downloadable form. It can be used as an educational and assessment tool in teaching–learning scenarios. Higher education students can easily understand the basic concepts of software engineering by performing experiments through this lab simulator. The inclusion of educational tools with the assessment of students' learning and the generation of a performance analysis/evaluation report of a student in graphical as well as score format helps to visualize and monitor student progress towards achieving learning outcomes along with topic understanding. The method's performance is analyzed based on the performance time: one task took 5 min 50 s, which is higher than the expected time of 2 min 22 s. Keywords Software engineering · Simulator · Virtual lab · Assessment evaluation · Simulated experiments

S. A. Deshmukh (B) Maharashtra Institute of Technology, Aurangabad, Maharashtra, India e-mail: [email protected] G. Tripathi Guru Nanak Institutions Technical Campus Hyderabad, Ibrahimpatnam, Telangana, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_44


1 Introduction Due to the widespread corona pandemic in 2019–20, the whole world realized the importance of online/e-learning in almost all professions. The impact of COVID-19 affected almost every sector across the whole world. The education sector was badly affected by the pandemic due to the lockdown. According to UNESCO monitoring [1], most educational institutions were temporarily closed, which impacted learning. As per the status displayed on 20 Jan 2021 by UNESCO, the total number of affected learners in India was 320,713,810, including pre-primary, primary, secondary and tertiary levels. One article highlighted the new modes of learning, new perspectives and trends that emerged from the measures taken by HEIs and educational authorities of India to continue seamless educational services during the crisis [2]. Virtual labs are often embedded in a learning management system (LMS) to provide an enhanced learning experience to students through internet connectivity. They provide lab-based environments for learners at an affordable cost and in a flexible environment that allows working at the learner's convenience [3, 4]. A virtual laboratory or remote lab is an interactive environment for performing experiments, like a playground where one can learn by making mistakes with no time limitations. Virtual labs simulate the tools for testing experiments in almost all domains, such as biology, physics, biochemistry, chemistry and many others. The virtual lab is necessary to provide remote access to the laboratory. In software engineering, knowledge-building and inculcation of information are employed to provide the most effective learning experience, encouraging and guiding students in learning fundamental concepts of software engineering [5]. In addition, when the theoretical concepts of software engineering are presented only in lectures in which students participate passively, the ideas remain abstract. The remaining sections of the paper are organized as follows: Sect. 2 details the related works, and Sect. 3 details the proposed methodology of the virtual lab simulator for assessing the student's performance. Section 4 elucidates the implementation, and Sect. 5 presents the results and discussion. Finally, Sect. 6 concludes the work.

2 Related Works In the current situation, the demand for online educational tools is increasing tremendously. It has become urgent to integrate virtual labs into the process of teaching and learning; Al-Quds Open University implemented and successfully applied such a system, which resulted in a positive evaluation by UG students [6]. Many virtual labs have been developed in the last few years [7]. Earlier, Java Applets and Virtual Private Network connections were used to design and develop remote labs [8]. A low-cost 3D virtual lab was designed and considered
an alternative for educators and students [9]. An interactive and intelligent lab simulator was developed in [10] for physics education, and the authors assured that the virtual physics system is effective and helpful for students in comprehensive learning; performing a virtual experiment with pulleys proved more effective than a real one. To acquire thorough knowledge of topics, scientific methods and processes, a virtual laboratory is a more efficient tool for enhancing students' learning than traditional teaching methods [11]. An instructional design model is proposed for student learning, where the authors explored how to design and develop online labs [12]. Animations offer active learning of various control concepts; without a detailed mathematical explanation, the lab can engage students in the subject through games, puzzles, etc. The animations are developed with Java simulations [13]. Evaluating online educational software should consider pedagogical effectiveness and usability [14]. Enhanced understanding of learning content and practical skills is achieved through laboratory activities. The traditional method of laboratory activities requires greater learning effort and is not flexible, whereas virtual learning provides effective learning with cost effectiveness and comfortable timing. In Saudi Arabia, an interactive lab and safety criteria were designed using a web-based platform to enhance learning capability through the Virtual Science Lab (VSL). An additional feature of the VSL is the capability of repeated experimentation individually, per the learner's requirement [15]. Educators and developers improve the creative structure of online courses by deciding the usability criteria along with pedagogical elements [16]. Using 4D methods, a STEM-based virtual lab was developed, which can improve the scientific literacy of students [17]. Virtual laboratories are used to supplement learning in physical laboratories [18]. The review emphasizes the impact of vlabs as an adaptive learning tool and their role in knowledge transfer in a blended classroom environment. According to the study, ICT-based vlabs are used as teaching tools in online education [19]. Study and analysis of students' performance and attitude, investigating the effects of a vlab instructional strategy on learning, show a positive effect [20]. A scoring algorithm was developed for the user's performance evaluation [21]. The virtual labs designed by [22–24] provide online support for the user to perform simulation programs effectively with the available resources, which physical laboratories do not provide. However, assessment of the performance is not available in the developed virtual labs.

2.1 Research Gap Problem- or scenario-based learning helps improve user performance [25]. The study of the software engineering virtual lab designed and developed by IIT Kharagpur motivated us to contribute the feature of generating a performance evaluation report for an individual user. That experiment simulator is not user-interactive, as it presents theory information through video only. There is no scope to perform any interactive task within the simulator, and it does not provide a real-time performance evaluation report. The performance evaluation
report is necessary to understand progress and can also be used as an educational assessment tool. In addition, the failure to analyze performance [26], the failure to provide practical skills [6], and the failure to deliver learning outcomes motivate the proposal of a student assessment evaluation tool to track the student's progress towards the objective. Hence, this research introduces an efficient student assessment tool for evaluating students' performance through a downloadable report of the results they acquire through experimentation. The key contribution of the research is • Proposed Student Assessment Evaluation: a student assessment evaluation for software engineering built around the virtual lab simulator, which lets students explore and learn the parameters for developing quality software through the SDLC phases, processes, techniques and tools. After completing the simulation, the assessment report can be downloaded in PDF form for evaluation, a facility that is not available in the existing virtual lab simulators.

3 Methodology 3.1 Design A study of many virtual labs used in academics was carried out. The study found that no existing vlab gives a user performance analysis with an evaluation report. While it is very important to understand whether the learning objective of the lab performance is achieved, it is also necessary to understand the student's progress. Therefore, a remote or virtual lab simulator is designed for different devices to enable the user to explore and practice. The user's performance evaluation is based on the experiment to be carried out as a detailed sequence of tasks. This virtual laboratory attempts to cover basic software engineering concepts on software process models, requirements analysis, use case diagrams, agile software development and designing a test case. Simulations are used to reinforce one's understanding. The main focus is to enable the students to interact with the "virtual" teacher. The simulator incorporates problem-based and scenario-based learning, which is also an effective teaching tool. In the task-designing process, every problem statement is designed using the philosophy of Bloom's taxonomy. On average, 24 min are required to execute this paper's first experiment, presented as a sample experiment. Figure 1 shows a system diagram of the software engineering web-based lab simulator. When the user lands in the simulator, the first experiment is "Identify Appropriate Software Process Model for the Given Scenario and Comparison of Basic Software Process Models"; this experiment includes five tasks in the simulator: 1. Rearrange the phases of the Software Development Life Cycle in sequential manner, 2. Classify the Elements of Quality by User, Client & Developer,
Fig. 1 System diagram of software engineering web-based lab simulator

3. Compare types of Software Process Models based on advantages and disadvantages, 4. Analyze the given scenario to identify the appropriate Software Process Model and also design that model, 5. Analyze which model is suitable for the given parameter values, such as Requirement Availability, Budget, Development Time and Reusability of Components. After completing each task, it is mandatory to take a quiz to assess topic understanding. For every correct attempt at the quiz, the user gets a pop-up message showing a reward/award.

3.2 System Diagram See Fig. 1.


4 Implementation The simulation is constructed using programming tools such as HTML, CSS and JavaScript. It includes five tasks in one simulation (topics include Software Development Process Models (SDLC), the Linear Sequential Model, the Prototype Model, the Evolutionary Model and different real-life scenarios). Once users land on the simulator, they can perform the different tasks in the experiment; at the end of every task, the student needs to solve the assessment questions to check topic understanding, and until the student solves the questions correctly, they will not be able to move further. Students can use the PROCEDURE button for an overview of how to perform the task. Different buttons are provided: to evaluate the performed task, students can use the SUBMIT button; the RESET button restarts the task; the NEXT button moves on after completion of the task; and the BACK button goes back. The HINT option can be used in case students find difficulty solving the question. We have introduced a feature where students can view their performance at the end of each experiment, titled "Performance Evaluation" followed by the experiment name, which is also downloadable for further documentation. This helps students analyze their performance and thus gets them involved in improvement. At the end of each experiment, the student can get a detailed analysis of the performed experiment in downloadable PDF form, which can be kept on record by the teacher to assess student performance. A detailed performance evaluation report of the experiment is given in the following figures as a sample.
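Although the simulator itself is implemented in HTML, CSS and JavaScript, the report logic can be illustrated in a language-agnostic way. The following is a minimal Python sketch of assembling a per-task evaluation record (expected time, performance time, hints used, remark); the TaskResult structure, the remark rule, and the sample values are hypothetical, not the simulator's actual code.

```python
from dataclasses import dataclass
from datetime import timedelta


# Hypothetical record of one completed task in an experiment
@dataclass
class TaskResult:
    name: str
    expected: timedelta      # ET: expected time
    performed: timedelta     # PT: actual performance time
    hints_used: int
    correct: bool


def remark(task: TaskResult) -> str:
    # A simple illustrative rule for the per-task remark
    if not task.correct:
        return "Revisit the topic"
    if task.performed <= task.expected and task.hints_used == 0:
        return "Excellent"
    return "Good, can improve speed"


tasks = [
    TaskResult("Rearrange SDLC phases", timedelta(minutes=2, seconds=22),
               timedelta(minutes=5, seconds=50), 1, True),
    TaskResult("Classify elements of quality", timedelta(minutes=3),
               timedelta(minutes=2, seconds=40), 0, True),
]

print("Performance Evaluation - Experiment 1")
for t in tasks:
    print(f"{t.name}: ET={t.expected}, PT={t.performed}, "
          f"hints={t.hints_used}, remark={remark(t)}")
# A PDF version of such a report could then be generated (e.g. with a PDF library)
# for NBA/NAAC documentation, as described in the chapter.
```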

5 Results and Discussion The evaluation of the proposed method is based on the experimental analysis and the performance analysis, detailed in the following sections. Table 1 shows the experimental procedure for performing two tasks.

5.1 Experimental Analysis For the experiment, at the initial level, the proposed lab was provided to 5 students to perform the first experiment, "Identify Appropriate Software Process Model for the Given Scenario and Comparison of Basic Software Process Models," to understand its usefulness. According to the review of the students' comments, the simulator is very interactive; because of the problem- or scenario-based tasks, they understood where the concepts or topics apply in real life; students liked the concept of learning through playing games, such as the card deck pile game used for the task "Classify the Elements of Quality by User, Client and Developer," and the match-the-pairs task in the simulator to "Compare Types of Software Process Models Based
Table 1 Procedure for performing two tasks

What will the simulator do? / What will students do? / Purpose of the task

Step 1: Simulator will start. / Click on the simulation tab. / To start the simulator.

Step 2: The first simulator tab visualizes the title "test case design techniques" in the pane; the user needs to select any five test case design techniques at a time. If more than five test case design techniques are selected, a pop-up is shown: "you cannot select more than five options at a time." After that, click on the EVALUATION button; correct answers are shown in green. After all five correct options are selected, the assessment questions are shown, and the NEXT button directs to simulator tab 2. / According to the question, students select the right test case design buttons and then click on the NEXT button. / Understand the different test case design techniques in software engineering.

Step 3: The second simulator tab provides the title "draw test case for the login page." The user needs to work through the entire sequence of drop-downs, starting from opening the login page to entering the correct username and password. After selecting all the drop-downs, click on the EVALUATION button to evaluate the answers; the assessment questionnaire is shown if all the answers are correct. / According to the question, the user selects the right test case from the drop-down and then clicks on the NEXT button. / Understand test case design in software engineering.

on Advantages and Disadvantages," to understand the difference between the types of models. Furthermore, the incorporation of quizzes and games made learning interesting and increased engagement. Students could gauge their progress and understanding of the topic by attempting the quizzes. Simulator screens for the different tasks are shown in Fig. 2.


Fig. 2 Simulator screens

5.2 Performance Evaluation The proposed virtual lab simulator for assessing the student's performance is evaluated based on time; the analysis considering the expected time and the performance time is depicted in Fig. 3. Figure 3a shows an acquired performance time of 25 min 00 s for performing five tasks. Likewise, Fig. 3b, c show the performance time for 3 and 2 tasks, respectively.

5.3 Discussion The existing virtual simulator labs have challenges such as failure to analyze performance, failure to provide practical skills, and failure to deliver the learning

Fig. 3 Performance evaluation for various tasks (a five tasks, b three tasks, c two tasks)

outcomes, which motivated the proposal of the student assessment evaluation tool. These challenges are overcome by the proposed method, which includes the student assessment evaluation. The method's performance is evaluated based on the performance time and the expected time. The experimental and performance analyses conclude that the proposed method gives access to a remote virtual lab for software engineering with a flexible time and environment. In addition, both learning and teaching can be performed through the proposed method, and the evaluation tool assists in tracking the learner's progress. The ability to download the user's performance in PDF format helps when approaching accreditation bodies like the NAAC and NBA. The analysis of the level reached is evaluated based on Bloom's taxonomy mapping. Besides, the proposed virtual lab simulator incorporates several game-oriented tasks and quizzes to make the lab interesting and engage the students in learning.

6 Conclusion The proposed virtual lab simulator with the student performance assessment criteria enables the evaluation of the learner's performance. The inability of existing virtual lab simulators to evaluate the student's performance and to provide a downloadable report limits their usefulness. Including the teaching and learning
capabilities with virtual practical skills empowers learners to enhance their knowledge of experimentation and the learning process. The proposed virtual lab simulator enhances the learning process over traditional teaching–learning scenarios. The completion of each experiment by the user is evaluated based on parameters like the Expected Time (ET), Performance Time (PT), Hints Used and a Remark about the performance of each task. The student also checks understanding of the topic by attempting quizzes. The software engineering remote lab simulator proposed here can be used as an educational tool by students and as an assessment tool by the teacher. The analysis of the proposed method based on the expected time and the performance time acquired values of 2 min 22 s and 5 min 50 s, respectively. The lab can be extended in the future by adding more experiments from different software engineering topics. In addition, features like faculty interaction and doubt-clearing sessions can be arranged through the virtual laboratory platform. Also, the lab can be converted to an Android application to make it more accessible.

References
1. Education: from disruption to recovery. https://en.unesco.org/covid19/educationresponse
2. COVID-19 educational disruption and response. https://en.unesco.org/news/covid-19-educational-disruption-and-response. Last accessed on June 2022
3. Raut NB, Gorman G (2022) Emergency transition to remote learning: DoIt@ home lab in engineering. In: Learning and teaching in higher education: Gulf perspectives, (ahead-of-print)
4. Bolkas D, Chiampi JD, Fioti J, Gaffney D (2022) First assessment results of surveying engineering labs in immersive and interactive virtual reality. J Survey Eng 148(1):04021028
5. Ng DTK (2022) Online lab design for aviation engineering students in higher education: a pilot study. In: Interactive learning environments, pp 1–18
6. Yang HH, Yin SK (2016) A study of elementary school students' geometric reasoning using digital origami simulation tool. Int J E-Learn Educ Technol Digital Media 2(1):1–8
7. Chevalier A, Bura M, Copot C, Ionescu C, Keyser RD (2015) Development and student evaluation of an internet-based control engineering laboratory. In: Preprints of the 3rd IFAC workshop on internet based control education, pp 4–6
8. Hegedus AS (2013) Remote laboratory: a novel tool for control engineering laboratories, master thesis, Ghent University
9. Liu D, Valdiviezo-Diaz P, Riofrio G, Guamán LRB (2015) Integration of virtual labs into science e-learning—VARE2015. Proc Comput Sci 75:95–102. https://doi.org/10.1016/j.procs.2015.12.224
10. Myneni LS, Narayanan NH, Rebello S, Rouinfar A, Pumtambekar S. An interactive and intelligent learning system for physics education. IEEE Trans Learn Technol 6(3):228–239
11. Achuthan K, Francis SP, Diwakar S (2017) Augmented reflective learning and knowledge retention perceived among students in classrooms involving virtual laboratories. Educ Inf Technol 22:2825–2855. https://doi.org/10.1007/s10639-017-9626-x
12. Ahmed M, Hasegawa S (2014) An instructional design model and criteria for designing and developing online virtual labs. Int J Dig Inf Wirel Commun 4:355–371
13. Ramirez-Ramirez M, Ramirez JM, Fernandez-Samaca L (2013) Interactive animations for learning by playing concepts of control systems, June 2013. In: 2013 21st Mediterranean conference on control & automation (MED). https://doi.org/10.1109/MED.2013.6608779
14. Çelik S (2012) Development of usability criteria for e-learning content development software. Turkish Online J Distance Educ 13:336–345


15. Aljuhani K, Sonbul M, Althabiti M et al (2018) Creating a virtual science lab (VSL): the adoption of virtual labs in Saudi schools. Smart Learn Environ 5:16. https://doi.org/10.1186/s40561-018-0067-9
16. Sandoval ZV (2015) The development of a usability instrument for e-learning in educational settings
17. Ismail I, Permanasari A, Setiawan W (2016) Stem virtual lab: an alternative practical media to enhance student's scientific literacy. Jurnal Pendidikan IPA Indonesia 5:239–246
18. Altalbe A, Bergmann N (2017) The effectiveness of virtual laboratories for electrical engineering students from faculty and student perspectives
19. Radhamani R, Divakar A, Nair AA, Sivadas A, Mohan G, Nizar N, Nair B, Achuthan K, Diwakar S (2018) Virtual laboratories in biotechnology are significant educational informatics tools. In: 2018 international conference on advances in computing, communications and informatics (ICACCI), pp 1547–1551
20. Ojo OM, Owolabi OT (2020) Effects of virtual laboratory instructional strategy on secondary school students' learning outcomes in physics practical. Int J Sci Res Publ 10:10048
21. Zafeiropoulos V, Kalles D (2016) Performance evaluation in virtual lab training, October 2016. In: Conference: the online, open and flexible higher education conference, project: Onlabs
22. College of Staten Island. https://library.csi.cuny.edu/oer/virtuallabs-simulations. Last accessed on July 2022
23. Virtual Labs at Amrita Vishwa Vidyapeetham. https://vlab.amrita.edu/. Last accessed on July 2022
24. Logic Design and Computer Organization Virtual Lab. http://vlabs.iitkgp.ernet.in/coa/. Last accessed on July 2022
25. Jevremovic A, Shimic G, Veinovic M, Ristic N (2017) IP addressing: problem-based learning approach on computer networks. IEEE Trans Learn Technol 10(3):367–378
26. Jena PK (2020) Impact of Covid-19 on higher education in India. Int J Adv Educ Res (IJAER) 5

Sushama A. Deshmukh Sushama Deshmukh is an Assistant Professor in Computer Science and Engineering at MIT, Aurangabad, India. She has been recognized as an Infosys Bronze Partner Faculty. She worked as a member of the national peer reviewer committee for Virtual Lab development, a project by the Ministry of Human Resource Development. She has also guided and reviewed a virtual lab by IIT Kanpur that is successfully deployed and live. She has contributed to the educational domain by developing an educational tool, a Virtual Lab Simulator to perform experiments of Software Engineering, and has also registered a copyright in the Software category for "Software Engineering Virtual Lab with the User's Performance Evaluation Report for the Experiment: Identify Appropriate Software Process Model for the Given Scenario and Comparison of Basic Software Process Models." She has patents and copyrights to her credit. Along with this, she has published a book chapter and papers in international journals. She is a life member of the Computer Society of India and the Indian Society for Technical Education. Her research interests include Virtual Labs, Software Engineering, AI/ML, and Data Management and Security, and she also likes to work in the healthcare domain. She has guided and supervised various sponsored and research projects.


Dr. Geeta Tripathi Dr. Geeta Tripathi is currently working as a Professor in the Department of Computer Science and Engineering, GNITC, Hyderabad, TS, India. She received her PhD in Computer Science Engineering in 2017 from Nagpur University, Maharashtra. She also holds a Masters Diploma in Business Management and in Computer Science and Applications. Apart from her ME, she holds additional Masters degrees in Computer Applications (MCA) and Business Administration (MBA) from Nagpur University. During her teaching experience since 1997 for UG and PG, she has published papers in national and international journals and conferences. She has also published patents and copyrights under IPR. She has worked as a reviewer for journals and conferences. Her research interests include Wireless Networks, Cloud Computing, Machine Learning, and IoT. She has been honored as a Resource Person, Judge, Session Chair and committee member at national and international conferences, workshops, STTPs and project competitions. She is also a member of ACM and a life member of IEI and ISTE.

Analyze Dark Web and Security Threats Samar Ansh

and Satwinder Singh

Abstract The deepest area of data storage, where data mining and data management are not possible without the Tor (network) policy, is known as the dark web. The dark web is a paradise for government- and privately sponsored cybercriminals. In other words, the dark web is known as the underworld of the Internet, used for sponsored and organized cybercrime. In the Tor network, at the entry relay/guard, the user's source IP is replaced with a local IP (i.e., 10.0.2.15) by default, and every user machine ID (IP) is recognized as this local IP (10.0.2.15). A single source IP allocated for each user without collision makes the user anonymous or invisible over the Internet. The Tor browser works similarly to a VPN by default as a means to hide the source IP, but the advantage is that the Tor network's volunteer devices are used as a tunnel to establish communication and offer freedom from surveillance of user activity. The Tor browser offers a circuit (IP route) for user activity, where the circuit allocates an available Tor IP at the exit relay for the user. The dark web uses the same IP at the entry relay around the world, but at the exit relay the IP is different and available based on country. In a dark web network, data transfer takes place as an encapsulated packet/message placed under three layers of different encryption. Six different machine-learning classifiers (Logistic Regression, Random Forest, Gradient Boosting, AdaBoost, K-Nearest Neighbors, Decision Tree) are proposed to find the optimal solution and to analyze security threats performed on the dark web based on the communication protocol and user activity, as data flow and active state. Keywords Cyber threat · Darknet · Dark web · Deep web · Tor · Project onion · VPN

S. Ansh (B) · S. Singh Department of Computer Science and Technology, Central University of Punjab, Bathinda 151401, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_45


1 Introduction The dark web is a tool of the Internet with the most challenging and unpredictable methods, found and used by cybercriminals, terrorists, and sponsored spies to achieve their illicit purposes. The cybercrime that occurs within the dark web is similar to crime in the real world. Its web services, on the other hand, pose major challenges for identifying criminals because of their sheer scale, unpredictable environment, and anonymity. The classification of dark web traffic is crucial for categorizing real-time application execution. Analyzing dark web traffic aids in the early identification and monitoring of malware assaults (detection) before the onslaught, and of harmful actions following the outbreak. Nowadays, cybercriminals also use VPNs to replace original information with forged information for illicit purposes. Different applications (i.e., audio streaming, video streaming, browsing, chat, email, VoIP calls, etc.) and modes are used to threaten and achieve illicit purposes. Dark web (Tor application) traffic is estimated at 13–15% compared with the Chrome and Firefox applications. In the name of privacy, existing users are switching to the Tor browser regularly, approximately 0.5% every year from Firefox/Chrome/Opera and IE. This is very dangerous: switching users are either threatening others or being threatened by someone over the dark web.
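To illustrate how the six proposed classifiers could be compared for traffic classification, here is a minimal scikit-learn sketch; the synthetic feature matrix is a stand-in for flow-level attributes (packet counts, durations, inter-arrival times, etc.), since the chapter's actual dark web traffic data and feature set are not reproduced here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for labelled traffic flows (dark web vs. regular traffic)
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                           random_state=0)

classifiers = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

# 5-fold cross-validated accuracy for each of the six classifiers
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name:20s} accuracy = {scores.mean():.3f} +/- {scores.std():.3f}")
```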

1.1 Deep Web The term "deep web" means the content lying behind, or below, the open or surface web and a few dark webs. Only certain websites and article links may be searched and indexed by search engines. Websites and article links are used to rank search results based on relevance, inbound links, and keywords. Because search engines do not cache such URLs (search engines scan the Internet by viewing one web page, then the links on that page, and then the links on subsequent sites), this data is not returned to the user. Almost every time you search within a website, you are going to a deep Internet page. The deep web makes content reachable only by following links, but it helps to protect users' personal information and maintain privacy. The material is password locked, sits behind a paywall, is proprietary, includes personally identifiable information (PII), or is controlled by law to restrict access (such as email accounts, tax records, payment systems, etc.).


1.2 Dark Web A large volume of data is not indexed by search engines (Google, Bing, Yahoo, etc.) and needs a special application or authorization to be accessed. In other words, information that exists on the Internet (World Wide Web) but needs a special application (Tor) and authorization to access is known as the dark web. Almost 96% of un-indexed data is stored on the dark web, which needs the Tor application to access.

1.3 Tor The Onion Router (Tor) is an open-source application used to access dark web networks, which eventually became a US option. Tor is a browser created by the Naval Research Laboratory (Onion Router). A tunnel-based network (of Tor volunteer computers) is used to establish communication to "hide the user's identity (IP) and to protect the user's Internet privacy." The Tor browser creates a tunnel through volunteer devices (5000) to establish secure communication (a circuit) for the dark web, and users become anonymous on the Internet, hard to monitor or censor. The rest of the paper is organized as follows.

2 Literature Review Definition of the dark web is criminal threats, technical and legal challenges with unknown global network structures where techniques of detection methods, algorithms, and tools are used by criminals on the Internet. On the market of the dark web, transactions taking place must be followed to find out about criminals’ strategic methods. Multilayer structure without indexed, split of the dark web makes it very difficult to detect criminals and crime. Dark web environment much unexpected as daily the old sites continue to disappear while new sites are emerging, strong digital evidence is needed to assemble forensic legal frameworks to ensure victory barriers to arrest and criminal prosecution. Content over the Internet cannot be accessed with standard search engines. Dark web information is usually not available to it most people and is deliberately hidden on the standard Internet browser, known as the “CLEARNET.” Onion Router (abbreviated Tor) is one of the main ways to access the dark web “which connects your internet. Tracks by combining your online traffic with data from multiple servers around the world to make you invisible” [39]. There are lots of opportunities on the dark web (Tor) for malicious players to “exchange illegal assets anonymously.” The dark web is becoming increasingly valuable, particularly in terms of unlawful activities and services. Safety measures should address these issues and take steps to eliminate them. Developing technologies that


Technologies that enable encrypted anonymity (such as the dark web and its specialized applications) have confronted lawmakers and policymakers with the challenge of successfully battling dangerous online actors [37]. The dark web is a recent and still-developing subject of study, and the research available on it is scarce. In this section of the paper, we review work on the dark web, its online black-market user base, online compliance, committed crime, and digital communities.

The dark web is a worldwide network that users access over peer-to-peer secure connections. The Silk Road marketplace, accessible at "http://silkroad6ownowfk.onion", could only be reached via the Tor application and, like other hidden services, used an address ending in the ".onion" root domain. The dark web traces its roots to ARPANET, the Internet's forerunner founded by the Pentagon in 1969; as computer interactions grew, hidden networks emerged alongside ARPANET and extended the range of remote access [25]. These networks eventually became a US government project: the Naval Research Laboratory produced the software (and web browser) called Tor ("The Onion Router"). The Tor network "hides user IP and identity with the help of node-to-node connections to protect American workers overseas and their opponents" [8, 12, 34]. However, in 2014, once the Onion Router (Tor) software had been made available for public use, Tor sites became a haven for criminals and terrorists engaged in drug trafficking, child pornography, and terror activity.

AlQahtani and El-Alfy [14] note that although there are concerns about the dark web because many people access it with illicit intentions, some users have the legitimate goal of running their business online anonymously. So why use the dark web? Generally, it is used for a variety of reasons, such as (a) protection of privacy rights against intended censorship and public scrutiny, (b) protection of dissidents from political retaliation, (c) whistleblowing and news leaks (private information), (d) cybercrime (fraud, cyber-attacks, etc.), and (e) the sale of illegal goods and content (child pornography, private files, illegal or fraudulent software, etc.) on dark web markets [9, 14, 30].

Tor's anonymity is based on a simple mechanism. A request to a site is passed through at least three relays, machines chosen at random from those enrolled in the Tor network [21]. Tor is named for the way an extra layer of encryption is added for each relay, which only that relay can decipher. The request finally reaches a computer known as an "exit relay" (the exit from the Tor network), which is where the receiver believes it originated [6, 10]. This makes users anonymous, because an exit relay forwards traffic on behalf of many different machines (users), and randomizing algorithms dictate which exit relay is used to communicate with the host. More than 7000 computers work as volunteer nodes to relay Tor communication [3]. In effect, the exit relay replaces the originator's identity, which remains hidden among the many layers of Tor. One of the main ways to access the deep web, the Tor browser, "blocks your online tracks by combining your online traffic with data from multiple servers around the world to make you invisible" [39]. As computer interactions began to grow, "the number of private networks [began] to emerge on the side of the ARPANET" [25].


Anonymous-access networks—often referred to as the dark web—are becoming popular with criminals for drug smuggling, illegal weapons trading, fake IDs, identity theft, and child pornography. In other words, the dark web hosts completely illegal online marketplaces offering all kinds of illicit services, driven by hidden and anonymous money laundering. To establish this anonymity, communication is carried out over special anonymity networks such as I2P and the Tor browser [15, 39]. The legal and technological issues, as well as the illicit usage, are investigated further below. The pursuit of anonymity is an important aspect of the deep web's corrupt and uncontrolled character. Across the contexts reviewed in the literature, its elements are destructive, adverse, and highly susceptible to a variety of illegal activities, and they need to be policed or stopped, given the lack of control and weak regulation [35].

An early challenge for dark web users was that hidden web portals were difficult to discover. The Hidden Wiki delivered the first wave of customers in 2004 [5]. This site offers a list of presently operational dark web portals (websites), as well as user comments and information on what can be accessed through each one. Another option is to use the Tor protocol with specific search engines, for instance "Ahmia," which crawls hidden sites in general, and "Grams," which is specifically designed to locate hidden sites offering criminal goods and services such as counterfeit money, narcotics, and firearms [6].

To perform real-world transactions, dark web markets adopted Bitcoin, a pseudonymous currency that is as difficult to trace as Tor traffic, and it became the darknet's first de facto standard currency. Bitcoin and similar virtual currencies, first introduced in 2009, are uninsured and volatile and are kept in encrypted digital wallets. By design, bitcoins are very difficult to trace back to the individual who spent them [7]. The transaction log records every transaction, but it mentions only the wallet ID, not the identity of the buyer or seller. To purchase bitcoins, a consumer logs into a bitcoin exchange, such as the famous "Mt. Gox," where buyers and sellers exchange local currencies for bitcoins [40]. Bitcoins may also be mined by volunteering a computer's CPU time to solve challenging arithmetic puzzles. While Bitcoin is the currency of choice on the dark web today, "Zerocoin" is a cryptocurrency under development intended to be even more anonymous than Bitcoin transactions [3].

The dark web evolved in small steps and was not designed to be what it is today. Tor's developers at the NRL wanted an easy way for military employees to communicate overseas. The Hidden Wiki's founders developed an index so that ordinary users could better understand and browse the dark web's content. Bitcoin was introduced to facilitate anonymous payment. The developers of these technologies had a vision of privacy, not ill intent; nevertheless, their intentions have not stopped unlawful activity from blossoming in the shadows created by the Tor network.
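To make the mining remark above concrete, here is a minimal, illustrative Python sketch of hash-based proof-of-work of the kind miners perform; the block contents and the difficulty of four leading zero digits are hypothetical simplifications (real Bitcoin mining applies double SHA-256 to a binary block header at far higher difficulty).

```python
import hashlib
import itertools

def mine(block_data: str, difficulty: int = 4):
    """Search for a nonce whose SHA-256 digest starts with `difficulty` zero hex digits.

    Toy illustration only: real mining hashes a binary block header with double
    SHA-256 and a dynamically adjusted, vastly higher difficulty target.
    """
    target = "0" * difficulty
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{block_data}:{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce, digest

nonce, digest = mine("wallet-A pays wallet-B 0.5 BTC")  # hypothetical transaction text
print(nonce, digest)
```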


Most Tor users, after all, are simply looking for anonymity and may be using Tor for genuine purposes; only 1.5 percent of Tor users browse dark web content, even though that content generates a great deal of traffic [38]. The problem is that inspection of the dark web and Tor is virtually impossible: one cannot build an application or tool that keeps users anonymous while also tracking their activity to ensure they are not accessing unlawful websites. Tor's developers would like to think that the browser mostly carries the traffic of journalists valiantly filing stories from countries with no laws protecting free speech, but that is not always the case. For many visitors to hidden dark web sites, the Tor application is used to access and distribute images of child abuse and to place orders for illegal drugs. Child abuse accounts for the largest share of dark web network traffic. Dr. Gareth Owen and Nick Savage, researchers at the University of Portsmouth, carried out a six-month study of Tor's hidden services and facilities. They concluded that over 80% of the hidden-service requests observed in the investigation were routed toward known child exploitation websites [36]. They acknowledged that these figures may not be a perfectly accurate picture, since government agencies regularly use computers to automatically access web portals containing images of child abuse as part of their investigations, and it is nearly impossible to tell how much of the 80 percent is due to police activity and how much is traffic generated by a human at a computer. Yet even if half of the child abuse visits detected were police activity, a great deal of personal traffic on the dark web still targets child abuse websites.

At the same time, images of child abuse are not confined to dark web networks. In 2014, as part of an effort to reduce the spread of child abuse content, the Internet Watch Foundation conducted a study: only 51 (approximately 0.2%) of the 31,266 web links featuring child abuse content (photos, video) were dark web portals. (There is no method available to search all dark websites, so the data may be skewed toward a lower-than-accurate percentage of dark web portals containing child exploitation images.) [3]. This suggests that, while the dark web facilitates cybercrime, it is not the most convenient way to commit it. The drug trade is the topic most commonly associated with the dark web, and it is a key component of dark web markets; such sites account for a large share of the hidden portals observed in Dr. Gareth Owen's study of dark web browsing behaviour [36]. These websites are actually easier for law enforcement officials to infiltrate, because officers can better hide their identities when they go undercover. Moreover, the drugs must be physically delivered, which leaves a window open for classic policing to identify the dealers. One of the largest and most notorious dark web marketplaces was Silk Road. It was developed in 2011 by Ross William Ulbricht, who concealed himself behind the alias "Dread Pirate Roberts" (DPR) [6]. It is estimated that DPR collected commissions of over $13 million from allowing vendors to use his Silk Road platform. In October 2013, the FBI closed down Silk Road.


The FBI's investigation determined that more than $1.2 billion in revenue had changed hands, involving roughly 150,000 customers and 4000 vendors [27]. Those astounding numbers show the scale of illegal trade on the dark web. Ulbricht was tried and sentenced to life in prison in May 2015 [6]. Silk Road's closure in 2013 was not the end of dark web markets; in reality, several more sprang up to take its place. These sites are used for more than just selling narcotics—they sell anything that someone is willing to put on the Internet. Just a couple of weeks after the 2013 credit card credential breach at Target, dark web markets were selling stolen credit cards at a price of $20 to $100 per card [6]. There is a clear demand for an online black market, and it will not disappear on its own, so governments must work to adopt policies that address the difficulties posed by the Tor network in an attentive and deliberate manner. The demise of Silk Road also became a story of FBI techniques being less than fully effective: although the operator, Ross Ulbricht, was apprehended, the darknet market for illicit items has flourished since the FBI's shutdown of Silk Road in October 2013. Trade that was once concentrated on Silk Road is now spread across more diverse venues; there is a Reddit black-market community, the list of active sites is significantly broader than the trusted list, and market catalogs are updated to help users judge vendors' credibility [32]. After the shutdown of Silk Road, users flocked to a previously unknown site called the "Sheep Marketplace." This portal dominated the dark web market until a vendor exploited a vulnerability and stole $6 million worth of bitcoins [32]. Silk Road 2.0 was launched by the previous management of the first Silk Road on 6 November 2013, just one month after the closure of the original. Silk Road 2.0's service was short-lived: in February 2014 it was hacked and $2.7 million of users' bitcoins were stolen by a merchant [32]. Silk Road was an online dark web marketplace dealing in illegal drugs, weapons, and more; the Federal Bureau of Investigation (FBI) shut down the Silk Road web portal in 2013, but like the mythical Hydra, portals such as Silk Road 2.0 resurfaced within a month. The FBI failed to detect the new Silk Road immediately, and it took a couple of years to track down its administrator and servers [17]. To this day, markets of this kind remain a first choice for criminals and show no sign of ending; as of May 2016 a darknet market was operational and considered among the most powerful [11]. Therefore, when the government demolished Silk Road, the operation was not a complete success: it did little to deter others from creating new dark web markets, nor did it prosecute vendors or customers for the transactions that took place. As Eric Jardine observes, such dark web operations can be taken down, but other platforms appear and simply take their place [33].


Two distinct strands of dark web activity can be regarded as at least partly constructive, even if controversial: whistleblowing and hacktivism. Whistleblowing is an important part of what keeps democracy in check; however, it can also disclose government strategies and resources when grievances are not raised through formal channels. The dark web was used by whistleblowers including Chelsea Manning, Julian Assange, and Edward Snowden to reveal the secrets of the authorities [27]. Had the official, congressional route been used to raise their grievances, the issues might have been addressed without the public release of divisive records. Hacktivism is another matter that is not black and white: while hacktivists' goals may be controversial, their methods are disruptive and often impactful, but illegal. For example, in October 2011, the hacktivist collective Anonymous crashed a web hosting provider known as Freedom Hosting with a Distributed Denial of Service (DDoS) attack after tracing the signatures of child abuse portals hosted on its servers. The group also stole the details of 1,500 users of "Lolita City," a child abuse website, and published them online [6]. The intention of removing child abuse content from the web is honorable; however, vigilante justice has no place online, since the vigilantes themselves cannot be held accountable. Grabosky remarked in 2001 that cybercrime is not much different from real-world crime—old wine in new bottles, committed with a virtual weapon [18]. If Internet Protocol (IP) addresses cannot be identified, anonymity on the Internet is effectively guaranteed. The Tor client software delivers Internet traffic via a global hidden network of tunnels to hide user information and evade monitoring. This feature of the dark web is especially attractive to cybercriminals, who are always trying to hide their tracks [2, 23]. The dark web is also a well-known channel for exchanging documents privately, allowing journalists and opposition figures to bypass regional surveillance and the control of authoritarian regimes [20]. The Onion Router is a technique for communicating anonymously over a computer network. Messages are encrypted in layers and routed via numerous network nodes known as onion routers. Each onion router, like peeling an onion, strips off one encryption layer to disclose the routing instructions and transmits the message to the next predetermined node, where the same step is repeated until delivery to the destination. This process prevents intermediate nodes from knowing the source, location, and content of a message [25]. It is reasonable to believe that specialized sites facilitate the exchange of both physical goods and private information, e.g., login credentials for paid pornographic websites as well as PayPal credentials [26]. The Assassination Market website lets a group bet on the death date of a particular person and pays out if the date is "correctly" estimated [1]. This encourages murder, since a murderer can profit by placing an exact wager on the time of the victim's death, knowing when the event will take place. Because the offence is knowing the date rather than committing the act itself, it is very difficult to bring a criminal charge for the murder.
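The onion-routing mechanism described above—each relay peeling exactly one encryption layer—can be illustrated with a short Python sketch using the cryptography library's Fernet primitive. This is a conceptual illustration under simplifying assumptions (in-process "relays" and locally generated symmetric keys); it is not Tor's actual circuit construction or cell format.

```python
from cryptography.fernet import Fernet

# One symmetric key per hop of a hypothetical three-relay circuit.
relay_keys = [Fernet.generate_key() for _ in range(3)]

def wrap(message: bytes, keys) -> bytes:
    """Apply the exit relay's layer first and the entry relay's layer last,
    so each relay along the path can peel only its own layer."""
    for key in reversed(keys):
        message = Fernet(key).encrypt(message)
    return message

def route(cell: bytes, keys) -> bytes:
    """Each relay in turn strips one layer and forwards what remains."""
    for key in keys:
        cell = Fernet(key).decrypt(cell)
    return cell

cell = wrap(b"GET http://example.onion/ HTTP/1.1", relay_keys)
print(route(cell, relay_keys))  # the exit relay recovers the original request
```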


White Wolves and C'thuthlu are two well-known websites for hiring assassins [21, 24]. Various types of firearms are offered by the "Euroarms" web portal, which mostly delivers to customers' doors around Europe; the components of these weapons are sold separately, and the site must be located separately on the dark web [4, 5, 22]. On the dark web, paedophilic content, often known as CP (child pornography), is freely available, and a variety of web portals and forums cater to clients who seek it [21]. Like any other technology, anonymity may be used for both good and harm. The vast majority of individuals simply do not want their online persona linked to their offline identity; they may fear political or economic retaliation, persecution, or even death. Rather than defending themselves under their own names, many individuals prefer to speak under fictitious or anonymous identities, and Tor's online anonymity is used in exactly these situations [25, 29]. Monitoring the dark web will continue to be difficult because of its structure and nature, and efforts to address it should concentrate on the areas discussed below [13, 16, 19]. Europol authorities have raised concerns that bitcoins are increasingly being used in illicit operations. Since its emergence in 2014, DDoS "4" Bitcoin (DD4BC), a cybercrime squad named after the Distributed Denial of Service (DDoS) attack, has hit more than 140 firms, inspiring other gangs and fuelling cyber extortion. According to Europol, the DD4BC organization threatens victims by email with DDoS attacks unless a bitcoin ransom is paid. The emergence of Bitcoin has coincided with the growth of cyber-criminals on the dark web [28]. Crawlers/spiders are designed to collect dark web content. Spiders gain access to password-protected websites and download files broadly; they are configured to download all of a site's "PDF, Word, HTML, PHP, CGI, links, images, ASP files, videos, and audio." Around 15 forum-capture programs are used, and their format is determined by the spidering forum utility. A comprehensive forum comprises topics, authors, threads, postings, and timeline tags that allow members' interactions to be reconstructed. Forums are re-spidered periodically, and incremental updates are refined according to research requirements. Using computer-assisted language approaches, forum content is gathered and analyzed in English, Arabic, French, Chinese, and Spanish, and specific techniques for collecting multimedia files from websites during spidering have also been developed [31].
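The spidering approach described above can be sketched in a few lines of Python; the seed URL, the breadth-first strategy, and the file-extension filter are assumptions made for illustration, and crawling real onion services would additionally require routing requests through a local Tor SOCKS proxy and handling authentication.

```python
import urllib.parse
from collections import deque

import requests
from bs4 import BeautifulSoup

def spider(seed_url: str, max_pages: int = 20):
    """Breadth-first crawl that follows links and records downloadable file URLs."""
    seen, queue, files = {seed_url}, deque([seed_url]), []
    while queue and len(seen) <= max_pages:
        url = queue.popleft()
        try:
            page = requests.get(url, timeout=10)
        except requests.RequestException:
            continue  # dark web sites appear and disappear; skip dead links
        soup = BeautifulSoup(page.text, "html.parser")
        for tag in soup.find_all("a", href=True):
            link = urllib.parse.urljoin(url, tag["href"])
            if link.lower().endswith((".pdf", ".doc", ".docx", ".php", ".asp")):
                files.append(link)          # candidate document to download later
            elif link not in seen:
                seen.add(link)
                queue.append(link)
    return files

print(spider("http://example.onion/forum/"))  # placeholder seed URL
```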

3 Problem Statement As their names suggest, the dark web and deep web remain a mysterious corner of the Internet, and the research papers available on them are few. Criminals are able to commit crimes largely because the Tor browser hides the user's location and identity. This article helps to detect and characterize applications used over the Tor browser and VPNs.


In 2016, the Canadian Institute for Cybersecurity (CIC) conducted research to detect and characterize applications used over the Tor browser and VPNs. In the past two years, activity over the dark web has increased by approximately 300%, and in 2020 alone more than 22 billion new records were added to the dark network. In dark web forums, an estimated 10% of users post resources as cybercrime sellers, while an estimated 90% of posts come from buyers looking to contract services. This work proposes novel techniques to detect, analyze, and characterize VPN and Tor applications together, using the public CICDarknet2020 dataset as a realistic representation of dark web and VPN traffic. Tor and VPN traffic are both covered in this dataset, with details of the user application. At the same time, we do not know the number of active users on the dark web or their modus operandi (applications) for illicit activity. A major part of this paper is therefore to collect information about the following: (a) the traffic load on the dark web (Tor browser), (b) traffic redirected through VPN services, and (c) applications used with neither the Tor browser nor a VPN. The article covers a wide area of dark web activity and security threats for security providers.

4 Objectives The dark web was designed and developed by the US government with the idea of invisibility on the Internet, and it remains a mystery to everyone other than US law enforcement agencies, while cybercriminals use it to perform illicit activity. The objectives of this work are: (a) to collect peer-reviewed research papers from various security platforms; (b) to extract features from the peer-reviewed dataset of scientific papers; and (c) to analyze and characterize dark web traffic and activity using deep image learning.

5 Methodology The "CICDarknet2020" dataset is a combination of ISCXVPN2016 and ISCXTor2017 and is publicly available as dark web traffic data from the Canadian Institute for Cybersecurity. The dataset consists of 141,530 instances, of which 46,782 are labeled VPN, 1392 are labeled Tor, and the rest are labeled Non-Tor/Non-VPN (open Internet). It offers 85 features, including source and destination IPs and port numbers, which characterize packet flows and data volumes. The "CICDarknet2020" dataset was preprocessed into the desired format and then divided into two parts: testing (20%) and training (80%).


The data was gathered and prepared for analysis and characterization to detect services (communication) based on source and destination IP. These attributes of the prepared dataset can be divided into five groups:

(a) Attributes based on Tor.
(b) Attributes based on VPN.
(c) Attributes based on Non-VPN/Non-Tor.
(d) Attributes based on source and destination IP.
(e) Attributes based on the protocol.

The first group (Tor) covers anonymous, non-routable IP-based peer-to-peer communication between users; a Tor IP can appear as either the source or the destination address. The fifth group serves to detect and analyze the type of application running, based on the port number and protocol of the communication. Tor (dark web) instances are the rarest in the dataset, while the Non-VPN/Non-Tor (open Internet) group has the largest number of instances.
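The preprocessing and 80/20 split described above can be sketched as follows; the CSV file name, the label column, and the presence of non-numeric columns (such as raw IP addresses, which would need encoding or dropping before modelling) are assumptions about how the public CICDarknet2020 release is laid out.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("Darknet.csv")          # assumed file name of the public release
df = df.dropna().drop_duplicates()       # basic cleaning into the "desired format"

X = df.drop(columns=["Label"])           # flow features (ports, durations, byte counts, ...)
y = df["Label"]                          # Tor / VPN / Non-Tor / Non-VPN

# 80% training, 20% testing, stratified so the rare Tor class appears in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42
)
print(y_train.value_counts())
```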

6 Machine-Learning Models Machine-learning models are classified as either supervised or unsupervised, and supervised models fall into two categories: regression and classification. In the following sections we apply various classifiers (Logistic Regression, Random Forest, Gradient Boosting, AdaBoost, K-Nearest Neighbors, and Decision Tree) to characterize and categorize the traffic. The detection accuracy of these models is evaluated and the best among them is selected. The SMOTE technique is used to balance the data and strengthen the minority class of dark web users in the dataset.
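Continuing from the split sketched above (and assuming the feature columns are numeric), the model comparison and SMOTE balancing could look like the following; the class names, hyperparameters, and accuracy-only evaluation are illustrative choices, not the authors' exact configuration.

```python
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Oversample the minority (dark web) class in the training split only.
X_bal, y_bal = SMOTE(random_state=42).fit_resample(X_train, y_train)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=200),
    "Gradient Boosting": GradientBoostingClassifier(),
    "AdaBoost": AdaBoostClassifier(),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(),
}
for name, model in models.items():
    model.fit(X_bal, y_bal)
    print(name, accuracy_score(y_test, model.predict(X_test)))
```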

7 Results and Discussion This section outlines the findings of model building with a variety of machine-learning methodologies; the bagged decision tree (BAG-DT) model gave the best-optimized result. Processing the "CICDarknet2020" dataset showed that actual Tor (dark web) users make up 9% of the 141,530 sampled flows, while non-Tor users lead with 61%, as shown in Fig. 1; the applications used over the Tor network are shown in Fig. 4. Every type of communication uses TCP to establish connections and deliver data reliably, so the share of TCP is higher than that of UDP, but within the VPN and non-VPN classes UDP is widely used to transfer (download) data, as shown in Figs. 2, 3 and 4. Tor (the dark web) is known for its illicit markets (illegal software and data selling).


Fig. 1 All internet traffic

Fig. 2 Applied protocol

Fig. 3 UDP protocol

Comparing UDP-based file transfer activity in the "CICDarknet2020" dataset shows, surprisingly, that non-VPN users account for 48% of it while the dark web accounts for just 1% of total file transfer activity, as shown in Fig. 3. The analyzed data also shows that Tor browser users access the TCP, UDP, and HOPOPT protocols, as listed in Table 1. The dark web portion of the dataset was further analyzed to understand and explain activity over the Tor network. Eight different types of activity were found among the applications that can run over the dark web.


Fig. 4 Dark web users’ activity

Table 1 Protocol applied over the dark web

TCP (%)    UDP (%)    HOPOPT (%)
93         5          2

Dark web users mostly access VOIP services (21%), which makes VOIP the highest-traffic activity, with Internet surfing in second position over the Tor network. Even though the Onion (Tor) network is surveillance-free, email traffic is the lowest (1%) within Tor. The lowest share for email and the highest for VOIP suggest that users do not fully trust the Tor network and avoid sharing storable data (email 1%, chat 5%) over the dark web. Based on this activity profile, we presume that cybercriminals browse illegal content while avoiding storable communication so as not to leave any trail or footprint.
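The percentages discussed in this section can be reproduced with simple pandas aggregations, assuming the dataset exposes a traffic-type label and the numeric IANA protocol field (6 = TCP, 17 = UDP, 0 = HOPOPT); the file and column names are assumptions about the public CSV release.

```python
import pandas as pd

df = pd.read_csv("Darknet.csv")  # assumed file and column names

# Share of Tor / VPN / Non-Tor / Non-VPN flows (cf. Fig. 1).
print((df["Label"].value_counts(normalize=True) * 100).round(1))

# Protocol mix inside the Tor flows only (cf. Table 1): 6 = TCP, 17 = UDP, 0 = HOPOPT.
tor_flows = df[df["Label"].str.lower() == "tor"]   # assumes the Tor class is labelled "Tor"
print((tor_flows["Protocol"].value_counts(normalize=True) * 100).round(1))
```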

8 Conclusions This paper has presented a broad picture of dark web criminal threats, the unknown network architecture, and the detection methods, algorithms, and tools available, along with the technical and forensic hurdles posed by the techniques criminals use on the dark web. Cybercriminals need ways to operate unseen within the dark web, yet active dark web users amount to only about 1.4% of the traffic, so the stakes of monitoring them are tremendous. Finally, the fact that law enforcement and security agencies operate within national borders is one of the biggest obstacles to acting against a hidden web whose size demands more effective ways to reduce potential threats. Black markets and the transactions taking place on them must be tracked with modern procedures to discover offenders. The unindexed, fragmented, multilayered formation of the dark web makes it difficult to observe changes. The dark web ecosystem is highly unpredictable: old sites keep disappearing daily as new sites emerge, so strong digital evidence must be gathered by law enforcement agencies to overcome the obstacles to arresting and prosecuting criminals.


This study proposes, models, implements, evaluates, and reports on an efficient autonomous dark web traffic detection system (DTDS). The proposed system characterizes the performance of supervised machine-learning techniques, including decision tree ensemble models, evaluated on a modern and inclusive dataset (CIC-Darknet-2020) that contains a large number of captured cyber-attacks and dark web services organized into four classes (VPN, Tor, non-VPN, non-Tor).

References 1. Abbasi A, Chen H (2007) Affect intensity analysis of dark web forums, presented at the IEEE international conference on intelligence and security informatics, May 2007 2. Akhoondi M, Yu C, Madhyastha HV (2012) LASTor: a low-latency AS-aware tor client, presented at the IEEE symposium on security and privacy, May 2012 3. Clemmitt M (2016) The dark web. Accessed 30 Aug 2016. http://library.cqpress.com/cqresearcher/document.php?id=cqresrre2016011500 4. Alipoaie A, Shortis P (2015) From dealer to doorstep—how drugs are sold on the dark net, GDPO situation analysis, Swansea University, Global Drugs Policy Observatory, Swansea, U.K., Technical Report 5. Darknet Markets Are Not beyond the Reach of Law (2016) Accessed 30 Aug 2016. https://darkwebnews.com/darknet-markets/darknet-not-beyond-law/ 6. Finklea K (2015) Dark web. Accessed 30 Aug 2016. https://www.fas.org/sgp/crs/misc/R44101.pdf 7. Biddle P, England P, Peinado M, Willman B (2003) The Darknet and the future of content protection. In: Feigenbaum J (eds) Digital rights management. DRM 2002. Lecture notes in computer science, vol 2696. Springer, Berlin, Heidelberg 8. https://anshchoudhary.wordpress.com/2017/03/14/therise-and-challenge-of-dark-net-drugmarkets/ 9. Bischoff P (2018) How to access the deep web and darknet. https://www.comparitech.com/blog/vpn-privacy/ 10. Butler S (2018) Dark web history. https://www.technadu.com/dark-web-history/52017/ 11. Jaishankar K (2016) Int J Cyber Criminol (IJCC) 10(1):40–61. ISSN: 0973-5089, Jan–June 2016. https://doi.org/10.5281/zenodo.58521 12. Rathod (2017) Darknet forensics. Int J Emerg Trends Technol Comput Sci (IJETTCS) 6(4):077–079. ISSN 2278-6856 13. Senker (2016) Cybercrime & the dark net: revealing the hidden underworld of the internet. Arcturus Publishing, London. ISBN 9781784285555 14. AlQahtani AA, El-Alfy E-S-M (2015) Anonymous connections based on onion routing: review and a visualization tool. Proc Comput Sci 52:121–128 15. Arash H, Lashkari G, Kaur A, Rahali A. https://ieeexplore.ieee.org/document/9251210 16. Darknet traffic big-data analysis and network management to real-time automating the malicious intent detection process by a weight agnostic neural networks framework 17. Mac R (2014) Feds shutter illegal drug marketplace silk road 2.0, arrest 26-year-old San Francisco programmer. Forbes, Nov 6 18. Grabosky P (2001) Virtual criminality: old wine in new bottles? Soc Leg Stud 10:243–249 19. Ciancaglini V, Balduzzi M, Goncharov M, McArdle R (2013) Deepweb and cybercrime: it's not all about TOR. Trend micro research paper. October 20. Gehl RW (2014) Power/freedom on the dark web: a digital ethnography of the dark web social network. New Media & Society, Oct 15. http://nms.sagepub.com/content/early/2014/10/16/1461444814554900.full#ref-38


21. Greenberg A (2013) Meet the 'assassination market' creator who's crowd funding murder with bitcoins. Forbes, Nov 18 22. Love D (2013) There's a secret internet for drug dealers, assassins, and pedophiles. Business Insider, Mar 6 23. Paganini P (2012) The good and the bad of the deep web. Security Affairs, Sept 17 24. Pocock Z (2014) How to navigate the deep web. Critic 03 25. Tor Project (2014) Tor: overview. www.torproject.org/about/overview.html.en 26. Westin K (2014) Stolen credit cards and the black market: how the deep web underground economy works. LinkedIn, Aug 22 27. Rudesill DS, Caverlee J, Sui D (2015) The deep web and the darknet 28. PGP encryption [online]. Retrieved Dec 16, 2019, from https://www.technadu.com/pgpencryption-dark-web/57005/ 29. J Comput Commun 7:30–43 (2019). ISSN Online: 2327-5227 30. Chen H, Yang C (2008) Intelligence and security informatics. Springer, Berlin 31. Willard P, Bellamy C (2012) Principles of methodology. SAGE, London 32. Swearingen J (2014) A year after death of silk road, darknet markets are booming. Accessed 30 Aug 2016. https://finance.yahoo.com/news/death-silk-road-darknet-markets-142500702.html 33. Jardine E (2015) The dark web dilemma: tor, anonymity and online policing. Accessed 4 Dec 2016. https://www.cigionline.org/sites/default/files/no.21.pdf 34. Godawatte K, Raza M, Murtaza M, Saeed A (2019, Dec 5–7) Dark web along with the dark web marketing and surveillance [paper presentation]. PDCAT 2019, Gold Coast, Australia 35. González P, Guitton C (eds) (2013) Fingerprinting tor. Inf Manage Comput Secur:73–90 36. Owen G, Savage N (2015) The tor dark net. Accessed 13 Dec 2016. https://www.ourinternet.org/sites/default/files/publications/no20_0.pdf 37. Beshiri A, Susuri A (2019) Dark web and its impact in online anonymity and privacy: a critical analysis and review. J Comput Commun 7:30–43. https://doi.org/10.4236/jcc.2019.73004 38. Ward M (2014) Tor's most visited hidden sites host child abuse images. Accessed 30 Aug 2016. http://www.bbc.com/news/technology-30637010 39. Hodson (2014) Invisible internet. https://www.sciencedirect.com/science/article/pii/S0262407914605935 40. Yellin, Pagliery, Aratari. Darknet market. https://doi.org/10.1080/23738871.2017.1298643

Deep Convolutional Neural Networks-Based Market Strategy for Early-Stage Product Development Gladson Maria Britto James, Abolfazl Mehbodniya, Anto Bennet Maria, Julian L. Webber, D. Stalin David, Rajasekar Rangasamy, and Sudhakar Sengan Abstract New Product Development (NPD) is essential for staying competitive; it is seen as one of the finest techniques for coping with the quick changes confronting the goods market. Researchers have adjusted NPD measurements so that research variables can reflect their impact on the firm, but they have not attempted to measure NPD capability itself. The measurement component is crucial for giving proper control over product development operations. A web-based tool is employed for monitoring the growth of NPD skills in the field. If the organization has identified an essential learning function, it can set long-term objectives. G. M. B. James Department of Computer Science and Engineering, Malla Reddy College of Engineering, Secunderabad, Telangana 500100, India e-mail: [email protected] A. Mehbodniya · J. L. Webber Department of Electronics and Communication Engineering, Kuwait College of Science And Technology (KCST), Doha, Kuwait e-mail: [email protected] J. L. Webber e-mail: [email protected] A. B. Maria Department of Electronics and Communication Engineering, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, Tamil Nadu 600062, India e-mail: [email protected] D. Stalin David Department of Information Technology, Vel Tech Multi Tech Dr. Rangarajan Dr. Sakunthala Engineering College, Chennai, Tamil Nadu 600072, India e-mail: [email protected] R. Rangasamy Department of Computer Science and Engineering, GITAM School of Technology, GITAM University, Bengaluru, Karnataka 561203, India e-mail: [email protected] S. Sengan (B) Department of Computer Science and Engineering, PSN College of Engineering and Technology, Tirunelveli, Tamil Nadu 627152, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_46


The proposed networks are employed in two phases, and the output of the second Neural Network (NN) model is the value of the NPD capability level. Test and training data are utilized to select the NN. One technique for coping with uncertainty and risk in an NPD process is understanding NPD success in terms of Critical Success Factors (CSFs). The relationship between the selection of key metrics and the accumulation of factors contributing to success is needed in order to train a neural network to accurately predict the results of an NPD project. The input data were acquired from a corporation in the electronics sector. Some of the companies studied have adopted a formal new product development process and some have not; however, these practices may change. The findings suggest that NN technology is strong enough to predict successful NPD in the electronics sector. Keywords Deep learning · Critical success factors · Neural network · New product development · Test and train data

1 Introduction NPD is fundamental to the success of industrial enterprises. To improve the competitiveness of an enterprise, delivering a new product that meets the customer's request at the right time can increase profit, and firms must keep developing new items to keep pace with rapid market changes. However, NPD is an unpredictable process that consumes money and a great deal of time; it has been estimated that around 33% of all new industrial products fail, causing losses to the organization. It is hard to foresee the success of NPD, since some of the determining factors are subjective and depend on many elements, and the problem is exacerbated by a lack of data. The NN has become a helpful tool for handling some of these issues: it has been used to reason over incomplete or uncertain information and over both known and unknown patterns, and it has been applied in many ways, from pattern recognition to climate modelling, for prediction and classification problems [1]. Given available data on each of these issues, such problems can be tackled. Here, an NN investigates the connection between the performance of the success factors and the outcome of the NPD. The model is based on the different dimensions of the balanced scorecard, and it is used to predict the success of new product development, giving companies general guidance for anticipating the probability that a new product will succeed. Capability can be measured in two ways: the first approach is direct, gauging capabilities such as quality and return from basic data gathered through surveys and case studies from which organizational skills can be assessed; the second approach is indirect.


Many analysts following the second approach attempt to assess capability indirectly. Mansour uses group-discussion techniques together with statistical analysis, and product development capability has also been determined and investigated using the analytic hierarchy process. NPD should not focus solely on concept, strategy, and process, but also on the progress of the mass-production cycle at the time of product launch, the improvement of sales reconfiguration and development updates, the development of the organization's pipeline of upcoming products, and the adoption of a continuous improvement model at each stage. Other work has measured products using the fuzzy analytic hierarchy process, the measurement cycle, and management capability techniques. In these methods, given the weight calculation of management capability, it may not be possible to exercise a specific control over the firm's NPD capability [2]. To handle this, together with measures of time and sufficient complexity, the method should be adaptable to practical limits; the measurement approach should be easy to use, and to refresh the relative importance of different factors it must be able to store past information. This study focuses on the electronics industry as a principal industrial sector of Thailand. The data on the NPD success factors are derived from the authors' past work. To participate in new product development and product improvement processes, however, respondents must have been able to re-design products, so the data are partitioned into two categories [3]. The paper is organized as follows: Sect. 1 is this introduction, explaining why NPD is fundamental to industrial enterprises. Section 2 presents related work. Section 3 describes the proposed method. Section 4 presents the results and a discussion of the correct-prediction proportions for success and the MSE. Section 5 concludes and outlines future work.

2 Related Work The design and development of new products is a complex process with various steps. To increase the likelihood of success, many experts and researchers have proposed structures for the NPD process [4]. Most models are presented as linear processes, which do not reflect actual application. Moreover, because of the complexity of today's products, a sequential NPD process cannot keep up with the pace of rapid market change [5]. To solve these problems, concurrent engineering emerged, bringing people from different fields together in one team: instead of sequential processing, NPD activities are carried out simultaneously or in overlap. In particular, the manufacturing process is taken into account during product design. Through cooperation among NPD, sales, and other team members on the new product, lead time can be improved alongside other dimensions such as quality and cost. A large amount of literature exists on the success and failure of NPD.


Some of the success factors have been identified in different contexts: business attitude, NPD activities, and organizational change and implementation method. For example, a study of concurrent engineering in the Hong Kong electronics industry identified four significant factors; using benefits and costs as the measure of CE success, management attitude was considered the most crucial factor for NPD success [6]. Another study on product innovation management was conducted through interviews with senior management at eight companies engaged in NPD activities. It concluded that the common factors for NPD success are a clear vision, good leadership, attention to customer needs, collaboration among NPD team members, and effective engagement with key customers and suppliers [7]—in other words, good communication. In addition to studies of NPD success, the causes of failure have been studied; some studies report how low the percentage of successful NPD projects is. Based on the lessons learned from the failure of 15 projects at seven South Korean carriers, the main factors leading to NPD failure were concluded to be invalid market research, inaccurate forecasts of product demand, and an inadequate response to customer requests. To measure NPD success, 21 performance indicators were defined along the four perspectives of the balanced scorecard: financial, customer, internal, and learning and growth. With particular emphasis on the electronics industry, a set of data investigated in previous studies and extracted from the industry is used in the subsequent analysis, described in the following section. The NN is an intelligent technology commonly used for pattern recognition, classification, and control-system problems. It is based on a model of the human brain and can be viewed as relational inference; an NN learns to solve a problem by repeatedly adjusting its weights [8]. An NN consists of neurons connected by links, each link having an associated weight value. Each neuron computes a weighted sum of its input links and compares the result with a threshold value, and the output of each neuron depends on its activation function. Step (hard-limit), linear, and sigmoid functions are commonly used as NN activation functions: the step function is used for most classification and pattern-recognition problems, the linear function for linear estimation problems, and the sigmoid function in backpropagation problems. The amount of data is limited in many applications. To improve network generalization, the available data should be separated into three groups—training, validation, and test sets—with the training set used for network learning. The network's ability depends on several factors: the network structure, the amount and quality of model inputs, data transformation, and model validation. The network architecture includes the number of hidden layers and the number of nodes used in each layer. If the number is insufficient, the network may not learn enough to solve the problem; if the number is too large, it may require more computation time, and the network might not generalize to new data [9].
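The weighted-sum-and-activation computation just described can be written in a few lines of NumPy; the inputs, weights, and bias below are arbitrary illustrative values.

```python
import numpy as np

def neuron(x, w, b, activation):
    """Weighted sum of inputs plus bias, passed through an activation function."""
    return activation(np.dot(w, x) + b)

step = lambda z: np.where(z >= 0.0, 1.0, 0.0)     # hard-limit (step) function
linear = lambda z: z                               # linear function
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))       # sigmoid function

x = np.array([0.4, 0.7, 0.1])    # example inputs (e.g. scaled indicator values)
w = np.array([0.9, -0.3, 0.5])   # example link weights
b = -0.2                         # bias, i.e. the negative of the threshold

for name, act in [("step", step), ("linear", linear), ("sigmoid", sigmoid)]:
    print(name, float(neuron(x, w, b, act)))
```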


3 Material and Method The proposed method covers the network input variables, the design of the NN structure, training, the data entered for testing and verification, and the indicator selection obtained from a company in the electronics industry across the financial, customer, internal, and learning-and-growth perspectives [10]. It uses the performance indicators of these four areas, the 21 success factors that follow from them, and 36 key balanced-scorecard items for NPD. Figure 1 describes an example NN trained with appropriate data from the case industry. Selecting the data required for training is significant for the NN; the amounts of training and test data are also essential and determine the accuracy of the results obtained. Industrial data are used to train the NN and to obtain the results. The utility function method and the fuzzy analytic hierarchy process are used to calculate the company's capability [11–13]. The processing pipeline follows six steps; an illustrative sketch in code follows the list below.

Step 1. Start
Step 2. The input image (starting point)
Step 3. Convolution layer (cycle operation)
Step 4. Pooling layer (pool)
Step 5. Flattening into the input layer of an artificial NN
Step 6. End
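The six steps above describe a standard convolution–pooling–flatten–dense pipeline. The Keras sketch below is one way to express it; the input size, layer widths, and single success/failure output are illustrative assumptions, since the paper does not specify the exact architecture.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Shapes and widths are illustrative; the paper does not fix them.
model = keras.Sequential([
    layers.Input(shape=(64, 64, 3)),           # Step 2: the input image
    layers.Conv2D(32, 3, activation="relu"),   # Step 3: convolution ("cycle operation")
    layers.MaxPooling2D(2),                    # Step 4: pooling layer
    layers.Flatten(),                          # Step 5: flattening feeds the artificial NN
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),     # e.g. predicted NPD success probability
])
model.compile(optimizer="adam", loss="mse", metrics=["accuracy"])
model.summary()
```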

The capability of each function is measured at the Micro-Level (M-L), and the capability score for each M-L function is calculated from data collected from the company's records. Multiple regression analysis is then used to integrate these functional capabilities into the company's overall NPD capability. Training is carried out in two stages: the first stage calculates the M-L capability scores, and the second stage aggregates them into the NPD capability at the macro-level. In the first stage, a separate network can be developed for each function—R&D, collaboration, and the 16 internal and outsourced functions (seven of them internal) [14, 15].

Fig. 1 Training and testing of NNs


Fig. 2 Analysis of the functions and variables

3.1 Analysis of Functions and Variables in NPD Measures for each function can be obtained using a survey. The questionnaire, addressed to NPD, outsourcing, and R&D managers (top level), was sent to many manufacturing companies collaborating in India. Responses came from forty-six companies, and the identification of development functions for each product was based on the responding functions and variables. The functions selected at the macro-level are the internal functions, the outsourced (commissioned) functions, R&D capability, and collaboration capability. Figure 2 describes the R&D function; elsewhere, developing radical, innovative products may require collaboration capabilities established through formal agreements, the introduction of technology, and the development of new technologies related to the product. Companies differ in some product development activities, such as hiring specialized technical knowledge or engaging other companies. At the M-L, the selected product development functions include market research, product design, process design, product analysis, sales control and technical assistance, and maintenance. Activities with these characteristics can sometimes be done in-house, while others are outsourced. The responses showed effectiveness and reliability at both the M-L and the macro-level using the function and variable analysis. Several variables that contribute to specific capabilities were identified from the responses for each selected activity. Some variables are concrete in nature, while others are quite abstract. These variables were ranked by the respondents in order of importance on the given scale [16, 17].

3.2 New Product Development Process NPD is the complete function of bringing new products to market. The process transforms a product and its sales service by opening up market opportunities for NPD. Figure 3 shows that, in a competitive environment of demand and customers, continuous training and strategy greatly sharpen the understanding of customers' needs and desires, so that new products are developed in a standardized way that conveniently expands market share further.


Fig. 3 NPD process

Concept Development and Testing: Ideas that pass the screening stage are developed into a detailed product concept, on paper, in sales and engineering terms. In essence, the concept describes the target market, the product's advantages, features, and attributes, and the proposed selling price. Similarly, the concept must include an estimate of the cost of competing brands in the selected market and of manufacturing the product. Once the concept has been developed, it needs to be tested by asking many potential customers to evaluate its feasibility and market appeal. Business Analysis: This NPD step assesses the general cost, sales potential, and anticipated profitability of the product concept. It is expected to estimate the industry position, profitability, and product gaps through analysis of the overall market potential, market size, and growth rate, as well as sales expectations and forecasts. The main purpose of this analysis is to identify ideas that are clearly feasible and financially viable to execute and to set aside those that are not. Marketing Strategy Development: Once a concept is rated feasible, it is a good candidate for developing a marketing strategy. In its basic form, at the time of introduction, this stage produces a proposal covering the marketing, product, price, distribution, and promotion strategy. Because these strategies may change with the dynamics of the environment, it is appropriate to note that they should be flexible. Product Development: This is the stage at which a physical or working example of the concept is produced. If the product is a vehicle, the organization will build a prototype embodying all the elements of the refined design concept; if it is a service, a complete service package is developed and a demonstration is prepared for trial. Test Marketing: To confirm the concept or obtain other kinds of product-test results, the company conducts interview sessions with participants, for instance at an expo, under feasible and economical conditions. The company can then use the test results to make the necessary adjustments to the planned marketing strategy.


However, since competitors could imitate the product and come up with their own version, the company must not expose its plans and must be extra careful in testing. NN Training and Validation: The commercially available MATLAB version 7.0 NN toolbox is used to operate the NN. During preprocessing, the data are normalized into a range suited to the transfer function used for training, here −0.8 to 0.8. The performance indicators of the training and validation data sets and the success factors are presented to the network during training, and the parameters of the network training function are set to minimize error. The Mean Square Error (MSE) is used as the training performance goal and is set to 0.5; an outcome that misses this target, as depicted in Fig. 3, is referred to as "failing." The next step is to compute the success rate and failure rate as the percentages of accurate predictions made for NPD outcomes; the actual MSE of the predictions against their targets is also computed.
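The paper carries out this training and validation in the MATLAB 7.0 NN toolbox; the sketch below reproduces the same workflow—min–max scaling into the assumed −0.8 to 0.8 range, a small feed-forward network trained against an MSE goal, then percent-correct and MSE on held-out data—in Python with scikit-learn, using synthetic data as a stand-in for the company's 21 performance indicators.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X = rng.random((200, 21))                         # synthetic stand-in for 21 indicators
y = (X.mean(axis=1) > 0.5).astype(float)          # 1 = successful NPD, 0 = failure

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
scaler = MinMaxScaler(feature_range=(-0.8, 0.8)).fit(X_train)   # assumed normalization range

net = MLPRegressor(hidden_layer_sizes=(5,), max_iter=5000, random_state=0)
net.fit(scaler.transform(X_train), y_train)

pred = net.predict(scaler.transform(X_test))
print("MSE:", mean_squared_error(y_test, pred))
print("Percent correct:", 100 * np.mean((pred > 0.5) == (y_test > 0.5)))
```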

4 Result and Discussion Different network structures yield different proportions of correct success predictions and different MSE values. For data group 1, a single hidden layer with five nodes gives the best prediction. The same percentage of correct predictions can be obtained from a two-layer model, but with a larger MSE. For the second group, two hidden layers with 20 and 5 nodes give the best prediction in terms of both the proportion of correct predictions and the MSE. Analysis of the NPD and MSE: as performance measures, the RMSE and the percentage deviation are within the specified range. Even though the data vary dramatically, the network can be trained using a single hidden layer; Figure 4 shows that using multiple intermediate layers does not bring a visible benefit in model accuracy. Verification results show that the MSE at this stage is within 6% and can decrease as the number of input variables increases; at the test stage it is within 2% and inside the target range. Therefore, it can be concluded that the NN has been trained to a good predictive ability. NN Training and Testing Model: Determining the amount of training and test data is essential for obtaining good results, and industrial data are used to train the NNs and to obtain the results. The utility function method and the fuzzy analytic hierarchy process are used to calculate process capability. The number of nodes in the hidden layer determines the network's capacity to learn complex relationships, and more than one hidden layer can be used to enhance the network's learning capability. Figure 5 describes an analysis of the NN training and testing model.


Fig. 4 NPD and MSE

Fig. 5 NN training and testing

The NN model delivers high performance in both training and testing, and these two processes are central to NPD prediction.

5 Conclusion and Future Work NPD is vital for the success of an organization, yet the NPD process, even when commercially successful, is resource-intensive. Therefore, predicting the success of NPD is beneficial to all companies engaged in developing new products. This work has introduced the vital success factors of NPD and reported on predicting its performance. An NN is used to model the relationship between the predicted CSFs and the performance evaluation indices so that NPD can be run correctly. Alongside the network architecture—the number of nodes and the type of transfer function used in each layer—data preprocessing, dataset clustering, model training, validation, and prediction processing are all important to network performance. The results show that, once well trained, backpropagation feed-forward NN models can predict the success of NPD.


The percentage of correct success predictions for NPD appears to be high. The amount of questionnaire data available for the variables of interest, together with the quality of that data, determines the accuracy of the predicted results.

References 1. Gaham M, Bouzouia B (2009) Intelligent product-driven manufacturing control: a mixed genetic algorithms and machine learning approach to product intelligence synthesis. In: XXII international symposium on information, communication and automation technologies, pp 1–8 2. Gu Y, Shen L, Song W, Zhang Y (2021) Simulating a production system as an agent-based model: a case study of a gear reducer factory. In: 3rd international symposium on robotics & intelligent manufacturing technology, pp 198–201 3. Lee YC, Myung H (2019) Hierarchical sampling optimization of particle filter for global robot localization in pervasive network environment. ETRI J 41(6):782–796 4. Leitão P, Restivo F (2006) ADACOR: a holonic architecture for agile and adaptive manufacturing control. Comput Ind 57:121–130 5. Liang X, Ding Y, Wang Z, Hao K, Hone K, Wang H (2014) Bidirectional optimization of the melting spinning process. IEEE Trans Cybern 44(2):240–251 6. López M, Martín J, Gangoiti U, Armentia A, Estévez E, Marcos M (2018) Supporting product oriented manufacturing: a model driven and agent based approach. In: IEEE 16th international conference on industrial informatics, pp 133–139 7. Marik V, Lazansky J (2007) Industrial applications of agent technologies. Control Eng Pract 15(11):1364–1380 8. McFarlane D, Sarma S, Chirn JL, Wong CY, Ashton K (2003) Auto ID systems and intelligent manufacturing control engineering application of. Artif Intell 16(4):365–376 9. Noh S, Park J (2020) System design for automation in multi-agent-based manufacturing systems. In: 20th international conference on control, automation and systems, pp 986–990 10. Shen W, Norrie DH (2001) Dynamic manufacturing scheduling using both functional and resource related agents. ICAE 8(1):17–30 11. Xiang W, Lee HP (2008) Ant colony intelligence in multi-agent dynamic manufacturing scheduling. Eng Appl Artif Intell 21:73–85 12. Sudhakar S, Chenthur Pandian S (2016) Hybrid cluster-based geographical routing protocol to mitigate malicious nodes in mobile ad hoc network. Int J Ad Hoc Ubiquitous Comput 21(4):224–236 13. Sudhakar S, Chenthur Pandian S (2012) Secure packet encryption and key exchange system in mobile ad hoc network. J Comput Sci 8(6):908–912 14. Sudhakar S, Chenthur Pandian S (2013) A trust and co-operative nodes with affects of malicious attacks and measure the performance degradation on geographic aided routing in mobile ad hoc network. Life Sci J 10(4s):158–163 15. Sudhakar S, Chenthur Pandian S (2015) Investigation of attribute aided data aggregation over dynamic routing in wireless sensor. J Eng Sci Technol 10(11):1465–1476 16. Sudhakar S, Chenthur Pandian S (2013) Trustworthy position-based routing to mitigate against the malicious attacks to signifies secured data packet using geographic routing protocol in MANET. WSEAS Trans Commun 12(11):584–603 17. Priyadarshni AU, Sudhakar S (2015) Cluster based certificate revocation by cluster head in mobile ad-hoc network. Int J Appl Eng Res 10(20):16014–16018

Calorie Count of a Fruit Image Using Convolutional Neural Network B. Kirananjali, M. Himasai, T. Lakshmi Sujitha, and Y. Kalyan Chakravarti

Abstract Intake of healthy food is very important, and people can use technology to help them decide what to eat in order to stay healthy and energetic, and to choose the right food. Knowing the calorie count of the food they consume lets people make better decisions about what to eat. Hence, in this paper we build a model for detecting food calories using the Convolutional Neural Network (CNN) algorithm, a deep learning technique within machine learning. This type of neural network lets us extract more accurate representations of the visual content. The neural network-based model identifies the food in a picture and shows its estimated calorie content. To accomplish this, we used an existing dataset made up of 100 × 100 pixel images from various classes. The Convolutional Neural Network model achieved an accuracy of 0.91. Keywords Convolutional neural network (CNN) · Deep learning (DL) · Food recognition · Machine learning (ML)

1 Introduction Both adults and children are affected by the worldwide obesity problem [1]. Excessive food intake combined with a lack of exercise results in obesity [2]. As a result, there is a need to measure diet precisely. People are often unable to assess or regulate their daily calorie intake due to a lack of nutritional knowledge, inconsistent eating habits, or a lack of self-control. Because of the convenience of internet access, food is now delivered to our doorsteps at the touch of a button. Various programs for calculating food intake have been created to avoid the erroneous data produced by manual logging. To address these issues, a number of e-health techniques apply image processing to determine the number of calories in a particular food item.

B. Kirananjali · M. Himasai · T. Lakshmi Sujitha · Y. Kalyan Chakravarti (B) Velagapudi Ramakrishna Siddhartha Engineering College, Vijayawada, Andhra Pradesh, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_47


The problem statement is "Based on the detected food image, estimate the calories of that particular food item." Our application detects fruit calories. We chose the CNN architecture because it can classify and recognize images with high accuracy. The application's user is required to supply the input data, and a machine learning algorithm then calculates the fruit's calories from it. Before the application can detect fruits, it has to be trained on the given fruit images; once trained, predictions can be made. This project is intended to introduce software technology into the field of health care. The main objective of our work is to study the different layers of the neural network, which classify the image and reduce dimensionality to obtain higher accuracy. Every day, people are confronted with many varieties of fruit to eat. An accurate fruit calorie detection model can help an individual decide what to eat and when to eat it, and assists them in selecting the appropriate food. Everyone can make better decisions with an application that determines food calories accurately. The rest of the paper is organized as follows: Sect. 1.1 covers the related work; Sect. 2 describes the dataset, including parameters such as density, calorie and shape of the fruit, along with the design methodology, the algorithm and the parameters used. The experimental results are presented in Sect. 3, and observations and conclusion are provided in Sects. 4 and 5, respectively.

1.1 Related Work In [1], a CNN model for measuring food calories is proposed. The system calculates food calories by taking an input picture of the food placed on a plate, and its primary objective is to find several foods in one image. To achieve higher accuracy on multi-food images, the authors built a prototype using the Inception v3 model, which extracted features from all photos in the dataset and stored them as weights. In [2], calorie prediction of food intake using a generalized regression neural network is presented. The project used machine learning and image processing to calculate food calories: the object is detected with a CNN and the image is segmented with the GrabCut algorithm so that it can be recognized properly. After the user inputs a food image, the system classifies the inputted food item and outputs a result that correctly identifies the given food image. In [3], Food Recognition and Calorie Measurement using Image Processing and Convolutional Neural Network, the purpose of the article is to provide the user


with an efficient, knowledgeable, and accurate method that helps them develop self-awareness regarding their calorie consumption. The authors used a particularly inventive mix of graph-cut segmentation, and the experimental findings of the system are presented in that work. They merged deep neural networks and edge-detection-based classification; combining these two techniques can increase the precision of food classification. In [4], Real Food Size Estimation from Images for Accurate Calorie Counting, experimental findings on estimating caloric intake from digital photographs are reported. The Generalized Regression Neural Network (GRNN) was chosen because it trains more quickly than traditional feedforward networks; the paper details the experimental setup, the food picture database, and GRNN training and testing, with processing done on a multi-core, high-performance machine. In [5], Deep Learning Neural Network for Measuring Food Calories, the authors examined three of their earlier works and suggested two new ones. The most useful strategy at the moment is DepthCalorieCam; however, expanding the system to large-scale categories requires huge calorie-annotated 3D food volume data, which is highly expensive and time-consuming to collect. The reported correlation coefficient is more than 0.9. In their upcoming work, they intend to combine actual size estimation with food segmentation to calculate food calories from captured photos while taking the 2D size of meals into account. In [6], A Food Detection and Calorie Estimation System Based on Convolutional Neural Networks, the objective was an app that effectively detects food items and calculates their caloric content. For both feature extraction and picture recognition they employed a Convolutional Neural Network, a state-of-the-art deep learning technique. After the user inputs a food image, the algorithm classifies the food item and shows the expected food item's name and the number of calories it contains; the user then sees a pop-up message asking them to confirm the predicted meal item.

2 Proposed Work For the purpose of detecting fruit images and calculating the number of calories present, many models have been developed by examining calories in food items together with an individual's daily dietary information and calorie intake. A number of related methodologies and algorithms have been developed for this task, and CNN has been identified as the best approach for categorization, giving successful results. Dataset A total of more than one lakh images are present in the dataset, which we collected from the "Fruit Dataset Deep Learning Project With CNN" project [7].


Table 1 Calorie dataset

Fruit        Density   Calorie   Label   Shape
Apple        0.60      52        1       Sphere
Banana       0.94      41        2       Cylinder
Grapes       0.64      89        3       Sphere
Cherry       0.64      16        4       Sphere
Watermelon   0.51      40        5       Sphere
Orange       0.48      47        6       Sphere
Papaya       0.48      18        7       Oval
Pear         0.43      19        9       Oval
Strawberry   0.34      23        8       Sphere

The dataset was divided into sixty-seven thousand training images and twenty-two thousand testing images. It was mounted from Google Drive and loaded in Google Colab. We used a CNN model to identify the images and to estimate the calories present in a given image [8]. First we worked with two classes of our dataset, and then split it again into training and validation sets: out of the approximately 67,000 images we used 6241 photos for the training set and 3120 images for the validation set, covering 24 different classes. We used nine different fruits in this project, namely apple, banana, grapes, cherry, watermelon, orange, papaya, pear, and strawberry, as shown in Table 1 together with their details such as calories, labels, and densities. This training data table is used to generate the testing data for the project. Design Methodology The suggested work design is illustrated in Fig. 1. The dataset consists of images from different classes and categories. Test and train data are separated from the dataset; the model is trained on the training data before being put into application, and the test data is used to evaluate it. The model uses layers such as Conv2D, MaxPooling2D, Activation, Dropout, Flatten, and Dense. After each convolution, a pooling layer is used to reduce dimensionality so that the model performs better and gives more accurate results. Data preprocessing is used to improve the model's performance. To begin preprocessing, we imported several libraries, including Matplotlib (pyplot as plt), glob, os, TensorFlow, and NumPy (as np). Our entire dataset was uploaded to Google Drive in a folder called "food 360", which contains two sub-folders named train and validation. Finally, we resized the images so that they were 100 pixels wide and 100 pixels high, and we used our Inception regression model to extract features after resizing the images.
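A minimal sketch of this loading and preprocessing step, assuming a recent TensorFlow version and the "food 360" folder layout with train and validation sub-folders described above (the folder path and batch size are illustrative, not taken from the original code):

```python
import tensorflow as tf

IMG_SIZE = (100, 100)   # images resized to 100 x 100 pixels, as described above
BATCH = 32              # illustrative batch size

# Assumed layout: food360/train/<class>/*.jpg and food360/validation/<class>/*.jpg
train_ds = tf.keras.utils.image_dataset_from_directory(
    "food360/train", image_size=IMG_SIZE, batch_size=BATCH)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "food360/validation", image_size=IMG_SIZE, batch_size=BATCH)

# Scale pixel values to [0, 1] before feeding the CNN
rescale = tf.keras.layers.Rescaling(1.0 / 255)
train_ds = train_ds.map(lambda x, y: (rescale(x), y))
val_ds = val_ds.map(lambda x, y: (rescale(x), y))
```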


Fig. 1 Architecture of the proposed method

We created five convolutional layers with a 3 × 3 kernel size in the suggested approach. Each convolutional layer is followed by a pooling layer to reduce the size of the image while still retaining spatial invariance, which lowers the computational cost over the entire CNN network. ReLU (rectified linear unit) is the activation function used in the deep learning model [9]; it makes training more efficient and produces sparse activations. Figure 1 describes the architecture of the Convolutional Neural Network (CNN); image recognition is one important application of this kind of deep neural network. Hidden layers are used to extract and learn the characteristics of the training data, while fully connected layers are used to classify the images; a CNN is made up of these basic units, both hidden and fully connected layers. The training procedure is as follows. Step 1: Assign random weights to each link to start the computation. Step 2: Using the inputs and the input-to-hidden links, compute the activation values of the hidden nodes. Step 3: Compute the activation values of the output nodes from the hidden node outputs and their links. Step 4: Compute the error at the output nodes and recalibrate all the links between the hidden and output nodes. Step 5: Using the weights and the error found at the output nodes, propagate the error back to the hidden nodes. Step 6: Recalibrate the weights between the hidden and input nodes. Step 7: Repeat the cycle until the model converges. Step 8: Using the final link weights, score the performance of the output nodes.
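A minimal Keras sketch of the kind of network described above: five 3 × 3 convolutional layers, each followed by max pooling, with ReLU activations, dropout and a dense softmax classifier. The filter counts and the number of output classes are illustrative assumptions rather than values reported here:

```python
from tensorflow.keras import layers, models

NUM_CLASSES = 24  # assumption: one class per fruit category in the validation split

model = models.Sequential([layers.Input(shape=(100, 100, 3))])
# Five convolution + pooling stages with 3 x 3 kernels and ReLU activations
for filters in (16, 32, 64, 64, 128):              # illustrative filter counts
    model.add(layers.Conv2D(filters, (3, 3), activation="relu", padding="same"))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
model.add(layers.Flatten())
model.add(layers.Dropout(0.3))                      # dropout, as listed among the layers used
model.add(layers.Dense(128, activation="relu"))
model.add(layers.Dense(NUM_CLASSES, activation="softmax"))

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```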


3 Discussion on Experimental Investigations Parameters The parameters used in the neural network are as follows. (1) Kernel function:

Filter size = n \times m    (1)

The kernel is simply a filter used to extract features from the images. It is a matrix that moves over the input data, performs the dot product with the corresponding sub-region of the input, and produces the matrix of dot products as output. Either a hyperbolic tangent or a sigmoid function can be used as the activation function that maps a neuron's input to its output:

\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}    (2)

(2) Size of the output tensor of a ConvLayer. The size of the output image is

O = \frac{I - K + 2P}{S} + 1    (3)

The parameters "I" and "O" specify the size of the input and output images, respectively, "K" the size of the kernels used in the ConvLayer, "N" the number of kernels, "S" the stride of the convolution operation, and "P" the padding. Description of Requirements and Tools The requirements for the project are a Python 3 IDE such as IDLE, Jupyter Notebook, or Google Colab. Python produces understandable and compact code; its simplicity lets developers create reliable solutions even though machine learning and AI are built on complex algorithms and adaptable workflows, so developers can concentrate on solving the ML problem rather than on the technical details of the language. Because it is easy to learn and its code is easily read by humans, Python also facilitates the development of machine learning models. The packages used are: 1. TensorFlow, an open-source library for deep learning applications that also supports conventional machine learning. Tensors, which are multi-dimensional arrays [10], are the only type of data that TensorFlow handles; multi-dimensional arrays are useful for working with vast volumes of data. 2. Keras, a high-level deep learning API for neural networks created by Google. It is written in Python, is designed to simplify the development of neural networks, and can run its computations on several backends. 3. Pandas, a popular open-source Python package used for data science, data analysis, and machine learning tasks. It is built on top of NumPy, which supports multi-dimensional arrays.
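As a quick numeric check of Eq. (3), a small hypothetical helper that computes the output size of a convolutional layer:

```python
def conv_output_size(i: int, k: int, p: int, s: int) -> int:
    """Output size O = (I - K + 2P) / S + 1 from Eq. (3)."""
    return (i - k + 2 * p) // s + 1

# Example: a 100-pixel input, 3 x 3 kernel, no padding, stride 1 -> 98-pixel output
print(conv_output_size(100, 3, 0, 1))  # 98
```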

4 Results and Observations Figure 2 plots the class names against their respective image counts in the original dataset. The training accuracy increases with the number of epochs; because fewer photos are placed in the validation dataset, the validation accuracy and loss change less steadily than the training accuracy and loss. The model classifies the images, reduces dimensionality for better performance and trains the CNN; after detecting the image, it identifies the object from the detected edge lines and estimates the calories. The obtained training and validation loss are shown below. The two graphs display the variation of loss and accuracy with respect to the epochs: Figs. 3 and 4 visualize the training and validation loss and the accuracy of the model. The user-provided fruit picture is successfully recognized by the model, which displays the classified image and then shows the calorie value of the given image, calculated internally by the CNN model. This helps the user keep track of the calories in the quantity of food they consume. Figure 5 shows the recognized image, found to be similar to an image in the training dataset used to build the CNN model.

Fig. 2 Plotting between names and amounts of dataset of original size


Fig. 3 Train and validation loss for CNN model
Fig. 4 Graphical representation of accuracy for CNN model

Fig. 5 Recognized image and respective calories



5 Conclusion and Future Work In this paper, we proposed a machine learning approach for measuring the calories of a fruit from an input image. Machine learning is an important decision-support tool for fruit calorie detection, including decisions on how much to eat given one's health condition. This project helps every individual reduce the problems of obesity and maintain good health. In the future, the project can be expanded by creating a website [11] that detects and displays the calorie count of a fruit and, based on the customer's health and total daily intake, suggests whether it is safe to eat [12].

References
1. Hu H, Zhang Z, Song Y (2020) Image based food calories estimation using various models of machine learning, pp 1874–1878. https://doi.org/10.1109/ICMCCE51767.2020.00411
2. Ayon SA, Mashrafi C, Yousuf A, Hossain F, Hossain MI (2021) FoodieCal: a convolutional neural network based food detection and calorie estimation system, pp 1–6. https://doi.org/10.1109/NCCC49330.2021.9428820
3. Pouladzadeh P, Kuhad P, Peddi SVB, Yassine A, Shirmohammadi S (2016) Food calorie measurement using deep learning neural network, pp 1–6. https://doi.org/10.1109/I2MTC.2016.7520547
4. Gunawan T, Kartiwi M, Malik N, Ismail N (2018) Food intake calorie prediction using generalized regression neural network, pp 1–4. https://doi.org/10.1109/ICSIMA.2018.8688787
5. Haynos A (2017) Assessment of dietary intake/dietary restriction. https://doi.org/10.1007/978-981-287-104-6_158
6. Ege T, Ando Y, Tanno R, Shimoda W, Yanai K (2019) Image-based estimation of real food size for accurate food calorie estimation, pp 274–279. https://doi.org/10.1109/MIPR.2019.00056
7. Reddy VH, Kumari S, Muralidharan V, Gigoo K, Thakare B (2019) Food recognition and calorie measurement using image processing and convolutional neural network, pp 109–115. https://doi.org/10.1109/RTEICT46194.2019.9016694
8. Tanno R, Ege T, Yanai K (2018) AR DeepCalorieCam V2: food calorie estimation with CNN and AR-based actual size estimation, pp 1–2. https://doi.org/10.1145/3281505.3281580
9. Shimoda W, Yanai K (2016) Foodness proposal for multiple food detection by training of single food images, pp 13–21. https://doi.org/10.1145/2986035.2986043
10. Zhu F, Bosch M, Khanna N, Boushey C, Delp E (2015) Multiple hypotheses image segmentation and classification with application to dietary assessment. IEEE J Biomed Health Inform 19:377–88. https://doi.org/10.1109/JBHI.2014.2304925
11. Tanno R, Okamoto K, Yanai K (2016) DeepFoodCam: a DCNN-based real-time mobile food recognition system, pp 89–89. https://doi.org/10.1145/2986035.2986044
12. Todd L, Wells N, Wilkins J, Echon R (2017) Digital food image analysis as a measure of children's fruit and vegetable consumption in the elementary school cafeteria: a description and critique. J Hunger Environ Nutri 12:1–13. https://doi.org/10.1080/19320248.2016.1275996

A Novel Adaptive Fault Tolerance Algorithm Towards Robust and Reliable Distributed Applications to Reuse System Components Lalu Banothu, M. Chandra Mohan, and Charupalli Sunil Kumar

Abstract Distributed component-based systems rely on components that run on different, geographically distributed servers. In large distributed applications, faults are common due to complexity, heterogeneity and a host of other reasons, so realizing a reliable and fault-tolerant distributed system is challenging. Existing techniques such as N-version programming and modular redundancy have drawbacks such as increased complexity and error-proneness. In this paper, we propose an algorithm known as Adaptive Fault Tolerance (AFT), which evaluates the different functions or components in the system and assigns weights to them, besides following a novel redundancy strategy. The algorithm exploits function calls and call frequencies to determine the importance of functions, which helps identify which functions are vulnerable and which are reliable. AFT thus helps achieve fault-tolerant distributed component-based applications comprising untrusted components. Experimental results showed that AFT outperforms many existing fault-tolerant approaches. The proposed algorithm therefore helps realize robust and reliable distributed applications that reuse components and drive businesses. Keywords Software engineering · Component-based distributed system · Fault tolerance · Adaptive fault tolerance · Software implemented fault tolerance

1 Introduction Distributed component-based applications are made up of different software components located in any server across the globe. The components are essentially interoperable and thus they can be reused in different applications as well. In other words, L. Banothu (B) · M. Chandra Mohan Professor, Department of Computer Science and Engineering, JNTUH College of Engineering, Hyderabad, India e-mail: [email protected] C. Sunil Kumar Professor & Dean in CSE, Apollo University, Chittoor, Andhra Pradesh, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_48


a distributed component-based application is made up of several components. Those components need not be in-house; some of them might be hired from third parties. Therefore, there is heterogeneity in the components in terms of location, platform and ownership. When such components are used in distributed applications, it is important to ensure that the components work as expected. In reality, however, there have been instances that point to possible faults in the distributed system for many reasons. Moreover, the components are not under the control of a single organization, as they come from different parties. Due to such complexity in component-based distributed systems, it is challenging to have a fault-tolerant strategy that ensures smooth functioning of the application at hand. The literature contains many fault-tolerant techniques for distributed applications. A fault tolerance algorithm for cloud services is explored in [1], while fusion-based fault-tolerant algorithms are defined for sensor data in [2]. A fault tolerance strategy for Software Defined Networks (SDN) is proposed in [3], and component-ranking-based fault tolerance strategies are discussed in [4]. Fault tolerance combined with energy efficiency through dynamic voltage and frequency scaling (DVFS) technology is explored in [6]. In some works, such as [7, 11], the fault-tolerant methods defined have led to load balancing as well. A distributed publish-subscribe system with fault tolerance is proposed in [10]. Other important contributions found in the literature include a Fault Detection Algorithm (FDA) for dynamically serving components [13] and a multi-level checkpoint-based approach [15]. From the literature, it is observed that existing techniques such as N-version programming and modular redundancy have drawbacks such as increased complexity and error-proneness. To overcome this problem, in this paper an Adaptive Fault Tolerance (AFT) algorithm is proposed. Our contributions in this paper are as follows. 1. We propose an algorithm known as Adaptive Fault Tolerance (AFT). This algorithm evaluates different functions or components in the system and assigns weights to them, besides following a novel redundancy strategy. The algorithm exploits function calls and call frequencies to determine the importance of functions, which helps identify which functions are vulnerable and which are reliable. 2. A ranking algorithm is implemented in order to determine the significance of the functions associated with different components. 3. A distributed component-based system is built with several components, and the performance of the proposed algorithm is evaluated. The remainder of the paper is structured as follows: Sect. 2 reviews existing methods for fault tolerance in component-based distributed software systems. Section 3 presents the proposed methodology and underlying algorithms. Section 4 presents the results of the empirical study. Section 5 concludes the paper and gives directions for future work.


2 Related Work This section reviews the literature on existing fault-tolerant methods for distributed component-based systems. Al-Jaroodi et al. [1] proposed an algorithm for a distributed system offering cloud services; it is delay-tolerant with minimal fault-discovery overhead, results in load balancing, and brings coordination among servers to achieve fault tolerance in task execution. Ao et al. [2] proposed different algorithms for sensor data fusion in a distributed environment for fault tolerance, with fusion approaches that meet requirements based on the kind of data sensed. Botelho et al. [3] focused on a fault-tolerant data store for Software Defined Networks (SDN) and proposed a consistent fault-tolerant algorithm for controlling SDN-based procedures. Zheng et al. [4] proposed a fault-tolerant algorithm for cloud-based distributed applications based on two component-ranking approaches: the first exploits component invocation frequencies and structures, while the second fuses system structure information. Qu and Xiong [5] proposed a replication algorithm for storage efficiency in the cloud that is fault-tolerant and resilient besides being highly efficient. Xie et al. [6] proposed a fault-tolerant approach for scheduling in distributed embedded applications, designed to be energy efficient by using dynamic voltage and frequency scaling (DVFS) technology. Balasangameshwara and Raju [7] proposed a fault tolerance approach known as AlgHybrid_LB for distributed environments, which also results in load balancing of resources in grid computing. Gao et al. [8] investigated fault-tolerant control and fault diagnosis and discussed different fault-tolerant approaches that contribute to stable distributed applications. Kwon and Tilevich [9] proposed a fault-tolerant and energy-efficient approach for distributed applications involving mobile systems, where fault-tolerant offloading of tasks in a mobile cloud environment brings efficiency. Ramachandran et al. [10] defined a fault-tolerant method for a distributed publish-subscribe system with persistence based on blockchain storage. Gao et al. [11] explored different fault-tolerant techniques and fault diagnosis models for distributed component applications, investigating both signal-based and model-based approaches. Di Fatta et al. [12] defined a fault-tolerant approach to the clustering task in a distributed environment. Yadav and Sidhu [13] proposed a fault-tolerant method for replica management in distributed environments; their algorithm, named Fault Detection Algorithm (FDA), dynamically serves a number of components in a distributed setting. Kumari and Kaur [14] focused on large-scale distributed applications in the cloud by defining fault-tolerant checkpointing algorithms that allow the applications to run with fault tolerance in place. Qiang et al. [15] proposed a method known as "Distributed-application oriented Multi-Level Checkpoint/Restart (CDMCR)", which focuses on fault tolerance at multiple levels. From the literature, it is observed that existing techniques such as N-version programming and modular redundancy have drawbacks such as increased complexity and error-proneness. To


overcome this problem, in this paper, an Adaptive Fault Tolerance (AFT) algorithm is proposed.

3 Proposed System This section presents the proposed methodology and the underlying algorithm for fault tolerance in a distributed component-based system. The proposed system is based on a case-study application known as the Distributed Reservation System (DRS), whose system model is presented in Fig. 1. The system is implemented with the Java distributed component technology known as Remote Method Invocation (RMI). It has many server components that are remotely located and used by the client application. Since the components work in a distributed environment, the system is a suitable candidate for fault tolerance research. It provides reservation of all travel-related services. The server-side components include the Car Server, Flight Server, Room Server and Middleware Server. These server components, which run in different geographical locations, are part of a client application; therefore, the application needs a fault-tolerant approach to the usage of the components. The system model is used to identify the different functions and to ensure that the proposed fault tolerance algorithm keeps the application functioning smoothly.

Fig. 1 The system model (in the Distributed Reservation System, an RMI client sends requests to the Car, Flight, Room and Middleware server components and receives responses; the AFT algorithm sits between them to provide reliable and fault-tolerant services)


The functions of the server components are mapped, and the frequency of their invocations is measured, based on a function ranking approach. As presented in Fig. 2, the functions of the server-side components are shown with their invoke and invoked dynamics with respect to the other components. Function ranking is used to identify the most vulnerable functions that need fault tolerance. The function ranking measure is based on the widely used PageRank [16] algorithm and makes use of the data presented in Table 1. A function graph, which is a weighted directed graph, is generated to reflect the function invocation dynamics. A node v_i in the graph denotes a function, and e_ij denotes a link from node v_i to node v_j. A non-negative weight ω(e_ij) is associated with the link and is computed as in Eq. 1:

\omega(e_{ij}) = \frac{f(e_{ij})}{\sum_{e_{kj} \in IN(v_j)} f(e_{kj})}    (1)

Fig. 2 Graphical representation of the functions and their invocation dynamics in the system


Table 1 Functions and their invocation dynamics

Function id   Function ranking   Invoke numbers   Invoked numbers
F1            0.31929            2                2
F2            0.1970             2                1
F3            0.1797             1                2
F4            0.1569             1                2
F5            0.0841             3                1
F6            0.0624             0                1
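For illustration, the sketch below shows how rankings of this kind can be computed with a PageRank-style iteration over the invocation graph; it anticipates Eqs. (2)–(6) derived next. The edge-weight matrix and the damping factor d = 0.85 are assumptions chosen for the example, not values taken from the paper:

```python
import numpy as np

# Illustrative weight matrix D with D[i][j] = w(e_ij); following Eq. (1), the
# weights of the edges entering each function sum to 1.
D = np.array([
    [0.0, 0.0, 0.6, 0.4, 0.0, 0.0],   # v1 calls v3 and v4
    [0.0, 0.0, 0.4, 0.0, 0.0, 0.0],   # v2 calls v3
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],   # v3 calls nothing (handled below)
    [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],   # v4 calls v2
    [1.0, 0.0, 0.0, 0.6, 0.0, 1.0],   # v5 calls v1, v4 and v6
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],   # v6 calls nothing
])
n = D.shape[0]
# A function with nothing to call spreads a uniform weight of 1/n to every function
for i in range(n):
    if D[i].sum() == 0:
        D[i] = 1.0 / n

d = 0.85                      # damping factor, assumed value in (0, 1)
w = np.full(n, 1.0 / n)       # initial function weights
for _ in range(100):          # repeat until the ranking values stabilise
    w = (1 - d) / n + d * (D.T @ w)

print(np.round(w / w.sum(), 4))   # normalised function rankings
```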


In the same fashion, the weight of function v_i is obtained by summing, over its incoming edges, the edge weight ω(e_ki) multiplied by the weight of the calling function v_k. The weight ω(v_i) is computed as in Eq. 2:

\omega(v_i) = \sum_{e_{ki} \in IN(v_i)} \omega(e_{ki}) \times \omega(v_k)    (2)

Afterwards, the ranking is computed for each function v_i as given in Eq. 3:

\omega(v_i) = \frac{1-d}{n} + d \sum_{e_{ki} \in IN(v_i)} \omega(v_k)\,\omega(e_{ki})    (3)

where d is the damping factor, with a value between 0 and 1, which is useful for adjusting the derived significance values. A vector of the function weights is needed for further processing; it is written as in Eq. 4:

W = (\omega(v_1), \omega(v_2), \ldots, \omega(v_n))^{T}    (4)

Similarly, D, the matrix reflecting the invocation dynamics, is given in Eq. 5:

D = \begin{pmatrix} \omega(e_{11}) & \omega(e_{12}) & \ldots & \omega(e_{1n}) \\ \omega(e_{21}) & \omega(e_{22}) & \ldots & \omega(e_{2n}) \\ \ldots & \ldots & \ldots & \ldots \\ \omega(e_{n1}) & \omega(e_{n2}) & \ldots & \omega(e_{nn}) \end{pmatrix}    (5)

In the case where v_i has nothing to call, it is assumed that the value 1/n is assigned to ω(e_{i1}) through ω(e_{in}). Therefore, in vector form, the equations can be rewritten as in Eq. 6:

(\omega(v_1), \omega(v_2), \ldots, \omega(v_n))^{T} = \frac{1-d}{n} + d\, D^{t}\, (\omega(v_1), \omega(v_2), \ldots, \omega(v_n))^{T}    (6)

where D^{t} denotes the transposed matrix. This operation is repeated until the ranking values become stable. The proposed strategy for fault tolerance is known as Adaptive Fault Tolerance (AFT). As per the function ranking algorithm, the strategy considers the various functions in the distributed system and assigns a suitable number of components using the function ranking. It takes care of distributing the required number of


components that are equivalent, to achieve system reliability. The number of equivalent components varies, as some components might fail. If a majority of the components agree on a result, the assigned task is considered completed; if not, the degree of confidence in the result is decreased. The proposed algorithm considers the gap between the results and the majority result and determines the required reliability. The degree of confidence is found using Bayes' theorem. Consider that m components return one result, each being correct with probability r, and n components return another result, with probability 1 − r. In this case, C(r, m, n) denotes the probability that the result from the m components is correct while the result of the n components is not. Then, according to Bayes' theorem, C(r, m, n) is computed as in Eq. 7:

C(r, m+j, n+j) = \frac{r^{m+j}(1-r)^{n+j}}{r^{m+j}(1-r)^{n+j} + (1-r)^{m+j} r^{n+j}} = \frac{r^{m}(1-r)^{n}}{r^{m}(1-r)^{n} + (1-r)^{m} r^{n}} = C(r, m, n)    (7)

If the components obtain m + n results and the confidence is satisfied, the components, and hence the system as a whole, are reliable. Then, based on m, Eq. 8 follows:

P(p(x) \ge 0.5) = \binom{m+2n}{m+n} p(x)^{m+n}(1-p(x))^{n} \left[ \binom{m+2n}{m+n} p(x)^{m+n}(1-p(x))^{n} + \binom{m+2n}{n} p(x)^{n}(1-p(x))^{m+n} \right]^{-1} = \frac{p(x)^{m+n}(1-p(x))^{n}}{p(x)^{m+n}(1-p(x))^{n} + p(x)^{n}(1-p(x))^{m+n}} = \frac{p(x)^{m}}{p(x)^{m} + (1-p(x))^{m}}    (8)

From the above equation, it is concluded that the result holds whenever p(x) ≥ 0.5, and that it is identical for all n.
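A small numeric check of Eqs. (7) and (8), with an illustrative per-component reliability r:

```python
def confidence(r: float, m: int, n: int) -> float:
    """Degree of confidence C(r, m, n) from Eq. (7): the probability that the
    result agreed by m components is correct when n components disagree."""
    num = r ** m * (1 - r) ** n
    return num / (num + (1 - r) ** m * r ** n)

# The confidence depends only on the margin m - n: adding j agreeing and
# j disagreeing executions leaves it unchanged, as Eq. (7) states.
print(confidence(0.9, 3, 1))          # ~0.9878
print(confidence(0.9, 3 + 2, 1 + 2))  # same value, ~0.9878
```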


C(r) = \sum_{n=0}^{\infty} (m + 2n) \cdots    (9)

For obtaining system reliability, m + n functionally equivalent components are executed; there will be one result, while n components obtain other results. The cost factor is then computed as in Eq. 9, where the reliability of a component is denoted as r.

As presented in Algorithm 1, it follows the proposed strategy with adaptive fault tolerance approach. The system reliability is based on the fault tolerance strategy applied.

4 Experimental Results Experiments are made with different strategies associated with the proposed AFT algorithm. Different fault-tolerant approaches followed with AFT and observations are made include NoAFT (no fault-tolerant algorithm is applied), RandomAFT (AFT is applied for only k functions that are randomly selected), AFT (algorithm is applied to top-k vulnerable functions) and AllAFT (fault tolerance is applied to all functions).

1 1

0.91

0.92

0.94

13

14

15 1

1 1

0.88

0.9

1

1

0.99

0.98

11

0.85

10

0.95 0.97

12

0.82

0.83

8

0.78

7

9

0.72

0.75

5

6

0.68

4 0.92

0.78 0.86

0.6

0.65

2

3

Component reliability = 0.65 0.65

Component reliability = 0.55

0.55

Function reliabilities

1

Requirement factors Component reliability = 0.75

1

1

1

1

1

1

1

1

1

0.99

0.98

0.97

0.96

0.9

0.75

Component reliability = 0.85

1

1

1

1

1

1

1

1

1

1

1

1

0.99

0.96

0.85

A Novel Adaptive Fault Tolerance Algorithm Towards Robust … 625

(Table and figures: results of the CFIR and MajorR methods for Top-1%, Top-5%, Top-10% and Top-20% redundant components under cost factors CF = 3 to 17, together with bar charts of the redundant components assigned per method and of the function ranking, invoke numbers and invoked numbers for functions V1 to V6.)

Results of the methods under different component failure probabilities:

Redundant components   Method    1%      2%      3%      4%      5%      6%      7%      8%      9%      10%
Top 1%                 CFIR      0.0833  0.1604  0.2314  0.2976  0.3588  0.4151  0.4665  0.5143  0.5581  0.5982
Top 1%                 RandomR   0.1379  0.2566  0.3598  0.4473  0.5237  0.5897  0.6464  0.6957  0.7374  0.7732
Top 5%                 CFIR      0.0521  0.1024  0.1508  0.1973  0.2423  0.2855  0.3267  0.3662  0.405   0.4408
Top 5%                 RandomR   0.1329  0.249   0.3461  0.4361  0.5098  0.574   0.6306  0.6796  0.7242  0.7612
Top 10%                CFIR      0.0401  0.0794  0.1179  0.1556  0.193   0.2292  0.2648  0.2992  0.3332  0.3661
Top 10%                RandomR   0.1262  0.2354  0.332   0.415   0.4913  0.5545  0.6149  0.6624  0.7025  0.7402
Top 0%                 NoR       0.1393  0.2592  0.3624  0.4512  0.5276  0.5934  0.6501  0.6988  0.7408  0.7769
Top 100%               AllR      0.0005  0.002   0.0045  0.0081  0.0126  0.0183  0.025   0.0326  0.0415  0.0513

(Figure: chart of the values above for CFIR and RandomR at Top-1%, Top-5% and Top-10% redundant components, together with NoR (Top-0%) and AllR (Top-100%).)

Implicit Effects on System Reliability

(Tables and figures: system failure probabilities of the IterativeR and MajorR methods against cost factors 3 to 17, including panels (a) Top-K = 1% and (c) Top-K = 10%; the failure probabilities decrease as the cost factor increases, with IterativeR below MajorR.)

(Figures: system failure probabilities against component failure probabilities of 1 to 9% for AllAFT (AllR), AFT (CFIR), RandomAFT (RandomR) and NoAFT (NoR), with panels for Top-K = 1%, (b) Top-K = 5% and (c) Top-K = 10%; AFT stays well below RandomAFT and NoAFT, while AllAFT gives the lowest failure probabilities.)

5 Conclusion and Future Work In this paper, we proposed an algorithm known as Adaptive Fault Tolerance (AFT), which evaluates the different functions or components in the system and assigns weights to them, besides following a novel redundancy strategy. The algorithm exploits function calls and call frequencies to determine the importance of functions, which helps identify which functions are vulnerable and which are reliable. AFT helps in achieving fault-tolerant distributed


component-based applications comprising untrusted components. Experimental results showed that AFT outperforms many existing fault-tolerant approaches. Thus, the proposed algorithm helps realize robust and reliable distributed applications that reuse components and drive businesses. There are, however, several possibilities for improving this work: the ranking procedure could exploit the failure exposure probability of components, and the effect of failure propagation could also be investigated to improve the ranking procedure.

References
1. Al-Jaroodi J, Mohamed N, Nuaimi KA (2012) An efficient fault-tolerant algorithm for distributed cloud services. In: 2012 second symposium on network cloud computing and applications (NCCA), London, United Kingdom, pp 1–8
2. Ao B, Wang Y, Yu L, Brooks RR, Iyengar SS (2016) On precision bound of distributed fault-tolerant sensor fusion algorithms. ACM Comput Surv 49(1):1–23
3. Botelho FA, Ramos FMV, Kreutz D, Bessani AN (2013) On the feasibility of a consistent and fault-tolerant data store for SDNs. In: 2013 second European workshop on software defined networks (EWSDN), Berlin, Germany, pp 38–43
4. Zheng Z, Zhou TC, Lyu MR, King I (2012) Component ranking for fault-tolerant cloud applications. IEEE Trans Serv Comput 5(4):540–550
5. Qu Y, Xiong N (2012) RFH: a resilient, fault-tolerant and high-efficient replication algorithm for distributed cloud storage. In: 2012 41st international conference on parallel processing (ICPP), Pittsburgh, PA, USA, pp 520–529
6. Xie G, Chen Y, Xiao X, Xu C, Li R, Li K (2017) Energy-efficient fault-tolerant scheduling of reliable parallel applications on heterogeneous distributed embedded systems. IEEE Trans Sustain Comput:1–16
7. Balasangameshwara J, Raju N (2012) A hybrid policy for fault tolerant load balancing in grid computing environments 35(1):412–422
8. Gao Z, Ding SX, Cecati C (2015) Real-time fault diagnosis and fault-tolerant control. IEEE Trans Industr Electron 62(6):3752–3756
9. Kwon Y-W, Tilevich E (2012) Energy-efficient and fault-tolerant distributed mobile execution. In: 2012 IEEE 32nd international conference on distributed computing systems (ICDCS), Macau, China, pp 586–595
10. Ramachandran GS, Wright K-L, Zheng L, Navaney P, Naveed M, Krishnamachari B, Dhaliwal J (2019) Trinity: a byzantine fault-tolerant distributed publish-subscribe system with immutable blockchain-based persistence. In: 2019 IEEE international conference on blockchain and cryptocurrency (ICBC), Seoul, Korea (South), pp 227–235
11. Gao Z, Cecati C, Ding SX (2015) A survey of fault diagnosis and fault-tolerant techniques, part I: fault diagnosis with model-based and signal-based approaches. IEEE Trans Industr Electron 62(6):3757–3767
12. Di Fatta G, Blasa F, Cafiero S, Fortino G (2013) Fault tolerant decentralised K-Means clustering for asynchronous large-scale networks. J Parall Distrib Comput 73(3):317–329
13. Yadav R, Sidhu AS (2015) Fault tolerant algorithm for replication management in distributed cloud system. In: 2015 IEEE 3rd international conference on MOOCs, innovation and technology in education (MITE), Amritsar, India, pp 78–83
14. Kumari P, Kaur P (2020) Checkpointing algorithms for fault-tolerant execution of large-scale distributed applications in cloud. Wirel Personal Commun:1–25
15. Qiang W, Jiang C, Ran L, Zou D, Jin H (2015) CDMCR: multi-level fault-tolerant system for distributed applications in cloud. Secur Commun Netw:1–13
16. Brin S (1998) The anatomy of a large-scale hypertextual web search engine. Comput Netw ISDN Syst 30(1):107–117

Virtual Machine Migration Framework with Configuration Change Management Y. Niranjan, M. V. Narayana, and M. Vijaya Sudha

Abstract Research on ways to reduce the cost of managing and relocating VMs is clearly needed in light of the current increase in data center utilization and virtual machine maintenance costs. Because energy is a major component of virtual machine administration, the most effective migration and management solutions must use the least amount of energy feasible. Researchers have extremely limited options for improving the energy efficiency of virtual machines, since it depends entirely on the number of applications running on the machine, and much research has therefore been devoted to reducing energy consumption as a secondary parameter. This research investigates the issue of energy consumption during virtual machine migration and provides a novel virtual machine migration approach that reduces it. Through two improvements, VM selection and VM migration, the novel technique has been shown to reduce energy consumption by over 47%. Keywords Migration · VM components · VMM · Migration technique · Energy efficient VM migration

1 Introduction Virtualization is at the heart of cloud computing’s ability to scale its infrastructure. One of the most common uses of virtualization in the cloud is creating several instances of the same resources for different customers or client applications [1]. For service providers, this means they may provide a variety of computing options on the same machine, all while meeting the needs of their customers. Service providers can address the need for scalable and application-dependent environments by using Y. Niranjan Useful Sensors Inc., 800 W EI Camino Real, Suite 180, Mountain View, CA 94040, USA M. V. Narayana (B) Guru Nanak Institutions Technical Campus, Hyderabad, Telangana, India e-mail: [email protected] M. Vijaya Sudha Ramachandra College of Engineering, Eluru, Andhra Pradesh, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_49


virtualization. In addition, the client may save money by not having to manage their own processing and storage infrastructure. In most cases, service providers employ virtual machines to segregate the real hardware from their computing environments. One of the most significant factors to consider when evaluating the viability of virtualization through virtual machines is how it may be used in both private and public cloud environments to meet the demands of various systems, maximize resources, and reduce installation costs. In private cloud environments, the client is more likely to host the infrastructure on their own premises. The same virtual machine may be used for a number of functions in both on-premises and off-premises hosting, depending on the needs of the customer; hence, the use of virtual machines to accomplish virtualization saves money in both scenarios [2]. Users in a public cloud environment, on the other hand, have a variety of virtual machine configuration options at their disposal. For carriers, this means that the expense of maintaining distinct and customized hardware configurations is reduced. Virtualization is mostly used for load balancing purposes. The data center's applications benefit from faster response times thanks to general load balancing strategies. Using cloud-based load balancing approaches, users may access globally or geographically distributed services, thanks to geographically dispersed server infrastructures. Cloud-based data centers illustrate the advantages of load balancing in handling unexpected traffic surges, sometimes referred to as "cyber spikes". Reliability can be increased, at the cost of VM migration, by making the application scalable on demand without degrading its performance. It is important to include energy usage when assessing the cost of virtual machine migration. A new approach for virtual machine migration that uses less energy is presented in this paper. The rest of the paper is organized as follows: Sect. 2 describes the benefits of virtualization; Sect. 3 describes the components used in the virtualization mechanism; Sect. 4 presents the existing literature; Sect. 5 presents the proposed methodology; Sect. 6 presents the results; and Sect. 7 concludes the paper.

2 Benefits of Virtualization Virtual machine migration and adaptation have been demonstrated in many recent studies to increase client application performance and reduce infrastructure expenses. Migration's effect on performance and productivity is examined in this article. Virtualization allows the supplier, the client, and researchers to access additional aspects of the system through a reduced abstraction: system-level code, hardware usage statistics, application traces, and failure and downtime component settings may all be accessed and controlled individually in order to gain a better understanding of the performance boundaries. The operating system and hardware components may be changed or altered effortlessly in virtual machines. Using a virtual environment, the provider, the client, and


the researchers may adjust the computing system without having to spend the full amount of time required to replace or upgrade a physical computer system. The snapshot capability enables the provider and the client to back up the environment by replicating the virtual machines themselves; because of this, restoring a previous computing environment from a backup is a rapid and painless process. Each provider has a somewhat different setup for its virtual machines, but they all offer the same basic features, so moving to virtual machine computing is the best option for avoiding a lack of support and facilities. Cloud-hosted applications are automatically and regularly updated by the service provider at no additional expense, whereas hosting a conventional system comes at a cost and with a time commitment for future improvements. Modern industrial needs necessitate a common testing and development environment; sharing a virtual machine image makes it possible to quickly and simply duplicate the same development and testing environment. The proximity of data and processing units has been shown to boost computing speed in recent studies. For enormous datasets that cannot be held in a limited amount of memory, moving computation closer to the data is a more efficient option. As a result, hosting virtual machines in the cloud alongside large data management tools is a very effective way to boost performance.

3 Virtualization Components and VMMs After thoroughly examining the advantages and disadvantages of virtualization, we now turn our attention to the hardware and software components that make up a virtualized environment. These two components help in understanding and proposing the new framework. As stated earlier in this paper, virtual machines make it possible to separate the hardware components from the software stacks, and this boosts productivity. In virtualization, the standard framework sits on the physical hardware and the virtual machine replicates the hardware logically; the operating system is installed, and the programs are executed, in the virtual machine (Fig. 1). There is very little room for improvement here, as shown by the solutions offered by several firms and by the results of independent research. Consequently, we concentrate on the possibilities for development in Hypervisor design, and we examine the architecture of Hypervisor technology so that we may improve it. Virtual machines, networking, file systems, virtual machine images, software stacks, replication, and other software controllers are all under the Hypervisor's control. To better understand how the Hypervisor works, we break it down into three primary layers: control, network and interface (Fig. 2). This makes it clear that aggregating many physical servers has not yet reached the pinnacle of performance and research; the complexity and diversity of server manufacturers and settings have so far prevented the aggregation of


Fig. 1 Hardware components for virtualization

Fig. 2 Generic architecture for hypervisor

physical servers running virtual machines from being accomplished; examples of this can be found in [5, 6]. As a result, we have come up with a brand-new way to manage storage and processing virtualization. The specifics of the suggested framework are addressed in further depth in the following sections [3–7].


4 Current State of the Art

Real-time and embedded systems now support hardware virtualization. Without it, commodity operating systems and their applications cannot run efficiently in parallel on a single platform. A major characteristic of these systems is the ability to safely combine real-time and non-real-time applications thanks to spatial and temporal isolation. Microkernels are an ideal fit for such demands since they offer the foundation for building secure, real-time-aware systems, and in recent years they have evolved into micro-hypervisors as they learned to handle the virtualization capabilities offered by the hardware. A multi-core ARM platform running Linux and FreeRTOS, which is open source and commercially supported, demonstrates this trend [8, 9].

Hardware, operating systems, storage, processing power, memory, and network resources are all examples of virtualized resources. The use of virtualization in critical systems is increasing quickly, and virtualization infrastructures are becoming more common; security flaws, however, have the potential to do significant damage. The security configuration of an example virtualization system has been investigated in prior work, which focuses on the security concerns associated with virtualization as well as solutions to mitigate these risks.

The overhead of memory virtualization still cannot be reduced to a minimal degree. Native page-walk speed can only be approached by using a shadow page table (SPT) (conventional shadow paging), whereas nested paging requires a lengthy two-dimensional page walk through the hardware MMU whenever page tables change. Hardware-based virtualization approaches have been developed in this direction: architectures such as Sunway and RISC-V expose the highest CPU privilege levels, and a programming interface that operates in hardware mode may be used to provide hardware support functions in software. A two-dimensional page walk in hardware mode is called Nested Paging (SNP), while Swift Shadow Paging (SSP) is a hardware-intercepted TLB-flushing method for real-time page-table synchronization. SSP and SNP can be combined to create ASP, which handles SPT page faults in hardware mode by traversing two-dimensional page tables, minimizing hypervisor involvement [10].

Cloud computing and virtualization have seen enormous growth and acceptance in the recent past. The concept of "the cloud" refers to the ability to access applications that are stored in a remote data center rather than on a local computer. Many firms have already begun deploying these technologies in order to decrease expenses through better use of resources. With virtualization and cloud computing, customers may access programs both over the Internet and on their company's intranet. Because virtualized hardware is used extensively in many cloud computing applications, current virtual hardware services have been examined thoroughly to learn how they operate and to choose the service that is most suited to cloud computing and virtualization in various contexts [11].

Virtualization approaches have become more popular in recent years. Virtualization is supposed to be invisible to non-privileged users; therefore, they should not be able to tell whether a system is virtualized or not. Security concerns such as escaping dynamic malware analysis in virtual machines (VMs) and leveraging vulnerabilities for cross-VM attacks might arise from this detection capability. Because of the many traces and fingerprints that conventional software-based virtualization produces, it is possible to identify virtualization with little effort; mainstream hardware-assisted virtualization, however, improves virtualization transparency dramatically [12], making it harder to detect.

Another line of work focuses on employing virtualization technology in operating-system instruction, because students must comprehend the relationships between computer systems, users, and the hardware platform. Virtualization technology is used to mimic the actual computer hardware environment and then execute the operating system in a virtual computer environment. Many operating systems, including Microsoft Windows, Linux, and Unix, are supported by virtual machine software because it is able to replicate actual computer hardware; the goal of such work is to use the VMware virtualization program to explain operating-system concepts, since VMware provides virtualized hardware for a given operating system [13].

In the wake of the popularity of the Internet of Things (IoT) and cloud computing, Fog computing, a new paradigm that encourages processing data close to its source, has emerged. Fog, a complementary technology to the cloud, offers several enticing advantages, such as low latency, low cost, high multi-tenancy, high scalability, and consolidation of the IoT ecosystem. Despite Fog's widespread use, however, a thorough understanding has not yet been achieved. Related studies investigate object virtualization to overcome resource limits on sensory-level nodes and service virtualization to quickly construct customized applications for end-users, and network function virtualization (NFV) is being researched to further enhance the flexibility of network service provisioning. A layered structure that includes smart items, Fog, and cloud has been provided to show the implementation of a virtual fog along the IoT continuum [14, 15].

The ability to build safe cloud computing systems depends on the processor's security capabilities. Hardware structures implementing the processor's security features must be verified pre-silicon to confirm their functionality and to minimize or eliminate design flaws in the final product. One study proposes a fully automated approach for creating and composing hardware security verification tests; this system was used as a vital component of the design verification process of a cutting-edge CPU, and its test-creation capabilities were compared with the previously employed, entirely manual method.

GPUs are increasingly being used in high-performance clusters and cloud configurations because of their ability to accelerate computationally heavy activities and graphics-related calculations. Cloud providers offer virtual machine instances with graphics processing units (GPUs), and virtualization-aware GPU hardware (NVIDIA vGPUs) has made it simpler and more cost-effective to allocate and share physical GPU resources across virtual machines. The vGPU scheduling method and a user-configurable vGPU profile decide the sharing mechanism and its scope.


A complete empirical investigation of hardware-assisted virtualized GPU systems has been presented, examining the influence of vGPU scheduling strategies and the interference consequences of running homogeneous and heterogeneous workloads at the same time, and showing how different workload factors affect the optimum vGPU design options [16]. Finally, embedded and mobile computing systems are now able to benefit from virtualization methods, which have evolved rapidly in recent years. Virtualization promotes system security by allowing untrusted programs to function in segregated contexts while maximizing system usage, and recent work offers a way to access hardware performance counters from a microkernel-based virtualized environment to make this practical [17, 18].

5 Proposed Computing Virtualization Management Framework

Hypervisor framework enhancement is needed after a thorough examination of virtual machine frameworks and implementations. In general, we have found that the issues listed below are present in the vast majority of popular hypervisor programs. It is practically impossible to monitor and manage individual hardware from a single firmware console, since hypervisors are constrained to certain hardware suppliers. Thus, we provide the Storage and Computing Virtualization Management Framework (Fig. 3). The issues are:

Fig. 3 Storage and computing virtualization and management system

• Absence of comprehensive oversight
• Failure to maintain a system for backup and restore
• Control over cross-platform image and hardware replication
• Computing capacity monitoring and management is made easier
• Easier storage capacity monitoring and administration.

Here, we suggest a new framework to address all of the stated problems. An open-source hypervisor implementation, the Kernel-based Virtual Machine (KVM), is used as the basis for the suggested framework of software and monitoring applications. The framework's components break down as follows. The physical layer is made up of several servers of actual physical hardware made by various manufacturers; the servers are available in a variety of configurations and may be purchased from any vendor. On top of the real hardware, there is a general, standard virtual machine implementation layer. The hypervisor tool and the physical layer of the proposed implementation are used to gather virtual machine performance parameters, which are then sent to the virtual machine layers through a tiny software agent, and the following layer receives the same information from that agent. The monitoring and management layer is the highest level of the implementation, and it does exactly what its name implies. Multiple software agents make up this layer, which is detailed here. It is the job of the memory supervisor to keep the memory monitoring system up to date at all times; it monitors metrics such as the total amount of shared memory, the total amount of active memory, the amount of overhead memory, the total amount of swappable memory, and the temperature of the memory units. The storage supervisor monitors the system based on factors like the storage container's unique name, the container size in GB, and the GB of container use. Keeping the network monitoring system up to date is the responsibility of the network supervisor; for example, the network supervisor keeps track of the Network Interface Card's unique ID, the total amount of time it has been online and offline, and the unique IP address and MAC address it has been allocated. When it comes to input/output management, the I/O manager is in charge of keeping the input/output or peripheral monitoring system up to date; system characteristics including the unique device ID, the read or write type, the number of read operations, and the number of write operations are tracked by the input/output or peripheral supervisor.

It is the replication controller’s job to replicate delta changes or full replications across several physical servers in the layer underneath the virtual machine. The vast majority of the issues raised by the parallel studies have been resolved as a result of the project’s execution. Using the backup and replication controller, this study primarily proposes cross-image automation and shows that the results are adequate.
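As an illustration of the metrics listed above, the following sketch shows the kind of per-supervisor records the software agent could forward to the monitoring and management layer. The field names are assumptions chosen to mirror the parameters in the text; they are not part of the framework's specification.

```python
# Illustrative metric records for the memory, storage, network and I/O supervisors;
# the field names mirror the parameters listed in the text but are otherwise assumptions.
from dataclasses import dataclass, asdict

@dataclass
class MemoryMetrics:
    shared_mb: int
    active_mb: int
    overhead_mb: int
    swappable_mb: int
    temperature_c: float

@dataclass
class StorageMetrics:
    container_name: str
    size_gb: float
    used_gb: float

@dataclass
class NetworkMetrics:
    nic_id: str
    uptime_s: int
    downtime_s: int
    ip_address: str
    mac_address: str

@dataclass
class IOMetrics:
    device_id: str
    op_type: str        # "read" or "write"
    read_ops: int
    write_ops: int

def to_report(*metrics):
    """Software agent: bundle supervisor records for the monitoring and management layer."""
    return [asdict(m) for m in metrics]
```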


6 Results

This work has performed extensive testing to demonstrate the improvement over the existing migration techniques [19–29]. The various considered migration techniques are listed with the used acronyms here (Table 1). The simulation of the algorithm is based on CloudSim, which is a framework for modeling and simulation of cloud computing infrastructures and services.

Table 1 List of techniques used for performance comparison

Used name in this work | Selection policy | Allocation policy
IQR MC | Maximum correlation | Inter quartile range
IQR MMT | Minimum migration time | Inter quartile range
LR MC | Maximum correlation | Local regression
LR MMT | Minimum migration time | Local regression
LR MU | Minimum utilization | Local regression
LR RS | Random selection | Local regression
LRR MC | Maximum correlation | Robust local regression
LRR MMT | Minimum migration time | Robust local regression
LRR MU | Minimum utilization | Robust local regression
LRR RS | Random selection | Robust local regression
MAD MC | Maximum correlation | Median absolute deviation
MAD MMT | Minimum migration time | Median absolute deviation
MAD MU | Minimum utilization | Median absolute deviation
MAD RS | Random selection | Median absolute deviation
THR MC | Maximum correlation | Static threshold
THR MMT | Minimum migration time | Static threshold
THR MU | Minimum utilization | Static threshold
THR RS | Random selection | Static threshold
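To make the acronyms in Table 1 concrete, the sketch below illustrates how the allocation (host overload detection) policies derive a CPU-utilization threshold from recent utilization history and how two of the selection policies pick a VM to migrate from an overloaded host. It is a simplified illustration with assumed safety parameters and data structures, not the CloudSim policy classes used for the reported results; the local regression (LR/LRR) policies, which fit a regression model to predict near-future utilization, are omitted for brevity.

```python
# Simplified illustration of the Table 1 policies; safety parameters and data
# structures are assumptions, not the CloudSim implementations used in this work.
import statistics

def iqr_threshold(cpu_history, safety=1.5):
    """Inter quartile range (IQR): overload threshold 1 - safety * IQR of utilization."""
    q = statistics.quantiles(cpu_history, n=4)
    return 1.0 - safety * (q[2] - q[0])

def mad_threshold(cpu_history, safety=2.5):
    """Median absolute deviation (MAD): overload threshold 1 - safety * MAD."""
    med = statistics.median(cpu_history)
    mad = statistics.median(abs(x - med) for x in cpu_history)
    return 1.0 - safety * mad

def static_threshold(_cpu_history, threshold=0.8):
    """THR: a fixed utilization threshold."""
    return threshold

def select_minimum_migration_time(vms):
    """MMT: migrate the VM with the least RAM, i.e. the shortest expected migration time."""
    return min(vms, key=lambda vm: vm["ram_mb"])

def select_minimum_utilization(vms):
    """MU: migrate the VM with the lowest CPU utilization."""
    return min(vms, key=lambda vm: vm["cpu_util"])
```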


Table 2 Experimental setup

Setup parameters | Values
Number of physical hosts | 950
Number of virtual machines | 1125
Total simulation time (in s) | 900,000.00

Table 3 Algorithms execution results

Algorithm | Energy consumption (kWh) | Variation comparison
LRR RS | 36.50 | Deteriorate
LR MC | 42.30 | Deteriorate
LRR MC | 40.43 | Deteriorate
LR MMT | 41.94 | Enhanced
LRR MMT | 43.66 | Enhanced
LR MU | 33.79 | Enhanced
LRR MU | 42.45 | Enhanced
THR MC | 33.75 | Enhanced
THR RS | 42.93 | Enhanced
THR MMT | 45.06 | Enhanced
THR MU | 40.16 | Enhanced
MAD RS | 34.05 | Enhanced
MAD MC | 40.35 | Enhanced
MAD MMT | 44.03 | Enhanced
IQR MC | 40.36 | Enhanced
MAD MU | 38.52 | Enhanced
IQR RS | 43.01 | Enhanced
IQR MMT | 39.48 | Enhanced
IQR MU | 38.62 | Enhanced

The experimental setup used for this work is explained here (Table 2).

Firstly, the work analyzes the energy consumption of the existing algorithms and the proposed algorithm (Table 3). The result is also analyzed graphically (Fig. 4).

7 Conclusion

Virtual machine migration from the on-site implementation of the systems may provide several advantages, including precise management, decreased hardware limits, replication control, availability, frequent updates to cost control, a collaborative approach, and manageable data loads. The best virtual machine migration strategy, as shown in this study, may reduce energy consumption by 13%.

Fig. 4 Hardware components for virtualization

References 1. Rao J, Wei Y, Gong J, Xu C-Z (2013) Qos guarantees and service differentiation for dynamic cloud applications. IEEE Trans Netw Serv Manage 10(1):43–55 2. Li C, Raghunathan A, Jha NK (2010) Secure virtual machine execution under an untrusted management OS. In: Proceedings of international conference on cloud computing, pp 172–180, July 2010 3. Intel VT-D (2012). http://www.intel.com/technology/virtualization/technology.htm 4. BYTEmark (2012). http://www.tux.org/mayer/linux/byte/bdoc.pdf 5. Armbrust M, Fox A, Griffith R, Joseph AD, Katz RH, Konwinski A, Lee G, Patterson DA, Rabkin A, Stoica I, Zaharia M (2010) Above the clouds: a berkeley view of cloud computing. Technical report UCB/EECS-2009-28. http://www.eecs.berkeley.edu/Pubs/Tec hRpts/2009/EECS-2009-28.html 6. Deshane T, Shepherd Z, Matthews JN, Ben-Yehuda M, Shah A, Rao B (2010) Quantitative comparison of Xen and KVM. Xen summit Boston 2008. http://xen.org/xensummit/xensum mit_summer_2008.html 7. Dong Y, Yang X, Li X, Li J, Tian K, Guan H (2010) High performance network virtualization with SR-IOV. In: Proceedings of IEEE 16th international symposium high performance computer architecture (HPCA), pp 1–10 8. Lackorzynski A, Warg A (2016) Demo abstract: timing aware hardware virtualization on the L4Re microkernel systems. In: 2016 IEEE real-time and embedded technology and applications symposium (RTAS), Vienna, Austria, p 1 9. Arslan ˙I, Özbilgin ˙IG (2017) Virtualization and security: examination of a virtualization platform structure. In: 2017 international conference on computer science and engineering (UBMK), Antalya, Turkey, pp 221–226 10. Sha S, Zhang Y, Luo Y, Wang X, Wang Z. Accelerating address translation for virtualization by leveraging hardware mode. IEEE Trans Comput 11. Singh M (2018) Virtualization in cloud computing—a study. In: 2018 international conference on advances in computing, communication control and networking (ICACCCN), Greater Noida, India, pp 64–67 12. Zhang Z, Cheng Y, Gao Y, Nepal S, Liu D, Zou Y (2021) Detecting hardware-assisted virtualization with inconspicuous features. IEEE Trans Inf Forensics Secur 16:16–27


13. Yile F (2016) Utilizing the virtualization technology in computer operating system teaching. In: 2016 eighth international conference on measuring technology and mechatronics automation (ICMTMA), Macau, China, pp 885–888 14. Erulanova A, Yessenbekova G, Zhanysbayeva K, Tlebaldinova A, Zhantassova Z, Zhomartkyzy G (2020) Hardware and software support of technological processes virtualization. In: 2020 7th international conference on electrical and electronics engineering (ICEEE), Antalya, Turkey, pp 333–337 15. Li J, Jin J, Yuan D, Zhang H (2018) Virtual fog: a virtualization enabled fog computing framework for internet of things. IEEE Internet Things J 5(1):121–131 16. Kan S, Dworak J (2017) Systematic test generation for secure hardware supported virtualization. In: 2017 IEEE 15th international conference on dependable, autonomic and secure computing, 15th international conference on pervasive intelligence and computing, 3rd international conference on big data intelligence and computing and cyber science and technology congress (DASC/PiCom/DataCom/CyberSciTech), Orlando, FL, USA, pp 550–556 17. Garg A, Kulkarni P, Kurkure U, Sivaraman H, Vu L (2019) Empirical analysis of hardwareassisted GPU virtualization, 2019 IEEE 26th international conference on high performance computing, data, and analytics (HiPC), Hyderabad, India, pp 395–405 18. Mathew D, Jose BA, Mathew J, Patra P (2020) Enabling hardware performance counters for microkernel-based virtualization on embedded systems. IEEE Access 8:110550–110564 19. Padala P (2010) Automated management of virtualized data centers. University of Michigan 20. Patikirikorala T, Colman A, Han J, Wang L (2012) A systematic survey on the design of selfadaptive software systems using control engineering approaches. In: Proceedings of symposium software engineering adaptive self-management system, pp 33–42 21. Wang X, Wang Y (2011) Coordinating power control and performance management for virtualized server clusters. IEEE Trans Parallel Distrib Syst 22(2):245–259 22. Wang X, Chen M, Fu X (2010) MIMO power control for high-density servers in an enclosure. IEEE Trans Parallel Distrib Syst 21(10):1412–1426 23. Patikirikorala T, Colman A, Han J, Wang L (2011) A multi-model framework to implement selfmanaging control systems for QoS management. In: Proceedings of international symposium software engineering adaptive self-management system, pp 218–227 24. Liu X, Wang C, Zhou B, Chen J, Yang T, Zomaya A (2013) Priority-based consolidation of parallel workloads in the cloud. IEEE Trans Parallel Distrib Syst 24(9):1874–1883 25. Carrera D, Steinder M, Whalley I, Torres J, Ayguad E (2012) Autonomic placement of mixed batch and transactional workloads. IEEE Trans Parallel Distrib Syst 23(2):219–231 26. Lee Y, Zomaya A (2012) Energy efficient utilization of resources in cloud computing systems. J Supercomput 60(2):268–280 27. Ferreto T, Netto M, Calheiros R, De Rose C (2011) Server consolidation with migration control for virtualized data centers. Future Gener Comput Syst 27(8):1027–1034 28. Mills K, Filliben J, Dabrowski C (2011) Comparing VM-placement algorithms for on-demand clouds. In: Proceedings of IEEE 3rd international conference on cloud computing technology science, pp 91–98 29. Software support of technological processes virtualization, 2020 7th international conference on electrical and electronics engineering (ICEEE), Antalya, Turkey, p 333-3

Glioma Segmentation in MR Images Using 2D Double U-Net: An Empirical Investigation Julian L. Webber, R. S. Nancy Noella, Abolfazl Mehbodniya, V. Ramachandran, D. Stalin David, Rajasekar Rangasamy, and Sudhakar Sengan Abstract Glioma develops inside the brain. Automatic glioma segmentation reduces computational time and increases survival owing to previous treatment preparations—2D double U-Net segments glioblastoma from multi-modal MRI images (MRI). 2D brain models need fewer computational resources than 3D models. Residual U-Net (RU-Net) and dilated convolution U-Net make up the Double UNet design (RU-Net). RU-residual Net’s route contains three 33 convolution kernels. DU-Net contains three 33 and 55 convolution kernels with a two dilation rate. RUNet is more uncomplicated to optimise than deep networks and makes fewer training mistakes than deep networks. DU-Net gathers contextual information and creates J. L. Webber · A. Mehbodniya Department of Electronics and Communication Engineering, Kuwait College of Science And Technology (KCST), Doha, Kuwait e-mail: [email protected] A. Mehbodniya e-mail: [email protected] R. S. Nancy Noella Department of Computer Science and Engineering, Sathyabama Institute of Science and Technology, Chennai, Tamil Nadu 600119, India V. Ramachandran Department of Computer Science and Engineering, GITAM University, Bengaluru, Karnataka 561203, India D. Stalin David Department of Information Technology, Vel Tech Multi Tech Dr. Rangarajan Dr. Sakunthala Engineering College, Chennai, Tamil Nadu 600072, India e-mail: [email protected] R. Rangasamy Department of Computer Science and Engineering, GITAM School of Technology, GITAM University, Bengaluru, Karnataka 561203, India e-mail: [email protected] S. Sengan (B) Department of Computer Science and Engineering, PSN College of Engineering and Technology, Tirunelveli, Tamil Nadu 627152, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_50


details. The suggested model was assessed using the BraTS 2017 training dataset and attained dice similarity coefficients of 0.91, 0.85, and 0.88 for total, enhanced, and tumour core (TC), respectively. Keywords Deep learning · Glioma · Tumour · Segmentation · U-Net · Residual · Dilation

1 Introduction Glioma is one of the worst brain tumours and has a 1–15-year survival rate. It originates from star-shaped astrocyte cells. The most malignant primary astrocytoma is glioblastoma. The World Health Organization (WHO) has classified gliomas based on their malignancy levels, from Grade I–IV. LGG contains Grades II and III, and HGG has Grade IV (Glioblastoma). Grade I glioma has a low proliferative process and is easily cured surgically. Glioblastoma has a lower mean survival rate of 15 months than other grades. The complete removal of the tumour is impossible due to its highly invasive properties. The most common brain imaging methods are CT and MRI. Deep neural networks (DNN) play a vital role in health care. The DNN-based imaging techniques help predict disorders and other issues in the internal organs. The advanced computeraided diagnosis has fewer errors and better results. DNN has a more significant number of hidden layers to get discriminative features. It has several applications in the healthcare sector; protein structure prediction, protein interaction prediction, human–computer interface, brain-body interface, etc. The model gets a better result when the non-brain image is removed. Preprocessing techniques remove unwanted noise due to magnetic fields and correct the bias field of an image. Attention-guided 2D U-Net, deep supervised 3D squeeze and excitation V-Net (DSSE-V-Net), and Convolutional Neural Network (CNN) architecture methodologies were implemented along with 4ITK bias field correction. It has been performed on each MRI sequence to correct the image, improving the prediction. The high variance of pixel intensity produces the worst performance for the model. So, the actual pixel intensity of images has been converted between two intensity scales by the intensity normalisation technique. It provides a zero mean and unit variance rescale of the data. A 3D volume of the brain image has been converted into 2D slices of the brain. Among 155 image slices, a few training image slices were removed due to zero pixel values in the image. Mean subtraction and normalisation have been performed well for medical image classification. The model’s performance was good by removing 1% of the top and bottom outliers [1]. A Deep Convolutional Neural Network (DCNN) has been classified as pneumonia from the chest X-ray dataset. The DCNN model contains three deep networks, where two networks are based on Transfer Learning (TL) from the CNN architecture, and one is a capsule network. The results, such as sensitivity and specificity, were high during the prediction stage. Stacked dilated convolution U-Net (SDU-Net)


has a larger receptive field and more layers. It produces good classification results compared to vanilla U-Net. The cumulative features from the high-level and middle-level features are fed into the SVM classifier and give an accuracy of 96.7%. The one-dimensional CNN has classified ECG signals with an accuracy of 95%. The ResUNet segment remotely sensed data with different configurations of the dice loss cost function. It had Pyramid scene parse pooling and the ISPRS Potsdam dataset [2]. SegNet architecture 16 improved the segmentation of indoor and road scene understanding. Both the encoder and the decoder of SegNet have 13 convolution layers. The segmentation of brain tumour images was improved by a deep residual dilation network with middle supervision (RDM-Net). It effectively eliminates the vanishing gradient problem and increases the receptive field that maintains spatial resolution. The spatial fusion block collects the region of a small tumour that propagates middle-level features into the final layer. The middle supervision block further reduces cumulative errors and speeds up the flow of information. The performance of the network has been strengthened by using the residual-dilated block. The U-Net has 23 convolution layers with 3 × 3 convolution. It was implemented to segment neuronal structures in the ISBI challenge of 2015 and won the challenge. The Conditional Random Field (CRF) followed by Connected Component Analysis (CCA) for the 3DFCNN model smooths the segmentation performance. The said post-processing step reduces the false-positive of the segmentation network. A 3D fully connected CRF method achieves good structural segmentation [3]. The organisation of the article is Sect. 1 is the introduction to the DNN-based imaging techniques that help predict disorders and other issues in the internal organs. Section 2 is related to background research works. Section 3 is the proposed DL model. Section 4 is the result and discussion. Section 5 is the conclusion and future work of the article.

2 Methods

This section describes the methodology to segment WT and tumour sub-regions from the input MRI sequences. It consists of the data normalisation and data augmentation pre-processing techniques, a deep neural network, and the basic building blocks of Double U-Net [4].

2.1 Pre-processing

To get the segmented output, the model takes input MRI sequences, viz. Flair, T2, and T1 images. The pre-processing step consists of normalisation and augmentation. The dataset images have different pixel intensity values and a high standard deviation. The intensity normalisation gives a smaller range of values where the mean is close to zero, and the standard deviation is approximately 119. Other intensity normalisation techniques do not provide a better solution for medical imaging applications than Z-score normalisation; it gives lower negative to smaller positive values and better handling of outliers. Equation (1) describes the normalised image, where xi is the sample image at the ith pixel, μx is the mean of the sample image, and σx is the standard deviation of the sample image [5].

x̂i = (xi − μx) / σx    (1)
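A minimal sketch of the Z-score normalisation of Eq. (1), applied per MRI volume, is shown below. Restricting the statistics to non-zero (brain) voxels is an assumption that is common for skull-stripped data but is not stated in the text.

```python
# Minimal sketch of Eq. (1): per-volume Z-score normalisation over non-zero voxels.
import numpy as np

def z_score_normalise(volume: np.ndarray) -> np.ndarray:
    brain = volume[volume > 0]                   # ignore the zero background
    mu, sigma = brain.mean(), brain.std() + 1e-8
    return (volume - mu) / sigma                 # x_hat = (x - mu) / sigma
```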

2.2 Deep Neural Networks Network 1 has RU-Net, and Network 2 has DU-Net. RU-Net and DU-Net contain encoder and decoder paths similar to the original U-Net architecture. RU-Net gets an input MR image with a dimension of 5 × 240 × 240 × 2, where the batch size is 5, the height and width of an image are [240, 240], and the number of multi-modal MR images is 2. Next, DU-Net takes the input MR image and the output of RUNet and adds them pixel by pixel [6]. The additional 64 convolutional kernels to the input MR image make it compatible with network 2. The predicted mask from the RU-Net collects local features, and the DU-Net receives more global context because of the larger receptive fields from dilated convolution. Finally, the addition of Network 1 and Network 2 gives a predicted tumour. The proposed methodology has two basic functional blocks; the Convolved Residual (CR) block and the dilation block (D block). Network 1, RU-Net, is built by nine CR blocks in the encoder and decoder paths. The encoder path has a down-sample with strides two followed by a CR block, and the decoder path has a CR block followed by an up-sample with strides two and concatenation layers. Network 2, DU-Net, is built by nine D blocks in the encoder and decoder paths. The encoder path has a down-sample with strides two followed by a D block, and the decoder path has a D block followed by an up-sample with strides two and catenation layers. The encoder path of RU-Net and DU-Net minimises the spatial dimension in half and increases the channels by multiples of two [7]. The decoder paths of RU-Net and DU-Net multiply the spatial dimensions by two and decrease the channels by half. The RU-Net accepts the input MR image in the dimension of 5 × 240 × 240 × 2. The concatenation of RU-Net output and input MR image is the input to the DU-Net, which has the dimension of 5 × 240 × 240 × 64. The addition of RU-Net output and DU-Net output has the dimension of 5 × 240 × 240 × 64. The final convolution layer has a sigmoid activation function which produces the predicted output in the dimension of 5 × 240 × 240 × 1. The standard convolution has a dilation rate of α = 1.

2.2.1 CR Block

The CR block has skip-1 and skip-2 layer connections. The modified residual block has three convolution layers, arranged in an unusual way. The input (or previous output) feeds into the first 3 × 3 convolution kernel, which generates level 1 feature maps, and the same feature maps feed into the second 3 × 3 convolution kernel, which produces level 2 feature maps. The combined feature maps from level 1 and level 2 feed the third 3 × 3 convolution kernel, which generates level 3 feature maps. The combination of level 1, level 2, and level 3 feature maps generates distinct feature maps on the encoder and decoder paths. The steps to calculate the feature maps of the CR block are given in Eqs. (2) and (3).

R(X) = G'(X) + F'(X) + F(X)    (2)

where G'(x) is a 3 × 3 convolution on G(x). The convolution layer produces the same dimension as the input image due to the same-padding method [8]. The intermediate feature map G(x) is derived as follows:

G(X) = F'(X) + F(X)    (3)

where F'(x) is a 3 × 3 convolution on F(x), and F(x) is a 3 × 3 convolution on a given input 'x'. The residual blocks eliminate the deep network's vanishing and exploding gradient problems [9].
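As a concrete reading of Eqs. (2) and (3), the following sketch expresses the CR block with tensorflow.keras layers. The filter count, activation, and padding choices are assumptions for illustration, not the authors' exact configuration.

```python
# Hypothetical sketch of the CR (Convolved Residual) block of Eqs. (2)-(3).
from tensorflow.keras import layers

def cr_block(x, filters):
    """Three 3x3 convolutions whose level-1, level-2 and level-3 feature maps are summed."""
    f = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)        # F(x), level 1
    f_prime = layers.Conv2D(filters, 3, padding="same", activation="relu")(f)  # F'(x), level 2
    g = layers.Add()([f, f_prime])                                             # G(x) = F'(x) + F(x), Eq. (3)
    g_prime = layers.Conv2D(filters, 3, padding="same", activation="relu")(g)  # G'(x), level 3
    return layers.Add()([g, g_prime])                                          # R(x) = G'(x) + F'(x) + F(x), Eq. (2)
```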

2.2.2 D Block

Dilated convolution yields larger receptive fields and retains the original spatial resolution. The D block has two parallel paths of stacked convolutional layers; the first and second paths have three sequences of 3 × 3 and 5 × 5 convolution kernels, respectively. The output of the D block is the addition of the two parallel paths [10]. The cumulative feature maps of the D block are given in Eq. (4).

D(X) = Y''(X) + Z''(X)    (4)

where Y''(x) is a 3 × 3 convolution on Y'(x), Y'(x) is a 3 × 3 convolution on Y(x), and Y(x) is a 3 × 3 convolution on the given input image x; similarly, Z''(x) is a 5 × 5 convolution on Z'(x), Z'(x) is a 5 × 5 convolution on Z(x), and Z(x) is a 5 × 5 convolution on the input x. In the first path, each element has a receptive field of 7 × 7, and in the second path, each element has a receptive field of 11 × 11. The dilation block generates feature maps without resolution loss. The receptive field is directly proportional to the convolution operation's dilation rate (α). The proposed method has adopted a dilation rate of α = 2 to get the best-fit model [10].
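A corresponding sketch of the D block of Eq. (4), again with assumed filter counts and activations, stacks three dilated 3 × 3 and three dilated 5 × 5 convolutions in parallel and adds the two paths element-wise.

```python
# Hypothetical sketch of the D (dilation) block of Eq. (4): two parallel stacks of three
# dilated convolutions (3x3 and 5x5, dilation rate 2) whose outputs are added element-wise.
from tensorflow.keras import layers

def d_block(x, filters, dilation_rate=2):
    y = x
    for _ in range(3):  # 3x3 path: Y(x), Y'(x), Y''(x)
        y = layers.Conv2D(filters, 3, padding="same",
                          dilation_rate=dilation_rate, activation="relu")(y)
    z = x
    for _ in range(3):  # 5x5 path: Z(x), Z'(x), Z''(x)
        z = layers.Conv2D(filters, 5, padding="same",
                          dilation_rate=dilation_rate, activation="relu")(z)
    return layers.Add()([y, z])  # D(x) = Y''(x) + Z''(x)
```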


3 Implementation

3.1 Data Preparation

The proposed methodology uses the Brain Tumour Segmentation (BraTS) challenge 2017 training dataset. It contains 285 subjects, where 210 HGG and 75 LGG volumes are available. The LGG and HGG volumes in the dataset are pre-operative, multi-institutional MRI scans. The images from the dataset are skull-stripped and interpolated to a resolution of 1 mm³. Each volume in the dataset has the following modalities: T1, T1CE, T2, and Flair. T1-weighted and T2-weighted MR images are distinguished by the Cerebrospinal Fluid (CSF) or other fluid present in the brain [11]; T1 and T2 MRI sequences show dark and bright CSF, respectively. The Flair MRI modality suppresses the CSF signal and is highly sensitive to white matter abnormalities. The dataset has labelled data where necrotic and non-enhancing tumour (NCR/NET) is represented by label '1', peritumoural oedema (ED) is indicated by label '2', ET is represented by label '4', and label '0' is used for everything else. The various glioma sub-regions combine specific labels as follows: WT = [1, 2, 4]; ET = [4]; TC = [1, 4].
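As a small illustration of the label combinations above, the following NumPy sketch (an assumption, not the authors' code) builds binary masks for the WT, ET, and TC sub-regions from a BraTS label map.

```python
# Combine BraTS labels 0 (background), 1 (NCR/NET), 2 (ED), 4 (ET) into sub-region masks.
import numpy as np

def tumour_masks(label_map: np.ndarray):
    wt = np.isin(label_map, [1, 2, 4]).astype(np.uint8)  # whole tumour
    et = (label_map == 4).astype(np.uint8)               # enhancing tumour
    tc = np.isin(label_map, [1, 4]).astype(np.uint8)     # tumour core
    return wt, et, tc
```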

3.2 Model Setup

The proposed model has the following configuration parameters: DSC as the metric, dice loss as the cost function, and Adam as the optimiser. DSC is the best metric when there is a high volume of background pixels relative to foreground pixels, and it is the general metric used to evaluate segmentation results in the MICCAI challenges. It has a value from 0 (no match) to 1 (perfect match). Equation (5) describes the computation of the soft DSC, where Xi is the ground truth sample and Yi is the predicted sample. Dice loss is good for dealing with imbalanced datasets and provides a differentiable value; the differentiable dice loss is represented by 1 − DSC, so the dice loss derived from the DSC makes the best-fit model. The dice loss measures the dissimilarity between ground truth samples and predicted samples.

DSC = 2(Xi ∩ Yi) / (Xi + Yi)    (5)

The Adam algorithm optimises the gradients of the parameters (weights and biases). It is well suited for a model with many parameters and sparse data. The combination of momentum and Root Mean Square Propagation (RMSP) methodologies gives the Adam optimiser; this property achieves global minima faster and occupies less memory. In the Adam optimiser, the initial learning rate (η) is set at 3e−06, the decay rate of the first momentum β1 is 0.9, the decay rate of the second momentum β2 is 0.999, and ξ is chosen as 1e−08 to avoid divide-by-zero issues. The ReLU non-linear activation function removes vanishing gradient problems and produces a better non-linear model. Equations (6) and (7) describe the biased 1st momentum pt and 2nd momentum qt update rules, where gt is the gradient for parameter w at time t.

pt = β1 pt−1 + (1 − β1) gt    (6)

qt = β2 qt−1 + (1 − β2) gt²    (7)

Equations (8) and (9) give the bias-corrected 1st and 2nd momentum estimates, and the parameter update rule of the Adam optimiser is given in Eq. (10), where wt−1 is the old parameter vector.

p̂t = pt / (1 − β1^t)    (8)

q̂t = qt / (1 − β2^t)    (9)

wt = wt−1 − η p̂t / (√q̂t + ξ)    (10)
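The loss and optimiser configuration described above can be sketched as follows. This is a minimal illustration, not the authors' code: the smoothing constant added to the soft DSC is an assumption to avoid division by zero, while the Adam hyperparameters are the values quoted in the text.

```python
# Soft dice loss (1 - DSC of Eq. (5)) and the Adam settings quoted in the text.
import tensorflow as tf

def dice_loss(y_true, y_pred, smooth=1e-6):
    """Differentiable dice loss computed over the flattened masks."""
    y_true = tf.reshape(y_true, [-1])
    y_pred = tf.reshape(y_pred, [-1])
    intersection = tf.reduce_sum(y_true * y_pred)
    dsc = (2.0 * intersection + smooth) / (tf.reduce_sum(y_true) + tf.reduce_sum(y_pred) + smooth)
    return 1.0 - dsc

optimizer = tf.keras.optimizers.Adam(learning_rate=3e-6, beta_1=0.9, beta_2=0.999, epsilon=1e-8)
```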

3.3 Training Model Flair and T2 MRI sequences are used in the proposed model to segment WT. The Flair and T1CE MRI sequences segment the tumour sub-region ET. T1 and T2 MRI sequences are used to segment tumour sub-region CT. The proposed model segments WT, ET, and TC individually for HGG, LGG, and combined grades. In the dataset, the proposed model has selected 62 volumes, where 50 HGG volumes and 12 LGG volumes are for combined grades. The segmentation of both HGG grade and LGG grade has taken 50 volumes. Each MR image has a dimension of 240 × 240 × 155, where [240, 240] represents the 2D shape of the MR image, and 155 slices are available for each subject. The proposed model takes image slices from 50 to 110. The model is to partition the data randomly in the ratio of 80:20. After the random partition, the combined grade category has 50 images for the training set and 12 images for the validation set, and the HGG grade and LGG grade categories have 40 images for the training set and 10 images for the validation set. The partitioned data has been fed into Z-score normalisation to get normalised data.


4 Results and Discussion The 3DFCNN model has been segmented into brain tumours and their sub-regions such as ET, Non-Enhancing Tumour (NET), necrosis (NER), and oedema (ED). It adopts the Brain Tumour Segmentation (BraTS) 2020 challenge dataset, and each volume is normalised to get smaller-scale data. The small patch of 64 × 64 × 64 reduces the class imbalance issue. The model has two paths; the encoder path and the decoder path. The encoder path has a dense block (4 convolution layers) followed by a transition-down block. The decoder path has a transition-down block followed by a dense block. The architecture has 77 layers, and the densely connected convolution layer increases the number of parameters. The threshold-based selection technique fine-tunes the complex examples. The model was trained by all MRI sequences and achieved evaluation results of DSC of 0.85, 0.77, and 0.81 for WT, ET, and TC, respectively. The loss of spatial information and the weak ability of multi-scale lesion processing issues are focused on by the AFPNet model. The 3D atrous convolution with a single stride solves spatial information loss, and the Atrousconvolution Feature Pyramid (AFP) solves the second issue. The backbone of the AFPNet model is 3D atrous convolution, which replaces pooling with strides 2. The AFP module collects multi-scale features. This model has an input shape of 47 × 47 × 47 and produces a receptive field of 34 × 34 × 34. The model was trained by T1, T2, and Flair MRI sequences. AFPNet utilised the BRATS 2013 dataset for evaluation, and the CRF method was preferred as a post-processing technique for good performance. The model has given DSC of 0.86, 0.68, and 0.73 for WT, ET, and TC. The integration of low-level features and high-level features brings confusion to the model. The attention-guided 2DU-Net model eliminates the abovementioned issues and collects 3D contextual information. The modified U-Net has a convolution layer with stride two instead of the max pool layer. The Squeeze and Excitation (SE) block following the concatenation layer reduces the confusion about the model and collects multi-level features. Over-fitting issues are eliminated by flipping data augmentation techniques. The model was trained by five-fold cross-validation and Stochastic Gradient Descent (SGD). All MRI modalities were given as input to the model. The model was evaluated by the BRATS 2018 dataset, and it achieved a DSC of 0.895, 0.813, and 0.823 for WT, ET, and TC, respectively. The DL network has a bottom residual block consisting of three convolution layers with the shape of 1 × 1 × 1, 3 × 3 × 3, and 1 × 1 × 1 kernels. DSSE-V-Net was constructed using Exponential Linear Unit (ELU) activation and batch normalisation. The SE block enhances the most appropriate feature maps by adjusting the weights of the respective feature maps. The original MR image had been randomly cropped into a 128 × 128 × 128 size. The cropped data and 16 × 16 × 16 strides were utilised to predict the segmentation output. The model produced DSC metrics of 0.89, 0.70, and 0.79 for WT, ET, and TC, respectively. Multi-Threshold Attention U-Net (MTAU) architecture has three separate models for each region with different threshold values. The four MRI sequences are fed into all models to segment WT, ET, and TC separately. The BraTS 2020 training data was utilised for evaluation, and each volume of the


dataset comprises 155 slices of 2D grayscale images. The model was divided into 4000, 5000, and 5899 for training, validation, and testing. The MTAU model has 3.12 million parameters, and binary cross-entropy loss is used for backpropagation. The model produced DSC metrics of 0.72, 0.59, and 0.61 for WT, ET, and TC, respectively. The proposed model has been evaluated separately for HGG, LGG, and combined grades. The better segmentation results for WT, ET, and TC have been achieved by the HGG grade than the LGG and combined grades. The HGG grade yields DSC of 0.9, 0.77, and 0.68 for WT, ET, and TC. The segmentation of LGG gives DSC metrics of 0.87, 0.71, and 0.62 for WT, ET, and TC. The combined grade yields DSC metrics of 0.89, 0.75, and 0.64 for WT, ET, and TC. The box plot of tumour regions WT, ET, and TC for HGG, LGG, and combined grades is shown in Fig. 1. The maximum and minimum values of the mean DSC are indicated by the top and bottom side dot marks for all grades. During backpropagation, the gradient from dice loss adjusts the weights and biases of the model to produce a fit model. A small amount of existing literature has utilised data augmentation, yielding better results. Figure 2 shows the Ground Truth (GT) and Predicted Output (PO) of HGG, LGG, and the combined grades of one sample subject. Row I contains a WT sub-region, Row II contains an ET sub-region, Row III contains an NCR/NET sub-region, and Row IV contains an ED sub-region for HGG, LGG, and Combined grades. The best model for ET and TC is Attention 2DU-Net. Fig. 1 ROC curves for BC of HGG and LGG tumours SVM

Fig. 2 ROC curves for BC HGG and LGG tumours KNN


5 Conclusion and Future Work The proposed model has 85 convolution layers and eight de-convolution layers. Stacking two U-Nets yields more training parameters, with 222.72 M parameters. It takes advantage of vanishing gradient issues from the residual block and a larger receptive field from the dilation block. The dilation block acquires more information from 3 × 3 and 5 × 5 convolution kernels. The performance of the model increases by propagating information faster with skip connections. The computational time and memory requirements are less due to pixel-wise addition instead of concatenation. It provides features from low-to-high levels to segment tumours effectively. The segmentation results of tumour sub-regions ET and TC are low compared to WT, and they will improve by changing the methodologies.

References 1. Shi H, Niu L, Sun J, Zhang X (2021) Research and implementation of material distribution method based on intelligent perception network. In: IEEE international conference on advances in electrical engineering and computer applications, pp 351–357 2. Somasundaram S, Gobinath R (2019) Current trends on deep learning models for brain tumor segmentation and detection—a review. In: International conference on machine learning, big data, cloud and parallel computing, pp 217–221 3. Wu H, Zou T, Burke H, King S, Burke B (2021) A novel approach for porcupine rab identification and processing based on point cloud segmentation. In: 20th international conference on advanced robotics, pp 1101–1108 4. Xu MF, Xie ZY, Liu ZH (2021) A Em-Plant-based AGV material distribution route optimization simulation scheme. Ordnance Ind Autom 40(1):74–78 5. Yuan L, Wang Y, Thompson PM, Narayan VA, Ye J (2012) Multi-source feature learning for joint analysis of incomplete multiple heterogeneous neuroimaging data. Neuroimage 61(3):622–632 6. Sudhakar S, Chenthur Pandian S (2012) Secure packet encryption and key exchange system in mobile ad hoc network. J Comput Sci 8(6):908–912 7. Sudhakar S, Chenthur Pandian S (2016) Hybrid cluster-based geographical routing protocol to mitigate malicious nodes in mobile ad hoc network. Int J Ad Hoc Ubiquitous Comput 21(4):224–236 8. Priyadarshni AU, Sudhakar S (2015) Cluster based certificate revocation by cluster head in mobile ad-hoc network. Int J Appl Eng Res 10(20):16014–16018 9. Sudhakar S, Chenthur Pandian S (2015) Investigation of attribute aided data aggregation over dynamic routing in wireless sensor. J Eng Sci Technol 10(11):1465–1476 10. Sudhakar S, Chenthur Pandian S (2013) Trustworthy position based routing to mitigate against the malicious attacks to signifies secured data packet using geographic routing protocol in MANET. WSEAS Trans Commun 12(11):584–603 11. Sudhakar S, Chenthur Pandian S (2013) A Trust and co-operative nodes with affects of malicious attacks and measure the performance degradation on geographic aided routing in mobile ad hoc network. Life Sci J 10(4s):158–163

A Stochastic Weighted Model for Task Scheduling and Resource Utilization in the Cloud Rajkumar Kalimuthu and Brindha Thomas

Abstract This work models an approach to enhance task scheduling performance and reduce unnecessary task allocation. Initially, a stochastic model is proposed for categorizing the job classifier based on the available scheduling data. The number of VMs is created dynamically, and the time is preserved during the scheduling process. Secondly, the weighted task is matched with the VMs dynamically and performs the essential process. The experimental outcomes demonstrate that the anticipated model stochastic scheduling approach (SSA) handles the scheduling process and effectively attains load balancing. Generally, task scheduling is a challenging factor when considering the cost and deadline. VM plays a major role during the scheduling process. To fulfill the deadline, metrics like execution time, waiting time, and VM creation time are needed. Here, a novel scheduling framework is proposed to handle all the shortcomings and reduces the VM waiting time. The performance is compared with other approaches and shows substantial outcomes. Keywords Cloud computing · Task scheduling · Resource utilization · VM migration · Stochastic modeling

1 Introduction Cloud computing (CC) provides computing as a service which is a well-known virtualization technology that provides huge processing capability and flexibility [1]. Amazon EC2 and Google Engine are the many commercialized CC technologies available. The novel hardware and computer are known as “Cloud of Clouds” [5] and “datacenter clouds (CDC)” [6, 7] are outcomes of integrated CC manage R. Kalimuthu (B) Department of Computer Science and Engineering, Noorul Islam Centre for Higher Education, Kanniyakumari, India e-mail: [email protected] B. Thomas Department of Information Technology, Noorul Islam Centre for Higher Education, Kanniyakumari, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_51


dispersed CDC performance and achieve excellent QoS. Large-scale applications across dispersed contemporary datacenters exemplify the “Cloud of Clouds” concept. The study of experimental physics is an amazing illustration of data-intensive analysis (HEP). The four primary detectors generate 13 petabytes of data [8]. This data is housed on the International computational grid, which has 140 sites in 34 countries. The European Organisation for Nuclear Research is home to Tier 0, the Grid’s primary location for data processing, and it was the first to pass reconstructing (CERN). With this, a second data copy is transferred to 11 Tier 1 storage location, reconstruction, and pre-determined analysis. As the corporate internet provides large processing resources, software designers may profit from immense parallelism. Task scheduling determines where, when, and how a computational work may be completed once submitted to a cloud by a client. In contrast, cloud computing has a significant level of information variability, including varying computational power, processor placement, processor energy consumption, task waiting time, work duration, communication costs, and other concerns. Security and dependability are of utmost importance [9, 10]. Creating a high standard and performance scheduler now involves considerable technological difficulties. To measure a scheduler’s effectiveness, you need a QoS metric. Organizations perform computation jobs among heterogeneous devices in cloud computing to optimize QoS. This working model is a novel stochastic model for task scheduling analysis with the available resources. Here, metrics like QoS analysis and waiting time enhancements are attained and compared with the greedy model. The work is structured as: Sect. 2 provides the wider analysis of various prevailing approaches. Section 3 analyzes the methodology for task scheduling based on available resources. The numerical discussion gives the research outcome in Sect. 4 and the conclusion in Sect. 5.

2 Related Works A collection of network-enabled services known as the cloud [11, 12] offers scalable, QoS-guaranteed, typically customized, affordable computing environment available on demand [13] and is accessed easily and widely. InterCloud leverages a technical network built on various clouds for big dispersed applications. The “Cloud of Clouds” model’s standard embodiment, the multi-data CDC architecture [14], creates a global infrastructure spanning dispersed large CDC, storage or clusters for inter-cloud workloads. Various studies on the “Cloud of Clouds” paradigm include inter-cloud computing principles [26], protection & storage services, and programming paradigm & software architecture. In prior research, the investigators developed Hadoop model which is a showcased for MapReduce wide range of methods dispersed CDC and clusters, to realize the “Cloud of Clouds” concept. Typically, when we talk about “CDC,” we talk about the hardware and software infrastructures that enable all-purpose, high-performance computation [15]. Unlike large-scale computer complexes, a CDC comprises dispersed datacenters from across


the globe. CDC may often connect using a high-speed network interface. A single computing work may now be carried out in parallel on several computers thanks to this distributed architecture, greatly increasing efficiency. CDC gives on-demand access to inexpensive, high-capacity computing and storage (such drives) without a significant up-front cost. Because it enables process control algorithms to operate at the scale necessary for addressing unknown storage capacity, variation, and rate, it is well suited for processing large data, particularly streaming data. But to support a complex, dynamically changeable big data environment, we must develop innovative services and methods for coordinating cloud services’ choice, distribution, monitoring, and QoS management.

3 Methodology

It is expected that a global coordinator exists that distributes workloads from the Cloud Job Queue to various CDC. Inbound operations from different VOs are split up across many cloud datacenters using global scheduling to achieve the following goals: (1) reduce the QoS advantage A(P) and the median time spent traveling across all VOs, E(S); (2) according to the job scheduled and the public cloud, the scheduling function is formulated as

f : (VO, site) → P,  f ∈ F    (1)

where F is the collection of all practical scheduling functions. This defines the research problem: to discover a scheduling function that provides the minimum E(S) and the minimum A(P).

3.1 Stochastic Model

Cross-entropy optimization was introduced to optimize stochastically using importance sampling. It transforms an integer programming challenge into a stochastic one that can be solved to approximate the best answers. This strong optimization paradigm has been effectively used for the scheduling optimization problem; to be thorough, some of the information about this approach is summarized and expanded upon here.

To minimize min f(x) over x ∈ D, the cross-entropy method changes the task into a stochastic optimization problem. It employs a PDF g(x, p), defined on the space D, to simulate the distribution of minimization-problem solutions. Given a collection of randomly selected samples X = X1, X2, ..., Xn, one may define δ(a) as

δ(a) = P[f(X) ≤ a]    (2)

Here, a is a threshold parameter. Define the indicator function I{·} with the property that I{f(x) ≤ a} = 1 if and only if f(x) ≤ a. Because of this, P[f(X) ≤ a] = E[I{f(X) ≤ a}], where E stands for expectation. The greatest a that causes δ(a) to approach zero provides a value close to the ideal solution of the minimization problem min f(x) over x ∈ D. This is the fundamental concept behind changing a deterministic minimization issue into a stochastic combinatorial optimization problem. However, it becomes difficult to estimate δ(a) as a gets closer to this value: a simple Monte Carlo simulation-based method would need a lot of samples, which is operationally costly. In other words, g(x, u) may be used to produce a collection of samples, and an unbiased estimate is

δ̂(a) = (1/n) Σi=1..n I{f(Xi) ≤ a}    (3)

Many more samples are required because, as the solution gets closer to the ideal one, δ(a) becomes smaller and smaller; in other words, f(X) ≤ a turns into a rare event. The SSA employs importance sampling to address this technical challenge. Importance sampling uses a different probability density function, k(x, p), which is likewise defined on D, instead of sampling from g(x, p). As a result, δ(a) may be approximated by

δ̂(a) = (1/n) Σi=1..n I{f(Xi) ≤ a} · g(Xi)/k(Xi)    (4)

Ek[ I{f(X) ≤ a} · g(X)/k(X) ] = δ(a)    (5)

The technical challenge is that it is impossible to calculate the ideal k directly. As a result, the approach employs a PDF that closely resembles the optimal k*(x) by minimizing the cross entropy between k(x) and g(x, v):

d(k, g) = Eg[ ln k(X)/g(X) ] = ∫ k(x) ln k(x) dx − ∫ k(x) ln g(x) dx    (6)

which leads to the parameter-selection problem

arg max_v ∫ k*(x) ln g(x, v) dx = arg max_v Eu[ I{f(X) ≤ a} ln g(X, v) ]    (7)

Cross entropy employs importance samples with an additional parameter w:

arg max_v Eu[ I{f(X) ≤ a} · (f(X, u)/f(X, w)) · ln g(X, v) ]    (8)

Therefore, the response to the first minimization issue may be expressed, with the samples Xi produced by g(x, w), as

v̂* = arg max_v (1/n) Σi=1..n I{f(Xi) ≤ a} · (f(Xi, u)/f(Xi, w)) · ln g(Xi, v)    (9)



3.2 Weighted Scheduling Algorithm This paper provides a stochastic scheduling (SSA) approach to maximize QoS and wait time. As the probability density function (PDF) is updated during optimization, the SSA approach advances the solution repeatedly. During each cycle, examples are produced using this PDF to show the potential work assignments. Here, Gaussian distribution is considered as probability distribution (PDF) function used to address the scheduling issue. Please take notice that this example is a hypothetical work assignment. In each cycle, PDFs are modified by elite sampling, which are work assignments with high QoS and waiting time. The suggested technique initially initializes the array for each distributed cloud. Every CDC has the PDF over its selection index (S I ), the variable representing our desire to choose that particular datacenter. A higher S I indicates a greater likelihood that the matching cloud datacenter will be chosen. The PDF array is carried over from the previous iteration if it is not the initial repetition. Each PDF has the same variability. Then, n samples are produced by the PDF array. For instance, a PDF-based selection score is created for every CDC. The datacenter with the highest selection score is chosen for the sample. For instance, CDC 1 gets the highest score of 0.6. Hence, it is chosen for that sample. In the same manner, example 2 chooses cloud datacenter 2. The cloud infrastructure with the highest mean on its PDF is chosen in the scenario when many clouds compute clusters have the same highest selection scores. The PDF’s biggest mean has the greatest performance statistically. Each sample is assessed based on QoS and waiting time once n has been created. The elite samples consist of the k samples with finest QoS and waiting time. For top cloud samples, the mean PDF is higher. Every PDF’s variance is reduced. Assume sampling 1 and 2 are exceptional. Given that CDC 1 and 2 were chosen for the two specimens, the PDFs have greater means now. It increases the probability of generating CDC with improved QoS and sojourn time as the SSA gets closer to convergence. Convergence occurs if the algorithm λ iterates or every samples choose the same CDC.


A criterion is satisfied. After confluence, the work shows fines QoS and waiting time is sent to the cloud datacenter. Simple SSA execution might experience cloud datacenter overcrowding problems. The algorithm tends to place every work in the CDC with the greatest QoS, which is the cause. As a result, the top CDC is overrun. The load-balance-guided PDF correction is suggested as a solution to the problem. In other words, it is done on purpose to make the mean PDF value for the chosen CDC higher. The likelihood that a CDC is chosen again is lowered, and the loads are dispersed more equally across the CDC (Fig. 1). Fig. 1 Flow diagram
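The iterative procedure described above can be summarized with a small sketch. This is an illustrative assumption of how the per-CDC selection-index PDFs, elite-sample updates, variance shrinking, and the load-balance-guided correction could fit together, not the authors' implementation (which is in MATLAB); the update weights, sample counts, and the evaluate() scoring function are placeholders.

```python
# Illustrative SSA loop: one Gaussian PDF per cloud datacenter over its selection index;
# elite samples pull the means up, variances shrink, and the chosen CDC's mean is damped
# so that load is spread across datacenters.
import random

def ssa_schedule(datacenters, evaluate, n_samples=50, n_elite=5, max_iter=100, damping=0.9):
    """datacenters: list of CDC ids; evaluate(cdc) -> score where higher means better QoS
    and shorter waiting time for the job being scheduled."""
    mean = {c: 0.5 for c in datacenters}   # selection-index PDF means
    std = {c: 0.2 for c in datacenters}    # selection-index PDF spreads
    for _ in range(max_iter):
        samples = []
        for _ in range(n_samples):
            scores = {c: random.gauss(mean[c], std[c]) for c in datacenters}
            chosen = max(scores, key=scores.get)            # highest selection score wins
            samples.append((evaluate(chosen), chosen))
        samples.sort(reverse=True)                          # best QoS / waiting-time samples first
        elite = [c for _, c in samples[:n_elite]]
        for c in datacenters:                               # raise means of elite CDCs, shrink spread
            hits = elite.count(c) / n_elite
            mean[c] = 0.7 * mean[c] + 0.3 * hits
            std[c] *= 0.95
        best = samples[0][1]
        mean[best] *= damping                               # load-balance-guided PDF correction
        if len({c for _, c in samples}) == 1:               # convergence: all samples agree
            break
    return samples[0][1]
```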


4 Experimental Analysis The suggested SSA-based QoS-aware task scheduling method is evaluated on a computer with a 2.8 GHz Intel Core i5 CPU, 4 GB of RAM, and a 64-bit operating system. It is simulated in MATLAB 2020a. We build a collection of 500 synthetic test cases with 1000 VOs and 50 CDC since real-world dispersed storage arrays are not accessible. Our stochastic approach is better than the genetic algorithms. The default approach will always greedily allocate incoming workloads to the cloud data center that offers the highest QoS and the shortest sojourn time. The QoS value weighs dependability and security. Both methods are assessed on each test case using the following indicators. Table 1 summarizes the optimization technique for SSA. We have made the following observations. • Our SSA delivers superior cumulative QoS on every test scenario compared to the average baseline models. Statistics show that the QoS has increased by 56% compared to the baseline algorithm. The rationale is that every job’s planning concerning QoS and sojourn time is optimized by our method (see Fig. 2). • The proposed SSA preserves 25% of waiting time (see Fig. 3) compared to the greedy algorithm. The waiting time is reduced by 9.2% on average. The existing approach often assigns all tasks to the location with the greatest QoS. It seems to overburden the CDC, lengthening sojourn time. In contrast, our algorithm’s PDF performance optimization reduces the pressure on CDC. It reduces the likelihood that a certain CDC would be often chosen, ensuring that tasks are spread equally throughout the CDC. As a result, the cumulative sojourn time is cut down. • The suggested method works extremely well. The average response time for all test scenarios is 864.55 s. It seems that the runtime average just linearly with varied-sized test cases. • However, the greedy computation processing period and the suggested algorithms are relatively similar, the proposed algorithm’s QoS outweighs the greedy one.

5 Conclusion
In recent years, cloud computing has developed into a viable computing model with enormous processing capacity and flexibility. However, it confronts obstacles, including system design under variability and optimal scheduling concerns. The SSA model suggests a stochastic workload scheduling mechanism for the cloud that considers dependability, security, and responsiveness. A stochastic, model-based, QoS-aware task scheduling approach is created to compute scheduling solutions that improve the QoS metric. The results on 500 test cases show that the suggested technique beats the existing one by 56% for QoS and 25% for the waiting period.

Table 1 Comparison analysis based on QoS, waiting, and runtime

| Total test cases | Total cloud | Greedy QoS | Greedy waiting time (ms) | Greedy running time (ms) | SSA QoS | SSA waiting time (ms) | SSA running time (ms) | Enhancement QoS (%) | Enhancement waiting time (%) |
| 50–100   | 10 | 47,995  | 6749    | 1.39  | 6408    | 5038    | 30   | 33 | 25   |
| 101–200  | 20 | 125,918 | 15,986  | 4.33  | 180,125 | 14,215  | 179  | 43 | 11   |
| 201–400  | 30 | 252,918 | 55,385  | 9.15  | 378,108 | 52,565  | 568  | 49 | 5.1  |
| 401–600  | 40 | 409,186 | 119,859 | 14.85 | 628,856 | 115,268 | 1190 | 53 | 3.8  |
| 601–1000 | 50 | 630,208 | 256,374 | 25.80 | 983,725 | 255,290 | 2350 | 56 | 0.1  |
| Avg.     | 60 | 293,190 | 90,869  | 11.10 | 446,975 | 88,475  | 865  | 47 | 9.07 |



Fig. 2 QoS attainment

Fig. 3 Waiting time analysis

References 1. Deng RL, Luan TH, Liang H, Lai C (2016) Optimal workload allocation in fog-cloud computing toward balanced delay and power consumption. IEEE Internet Things J 3(6):1171–1181 2. Liu ZC, Mao S, Ristaniemi T, Guo X (2018) “Multi-objective optimization for computation offloading in fog computing. IEEE Internet Things J 5(1):283–294 3. Wu MD, Ota K, Li J, Yang W, Wang M (2019) Fog-computing-enabled cognitive network function virtualization for an information-centric future Internet. IEEE Commun Mag 57(7):48–54 4. Guan YZ, Zhu L, Wu L, Yu S (2019) Effect: an efficient flexible privacy-preserving data aggregation scheme with authentication in smart grid. Sci China-Inf Sci 62(3):1–14 5. Meng WW, Zhang Z (2017) Delay-constrained hybrid computation offloading with cloud and fog computing. IEEE Access 5:21355–21367


6. Liu ZC, Ristaniemi T, Guo X (2017) Multi-objective optimization for computation offloading in mobile-edge computing. In: Proceedings of the IEEE symposium on computer and communication, Jul 2017, pp 832–837 7. Zhu TS, Cai Z, Zhou X, Li J (2019) Task scheduling in deadline-aware mobile edge computing systems. IEEE Internet Things J 6(3):4854–4866 8. Datta CB, Harris J (2015) Fog computing architecture to enable consumer centric Internet of Things services. In: Proceedings of the IEEE international symposium on consumer electronics (ISCE), Madrid, Spain, Jun 2015, pp 1–2 9. Zanella NB, Castellani A, Vangelista L, Zorzi M (2014) Internet of Things for smart cities. IEEE Internet Things J 1(1):22–32 10. Elhady, Tawfeek MA (2015) A comparative study into swarm intelligence algorithms for dynamic tasks scheduling in cloud computing. In: Proceedings of the 7th international conference on intelligence computing information system, Dec 2015, pp 362–369 11. Mohammadi RP, Fahringer T (2013) A truthful dynamic workflow scheduling mechanism for commercial multi-cloud environments. IEEE Trans Parall Distrib Syst 24(6):1203–1212 12. Cheng JL, Wang Y (2015) An energy-saving task scheduling strategy based on vacation queuing theory in cloud computing. Tsinghua Sci Technol 20(1):28–39 13. Buyya RR, Calheiros RN (2009) Modeling and simulation of scalable Cloud computing environments and the CloudSim toolkit: challenges and opportunities. In: Proceedings of the international conference on high performance and computer simulation, Jun 2009, pp 1–11 14. Jia GH, Rao H, Shu L (2018) Edge computing-based intelligent manhole cover management system for smart cities. IEEE Internet Things J 5(3):1648–1656 15. Wang et al (2017) QoS scheduling algorithm in cloud computing based on discrete particle swarm optimization. Comput Eng 43(6):111–117

Machine Learning and Recommendation System in Agriculture: A Survey and Possible Extensions

Krupa Patel and Hiren B. Patel

Abstract Agriculture is the primary source of income for most of the people in numerous countries. Traditional farming processes have many issues, such as a lack of knowledge about crop, fertilizer, and pesticide selection and usage, which can reduce crop yield, crop quality, and farmers' profit. Meanwhile, technology is continually being modified and streamlined. The use of computational methods such as machine learning (ML) and recommendation systems (RS) can enable farmers to make smart judgments rapidly and precisely, which will increase profitability. Presently, a large amount of agriculture-related data is available on the Internet. Several learning algorithms and recommendation system approaches can help build models from the available data and forecast crops, fertilizers, pesticides, crop yields, and profit. In this paper, we present a detailed review that elaborates the categories of machine learning and recommendation systems, the use of ML and RS in agriculture, the work carried out so far, the problems in agriculture and how technology can help, and possible future additions. Keywords Agriculture · Content base · Collaborative filtering · Machine learning · Recommendation system · Supervised learning · Unsupervised learning

1 Introduction
Computational technology such as machine learning (ML) can predict unknown outcomes by developing a model from available data [1]. With ML, a recommendation system (RS) can suggest items or things using past content history or similarity [2]. Machine learning models are classified into supervised and unsupervised learning. A recommendation system generates predictions supported by similarities between items and content, with many applications such as recommending items, movies, locations, and other things.
K. Patel (B) · H. B. Patel
Kadi Sarva Vishwavidhyalaya, Gandhinagar, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_53


These two techniques are popular on the Internet and have many applications. However, few people realize that these technologies could help solve many real-world issues in areas like medicine and agriculture. Nowadays, due to improper development activity, large land areas are used to construct buildings, so the total agricultural land has decreased. Along with that, the upsurge in population has increased food demand. Hence, we have to improve crop yield and crop quality to maintain the demand–supply chain. This can be achieved using computational technologies, which can help the agriculture sector in many ways, e.g., suggesting crops and fertilizers, identifying diseases, predicting yield, and forecasting the weather. Traditionally, there has been no mechanism to make crop predictions by analyzing the soil, so farmers make wrong decisions due to a lack of techniques and knowledge, and the quality and yield of crops decrease continuously, which indirectly harms the economy of the country. However, today a large amount of agriculture-related data is available over the Internet, and many government websites provide information about soil properties, crops, fertilizers, and pesticides. Image processing techniques are also used, to a great extent, to capture live information about soil and weather [1]. This information can be used to create and train machine learning models, and those models can support productive decisions that reduce agricultural problems. Many researchers have tried to solve problems in agriculture in several ways with the help of computational technology, yet gaps remain that machine learning techniques and recommendation models can fill. This paper presents a survey of machine learning and recommendation techniques and their applications in agriculture. Classifications of machine learning and recommendation algorithms are given in the next section, Sect. 3 covers the work done so far using machine learning and recommendation systems, Sect. 4 describes problems in agriculture, recent trends and possible augmentations are given in Sect. 5, and finally we conclude the paper in Sect. 6.

2 Classification of ML and RS Algorithms
Machine learning algorithms are categorized into two broad types according to how the model learns: supervised and unsupervised learning [3]. The different algorithms under these two categories are discussed in Sect. 2.1. A recommendation system is a technology that can suggest things using past history and similarity; it is classified into content-based and collaborative filtering [2]. The detailed working of recommendation algorithms is presented in Sect. 2.2.


Fig. 1 Supervised learning

2.1 Machine Learning Algorithms
Supervised learning: Supervised learning algorithms build a model from training data whose class labels are known, and the model is then applied to predict the class labels of unknown samples [4]. Figure 1 shows the working of supervised learning. Classification and prediction challenges can be answered using this approach; in agriculture, examples include forecasting crops from soil properties and recommending fertilizers. Supervised learning includes regression, logistic regression, classification, Naive Bayes classifiers, K-NN (k-nearest neighbours), decision trees, and support vector machine algorithms. Every algorithm has its own pros and cons, and the selection of a model depends on the input data. Regression techniques are useful with numerical input data, while classification techniques work with both numerical and categorical data. The Naïve Bayes classifier uses probability methods, decision trees and random forests build graphs using a branching approach, and K-NN finds the nearest or most similar class to predict the class label [2].
Unsupervised learning: Unsupervised learning algorithms produce models by observing similarities and learning patterns; no class labels are available for training the model [4]. Clustering is an instance of unsupervised learning, and Fig. 2 shows its working. An example in agriculture is creating clusters of similar types of land based on soil properties. K-means clustering, singular value decomposition, and principal component analysis are examples of unsupervised learning. K-means clustering computes mean values and creates clusters, singular value decomposition is a matrix-based technique, and principal component analysis examines the interrelations among variables. Compared to supervised learning, unsupervised learning generally gives lower accuracy [1, 3].
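As a concrete illustration of the supervised setting described above, the sketch below trains one of the listed algorithms (a decision tree) on a handful of made-up soil records. The feature names, values, and crop labels are purely hypothetical and are not taken from any dataset used in the surveyed works.

```python
from sklearn.tree import DecisionTreeClassifier

# Toy soil records: [nitrogen, phosphorus, potassium, pH, rainfall_mm]
# and the crop grown on that soil; all values and crop names are made up.
X = [[90, 42, 43, 6.5, 200], [20, 67, 20, 7.2, 60],
     [85, 58, 41, 6.8, 230], [25, 60, 22, 7.0, 45],
     [88, 45, 40, 6.4, 210], [18, 65, 25, 7.4, 50]]
y = ["rice", "chickpea", "rice", "chickpea", "rice", "chickpea"]

# The decision tree learns rules such as "high rainfall -> rice" from the
# labelled examples and then labels a previously unseen soil sample.
model = DecisionTreeClassifier().fit(X, y)
print(model.predict([[80, 50, 40, 6.6, 220]]))   # expected: ['rice']
```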

2.2 Recommendation Systems
Content based: Recommendation techniques that generate recommendations from a user's past history are called content-based recommendations [5, 6]. An example is recommending crops using soil properties, or the past crop history of the farm, as the content. In agriculture, most algorithms recommend crops, fertilizers,


Fig. 2 Unsupervised learning

etc., using content properties. Content-based filtering is more popular in the agriculture field than collaborative filtering. However, it does not generate diverse recommendations; it only recommends things similar to the user's past history.
Collaborative filtering: Recommendation techniques that generate recommendations using similarities between items or users are called collaborative filtering [5, 6]. Collaborative filtering has two categories: user-based and item-based. In user-based collaborative filtering, the similarity between users is calculated; in item-based collaborative filtering, the similarity between items is calculated. It provides diverse recommendations. Several similarity measures, such as Pearson correlation and cosine similarity, are used to measure similarity (a toy similarity computation is sketched at the end of this section). Cold-start, sparsity, and shilling attacks are known issues in collaborative filtering recommendation systems [4]. In agriculture, collaborative filtering is less popular; it can be useful by matching similarities between different lands to generate crop recommendations, and similarities between crops can also be used to recommend treatment and pest control techniques.
Hybrid recommendation: A few researchers use the term hybrid recommendation for an amalgamation of content-based and collaborative filtering techniques [5]. This technique overcomes the issues of both algorithms and improves the recommendation results.
In this section, we have discussed various algorithms of machine learning and recommendation systems. The succeeding section reviews the existing work done by researchers in the agriculture field with the help of machine learning and recommendation systems.
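The sketch below shows the kind of similarity computation mentioned above, using cosine similarity between soil-property vectors of farms. The farm names, feature values, and crop lists are purely illustrative.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two feature vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Soil-property profiles of known farms (made-up values) and the crops that
# performed well on them; a user-based CF approach recommends to a new farm
# the crops of its most similar known farm.
farms = {
    "farm_A": (np.array([90, 42, 43, 6.5]), ["rice", "sugarcane"]),
    "farm_B": (np.array([20, 67, 20, 7.2]), ["chickpea"]),
}
new_farm = np.array([85, 50, 40, 6.6])

best = max(farms, key=lambda f: cosine(farms[f][0], new_farm))
print("Most similar farm:", best, "-> recommend", farms[best][1])
```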

3 Existing Literature Work
Machine learning and recommendation systems are successful approaches with numerous application areas, such as medicine, agriculture, image processing, e-commerce, location and privacy, security, and automation. In this paper, we concentrate only on the applicability of those algorithms in the agriculture area. Thus, in this section we review the existing work carried out by various scientists and agriculture researchers.


We have divided our review into three classes according to the problems targeted: work done on crop recommendation, work done on fertilizer suggestion, and work done on pesticide recommendation.

3.1 Work Has Been Done in Crop Recommendation
Numerous researchers have developed algorithms or employed existing machine learning models to forecast crops and enhance crop quality and yield; here, we include only a few relevant studies. In [6], factors such as rainfall, temperature, and pesticide input are considered in predicting crop yield using decision tree, gradient boosting, and random forest regressors. The researchers of [7] utilize IoT and machine learning to suggest crops using real-time soil data. In [8], a sensor-based soil testing mechanism is adopted, and random forest algorithms generate recommendations based on the soil test data. The author of [9] predicts corn crop yield using a neural network and also evaluates the effect of climate change on crop yield. Recommending crops to farmers based on market analysis using an Apriori algorithm is presented in [10], and another demand-based algorithm using a logistic regression technique is proposed in [11]. Improving crop productivity using various classifiers, such as support vector machine (SVM), decision tree, and logistic regression, is given in [12]. Most of the research has been conducted on soil property analysis and crop recommendation; however, recommending a crop along with its profit using multiple criteria, such as soil type and properties, crop demand, weather, and water level, has not yet been studied.

3.2 Work Has Been Done in Fertilizer Recommendation
Appropriate use of fertilizers can enrich crop yield and quality; however, relatively little research has been conducted in this area. In [13], fertilizers are recommended using fertility testing of soil samples. A voting-based ensemble classifier is implemented to recommend fertilizers using NPK values in [14]. Support vector machine and random forest models are used to predict yield and fertilizers in [15]. Almost all of these works recommend fertilizers, but the proportion in which fertilizers are combined also matters.

3.3 Work Has Been Done in Pesticide Recommendation
Technology such as image processing is helpful to identify crop disease, and machine learning can help predict pest control methods and recommend pesticides so that crops can be protected against those pests.


Regular monitoring and treatment of crops is needed to protect them from diseases and pests. The authors of [11] suggest crop treatment by identifying the disease and measuring the similarity between the disease and the treatments provided for it. Classification techniques such as support vector machine (SVM), decision tree, and logistic regression are used to classify pests in crops and to propose treatment solutions [16]. The researchers of [17] built a simple ontological approach to the detection and treatment of pests in crops. A support vector machine (SVM) to classify diseases and propose solutions is given in [18].

4 Problems in Agriculture Sector
The traditional agriculture system faces issues such as reduced crop yield, reduced crop quality, reduced profit, fertilizer selection, crop disease, unpredictable weather conditions, less educated farmers, crop demand, and a slow decision-making process. This section discusses these issues and how they can be mitigated using computational techniques.

4.1 Reduction in Crop Yield
Crop yield reduction is a major issue in agriculture. Due to urbanization, the total farming area has been reduced; in addition, because of global warming, weather conditions have become unpredictable, which further reduces yield. Crop yield can be increased by choosing the right crop for the right soil in the right season, and proper use of fertilizers also makes this possible.

4.2 Reduction in Crop Quality
In traditional farming, farmers select crops through interactions with other farmers and do not check their farm's soil quality, which is responsible for the quality of the crops. Current machine learning and recommendation techniques check soil properties and suggest the crops that are most suitable for that land.

4.3 Reduction in Profit Due to reductions in crop yield and quality, the profit of the farmers is reduced. Another reason behind the reduction in profit is less market demand for the selected crop.


4.4 Selection of Fertilizers
Fertilizers are the nutrition that enhances crop quality and yield, and improper use of fertilizers may reduce crop yield. Due to a lack of knowledge about fertilizer selection and usage, crop quality may be compromised. Technology such as machine learning could help farmers select appropriate fertilizers, with proportions matched to the soil and crop properties.

4.5 Crop Disease
Pests and diseases are hazardous for crops and soil. They can sometimes cause a total loss of crops if proper treatment is not given or if farmers fail to identify the disease; hence, pest or disease identification and control techniques are needed on time. Image processing and machine learning techniques are useful for assessing crop health and can suggest the appropriate steps to be taken.

4.6 Uneducated Farmers
Agriculture is a profession in which many people are not educated about technology, so they cannot use it to increase their profitability. Proper training of farmers is needed. Nowadays, the government is communicating updated techniques, their utilization, and their applications with the help of mobile phones, the Internet, and broadcasts on television and radio.

4.7 Unpredictable Weather
Global warming and the rise in pollution have changed the weather cycle. Seasons have become unpredictable and change in ways that can harm crops. The use of technology such as image processing, data mining, and machine learning could help forecast weather conditions.

4.8 Crop Demand
In traditional farming, farmers interact with each other and cultivate the same crop without considering market demand. Hence, the total production of that crop increases relative to the lower demand, the price of the crop falls,


and farmers do not get enough profit. Sometimes they do not even recover the money they spent on the crop and incur a loss. With the help of technology, we can predict the market demand for crops on a yearly basis and calculate the expected profit.

4.9 Hindered Decision-Making Process
Farming is a time-consuming process that includes various stages, such as the selection of crops and fertilizers and continuous monitoring for disease identification and pest control. For all of these, farmers need to make decisions; government agencies help farmers in many ways, but it is a tedious process. Recent technological advancements help farmers speed up decision-making by suggesting crops, fertilizers, and pesticides. Moreover, technology also helps farmers check their soil quality, stay updated on changing weather conditions, and learn the latest farming techniques. All of the above-mentioned issues are responsible for the declining interest in the agriculture sector; many people move from agriculture to other sectors to obtain maximum profit in an easier way, and in some regions the situation has become very bad. To solve these issues, technology is the best option for improvement. In the next section, we elaborate on current trends and what can be done in the future to improve the agriculture process.

5 Recent Trends and Future Augmentation in Agriculture
Agriculture is a dominant sector of the country's economy; numerous people rely on it directly and indirectly, and it is the lifeline of farmers and their families. However, due to several reasons, including unplanned construction activity, the total land area available for agriculture has been reduced over several decades. So far, we have discussed computational techniques such as machine learning and recommendation systems, with their types and classifications, and the current issues in the agriculture field that are responsible for the reduction in crop quality, yield, and profit. We need a strong mechanism that can help increase crop quality and yield on this limited land area; consequently, computational intelligence comes into the picture, and several scientists and agricultural research institutes have started to work with technologies such as machine learning and recommendation systems. Our literature survey found that recent research basically concentrates on crop recommendation, soil analysis, and identification of pests in particular crops. Some researchers focus only on a particular crop, while others recommend crops by considering a single parameter. Limited studies have been conducted on profit computation, yield calculation with crop recommendation, crop- and soil-specific fertilizer suggestion, and pesticide recommendation.


In the future, we will need an automated process that can tell farmers about soil properties and soil quality and suggest crops for various seasons according to farm size, soil quality, weather, water level, and market demand and profit. Along with that, it should also suggest fertilizer combinations, with proportions, on the basis of the soil and the selected crop. In addition, the system could identify diseases and recommend pest control techniques for each crop. With the help of such a completely automated process, farmers can select crops, fertilizers, and pesticides quickly and precisely, which can maximize profit. Computational technology is the technology through which all of this becomes possible; approaches such as big data analytics, ML, deep learning, and RS may help achieve automation in agriculture. This study covers techniques and approaches in machine learning and recommendation systems; we have covered the two areas compactly, even though both are very broad. The succeeding section concludes our survey.

6 Conclusion
Machine learning and recommendation systems are successful and extensively used methods with numerous application areas. ML has manifold algorithms that successfully produce predictions, and the selection of a model depends on the application and dataset. Along with ML, recommendation systems also have different techniques to generate recommendations. A combination of these two could address certain issues in the agriculture sector and make the decision-making process faster. In this review, we discussed various ML and RS algorithms and how they can help solve issues in agriculture, along with the work done so far and future research directions. Presently, we are working on a machine learning-based crop and fertilizer recommendation system that suggests seasonal crops and the expected profit using soil properties, soil type, and water level, along with fertilizer proportions. In the future, we plan to design pest identification and control techniques using machine learning.

References 1. Sharma S, Agrawal J, Agarwal S, Sharma S (2013) Machine learning techniques for data mining: a survey. In: 2013 IEEE International conference on computational intelligence and computing research. IEEE ICCIC 2013, no 1. https://doi.org/10.1109/ICCIC.2013.6724149 2. Sen PC, Hajra M, Ghosh M (2020) Supervised classification algorithms in machine learning: a survey and review, vol 937. Springer, Singapore 3. Liakos KG, Busato P, Moshou D, Pearson S, Bochtis D (2018) Machine learning in agriculture: a review. Sensors (Switzerland) 18(8):1–29. https://doi.org/10.3390/s18082674 4. Almazro D, Shahatah G, Albdulkarim L, Kherees M, Martinez R, Nzoukou W (2010) A survey paper on recommender systems. Available: http://arxiv.org/abs/1006.5278


5. Bobadilla J, Ortega F, Hernando A, Gutiérrez A (2013) Recommender systems survey. Knowl Based Syst 46:109–132. https://doi.org/10.1016/j.knosys.2013.03.012 6. Sreerama AS, Sagar BM (2020) A machine learning approach to crop yield prediction, pp 6616–6619 7. Gupta A, Nagda D, Nikhare P, Sandbhor A (2021) Smart crop prediction using IoT and machine learning, pp 18–21 8. Chauhan G, Chaudhary A (2021) Crop recommendation system using machine learning algorithms. In: Proceedings of the 2021 10th international conference system modeling Advancement in research. Trends, SMART 2021, vol 3307, pp 109–112. https://doi.org/10.1109/SMA RT52563.2021.9676210 9. Crane-Droesch A (2018) Machine learning methods for crop yield prediction and climate change impact assessment in agriculture. Environ Res Lett 13(11). https://doi.org/10.1088/ 1748-9326/aae159 10. Mallick PK, Balas VE, Bhoi AK, Zobaa AF (2019) Development of a model recommender system for agriculture using Apriori algorithm, vol 768. Springer, Singapore 11. Kumar A, Sarkar S, Pradhan C (2019) Recommendation system for crop identification and pest control technique in agriculture 12. Improving Crop Productivity Through A Crop Recommendation System Using Ensembling Technique. IEEE 13. Pratap A, Sebastian R, Joseph N, Eapen RK, Thomas S (2019) Soil fertility analysis and fertilizer recommendation system. SSRN Electron J 287–292. https://doi.org/10.2139/ssrn. 3446609. 14. Kumaravel A, Archana K, Saranya KG (2020) Crop yield prediction, forecasting and fertilizer recommendation using voting based ensemble classifier related papers crop yield predict ion, forecasting and fertilizer recommendation using data mining algorithm Archana Kumaravel location specific. SSRG Int J Comput Sci Eng 7. Available: www.internationaljournalssrg.org 15. Bondre DA, Mahagaonkar S (2019) Prediction of crop yield and fertilizer recommendation using machine learning algorithms. Int J Eng Appl Sci Technol 04(05):371–376. https://doi. org/10.33564/ijeast.2019.v04i05.055 16. Suriya MKS, Muthuramalingam S (2018) Pesticide recommendation system for cotton crop diseases due to the climatic changes, pp 25–32 17. Lacasta J, Lopez-Pellicer FJ, Espejo-García B, Nogueras-Iso J, Zarazaga-Soria FJ (2017) Agricultural recommendation system for crop protection. Comput Electron Agric 152:82–89. https://doi.org/10.1016/j.compag.2018.06.049 18. Digital Agriculture System for Crop Prediction & Disease Analysis Based on Machine Learning, vol 21, no X, pp 1065–1070

A Study on Accident Detection Systems Using Machine Learning S. Savitha and N. Sreedevi

Abstract Safety is the top priority of every individual in our day-to-day lives. Urbanization has led to a rise in motorization, which has directly or indirectly affected road safety. There are various aspects of safety to be taken care of while traveling or driving, and there are many causes of road accidents, especially human error. According to reports by the WHO and NHTSA, a major share of the deaths reported in our country is due to accidents. Among the main causes of highway accidents are drowsiness, driver fatigue, and alcohol or drug consumption while driving. Many deadly accidents could be prevented by detecting these causes early and alerting the driver in time. In this paper, we present a brief survey of the designs and models used for accident detection. The paper also surveys the analysis methods used for the various causes of accidents, the conventional IoT-based models used to detect driver drowsiness, fatigue, drunken driving, and distraction, and a detailed analysis of the widely used ML- and AI-based techniques in this regard. Keywords Artificial Neural Networks (ANN) · IoT (Internet of Things) and ML (Machine Learning) · Artificial Intelligence (AI) · Electro-oculogram (EOG) · Photo Plethysmography (PPG) · WHO

Abbreviations
ANN    Artificial Neural Networks
ECG    Electrocardiogram
EEG    Electroencephalogram
EMG    Electromyogram
EOG    Electro-oculogram
HMM    Hidden Markov Model
IoT    Internet of Things
ML     Machine Learning
NHTSA  National Highway Traffic Safety Administration
PPG    Photo Plethysmography
VANET  Vehicular Ad-Hoc Network
WHO    World Health Organization

S. Savitha
BMS Institute of Technology, Bengaluru, India
e-mail: [email protected]
N. Sreedevi (B)
CMR Institute of Technology, Bengaluru, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_54

1 Introduction
Modernization and the need to improve the socioeconomic status of individuals have led to a rising trend of motorization, which has directly or indirectly influenced road safety, with a rise in fatalities and permanently disabling injuries over the past decades, making road safety a major concern [1]. In 2018, more than 1.51 lakh deaths from road traffic accidents (RTAs) were registered in India alone, and there were 1.35 million deaths from RTAs globally in 2016. A 2019 study shows that 449,002 accidents occurred in India, in which 151,113 people were killed and 451,361 were injured. Every year, approximately 1.35 million people die in vehicle crashes across the world, as reported by the NHTSA and the World Health Organization [2]. There are various causes of road accidents; a few are listed below.
Causes of Road Accidents

• Inadequate driving behaviour, e.g., over-speeding.
• Alcohol and drug consumption while driving.
• Driver distraction.
• Drivers falling asleep at the wheel.
• Disobeying traffic signals.
• Not using safety accessories such as seat belts and helmets.
• Not obeying lane discipline and improper overtaking.

Many deaths in road accidents occur because of the lack of on-site medical assistance, late accident reporting, and inaccurate geographic location. Effective and efficient mechanisms are needed to minimize the frequency of road traffic accidents, which increases every year [3]. This paper reviews studies on the problems of drunken driving, fatigue, and drowsy driving, which are the major causes of road accident deaths. The study includes both IoT (Internet of Things) and ML (Machine Learning) techniques. The ML techniques used include supervised techniques such as the linear regression algorithm [4], Artificial Neural Networks (ANN) [5], fuzzy logic [6], genetic programming [3], Random Forest classifiers [8], etc.


To differentiate a normal situation from an accident, various data are fetched in real time by these models and compared with previously recorded accident data. Conventional accident detection methods include systems using VANET [8] and piezoelectric sensors [9]; popular technologies such as GSM are used to send alert messages to nearby emergency services by continuously tracking the vehicle in case of an accident, and vibration sensors [10] with microcontrollers are used to detect and report accidents and to reduce the time taken to respond after an accident occurs. In this paper, we present the role of ML-based techniques in facilitating the IoT paradigm for accident prevention and detection. Section 1 introduces accident detection techniques, Sect. 2 discusses the reasons behind accidents, Sect. 3 presents a survey of IoT-based accident detection techniques, and Sect. 4 presents a survey of ML/AI-based accident detection techniques.

2 Related Work
Driver fatigue or driver drowsiness: A study reports that at least 5000 people die on average every year due to crashes involving drowsy driving, and the increase in vehicles due to urbanization has led to many road accidents. In many situations, a drowsy driver may not realize that he or she is falling asleep, as the episode of sleep may last only a few seconds; this is referred to as micro-sleep. Conventional methods use multi-sensor, smartphone, and cloud-based computing platform detection systems, as mentioned in [11], and also include computer-vision applications that utilize visual features, non-visual features [12–15], or a combination of both. Visual features include drowsy facial expressions such as frequent yawning, nodding of the head, and eye closing and blinking. Non-visual features include brain activity and heart rate sensors, as well as physiological parameters such as EOG, ECG, EEG, and PPG [16, 17]. Other driving behaviours to be monitored include vehicle speed, pressure on the acceleration pedal, and deviations from the lane position. Driver fatigue detection can be based on physiological signals, the driving state of the vehicle, machine vision, and information fusion [19–21]. In [21], the authors describe three main categories of drowsiness detection methods: behavioural-parameter, vehicular-parameter, and physiological-parameter-based techniques.
Drunken drivers: Another major cause of road accidents is drunken driving; according to recent research, more than 50% of road accident victims aged between 15 and 35 years [21] die due to over-consumption of alcohol, and a WHO report suggests that more than 70% of road accident deaths are caused by drunken driving. In [22], a conventional model was developed to detect drunken driving using the MQ-3 alcohol sensor, an STC12C516A microcontroller, and IoT, which detects the alcohol content in the driver's breath.


Another model was designed to detect the physical changes and physiological behaviour seen in a drunken state, such as drowsiness, redness of the eyes, and eye blinking.
Driving distraction: Causes of distraction while driving include chatting, using mobile phones, music, handling pets and children, and looking at off-road activities and people; metrics such as gaze patterns and head movement can differentiate focused from distracted driving [23]. The categories of distraction include olfactory, gustatory, visual, auditory, biomechanical, and cognitive distraction [23]. Driving performance degrades under distraction, leading to problems such as changes in vehicle speed, poor vehicle control, and drifting outside the lane edges [24].
Aggressive driving behaviour: Aggressive driving behaviours include speeding, driving too close to the vehicle ahead or beside, disregarding traffic regulations, improper lane changing, ignoring speed limits and road conditions, driving on the wrong side of the road, driving between two lanes, not using the indicator while taking a turn, and driving under the influence of drugs.

3 Models and Techniques for Accident Detection
There are various techniques and strategies to detect road accidents; the main categories are discussed below.
VANET for accident detection: An Internet of Things (IoT)-based prototype using a Vehicular Ad hoc Network (VANET) was designed for detecting accidents automatically. VANET uses vehicle-to-vehicle or vehicle-to-infrastructure communication, as represented in Fig. 1. In [8], a model was proposed to overcome the problem of vehicle failure causing traffic congestion; in this model a moving vehicle is considered a node, and nodes within range of the module can send and receive alert messages using an RF module. A node in range of the network area receives the message and then transmits it to the base station. The model uses a piezoelectric sensor, a micro-electro-mechanical systems (MEMS) sensor, a flame sensor, and a temperature sensor. VANET technology creates a mobile network using moving vehicles as nodes, allowing vehicles approximately within a range of 100–300 m to connect. In [26], the VANET technique is used together with a crash sensor and an airbag system: when an accident is sensed, the information is sent to a microcontroller-based system, the site of the accident is located using GPS, and the location is sent to a stored number using GSM. VANET is used to transmit the message to emergency services by broadcasting an alert from the source node to all vehicles on the road. The model consists of on-board units and a roadside unit, which interact with each other using dedicated short-range communication.
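The following sketch illustrates the alert-broadcast idea described above. Real VANETs broadcast over DSRC/RF on-board units; a UDP broadcast on a local network is used here only as a stand-in for that channel, and the port number and message fields are assumptions made for the example.

```python
import json
import socket
import time

ALERT_PORT = 50007   # arbitrary port chosen for this illustration

def broadcast_accident_alert(vehicle_id, lat, lon):
    # Source node broadcasts an accident alert to every reachable node
    msg = json.dumps({
        "type": "ACCIDENT_ALERT",
        "vehicle": vehicle_id,
        "lat": lat,
        "lon": lon,
        "timestamp": time.time(),
    }).encode()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(msg, ("255.255.255.255", ALERT_PORT))

def listen_for_alerts():
    # Nearby vehicles or a roadside unit receive and can forward the alert
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("", ALERT_PORT))
        data, addr = sock.recvfrom(4096)
        print("Alert from", addr, json.loads(data))
```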


Fig. 1 Shows the representation of VANET architecture [25]

(1) Using smartphones: Due to advances in mobile and smartphone technologies, it is possible to detect traffic accidents with the phone itself. These systems rely on data extracted from the GPS receiver, the smartphone, and the accelerometer; the mechanism proposed in [27] includes a detection phase and a notification phase, in which images, audio, and video of the accident site can be sent to the emergency services for a faster response (Fig. 2).
(2) GPS and GSM models for detecting accidents: In [28], the author proposes a system that uses GPS to monitor the vehicle speed and detect the accident site. When an accident occurs, the GPS module tracks the location of the vehicle and sends the information to the victim's family, a nearby hospital, and

Fig. 2 IoT-based accident detection module using smartphones


Fig. 3 GPS and GSM module for accident detection

Fig. 4 Represents the model for detecting drowsiness

social networks such as Facebook, Twitter, and WhatsApp; it uses the GSM module to send text and voice SMS messages, as illustrated in Fig. 3.
(3) IoT-based model to detect driver drowsiness: In [29], the author presents an IoT-based model to detect drowsiness and fatigue using a Raspberry Pi, where a camera is used to monitor driving; the vehicle also contains a crash sensor and an FSR sensor for measuring collision severity. A voice speaker in the vehicle alerts the driver when signs of drowsiness are detected. The block diagram in Fig. 4 represents the model for detecting drowsiness.
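As a concrete illustration of the detection phase of the smartphone-based approach in (1), the sketch below flags a crash when the accelerometer magnitude exceeds a threshold. The threshold value is an assumption; deployed systems tune it against real crash data and combine it with speed and sound cues before entering the notification phase.

```python
import math

# Hypothetical g-force threshold for this example only
CRASH_G_THRESHOLD = 4.0

def detect_crash(samples):
    """samples: iterable of (ax, ay, az) accelerometer readings in g.
    Returns True when the total acceleration magnitude exceeds the threshold,
    which would trigger the notification phase (GPS fix + alert message)."""
    for ax, ay, az in samples:
        magnitude = math.sqrt(ax * ax + ay * ay + az * az)
        if magnitude > CRASH_G_THRESHOLD:
            return True
    return False

print(detect_crash([(0.1, 0.0, 1.0), (3.9, 2.1, 1.2)]))   # True: sudden impact
```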

4 ML/AI-Based Techniques
(i) Fuzzy logic techniques for accident detection: In [6], the authors proposed a fuzzy logic model designed to detect accidents that occur within the range of traffic lights. The proposed algorithm for calculating the cycle time uses the Webster method with a modification called Dynamic Webster with Dynamic Cycle time.
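To make the fuzzy-logic idea used by these systems concrete, the sketch below hand-rolls two triangular membership functions and a tiny two-rule inference for incident risk. All membership breakpoints and rules are made up for illustration and are not taken from [6] or the other cited systems.

```python
def tri(x, a, b, c):
    """Triangular membership function with peak at b (breakpoints are made up)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def incident_risk(queue_len, avg_speed_kmh):
    """Tiny Mamdani-style sketch with two rules:
    Rule 1: IF queue is long AND speed is low  THEN risk is high.
    Rule 2: IF queue is short OR speed is high THEN risk is low."""
    long_queue = tri(queue_len, 10, 30, 60)
    low_speed = tri(avg_speed_kmh, 0, 10, 30)
    short_queue = tri(queue_len, 0, 5, 15)
    high_speed = tri(avg_speed_kmh, 30, 60, 120)

    high_risk = min(long_queue, low_speed)    # fuzzy AND -> min
    low_risk = max(short_queue, high_speed)   # fuzzy OR  -> max
    # Crisp output as a ratio of rule strengths (simplified defuzzification)
    return high_risk / (high_risk + low_risk + 1e-9)

print(round(incident_risk(queue_len=35, avg_speed_kmh=8), 2))  # close to 1.0
```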


Fig. 5 Structure of the system based on fuzzy logic [6]

A model to detect accidents using fuzzy loop-detector data was designed in [29]; it uses a control system to recognize incidents in a loop and consists of two fuzzy inference modules that analyze the traffic data and identify anomalies that occur during an incident (Fig. 5). In [30], an accident prediction system based on fuzzy logic was designed that captures the relationship between accidents and the various non-linear factors that contribute to them, such as traffic conditions and human negligence; the inputs to the model are the count of vehicles passing in a lane, speed, road width, and road surface condition. The model achieves good accuracy in predicting accidents but does not consider factors such as road lighting and weather conditions. A portable fuzzy system to detect drowsiness was proposed in [31]; it is made up of several blocks, including a camera and an illumination module for acquiring the driver image, a face detection module, a module to detect features such as the eyes and mouth, and a calculation module to indicate drowsiness. The model combines several drowsiness measures derived from a temporal window of the eyes and mouth through a fuzzy inference system deployed on a Raspberry Pi, and three degrees of drowsiness were analyzed: low (normal state), medium (drowsy state), and high (severe drowsiness state).
Mechanisms of Machine Learning Techniques to Detect Drowsiness
(i) Facial Action Detection System (FADS): Expression coding systems can be used to detect various facial actions; this is the most widely used and popular way to code facial expressions, in which a single facial muscle action corresponds to one of about 46 facial-expression movements. FADS can identify the emotional state from facial expressions, and an accelerometer is used to detect head motion together with a system for automatic eye tracking.


(ii) Support Vector Machines (SVM): For face detection, an SVM with Haar features is used, where each feature is handled by a classifier. The model takes the captured face of the driver as input and produces the detected face as output. From the detected face, the eye image of the driver is extracted and sent to the ML-based algorithm for further processing. The status of the eyelids (open or closed) is identified using an SVM, which can be trained to detect the face, determine whether the eyelids are shut or open, and then decide whether to trigger the alarm. The training set consists of images with the eyes closed and images with the eyes open, and an ML classifier built with this algorithm classifies the pre-processed eye image. Linear and non-linear classification problems can be solved efficiently using SVMs (Table 1).
(iii) Hidden Markov Model (HMM): This model is based on statistics and predicts hidden states from the observed states. It includes eye-tracking techniques based on the colour and geometrical features of the eye.
Classification: This stage uses classifiers that help in decision-making about drowsiness. It uses weighted parameters to detect drowsiness: the eye image is converted to greyscale and pre-processed, and the pre-processed eye image is then classified using a machine learning classifier to detect whether the eye is open or closed.

Table 1 Supervised learning approaches to detecting drowsiness by SVM

| References | Measures | Methods | Description | Accuracy (%) |
| [34] | State of the eye | SVM | Supervised learning method | 98.4 |
| [35] | State of the eyelid | Binary | Binary method used to detect the state of the eye | 93.5 |
| [36] | Eye closure | Haar with SVM | SVM for eye closure detection using Haar algorithm features | 99.74 |
| [37] | Eye status | HOG and SVM | HOG used as the feature extraction algorithm | 91.6 |
| [38] | Eyelid closure and yawning | Binary SVM with linear kernel | Binary SVM with a linear kernel to detect eye closure and yawning | 94.5 |
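The sketch below shows the SVM eye-state step described in (ii) on synthetic stand-ins for pre-processed eye patches (for example, flattened grey-scale crops or HOG vectors); the data, patch size, and kernel choice are illustrative and not those of the cited works.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic eye patches: 1 = eye closed (darker), 0 = eye open (brighter).
rng = np.random.default_rng(0)
closed = rng.normal(0.2, 0.05, size=(200, 64))
opened = rng.normal(0.6, 0.05, size=(200, 64))
X = np.vstack([closed, opened])
y = np.array([1] * 200 + [0] * 200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = SVC(kernel="linear").fit(X_tr, y_tr)
print("Eye-state accuracy on held-out patches:", clf.score(X_te, y_te))

# A drowsiness alarm would fire when several consecutive frames are "closed".
```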


Table 2 CNN-based classifiers for drowsiness detection systems

| References | Measures | Methods | Description | Accuracy (%) |
| [39] | Eye feature | CNN and Softmax | CNN and Softmax layer used for the visual features with the Viola–Jones algorithm | 78 |
| [40] | State of the eye | CNN | CNN and Softmax layer used for the eye gaze features with the Viola–Jones algorithm | 98 |
| [41] | State of the eye | MTCNN and DDDN | Used for eye state detection | 91.6 |
| [42] | State of the eye | CNN | AdaBoost, LBF, and PERCLOS with CNN | 95.15 |

(iv) Accident detection using CNN: Convolutional Neural Networks (CNNs) are neural networks made up of neurons with learnable weights. CNNs use spatial convolution layers, which suit images well because nearby pixels are strongly correlated, and they have proven very successful in image recognition, classification, and video analysis. A CNN-based system designed to detect driver drowsiness can also detect an accident from live video captured by a CCTV camera installed on a highway: each video frame is run through a CNN model trained with deep learning to classify frames as accident or non-accident. CNN-based image classifiers give an accuracy of more than 95% even for comparatively small datasets. In [36], the facial landmarks detected by the camera are passed to a CNN classifier; this model can be used for real-time driver drowsiness detection on embedded systems and Android devices (Table 2).
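A minimal CNN of the kind surveyed here can be sketched in Keras. The 24×24 grey-scale input, layer sizes, and training setup are illustrative choices and are not the architectures of the cited models.

```python
import tensorflow as tf

# Binary eye-state (open/closed) classifier sketch
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(24, 24, 1)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # P(eye closed)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
# model.fit(train_frames, train_labels, epochs=10, validation_split=0.1)
```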

5 Conclusion
The highest accuracy can be achieved by considering the physiological features of a drowsy driver, and ML and deep learning techniques can be used intensively for the detection and prevention of road accidents. The main disadvantage of these advanced techniques is obtaining the necessary datasets, as the few public datasets are not collected under real conditions. This paper provides an extensive survey of the causes and techniques, covering physiological and behavioural measures based on the vehicle or the driver, which helps overcome the problems of individual models and thus improves drowsiness detection systems. Combining ECG and EEG features achieves high performance, emphasizing that combining physiological signals


improves performance compared with using them alone. This survey also covers the existing IoT-based conventional models that use smartphones, VANET, GPS and GSM modules, and various sensors to detect accidents caused by drowsiness and fatigue, as well as ML/AI techniques such as CNN, SVM, fuzzy logic, and the Hidden Markov Model (HMM), which can provide better accuracy than the conventional models. Future work can include AdaBoost and more advanced techniques to detect and prevent accidents.

References 1. Identification of factors in road accidents through in- depth accident analysis, mouyid bin islam, kunnawee kanitpong, iatss research 32(2):58–67 2. Jabbar R, Shinoy M, Kharbeche M, Al-Khalifa K, Krichen M, Barkaoui K (2020) Driverdrowsiness detection model using convolutional neural networks techniques for android application. In: 2020 IEEE international conference on informatics, IoT, and enabling technologies (ICIoT), IEEE 3. Dogru N, Subasi A (2018) Traffic accident detection using random forest classifier. In: Proceedings of the 15th Learning Technologies Conference (LT), pp 40–45 4. Lane DM (2017) Introduction to linear regression online statistics education: an interactive multimedia course of study 5. Ohe I, Kawashima H, Kojima M, Kaneko Y (1995) A method for automatic detectionof traffic incidents using neural networks. In: Proceedings of the 6th international conference on Pacific Rim TransTech conference vehicle navigation information systems (VNIS). Ride Future, 1995, pp 231–235 6. Alkandari A, Al-Shaikhli IF, Najaa A, Aljandal M (2013) Accident detection and action systemusing fuzzy logic theory. In: Proceedings of the International Conference on Fuzzy Theory Its Applications (iFUZZY), Dec 2013, pp 385–390 7. Halim Z, Kalsoom R, Bashir S, Abbas G (2016) Artificial intelligence techniques for driving safety and vehicle crash prediction. Artif Intell Rev 46(3):351–387 8. Manuja M, Kowshika S, Narmatha S, Theresa GW (2019) IoT based automatic accident detection and rescue management in Vanet. SSRG Int J Comput Sci Eng 36–41 9. Patil U, More P, Pandey R, Patkar U (2017) Tracking and recovery of the vehicle using GPS and GSM. Int Res J Eng Technol 4(3):2074–2077 10. Tushara DB, Vardhini PAH (2016) Wireless vehicle alert and collision prevention system design using Atmel microcontroller. In: Proceedings of the international conference on electrical and electronic optimal technology (ICEEOT), Chennai, India, pp 2784–2787 11. Driver Fatigue Detection Systems Using Multi-Sensors, Smartphone, and Cloud- Based Computing Platforms: A Comparative Analysis, Sensors 21:56 12. Matthews G, Neubauer C, Saxby DJ, Wohleber RW, Lin J (2019) Dangerous intersections? A review of studies of fatigue and distraction in the automated vehicle. Accid Anal Prev 13. Gu WH, Zhu Y, Chen XD, He LF, Zheng BB (2018) Hierarchical CNN-based real-time fatigue detection system by visual-based technologies using MSP model. IET Image Process 12:2319– 2329 14. Kiashari SEH, Nahvi A, Bakhoda H, Homayounfard A, Tashakori M (2020) Evaluation of driver drowsiness using respiration analysis by thermal imaging on a driving simulator. Multimedia Tools Appl 79:1–23 15. You F, Li YH, Huang L, Chen K, Zhang RH, Xu JM (2017) Monitoring drivers sleepy status at night based on machine vision. Multimedia Tools Appl 76:14869–14886 16. Yin B, Fan X, Sun Y (2009) Multiscale dynamic features based driver fatigue detection. Int Patt Recogn Artif Intell 23:575–589


17. Akin M, Kurt M, Sezgin N, Bayram M (2008) Estimating vigilance level by using EEGand EMG signals. Neur Comput Appl 18. Automated Detection of Driver Fatigue Based on AdaBoost Classifier with EEG Signals, Published online 2017 Aug 3. https://doi.org/10.3389/fncom.2017.00072 19. Automated detection of driver fatigue based on EEG signals using gradient boosting decision tree model, 2018 Aug; 12(4):431–440. https://doi.org/10.1007/s11571-018-9485-1. Epub 2018 Apr 16 20. Data from NCRB, Government of India, http://ncrb.nic.in/StatPublications/ADSI/ADSI2014/ ADSI 21. Xiaorong Z et al (2016) The drunk driving automatic detection system based on internet of things. Int J Contr Autom 9(2):297–306. https://www.sparkfun.com/datasheets/Sensors/MQ3.pdf 22. Drunk driving is a bigger problem than statistics show. [online] Available at: https://www. thehindubusinessline.com/economy/policy/drunk-driving-is-a-bigger-problem-thanstatisticsshow/article9616180.ece. Accessed 23 Jan 2020 23. Liao Y, Li SE, Wang W, Wang Y, Li G, Cheng B (2016) Detection of driver cognitive distraction: a comparison study of stop-controlled intersection and speed-limited highway. IEEE Trans Intell Transp Syst 17:1628–1637 24. Ranney TA (2008) Driver distraction: a review of the current state-of-knowledge. The National Academies of Sciences, Engineering, and Medicine, Washington, DC, USA 25. SDN-based VANETs (2020) Security attacks, applications, and challenges. Appl Sci 10:3217. https://doi.org/10.3390/app10093217 26. Melcher V, Diederichs F, Maestre R, Hofmann C, Nacenta JM, van Gent J, Kusic´ D, Zˇ agar B (2015) Smart vital signs and accident monitoring system for Motor cyclists embedded in helmets and garments for advanced e call emergency assistance and health analysis monitoring. Proc Manuf 3:3208–3213 27. A Comprehensive Study on IoT Based Accident Detection Systems for Smart Vehicles. IEEE Access 8, Jul 2020 28. IoT-Based Smart Alert System for Drowsy Driver Detection, vol 2021. Article ID 6627217. https://doi.org/10.1155/2021/6627217 29. Rossi R, Gastaldi M, Gecchele G, Barbaro V (2015) Fuzzy logic- based incident detection system using loop detectors data. In: 18th Euro working group on transportation, EWGT 30. Gaber M, Wahaballa AM, Othman AM, Diab A (2017) Traffic accidents prediction model using fuzzy logic: Aswan desert road case study. J Eng Sci Assiut Univ 45(1):28–44 31. A Portable Fuzzy Driver Drowsiness Estimation System. Sensors 20(15):4093. https://doi.org/ 10.3390/s20154093 32. Ghosh S, Sunny SJ, Roney R (2019) Accident detection using convolutional neural networks. In: 2019 International conference on data science and communication (IconDSC) 33. Jabbar R, Shinoy M, Kharbeche M, Driver drowsiness detection model using convolutional neural networks techniques for android application 34. Sabet M, Zoroo RA, Sadeghniiat-Haghighi K, Sabbaghian M (2012) A new systemfor driver drowsiness and distraction detection. In: Proceedings of the 20th Iranian Conference. Electrical Engineering (ICEE), May 2012, pp 1247–1251 35. Punitha A, Geetha MK, Sivaprakash A (2014) Driver fatigue monitoring system based on eye state analysis. In: Proceedings International Conference Circuits, PowerComput. Technology (ICCPCT), Mar 2014, pp.1405_1408. 36. AL-Anizy GJ, Nordin MJ, Razooq MM (2015) Automatic driver drowsiness detection using haar algorithm and support vector machine techniques. Asian J Appl Sci 8(2):149–157 37. Pauly L, Sankar D (2015) Detection of drowsiness based on HOG features and SVM classifiers. 
In: Proceedings IEEE international conference research computing intelligence communication networks (ICRCICN), Nov 2015, pp 181–186 38. Manu BN (2016) Facial features monitoring for real time drowsiness detection. In: Proceedings of the 12th international conference on innovative information technology (IIT), Nov 2016, pp 1–4


39. Dwivedi K, Biswaranjan K, Sethi A (2014) Drowsy driver detection using representation learning. In: Proceedings of the IEEE international advance computing conference (IACC), Feb 2014, pp 995–999 40. George A, Routray A (2016) Real-time eye gaze direction classification using convolutionalneural network. In: Proceedings of the international conference on signal processing and communication (SPCOM), Jun 2016, pp 1–5 41. Reddy B, Kim YH, Yun S, Seo C, Jang J (2017) Real-time driver drowsiness detection for embedded system using model compression of deep neural networks. In: Proceedings of the IEEE conference on computervision and pattern recognition. Workshops (CVPRW), Jul 2017, pp 438–445 42. Zhang F, Su J, Geng L, Xiao Z, Driverfatigue detection based on eye state recognition. In: Proceedings of the international conference on machine and vision information technology (CMVIT)

Prediction of Cardiac Arrest Using Ensemble Methods K. Sreekanth and J. Hyma

Abstract Cardiac arrest is the sudden loss of heart function, breathing, and consciousness, which results in a person's death. Nearly 1.15 hundred thousand people die due to cardiac arrest every day in the world. To help avoid such cases, this work predicts whether a particular person is going to have a cardiac arrest in the coming ten years, based on the individual's medical history and without medical intervention. Machine learning in medical healthcare is evolving as a major research area, and applications of machine learning in healthcare are advancing medicine into a brand-new realm. Neural networks and logistic regression are the most important machine learning algorithms used in cardiac arrest diagnosis. An ensemble model known as the maximum voting scheme is implemented. Keywords Neural networks · Logistic regression · Ensemble model · Maximum voting scheme · Support vector machine · Naïve Bayes classifier

K. Sreekanth (B) · J. Hyma
CSE, GITAM University, Visakhapatnam, India
e-mail: [email protected]

J. Hyma
e-mail: [email protected]

K. Sreekanth
CSE, Nalla Narasimha Reddy Education Society's Group of Institutions, Hyderabad, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_55

1 Introduction

Cardiac arrest has become a major health issue: a sudden loss of blood flow caused by the failure of the heart to pump effectively, responsible for around 17 million deaths per annum and 1.15 hundred thousand deaths every day [1]. Cardiac arrest is considered a high-risk disease even in well-developed countries because of the multifactorial nature of its contributory risk factors, such as blood pressure, diabetes, and high cholesterol [2–5]. Scientists have therefore turned to modern approaches such as machine learning, which plays a key role in providing a useful perspective for supporting decisions [6, 9].

In this research work, an ensemble model popularly known as the maximum voting classifier is applied to predict whether a patient has a 10-year risk of developing coronary heart disease (CHD), using the Framingham dataset that is publicly available on Kaggle [7, 8].

2 Literature Survey

This section introduces previous work related to the subject of this paper. A decision support system for heart disease diagnosis using a neural network was proposed in [10, 12]. A performance evaluation of various machine learning techniques for prediction of heart disease is reported in [11], where the best classification accuracy of 85% was obtained using logistic regression with good sensitivity. A survey of machine learning algorithms for disease diagnosis notes that statistical estimation models unable to provide adequate performance have flooded the assessment area [13]; such statistical models fail to handle categorical data, missing values, and large numbers of data points. All these factors add importance to machine learning and to careful data preprocessing. Classification of heart disease using k-nearest neighbours and a genetic algorithm, as well as ensemble feature selection and classification model construction on type 2 diabetes patients' records, has also been analysed [14]. In recent years, the World Health Organization (2020) has stated that there is a severe crisis and a need to prevent dreadful diseases like cardiac arrest [15].

3 Proposed Methodology

The primary aim of this work is to predict, for an individual, the risk of being affected by heart disease in the coming ten years, using a historical medical database. Many algorithms have been used for such prediction in the past [17]. In this work, however, an ensemble model popularly known as the maximum voting classifier is used, containing five machine learning algorithms: SVM, MLP, DT, MNB, and LR. The final result is a prediction of cardiac arrest together with a corresponding suggestion for the individual. Through a GUI, the medical history of an individual is given, and according to the trained model the system predicts and advises the individual in a new window based on the results; approximately 5000 records were used [19, 20]. This dataset has been used to train the ensemble model along with the seven classifiers.

Datasets

The datasets used in this paper are the Framingham dataset, which is publicly available on Kaggle, and the Heart dataset, which is used for the GUI.
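A minimal sketch of the data loading and preprocessing steps detailed in the Framingham Dataset subsection below is given here; the file name and column labels (for example 'TenYearCHD') are assumptions based on the public Kaggle release, not details taken from the paper.

```python
# Minimal preprocessing sketch for the Framingham dataset (file name and column names assumed).
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("framingham.csv")      # publicly available Kaggle file (path assumed)
df = df.drop(columns=["education"])     # drop attributes not used for prediction
df = df.dropna()                        # remove records with missing values

X = df.drop(columns=["TenYearCHD"])     # remaining attributes as inputs
y = df["TenYearCHD"]                    # ten-year CHD label as target

corr = X.corr()                         # correlation map used for inspecting the inputs

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```

The resulting split is reused in the ensemble sketch that follows the classifier descriptions below.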

Heart Dataset

In addition to the Framingham dataset, another dataset known as 'Heart' is used to test the model's predictions. It is given to the classifier with the highest recorded accuracy, and the results are predicted based on a particular individual's medical history [23, 24].

Framingham Dataset

Framingham is a study of the prevalence of cardiovascular disease (CVD), carried out to understand the risk factors and familial factors in CVD and to learn the patterns of the disease [22]. The dataset consists of 16 attributes, including 'male', 'education', 'age', 'cigs per day', 'prevalent stroke', 'BP meds', 'dia BP', 'sys BP', 'heart rate', 'diabetes', 'glucose', 'BMI', 'total chol', 'current smoker', and 'Ten Year CHD'. First, the necessary modules and classifiers are imported and the Framingham dataset is preprocessed: unnecessary attributes such as 'education' are dropped [16], records with missing values are removed, and correlation maps are created for the inputs. The dataset is then divided into training and testing sets, with the 13 retained attributes as inputs and 'Ten Year CHD' as the target. The seven classifiers are trained in this way, and their outputs/accuracies are given as estimators to the ensemble model for better prediction and accuracy [21]. A GUI is then created, and the final model is tested with the new 'Heart' dataset by giving its attributes as input; the predicted result, together with suggestions, is shown to the individual in a new window of the GUI.

Proposed Architecture

This model is designed to predict cardiac arrest in a very accurate manner. The ensemble classifier is first trained on the Framingham dataset. Through the GUI, the user's medical-history attributes are provided to predict cardiac arrest, with testing on the second dataset, 'Heart' (Fig. 1).

Fig. 1 Proposed architecture

Ensemble Classifiers

Machine learning algorithms are widely used in the recent literature; in the current research, an ensemble model is proposed for better prediction.

Support Vector Machine  An extremely popular supervised machine learning technique, applicable both as a classifier and as a predictor.

Multinomial Naive Bayes Classifier  Well suited to discrete features; it is mainly applicable to integer feature counts.

Logistic Regression  This methodology obtains odds ratios, generally in cases with more than one explanatory variable, where there are only two feasible classes.

Decision Tree Classifier  A flowchart-like tree structure in which each non-leaf node denotes a test and each leaf node denotes a class label.

Multilayer Perceptron Classifier  One of the feed-forward neural network techniques in the field of AI; it takes multiple inputs and provides the corresponding outputs.

Majority Voting Scheme  A maximum voting scheme is an ensemble technique in which multiple classifiers are combined: the outcomes of the individual classifiers are collected, and under majority voting the class label returned by the largest number of classifiers is taken as the final class.
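The following is a minimal scikit-learn sketch of the maximum voting scheme built from the five base classifiers just described; the hyperparameters are assumptions, and X_train, X_test, y_train, y_test are assumed to come from a split such as the one sketched earlier. It illustrates hard (majority) voting, not the authors' exact implementation.

```python
# Hard-voting ensemble sketch with the five classifiers named above (hyperparameters assumed).
from sklearn.ensemble import VotingClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

estimators = [
    ("svm", SVC()),
    ("mlp", MLPClassifier(max_iter=1000)),
    ("dt", DecisionTreeClassifier()),
    ("mnb", MultinomialNB()),              # expects non-negative feature values
    ("lr", LogisticRegression(max_iter=1000)),
]

ensemble = VotingClassifier(estimators=estimators, voting="hard")
ensemble.fit(X_train, y_train)             # X_train/y_train from the split sketched above
print(accuracy_score(y_test, ensemble.predict(X_test)))
```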

4 Result Analysis

The ensemble model was applied to the data described above and predicts the risk of an individual being affected by cardiovascular disease (CVD) in the coming ten years, from their medical history, with 87% accuracy [18]. This is the best accuracy obtained so far, since it is the combined output of several classifiers assembled into an ensemble (Table 1; Figs. 2, 3, 4, 5, 6 and 7).

Table 1 Accuracy obtained

S. No.  Method employed      Accuracy
1       SVM                  0.6508994003997335
2       Naïve Bayes          0.7732622695980458
3       Logistic regression  0.8438818565400844
4       Decision tree        0.8423273373306684
5       MLP classifier       0.8429935598489896
6       Majority voting      0.8703086831001554

Fig. 2 Histograms

5 Conclusion

In this research work, the proposed model predicts cardiac arrest with 87% accuracy, and it was observed that the chance of being diagnosed with coronary heart disease increases by about 2% for every increase in age and in systolic blood pressure. From this research work, we can say that by using machine learning techniques we can build intelligent models to detect whether a person is affected by cardiac arrest, without the intervention of a medical specialist, based on a healthcare database. In medical healthcare, machine learning is evolving and can provide a deeper understanding of medical data. To decrease the death rate, an effective model for predicting cardiac arrest has become all the more important in a world where the death rate is 17 million per year.

Fig. 3 Age versus disease

Fig. 4 Correlation in attributes

Fig. 5 Correlation map

Fig. 6 Pair plots

Fig. 7 Attributes versus disease

References 1. Sreekanth K, Rajeshwar J, Chandra Shekar K, Ravikanth K (2022) Classification based on evolutionary approach towards an improved classifier. In: International conference on innovations in computer science and engineering, Springer Science and Business Media LLC 2. Chandra Shekar K, Chandra P, Venugopala Rao K (2020) A framework for automatic detection of heart diseases using dynamic deep neural activation functions. J Am Intell Human Comput 11(11):5341–5352 3. Chandra Shekar K, Chandra P, Venugopala Rao K (2019) An ensemble classifier characterized by genetic algorithm with decision tree for the prophecy of heart disease. In: International conference on innovations in computer science and engineering, vol 74. Springer Nature Singapore, pp 9–15 4. Chandra Shekar K, Chandra P, Venugopala Rao K (2018) A framework for feature subset selection using rough set with mutual information. J Adv Res Dynam Contr Syst 10(3):357–367 5. Chandra Shekar K, Chandra P, Venugopala Rao K (2018) Relative-feature learning through genetic-based algorithm. In: International conference on computational intelligence and informatics, vol 712. Springer Nature Singapore, pp 69–79 6. Chandra Shekar K, Venugopala Rao K, Chandra P (2018) Hidden decision tree based pattern evaluation using regression models for health diagnosis. In: International conference on information and communication technology for intelligent systems, vol 1. Springer International Publishing, pp 30–38 7. Chandra Shekar K, Chandra P, Venugopala Rao K (2014) Fault diagnostics in industrial application domains using data mining and artificial intelligence technologies and frameworks. In: IEEE international advance computing conference, vol 1, pp 538–543 8. Chandra Shekar K, Ravikanth K, Sreekanth K (2012) Improved algorithm for prediction of heart disease using case based reasoning technique on non-binary datasets. Int J Res Comp Commun Technol 1(7) 9. Deepika N, Chandra Shekar K, Sujatha D (2011) Association rule for classification of heartattack patients. Int J Adv Eng Sci Technol 11(2):253–257 10. Vijay Bhasker G, Chandra Shekar K, Lakshmi Chaitanya V (2011) Mining frequent itemsets for non binary data set using genetic algorithm. Int J Adv Eng Sci Technol 11(1):143–152

11. Mohammad J, Layeghian S (2019) An intelligent warning model for early prediction of cardiac arrest in sepsis patients. In: Computer methods and programs in biomedicine, Elsevier, vol 178 12. Sepehri M, Layeghian S, Mohammad J (2021) A predictive framework in healthcare: case study on cardiac arrest prediction, vol 117 13. Pasha M, Fatima M (2017) Survey of machine learning algorithms for disease diagnostic. J Intell Learn Syst Appl 14. Kannan E, Kavitha R (2016) An efficient framework for heart disease classification using feature extraction and feature selection technique in data mining 15. Dwivedi AK (2018) Performance evaluation of different machine learning techniques for prediction of heart disease. Neural Comp Appl 16. Durai Raj Vincent PM, Nandhini Abirami R (2019) Cardiac arrhythmia detection using ensemble of machine learning algorithms, AISC, vol 1057, Springer 17. Zhu T, Xu S, Zang Z (2017) Cardiovascular risk prediction method based on CFS subset evaluation and random forest classification framework. In: IEEE 2nd international conference on big data analysis 18. Mago VK, Monteiro L, Singh M (2016) Building a cardiovascular disease predictive model using structural equation model and fuzzy cognitive map. IEEE 19. Kumar R, Pahwa K (2017) Prediction of heart disease using hybrid technique for selecting features. In: 4th IEEE Uttar Pradesh section international conference on electrical, computer and electronics (UPCON) 20. Shah B, Ramadoss (2005) A responding to the threat of chronic diseases in India. Lancet 21. Domor Mienyea I (2020) An improved ensemble learning approach for the prediction of heart disease risk, vol 20. Elsevier 22. Rajpal N, Dahiya A, Guru N (2007) Decision support system for heart disease diagnosis using neural network. Delhi Business Rev 8 23. Azam S, Ghosh P (2021) Efficient prediction of cardiovascular disease using machine learning algorithms with relief and LASSO feature selection techniques 24. Ashok Kumar M, Kamari A (2021) A novel approach for prediction of heart disease using machine learning algorithms. In: Innovation in technology (ASIANCON) Asian conference, Feb 2021

Real-Time Object Detection and Tracking Design Using Deep Learning with Spatial–Temporal Mechanism for Video Surveillance Applications

T. Kusuma and K. Ashwini

Abstract We propose a CNN-based framework for real-time object detection and tracking using deep learning with a spatial–temporal mechanism. The availability of efficient data has changed performance benchmarks in terms of accuracy, and the data processing is handled by two industry buzzwords: deep learning (DL) and computer vision (CV). The CNN-based framework uses single object tracker scores to match appearance models and find targets in the next frame. Simply applying single object tracking to multiple object tracking encounters problems in computational efficiency and in results affected by occlusion. In this paper, we introduce a "spatial attention mechanism (STAM)" to manage occlusion bias and target interaction. Object tracking is a compelling technology in image processing with great future implications, and multiple object tracking (MOT) has seen an extensive boom in the last few years due to machine learning, deep learning, computer vision, and more. This paper aims to provide an object tracking software solution. Using YOLO ("You Only Look Once") technology with the help of TensorFlow, the system is geared toward object detection, tracking, and counting. Detection and tracking proved effective on various datasets, with algorithms that offer real-time, accurate, and precise identification appropriate for real-time applications.

Keywords Deep Learning (DL) · Computer Vision (CV) · Convolution Neural Network (CNN) · Object detection · Object tracking · MOT

T. Kusuma (B) · K. Ashwini
Global Academy of Technology, Bangalore, India
e-mail: [email protected]

K. Ashwini
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_56

1 Introduction

Image and video analysis has gained a wide range of applications in recent years, and tracking is a skill that will become increasingly important in the future. There are two types of tracking: multi-object tracking (MOT) and single object tracking (SOT) [1, 2].

Multi-object tracking is critical in solving many major computer vision problems. In today's technological society, CV and DL are the two most important technologies. Human vision is the faculty by which the 3D outside world is perceived, and this intuition has inspired the emergence of new technologies [3, 4]. The proposed system uses the latest YOLOv4 for object recognition; YOLOv4 uses a TensorFlow classifier for both training and detection [5, 6]. YOLO began its journey with the Darknet framework, which was later developed into YOLOv2, then YOLOv3, and later YOLOv4. Abundant resources enhance the ability of researchers to extract more detailed information from images [7, 8]. The intention of multi-object tracking is to estimate the locations of multiple objects in a video, maintain their identity consistency, and generate a complete, unique trajectory for each. MOT remains a difficult problem, especially in crowded scenes where occlusion and target interactions are common [9]. Detecting and tracking objects is a key challenge in video surveillance applications such as the protection of urban environments and government buildings, traffic control, and navigation, and CCTV provides a dynamic environment for tracking vehicles and other objects. In computer vision applications, object detection and tracking are inextricably linked: object detection is the process of identifying an object and looking for suspicious frames in the group [10–12], while object tracking determines the trajectory or path that objects follow across consecutive frames [13, 14]. The video recorded in the dataset consists of several individual images. Figure 1 shows a basic block diagram for detecting and tracking objects. The dataset is split into two parts: 85% of the images are used for training and 15% for testing. Using the CNN algorithm, the object inside each image is expected to be found. The implications of this work include estimating traffic density at intersections, detecting objects under variable lighting for different types of self-driving cars, and designing smart cities and intelligent transportation systems.

Fig. 1 Overview of detection and tracking

2 Object Detection and Tracking

Object tracking, detection, computation, segmentation, image captioning, and other CV tasks benefit the world. Object detection is the process of identifying and locating objects in an image.

With advances in computer vision powered by DL, it is possible to accomplish these tasks at practical time scales. Grouping pixels based on similarities is the semantic segmentation task. Classification, localization, and object detection are methods of determining the class of an object and demarcating it by drawing a bounding box around it. Instance segmentation is a type of semantic segmentation applied to multiple distinct entities. A common approach to these tasks is to run a CNN on an image. Candidate features are extracted from shared feature maps using spatial attention mechanisms; in this setting, occluded (invisible) parts bias the classification of an object. To address this issue, we propose a spatial attention mechanism that focuses on the visible region for feature extraction [15, 16]. To perform a selective search for object recognition, a hierarchical clustering algorithm is used. Some of these methodologies' limitations are mitigated by modern algorithms such as single-shot detectors (SSDs). An efficient object detection algorithm is one that produces tightly sized bounding boxes for all recognized objects, has high processing capacity, and provides fast processing. CNNs and SSDs show promising results, but there is a trade-off between speed and accuracy; therefore, the choice of algorithm depends on the application [17–19].

3 Convolutional Neural Networks (CNNs)

A CNN has the benefit of automatically performing feature extraction on the image, which means that the critical features are detected by the network itself. The convolutional layer, the pooling layer, and the fully connected layer are the three main components of the CNN, as shown in Fig. 2. A grayscale image is reduced to 1024 nodes using the multilayer approach. During the flattening procedure, the image's spatial positions are lost; the feature maps, in contrast, represent the inner items by small squares of the input data, preserving the spatial relationship within the image.
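As an illustration of the three components just described, the following is a minimal Keras sketch of such a network; the input size, filter counts, and number of output classes are assumptions chosen for illustration only, not values from the paper.

```python
# Illustrative three-part CNN: convolution, pooling, fully connected layers (sizes assumed).
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32, 32, 1)),            # grayscale input (shape assumed)
    tf.keras.layers.Conv2D(16, 3, activation="relu"),     # convolutional layer
    tf.keras.layers.MaxPooling2D(),                        # pooling layer
    tf.keras.layers.Flatten(),                             # flattening step
    tf.keras.layers.Dense(1024, activation="relu"),        # fully connected layer (1024 nodes)
    tf.keras.layers.Dense(4, activation="softmax"),        # four vehicle classes (assumed)
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```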

Fig. 2 Overview of CNN architecture

3.1 Object Tracking

The Internet is the principal network connecting tens of millions of people across the world, and videos are a principal source of entertainment and knowledge. A video is a sequence of images; because of the small time difference between frames, image motion appears as scene motion. When developing a video processing algorithm, videos are divided into two categories. The analysis of live video streams is an ongoing process in which future frames are unknown to the processor, whereas a video sequence has a fixed duration, and all subsequent frames can be collected before processing the current frame [20, 21]. Motion is a unique characteristic that distinguishes video from still images and is a powerful visual cue; only sparse locations in the image reveal the features and movements of objects [22, 23].

3.2 Simple Online Real-Time Tracking (SORT)

Simple Online Real-Time Tracking (SORT) is a practical technique for achieving multi-object tracking (MOT). SORT's performance is enhanced by cues such as appearance, which strengthen the association between detections and tracks, and it improves performance even in conditions such as extended occlusion periods. SORT is a framework based on Kalman filtering; the Hungarian method performs frame-by-frame data association on an association metric, such as one that evaluates bounding box overlap. Additional information, such as motion and appearance properties, can be included alongside the overlap for better matching.
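The following is a small sketch of the frame-by-frame association step that SORT-style trackers perform, using bounding-box overlap (IoU) as the association metric and the Hungarian method for assignment; the box format and threshold are assumptions, and the Kalman-filter prediction step is omitted.

```python
# Sketch of SORT-style association: IoU cost matrix plus Hungarian assignment.
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def associate(tracks, detections, iou_threshold=0.3):
    """Match predicted track boxes to new detections; returns (track_idx, det_idx) pairs."""
    cost = np.array([[1.0 - iou(t, d) for d in detections] for t in tracks])
    rows, cols = linear_sum_assignment(cost)          # Hungarian method on the IoU cost
    return [(r, c) for r, c in zip(rows, cols) if 1.0 - cost[r, c] >= iou_threshold]

# Example: one predicted track box and two candidate detections.
print(associate([(10, 10, 50, 50)], [(12, 11, 52, 49), (200, 200, 240, 240)]))
```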

4 Design

The CNN has been developed and trained on originally recorded data for the vehicle dataset of the traffic monitoring application. Overall, four types of images were collected at different points and under different recording conditions, including NIR images. Videos are categorized by the function and size of the vehicle: tricycles are placed in the auto category; buses, trucks, and trailers are considered heavy vehicles; and cars, SUVs, and limousines are included in the light category. Most images contain only one subject from each category.

4.1 Single Object Detection

Figure 3 shows a flowchart for detecting a single object. First, the necessary libraries are imported, and the training data is provided via Google Drive. The TensorFlow algorithms are run online using Google Colab. The algorithm then compiles and ingests the data in a supervised manner [16]; this method is known as a supervised classification algorithm. The data is processed by the CNN layers, which determine the various operations and the learning rate; the amount of data and the batch size are also specified. The algorithm then runs through the epochs and learns from the training data, while accuracy and training loss are monitored on a continuous basis. When the training accuracy reaches a certain threshold, a callback is invoked and training is stopped. Then, based on the training and test data, a confusion matrix is displayed, from which a variety of performance metrics can be tracked and derived.
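A minimal sketch of the accuracy-threshold callback described above is given below; the 98% threshold and the commented fit call are assumptions for illustration, not values taken from the paper.

```python
# Sketch of the training-stop callback: halt when training accuracy crosses a threshold.
import tensorflow as tf

class StopAtAccuracy(tf.keras.callbacks.Callback):
    def __init__(self, threshold=0.98):
        super().__init__()
        self.threshold = threshold

    def on_epoch_end(self, epoch, logs=None):
        # 'accuracy' is reported because the model is compiled with metrics=["accuracy"]
        if logs and logs.get("accuracy", 0.0) >= self.threshold:
            print(f"Reached {self.threshold:.0%} training accuracy; stopping.")
            self.model.stop_training = True

# Usage (model and training arrays assumed to exist):
# model.fit(x_train, y_train, epochs=50, callbacks=[StopAtAccuracy(0.98)])
```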

4.2 Multiple Object Detection

Vehicle trackers are trained using deep learning to track multiple objects, and detector performance is optimized by evaluating the vehicle detectors, providing efficient vehicle recognition results on the experimental data. The procedure is divided into phases: loading the dataset, designing the network, configuring training options, training the object tracker, and evaluating the tracker.

An Online MOT Algorithm

Figure 4 depicts an overview of the proposed algorithm. To track the objects, the following steps are used:

1. Find each target's search area using the motion model in the current time frame; candidates are selected within this search area.
2. Use spatial attention to extract and weight candidate features for each target. The best candidate, with the highest score, is then chosen as the estimated target state using a binary classifier.
3. Each tracked target's visibility map is derived from the features of the target's corresponding estimated state. The visibility of the tracked target is combined with the spatial configuration of the target and its neighbouring targets to determine temporal attention.
4. All tracking CNN branches are updated based on the training-sample loss in the current and previous frames, weighted according to temporal attention. The motion model of each target is updated based on the target's estimated state.
5. A policy defines the creation of new targets and the deletion of untracked ones.

Fig. 3 Flowchart of single object detection

4.3 Metrics of Performance

The test dataset must be used to evaluate the performance of the DL model on previously unseen data. The choice of performance measurements affects the analysis of the algorithm and aids in identifying the causes of misclassifications so that appropriate action can be taken to correct them.

Fig. 4 Result of YOLOv4 Deep SORT, with bounding boxes and ID for each person. For each object, there are two bounding boxes: the red box is the detection in the current frame, and the colored box is the predicted bounding box given by the Kalman filter

1. Confusion matrix: It provides prediction information for the different classes in a binary classification.
2. Accuracy and loss: Accuracy is determined by a formula that gives similar weight to both types of error and works well only for a well-balanced dataset; accuracy as a standalone measurement is therefore unreliable.
3. Precision, recall, and F1 score: Precision refers to the fraction of the classification results that are correct. The proportion of the relevant results correctly categorized by the algorithm is called recall. Since the F1 score takes into account both the precision value and the recall value, it must be maximized to improve the model.
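The following is a small sketch showing how these metrics can be computed with scikit-learn; the label vectors are invented purely for demonstration.

```python
# Sketch of the performance metrics listed above, on illustrative labels.
from sklearn.metrics import (confusion_matrix, accuracy_score,
                             precision_score, recall_score, f1_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # ground-truth classes (illustrative)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # model predictions (illustrative)

print(confusion_matrix(y_true, y_pred))
print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))  # fraction of predicted positives that are correct
print("recall   :", recall_score(y_true, y_pred))     # fraction of actual positives recovered
print("F1 score :", f1_score(y_true, y_pred))         # harmonic mean of precision and recall
```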

4.3.1 Qualitative Results

In this section, qualitative results are presented. Consecutive sequences of frames illustrating the tracking capabilities of the approach on the COCO dataset are displayed. Tracking result images obtained from video sequences with conditions such as occlusion, motion blur, and clutter are used to illustrate visibility. The images of tracked pedestrians are colour coded with bounding boxes and IDs.

The approach is compared with Deep SORT using Faster R-CNN on the COCO dataset. The tracking result obtained using DPM performs worst, with a low accuracy of 22.2%, due to the poor tracking stability indicated by the high fragmentation count of 1041 against only 13 mostly tracked tracks. Meanwhile, greater mostly-tracked values mean more tracked objects with better stability. From the results, it can be confidently stated that detection quality plays an important role in the performance of the tracking approach.

5 Conclusion

Deep learning has been used to solve computer vision problems and has surpassed classical image processing methods on these problems. A CNN model was trained on multiple datasets to detect a single object. This work proposes a CNN-based dynamic multi-object technique for developing individual object trackers using shared and aggregated CNN features. To cope with drift concerns caused by frequent occlusion and object contact, a spatial–temporal attention approach was included. In addition, to employ motion information, a single object motion model is added to the algorithm. The test results on the complicated transportation benchmarks demonstrate the efficiency of the suggested algorithm for Internet-connected transportation. Vehicle speed and colour, vehicle type, vehicle movement direction, and the number of cars are all performance metrics that have been verified by ROI. At various times in the video, several objects are recognized and tracked. Future work will continue with stronger graphics processing, increasing the number of frames, assessing the model on other datasets, and adjusting the architecture as required to make the model more likely to succeed.

References

1. Nguyen VD et al (2017) Learning framework for robust obstacle detection, recognition, and tracking. IEEE Trans Intell Transport Syst 18(6):1633–1646
2. Kain Z et al (2018) Detecting abnormal events in university areas. In: 2018 International conference on computer and applications (ICCA), pp 260–264
3. Wang P et al (2018) Detection of unwanted traffic congestion based on existing surveillance system using in freeway via a CNN-architecture trafficnet. In: IEEE conference on industrial electronics and applications (ICIEA), Wuhan, 2018, pp 1134–1139
4. Mu Q, Wei Y, Liu Y, Li Z (2018) The research of target tracking algorithm based on an improved PCANet. In: 10th international conference on intelligent human-machine systems and cybernetics (IHMSC), Hangzhou, 2018, pp 195–199
5. Baykara HC et al (2017) Real-time detection, tracking and classification of multiple moving objects in UAV videos. In: 29th IEEE international conference on tools with artificial intelligence (ICTAI), Boston, MA, 2017, pp 945–950
6. Wang W, Shi M, Li W (2017) Object tracking with shallow convolution feature. In: 9th international conference on intelligent human-machine systems and cybernetics (IHMSC), Hangzhou, 2017, pp 97–100

7. Muhammad K et al (2018) Convolutional neural networks based fire detection in surveillance videos. IEEE Access 6:18174–18183
8. Hernandez DE et al (2018) Cell tracking with deep learning and the Viterbi algorithm. In: International conference on manipulation, automation and robotics at small scales (MARSS), Nagoya, 2018, pp 1–6
9. Qian X et al (2017) An object tracking method using deep learning and adaptive particle filter for night fusion image. In: 2017 International conference on progress in informatics and computing (PIC), Nanjing, 2017, pp 138–142
10. Yoon Y et al (2018) Online multi-object tracking using selective deep appearance matching. In: IEEE international conference on consumer electronics—Asia (ICCE-Asia), Jeju, pp 206–212
11. Bharadwaj HS, Biswas S, Ramakrishnan KR (2016) A large scale dataset for classification of vehicles in urban traffic scenes. In: Proceedings of the 10th Indian conference on computer vision, graphics and image processing, ACM
12. Mohana et al, Performance evaluation of background modeling methods for object detection and tracking. In: International conference on inventive systems and control (ICISC)
13. Chandan G et al (2018) Real time object detection and tracking using deep learning and OpenCV. In: International conference on inventive research in computing applications (ICIRCA)
14. Mohana et al, Elegant and efficient algorithms for real time object detection, counting and classification for video surveillance applications from single fixed camera. In: International conference on circuits, controls, communications and computing (I4C)
15. Mohana et al, Simulation of object detection algorithms for video surveillance applications. In: 2nd international conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud)
16. Raghunandan A et al, Object detection algorithms for video surveillance applications. In: International conference on communication and signal processing (ICCSP)
17. Mangawati A et al, Object tracking algorithms for video surveillance applications. In: 2018 international conference on communication and signal processing (ICCSP)
18. Mohana et al, Design and implementation of object detection, tracking, counting and classification algorithms using artificial intelligence for automated video surveillance applications. In: Advanced computing and communication society (ACCS)—24th annual international conference on advanced computing and communications (ADCOM-2018), IIITB, Bangalore
19. Jo KU, Im JH, Kim J, Kim DS (2017) A real-time multi-class multi-object tracker using YOLOv2. In: IEEE ICSIPA, Malaysia, September 12–14
20. Sanjana S, Shriya VR, Vaishnavi G, Ashwini K (2021) A review on various methodologies used for vehicle classification, helmet detection and number plate recognition. Evol Intell 14(2):979–987
21. Kusuma T, Ashwini K (2021) Modular ST-MRF environment for moving target detection and tracking under adverse local conditions. In: International conference on big data analytics. Springer, Cham, pp 93–105
22. Kusuma T, Ashwini K (2018) Real time object tracking in H.264/AVC using polar vector median and block coding modes. Int J Comp Inform Eng 12(11):981–985
23. Kodipalli A, Devi S (2021) Prediction of PCOS and mental health using fuzzy inference and SVM. Frontiers in Public Health
24. Kusuma T, Ashwini K (2022) Analysis of deep learning frameworks for object detection in motion. Int J Knowl Based Intell Eng Syst. ISSN:1327-2314. https://doi.org/10.3233/kes220002
25. Kusuma T, Ashwini K (2022) Multiple object tracking using STMRF and YOLOv4 deep SORT in surveillance video. Int J Res Trends Innov. ISSN:2456-3315

Equipment Planning for an Automated Production Line Using a Cloud System

K. Bhavana Raj, Julian L. Webber, Divyapushpalakshmi Marimuthu, Abolfazl Mehbodniya, D. Stalin David, Rajasekar Rangasamy, and Sudhakar Sengan

K. Bhavana Raj
Department of Management Studies, Institute of Public Enterprise, Hyderabad 500101, India
e-mail: [email protected]

J. L. Webber · A. Mehbodniya
Department of Electronics and Communication Engineering, Kuwait College of Science and Technology (KCST), Doha, Kuwait
e-mail: [email protected]
A. Mehbodniya
e-mail: [email protected]

D. Marimuthu
Department of Computer Science and Engineering, GITAM University, Bengaluru, Karnataka 561203, India
e-mail: [email protected]

D. Stalin David
Department of Information Technology, Vel Tech Multi Tech Dr. Rangarajan Dr. Sakunthala Engineering College, Chennai, Tamil Nadu 600072, India
e-mail: [email protected]

R. Rangasamy
Department of Computer Science and Engineering, GITAM School of Technology, GITAM University, Bengaluru, Karnataka 561203, India
e-mail: [email protected]

S. Sengan (B)
Department of Computer Science and Engineering, PSN College of Engineering and Technology, Tirunelveli, Tamil Nadu 627152, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_57

Abstract Lean Manufacturing (LM) reduces waste, maximises labour efficiency, and offers the advantages of lean production in terms of operational costs and product quality. A production line includes workstations linked via transmission and electrical control. An automated manufacturing line reduces production and labour costs, improves output quality, and minimises human error. Integration and machines automate industrial processes, so the process is computer-controlled. Adopting automated manufacturing together with lean and automated logistics methods may enhance product quality, minimise waste, and increase agility.

A survey indicates a new strategy for optimum manufacturing and distribution applications. Case studies, theoretical analysis, and real-world examples of cutting-edge technology all have an impact on the application and production processes. This suggests that LM and automation are essential in manufacturing. Practical examples are the starting point for lean and advanced automation applications that allow Lean Automation System (LAS) deployment across the whole production process and LM.

1 Introduction The relationship will become lither and less slick, requiring us to change rapidly to meet budgetary conditions. Create and then re-evaluate the joint efforts, and the measures must be taken into account with this indicator. Two different systems are sound. The combination of Six Sigma, which reduces disability ratio and the rate of return, and Lean Thinking helps reduce the cycle range. The specific number of affiliations is different; these methods have achieved outstanding results. Another primary point in developing this system forgiveness cycle is the adaptation compromise of the process and activities. In lean philosophy, we incorporate the health of the flow [1]. The conditions for implementing the data flow are changing all the time. Storing has been seen as the power hotspot for the economy’s progress. Producers in the business have been looking at the new challenges that come from a globalised world that is constantly changing. As progress gradually improves, people are well if all else fails to move towards a more viable and capable solution for their essentials. As time passes, various producers have pulled in their things turn, while established affiliations attempt to make do by incorporating changes and updates into their key gathering measures. Creators try to keep up their bit of the pie in a genuine atmosphere. This impediment has hampered Sri Lankan policymakers and the Western relationship with rapidly developing countries. A few components raise this resistance, for instance, resource cost, work cost, natural challenges, people’s feelings, etc. Under these conditions, makers attempt to be more capable and monetarily sharp. All the more convincing and valuable producers have tried to reach generally accepted points of view by extending working hours and supplier resources [2]. The stores’ point is that robotisation can improve the suitability, increase the creation limit with the least number of pros, cut the creation cost, and improve quality. An inevitable result of these propensities is that affiliations will consider structuring their computerisation levels by executing more robotisation outlines. Despite the different central explanations behind robotisation, computerisation isn’t the best blueprint and isn’t even a potential one sporadically [3]. Various makers perceive that motorisation creates a multifaceted nature and requires a more recognisable cost while smoothing out the cycles. Henceforth, finding the right balance and mix of lean and robotisation is essential for a relationship to be lean and use

computerisation. Finding the ideal degree of computerisation in the perfect space is reliably fundamental. Since robotisation is exceptionally advanced and effectively refers to the system, this requires higher presentation costs and backing costs. As per producers’ affirmation, its computerisation and lean social events are ideal approaches to stay reasonable and advantageous, paying little heed to the blend’s tendency. This structure consistently perceives a foreseen creation. It changes into a hardcoded blend. Lean and computerisation are based on the same idea: to speed up and automate the creation and collaboration cycles. This makes them both thin and deft. Different affiliations of types and some of the evaluation work are in progress. Cooperation projects considered measures programme and long for, and activities for this concept are integrated business improvement standards. The development has been associated with any effort to create a robot framework to facilitate. The structure of the paper is: Sect. 1 is an introduction to the automated production line using a cloud system. Section 2 is a background of related research works. Section 3 is the proposed lean production system model. Section 4 is the result and discussion. Section 5 is the conclusion and future work end of the article.

2 Related Works The waste output control and the period of the reference and ID, the reduction of defects, stability and variable in quality, and the ongoing costs and convenience effort are huge loads of essential gadgets; flexible sex is the focal multiple of today. Review the needs of the cycle at a low cost; now, legally, space is more open, and there is also an opportunity to work together with a hint of a universal robot to create within. Customer cost savings are promised; capacity, quality, the collapse’s delivery time, material reduction, and improvement are used for the couple’s focus centre. In either case, these assessments, continuous openness [4], do not take into account new developments. Combat robots would create an insignificant cost, excess production, and weakness. This theory, derived from the separation of the model, the MRP and ERP structure, to the insurance premiums for the CNC machine tools, the machine of creating the line, has been tremendous. Cloud Manufacturing (CM) is a front-line creation model; it can change the all-out amassing industry. The further progression of CM, a technique for working up a CM organisation, usually in a ground-breaking way, raises huge issues and subjects [5]. At this moment, there is nothing in the composition to deal with such a case. This letter establishes the organisation’s automated advancement strategy. They were proposing another robotised advancement plan to improve the CM organisation. Robotised IC design subject to improved opening methodology is used to create the power semiconductor [6]. Programming straight mixed absolute numbers are assigned to the novel space advancement issue. By pushing the significant limits, the short evaluation time and easy use of any business improvement solver on the customer side are solved.

The customised surface survey planning, the instructive assortment, is ordinarily exorbitant, and the related technique, since it is uncommonly dependent on the enlightening record, is inconvenient—a general procedure for the ASI small learning data [7]. This procedure’s limit relies upon picture fixes portrayed from the significant learning network transmission boundary. Next, the desire for each pixel is gotten by convolving the ready in the data picture classifier. The robotised plan of strong resource creation considering Petri nets has been proposed is novel and comparable to the end control framework. In any case, a certified application may be trusted. Along these lines, the gridlock control strategies that have been proposed in the past outline report aren’t suitable for such applications. This paper offers a gridlock control philosophy for inadequacy acknowledgement and treatment of sketchy resource structures to handle this issue. The issue of gridlock control is in the flexibility of explaining mechanised creation structure; of late, it has stood apart. In Petri nets, resources progress circuit, and the syphon is a large part of the time used to portray and decide a stop’s control procedure. The bright lights on such a Petri net are a structured resource of an essential plan of cycles, including some outstanding resources [8]. Hung parts have spread to the entire current gathering industry. Hence, if an extension is in progress, it won’t be used for the welfare of mechanised screw fastening structures. Regardless, there are numerous fundamental issues and planning, quality of automation that prevents still, for small screws and tremendous things, and challenges, notably finishing requirements. The automated affiliation will be less vulnerable than the typical physical labour problem. In either case, the versatility and ability to respond to changes in the interest are reduced for the benefit of automation; to improve execution alone, the inspiration driving why the motorization must be adaptable and related to an improvement in the cycles remains questionable [9]. A specific sensor philosophy is radio-frequency identification (RFID) for social occasion measurements. They presented a technique and two business cases demonstrating such a model’s quality, setting an improvement in gathering affiliations. This is an unquestionable report. Inductive reasoning proposes an applied structure to utilise model setting progression as an engaging master in the new development and further assistance of the procedure. The central assessment technique passed on is particular critical appraisal research. The assessment is always based on a combination of the heads’ ideas and their work together, focusing on credible motorisation developments [10].

3 Materials and Methods

3.1 Lean Production System

Lean manufacturing (LM), also known by synonyms such as lean enterprise production systems, is a set of tools and management concepts aimed at generating profit by reducing production costs and eliminating waste; Fig. 1 shows the details of the LM system below [11].

Fig. 1 Design of LM system

3.1.1 Research Design

In various industrial fields, such as the LP of passage in aerospace, the proper way is, in this case, the emerging research themes and case studies. There is no empirical evidence for applying the LP in this area. Therefore, research passed to the LP is exploratory and descriptive. When you use the case study, it is recognised as discovery research in operations management. It has generated qualitative research methods [12].

3.1.2 Case Selection

The choice of validity and case studies for the survey results is a big decision. We used the theoretical sampling model. The conditions used in this study, a case study of the best results to provide robust and meaningful insight, were designed to trigger a “learning opportunity”. Our strategy, the maximum change of the basis in order to realise a copy of the text using the variance, is the case of the wealth of information. The basic unit of this analysis is a plant. Even if the plant is part of a group, it is a significant decision to be used in the LP because there are different possibilities by tissue factors of each production plant. Nevertheless, the products’ size and production have been included [13].

3.1.3 Data Collection

A case study protocol was designed before starting the fieldwork. This is the data collection tool to follow, and it contains the general rules of procedure and the case study method. The protocol was updated and improved during all the visits that occurred.

We use the survey as the primary source of evidence and triangulate this information with in-depth interviews, repeating the above process to test the preliminary version of the questionnaire [14].

3.1.4 Data Processing

Data processing is based on integrating physical sensors and actuators for the world and interactive hardware-based clusters. Additional middleware enables services such as cloud computing. Therefore, it features microelectronics, sensors, communications, and a processing module. As a result, products, resources, machinery, and equipment get a form of essential intelligence. In such an environment, smart objects are connected to the global Internet [15].

3.1.5 Qualitative Data Analysis

The primary goal of the analysis is to maintain the conclusion of a data-driven, transparent chain of evidence. Therefore, we have adopted a series of measures to ensure the validity of the data analysis and interpretation process. Analysis of two of the inner cases and a cross-case is done. Researchers will be able to start collecting lots of data over time if they look at how things are going inside.

3.2 System Design of Automation System

The proposed model applies to the production system of the Engine Electronic Control Unit (EECU), which is one of the critical automotive parts. Figure 2 shows the electronic board assembled by fastening screws; the cover is then assembled with a sealing material, and functional tests are performed. First, the process design is decomposed into operations, such as electronic processing and supply-board inspection. Devices specified by the line can be used to perform the operations specified for the column. Almost all operations are performed by special equipment in the conventional automatic production system.

Fig. 2 System design

LM has been developed as a production system centred on how people work, while there is no comparable waste-free definition for an automation system. However, we know the "production status" and "technical condition". If one looks for waste and human behaviour during the operation of the machine, there is no qualitative difference between the various types of waste and the ways people behave when using a machine; with a new and improved technique, significantly more waste can be identified. The most typical waste found in our analysis of machine operation is the operation of material handling equipment such as conveyors. The new topics to explore, and the interviews and data used to analyse the relationships among the identified items, were investigated and defined as variables in a subsequent interview.

3.3 Business Process Intelligence

Business Process Intelligence (BPI) focuses not only on managing the process; innovative applications have also been reported. Traditional BPI is derived from process data and workflows. BPI analyses the data and supports the creation of reports and dashboards on which, for example, production can take action.

3.4 Design Procedure of LAS

The LAS design procedure uses a library of standard equipment. The categories of operation are conveying the feed material, feed and discharge, positioning, inspection, and the main operation. For these operations, we have prepared a library of standard equipment that includes equipment often used in practice. To calculate the optimal allocation of each piece of equipment to the aggregated operations, we set the primary operating time, which is determined by the operating device, and, for each device, the underlying operating time limit used for aggregation.
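As an illustration of allocating the operation categories to equipment from such a library, the following sketch minimises the summed operating time with a linear assignment; the equipment names and operating times are invented for demonstration and are not taken from the paper.

```python
# Illustrative allocation: assign each operation category to one piece of standard equipment
# so that the total operating time is minimised (names and times are assumptions).
from scipy.optimize import linear_sum_assignment

operations = ["conveying", "feed/discharge", "positioning", "inspection", "main operation"]
equipment  = ["conveyor", "feeder", "positioner", "vision station", "screw-fastening cell"]

# operating_time[i][j] = time (s) for equipment j to perform operation i (illustrative values)
operating_time = [
    [4, 9, 9, 9, 9],
    [9, 3, 9, 9, 9],
    [9, 9, 2, 9, 9],
    [9, 9, 9, 5, 9],
    [9, 9, 9, 9, 6],
]

rows, cols = linear_sum_assignment(operating_time)   # optimal one-to-one allocation
for i, j in zip(rows, cols):
    print(f"{operations[i]:>15} -> {equipment[j]}")
```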

4 Result and Discussion

Table 1 shows the simulation parameters used for pre-planning the automated production line equipment based on LM ideas. Table 2 shows the dataset values for the LAS accuracy results below. Figure 4 shows the accuracy of LAS based on the dataset analysis; the pre-test and post-test values increase under the proposed prediction. Table 3 shows the dataset values for the prediction of manual production and LAS below. Figures 4 and 5 show the prediction of manual production and LAS for the years 2017–2022, with LAS increasing across the collected datasets.

Table 1 Simulation parameters

Parameters       Values
Simulation tool  Visual Studio
Language         Dot Net
Processor        Intel Core i5
Model            Lean automated production

Table 2 Data sets of LAS

Data sets  Pre-test in %  Post-test in %
Data 1     42             53
Data 2     53             75
Data 3     67             92

Table 3 Prediction of manual and LAS

Years of data  Manual production in %  LAS in %
2017           30                      57
2018           43                      68
2019           2                       82
2020           50                      95
2021           55                      97
2022           60                      98

Fig. 3 Automation LM

Fig. 4 Accuracy of LAS

Fig. 5 Prediction of manual and LAS

4.1 Accuracy of LAS See Table 2, Figs. 4 and 5.

4.2 Prediction of Manual and LAS See Table 3.

5 Conclusion and Future Work

LAS responds to the lack of documentation on integrating management with a consistent approach to classical activities, combining the flexibility of automation activities with the simplification of production and LM. Unlike the traditional link between production and distribution, this method relies on automation and process improvement; to be exceptionally flexible, the level of digitisation must be reduced, which is not a rationalisation in itself and can even be counterproductive. Practical examples represent the starting point for lean and advanced automation applications that enable LAS deployment across the entire production process and LM. Further research should use real-time information to investigate advanced technical areas such as applications optimised for production scheduling.

References 1. Chang K, Xing Y, Yin J, Zheng D (2018) Design of intelligent management and control system of flexible transmission assembly line. In: 8th International conference on instrumentation and measurement, computer, communication and control, pp 756–760 2. Chang SM, Yang GJ, Jun C (2016) Research and application of intelligent manufacturing technology for pulsating production line of aircraft assembly. Aviat Manuf Technol 16:41–47 3. Huang SH, Guo Y, Cha SS (2019) Overview of research and application of internet of things and its key technologies in discrete workshop manufacturing. Comp Integr Manuf Syst 25(2):22–40 4. Kafiev PR, Romanova I (2020) Fuzzy logic based control system for automated guided vehicle. Int Multi-conf Indus Eng Mod Technol 2020:1–6 5. Kerezovic T, Sziebig G (2016) Case study: optimization of end-of-line packaging in fishery industry. In: IEEE/SICE international symposium on system integration (SII), pp 694–699 6. Lee JD, Li WC, Shen JH, Chuang CW (2018) Multi-robotic arms automated production line. In: 4th International conference on control, automation and robotics, pp 26–30 7. Shi H, Niu L, Sun J, Zhang X (2021) Research and implementation of material distribution method based on intelligent perception network. In: IEEE international conference on advances in electrical engineering and computer applications, pp 351–357 8. Wu H, Zou T, Burke H, King S, Burke B (2021) A novel approach for porcupine crab identification and processing based on point cloud segmentation. In: 20th International conference on advanced robotics, pp 1101–1108 9. Zhao H, Ma H, Jingchun Y (2009) The RFID-based product identification and processing in the auto clutch assembly line. Tech Autom Appl 28:52–54 10. Sudhakar S, Chenthur Pandian S (2012) Secure packet encryption and key exchange system in mobile ad hoc network. J Comput Sci 8(6):908–912 11. Sudhakar S, Chenthur Pandian S (2016) Hybrid cluster-based geographical routing protocol to mitigate malicious nodes in mobile ad hoc network. Int J Ad Hoc Ubiquitous Comput 21(4):224–236 12. Priyadarshni AU, Sudhakar S (2015) Cluster based certificate revocation by cluster head in mobile ad-hoc network. Int J Appl Eng Res 10(20):16014–16018 13. Sudhakar S, Chenthur Pandian S (2015) Investigation of attribute aided data aggregation over dynamic routing in wireless sensor. J Eng Sci Technol 10(11):1465–1476 14. Sudhakar S, Chenthur Pandian S (2013) Trustworthy position based routing to mitigate against the malicious attacks to signifies secured data packet using geographic routing protocol in MANET. WSEAS Trans Commun 12(11):584–603

15. Sudhakar S, Chenthur Pandian S (2013) A trust and co-operative nodes with affects of malicious attacks and measure the performance degradation on geographic aided routing in mobile ad hoc network. Life Sci J 10(4s):158–163 16. Mumtaz J, Guan Z, Yue L, Wang Z, Ullah S, Rauf M (2019) Multi-level planning and scheduling for parallel PCB assembly lines using hybrid spider monkey optimization approach. IEEE Access 7:18685–18700

Comparison Between Property-Based Software Security Testing Technique and Fault Injection

Trina Saha and Md. Khorshed Alam

Abstract Security means safety; it refers to the precautions we take to ensure our safety and protection. Until some years ago, we were primarily worried about security in our homes and workplaces, but today "software security" is the focus of attention as a special branch of security, and various approaches are being used to address this issue. Security is considered a figure of merit of software, and secure software can readily fulfil its intended purpose. The goal of this paper is to discuss two empirical software security testing techniques: fault injection-based security testing and property-based security testing. Additionally, I demonstrate a basic comparison between these two techniques, showing their advantages and disadvantages. For the purpose of my research, I have also developed two software programs for exercising these two types of techniques; through these programs, I clearly demonstrate how the two techniques execute to protect software from being attacked. My developed software performs the black box technique to verify whether a piece of software is vulnerable or not, which makes this paper unique.

Keywords Software security · Fault injection-based testing · Property-based testing · Security · Testing techniques

T. Saha (B) · Md. K. Alam
Department of Computer Science and Engineering, State University of Bangladesh, Dhaka, Bangladesh
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_58

1 Introduction

Reliability is regarded as the most significant and valuable aspect of system quality. The reliability of a system refers to its capacity to perform failure-free operation in a given environment during a given time frame [1]. Software security testing is an essential tool for ensuring the trustworthiness and security of software. In this computer era, software is becoming more complicated and large scale, which makes software security issues more serious. In this paper, I discuss two major methods of software security testing.




Wenliang Du, based on the Environment-Application Interaction (EAI) fault model [2, 3]. Robustness testing is one type of fault injection and is frequently used to check for weaknesses in communication interfaces. The fault injection technique focuses on the locations where an application and its environment interact; in particular, it emphasizes human input, the file system, the network interface, and environment variables. The main idea of this testing method is to determine whether the software responds correctly to different protocol packets: to find security flaws, incorrect data or defects are injected into different protocol packets. A variety of unusual software behaviors can be simulated in this way, and the software can be driven into specific states that are difficult to reach with other technologies. The other popular technique is property-based testing, in which test cases are automatically produced from a specified property of the system. The next section surveys existing techniques, together with their advantages and disadvantages and the related work in which they were discussed. The third section presents the proposed approach, why these two methods were chosen, and what their benefits are. The fourth section describes the implemented software, which is black-box oriented and checks the vulnerability of a particular system under test; it also explains how these tools work and what output can be expected from the fault injector and the property-based injector, serving as a practical example. The fifth section discusses the advantages and disadvantages of the two techniques, and the sixth section presents a basic comparison between them, which was one of the aims of this research. Finally, we summarize what has been done and outline future work. In short, the goal of this paper is to give an overview of two different software security testing techniques, fault injection and property-based injection, to present two tools implemented for these techniques to help the reader understand them better, and to compare the two testing approaches.

2 Background and Feasibility Study
The following testing approaches are traditionally used to detect security vulnerabilities.

2.1 Penetration Analysis
Penetration testing is an analysis performed against known security flaws in software systems. A team is assigned the responsibility of penetrating the system. The disadvantage of penetration analysis is that it requires one either to know, or to be able to hypothesize, the nature of the flaws that might exist in a system [4].



Fig. 1 Process flow of model-based testing

2.2 Formal Security Testing
Formal security testing applies mathematical descriptions of the security requirements of the system. The objective of this method is to formally prove that the system does, in fact, satisfy those requirements. A major challenge for formal methods is the inherent difficulty of specifying both the requirements and the system, and then checking the requirements specification against the system specification [5].

2.3 Model-Based Security Testing
To use the model-based testing technique, a model that precisely matches the behavior and structure of the program under test must first be built. After construction, test cases are generated from the test model and then run on the software [6]. The behavior of the software system can be described using input–output sequences, activity diagrams, sequence diagrams, collaboration diagrams, state conditions, or data flows. Commonly used models for software testing include FSMs, UML models, and Markov chains. Figure 1 illustrates the process flow of model-based testing.

2.4 Fuzz Testing
Fuzz testing is an effective and popular technique for discovering security vulnerabilities. In this technique, random data are injected into the program to test whether it can run normally under cluttered input. Its strength is that it



finds vulnerabilities in the tested software that are difficult to reveal with other, more structured testing methods. On the other hand, because the technique generates cluttered data, it is sometimes regarded as an unsystematic approach.
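As a minimal, hedged illustration of the fuzzing idea (not taken from this paper), the following Python sketch feeds random byte strings to a target program over standard input and records the inputs that make it crash or hang; the target path and iteration count are placeholder assumptions.

```python
import random
import subprocess

TARGET = "./target_program"  # hypothetical executable under test

def random_input(max_len=256):
    """Generate a random byte string to act as cluttered input."""
    length = random.randint(1, max_len)
    return bytes(random.getrandbits(8) for _ in range(length))

crashing_inputs = []
for _ in range(1000):
    data = random_input()
    try:
        proc = subprocess.run([TARGET], input=data, capture_output=True, timeout=2)
        # A negative return code on POSIX means the process was killed by a signal
        # (e.g. SIGSEGV), which usually indicates a memory-safety problem.
        if proc.returncode < 0:
            crashing_inputs.append(data)
    except subprocess.TimeoutExpired:
        crashing_inputs.append(data)  # hangs are also worth investigating

print(f"{len(crashing_inputs)} inputs caused a crash or hang")
```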

2.5 White Box Testing
In this technique, the tester chooses inputs that exercise paths through the code and determines the expected outputs. The goal is to ensure that all statements and conditions of the tested program have been executed at least once. The application code must therefore be visible, and the tester must know the code [7]. This is a form of correctness testing.

2.6 Risk-Based Testing
Risk-based testing is centered on risk analysis. Security testing is combined with the software development life cycle to find high-risk security vulnerabilities, so that flaws or risks are found as early as possible. There are also several security testing approaches that are considered general testing techniques, such as path testing, data flow testing, domain testing, and syntax testing [8]; these approaches are still considered weak at discovering security flaws. The aim of this paper is to discuss fault injection-based testing and property-based testing, applying the black-box technique in both. These two techniques were chosen because fault injection experiments run in close to real time and can be extended to new classes of faults, while the property-based approach supports all system abstraction layers and is non-intrusive.

3 Proposed Model
The goal of this paper is to discuss fault injection-based and property-based testing techniques. The proposed work differs from existing papers in that we present the fundamentals of each technique together with examples, which helps to clarify the topic, and we have also implemented two tools: one for fault injection and one for the property-based technique. Existing work relies on white-box testing, which means that a tester must know the internal structure of the source code to perform the testing. Our empirical tools, in contrast, do not need the internal structure or source code of the particular system under test (SUT); in other words, they use the black-box testing method.



3.1 Fault Injection-Based Testing
The fault injection technique requires fault models [9], and a model must be selected; the nature of the flaws influences the model selection. Software errors caused by hardware faults, for instance, are often modeled by writing bit patterns of zeroes and ones into a data structure or a portion of memory [10], whereas protocol implementation errors caused by communication are frequently modeled via message dropping, duplication, reordering, delaying, etc. [11]. Understanding the nature of security flaws lays the groundwork for using fault injection, and several related studies have been concerned with security faults [12–14]. Fault injection is a technique through which the dependability of a system under test can be assessed: it involves injecting faults into a system and monitoring the system's behavior in response. Various fault injection methods have been proposed and experimented with; this study focuses on software-implemented fault injection (SWIFI) techniques. It is worth noting, however, that both hardware-implemented and simulator-based fault injection have long been used for testing, particularly for hard real-time and mission-critical systems. Fault injection can be performed in two different ways: indirect fault injection and direct fault injection. Our fault injection method is based on the Environment-Application Interaction (EAI) fault model, whose strength is its capacity to simulate the environmental defects that are likely to cause security violations. Environmental defects affect an application in two different ways.
Indirect fault injection: Figure 2a shows the case in which the environment causes a security violation indirectly, i.e., faults in the environment affect an application via an internal property. Consider an application that uses the network for input. Any flaw in the network message connected to this input is inherited by an internal object: the application copies data from this context into an internal buffer, and the limits of the buffer are not checked by the program. The fault in the network message, in this case, is that the message is too long, and this circumstance leads to a security violation.
Direct fault injection: The direct impact of environment problems on an application is depicted in Fig. 2b. As an illustration of this second type of interaction, suppose a program needs to execute a file. There are two options. One is that the user who runs the application owns the file; the file's owner is indicated by an environment property, and in this case the execution is safe. The file may instead be owned by a malicious, unauthorized user, which constitutes an environment fault. Whoever executes the application assumes that the file is a legitimate part of it; if the application does not handle this environment flaw, it might run arbitrary commands from that file, thereby resulting in a security violation.
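To make the environment–application interaction idea concrete, the following Python sketch (not the authors' tool) perturbs one environment attribute, an environment variable, with an over-long value and observes whether a target program still behaves safely. The variable name, target path, and value length are illustrative assumptions.

```python
import os
import subprocess

TARGET = "./target_program"   # hypothetical program that reads CONFIG_PATH
ENV_VAR = "CONFIG_PATH"       # assumed environment attribute to perturb

def run_with_env(value):
    """Run the target with a perturbed environment and report its exit status."""
    env = os.environ.copy()
    env[ENV_VAR] = value
    proc = subprocess.run([TARGET], env=env, capture_output=True, timeout=5)
    return proc.returncode

# Inject an environment fault: a value far longer than any sane configuration path.
# If the program copies it into a fixed-size buffer without checking the bounds,
# the process is typically killed by a signal (negative return code on POSIX).
oversized = "A" * 100_000
status = run_with_env(oversized)
if status < 0:
    print(f"possible memory-safety violation: killed by signal {-status}")
else:
    print(f"program handled the fault, exit code {status}")
```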



Fig. 2 Interaction model

3.2 Property-Based Testing
Property-based testing (PBT) is a random testing method in which the intended behavior of the system is captured by a description of the valid inputs to the SUT and the properties that are expected to hold when the system is exercised with instances of those valid inputs. A property-based testing tool uses these definitions to generate increasingly complex inputs, subjects the SUT to them, and checks whether the outputs falsify the properties. With this approach, the manual tasks are reduced to precisely defining the SUT's input domain and writing a set of properties that capture the intended behavior. Because they operate on properties, which are effectively partial specifications of the SUT, PBT suites are more condensed and simpler to create and understand than full system specifications. Users can make full use of the host language when writing properties, can accurately describe a wide variety of input–output relations, and may also write their own test data generators. Compared with testing a system using manually written test cases, formulating good properties takes more up-front effort, but the resulting properties are much more concise than a long series of unit tests.
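As a hedged illustration of this style, using the Python hypothesis library rather than the tooling developed in this paper, the sketch below states two simple properties over automatically generated inputs; the generator then searches for counterexamples on its own.

```python
from hypothesis import given, strategies as st

@given(st.lists(st.integers()))
def test_sorting_is_idempotent(xs):
    # Property: sorting an already-sorted list changes nothing.
    assert sorted(sorted(xs)) == sorted(xs)

@given(st.lists(st.integers()), st.lists(st.integers()))
def test_concatenation_length(xs, ys):
    # Property: the length of a concatenation is the sum of the two lengths.
    assert len(xs + ys) == len(xs) + len(ys)

if __name__ == "__main__":
    # Calling the decorated functions runs them against many generated inputs.
    test_sorting_is_idempotent()
    test_concatenation_length()
    print("both properties held for all generated inputs")
```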

4 Implemented Software for Testing Software Vulnerability
In this research work, we implemented two applications (code injectors) for testing the vulnerability of software using the black-box testing technique; to our knowledge, comparable tools for checking the vulnerability of an application in this way are not readily available on the internet. The implemented tools are simple. We first created small Sample applications for the purpose of this research and then used the tools



(the code injectors) to verify their vulnerability, which serves as a realistic, practical example.

4.1 Fault Injector
Figure 3 shows the flowchart of the fault injector. In the Sample application (written in the C programming language), a multiplication operation is implemented using the addition operator (Fig. 5); the output of the program is shown in Fig. 6. The fault injector (Fig. 4) works with executable files compiled from C. It can inject custom text that will be printed on the console during execution of the program, and at the same time it can inject another program into the C executable, which will be executed as a child process while the program runs. When the user selects an executable file as input, the fault injector automatically retrieves all the method names from that executable, so the source code of the program is not required. The user can select a method from the list into which the desired text or external executable should be injected. For example, the Sample application prints the product of 20 and 9, which is 180 (Fig. 6). The program has three methods, named Main, SwapByXOR, and MultiplyByAddition (visible in the source code in Fig. 5). To inject the text "«code injected by Trina Saha Mou»" into the Main method, the Sample application was selected in the fault injector, which loaded all of its methods; the Main method was then selected from the drop-down menu, the desired text was entered, notepad was chosen as the external executable, and the inject button was clicked. The tool generated the injected executable, and running the new executable produced the output shown in Fig. 7.
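The authors' injector rewrites the compiled executable directly. Purely as a conceptual sketch, with placeholder paths and text, the following Python wrapper reproduces the observable effect described above: it prints the injected text, starts an external program as a child process, and then runs the original executable unchanged.

```python
import subprocess

TARGET = "./Sample.exe"                # hypothetical executable under test
INJECTED_TEXT = "<<code injected>>"    # text to surface during execution
EXTERNAL_PROGRAM = ["notepad.exe"]     # external program to run as a child process

def run_injected():
    # 1. Print the injected text, as the modified Main method would.
    print(INJECTED_TEXT)
    # 2. Launch the external executable as a child process of this run.
    child = subprocess.Popen(EXTERNAL_PROGRAM)
    # 3. Execute the original program and show its (unchanged) output.
    result = subprocess.run([TARGET], capture_output=True, text=True)
    print(result.stdout, end="")
    child.terminate()

if __name__ == "__main__":
    run_injected()
```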

4.2 Property-Based Injector
Figures 8 and 9 show the flowchart and the code injector of the property-based injector. In the Sample application (written in the C programming language), a multiplication operation is again implemented using the addition operator (Fig. 10); the output of this program is shown in Fig. 11, and the output after using the code injector in Fig. 12. The property modifier can automatically retrieve all the processes running in the OS. The user selects a process and searches for a value by entering it in the specified field and clicking the search button. The code injector (property modifier) then goes to the address where the selected process is located in memory and scans the entire memory address space allocated to that process by the OS (including the address spaces allocated to its threads). Whenever an address is found to contain the expected value, that (hexadecimal) memory address is added to the list labeled memory addresses. When the user selects a memory address, enters a value in the value field, and clicks the inject button, the

Fig. 3 Flowchart of fault injector

Fig. 4 Fault code injector




Fig. 5 Fault code injector

Fig. 6 Output without using code injector

value is written to that address, and if that address is associated with a variable, the change is reflected in the program's (process's) execution path. The Sample application runs continuously (Fig. 11). Initially, neither of its two properties (value and desired value) was modified. Then the value was changed from 775 to 999 (Fig. 12): searching for the value 775 in the Sample application using the code injector (property modifier) located the memory address 13FDA3000, the value 999 was injected, and the status of the Sample application changed. The property desired value, 809, was changed in the same way, and the status was

Fig. 7 Output using fault code injector

Fig. 8 Flowchart of code injector (property modifier)




Fig. 9 Property code injector

Fig. 10 Sample application code

again changed. This shows that the program's (Sample application's) execution path can be changed externally by the code injector (property modifier). No source code or executable file of the target is required for this approach.
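The authors' property modifier works on Windows process memory. As a rough, platform-specific sketch of the same idea, assuming Linux, a 32-bit little-endian integer value, sufficient ptrace permissions, and a placeholder process id, the Python code below scans the writable regions of a running process for a value and overwrites the first match.

```python
import re
import struct

PID = 1234            # hypothetical process id of the running Sample application
OLD, NEW = 775, 999   # value to search for and value to inject

old_bytes = struct.pack("<i", OLD)
new_bytes = struct.pack("<i", NEW)

# Parse writable memory regions from /proc/<pid>/maps.
regions = []
with open(f"/proc/{PID}/maps") as maps:
    for line in maps:
        m = re.match(r"([0-9a-f]+)-([0-9a-f]+)\s+(\S+)", line)
        if m and "w" in m.group(3):
            regions.append((int(m.group(1), 16), int(m.group(2), 16)))

# Scan each region through /proc/<pid>/mem and overwrite the first occurrence.
with open(f"/proc/{PID}/mem", "r+b", buffering=0) as mem:
    for start, end in regions:
        try:
            mem.seek(start)
            data = mem.read(end - start)
        except OSError:
            continue  # some regions cannot be read; skip them
        offset = data.find(old_bytes)
        if offset != -1:
            address = start + offset
            mem.seek(address)
            mem.write(new_bytes)
            print(f"wrote {NEW} at address {address:#x}")
            break
```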



Fig. 11 Output without using code injector (for property modifier)

Fig. 12 Output using property code injector

5 Advantages and Disadvantages of Fault Injection and Property-Based Techniques

5.1 Fault Injection-Based Testing

Advantages:
1. The target users of this technique are applications and operating systems for which hardware fault injection is difficult to apply.
2. A high number of fault injection experiments can be carried out, because trials run in close to real time.
3. The experiments run on the real hardware, so any design glitches present in the actual hardware and software design are included.
4. No specialized hardware is required; the modest level of complexity incurs minimal development and implementation cost.
5. No model development or validation is required.
6. An existing setup can be extended to new classes of faults.



Disadvantages:
1. Injection instants are limited and are restricted to the assembly-instruction level.
2. Faults cannot be injected into locations that are inaccessible to software.
3. Source code modification is needed to support the fault injection, which means that the code executing in the fault experiment is not the same code that will run in the field.
4. Observation and control ability are constrained. At best, one can corrupt the internal processor registers (and certain locations within the memory map) that are visible to the programmer, i.e., the programmer's model of the processor; faults cannot be introduced into the processor pipeline or instruction queue, for instance.
5. Permanent faults are very hard to model.
6. Related to point 4, execution of the fault injection software can affect the scheduling of system tasks in such a way that hard real-time deadlines are missed.

5.2 Property-Based Testing

Advantages:
1. Every system abstraction level is supported by this method.
2. It is non-intrusive.
3. It gives full control of both the property models and the injection techniques.
4. It offers low-cost computer automation and does not require any specialized hardware.
5. Monitoring and control ability are high.
6. It allows reliability evaluation at different levels of the design process.
7. Both transient and permanent faults can be modeled.

Disadvantages:
1. The development effort is high.
2. It is time consuming.
3. It is difficult to find good models; one must rely on the accuracy of the properties.
4. The accuracy of the results depends on how well the model is used.
5. Real-time property-based injection is impossible in a model.
6. The properties may not capture design flaws that are present in the actual hardware.

6 Basic Comparison Between Fault Injection and Property-Based Techniques See Table 1.

7 Conclusion
The goal of technological advancement is to serve humanity by ensuring security. Nowadays the whole world, and industry in particular, is increasingly dependent on technology, and this has increased the security risks and threats faced by consumers [15–17]. We therefore have to be more concerned with developing protected software. In this paper, we have discussed two popular software security



Table 1 Comparison between fault injection and property-based techniques
1. Fault injection: can inject any malicious code or content. Property-based injection: can inject any variable to modify the property.
2. Fault injection: must change the execution path of a program. Property-based injection: may or may not change the execution path of a program.
3. Fault injection: no model development or validation required. Property-based injection: model development and validation are involved.
4. Fault injection: can verify any software by injecting another software. Property-based injection: no possibility of injecting through another software.
5. Fault injection: limited set of injection instants. Property-based injection: unlimited set of injection instants.
6. Fault injection: time consuming. Property-based injection: takes more time than fault injection.
7. Fault injection: injecting faults into executables written in a high-level language (e.g., Java) is easier than in a low-level language (e.g., C, C++). Property-based injection: the difficulty level remains the same regardless of the level of the language.

testing methods that are black-box oriented. The fault injection technique involves injecting faults into a system and monitoring the system's behavior in response; it is a fruitful way to reveal the presence of hidden bugs. Using property-based testing, one can change the values of variables and verify the security level of an application, i.e., change its execution path by changing its properties without knowing its source code. We have also presented the two code injectors implemented for this research; one of our aims was to show how an application can be checked for security using these injectors, and this was demonstrated in practice. Finally, we have given a basic comparison between the two testing techniques. As future work, we would like to make the implemented tools more automated and to improve their performance. The implemented software is available at: https://drive.google.com/drive/folders/1zaVCciQUlbQSymSdD6vNjxDBWsyl−gR?fbclid=IwAR3RdfLUgR9LJ3txxbAaqkMTi5l5oWzQlH–Wmk9p–B3u4ex9tKlDXW08JZQ.

References
1. IEEE Std. 1633–2008 (2008) IEEE recommended practice on software reliability, pp c1–72, June 2008
2. Du W, Mathur AP (1998) Vulnerability testing of software system using fault injection. COAST TR 98-02
3. Du W, Mathur AP (2000) Testing for software vulnerability using environment perturbation. In: Proceedings of the DSN 2000, pp 603–612



4. Arkin B, Stender S, McGraw G (2005) Software penetration testing. IEEE Secur Priv 3(1):84–87
5. Fink G, Levitt K (1994) Property-based testing of privileged programs. In: Proceedings of the 10th annual computer security applications conference, Orlando, FL, USA
6. Yan J et al (2004) Survey of model-based software testing. Comp Sci 31(2)
7. Xia Y et al (2006) Security vulnerability detection study based on static analysis. Comp Sci 33(10)
8. Beizer B (1990) Software testing techniques. Van Nostrand Reinhold, New York
9. Clark J, Pradhan D (1995) Fault injection: a method for validating computer-system dependability. IEEE Comp 47–56
10. Han S, Shin K, Rosenberg H (1995) Doctor: an integrated software fault injection environment for distributed real-time systems. Technical report, University of Michigan, Department of Electrical Engineering and Computer Science
11. Dawson S, Jahanian F, Mitton T, ORCHESTRA: a fault injection environment for distributed systems. In: 26th international symposium on fault-tolerant computing (FTCS), pp 404–414
12. Aslam T (1995) A taxonomy of security faults in the unix operating system. Master's thesis, Purdue University
13. Du W, Mathur A (1997) Categorization of software errors that led to security breaches. COAST Technical Report 97-09, Purdue University, Department of Computer Sciences
14. Krsul I (1997) Computer vulnerability analysis thesis proposal. Technical Report CSD-TR-97-026, Computer Science Department, Purdue University
15. Khan SA, Khan RA (2010) Securing object oriented design: a complexity perspective. Int J Comp Appl 8(13)
16. Definition of Accountability (2014) Available at: http://en.wikipedia.org/wiki/Accountability. Last visited Oct 03
17. McGraw G (2004) Software security. IEEE Secur Priv 2:80–83

Sentiment Analysis on Social Media Data: A Survey
Kanchan Naithani and Y. P. Raiwani

Abstract In the present era of advanced technology, machines are continually being harnessed to deliver accurate interpretations of what people communicate on social media. Humankind is today preoccupied with how humans think and what they think, and the resulting decisions depend largely on the drift of the masses on social media platforms. This paper presents a multidimensional look at how sentiment analysis rose to prominence as a result of the unexpected expansion of content on the web. The manuscript also discusses how data has been acquired from social media over time, the detection of similarities based on comparable choices people make on social media, and the approaches for grouping user information into communities. Data in its various forms is examined and presented. In addition, the techniques for reckoning sentiments are surveyed, classified, and compared, along with certain limitations, in the hope of paving the way for future research. Keywords Sentiment analysis · Social media · Social networks · Cluster community

K. Naithani (B) · Y. P. Raiwani
Department of Computer Science and Engineering, HNB Garhwal University, Srinagar Garhwal, Uttarakhand, India
e-mail: [email protected]
Y. P. Raiwani
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_59

1 Introduction
The process of categorizing the opinions expressed about a specific thing or object is known as sentiment analysis. With the development of various technologies, it has become critical to be aware of the general public's opinion in areas such as products, business, and general likes and dislikes [1]. Identifying the emotion behind social media posts can assist in determining the context in which one should react. Two types of social media are available, i.e., online communities and social networks. People




who are connected through past personal interactions build social networks, which they maintain socially, and they prefer to connect with new people to expand their connections. In contrast, communities are made up of people from various fields who have little or no common ground; enthusiasm for a shared hobby serves as the primary link between members of a community [2]. The focus of this paper is to examine the significance of social media and how social networks are used to analyze sentiments on social media using various techniques. For this, the paper is divided into five parts: Introduction; Significance of Social Media; Social Networks: Cluster and Community; Social Networks: Centrality Factors Measuring the Impact of a Node; and Sentiment Analysis on Social Media Data. The rest of the paper is organized as follows: first, the significance of social media is discussed; after that, social networks are discussed on the basis of clusters and communities; later, the centrality factors for nodes present in social networks are covered, including a brief look at privacy and spam-related concerns on social networks; finally, a review of techniques used for sentiment analysis on social media data is given, ending with the conclusions of the entire work.

2 Significance of Social Media
It has been noted that since 2004 [3] there has been a growing emphasis on sentiment analysis, and several indicators demonstrate the significance of social media. Even some major business schools have incorporated social media into their curricula. An excellent example is Singapore, which understands the value of technology, as seen by the increasing number of universities there that are incorporating social media marketing studies into their Executive Master's programs. The purpose of this transition is to help students overcome the hurdles of social media and maximize the benefits of this new marketing and communication medium [4]. Reviewing only sentiment-related work falls short of the criterion for a thorough understanding of the methodologies used to assess sentiment semantics; it was therefore necessary to begin with a full overview of social media and its components before moving on to the main topic of sentiment analysis. Figure 1 shows various fields in which social media exhibits its significance.

3 Social Networks: Cluster and Community
The use of social networking platforms for communication with known or unknown individuals is referred to as social networking. It could be for social objectives, such as bonding with friends or relatives, contemporaries or coworkers, or



Fig. 1 Significance of social media data [1–6, 23, 40]: text analytics, personal networks, interest-based networks, social publishing, e-commerce, sentiment analysis, online reviews, bookmarking sites, trends identification, market research

for commercial purposes, such as bonding with users. A complete history of the development of Social Network Analysis (SNA) is given in [5]. Illustrations based on emails in [6] indicated that graph-based methods are more effective at identifying the knowledge of individuals within an organization than content-driven methods, and [7] presented relational dependency networks, highlighting the notion that models capable of detecting connections yield better data classification. While clustering is among the most widely used strategies for identifying and measuring community structures in social networks, the two concepts are not interchangeable: a variety of features are taken into account when working with clusters, whereas a community tends to be characterized by a single type of attribute once it is found. There is also a difference in how links are treated: cluster discovery is straightforward on densely connected data, whereas community discovery assumes that the network has comparatively few connections.

3.1 Clustering
Clustering is a technique for finding natural groupings within a set of elements [8]. The essential character of many classic clustering approaches, as well as of the relatively new spectral clustering methods, helps to solve the challenge of discovering similarities in the behavior of a group of data [9]. Specific clustering algorithms, such as hierarchical clustering [10], have been used extensively on social media for folksonomies, which aid in recognizing a client's interests as well as the related analysis of the service [11]. According to research, social graphs reveal clusters of links that are important to individuals [12], where criteria such as contact regularity, closeness, and recent activity help to recognize significant online associations.



By grouping the data using various random regions that integrate interconnected information, including social characteristics, noisy link detection can also be minimized [13]. Clustering has further been used to discover the proximity of visits to linked clusters, both socially and geographically [14]; this approach outperforms classic spatial clustering [15] in quickly grouping a large number of areas. With the recent explosion of social information, the challenge of discovering recurring elements in a huge database is handled by the MapReduce method [16], which uses the k-means clustering technique [17] together with the Apriori [18] and Eclat [19] algorithms [20] to mine frequent item sets.
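As a small, hedged illustration of the clustering step (not tied to any specific study cited above), the sketch below groups users described by simple activity features with k-means from scikit-learn; the feature values and the choice of three clusters are arbitrary assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Each row is a user: [posts per week, comments per week, shares per week]
features = np.array([
    [20, 35, 10],
    [22, 30, 12],
    [2, 1, 0],
    [3, 2, 1],
    [50, 5, 40],
    [48, 4, 42],
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(features)

for user, label in enumerate(labels):
    print(f"user {user} -> cluster {label}")
print("cluster centres:\n", kmeans.cluster_centers_)
```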

3.2 Community Detection
To deal with the ever-increasing number of online documents being indexed while maintaining accuracy, it was proposed that recognizing cohesive groups and tying them to appropriate links would address the issue [21]. Many systems have been suggested for recognizing community models; some differ from classical ways of finding communities yet ultimately give the same qualitative outputs while involving an excess of vertices [22], while others employ ephemeral techniques, modularity perception, and cluster scrutiny [23]. The purpose of community detection is to find people in social networks who are tightly linked together. Lennard-Jones clusters were examined in order to explore potential energy landscapes [24]. Using the centrality metric of an edge in a network [25], methods were created to locate the most significant edges by removing the less significant edges one by one until isolated groups were formed. Raghavan et al. [26] investigated a method that examines only the network's structure and constructed a simple label propagation algorithm in which each node is initially given a distinct label and, at each step, adopts the label that most of its neighbors currently have. Other important studies on community detection include strategies that used heuristics to optimize modularity [27] and a genetics-based strategy for identifying communities. In [28], after viewing preliminary data visualizations, visual data mining approaches were utilized to detect intersecting communities, allowing effective constraint selection. The modularity function was also analyzed, resulting in a reformulation of the problem as a spectral problem [29], which is a prominent example of discovering communities using the eigenvectors of matrices.
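To make the label propagation idea of Raghavan et al. [26] concrete, here is a hedged sketch that uses the implementation shipped with the networkx library on a small made-up friendship graph.

```python
import networkx as nx
from networkx.algorithms.community import label_propagation_communities

# A toy social graph: two loosely connected friend groups.
G = nx.Graph()
G.add_edges_from([
    ("ann", "bob"), ("bob", "cara"), ("cara", "ann"),   # group 1
    ("dan", "eve"), ("eve", "fay"), ("fay", "dan"),     # group 2
    ("cara", "dan"),                                    # single bridge edge
])

# Each node starts with its own label and repeatedly adopts the label
# most common among its neighbours until the labels stabilise.
for i, community in enumerate(label_propagation_communities(G)):
    print(f"community {i}: {sorted(community)}")
```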



4 Social Networks: Centrality Factors Measuring the Impact of a Node
Centrality is another significant aspect of social networks: a node that regulates the flow of information through the network has high centrality. In [30], methods for measuring centrality were examined and two variations were proposed, one to determine the centrality measure from traffic flow and the other from network spread. Perer and Shneiderman [31] discuss the discovery of centrality and its diminishing influence in the presence of errors. In [32], it is shown how connected, like-minded clusters in social networks can help spread terrorism and therefore need to be watched. It is difficult to survey social networks in an organized manner because they contain so many links and nodes; a system called SocialAction was proposed in [30] to strike a balance between systematic and flexible social network exploration. It effectively considers numerous geometrical and visual network analysis factors, and its parameter-ranking approach provides summary information, filters nodes, locates outliers, aggregates nodes to reduce complexity, finds reliable subgroups, and additionally takes note of communities of interest. In [31], social networks have been used to predict psychological fitness, with different network types helping to forecast distress indications in outliers. Backstrom et al. [33] developed ways to forecast how groups within networks grow, in terms of participants or substance, looking at specific networks to see who was joining a particular overlapping community. Kiss and Bichler [34] used network data in real time to assess several centrality criteria in order to promote messages over a network; a distinctive centrality metric, SenderRank, is described that can beat many prevalent measures, although the type of communication and its contents, as well as the type of network, affect the prediction of favorable customers [35]. Since social networks appear to shape an individual's overall habits, they should also reflect a person's mental state and geographical inclinations, which will eventually alter that person's traveling tendencies; however, the difficulties encountered in gathering data for such studies revealed that various elements, such as financial stability, psychology, and future goals, influence prediction [36]. With the emergence of smartphones in our everyday lives, the authors of [35] showed how to collect information from smartphones. The technique produced a realistic picture of the connections between people, as well as the ability to track the evolution of these relationships over time. However, the most crucial criterion to consider when using data of this kind is privacy and security: breaching security in this situation can be a source of serious concern, so people must remain careful about their links, which may reveal personal and important data [37]. MobiClique, a tool proposed in [38], connects to various devices in the vicinity; this middleware allows messages to be conveniently spread across multiple networks. To improve social network analysis, a new concept



called ModuLand was introduced in [39], in which linked sections of a community above a certain centrality threshold are referred to as modules. The ModuLand procedure is divided into the following stages: first, the modules that govern a link are created; then a landscape for the community is built; the hills of the landscape are structured and the upper level is determined; and lastly, a chain of higher-level networks is established.
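As a hedged illustration of the centrality measures discussed in this section (again with networkx and a toy graph, not data from any cited study), the sketch below computes degree, closeness, and betweenness centrality and lists the most central node for each measure.

```python
import networkx as nx

# Toy network: nodes "c" and "d" bridge the two triangles, so they should
# score highest on betweenness centrality.
G = nx.Graph()
G.add_edges_from([
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("c", "d"),
    ("d", "e"), ("e", "f"), ("d", "f"),
])

measures = {
    "degree": nx.degree_centrality(G),
    "closeness": nx.closeness_centrality(G),
    "betweenness": nx.betweenness_centrality(G),
}

for name, scores in measures.items():
    top = max(scores, key=scores.get)
    print(f"{name:12s} top node: {top} ({scores[top]:.3f})")
```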

5 Sentiment Analysis on Social Media Data
As social media is primarily concerned with user comments, it is critical to evaluate them effectively to better comprehend the feelings and views of the general public. The ease with which people use social networking platforms to express their opinions on every event necessitates the exploration of emotions and of the best ways to analyze them. In recent years, there has been a greater emphasis on incorporating public sentiment gathered through online channels: individual facts discovered via computation aid in the formation of informed views for use by decision makers. The concept of sentiment analysis is becoming more popular due to significant technological improvement and unrestricted access to social media. Analysis of sentiments requires the use of NLP and its various tasks, such as feature identification, microtext analysis, irony detection, and anaphora detection.

5.1 Data Acquisition
To effectively anticipate the viewpoints of users publishing content on public platforms, considerable attention is required when collecting data from them. Popular social networking sites such as Twitter, Facebook, YouTube, etc., allow users to subscribe, like, post, comment, and share content with other people in their network. Sites like Digg, Flickr, etc., on the other hand, are stepping up their efforts to provide various tools for improving social connections. Data acquired via social media may be noisy or clean, similar or diverse, covering a wide range of topics, and so on. Data is obtained from social media using the following methods:
(1) Gathering of new data.
(2) Repeated use of previously accessible information.
(3) Repeated use of data that does not belong to a particular individual.
(4) Obtaining information from Internet sources such as social media communications.
Technically, there are three primary methods for gathering data that are widely used, as shown in Table 1.



Table 1 Primary methods for data acquisition [4]
Traffic analysis on networks: a technique of gathering network packets to track surfing data within the network. This strategy is rarely used, especially in private organizations, due to security issues.
Ad-hoc applications: a set of application programming interfaces (APIs) that provide data about an account owner on a certain site and can track the user's activities.
Crawling: the most common approach for obtaining information from social networking platforms. User data is obtained by submitting a query to procure a certain kind of information. Crawling also aids in obtaining particulars via the APIs provided by certain social networking websites.
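As a hedged sketch of the crawling approach, the following Python code pages through a JSON API with the requests library and stores the returned posts; the endpoint, parameters, pagination field, and token below are entirely hypothetical placeholders, not the API of any real platform.

```python
import requests

API_URL = "https://api.example-social.com/v1/posts"   # hypothetical endpoint
TOKEN = "YOUR_ACCESS_TOKEN"                           # placeholder credential

def crawl_posts(query, max_pages=3):
    """Submit a query and follow pagination links to collect matching posts."""
    posts, url, params = [], API_URL, {"q": query, "limit": 100}
    headers = {"Authorization": f"Bearer {TOKEN}"}
    for _ in range(max_pages):
        resp = requests.get(url, params=params, headers=headers, timeout=10)
        resp.raise_for_status()
        payload = resp.json()
        posts.extend(payload.get("data", []))
        url = payload.get("next_page")    # assumed pagination field
        if not url:
            break
        params = {}                       # next_page already encodes the query
    return posts

if __name__ == "__main__":
    collected = crawl_posts("product review")
    print(f"collected {len(collected)} posts")
```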

5.2 Sentiment Analysis Techniques
The two basic sentiment-extraction strategies are classification-based and lexicon-based. First, emotion-related expressions are used to assess valence shifters, which are responsible for conveying the attitude of the text, and then the sentiment is determined. While determining sentiments, two assumptions are made: first, that sentiments are independent of context, and second, that sentiments may be expressed numerically. In [40], researchers explain how to develop classifiers for emotions expressed on social media using a hybrid unsupervised technique that combines dictionary-based methods, natural language processing, and ontology methods. Research on sentiment analysis, showcasing various techniques with their characteristic features, outcomes, and limitations, is summarized in Table 2.
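As a concrete lexicon-based example, the sketch below scores a few comments with VADER [42] via the vaderSentiment Python package; the example sentences and the conventional compound-score thresholds are illustrative.

```python
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

comments = [
    "I absolutely love this phone, the camera is amazing!",
    "The delivery was late and the packaging was damaged.",
    "It works.",
]

for text in comments:
    scores = analyzer.polarity_scores(text)  # neg, neu, pos and compound in [-1, 1]
    label = ("positive" if scores["compound"] >= 0.05
             else "negative" if scores["compound"] <= -0.05
             else "neutral")
    print(f"{label:8s} {scores['compound']:+.2f}  {text}")
```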

5.3 Research Observed Over Different Dimensions of Sentiment Analysis Over Social Media Data
The components of social media, such as posts combining text with photographs, were highlighted in [47], allowing for both single-view and multi-view approaches to assessing social media sentiments. Khan et al. [48] developed a technique called SentiMI which separates subjective SentiWordNet entries from objective ones, extracts part-of-speech information, and analyzes the combined data for both negative and positive notions. Villarroel Ordenes et al. [49] focused on a distinctive method in which sentiment detection is based on verbal communication processes; the application of this notion rests on certain disparities and asymmetric qualities of explicit and implicit statements, as well as the direct effect that discourse patterns have on sentiment strength.



Table 2 Techniques for sentiment analysis over social media data
[41] Techniques: aspect-based opinion summary; supervised and unsupervised classification for document-level classification. Parameters: pointwise mutual information (PMI), statistical dependence. Outcomes: presented an abstract model of sentiment analysis that formulated the problem and provided a common framework to unify different research directions. Limitations: worked only on positive and negative emotions while ignoring other sub-categories of sentiments.
[42] Techniques: VADER (Valence Aware Dictionary for sEntiment Reasoning); rule-based sentiment classification. Parameters: validated sentiment lexicon. Outcomes: VADER outperforms individual human raters (F1 classification accuracy = 0.96 and 0.84, respectively). Limitations: document-level and aspect-level categorization are not included.
[43] Techniques: deep convolutional neural networks. Parameters: annotation accuracy and retrieval performance. Outcomes: 370% performance gain on top-1 accuracy, 200% on top-5, and 150% on top-10. Limitations: the problem of incomplete and incorrect labels could be handled more accurately.
[44] Techniques: Sentribute; the liblinear toolbox is used to implement the SVM algorithm; eigenface-based emotion detection. Parameters: image sentiment classification performance; low-level feature-based and textual content-based baselines. Outcomes: correlation coefficients of textual and visual sentiment for low-level and mid-level features generated promising results. Limitations: worked only on positive and negative emotions while ignoring other sub-categories of sentiments.
[45] Techniques: supervised machine learning, tenfold cross-validation, intent indexing, eigenfaces. Parameters: tf-idf, intent tags, and shot patterns are used to predict look-wise emotions. Outcomes: both textual data and pictures are useful in recognizing a user's emotion on social networking platforms. Limitations: classification of sentiments on the basis of intent tags and shot patterns could have been more specific.
[46] Techniques: cross-media public sentiment framework for microblogs. Parameters: linear interpolation for fusion. Outcomes: aids in acquiring both a fine-grained and a broad view of the intricacies of the retrieved emotions. Limitations: although the overall outcome shows unique properties, certain parameters do not exhibit promising results.



In recent years, academics have been focusing on a variety of aspects in order to achieve precise sentiment identification. Along with the individual strategies that are regularly investigated, hybrid approaches are also used, and these have proven to give better results for accurate sentiment identification. El-Gazzar et al. [50] propose a visual sentiment analysis approach that incorporates low- and intermediate-level image attributes: supervised learning approaches such as k-nearest neighbors and support vector machines are used to extract emotions from photos, and hue-saturation-intensity and singular value decomposition methods are used to create features. Work has also been done on gathering sentiment from social networking platforms to raise health consciousness rather than to promote medications.

6 Conclusion and Future Work
It is the need of the hour to place the unstoppable, ever-increasing amount of information on public social platforms at the top of the priority list. Because social networks can represent practical and realistically difficult issues emerging from various domains, their challenges must be addressed. This study provides a thorough exploration of social networks and the concepts linked to them, describing the work that has been done in the areas of social networks, communities, and clusters. The paper mainly attempts to highlight the gaps in a wide range of studies in order to make it easier for researchers to apply sentiment analysis methods after collecting data and information from social media. Novel and innovative ideas in sentiment analysis have also been included. In future work, more of the now-ubiquitous deep learning algorithms are expected to be used, as they require the least human involvement.

References
1. Hoffman M, Steinley D, Gates KM, Prinstein MJ, Brusco MJ (2018) Detecting clusters/communities in social networks. Multivariate Behav Res 53(1):57–73
2. Leskovec J (2011) Social media analytics: tracking, modeling and predicting the flow of information through networks. In: Proceedings of the 20th international conference companion on World Wide Web, pp 277–278
3. Hogenboom A, Heerschop B, Frasincar F, Kaymak U, de Jong F (2014) Multi-lingual support for lexicon-based sentiment analysis guided by semantics. Decision Support Syst 62:43–53
4. Canali C, Colajanni M, Lancellotti R (2011) Data acquisition in social networks: issues and proposals. In: Proceedings of the International Workshop on Services and Open Sources (SOS'11). Citeseer
5. Wellman B, Toolkit essays. Cont Socio 37:3
6. Campbell CS, Maglio PP, Cozzi A, Dom B (2003) Expertise identification using email communications. In: Proceedings of the twelfth international conference on Information and knowledge management, pp 528–531
7. Neville J, Jensen D (2003) Collective classification with relational dependency networks. In: Proceedings of the Second International Workshop on Multi-Relational Data Mining, 1



8. Van Dongen SM (2000) Graph clustering by flow simulation. Ph.D. dissertation
9. Von Luxburg U (2007) A tutorial on spectral clustering. Statistics Comp 17(4):395–416
10. Johnson SC (1967) Hierarchical clustering schemes. Psychometrika 32(3):241–254
11. Shepitsen A, Gemmell J, Mobasher B, Burke R (2008) Personalized recommendation in social tagging systems using hierarchical clustering, pp 259–266
12. Roth M, Ben-David A, Deutscher D, Flysher G, Horn I, Leichtberg A, Leiser N, Matias Y, Merom R (2010) Suggesting friends using the implicit social graph. In: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 233–242
13. Qi G-J, Aggarwal CC, Huang TS (2012) On clustering heterogeneous social media objects with outlier links. In: Proceedings of the fifth ACM international conference on Web search and data mining, pp 553–562
14. Shi J, Mamoulis N, Wu D, Cheung DW (2014) Density-based place clustering in geo-social networks. In: Proceedings of the 2014 ACM SIGMOD international conference on Management of data, pp 99–110
15. Scellato S, Noulas A, Lambiotte R, Mascolo C (2011) Socio-spatial properties of online location-based social networks. In: Fifth international AAAI conference on weblogs and social media
16. Tang J, Sun J, Wang C, Yang Z (2009) Social influence analysis in large-scale networks. In: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 807–816
17. Hartigan JA, Wong MA (1979) Algorithm AS 136: a k-means clustering algorithm. J Royal Statis Soc Series C (Appl Stat) 28(1):100–108
18. Schwarz G (1978) Estimating the dimension of a model. The Annals Stat, 461–464
19. Borgelt C (2003) Efficient implementations of Apriori and Eclat. In: FIMI'03: Proceedings of the IEEE ICDM workshop on frequent itemset mining implementations. Citeseer, p 90
20. Gole S, Tidke B (2015) Frequent itemset mining for big data in social media using ClustBigFIM algorithm. In: 2015 International Conference on Pervasive Computing (ICPC). IEEE, pp 1–6
21. Flake GW, Lawrence S, Giles CL (2000) Efficient identification of web communities. In: Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, pp 150–160
22. Newman ME (2004) Fast algorithm for detecting community structure in networks. Phys Rev E 69(6):066133
23. Donetti L, Munoz MA (2004) Detecting network communities: a new systematic and efficient algorithm. J Statis Mech: Theory Exper 10:P10012
24. Massen CP, Doye JP (2005) Identifying communities within energy landscapes. Phys Rev E 71(4):046101
25. Fortunato S, Latora V, Marchiori M (2004) Method to find community structures based on information centrality. Phys Rev E 70(5):056104
26. Raghavan UN, Albert R, Kumara S (2007) Near linear time algorithm to detect community structures in large-scale networks. Phys Rev E 76(3):036106
27. Blondel VD, Guillaume J-L, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks. J Stat Mech: Theory Exp 10:P10008
28. Chen J, Zaïane O, Goebel R (2009) A visual data mining approach to find overlapping communities in networks. In: 2009 International Conference on Advances in Social Network Analysis and Mining. IEEE, pp 338–343
29. Newman ME (2006) Finding community structure in networks using the eigenvectors of matrices. Phys Rev E 74(3):036104
30. Perer A, Shneiderman B (2006) Balancing systematic and flexible exploration of social networks. IEEE Trans Visual Comp Graph 12(5):693–700
31. Fiori KL, Antonucci TC, Cortina KS (2006) Social network typologies and mental health among older adults. J Geront Series B: Psychol Sci Social Sci 61(1):P25–P32
32. Borgatti SP (2005) Centrality and network flow. Social Net 27(1):55–71



33. Backstrom L, Huttenlocher D, Kleinberg J, Lan X (2006) Group formation in large social networks: membership, growth, and evolution. In: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 44–54
34. Kiss C, Bichler M (2008) Identification of influencers—measuring influence in customer networks. Decis Support Syst 46(1):233–253
35. Eagle N, Pentland AS, Lazer D (2009) Inferring friendship network structure by using mobile phone data. In: Proceedings of the national academy of sciences 106(36), pp 15274–15278
36. Okamoto K, Chen W, Li X-Y (2008) Ranking of closeness centrality for large-scale social networks. In: International workshop on frontiers in algorithmics. Springer, pp 186–195
37. Zheleva E, Getoor L (2009) To join or not to join: the illusion of privacy in social networks with mixed public and private user profiles. In: Proceedings of the 18th international conference on World Wide Web, pp 531–540
38. Pietilainen A-K, Oliver E, LeBrun J, Varghese G, Diot C (2009) Mobiclique: middleware for mobile social networking. In: Proceedings of the 2nd ACM workshop on Online social networks, pp 49–54
39. Kovács IA, Palotai R, Szalay MS, Csermely P (2010) Community landscapes: an integrative approach to determine overlapping network module hierarchy, identify key nodes and predict network dynamics. PloS One 5(9):e12528
40. Baracho R, Bax M, Ferreira L, Silva G (2012) Sentiment analysis in social network
41. Liu B, Zhang L (2012) A survey of opinion mining and sentiment analysis. In: Mining Text Data. Springer, pp 415–463
42. Hutto C, Gilbert E (2014) Vader: a parsimonious rule-based model for sentiment analysis of social media text. In: Proceedings of the International AAAI Conference on Web and Social Media, vol. 8, no. 1
43. Chen T, Borth D, Darrell T, Chang S-F (2014) Deep sentibank: visual sentiment concept classification with deep convolutional neural networks. arXiv preprint arXiv:1410.8586
44. Yuan J, You Q, Luo J (2015) Sentiment analysis using social multimedia. In: Multimedia data mining and analytics. Springer, pp 31–59
45. Hanjalic A, Kofler C, Larson M (2012) Intent and its discontents: the user at the wheel of the online video search engine. In: Proceedings of the 20th ACM international conference on Multimedia, pp 1239–1248
46. Cao D, Ji R, Lin D, Li S (2016) A cross-media public sentiment analysis system for microblog. Multimedia Syst 22(4):479–486
47. Niu T, Zhu S, Pang L, El Saddik A (2016) Sentiment analysis on multiview social data. In: International Conference on Multimedia Modeling. Springer, pp 15–27
48. Khan FH, Qamar U, Bashir S (2016) Sentimi: introducing point-wise mutual information with sentiwordnet to improve sentiment polarity detection. Appl Soft Comp 39:140–153
49. Villarroel Ordenes F, Ludwig S, De Ruyter K, Grewal D, Wetzels M (2017) Unveiling what is written in the stars: analyzing explicit, implicit, and discourse patterns of sentiment in social media. J Cons Res 43(6):875–894
50. El-Gazzar AM, Mohamed TM, Sadek RA (2017) A hybrid svd-hsv visual sentiment analysis system. In: 2017 Eighth International Conference on Intelligent Computing and Information Systems (ICICIS). IEEE, pp 360–365

Smart Cradle System
P. Harika, T. Chihnitha, V. Chaitanya, and M. Vani Pujitha

Abstract In this technological world, many parents find it hard to take care of their children because they are busy with their jobs and other work. Many people think that taking care of the baby is the duty of the mother alone, but it is a heavy burden for mothers to do all the household work and their job and also take care of the baby. For people who believe in technology, this paper presents a smart cradle that is connected to the parents' mobile phones. The system helps parents take care of their babies even from a long distance. It is built around four parameters: wetness, motion, temperature and humidity, and live streaming of the baby. A DHT11 sensor is used to detect any temperature increase in the room, baby movement is detected by an IR sensor, and live streaming is provided by an ESP32 camera. A GSM module is used to send alert messages and calls to the parents if any abnormal condition of the baby is detected, and automatic swinging is also added. The proposed system decreases the burden on the parents, helps to keep the baby safe without any discomfort, and gives relief to the parents. Keywords Wetness sensor · IR sensor · Temperature and humidity sensor · Smart cradle · GSM module · ESP32 cam

P. Harika (B) · T. Chihnitha · V. Chaitanya · M. V. Pujitha
CSE Department, V. R. Siddhartha Engineering College, Vijayawada 520007, Andhra Pradesh, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 565, https://doi.org/10.1007/978-981-19-7455-7_60

1 Introduction
Generally, a smart cradle is used to make the baby sleep safely. Most cradles these days are costly but offer few features. Some electronic cradles are used in cities, but they are not always safe because signals from the electronic devices may harm babies. Babies feel discomfort and cry even in the middle of the night, which makes both the baby and the mother lose sleep. It is a heavy burden for parents to balance both personal and professional life, and sometimes this affects their professional work. This may lead to stress for the baby and even




headaches and other health issues if babies are left in the cradle unattended. Another important consideration is that nowadays many babies die due to improper care and recklessness; sudden infant death is the saddest thing to hear. This cradle is therefore designed to give real-time information about the baby, so that parents can check their baby's condition even while they are engaged in other work. The Internet of Things (IoT) is one of the most frequently used terms these days; IoT deals with the interconnection of devices through the Internet and makes life easier through remotely operated devices. Many devices and sensors help us build new IoT products, and some of them, such as a wetness (rain) sensor, an IR sensor, a servo motor, and an ESP32 camera, are used in this smart cradle system. The scope of this system is to keep the baby safe, help the parents, and let the baby sleep freely. The devices used in this system are the IR sensor, the DHT11 sensor, the ESP32 camera, the servo motor, the NodeMCU, and the GSM module. The system provides continuous monitoring of the baby and sends alert messages and calls whenever any uncertain condition is detected. The NodeMCU is the main IoT component of this project and is mainly used to connect the device to Wi-Fi. The DHT11 is a low-cost sensor responsible for detecting room temperature and humidity. The infrared sensor is responsible for detecting any movement by the baby, and the rain sensor is responsible for detecting wetness of the baby's bed, while the servo motor is responsible for swinging the cradle.
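A rough sketch of the sensing loop is given below in MicroPython, which runs on NodeMCU/ESP32-class boards; the pin numbers, thresholds, wetness-sensor polarity, and the alert stub are assumptions for illustration, and the GSM and ESP32-CAM handling is not shown.

```python
# MicroPython sketch (assumed pins and thresholds) for the cradle's sensing loop.
import time
import dht
from machine import Pin

temp_sensor = dht.DHT11(Pin(4))   # DHT11 data pin (assumed GPIO4)
ir_sensor = Pin(5, Pin.IN)        # IR motion sensor output (assumed GPIO5)
wet_sensor = Pin(12, Pin.IN)      # wetness sensor digital output (assumed GPIO12)
TEMP_LIMIT_C = 38                 # assumed alert threshold

def send_alert(message):
    # Placeholder: in the real system this would go through the GSM module.
    print("ALERT:", message)

while True:
    temp_sensor.measure()
    temperature = temp_sensor.temperature()
    humidity = temp_sensor.humidity()

    if temperature > TEMP_LIMIT_C:
        send_alert("room temperature high: %d C, humidity %d%%" % (temperature, humidity))
    if ir_sensor.value() == 1:
        send_alert("baby movement detected")
    if wet_sensor.value() == 0:   # many wetness modules pull the line low when wet (assumed)
        send_alert("bed wetness detected")

    time.sleep(5)
```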

2 Related Work
Jabbar et al. [1] designed a cradle that can be operated remotely. The cradle swings either automatically or through a remote, and a mobile app gives continuous information about the baby. The system uses temperature and humidity and sound sensors together with a Wi-Fi camera that provides live streaming and sends real-time alerts to the parents when motion or sound is detected.
Jeon and Kang [2] developed a sleep-care kit built around a wearable device. Sleep apnea, a condition in which breathing repeatedly stops and starts during sleep, is common, can contribute to chronic and degenerative brain disease, and usually goes unrecognized at home. Their kit, worn as a band, communicates wirelessly with other devices, characterizes basic sleep quality in real time, detects sleep apnea episodes, and reports the apnea duration, abnormal conditions, and breathless situations to smart devices such as phones and laptops.
Tupkar et al. [3] built a device using sensors such as a moisture sensor, an ultrasonic sensor, and a PIR sensor. The cradle is designed to monitor the wetness and hygiene of
the baby and sends an alert over Wi-Fi. A PIR sensor detects the baby's movement or other uncertain conditions and alerts the parents, an ultrasonic sensor detects the sound of the baby crying and triggers automatic swinging of the cradle until the baby stops crying, and a DHT12 sensor measures temperature and humidity and immediately reports any sudden increase so that the parents can monitor the baby in such situations.
Srivastava et al. [4] designed a cradle for remote child monitoring. Since most mothers today are employed, such a cradle lets them watch their babies from anywhere instead of, for example, sending children to their grandparents' houses to be looked after. The smart cradle starts swinging by itself without any remote control, turns on a buzzer, and sends an alert message to the parents; it can also recognize a baby's abnormal conditions from the sensor readings.
Gare et al. [5] designed a remotely operable cradle that can swing either automatically or through a remote, with an app that gives continuous information about the baby. The system uses temperature and humidity and sound sensors, and a Wi-Fi camera sends real-time alerts to the parents through the GSM module when motion or sound is detected.
Gare et al. [6] introduced an IoT-based smart cradle system built from several sensors to help mothers take care of the baby. The system does not include a camera and notifies users only through messages, without call alerts, so a busy mother who does not check her phone may miss a notification. The kit is simple, easy to operate, and helps in keeping the baby in safe conditions.
Alswedani and Eassa [7] developed a smart baby cradle. Baby care has become both important and challenging because many mothers are working mothers who cannot repeatedly interrupt their professional lives to check on their babies. To address this, they proposed an IoT-based smart baby cradle whose sensors report the baby's condition and raise alerts through text messages.
Phalke et al. [8] built a system using IoT sensors and a Raspberry Pi camera. A mobile application presents the system's details and gives complete information about the baby's health, including fever conditions, while the Raspberry Pi camera provides live streaming and notifications are sent whenever an uncertain condition occurs. Automatic swinging can also be controlled remotely through the application, making the system convenient for parents caring for a baby from a distance.


Dangi et al. [9] proposed a system in which continuous care is provided through the Blynk application. It includes IoT devices such as a Node MCU, a wetness sensor, a Wi-Fi camera, and a sound sensor, with a PIR sensor for motion detection, a moisture sensor for wetness detection, and a DHT sensor for temperature and humidity. The system offers round-the-clock monitoring and helps parents take care of the baby, but it is complex to implement and hard to use, the electronic signals may harm the baby, and the side-to-side swinging of the cradle may even lead to brain damage for the baby.

3 Proposed Method
The proposed smart cradle system works as follows. When the system is switched on, the ESP32 camera starts live streaming and continues until the system is switched off. Based on the other four parameters, namely temperature, wetness, movement, and baby-cry detection, the system either sends alert messages or calls when any sensor produces an output, or simply keeps checking when no sensor output is present. Occasionally the system may fail to respond because of connection errors; in such cases it should be restarted until it responds correctly (Fig. 1).

4 Algorithms

Algorithm 4.1: Automatic Cradle Swing
Because the cradle is connected to the Internet, an alert message or call is sent to the parents whenever movement is detected, indicating that the baby is crying.
Step 1: Begin
Step 2: Check whether the baby is crying in the cradle
Step 3: If movement is detected, send a message alert, call the parent, and start swinging the cradle automatically
Step 4: If not, continue checking

Algorithm 4.2: Wetness Detection
A wetness sensor detects whether the bed is wet. If wetness is detected, the system sends a message to the parents; otherwise it continues checking.
Step 1: Begin
Step 2: Check whether the bed in the cradle is wet


Fig. 1 Proposed system

Step 3: If moisture is present, send a message alert to the parent
Step 4: If no moisture is found, continue checking

Algorithm 4.3: Temperature and Humidity
The DHT11 sensor detects the temperature. If the temperature rises, a notification is sent to the parents and a call is placed as well.
Step 1: Begin
Step 2: Check whether the surrounding temperature has increased
Step 3: If a temperature change is detected, send an SMS alert and call the parent
Step 4: If not, continue checking

Algorithm 4.4: IR Sensor
An IR sensor detects the motion of the baby. If any movement is detected, a message alert and a call are sent to the parents.


Step 1: Begin
Step 2: Check whether any movement of the baby in the cradle is detected
Step 3: If movement is observed, send a message alert and call the parent
Step 4: If there is no movement, continue checking

Algorithm 4.5: Live Streaming
Live streaming is provided to the parents through the Blynk application using the ESP32 camera.
Step 1: Start
Step 2: Provide live streaming of the baby
Step 3: End
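The four monitoring algorithms reduce to simple threshold checks on the sensor readings. The fragment below is a minimal C++ (Arduino-style) translation of Algorithms 4.1-4.4, written for illustration only: the threshold values are taken from the Results section, and swingCradle(), sendSms(), and placeCall() are assumed helper routines (sketched in the following sections), not functions given in the paper.

// Illustrative decision logic for Algorithms 4.1-4.4 (not the authors' firmware).
// Thresholds follow Sect. 6; the helper functions are sketched in later sections.
const float TEMP_LIMIT_C  = 35.0;   // alert above 35 degrees Celsius
const int   WETNESS_LIMIT = 50;     // alert when the wetness reading exceeds 50

void swingCradle();                  // servo action (Algorithm 4.1)
void sendSms(const char *msg);       // GSM helpers (see the Results section)
void placeCall();

void checkAndAlert(bool motion, int wetness, float tempC) {
  if (motion) {                      // Algorithms 4.1 and 4.4: movement / crying
    sendSms("Movement detected - the baby may be crying");
    placeCall();
    swingCradle();                   // start the automatic swing
  }
  if (wetness > WETNESS_LIMIT) {     // Algorithm 4.2: wetness on the bed
    sendSms("Wetness detected on the cradle bed");
  }
  if (tempC > TEMP_LIMIT_C) {        // Algorithm 4.3: temperature rise
    sendSms("Room temperature above the safe limit");
    placeCall();
  }
  // Otherwise do nothing and keep checking (Step 4 of every algorithm).
}

In a real deployment the IR reading would normally be debounced so that a single twitch does not trigger repeated calls; the paper does not discuss this detail.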

5 Block Diagram
In this system, the Node MCU ESP8266 is the main component to which all other components are connected. A wetness (rain) sensor detects wetness, an IR sensor detects the infant's movement, a DHT11 sensor measures the temperature, and an ESP32 camera provides live streaming of the baby through the Blynk application (Fig. 2).
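To make the block diagram concrete, the following sketch shows one plausible way to wire and poll the sensors from the Node MCU. The pin assignments are assumptions chosen for illustration rather than the authors' actual wiring, and checkAndAlert() is the decision routine sketched in Sect. 4, so this fragment is not a complete program on its own.

// Illustrative Node MCU (ESP8266) wiring and main loop; pin choices are assumed.
#include <DHT.h>                    // Adafruit DHT sensor library
#include <Servo.h>

#define DHTPIN   D4                 // DHT11 data pin (assumed)
#define IRPIN    D5                 // IR sensor digital output (assumed)
#define SERVOPIN D6                 // cradle swing servo (assumed)
#define WETPIN   A0                 // rain/wetness sensor analog output

DHT   dht(DHTPIN, DHT11);
Servo cradleServo;

void checkAndAlert(bool motion, int wetness, float tempC);  // from the Sect. 4 sketch

void swingCradle() {                // one gentle back-and-forth swing
  for (int a = 60; a <= 120; a += 5) { cradleServo.write(a); delay(40); }
  for (int a = 120; a >= 60; a -= 5) { cradleServo.write(a); delay(40); }
}

void setup() {
  Serial.begin(115200);
  pinMode(IRPIN, INPUT);
  cradleServo.attach(SERVOPIN);
  dht.begin();
}

void loop() {
  bool  motion  = (digitalRead(IRPIN) == HIGH);   // polarity depends on the module
  int   wetness = analogRead(WETPIN);             // 0-1023 on the ESP8266 ADC
  float tempC   = dht.readTemperature();

  if (!isnan(tempC)) {
    checkAndAlert(motion, wetness, tempC);
  }
  delay(2000);                                    // DHT11 needs roughly 2 s between reads
}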

6 Results
When powered on, the system first displays "Smart Cradle System". Occasionally it may be slow or fail to respond correctly; in such cases it should be restarted until it works properly. Once running, the system continuously shows the temperature and wetness values on the liquid crystal display (LCD), which is the main output component. It stays idle, showing only these LCD readings, until an uncertain condition occurs, that is, a sudden change in temperature, a wetness reading above 50, or any movement of the baby in the cradle. In such cases alert messages are sent to the parents, and when necessary a call alert is placed as well. Figure 3 shows the complete smart cradle system, and Fig. 4 shows its initial display.
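The paper names the LCD as the main output component but does not say how it is driven. The small fragment below is an illustration only: it assumes the widely used LiquidCrystal_I2C library, a 16x2 display, and the common backpack address 0x27, none of which are stated in the paper.

// Illustrative LCD status display (library, address, and layout are assumptions).
#include <Wire.h>
#include <LiquidCrystal_I2C.h>

LiquidCrystal_I2C lcd(0x27, 16, 2);   // 16x2 display on a common I2C backpack

void lcdInit() {
  lcd.init();
  lcd.backlight();
  lcd.print("Smart Cradle");
}

void lcdShowStatus(float tempC, int wetness) {
  lcd.setCursor(0, 0);
  lcd.print("Temp: ");
  lcd.print(tempC, 1);                // one decimal place
  lcd.print((char)223);               // degree symbol on HD44780 displays
  lcd.print("C  ");
  lcd.setCursor(0, 1);
  lcd.print("Wetness: ");
  lcd.print(wetness);
  lcd.print("   ");                   // clear leftover digits
}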


Fig. 2 Block diagram

Fig. 3 Overall system

Consider the wetness sensor first: it is connected to the Node MCU and the GSM module. When wetness is detected, the system responds by sending an alert message to the parents. A baby feels discomfort on a wet bed, which may cause it to wake, so the alert allows the parents to respond quickly (Fig. 5).


Fig. 4 Initial system

The DHT11 sensor handles temperature and humidity detection and continuously monitors the temperature of the baby's room. Infants need to be kept at a suitable temperature, so if the temperature exceeds 35 degrees Celsius, an alert message and a call are sent to the parents (Fig. 6). In the third case, the IR sensor, which is always active, detects the movement of the baby. If it senses any movement in the cradle, interpreted as the baby crying, the cradle starts swinging automatically and an alert message and call are sent to the parents (Fig. 7). Live streaming of the baby is provided through the Blynk application using the ESP32 camera (Fig. 8).
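The paper does not name the GSM module, so the helper routines below assume a SIM800-class modem on a software serial port and use the standard text-mode AT commands. The phone number is a placeholder and the pin choices are illustrative; none of these values come from the paper.

// Illustrative GSM alert helpers, assuming a SIM800-class module on SoftwareSerial.
#include <SoftwareSerial.h>

SoftwareSerial gsm(D7, D8);                      // RX, TX pins (assumed)
const char PARENT_NUMBER[] = "+910000000000";    // placeholder, not from the paper

void gsmInit() {
  gsm.begin(9600);
  gsm.println("AT");                             // basic handshake
  delay(500);
  gsm.println("AT+CMGF=1");                      // text mode for SMS
  delay(500);
}

void sendSms(const char *msg) {
  gsm.print("AT+CMGS=\"");
  gsm.print(PARENT_NUMBER);
  gsm.println("\"");
  delay(500);
  gsm.print(msg);
  gsm.write(26);                                 // Ctrl+Z terminates the message
  delay(2000);
}

void placeCall() {
  gsm.print("ATD");
  gsm.print(PARENT_NUMBER);
  gsm.println(";");                              // ';' marks a voice call
  delay(20000);                                  // let it ring for about 20 s
  gsm.println("ATH");                            // hang up
}

Because AT-command timing varies between modules, a robust implementation would wait for the modem's "OK" responses instead of relying on fixed delays.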

Fig. 5 Wetness sensor outputs


Fig. 6 DHT11 sensor outputs

Fig. 7 IR sensor outputs

7 Conclusion and Future Work
As technology advances, this smart cradle helps employed women give their babies a safe place to sleep while they are engaged in other work. It continuously monitors the baby and, when an uncertain condition occurs, notifies the parent's phone and places calls if abnormal temperature or movement is detected. The cradle is low cost, handy to use, and provides a comfortable place for the baby to sleep. The infant's health is the key parameter to be monitored at all times, and the cradle lets mothers attend to household work while simultaneously watching over the baby through smart devices such as smartphones and laptops.


Fig. 8 a Camera output (1), b camera output on the Blynk app (2)

The system can be enhanced by developing an Android application with a well-designed interface, through which users can operate the cradle remotely and connect it to the cloud.

References
1. Jabbar WA, Hamid SNIS, Ramli RM, Ali MAH (2019) IoT-BBMS: Internet of Things-based baby monitoring system for smart cradle. Inst Elect Electr Eng (IEEE) 7, 12 July
2. Jeon YJ, Kang SJ (2019) Wearable sleepcare kit: analysis and prevention of sleep apnea symptoms in real-time. Inst Elect Electr Eng (IEEE) 7, 20 March
3. Tupkar AB, Chahare P, Rade S, Wakade R, Bahirseth S (2020) Development of IoT based smart baby cradle. Int Adv Res J Sci Eng Tech (IARJSET) 7, January


4. Srivastava A, Yashaswini BE, Jagnani A, Sindhu K (2019) Smart cradle system for child monitoring using IoT. Int J Inno Tech Exploring Eng (IJITEE) 8, July
5. Gare HS, Shahane BK, Jori KS, Jachak SG (2020) IoT based smart cradle system for baby monitoring. Inter J Creative Res Thoughts (IJCRT) 8, March
6. Gare HS, Shahane BK, Jori KS, Jachak SG (2019) IoT based smart cradle system for baby monitoring. Inter Res J Eng Tech (IRJET) 6, October
7. Alswedani SA, Eassa FE (2020) A smart baby cradle based on IoT. Inter J Comp Sci Mobile Comp (IJCSMC) 9, July
8. Phalke A, Shaikh I, Pawar Y (2020) Baby monitoring smart cradle using Raspberry Pi and IoT sensors. Inter J Res Educ Scient Methods (IJARESM) 8, June
9. Dangi M, Sarna S, Ahuja VK (2020) Design of smart cradle for infant health monitoring system using IoT. Inter J Scient Res Eng Develop (IJSRED) 3, June

Author Index

A Abdullah M. Baqasah, 75 Abhineet Kumar, 273 Abirami Murugappan, 381 Abolfazl Mehbodniya, 27, 121, 369, 425, 479, 525, 597, 645, 707 Achu Pushpan, 139 Ahmed Binmahdfoudh, 559 Aishwarya Jakka, 15 Akash Bhat, 491 Akshay, A., 501 Amarendra Kothalanka, 525 Amrita, I., 549 Amudha Kandasamy, 27 Ani, R., 153 Ansh, Samar, 581 Anto Bennet Maria, 597 Anupama Arjun Pandit, 513 Anusha, T., 213 Arodh Lal Karn, 369, 425 Arundhathi, M., 153 Arun Mishra, 513 Arya Raj, 139 Ashrith, S. D., 491 Ashwini, K., 697 Athira, 139 Atul Kumar, 513 Ayan Banerjee, 467

Bontha Mamatha, 169 Brijesh Khandelwal, 359 Bui Thanh Hung, 343

B Balasubramanyam, C., 305 Bhagyashri R. Hanji, 501, 549 Bhargav, K. M., 491 Bhavana Raj, K., 707 Bhupesh Kumar Dewangan, 359

H Hameetha Begum, S., 305 Harika, P., 747 Harish Kundra, 43, 435 Himanshu Harlalka, 87 Himasai, M., 607

C Chaitanya, V., 747 Chandra Mohan, M., 617 Charupalli Sunil Kumar, 617 Chihnitha, T., 747 Chinna Reddaiah, 1 Chirag Jagad, 87

D Darsana, J., 153 Debkanta Chakraborty, 467 Deepa, O. S., 153 Devanshi Jhaveri, 87 Devi Mani, 27, 479 Dhana Lakshmi, R., 381 Dhivya, P., 393 Divyapushpalakshmi Marimuthu, 707

G Gladson Maria Britto James, 597 Gopika, G. S., 63



760 Hiranmayee Nandyala, 201 Hiren B. Patel, 665 Hyma, B., 169 Hyma, J., 687 I Ignatius A. Herman, 331 Indirani, A., 393 Ishika Chokshi, 87 J Jayanthi, S., 331 Jisha, R. C., 139 K Kalyan Chakravarti, Y., 607 Kanchan Naithani, 735 Karthiga, M., 393 Kaushik Rane, 183 Kirananjali, B., 607 Krishnapriya Singamaneni, 201 Kriti Saroha, 253 Krupa Patel, 665 Kusuma, T., 697 L Lakshmi Sujitha, T., 607 Lalu Banothu, 617 M Manoj, G., 63, 305 Mansi Verma, 455 Mansoor Habib Mazumder, 285 Md. Khorshed Alam, 239, 719 Muruganantham Rajamanickam, 479 N Nagamani, T., 393 Nancy Noella, R. S., 645 Narayana, M. V., 1, 297, 633 Narsimha, G., 403 Narsimha, V. B., 403 Nedumaran Arappal, 415 Nehalika Neha, 43 Niranjan, Y., 633 P Patel, N. D., 53

Author Index Pavan Kartheek Rachabathuni, 201 Prasanalakshmi Balaji, 343 Prasanthi, G., 201 Prasun Chakrabarti, 343 Princy Diwan, 359 Priya Velayutham, 121 R Raiwani, Y. P., 735 Rajkumar Kalimuthu, 331, 655 Rajasekar Rangasamy, 27, 121, 369, 425, 479, 525, 597, 645, 707 Ramachandran, V., 645 Ramya, V., 63 Ray, Saikat, 467 Renuka, C. R., 213 Reza, Md Majid, 435 Reza, Md Rashid, 435 Rishiraj Jagdish Tripathi, 535 Roopesh, G. B., 319 S Saidulu, D., 415 Sai Tejaswi Guntupalli, 101 Sakshi Choudhary, 273 Samik Mandal, 467 Sarthak Choudhary, 273 Satheesh Narayanasami, 525 Savitha, S., 675 Saxena, Khushi, 101 Sengan, Sudhakar, 27, 121, 369, 425, 479, 525, 597, 645, 707 Sen, Snigdha, 319, 491 Shah, Seyed Muzaffar Ahmad, 221 Sharad Bajaj, 43 Sheetal Kundra, 43 Shori, Maana, 253 Shreevershith, K., 319 Singh, Ajeet, 53, 415 Singh, Maheshwari Prasad, 285, 455 Singh, Satwinder, 221, 435, 581 Sovan Bhattacharya, 467 Sreedevi, N., 675 Sreekanth, K., 687 Srinivas Murthy, Abhishek, 501 Srivani, M., 381 Stalin David, D., 27, 121, 369, 425, 479, 525, 597, 645, 707 Stephen Jeswinde Nuagah, 169 Subba Laxmi, Ch., 1 Subbalakshmi, Chatti, 297 Subramaniam, Balu, 369

Author Index Sudha, M. Vijaya, 1 Sudharkar, B., 403 SumaSree, V., 213 Sushama A. Deshmukh, 569 Suyash Agrawal, 43 Swetha Venkata Ramana, D., 213

T Taiba Sana, 213 Tariq Ahamed Ahanger, 297 Tawde, Prachi, 87 Thakur, Jayesh, 183 Thirukrishna, J. T., 305 Thomas, Brindha, 655 Trina Saha, 239, 719 Tripathi, Aakanksha, 535 Tripathi, Geeta, 535, 569

761 V Vakula Rani, J., 15 Vamsi Kalyan Reddy, A., 491 Vani Pujitha, M., 747 Vellingiri Jayagopal, 121, 425 Vijaya Sudha, M., 633 Vijaya, H., 169 Vinay Kumar, E., 63

W Webber, Julian L., 27, 121, 369, 425, 479, 525, 597, 645, 707

Z Zangazanga, Limbika, 331