Data Intelligence and Cognitive Informatics: Proceedings of ICDICI 2023 (Algorithms for Intelligent Systems) 9819979994, 9789819979998

The book is a collection of peer-reviewed best selected research papers presented at the International Conference on Data Intelligence and Cognitive Informatics (ICDICI 2023).


English Pages 599 [579] Year 2024


Table of contents :
Preface
Contents
About the Editors
Automatic Sentence Classification: A Crucial Component of Sentiment Analysis
1 Introduction
2 Literature Review
3 Proposed Methodology
3.1 Data Collection
3.2 Pre-processing
3.3 Tokenization of Data
3.4 Algorithm Feeding
3.5 Best Model Selection
4 Result and Discussion
4.1 Statistical Analysis
4.2 Accuracy Graph
4.3 Confusion Matrix
4.4 Classification Report
5 Conclusion and Future Work
References
Real-Time Health Monitoring System of Patients on Utilizing Red Tacton
1 Introduction
2 Implementation
3 Proposed Methodology
4 Results
5 Conclusion
References
An Efficient Botnet Detection Using Machine Learning and Deep Learning
1 Introduction
2 Literature Review
3 Proposed Methodology
3.1 Load Dataset
3.2 Data Pre-Processing
3.3 Feature Selection
3.4 Handling Class Imbalance
3.5 Partition Dataset in Training and Testing
3.6 Apply ML/DL Models
3.7 Bot Detection
3.8 Model Evaluation
4 Experiments and Results
5 Conclusion
References
Wavelet Selection for Novel MD5-Protected DWT-Based Double Watermarking and Image Hiding Algorithm
1 Introduction
2 Related Works
3 Methodology Used
3.1 Architecture of Proposed System
3.2 Watermark Embedder
3.3 Watermark Extractor
4 Experimental Results
4.1 Pseudorandomness in Watermark Embedding
4.2 Performance Evaluation Metrics
4.3 Selection of Wavelet
4.4 Watermark Embedding and Extraction
4.5 Image Hiding
5 Conclusion and Future Work
References
Chaotic Map Based Encryption Algorithm for Secured Medical Data Analytics
1 Introduction
2 Related Work
3 Existing Techniques
3.1 Algorithm
3.2 Cryptography
3.3 Encryption
3.4 Decryption
3.5 Key
3.6 Steganography
3.7 Symmetric Encryption
4 Proposed Methodology
4.1 Secret Key Generation
4.2 Hahn’s Discrete Orthogonal Moment
4.3 QR Code
4.4 Modified Logistic Map
5 Results and Discussions
References
Gold Price Forecast Using Variational Mode Decomposition-Aided Long Short-Term Model Tuned by Modified Whale Optimization Algorithm
1 Introduction
2 Background and Related Works
2.1 LSTM Overview
2.2 Variation Mode Decomposition Details
2.3 Metaheuristics Optimization
2.4 AI Applications for Gold Price Forecasting
3 Whale Optimization Algorithm
3.1 Elementary WOA
3.2 Proposed Improved WOA Algorithm Used in Time-Series Forecasting Framework
4 Experimental Setup
4.1 Results and Discussion
5 Conclusion
References
Requirements for a Career in Information Security: A Comprehensive Review
1 Introduction
2 Cybersecurity Foundation
2.1 Information Security Expertise
2.2 Duties and Tasks of an IS Professional
2.3 Job Nature and Requirements
3 Research Approach
3.1 Words Used to Search
3.2 Selection Criteria
3.3 Rejection Criteria
3.4 Data Gathering
3.5 Quality Appraisal
4 Results
5 Conclusion
References
Intrusion Detection Using Bloom and XOR Filters
1 Introduction
2 Literature Survey
3 NIDS Implementation Using Bloom Filter
4 NIDS Implementation Using XOR Filter
5 Experimental Results and Discussions
6 Conclusions and Future Work
References
A Model for Privacy-Preservation of User Query in Location-Based Services
1 Introduction
2 Related Work and Literature Survey
3 Methodology
4 Results
5 Conclusion
6 Future Work
References
A GPS Based Bus Tracking and Unreserved Ticketing System Using QR Based Verification and Validation
1 Introduction
2 Literature Survey
3 Proposed System
4 System Architecture
5 System Functionality
5.1 QR Code
5.2 Crowd Management
5.3 GPS Location
5.4 Automatic Ticket Expiry
5.5 Passenger Application
5.6 Conductor Application
6 Backend
6.1 Seat Availability
6.2 Passenger Information
6.3 Administrative Information
7 Conclusion and Future Scope
References
FileFox: A Blockchain-Based File Storage Using Ethereum and IPFS
1 Introduction
2 Background
2.1 Blockchain Storage
2.2 INFURA
2.3 Truffle
2.4 MetaMask
3 Related Work
4 Proposed System
5 Implementation
6 Result and Discussion
7 Conclusion and Future Scope
References
Minimizing Web Diversion Using Query Classification and Text Mining
1 Introduction
2 Literature Review
3 Methodology
3.1 Dataset
3.2 Feature Extraction
3.3 Semantic Matching
3.4 Web Query Classification
3.5 Machine Learning Models
3.6 Deep Learning Models
3.7 Evaluation Metrics
4 Results and Discussion
5 Future Scope
6 Conclusion
References
Connect: A Secure Approach for Collaborative Learning by Building a Social Media Platform
1 Introduction
1.1 Critical Characteristics of Social Media Are as Follows
1.2 Social Media Platforms
1.3 Need for Using Collaborative Learning
1.4 Challenges and Issues to Build Social Networking
1.5 Cloud Computing
1.6 Encryption
1.7 Importance of Social Media Platforms for Collaborative Learning
1.8 Advantages of Collaborative Learning for Faculty-To-Faculty Interaction
1.9 Security Concerns While Building a Social Media Platform
1.10 Conventional Threats
1.11 Modern Threats
1.12 Targeted Threats
1.13 Reasons Behind Online Social Media Security
2 Literature Review
2.1 Encryption Techniques Used
3 Summary of Literature Review
3.1 Importance of the Study
3.2 The Opportunities that Will Be Provided Among the Users of the System Are as Follows
4 Proposed System
4.1 Proposed Algorithmic Process
5 Discussion
6 Conclusion
References
Smart Analytics System for Digital Farming
1 Introduction
2 Need for the Study
3 Related Works
4 Proposed Work
5 Conclusion
References
Sarcasm Detection for Marathi and the role of emoticons
1 Introduction
2 Related Work
3 Dataset and Annotation
3.1 Distribution of Tweets
3.2 Emoji Analysis
4 Pre-processing of Tweets
4.1 Cleaning of Tweets
4.2 Tokenization
5 Feature Extraction and Sarcasm Detection
5.1 Sentence Embedding Using Language Model
5.2 Embeddings for Emojis in the Tweet
6 Experiments and Result Analysis
7 Conclusion
References
Fleet Management Mobile Application Using GPS Shortest Path
1 Introduction
2 Literature Survey
3 Proposed System
3.1 Modules
3.2 Algorithm Used
3.3 Database
4 Result and Discussion
5 Conclusions
References
Finger Vein Biometric System Based on Convolutional Neural Network
1 Introduction
2 Related Work
3 Proposed System
3.1 Image Enhancement
3.2 Features Extraction
3.3 AES Encryption and Decryption
3.4 Convolution Neural Network
4 Result and Discussion
4.1 Image Enhancement
4.2 Feature Extraction
4.3 AES Encryption and Decryption
5 Conclusion
References
Embedding and Extraction of Data in Speech for Covert Communication
1 Introduction
2 Literature Survey
3 Existing Methodology
4 Proposed Methodology
5 Results and Evaluation
5.1 Results
5.2 Evaluation Metrics
6 Conclusion and Future Work
References
A Machine Learning Based Model for Classification of Customer Service’s Email
1 Introduction
2 Problem Statement
2.1 Shared Inboxes Fail in Managing the Emails as the Number Grows
3 Proposed Solution
3.1 Advantages of Email Multi-folder Categorization for Better Customer Support
3.2 Email with Multi-folder Categorization Works Well with a Database System
3.3 Work Together to Solve Problems at a Fast Pace
3.4 Provide Support in the 100% Context
3.5 Measure Individual Performance with Intuitive Reports
3.6 Architecture of Automatic Classification of Email
4 Research Methodology
4.1 Classification of Email Using the Technique of Machine Learning
4.2 Machine Learning Terminologies Used in Classification
4.3 Support Vector Machine
4.4 Advantages and Disadvantages for Using Support Vector Machine
4.5 Validation Tool/Dataset Used
5 Results
6 Conclusion
7 Limitations and Future Scope
References
Intelligent Identification and Multiclass Segmentation of Agricultural Crops Problem Areas by Neural Network Methods
1 Introduction
2 Methods and Materials
3 Results and Discussion
3.1 Justification of the Segmentation DNN Architecture
3.2 Learning Outcomes Developed by DNN
3.3 Discussion of the Results
4 Conclusions
References
Perishable Products: Enhancing Delivery Time Efficiency with Big Data, AI, and IoT
1 Introduction
2 Delivery Time in Our Context
2.1 Optimizing Delivery Time in Transportation: Exploring Algorithmic Approaches
2.2 Objective
2.3 Advantage
2.4 Challenger
2.5 Delivery Time Description
3 The Proposed Optimization
3.1 Model System
3.2 The Proposed Optimization for Delivery Time
3.3 Specification Parameter
4 Optimization Results Analysis
4.1 Comparing CT with Integrated Recent Transportation Technologies
4.2 Evaluation and Comparison of Optimization Results
5 Conclusion
References
CNN Approach for Identification of Medicinal Plants
1 Introduction
1.1 Motivation
1.2 Scope
2 Problem Statement
3 Literature Review
4 Proposed System
4.1 Dataset
4.2 Data Preprocessing
4.3 Pretrained CNN Architecture
5 Result and Discussion
6 Conclusion
References
OBSERVO: Teaching Strategy Recommendation by Monitoring Student Behavior Patterns
1 Introduction
1.1 Problem Statement
1.2 Aims
1.3 Competitive Analysis
2 Theoretical Background
2.1 Background Technology
2.2 Face Detection and Recognition
2.3 Concept Details
3 Methodology
3.1 Thermal Image Processing Algorithm
3.2 Face Detection and Recognition Algorithm
3.3 Database Training Algorithm
3.4 Image Behavior Pattern Analysis Algorithm
3.5 Algorithm to Generate Teaching Strategy
3.6 Behavior Pattern—Teaching Strategy Mapping Using Learning Styles
4 Results and Discussions
4.1 Haar Cascading
4.2 Recognition Rates
4.3 Behavior Analysis and Teaching Strategy Generation
5 Conclusion
References
Earthquake Magnitude and Depth Prediction Based on Hybrid GRU-BiLSTM Model
1 Introduction
1.1 Motivation
1.2 Contribution
1.3 Paper Organization
2 Related Work
3 Proposed Work
3.1 Preprocessing
3.2 Gated Recurrent Units
3.3 Bidirectional Long Short-Term Memory Networks
4 Experiment
4.1 Dataset
4.2 Model Architecture
5 Results and Analysis
6 Conclusion
References
Homomorphic Encryption to Improve Pharmaceutical Data Security on the Cloud and Blockchain
1 Introduction
2 Literature Survey
3 Workflow of the Proposed System
3.1 Paillier Cryptosystem Algorithm
3.2 Working of Paillier Algorithm
4 Proposed System
4.1 Methodology
4.2 Manager
4.3 Employee
4.4 AES Algorithm
5 Design
5.1 Manager
5.2 Employee
6 Experimentation and Results
7 Conclusion
References
Medical Reimbursement Prediction Using Artificial Intelligence
1 Introduction
2 Data Summary
3 Methodology
3.1 Ground Truth
3.2 Data Preparation and Transformation
3.3 Feature Selection
3.4 Model Development
3.5 Explainability Using SHAP
4 Results
5 Strengths
6 Conclusion
References
An Intelligent System for Plant Disease Diagnosis and Analysis Based on Deep Learning and Augmented Reality
1 Introduction
2 Related Work
3 Proposed System
3.1 Data Set Description
3.2 Feature Engineering
3.3 Augmented Reality
3.4 Barracuda Package
3.5 Vuforia SDK
3.6 Cloudinary
4 Methodology
4.1 Convolutional Neural Network (CNN)
4.2 K-Nearest Neighbours (KNN)
4.3 Transfer Learning (InceptionV3)
5 Result and Discussion
6 Conclusion
7 Future Work
References
Power Balancing in Four-Wheel Drive EV Using Carrier-Based PWM with Two-Level Inverter Fed Drive
1 Introduction
2 PWM Control Techniques
3 Different PWM Techniques
3.1 Sinusoidal PWM Technique
3.2 Coupled PWM Technique
3.3 Decoupled PWM Technique
3.4 Carrier-Based PWM Technique
4 Two-Level Inverter Topology
5 Block Diagram for Proposed PWM Technique
6 Simulation and Results
7 Hardware Setup and Its Result
8 Conclusion
References
Breast Cancer Detection Using B-Mode and Ultrasound Strain Imaging
1 Introduction
2 Literature Survey
3 Methodology
3.1 Ensemble Model
3.2 Loss–Accuracy Curve
4 Results and Discussions
5 Conclusion
References
Prompt Engineering in Large Language Models
1 Introduction
1.1 Overview of Large Language Models (LLMs)
1.2 Importance of Prompt Engineering for LLMs
1.3 Research Objective and Motivation
2 Prompt Engineering
2.1 The Process of Prompt Engineering
3 Prompt Engineering Techniques
3.1 Techniques for Optimizing Prompts
3.2 Advanced Techniques for Prompt Engineering
3.3 Demonstration Tasks for Prompt Engineering
4 Applications, Tools, and Trends of Prompt Engineering
4.1 Applications and Tools
4.2 Current Research and Future Trends of Prompt Engineering
4.3 Current Research and Future Trends of Large Language Models (LLMs)
5 Conclusion
References
Activity Identification and Recognition in Real-Time Video Data Using Deep Learning Techniques
1 Introduction
1.1 Baseline: Single-Frame-Human Activity Recognition
1.2 Late Fusion
1.3 Early Fusion
1.4 Slow Fusion (3D CNN)
1.5 CNN–LSTM: Long Range Convolutional Network (LRCN)
1.6 Slow Fast Network
2 Literature Survey
3 Description of Data Sets and Experimental Set-Up
3.1 Methodology of Data Set Description
3.2 Implementation and Testing
4 Results and Discussion
5 Conclusion and Future Scope
References
Article Summarization Using Deep Learning
1 Introduction
1.1 Extractive Summarization
1.2 Abstractive Summarization
2 Literature Review
3 Design and Methodologies
3.1 Text Summarization
3.2 Converting Summarized Text into Speech
3.3 Text Paraphrasing
3.4 Conversion of Text to PPT
4 Proposed Methodology
5 Performance Analysis
6 Conclusion
7 Future Enhancement
References
Knee Osteoarthritis Severity Prediction Through Medical Image Analysis Using Deep Learning Architectures
1 Introduction
2 Literature Survey
3 Methodology
3.1 Data Collection and Pre-processing
3.2 Model Selection
4 Results and Discussion
5 Conclusion
6 Future Work
References
Prediction of Harmful Algal Blooms Severity Using Machine Learning and Deep Learning Techniques
1 Introduction
2 Literature Survey
3 Existing Methodology
4 Proposed Methodology
4.1 Dataset Description
5 Satellite Images
6 Elevation Data
6.1 Data Extraction
6.2 Feature Engineering
6.3 Image Processing
6.4 Proposed Model Building
6.5 Evaluate Model Performance
7 Experimental Results
8 Conclusion
9 Future Works
References
Comparative Analysis of Classifier Algorithms Based on Sentimental Reviews
1 Introduction
2 Related Work
3 Proposed Methodology
4 Experimental Study
4.1 Data Set and Features Description
4.2 Feature Extraction
4.3 Classifier Algorithms
5 Results and Discussion
6 Conclusions
7 Future Scope
References
Enhancing Road Safety: A System for Detecting Driver Activity Using Raspberry Pi and Computer Vision Techniques with Alcohol and Noise Sensors
1 Introduction
2 Literature Survey
3 Methodology
3.1 Driver Exhaustion Detection Architecture
3.2 Video Capture
3.3 “Haar Cascade” Algorithm
3.4 Shape Predictor
3.5 Image Processing
3.6 Calculate EAR
3.7 Calculate MAR
3.8 Fatigue Detection with Face Detection and Features Extraction
3.9 Sound Detection
4 Conclusion
References
A Decision Support System for Prediction of Air Quality Using Recurrent Neural Network
1 Introduction
2 Literature Survey
3 Problem Analysis and Proposed Strategy
3.1 Data Preprocessing
3.2 Data Visualisation
3.3 LSTM Data Presentation
3.4 Fitting the Model
3.5 Evaluating the Model
4 Experimental Results
5 Conclusion
References
Trust Aware Distributed Protocol for Malicious Node Detection in IoT-WSN
1 Introduction
2 Proposed Methodology
2.1 TADP System Model
2.2 Trust Aware Distributed and Secure Data Aggregation
2.3 Trust Protocol Optimizes the Misclassification of Node Identification
2.4 Trust Aware Protocol to Detect the Malicious Node and Remove
3 Performance Evaluation
3.1 Identification
3.2 Misidentification
3.3 Throughput
3.4 Comparative Analysis
4 Conclusion
References
A Review on YOLOv8 and Its Advancements
1 Introduction
2 Existing Object Detection Models
3 Overview of YOLOv8
4 Architecture of YOLOv8
5 Architecture Components
6 Architectural Advancements
7 Training and Inference
7.1 Downloadable Python Package via Pip
8 Command Line Interface (CLI)
9 YOLOv8 Python SDK
10 YOLOv8 Tasks and Modes
11 User Experience (UX) Enhancements
12 Performance Evaluation
12.1 Object Detection Metrics
13 Benchmark Datasets and Computational Efficiency
14 Applications and Use Cases
15 Conclusion
References
Assistance for Visually Impaired People in Identifying Multiple Scenes Using Deep Learning
1 Introduction
2 Related Works
3 Methodology
4 Process
5 Result and Discussion
6 Conclusion
References
Identification of the Best Combination of Oversampling Technique and Machine Learning Algorithm for Credit Card Fraud Detection
1 Introduction
2 Literature Survey
3 Existing Model
4 Proposed Model
5 Working Methodology
6 Results and Discussions
7 Conclusion
References
Image-To-Image Translation Using Pix2Pix GAN and Cycle GAN
1 Introduction
2 Related Work
3 Methodology
3.1 Pix2Pix GAN
3.2 Cycle GAN
4 Implementation
4.1 Steps Involved in Implementing Pix2pix GAN
4.2 Steps Involved in Implementing Cycle GAN
5 Results
6 Conclusion and Future Scope
References


Algorithms for Intelligent Systems Series Editors: Jagdish Chand Bansal · Kusum Deep · Atulya K. Nagar

I. Jeena Jacob Selwyn Piramuthu Przemyslaw Falkowski-Gilski   Editors

Data Intelligence and Cognitive Informatics Proceedings of ICDICI 2023

Algorithms for Intelligent Systems Series Editors Jagdish Chand Bansal, Department of Mathematics, South Asian University, New Delhi, Delhi, India Kusum Deep, Department of Mathematics, Indian Institute of Technology Roorkee, Roorkee, Uttarakhand, India Atulya K. Nagar, School of Mathematics, Computer Science and Engineering, Liverpool Hope University, Liverpool, UK

This book series publishes research on the analysis and development of algorithms for intelligent systems with their applications to various real world problems. It covers research related to autonomous agents, multi-agent systems, behavioral modeling, reinforcement learning, game theory, mechanism design, machine learning, meta-heuristic search, optimization, planning and scheduling, artificial neural networks, evolutionary computation, swarm intelligence and other algorithms for intelligent systems. The book series includes recent advancements, modification and applications of the artificial neural networks, evolutionary computation, swarm intelligence, artificial immune systems, fuzzy system, autonomous and multi agent systems, machine learning and other intelligent systems related areas. The material will be beneficial for the graduate students, post-graduate students as well as the researchers who want a broader view of advances in algorithms for intelligent systems. The contents will also be useful to the researchers from other fields who have no knowledge of the power of intelligent systems, e.g. the researchers in the field of bioinformatics, biochemists, mechanical and chemical engineers, economists, musicians and medical practitioners. The series publishes monographs, edited volumes, advanced textbooks and selected proceedings. Indexed by zbMATH. All books published in the series are submitted for consideration in Web of Science.

I. Jeena Jacob · Selwyn Piramuthu · Przemyslaw Falkowski-Gilski Editors

Data Intelligence and Cognitive Informatics Proceedings of ICDICI 2023

Editors I. Jeena Jacob Department of Computer Science and Engineering GITAM University Bangalore, Karnataka, India

Selwyn Piramuthu ISOM Department University of Florida Gainesville, FL, USA

Przemyslaw Falkowski-Gilski Gdańsk University of Technology Gdańsk, Poland

ISSN 2524-7565 ISSN 2524-7573 (electronic) Algorithms for Intelligent Systems ISBN 978-981-99-7999-8 ISBN 978-981-99-7962-2 (eBook) https://doi.org/10.1007/978-981-99-7962-2 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore Paper in this product is recyclable.

The ICDICI 2023 conference is solely dedicated to all the editors, reviewers, and authors of the conference event.

Preface

The 4th International Conference on Data Intelligence and Cognitive Informatics [ICDICI 2023] was held in Tirunelveli, Tamil Nadu, India, on June 27–28, 2023, at SCAD College of Engineering and Technology. The proceedings of ICDICI 2023 are presented here with the aim of sharing and exchanging state-of-the-art research ideas on the different aspects of data and informatics research, with special attention to the practical challenges encountered and the potential solutions adopted to overcome them. We strongly believe that the research articles of ICDICI 2023 will give you a technically rewarding experience by providing more research information on the current issues of informatics and general data science interest.

We received 297 submissions from across the country and from overseas, representing government, industry, and academia. Finally, 42 manuscripts were shortlisted based on the results of the peer-review process. ICDICI 2023 proved informative and research-stimulating, with a magnificent array of keynote speakers from across the globe. The program consisted of invited sessions, presentations, and technical discussions with the most eminent and proficient speakers and session chairs, covering a wide range of topics in data intelligence. The conference delegates also had a wide range of sessions in different domains of data science, informatics, and cognitive intelligence.

We humbly wish to thank the organization staff, technical program committee, and reviewers of the conference for their valuable suggestions and timely responses to the authors of ICDICI 2023. We also extend our gratitude to the authors and conference participants for contributing their novel research results to the conference. Special thanks to Springer publications.

Bangalore, India
Gainesville, USA
Gdańsk, Poland

Dr. I. Jeena Jacob
Prof. Selwyn Piramuthu
Dr. Przemyslaw Falkowski-Gilski


Contents

Automatic Sentence Classification: A Crucial Component of Sentiment Analysis . . . 1
Abdur Nur Tusher, Md. Tariqul Islam, Mst. Sakira Rezowana Sammy, Md. Shibli Sadik, Shornaly Akter Hasna, and Gahangir Hossain
Real-Time Health Monitoring System of Patients on Utilizing Red Tacton . . . 17
M. Thilagaraj, C. Arul Murugan, and Kottaimalai Ramaraj
An Efficient Botnet Detection Using Machine Learning and Deep Learning . . . 29
Anagha Patil and Arti Deshpande
Wavelet Selection for Novel MD5-Protected DWT-Based Double Watermarking and Image Hiding Algorithm . . . 41
N. G. Resmi
Chaotic Map Based Encryption Algorithm for Secured Medical Data Analytics . . . 59
S. Sumathi, S. Gopika, S. Nivedha, and R. Kalaimathi
Gold Price Forecast Using Variational Mode Decomposition-Aided Long Short-Term Model Tuned by Modified Whale Optimization Algorithm . . . 69
Sanja Golubovic, Aleksandar Petrovic, Aleksandra Bozovic, Milos Antonijevic, Miodrag Zivkovic, and Nebojsa Bacanin
Requirements for a Career in Information Security: A Comprehensive Review . . . 85
Mike Nkongolo, Nita Mennega, and Izaan van Zyl
Intrusion Detection Using Bloom and XOR Filters . . . 99
R. Manimegalai, Batul Rawat, S. Naveenkumar, and M. H. N. S. Sriram Raju
A Model for Privacy-Preservation of User Query in Location-Based Services . . . 113
V. Sravani, O. Krishnaveni, and Anila Macharla
A GPS Based Bus Tracking and Unreserved Ticketing System Using QR Based Verification and Validation . . . 127
M. Suresh Kumar, S. Niranjan Kumar, Murari Reddy Sudarsan, P. Gunabalan, and R. Varsha
FileFox: A Blockchain-Based File Storage Using Ethereum and IPFS . . . 137
Kavya N. Naik, Arnica R. Patil, Kinnari N. Patil, and Shraddha S. More
Minimizing Web Diversion Using Query Classification and Text Mining . . . 151
Smrithi Agrawal, Kunal Kadam, Jeenal Mehta, and Varsha Hole
Connect: A Secure Approach for Collaborative Learning by Building a Social Media Platform . . . 167
Sonali Lunawat and Vaidehi Pawar
Smart Analytics System for Digital Farming . . . 181
K. Sumathi, Kundhavai Santharam, and K. Selvarani
Sarcasm Detection for Marathi and the role of emoticons . . . 193
Pravin K. Patil and Satish R. Kolhe
Fleet Management Mobile Application Using GPS Shortest Path . . . 205
Dhiraj Patil, Sitaram Mane, Sumit Biradar, Swapnil Rankhamb, and Mansi Bhonsle
Finger Vein Biometric System Based on Convolutional Neural Network . . . 215
V. Gurunathan, R. Sudhakar, T. Sathiyapriya, T. Gokul, R. Vasuki, M. Sabari, and G. Uvan Veera Sankar
Embedding and Extraction of Data in Speech for Covert Communication . . . 227
Vani Krishna Swamy, R. Arthi, M. S. Srujana, M. Sushmitha, and J. Vaishnavi
A Machine Learning Based Model for Classification of Customer Service's Email . . . 237
Javed Akhtar, Md Tabrez Nafis, Nafisur Rahman, and Aksa Urooj
Intelligent Identification and Multiclass Segmentation of Agricultural Crops Problem Areas by Neural Network Methods . . . 247
Aleksey F. Rogachev and Ilya S. Belousov
Perishable Products: Enhancing Delivery Time Efficiency with Big Data, AI, and IoT . . . 257
Saâdia Chabel and El Miloud Ar-Reyouchi
CNN Approach for Identification of Medicinal Plants . . . 269
Tushar Kumar Maurya, Aryaman Singh, and V. Pandimurugan
OBSERVO: Teaching Strategy Recommendation by Monitoring Student Behavior Patterns . . . 281
Rishaan Jacob Kuriakose, Sanchit Raj, M. Suguna, and C. U. Om Kumar
Earthquake Magnitude and Depth Prediction Based on Hybrid GRU-BiLSTM Model . . . 303
Abhiraj, Amit Rathor, Avaneesh Kumar Yadav, and Ranvijay
Homomorphic Encryption to Improve Pharmaceutical Data Security on the Cloud and Blockchain . . . 317
N. Aravindhraj, V. C. Mahavishnu, V. M. Aswin Vishal Kumar, N. T. Nethiran, and P. Manoranjith
Medical Reimbursement Prediction Using Artificial Intelligence . . . 327
Monica Gaur, Suman Pal, Rupanjali Chaudhuri, Oshin Benny Anto, and R. Kalaivanan
An Intelligent System for Plant Disease Diagnosis and Analysis Based on Deep Learning and Augmented Reality . . . 341
G. A. Senthil, R. Prabha, J. Nithyashri, S. Revathi, and R. Mohana Priya
Power Balancing in Four-Wheel Drive EV Using Carrier-Based PWM with Two-Level Inverter Fed Drive . . . 361
M. Chindamani, K. Sudhiksha Darshini, H. Shoaib Yusuf, and E. Naveen
Breast Cancer Detection Using B-Mode and Ultrasound Strain Imaging . . . 373
N. Anusha, Pyata Sai Keerthi, Manyam Ramakrishna Reddy, M. Rishith Ignatious, and A. Ramesh
Prompt Engineering in Large Language Models . . . 387
Ggaliwango Marvin, Nakayiza Hellen, Daudi Jjingo, and Joyce Nakatumba-Nabende
Activity Identification and Recognition in Real-Time Video Data Using Deep Learning Techniques . . . 403
Anant Grover, Deepak Arora, and Anuj Grover
Article Summarization Using Deep Learning . . . 415
D. Femi, S. Thylashri, Roshan Baniya, Mukesh Kumar, and Rajat Kumar Bhagat
Knee Osteoarthritis Severity Prediction Through Medical Image Analysis Using Deep Learning Architectures . . . 427
C. Dymphna Mary, Punitha Rajendran, and S. Sharanyaa
Prediction of Harmful Algal Blooms Severity Using Machine Learning and Deep Learning Techniques . . . 443
N. Karthikeyan, M. Bhargav, S. Hari krishna, Y. Sai Madhav, and T. Sajana
Comparative Analysis of Classifier Algorithms Based on Sentimental Reviews . . . 461
Santosh Kumar, Swastik Kashyap, and Rakhshan Khalid
Enhancing Road Safety: A System for Detecting Driver Activity Using Raspberry Pi and Computer Vision Techniques with Alcohol and Noise Sensors . . . 481
P. Sudarsanam, R. Anand, and Manoj Challa
A Decision Support System for Prediction of Air Quality Using Recurrent Neural Network . . . 499
R. Naga Sai Harshini, V. S. V. Jetendra, K. Sravanthi, and T. Sajana
Trust Aware Distributed Protocol for Malicious Node Detection in IoT-WSN . . . 517
S. Bhaskar, H. S. Shreehari, and B. N. Shobha
A Review on YOLOv8 and Its Advancements . . . 529
Mupparaju Sohan, Thotakura Sai Ram, and Ch. Venkata Rami Reddy
Assistance for Visually Impaired People in Identifying Multiple Scenes Using Deep Learning . . . 547
T. P. Divina, Rohan Paul Richard, and Kumudha Raimond
Identification of the Best Combination of Oversampling Technique and Machine Learning Algorithm for Credit Card Fraud Detection . . . 557
S. Srinivasan, A. L. Vallikannu, L. Manoharan, K. Deepthi, and B. Aravind Yadav
Image-To-Image Translation Using Pix2Pix GAN and Cycle GAN . . . 573
Ranjan K. Senapati, Renukunta Satvika, Aishwarya Anmandla, Gopidi Ashesh Reddy, and Ch Anil Kumar

About the Editors

Dr. I. Jeena Jacob is working as a Professor in the Computer Science and Engineering department at GITAM University, Bangalore, India. She actively participates in the development of the research field by conducting international conferences, workshops, and seminars. She has published many articles in refereed journals and has guest edited an issue of the International Journal of Mobile Learning and Organisation. Her research interests include mobile learning and computing.

Dr. Selwyn Piramuthu is Professor of Information Systems at the University of Florida. He received his B.Tech., M.S., and Ph.D., respectively, from IIT-Madras, the University of Arizona, and the University of Illinois at Urbana-Champaign. His research interests include machine learning and cryptography with applications in medical informatics, supply chain management, financial credit risk scoring, and IoT, among others. His book, co-authored with Wei Zhou and titled RFID and Sensor Network Automation in the Food Industry, was published by Wiley in 2016.

Dr. Przemyslaw Falkowski-Gilski is a graduate of the Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology. He completed first-degree B.Sc. studies (in Polish) and second-degree M.Sc. studies (in English) in 2012 and 2013, respectively. Between 2013 and 2017, during his Ph.D. studies, he pursued his interests in the field of electronic media, particularly digital broadcasting systems and the quality of networks and services. In 2018, he received the title of Doctor of Technical Sciences with distinction in the discipline of telecommunications, specialty radio communication. Currently, he works as an Assistant Professor. His field of interest is electronic media, particularly digital broadcasting systems, as well as quality evaluation of networks and services. His research and development interests include digital video and audio broadcasting systems, software-defined radio technology, location services and radio navigation systems, and quality measurements in mobile networks. He is the author of more than 50 scientific papers and one patent application, and has been involved in approximately a dozen national and international conferences as a reviewer, committee member, or board member.


Automatic Sentence Classification: A Crucial Component of Sentiment Analysis

Abdur Nur Tusher, Md. Tariqul Islam, Mst. Sakira Rezowana Sammy, Md. Shibli Sadik, Shornaly Akter Hasna, and Gahangir Hossain

A. N. Tusher (B) · Mst. S. R. Sammy · S. A. Hasna
Computer Science and Engineering, Daffodil International University, Dhaka, Bangladesh
e-mail: [email protected]
Mst. S. R. Sammy e-mail: [email protected]
S. A. Hasna e-mail: [email protected]
Md. T. Islam
Electronics and Communication Engineering, Khulna University of Engineering and Technology, Khulna 9203, Bangladesh
Md. S. Sadik
Computer Network Administrator and Management, University of Portsmouth, 29 Connaught Road, Portsmouth, UK
e-mail: [email protected]
G. Hossain
Information Science, University of North Texas, Denton, TX, US
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
I. J. Jacob et al. (eds.), Data Intelligence and Cognitive Informatics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-7962-2_1

1 Introduction

Sentiment analysis involves identifying and extracting the sentiment expressed in a given text, which can be classified as positive, negative, or neutral. It has various applications, such as marketing, customer feedback analysis, and political analysis. In the Bangla language, sentiment analysis is becoming increasingly important due to the growing volume of Bangla data on the Internet. However, the sentiment expressed in Bangla texts can be more complex and nuanced than in other languages, posing a challenge for sentiment analysis. Automated approaches have been developed to infer evidence from text data, with a focus on identifying relevant point terms from sentences. A sentence is a grammatical unit of language that consists of one or more words that convey a complete thought or idea. It typically includes a subject (the entity that performs an action) and a predicate (the action that is being performed), and it may also include additional modifiers or complements.
A sentence can be a statement, a question, a command, or an exclamation. In written language, a sentence usually begins with a capital letter and ends with a punctuation mark such as a period, question mark, or exclamation point. Bangla is one of the most widely spoken languages in the world, with over 250 million native speakers and over 300 million partial speakers. Bangla sentences can be categorized into five types: assertive, interrogative, imperative, exclamatory, and optative. An assertive or declarative sentence relays information and is the most fundamental type of sentence. An interrogative sentence asks a question. An exclamatory sentence, which is structurally similar to a declarative sentence, expresses strong emotion and is common in casual conversation and dialogue. An imperative sentence expresses a strong or weak command, or gives advice or instructions. An optative sentence expresses a wish or desire. A sentence is the largest unit of language, consisting of a group of words capable of expressing a complete idea or thought. Hence, sentence analysis is a crucial component of sentiment analysis in any language.

Natural language processing (NLP) is a powerful and widely used technology that enables machines to handle both voice and text communication in a human-like way. It is a sub-branch of artificial intelligence that combines statistics, machine learning, deep learning, mathematics, and other fields to achieve its objectives [1]. However, one of the biggest challenges in this field is preparing a suitable dataset and pre-processing it for training different machine learning algorithms. Overcoming these difficulties is essential to successfully integrate the Bangla language with computers. Recently, the LSTM-attention mechanism has been recognized as a powerful technique in sequence learning, image recognition, and related applications [2], and it has consequently drawn a great deal of interest in deep learning research [3, 4].

In this paper, we work with machine learning and deep learning models for the identification of Bangla sentence types. The remainder of this paper describes these models. We fit our Bangla dataset with five machine learning models (RF, DT, LR, MNB, and XGB) and two deep learning models (LSTM and RNN). These algorithms are very popular in industry, and the resulting system is automatic and provides accurate results. The main goal of this work is to understand the Bangla language dataset and fit it with machine learning and deep learning algorithms. An LSTM network with an attention mechanism maintains an adaptive amount of long-term memory, and our experiments on the dataset indicate that LSTM with attention has a clear advantage over state-of-the-art baselines. In [5], input-feature and temporal attention mechanisms are proposed to extract the relationship between multiple exogenous series and a target series, capturing the temporal information of the target series from historical observations up to the present. Those works inspire our use of the attention mechanism in LSTM-based sentence detection.

The remainder of this paper is organized as follows. Section 2 presents the literature review, Sect. 3 the proposed methodology, Sect. 4 the experimental results and discussion, and Sect. 5 the conclusion and future work.


2 Literature Review

Many dedicated researchers have contributed to advancing NLP over the years, resulting in significant progress in tasks such as sentiment analysis. NLP acts as a bridge between humans and computers, allowing machines to understand and interpret language much as humans do. However, despite its widespread use, the application of NLP to Bangla is still limited compared with other languages. To fully leverage the benefits of this language and increase its societal contributions, advanced technologies need to be developed that can process and understand Bangla effectively. The integration of Bengali language processing is highly anticipated, especially with the growing corporate interest in analysing sentiment, processing natural language, and summarizing Bangla texts.

Meng et al. [6] proposed a generative model for keyphrase prediction built on an encoder-decoder structure, which can produce words drawn from a vocabulary as well as highlight words taken from the source document. Their model achieved state-of-the-art results on several keyphrase extraction datasets. It bears some similarity to a keyphrase extractor that uses a single neural model to learn the probability of a word being a key expression. As their emphasis was on a combined abstractive-extractive task rather than a purely extractive one, a direct comparison between the works is difficult. Medelyan et al. [7] used a bagged decision tree, while Lopez et al. [8] used an MLP and an SVM to perform binary classification of candidate phrases. Liu et al. [9] used lexical features to extract a list of keyphrase candidates, with a ranking model then used to choose phrases from among the candidates. Yang et al. [10] used rule-based techniques to extract expected answers from unlabelled text and then generated questions from the documents and extracted answers using a pre-trained question generation model. Gupta et al. [11] present a sampling scheme for weakly supervised phrase grounding. They use the language model BERT to sample negatives that preserve the phrase context, which restricts its application since it requires grounding datasets annotated with this specific context. The approach additionally depends on noisy external language resources such as WordNet to identify false negatives. Dupont et al. [12] suggest a general scheme for choosing the most suitable grammar induction algorithm under various conditions.

Cheng et al. [13] proposed a classifier to recognize phrases with local salience within tweet content. Their probabilistic framework predicts the top-k locations of a user with city-level precision and places 51% of users within 100 miles of their actual location based on content alone. Their approach challenges gazetteer-based frameworks, since a gazetteer may lack rare spatial vocabulary and tweets do not always contain explicit place names. They aim to find local words whose usage is concentrated around a geographic centre and falls off quickly as the author's location moves away from that centre. However, the method requires manually selecting local words to train the classifier, and rather than estimating the likelihood of word usage it only indicates the central focus. Serdyukov et al. [14] predict the location of Flickr photographs from their tags using probabilistic language models and Bayesian inference. They use the GeoNames gazetteer to recognize place tags and therefore miss phrases that are not recorded in gazetteers but still have strong locality. Given the noisy nature of tweets, building a comprehensive manually selected dataset appears tedious; moreover, many words are associated with more than one place. To resolve these issues, Chang et al. [15] used a separate approach based on a Gaussian mixture model (GMM). Hecht et al. [16] worked with an MNB model in a content-driven approach to locate users with state-level precision.

One recent study proposed a hybrid approach for sentence classification in sentiment analysis, combining machine learning and deep learning models. The approach used pre-trained language models such as BERT and GPT-2 for feature extraction and a random forest classifier for the final classification, and it achieved higher accuracy than either machine learning or deep learning models alone. The models reviewed above are appropriate for the detection and classification of text, phrases, and keywords, but accurate and automatic Bangla sentence detection remains an open problem: no available system is capable of detecting Bangla sentence types accurately. We therefore introduce an automatic and accurate Bangla sentence detection system in this research work.

3 Proposed Methodology

Although natural language processing is widely used for English, its application to the Bangla language is rare. In Bangla, various types of sentiment can be expressed through text, such as happiness, sadness, anger, and surprise, and the expression of these sentiments can be influenced by factors such as culture, context, and individual differences. To perform sentiment analysis in Bangla, various approaches have been proposed, including rule-based, machine learning, and deep learning models. Rule-based approaches use linguistic rules to determine the sentiment expressed in a text. Machine learning approaches train a model on annotated Bangla datasets to classify texts into positive, negative, or neutral categories. Deep learning approaches use neural networks to extract features from the text and classify the sentiment. Understanding the different types of sentiment expressed in Bangla can also help in developing more effective sentiment analysis models.


Fig. 1 Proposed methodology

The main vision of this work is to design and implement an automatic system capable of identifying Bangla sentence types. To reach this goal, we follow a sequence of steps: generation of the dataset, pre-processing, tokenization, model building, and so on. The overall process of this work is presented in an organized way in Fig. 1.

3.1 Data Collection

The dataset plays a major role in obtaining good performance and acceptable accuracy from machine learning and deep learning approaches; if the dataset is large enough, the system performs well and provides accurate results. The dataset we created by hand is unique and contains Bangla sentences. It consists of 912 unique text documents with two columns: the sentence is recorded in the first column, and the sentence type (assertive, interrogative, imperative, optative, or exclamatory) is stored in the second. We then used 0, 1, 2, 3, and 4 to denote assertive, interrogative, imperative, optative, and exclamatory sentences, respectively, for classification. Table 1 and Fig. 2 present the overall dataset, which consists of five categories of data. Figure 3 shows the total documents, total words, and unique words for each category, and Table 2 shows sample sentences for the five Bangla sentence types.
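As an illustration of how such a two-column corpus can be loaded and label-encoded, a minimal Python sketch is given below. The file name and column names are assumptions for illustration only, since the paper describes the layout of the data but not its storage format.

```python
# Minimal sketch of loading the two-column Bangla corpus described above.
# The file name and column names ("sentence", "type") are assumptions.
import pandas as pd

LABEL_MAP = {
    "assertive": 0,
    "interrogative": 1,
    "imperative": 2,
    "optative": 3,
    "exclamatory": 4,
}

def load_dataset(path: str = "bangla_sentences.csv") -> pd.DataFrame:
    """Load the corpus and encode the five sentence types as 0-4."""
    df = pd.read_csv(path, names=["sentence", "type"], header=0)
    # Assumes the "type" column holds the English class names.
    df["label"] = df["type"].str.lower().map(LABEL_MAP)
    return df.dropna(subset=["sentence", "label"])

if __name__ == "__main__":
    data = load_dataset()
    print(data["label"].value_counts())  # roughly 176-188 documents per class
```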

Table 1 Presentation of data for each class

Class          Total data   Test   Train
Assertive      185          37     148
Interrogative  188          38     150
Imperative     180          40     140
Optative       183          35     148
Exclamatory    176          33     143

Fig. 2 Distribution of dataset

Fig. 3 Class-wise documents and words presentation

3.2 Pre-processing

Properly cleaning the data is a fundamental and important task for any machine learning approach, and the performance of the model depends heavily on this data pre-processing. As a result, before feeding the text to the machine learning models, we perform several pre-processing steps, which are illustrated in Fig. 4.

Table 2 Dataset distribution (sample Bangla sentences for the assertive, interrogative, imperative, optative, and exclamatory sentence classes)

Fig. 4 Data pre-processing

1. Step 1: At the beginning, we check word spellings and word order, and if we detect any error or problem manually, we simply remove that data. We also drop missing rows, fill in missing data, delete noisy data, and handle inconsistent data.
2. Step 2: To bring the data into an accurate format, specific contractions and data separation are sometimes needed, so we replaced the full forms of common Bangla expressions with their contracted forms.
3. Step 3: We then remove unexpected special characters and matching patterns such as "{", "!", "?", "2", and "/".
4. Step 4: The Bangla language uses many stop words; in this step, we remove them because the system cannot extract useful information from them.
5. Step 5: The resulting dataset is pre-processed so that it can easily cooperate with our proposed system, and the system can work on cleaned, purified data.
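A minimal sketch of the cleaning steps above is shown below. The contraction and stop-word lists are placeholders only, since the paper's actual Bangla lists are not reproduced here.

```python
# Illustrative pre-processing sketch following steps 2-4 above.
# CONTRACTIONS and STOP_WORDS are placeholders, not the paper's actual lists.
import re

CONTRACTIONS = {}        # mapping: full Bangla form -> contracted form (placeholder)
STOP_WORDS = set()       # Bangla stop words (placeholder)

def clean_sentence(text: str) -> str:
    # Step 2: apply contractions where their full forms occur.
    for full, short in CONTRACTIONS.items():
        text = text.replace(full, short)
    # Step 3: drop Latin punctuation, digits, and other unexpected symbols,
    # keeping Bangla characters (Unicode range U+0980-U+09FF) and whitespace.
    text = re.sub(r"[^\u0980-\u09FF\s]", " ", text)
    # Step 4: remove stop words.
    tokens = [w for w in text.split() if w not in STOP_WORDS]
    return " ".join(tokens)
```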

3.3 Tokenization of Data

After data pre-processing, the next step in our pipeline is tokenization. This involves dividing the data into small units called tokens, which can be words, sub-words, or characters; in our case, we use word tokens extracted from the text. To perform tokenization, we use the popular CountVectorizer, an encoder that converts text data into numerical values. By doing so, we can build a vocabulary over the data and identify which phrases are eliminated. The CountVectorizer also keeps track of the size and shape of the vocabulary, making it a valuable tool in our tokenization process.
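A minimal sketch of this step with scikit-learn's CountVectorizer is shown below, assuming the cleaned sentences are held in a Python list named sentences (an assumption carried over from the previous sketch).

```python
# Sketch of the CountVectorizer encoding step described above.
from sklearn.feature_extraction.text import CountVectorizer

# sentences: list of cleaned Bangla sentences (assumed from the previous step)
vectorizer = CountVectorizer(tokenizer=str.split, token_pattern=None)  # whitespace word tokens
X = vectorizer.fit_transform(sentences)  # sparse document-term count matrix
print(len(vectorizer.vocabulary_), "unique tokens in the vocabulary")
```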


3.4 Algorithm Feeding

An appropriate and effective NLP system can be built by combining different machine learning models. To complete our proposed model, we used a variety of supervised approaches: five machine learning models and two deep learning models. For Bangla sentence detection, the natural language processing system therefore uses the RF, MNB, DT, XGB, and LR machine learning algorithms together with the LSTM and RNN deep learning algorithms.
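The following sketch shows how the five classical models named above could be fitted on the vectorized data. The hyperparameters are left at library defaults as an assumption, since the paper does not report them; X and y are assumed to come from the earlier tokenization and labelling steps, and the stratified split is likewise an assumption.

```python
# Sketch of fitting the five classical models on the vectorized data.
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier

# X: sparse count matrix from CountVectorizer; y: integer labels 0-4 (assumed)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)  # 80:20 split as in Sect. 4.1

models = {
    "RF": RandomForestClassifier(),
    "MNB": MultinomialNB(),
    "DT": DecisionTreeClassifier(),
    "XGB": XGBClassifier(),
    "LR": LogisticRegression(max_iter=1000),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, "test accuracy:", model.score(X_test, y_test))
```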

3.5 Best Model Selection

After implementing all the approaches, we examined the various models and found that some algorithms offer the highest accuracy and are best suited to the NLP system. However, our primary objective was not only to identify the most accurate model but also the best-performing one, by evaluating its statistical performance. To assess our models, we employed several established and popular measures:

Accuracy score: the ratio of correct predictions to the total number of samples.
Confusion matrix: also known as the error matrix, used to summarize classification results; it is built from four types of values: true positive (TP), false positive (FP), false negative (FN), and true negative (TN).
Precision: true positives divided by the sum of true positives and false positives, also referred to as the positive predicted value.
Recall: true positives divided by the sum of true positives and false negatives.
F1-score: the harmonic mean of precision and recall.
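These measures can be computed with scikit-learn as sketched below for an already fitted model (here called model, an assumption carried over from the previous sketch).

```python
# Sketch of the evaluation measures listed above, computed with scikit-learn.
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             classification_report)

y_pred = model.predict(X_test)                   # model, X_test, y_test assumed fitted/split earlier
print("Accuracy:", accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))          # rows: true class, columns: predicted class
print(classification_report(y_test, y_pred))     # precision, recall, F1-score, support per class
```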

4 Result and Discussion

The primary objective of our NLP model is to identify the best-performing machine learning and deep learning models. The categorization process yields outcomes per class together with the achieved accuracy. We evaluated several supervised machine learning algorithms and two deep learning algorithms, and each model produced its own satisfactory outcome. To demonstrate the system's results, we employ several measures, including statistical and visual representations.


Fig. 5 Accuracy graph for machine learning algorithms

4.1 Statistical Analysis

To achieve the best possible accuracy, we employed five machine learning algorithms and evaluated their performance. To integrate our dataset with these algorithms, we divided the data into two distinct sections in an 80:20 ratio, with 80% of the data used for training and 20% for testing. The supervised machine learning models MNB and DT demonstrated accuracies of 60.98% and 65.85%, respectively, while the remaining three algorithms, RF, XGB, and LR, delivered accuracies of 56.1%, 58.54%, and 56.1%, respectively. Regarding deep learning models, we utilized RNN and LSTM, which provided accuracies of 97.63% and 98.02%, respectively.
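The paper does not specify the architecture of the RNN and LSTM models, so the following is only a plausible minimal sketch of an LSTM classifier over the five sentence classes; the vocabulary size, sequence length, and layer sizes are assumptions.

```python
# Plausible minimal LSTM classifier sketch (architecture details are assumed,
# not taken from the paper): embedding -> LSTM -> softmax over 5 classes.
import tensorflow as tf

VOCAB_SIZE = 10000   # assumed vocabulary limit
MAX_LEN = 32         # assumed padded sequence length

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, 64),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(5, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# Training call, assuming X_train_seq holds integer token ids padded to MAX_LEN:
# model.fit(X_train_seq, y_train, validation_split=0.1, epochs=20)
```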

4.2 Accuracy Graph

The accuracy graph for the machine learning algorithms is provided in Fig. 5. It shows that the DT and MNB algorithms provide the highest accuracy, while the other three algorithms give roughly equal accuracy.

4.3 Confusion Matrix

The confusion matrix, also known as the error matrix, is a highly effective tool for evaluating the performance of a machine learning model; its diagonal elements hold the correctly classified counts. The confusion matrices for our system are depicted in Fig. 6 (LSTM without normalization), Fig. 7 (LSTM with normalization), and Fig. 8 (RNN). The high diagonal values indicate strong agreement between the models' predictions and the actual outcomes. Therefore, we can conclude that our proposed model has achieved the expected level of accuracy.

Fig. 6 Confusion matrix of LSTM without normalization

Fig. 7 Confusion matrix of LSTM with normalization


Fig. 8 Confusion matrix of RNN

4.4 Classification Report Classification reports for models MNB, RF, DT, XGB, LR, and LSTM are shown in Tables 3, 4, 5, 6, 7, and 8, respectively. The classification report consists of precision, recall, f1-score, and support. This classification report also presents the accuracy of each algorithm. Table 3 Classification report of MNB Algorithm

MNB

Class

Precision

Recall

F1-score

Support

Assertive sentence

0.67

0.44

0.53

37

Interrogative sentence

0.46

0.55

0.50

38

Imperative sentence

0.60

0.50

0.55

40

Optative sentence

0.78

0.88

0.82

35

Exclamatory sentence

0.38

0.43

0.40

33

0.56

183

Support

Accuracy

Table 4 Classification report of RF Algorithm

Class

Precision

Recall

F1-score

RF

Assertive sentence

1.00

0.11

0.20

37

Interrogative sentence

0.42

0.73

0.53

38

Imperative sentence

1.00

0.33

0.50

40

Optative sentence

0.73

1.00

0.84

35

Exclamatory sentence

0.50

0.57

0.53

33

0.56

183

Accuracy

14

A. N. Tusher et al.

Table 5 Classification report of DT

Algorithm: DT
Class | Precision | Recall | F1-score | Support
Assertive sentence | 0.83 | 0.56 | 0.67 | 37
Interrogative sentence | 0.60 | 0.55 | 0.57 | 38
Imperative sentence | 0.75 | 0.50 | 0.60 | 40
Optative sentence | 0.88 | 0.88 | 0.88 | 35
Exclamatory sentence | 0.46 | 0.86 | 0.60 | 33
Accuracy | | | 0.66 | 183

Table 6 Classification report of XGB

Algorithm: XGB
Class | Precision | Recall | F1-score | Support
Assertive sentence | 0.75 | 0.33 | 0.46 | 37
Interrogative sentence | 0.45 | 0.82 | 0.58 | 38
Imperative sentence | 0.50 | 0.17 | 0.25 | 40
Optative sentence | 0.70 | 0.88 | 0.78 | 35
Exclamatory sentence | 0.80 | 0.57 | 0.67 | 33
Accuracy | | | 0.59 | 183

Table 7 Classification report of LR

Algorithm: LR
Class | Precision | Recall | F1-score | Support
Assertive sentence | 0.67 | 0.44 | 0.53 | 37
Interrogative sentence | 0.46 | 0.55 | 0.50 | 38
Imperative sentence | 0.60 | 0.50 | 0.55 | 40
Optative sentence | 0.78 | 0.88 | 0.82 | 35
Exclamatory sentence | 0.38 | 0.43 | 0.40 | 33
Accuracy | | | 0.56 | 183

Table 8 Classification report of LSTM

Algorithm: LSTM
Class | Precision | Recall | F1-score | Support
Assertive sentence | 0.62 | 0.78 | 0.69 | 37
Interrogative sentence | 0.45 | 0.61 | 0.52 | 38
Imperative sentence | 0.69 | 0.45 | 0.55 | 40
Optative sentence | 0.62 | 0.66 | 0.64 | 35
Exclamatory sentence | 0.55 | 0.36 | 0.44 | 33
Accuracy | | | 0.57 | 183
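A minimal sketch of how such per-class reports can be generated with scikit-learn, assuming the same y_test and y_pred arrays as in the earlier sketch; the class ordering in target_names is an assumption.

```python
# Hedged sketch: per-class precision, recall, F1-score, and support (Tables 3-8).
from sklearn.metrics import classification_report

target_names = ["Assertive sentence", "Interrogative sentence",
                "Imperative sentence", "Optative sentence", "Exclamatory sentence"]
print(classification_report(y_test, y_pred, target_names=target_names))
```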

Table 9 Comparison of performance between machine learning and deep learning

Technology | Algorithm name | Accuracy (%)
Machine learning | DT | 65.85
Machine learning | MNB | 60.98
Deep learning | RNN | 97.63
Deep learning | LSTM | 98.02

According to the performance of the experiment, each individual model shows different outcomes, and two ML algorithms and two DL algorithms provide the optimum results. The comparison of these models is presented in Table 9, from which it is evident that the deep learning algorithms provide better results than the machine learning algorithms.

5 Conclusion and Future Work The analysis of sentiment is a crucial task in natural language processing, involving the identification of emotions conveyed in text. Its applications are diverse, including assessing customer feedback on products and services, tracking social media trends, and measuring public opinion on various topics. Our objective is to develop and implement a deep learning-based system to accurately categorize Bangla sentences and generate positive outcomes. We classified sentences into five categories: assertive, interrogative, imperative, optative, and exclamatory. Our algorithms produced desired results, with decision tree and multinomial Naive Bayes (MNB) achieving 65.85% and 60.98% accuracy, respectively. In contrast, recurrent neural networks (RNN) and long short-term memory (LSTM) models achieved 97.63% and 98.02% accuracy, respectively. Our system overcomes the challenges faced by humans in accurately detecting Bangla sentences by providing efficient and automatic classification. The primary challenge in this project is to develop an appropriate dataset that effectively feeds into the models. The data may contain various types of noise, unexpected words, and null values that make noise removal difficult. In future work, we aim to focus on exclamatory sentence types such as joy, sadness, shock, interest, and gladness.

References 1. Pavlovic VI, Sharma R, Huang TS (1997) Visual interpretation of hand gestures for humancomputer interaction: a review. IEEE Trans Pattern Anal Mach Intell 19(7):677–695 2. Tusher AN, Islam S, Islam MT, Sammy MSR, Rahman MS, Sadik MS (2022) User perspective Bangla sentiment analysis for online gaming addiction using machine learning. In: Proceedings of the 6th international conference on IoT in social, mobile, analytics and cloud (I-SMAC), pp 538–543. https://doi.org/10.1109/I-SMAC55078.2022.9987343


3. Imam Bijoy MH, Hasan M, Tusher AN, Rahman MM, Mia MJ, Rabbani M (2021) An automated approach for bangla sentence classification using supervised algorithms. In: 2021 12th international conference on computing communication and networking technologies (ICCCNT). Kharagpur, India, pp 1–6. https://doi.org/10.1109/ICCCNT51525.2021.9579940 4. Islam S, Tusher AN, Mia MS, Rahman MS (2022) A machine learning based approach to predict online gaming addiction in the context of Bangladesh. In: Proceedings of the 13th international conference on computing, communication and networking technologies (ICCCNT), pp 1–7. https://doi.org/10.1109/ICCCNT54827.2022.9984508 5. Hassan A, Mahmood A (2017) Deep learning for sentence classification. In: 2017 IEEE long island systems, applications and technology conference. (LISAT), Farmingdale, NY, USA, pp 1–5. https://doi.org/10.1109/LISAT.2017.8001979 6. Meng R, Zhao S, Han S, He D, Brusilovsky P, Chi Y (2017) Deep keyphrase generation. In: Proceedings of the 55th Annual meeting of the association for computational linguistics 7. Medelyan O, Frank E, Witten IH (2009) Human-competitive tagging using automatic keyphrase extraction. In: Proceedings of the empirical methods in natural language processing conference 8. Lopez P, Romary L (2010) Humb: automatic key term extraction from scientific articles in grobidp. In: Proceedings of the 5th international workshop on semantic evaluation 9. Liu Z, Chen X, Zheng Y, Sun M (2011) Automatic keyphrase extraction by bridging the vocabulary gap. In: Proceedings of the 15th conference on computational natural language learning 10. Yang Z, Hu J, Salakhutdinov R, Cohen WW (2017) Semi-supervised qa with generative domainadaptive nets. In: Proceedings of the 55th annual meeting of the association for computational linguistics 11. Gupta T, Vahdat A, Chechik G, Yang X, Kautz J, Hoiem D (2020) Contrastive learning for weakly supervised phrase grounding. In: Proceedings of the European conference on computer vision (ECCV) 12. Dupont P (2002) Inductive and statistical learning of formal grammar. Technical report, Research talk, Departement ingenerie Informatique, Universite Catholique de Louvain 13. Tusher AN, Sadik MS, Islam MT (2022) Early brain stroke prediction using machine learning. In: 2022 11th international conference on system modeling & advancement in research trends (SMART). Moradabad, India, pp 1280–1284. https://doi.org/10.1109/SMART55829.2022.100 46889 14. Serdyukov P, Merdock V, Zowl R (2009) Placing Flickr photos on a map. In: Proceedings of the 32nd international ACM SIGIR conference on research and development in information retrieval 15. Devlin J, Chang MW, Lee K, Toutanova K (2019) BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American chapter of the association for computational linguistics: human language technologies 16. Hecht B, Hong L, Suh B, Chi EH (2011) Tweets from Justin Bieber’s heart: the dynamics of the location field in user profiles. In: Proceedings of the conference on human factors in computing systems (CHI)

Real-Time Health Monitoring System of Patients on Utilizing Red Tacton M. Thilagaraj, C. Arul Murugan, and Kottaimalai Ramaraj

1 Introduction Red tacton (RT) technology envisions an electronic future in which people may access information whenever and wherever need it. Some of the communication equipment needed to provide this instant access to information will be built into clothing [1]. Nippon Telegraph and Telephone Corporation (NTT) declared Red Tacton, a novel human area networking technology that employs the human skin layer as a fast and secure network transmission line [2]. It facilitates interaction between cell phone terminals and certain other nearby objects. For the first time, Red Tacton is a game-changing device that enables dependable high-speed HAN [3]. However, each of conventional technologies has a number of vital methodological restrictions that limit their own application, including the sudden speed reducing during transmission during multiple user scenarios, which causes network crowding. NTT, a Japanese telecommunications company, declared towards the end of 2002 that it would create a new data transmission technique that would exploit the human body’s conductive qualities to transfer data between electronic equipment. Red Tacton, the company’s first prototype of a human area network (HAN), was exhibited only two and a half years ago; Nippon Telegraph and Telephone Corporation (NTT), M. Thilagaraj (B) Department of Industrial Internet of Things, MVJ College of Engineering, Bengaluru, India e-mail: [email protected] C. Arul Murugan Department of Electronics and Telecommunication Engineering, Karpagam College of Engineering, Coimbatore, India K. Ramaraj Department of Electronics and Communication Engineering, Kalasalingam Academy of Research and Education, Krishnankoil, Tamil Nadu, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 I. J. Jacob et al. (eds.), Data Intelligence and Cognitive Informatics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-7962-2_2


a Japanese corporation, claims to have created the first practical human area network (HAN) technology, which allows for fast data transfer between devices using the human body as a conductor [4]. NTT claims that this next advancement in the wireless personal area network idea, dubbed Red Tacton, can transmit data at up to 2 Mbps over the skin's surface. However, unlike earlier products, a Red Tacton-enabled gadget does not need to be in close touch with the skin; it only has to be within 20 cm [5].

The scientific and engineering community has proposed numerous solutions to reduce or eliminate accidents caused by drunk driving. This issue has been approached in many ways and consequently has various alternative solutions, although many of them are not practically feasible [6]. One of the best potential solutions is to prevent a driver in a drunken state from starting his vehicle by locking the engine of the vehicle. Other solutions, for example alerting a driver in a drunken state to stop his vehicle, are not reliable, as their effectiveness depends heavily on the degree to which the driver is intoxicated with alcohol [7].

Red Tacton uses a laser to identify variations in the optical characteristics of an electro-optic crystal, and an optical receiver device transforms the information into an electrical signal. The RT system contains a transmitter and a receiver: a transmitter that generates an alternating current electrical impulse while modulating it with data input from an electrode [8], and a receiver that reads the weak AC electric field induced on the skin by the pulse using a different electrode, decodes it, and extracts the information from it. The HAN technology known as RT makes it possible to transform the human body's skin into a quick network communication path [9]. RT is an easy-to-use technology that creates a connection among people and the things nearby. The outermost layer of the human body is used as a secure, rapid network communication path by this latest human area networking technology. It utilizes the tiny electric field that is emitted on the surface of a person's body, which sets it completely apart from wireless and infrared technologies. Medical data processing starts when a particular part of the human body, such as the palms, fingertips, arms, legs, face, feet, or abdomen, makes contact with an RT transceiver. RT can also utilize footwear and clothes as a communication medium. The communication ends when the physical contact is broken. Red Tacton security can be improved with additional authorizations such as biometric voice, diaphragm, and fingerprint checks. Moreover, the data protection can be further enhanced by integrating several degrees of safety for transferring personal data.

There are several applications for RT. Some of them are as follows. Healthcare: To connect with one another, RT establishes wireless communications by replacing the conventional wired communications. Gaming: RT can be used to develop new gaming experiences by using the human body as a controller. Automobiles: In the emerging domain of "Smart Cars", RT assists drivers in autonomously operating automotive accessories such as music systems with their body movements.


Fig. 1 Red Tacton model (Image source: Circuitstoday)

Security: RT may be used to create a secured entrance system by using the human body as a key. Conferencing: RT plays a crucial role in a conference/meeting with its ability to allow the individuals to communicate and share information quickly using body actions. Entertainment: RT could be used to develop a completely immersive applications that let users to physically interact with the virtual worlds. As shown in Fig. 1, the Red Tacton technology utilizes the human body for communication purpose. The workability of RT technology is represented in Fig. 2. Earth ground is connected at the base of the person. Here, the human body and its surface acts like a conductor. Transmitter and receiver modules are present at the either side of a person, who needs to perform communication. The electric force generated is then visible as lines [10]. Figure 3 depicts the workflow of RT technology. The optical receiver circuit present in this setup comprises a transmitter and receiver. The RT device in either side of the setup is controlled by mobile phones [11]. Figure 4 illustrates the transceiver setup, which contains both transmitting and receiving section. The transceiver electrode is placed over the human skin surface. It can give and collect information required to establish a communication [12–14]. The data sensing circuit present in the setup sends a control signal to both the sections. The detector circuit present in the receiver section assists in identifying the communication among the people, people and equipment, etc. [15]. Figure 5 shows the Red Tacton model. It is easily available in the Internet. Several researchers utilize this type of device for establishing the effective employment of RT [16]. The communication distance for RT technology always lies between 0 and 0.01 m. The reading distance for RT is contrasted with state-of-the-art techniques. Figure 6 represents the comparison of RT with other communication techniques [17]. Table 1


Fig. 2 Workability of Red Tacton (Image source: Circuitstoday)

Fig. 3 Block diagram (Image source: Circuitstoday)

represents the comparison of various parameters of different data transmission techniques. The advantages of RT: data transfer is quicker and simpler; less information is lost during transmission; only a small amount of power (in the millivolt range) is used; and security comes first. The disadvantages of RT: it is useful only within a few centimeters; its effects on the human body are still being studied; and development costs are high.


Fig. 4 Transceiver setup (Image source: Circuitstoday) Fig. 5 Red Tacton

2 Implementation Figure 7 shows the power supply stage: the AC mains voltage, normally available as 220 V RMS, is connected to a transformer that steps the alternating signal down to the level required for generating the direct voltage [18]. The diode rectifier produces a full-wave rectified voltage, and the undesirable ripple components are removed by a capacitor filter to produce a DC voltage. The resultant DC output still contains variations and disturbances. Regardless of variations in the input DC voltage, a regulator device removes the remaining ripple while retaining the same DC value. One of the most popular voltage regulator IC components is used to perform this regulation [19].


Fig. 6 Comparison of Red Tacton with other technologies

Table 1 Comparison of various parameters of different data transmission techniques

Parameter | Bluetooth | Wi-Fi | Red Tacton
Speed | Version 1.0 (1 Mbps), version 2.0 (3 Mbps), version 4.0 (3–25 Mbps) | Data transfer rates of 11 to 1300 Mbps | Data transfer speed is up to 10 Mbps
Distance/range | Typically less than 10 m (33 ft), up to 100 m (330 ft) | Wi-Fi a (802.11a) = 10 m; Wi-Fi b (802.11b) = 100 m; Wi-Fi g (802.11g) = 100 m | Depends upon the size of the human chain, usually 1–2 m per human
Date of invention | 1994 | 1997 | 1996 by IBM
Synchronization with user behavior | Does not exist | Poor | Excellent
Number of parallel systems | 15–20 depending upon the application | 3 | Greater than 50
Example | It takes about three seconds to transfer an 8 Mb photo from one smartphone to another | A home network router gives a speed of 405 Kb/s, where it takes 27 min to transfer 65 files for a total of 641 Mb | A test case shows that it takes about 2 s for a file of 524 Kb to get transferred from one device to another

3 Proposed Methodology The hardware setup for the present work using Red Tacton was prepared, and the prototype image is shown in Fig. 8.


Fig. 7 Proposed block diagram

Microcontroller: The AT89C51 is a CMOS 8-bit microcomputer that offers low power consumption and high efficiency and has 4 K bytes of programmable and erasable Flash read-only memory.

Transformer: The supply voltage of 0–230 V is stepped down to 15-0-15 with the help of a potential transformer. The secondary voltage is lower when the primary winding has more turns than the secondary, and the available current increases or decreases accordingly, depending on the wire gauge. The secondary of the potential transformer is subsequently connected to the rectifier [20].

Rectifier bridge: Four diodes are connected to form a bridge rectifier. The network's diagonally opposite corners serve as the circuit's input, and the remaining two corners serve as the circuit's output [21]. Assume that point A is at a positive potential, point B is at a negative potential, and the transformer is in good condition. When point A is positive, D3 is forward biased and D4 is reverse biased. Because of the negative potential at point B, D2 is reverse biased and D1 is forward biased. Current can therefore pass through D3 and D1, which are forward biased, while D4 and D2 are reverse biased and block the flow of current. The current travels from point B via D1, up through the load, through D3, and returns to point B through the secondary of the transformer.


Fig. 8 Hardware setup

Half a cycle later, the polarity across the transformer's secondary reverses, forward biasing D2 and D4 and reverse biasing the opposing diodes. Current now travels from point A through D4, up through the load, through D2, along the secondary of the transformer, and back to point A; current again flows through the load [22].

Filter: When a capacitor is coupled in parallel with the load resistance to create a simple filter circuit, the output of the rectifier is transformed into a much steadier DC voltage. The capacitor is initially charged to the peak of the rectified waveform. Just after the peak, the capacitor discharges through the load resistor until the rectified voltage again exceeds the capacitor voltage; the capacitor is then recharged towards the peak, and the process repeats [23].

Voltage regulator: Voltage regulators are a common type of integrated circuit. Fixed positive/negative voltages or a variable set voltage can all be regulated by such IC devices. The regulators can be configured to operate with load currents over a wide range, with power ratings from milliwatts to tens of watts [24].
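As a rough illustration of the filter behaviour described above, the sketch below applies the standard full-wave ripple approximation V_ripple ≈ I_load / (f_ripple · C); the component values are illustrative assumptions, not values taken from the prototype.

```python
# Hedged sketch: approximate ripple of a full-wave rectifier with a capacitor filter,
# using the standard approximation V_ripple ≈ I_load / (f_ripple * C).
# All component values below are illustrative, not from the paper.
f_mains = 50.0            # mains frequency in Hz
f_ripple = 2 * f_mains    # full-wave rectification doubles the ripple frequency
C = 1000e-6               # filter capacitance in farads
I_load = 0.1              # load current in amperes

V_ripple = I_load / (f_ripple * C)
print(f"Peak-to-peak ripple ≈ {V_ripple:.2f} V")
```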


Fig. 9 Red Tacton receiver and transmitter

Red Tacton: Like any communication technology, Red Tacton features a transmitter and a receiver. Bio-signals begin to be transmitted once a person's skin comes into contact with a Red Tacton transceiver, and transmission halts when the contact is broken. The transceivers are either built into the products themselves or carried by the user. Communication can occur in numerous combinations based on the natural body motions of the user [25]. Body surface parts including the face, arms, fingers, feet, and hands of the user are used to transfer data, and the technique can likewise be applied through footwear and other types of clothing. Figure 9 depicts the transceiver part of RT technology. The transmitter develops a weak electric field on the surface of the human body [26]. The field strength picked up by the sensor in the Red Tacton receiver is detected by the inbuilt devices and is then subjected to the required signal conditioning [27]. The resulting conditioned data is the signal used by the application. A Red Tacton transceiver's basic block diagram is given below.

4 Results The proposed work aims to eliminate the inconvenient method of monitoring patient health in hospitals. It is a low-cost, energy-efficient health monitoring system used to continuously monitor patient health. If the measured value of parameters exceeds the threshold value, both the doctor and the nurse receive an abnormal notification. It saves time, reduces manpower requirements, and provides better patient health assistance. Low cost, reduced time consumption, and high reliability are all advantages of the developed system. Because of the affordable sensors and easy handling, the exact implementation price is also considerably lower compared to other models. The


Fig. 10 Display device

entire goal is the successful implementation of the framework and the tracking of outcomes. Building further on Red Tacton can improve the system even more. The display device is shown in Fig. 10.
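A minimal sketch of the threshold-based alerting logic described above is given below; the parameter names and normal ranges are illustrative assumptions rather than values from the developed system.

```python
# Hedged sketch of threshold-based abnormality notification for monitored vitals.
# Parameter names and normal ranges are illustrative assumptions.
NORMAL_RANGES = {
    "heart_rate_bpm": (60, 100),
    "temperature_c": (36.1, 37.5),
}

def check_vitals(readings):
    """Return abnormal-parameter messages for doctor/nurse notification."""
    alerts = []
    for name, value in readings.items():
        low, high = NORMAL_RANGES[name]
        if not (low <= value <= high):
            alerts.append(f"ABNORMAL {name}: {value} (expected {low}-{high})")
    return alerts

print(check_vitals({"heart_rate_bpm": 120, "temperature_c": 36.8}))
```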

5 Conclusion The developed product has been completed and demonstrated as a working prototype. Experiments show that the developed system is effective. Red Tacton provides effective results because no challenges are encountered while connecting various nodes or terminals, as the human body acts as a transparent transmission medium. This model can be employed practically in situations where accurate and rapid health-related information is required. Compared with other technologies in the market, Red Tacton performs better and can join the network within a short period of time. Also, this method is highly secure since human bodies act as the communication medium. Future advancements of Red Tacton technology are expected to be implemented in headsets, protected environments, and wireless communication for various activities. This might be as simple as two persons equipped with Red Tacton products shaking hands or touching devices to exchange data such as word documents and other cards.

References 1. Senthilkumar N, Manimegalai M, Karpakam S, Ashokkumar SR, Premkumar M (2021) Human action recognition based on spatial–temporal relational model and LSTM-CNN Framework. Mater Today Proc 2. Senthilkumar N, Karpakam S, Gayathri Devi M, Balakumaresan R, Dhilipkumar P (2021) Speech emotion recognition based on bi-directional LSTM architecture and deep belief networks. Materi Today Proc


3. Shanmugasundaram J, Raichal G, Dency Flora G, Rajasekaran P, Jeevanantham V (2021) Classification of epileptic seizure using rotation forest ensemble method with 1D-LBP feature extraction. Mater Today Proc 4. Nandalal V, Anand Kumar V (2021) Internet of Things (IoT) and real time applications. In: Oliva D, Hassan SA, Mohamed A (eds) Artificial intelligence for COVID-19. Studies in systems, decision and control, vol 358. Springer, Cham. https://doi.org/10.1007/978-3-030-69744-0_12 5. Jacob IJ, Ebby Darney P (2021) Design of deep learning algorithm for IoT application by image based recognition. Journal ISMAC 3(03):276–290 6. Rajesh SR (2021) Design of distribution transformer health management system using IoT sensors. J Soft Comput Paradigm 3(3):192–204 7. Patil PJ, Zalke RV, Tumasare KR, Shiwankar BA, Singh SR, Sakhare S (2021) IoT protocol for accident spotting with medical facility. J Artif Intell 3(02):140–150 8. Chen JIZ, Yeh LT (2021) Graphene based web framework for energy efficient IoT applications. J Inf Technol 3(01):18–28 9. Chen JIZ, Yeh LT (2021) Graphene based web framework for energy efficient IoT applications. J Inf Technol 3(01):18-28 10. Balasubramaniam V (2020) IoT based biotelemetry for smart health care monitoring system. J Inf Technol Digital World 2(3):183–190 11. Ramaraj K, Amiya G, Murugan PR, Govindaraj V, Vasudevan M, Thiyagarajan A (2022) Sensors for bone mineral density measurement to identify the level of osteoporosis: a study. In 2022 4th international conference on smart systems and inventive technology (ICSSIT). IEEE, pp 326–333 12. Kottaimalai R, Rajasekaran MP, Selvam V, Kannapiran B (2013) EEG signal classification using principal component analysis with neural network in brain computer interface applications. In: 2013 IEEE international conference on emerging trends in computing, communication and nanotechnology (ICECCN). IEEE, pp 227–231 13. Ramachandran V, Ramalakshmi R, Kavin BP, Hussain I, Almaliki AH, Almaliki AA, Elnaggar AY, Hussein EE (2022) Exploiting IoT and its enabled technologies for irrigation needs in agriculture. Water 14(5):719 14. Ramaraj K, Govindaraj V, Zhang YD, Murugan PR, Wang SH, Thiyagarajan A, Sankaran S (2022) Agnostic multimodal brain anomalies detection using a novel single-structured framework for better patient diagnosis and therapeutic planning in clinical oncology. Biomed Signal Process Control 77:103786 15. Ramaraj K, Govindaraj V, Murugan PR, Zhang Y, Wang S (2020) Safe engineering application for anomaly identification and outlier detection in human brain MRI. J Green Eng 10:9087– 9099 16. Srinivasan KP, Muthuramalingam T, Elsheikh AH (2023) A review of flexible printed sensors for automotive infotainment systems. Archives Civ Mech Eng 23(1):67 17. Priya SS, Aravind M, Dayanand B, Devarajan S (2022) Wearable sensor based authentications using Red-Tacton. In: 2022 3rd international conference on electronics and sustainable communication systems (ICESC). IEEE, pp 628–634 18. Kirubakaran SS (2022) RedTacton’s human area network-based healthcare monitoring system. i-Manager’s J Electron Eng 13(1):51 19. Gupta A, Ahluwalia H, Adhikari P, Memoria M, Joshi K (2019) A new approach to transfer data using Red Tacton technology. In: International conference on advances in engineering science management & technology (ICAESMT)-2019, Uttaranchal University, Dehradun, India 20. Preeti S, Padhy A, Swain R (2021) Incorporating 5G with human body communication through Red Tacton technology. 
In: 2021 IEEE 2nd international conference on applied electromagnetics, signal processing, & communication (AESPC). IEEE, pp 1–4 21. Pokharkar S, Vanjara G, Bansode Y, Patel J (2020) Redtacton: a human area network (HAN). SAMRIDDHI: J Phys Sci Eng Technol 12(SUP 1):85–89 22. Vadivelu R, Santhakumar G, Kaviya P, Menaka M, Monisha S, Balasubramaniam D (2021) Human body communication on portable biometric authentication. J Phys Conf Ser 1916(1):012113. IOP Publishing


23. Jothibasu M, Amartya Ram V, KadimisettiRavindra Reddy SM, Karthik M (2020) Development of efficient security system for WBAN application using human body communication. Solid State Technol 63(6):15862–15868 24. Visvesvaran C, Ramyadevi N, Karthi SP, Sudhhir US (2021) Wireless data transfer based on bone conduction: Osteoconduct. In: 2021 7th international conference on advanced computing and communication systems (ICACCS), vol 1. IEEE, pp 519–521 25. Vani R, Subitha D (2020) Health monitoring wearable device using Internet of Things. Eur J Mol Clin Med 7(3):395–400 26. Ramaraj K, Amiya G, Rajasekaran MP, Govindaraj V, Vasudevan M, Thirumurugan M, Zhang YD, Abdullah SS, Thiyagarajan A (2023) A robust multi-utility neural network technique integrated with discriminators for bone health decisioning to facilitate clinical-driven processes. Res Biomed Eng 1–19 27. Ramaraj K, Govindaraj V, Zhang YD, Murugan PR, Thiyagarajan A (2021) Brain anomaly prediction with the intervention of fuzzy based clustering and optimization techniques for augmenting clinical diagnosis. In: 2021 3rd international conference on advances in computing, communication control and networking (ICAC3N). IEEE, pp 872–877

An Efficient Botnet Detection Using Machine Learning and Deep Learning Anagha Patil and Arti Deshpande

1 Introduction In the modern era, social networking has taken prime importance in everyone’s life. The social networking apps have grabbed all our attention with their fascinating features. This complex structure of Internet often comes with the threats in areas like stealing personal information, spreading misinformation, eavesdropping, and many more. Attackers try to exploit vulnerabilities on the Internet using multiple attacks for which bots are essential. Bots through these social networking apps can perform email spamming, Click fraud, DdoS attacks, phishing, identity theft, and distributed resource utilization for prohibiting particular service. It is proved that some accounts spread articles from low-credibility sources, and these are more likely to be bots [1]. A botnet [2] is a network under the control of a botmaster and made up of several hosts or zombies which have been infected by bots. A bot in a botnet is a malevolent software that can run on the victim’s computer secretly. Using a command and control server (C&C), a botmaster manages the bots to perform malevolent deeds. The C&C server is primarily in charge of locating susceptible systems, disseminating the malware, transmitting commands and code updates, and executing the attack. The intrinsic ability of botnet assaults to change their mode of operation makes prevention and detection challenging. Structure-based or graph-based approach, feature-based, and crowdsourcing are the bot detection techniques listed in the literature[3]. Machine learning techniques which are feature-based require best feature selection through human intervention. Therefore, the deep learning models, which autonomously give us suitable features, A. Patil (B) Research Scholar, Thadomal Shahani Engineering College, Bandra, India e-mail: [email protected] A. Deshpande Associate Professor, Thadomal Shahani Engineering College, Bandra, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 I. J. Jacob et al. (eds.), Data Intelligence and Cognitive Informatics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-7962-2_3


can be considered the best models for botnet detection. Whether an ML or a DL approach is used, none of these approaches performs well if the dataset is not pre-processed. At the same time, handling class imbalance is considered a major challenge for all ML and DL approaches used for predictive modeling; if the class-imbalance problem is not handled carefully, the prediction can be biased. In our work, we have compared the proposed model with three machine learning and three deep learning techniques on the CTU-13 dataset [4]. These models have previously been used for anomaly detection and have obtained promising results. This work includes:
• Converting all pcap files into csv format.
• Combining scenarios 3, 4, 5, 7, 10, 11, 12, and 13 of the CTU-13 dataset to form the training dataset, and scenarios 1, 2, 6, 8, and 9 to form the testing dataset.
• To perform binary classification, adding the label ‘Botnet’ to the dataset as 0 (Benign) or 1 (Bot) using the infected ip addresses provided.
• Pre-processing the training dataset to handle null values and traffic reduction.
• Adding the label ‘Weight’ for handling duplicate flows and thus dataset reduction.
• Finding only relevant features using Pearson’s correlation.
• Encoding the categorical labels for better application of the models.
• To handle class imbalance, applying SMOTE and ADASYN separately.
• Further applying ML and DL models on these separate datasets and analyzing their performance using performance metrics.
The further sections of the paper cover the literature review, proposed methodology, experiments and results, and conclusions.

2 Literature Review This section provides review of existing approaches for machine learning and deep learning-based botnet detection. Sainath et al. [5] proposed an approach for the detection of bots on social networks using various ML techniques. The research uses user history (actions performed by the users) and signature text data (from historic sources) to identify bot related activities. To improve bot detection accuracy, best set of features are selected using correlation and univariate feature selection approaches. The research focuses on improving accuracy using ML techniques such as additional tree classifier and CNN. Beny et al. [6] carried out evaluation of four deep learning methods for botnet detection. Authors used various performance metrics to compare CNN, LSTM, CNN-LSTM, and MLP models. The study is carried out using two use-cases, namely for traffic classification from botnet attacks and traffic classification from unknown botnet attacks. Though third use-case used is to check the impact of unbalanced data, no measures are taken to handle it. At the same time, no pre-processing was applied on CTU-13 dataset before applying any deep learning model.


Suleiman et al. [7] evaluated performances of various CNN-based approaches for ISCX botnet dataset. The aim of the research is to improve botnet detection for Android applications. Authors selected 6802 Android applications to extract 342 static features. On these features, LSTM, GRU, DNN, and hybrid approaches are applied. These models are then evaluated against ML algorithms, where DNN achieved the highest accuracy. Ibrahim et al. [8] proposed protocol-independent and structure-independent framework for botnet detection. Authors applied KNN, SVM, and multi-layer perceptron models on flow-based features provided in CTU-13 dataset. The evaluation of the proposed models is done using confusion matrix. The research performs seven experiments by varying combination of a normal, botnet, and network traffic. Since the initial model was clustering the dataset, the performance was decreased for NSIS type of botnet. The study proposed in [9] focuses on the N-BaIoT benchmark dataset to detect various botnet attack types. For feature selection, mutual information-based, PCA and ANOVA methods are used. To evaluate performance of IoT botnet classifiers, several separate and ensemble classifiers are used. XGB and KNN classifiers outperformed for the proposed model. But, the deep learning models are not covered. Afnan et al. [10] proposed a methodology of detecting social bots using graph-based machine learning approach. The study proposed a model which is applied to two datasets, namely CTU-13 and IOT-23. This model used graphs for feature construction and selection of important features. This model utilizes a set of graph-based features including centrality measures. Authors applied KNN, AdaBoost, RF, and extra-tree classifiers to evaluate the proposed algorithm while selecting best feature sets accordingly. The major problem with the model is that they have used only sixth scenario of CTU-13 for testing purpose.

3 Proposed Methodology The proposed system for social botnet detection is depicted in Fig. 1. This comprises the following phases: load dataset, data pre-processing, feature extraction, handling class imbalance, partition dataset in training and testing, apply ML/DL models, and detection and model evaluation. Our proposed model focuses on traffic reduction to remove unnecessary traffic. Later, dataset pre-processing is applied to remove null values. Duplicated samples are converted into label ‘Weight’ which will keep the importance of duplicated traffic flow and reduce the volume of the dataset too. Moreover, the class-imbalance problem is handled using two oversampling techniques, namely ‘SMOTE’ and ‘ADASYN’. Performance of these two techniques is later evaluated using ML and DL models.


Fig. 1 Architecture of botnet detection system

3.1 Load Dataset In this phase, the dataset will be loaded for further traffic reduction. For the evaluation of the proposed model, CTU-13 dataset is used which is publically available. This dataset contains network traffic (pcap files) from infected ips created by CTU University. CTU-13 dataset contains 13 scenarios with benign and malicious traffic from various botnet families. Botnet tools such as Rbot, Neris, Murlo, Virut, Menti, Sogou, and NSIS.ay are used for adding botnet traffic. The samples in the dataset contain normal, botnet, and background traffic which concludes the size of the database as 1.9 GB. CTU-13 dataset contains various attack types such as DDoS attacks, SPAM, HTTP attacks, IRC, P2P, Port-Scan (PS), Click-Fraud (CF), and custom attacks. Table 1 shows details of every botnet scenario captured in CTU-13 dataset.

3.2 Data Pre-Processing CTU-13 dataset provides network flows with .binetflow files. These Netflow files consist of the attributes: start time, end time, duration, source IP, destination IP, source port, destination port, direction, state, SToS, total packets, and total bytes. Since CTU-13 dataset contains background traffic, which is majority class, but not relevant for our study, background traffic is removed from every scenario which helps in traffic reduction.


Table 1 Details of each scenario of CTU-13

Scenario ID | Type of attack covered | # of packets | # of NetFlows | Botnet tool used | # of bots
1 | IRC, SPAM, CF | 71,971,482 | 2,824,637 | Neris | 1
2 | IRC, SPAM, CF | 71,851,300 | 1,808,123 | Neris | 1
3 | IRC, PS, Custom | 167,730,395 | 4,710,639 | Rbot | 1
4 | IRC, DDoS, Custom | 62,089,135 | 1,121,077 | Rbot | 1
5 | PS, SPAM, HTTP | 4,481,167 | 129,833 | Virut | 1
6 | PS | 38,764,357 | 558,920 | Menti | 1
7 | HTTP | 7,467,139 | 114,078 | Sogou | 1
8 | PS | 155,207,799 | 2,954,231 | Murlo | 1
9 | IRC, SPAM, CF, PS | 115,415,321 | 2,753,885 | Neris | 10
10 | IRC, DDoS, Custom | 90,389,782 | 1,309,792 | Rbot | 10
11 | IRC, DDoS, Custom | 6,337,202 | 107,252 | Rbot | 3
12 | P2P | 13,212,268 | 325,472 | NSIS.ay | 3
13 | PS, SPAM, HTTP | 50,888,256 | 1,925,150 | Virut | 1

For data pre-processing, we converted these files into .csv files using Python. As the infected ip addresses are provided, we added a label ‘Botnet’ having the value 0 (Benign) or 1 (Bot) for easy detection of bots. Also, the data is cleaned by removing null values from the dataset. We have added a label ‘Weight’ for handling duplicate flows and thus further dataset reduction. Encoding is also applied to fit the dataset to the models used.
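A minimal pandas sketch of this pre-processing is shown below. It assumes the CTU-13 .binetflow column names (SrcAddr, DstAddr, Label, ...); the infected-IP list and the columns used to define a duplicate flow are illustrative assumptions.

```python
# Hedged sketch of the pre-processing described above: drop background flows,
# label flows by infected IP, remove nulls, and collapse duplicate flows
# into a 'Weight' column.
import pandas as pd

df = pd.read_csv("scenario.binetflow")              # CTU-13 flow file read as CSV
df = df[~df["Label"].str.contains("Background")]    # traffic reduction: drop background flows
df = df.dropna()

infected_ips = {"147.32.84.165"}                    # illustrative bot IP list for one scenario
df["Botnet"] = df["SrcAddr"].isin(infected_ips).astype(int)   # 1 = Bot, 0 = Benign

# Collapse duplicate flows: identical flow tuples become one row with a 'Weight' count.
flow_cols = ["SrcAddr", "DstAddr", "Sport", "Dport", "Proto", "Dir", "State", "Botnet"]
df = (df.groupby(flow_cols, as_index=False).size()
        .rename(columns={"size": "Weight"}))
```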


3.3 Feature Selection Feature selection is the process of selecting only the important and required attributes from the dataset for faster processing. Using Pearson’s correlation metric, the attributes having low correlation with the class label are dropped from the csv files.
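A minimal sketch of this correlation-based selection is given below; it assumes the pre-processed DataFrame df from the previous step, with categorical columns already label-encoded as described in the text, and the 0.1 threshold is an illustrative assumption.

```python
# Hedged sketch: keep only features whose absolute Pearson correlation with the
# target exceeds a threshold (0.1 here, chosen for illustration).
corr = df.corr(numeric_only=True)["Botnet"].abs()
selected = corr[corr > 0.1].index.drop("Botnet")

X = df[selected]
y = df["Botnet"]
```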

3.4 Handling Class Imbalance For accurate prediction, the class-imbalance problem has to be handled. Though we have separated the scenarios carefully, the class-imbalance problem can be seen in the training csv files of CTU-13. To handle it, ADASYN [11] and SMOTE [12] are used, both of which oversample the minority class. The synthetic minority oversampling technique (SMOTE) is applied to the minority class to generate synthetic data points, while the adaptive synthetic sampling approach (ADASYN) is a density-based method which oversamples the minority class to generate synthetic samples.
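A minimal sketch using the imbalanced-learn library is shown below, assuming training arrays X_train and y_train; random_state is set only for reproducibility.

```python
# Hedged sketch: oversampling the minority (bot) class with SMOTE and ADASYN,
# producing two separately balanced training sets as described above.
from imblearn.over_sampling import SMOTE, ADASYN

X_smote, y_smote = SMOTE(random_state=42).fit_resample(X_train, y_train)
X_adasyn, y_adasyn = ADASYN(random_state=42).fit_resample(X_train, y_train)
```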

3.5 Partition Dataset in Training and Testing In this phase, the pre-processed and balanced dataset is partitioned in training and testing. For efficient detection of bots, the CTU-13 dataset is separated into training (80%), testing (20%), and cross-validation part [4]. For CTU-13, in training dataset, csvs of scenarios 3, 4, 5, 7, 10, 11, 12, and 13 are merged in a single csv file. Similarly for testing dataset, csvs of scenarios 1, 2, 6, 8, and 9 are merged in a single csv file. For cross-validation, none of the families used in training should be used for testing to detect new behaviors.

3.6 Apply ML/DL Models In this phase, we applied three machine learning and three deep learning models for botnet detection. We first outline the models used and corresponding hyperparameter tuned for our study. A. Gaussian Naïve Bayes: Gaussian Naïve Bayes is a modified Naïve Bayes classifier which supports continuous valued features. This model follows normal (or Gaussian) distribution. The model is based on Bayes theorem and considered to work best for binary classification problems. B. KNN Classifier: K-nearest neighbor (KNN) classifier is non-parametric supervised machine learning algorithm used for classification purpose. This algorithm works on similarity of feature space. This algorithm chooses k value at random.


C. Logistic Regression: Logistic regression finds the probability of event occurring based on the independent variables. This method is generally used when the dependent variable is categorical. Since we have to detect, whether the traffic is malicious (1) or not (0), logistic regression can be applied here. A logit transformation is applied on the log odds which calculates ratio of the probability of success and the probability of failure. D. Convolution Neural Network (CNN): CNN is a deep learning algorithm which takes in an input image, assign weights and biases to various aspects in the image. CNNs have ability to learn features by itself. It generally has convolution layer, pooling layer, and fully connected layer. For our model, we have used one 1D CNN layer with four filters which takes nine features. We have considered three dense layers with 8, 4 and 1 filters, respectively. In between those, two dropout layers with dropout rate of 0.2 are used to avoid overfitting. Except the last dense layer who is Sigmoid function, rest of the layers are using Relu activation function. E. Long Short-Term Memory (LSTM): LSTM is a type of recurrent neural network (RNN) in deep learning which are capable of learning dependencies in sequential data. LSTM has a memory cell which can hold information for an extended period of time. Our LSTM model comprises two LSTM layers with 128 neurons and Relu activation function. After every LSTM layer, a dropout layer is added with the rate of 0.2 to avoid overfitting. F. Multi-Layer Perceptron (MLP): MLP is a neural network which contains fully connected dense layers which transform input dimension to the desired dimension. MLP contains one input layer, one output layer, and any number of hidden layers inside it. Our MLP model has an input layer with Relu activation function followed by a dropout layer with rate of 0.2. Further there are two dense layers and again a dropout layer with same rate.
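A hedged Keras sketch of the LSTM architecture described above (two 128-unit LSTM layers with ReLU activation, each followed by dropout of 0.2) is given below; the input shape of (1, 9) for the nine selected flow features, the sigmoid output layer, and the optimizer are assumptions not stated explicitly in the text.

```python
# Hedged sketch of the LSTM model described in 3.6 (E); architectural details
# beyond those stated in the text are assumptions.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

model = Sequential([
    LSTM(128, activation="relu", return_sequences=True, input_shape=(1, 9)),
    Dropout(0.2),
    LSTM(128, activation="relu"),
    Dropout(0.2),
    Dense(1, activation="sigmoid"),   # binary output: bot vs. benign (assumed)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```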

3.7 Bot Detection We investigated three ML and three DL algorithms, to access efficiency of the proposed system including Gaussian Naïve Bayes, KNN classifier, logistic regression, CNN, LSTM, and MLP. Once the model is trained, the system is deployed for botnet detection.

3.8 Model Evaluation Multiple experiments are conducted to analyze effectiveness of the proposed system. The performance of all ML and DL models for botnet detection is evaluated using metrics such as accuracy, precision, recall, F1-score, and area under curve (AUC).


We used confusion matrix to calculate these performance metrics in terms of true positive, true negative, false positive, and false negative.

4 Experiments and Results For best results from the ML algorithms, the best parameters are chosen on the training set. Similarly, to obtain optimal performance and accurate predictions, the hyperparameters of the deep learning models are tuned. After the traffic reduction phase, the merged training file (originally with 9,613,453 tuples) is reduced to 801,132 tuples. A label encoder is used to convert source and destination ip addresses into numeric form so that they can be fed to the models. To further reduce the dataset, duplicates are dropped and weights are added for repeated traffic; if the traffic flow from the same source to the same destination is repeated 100 times, the value of the label ‘Weight’ would be 100. Attributes are then correlated with the target attribute, and those showing high correlation with it are kept for the application of the models. Our target attribute ‘Botnet’ is highly correlated with SrcAddr, DestAddr, Sport, Dport, Direction, State, Protocol, and Duration. At the same time, ‘TotPkts’, ‘TotBytes’, and ‘SrcBytes’ are highly correlated with one another, so they can be considered duplicates, and we can drop the ‘TotBytes’ and ‘SrcBytes’ attributes. This analysis is implemented using a heatmap in Python. Figure 2 shows the CTU-13 dataset after pre-processing. The dataset is highly imbalanced, as the ‘Botnet’ class is in the minority. The performance of the model would be degraded if this problem were not handled. Figure 3 depicts the data imbalance. In the case of such skewed data, the performance metrics will be biased

Fig. 2 Snapshot of CTU-13 dataset after pre-processing


Fig. 3 Class imbalance in the dataset

Fig. 4 After applying ADASYN method

Fig. 5 After applying SMOTE method

and not preferable. To handle class-imbalance problem, two oversampling methods, ADASYN and SMOTE, are taken into consideration. For comparing performances of all ML and DL algorithms with these two methods on CTU-13 dataset, experiments would be carried out separately hence forward. After applying ADASYN and SMOTE methods, the result shows that the dataset is now balanced as we have approximately similar ratio of the 0 and 1 class (Figs. 4 and 5). Then the dataset is partitioned into training and testing as 80%-20% ratio. The training data is now used to train the models. The performance of all ML and DL models for botnet detection is evaluated using metrics such as accuracy, precision, recall, F1-score, and AUC. This is indicated in Tables 2 and 3. Figures 6, 7 and 8 show these results graphically. The deep learning models were developed in Python using Keras. Similarly, other libraries such as Seaborn, Scikit Learn, Pandas, and Numpy are also utilized. From the performance evaluation, we can observe that LSTM model shows better performance whether it is ADASYN or SMOTE method for handling class imbalance. Moreover, LSTM with SMOTE is showing better results than LSTM with ADASYN. Even the processing time of models is compared which is shown in Fig. 9. All the experiments are performed on 11th Gen Intel (R) Core (TM) i5-1135G7 @ 2.40GHz with 16 GB RAM. With the clear understanding of deep learning models, due to their


Table 2 Performance evaluation of all the models after ADASYN method

Model | Accuracy | Precision | Recall | F1 measure | AUC
GNB | 0.86 | 0.83 | 0.97 | 0.89 | 0.86
KNN | 0.88 | 0.86 | 0.96 | 0.91 | 0.86
LR | 0.87 | 0.83 | 0.98 | 0.90 | 0.86
CNN | 0.49 | 0.49 | 0.73 | 0.60 | 0.50
LSTM | 0.93 | 0.96 | 0.92 | 0.94 | 0.94
MLP | 0.86 | 0.85 | 0.93 | 0.89 | 0.83

Table 3 Performance evaluation of all the models after SMOTE method

Model | Accuracy | Precision | Recall | F1 measure | AUC
GNB | 0.73 | 0.79 | 0.75 | 0.77 | 0.83
KNN | 0.88 | 0.84 | 0.97 | 0.91 | 0.86
LR | 0.86 | 0.81 | 0.98 | 0.89 | 0.83
CNN | 0.49 | 0.49 | 0.73 | 0.60 | 0.50
LSTM | 0.97 | 0.99 | 0.96 | 0.98 | 0.98
MLP | 0.86 | 0.83 | 0.95 | 0.89 | 0.84

Fig. 6 Evaluating models using performance metrics after ADASYN

complex structure, it is presumed that the processing times of DL models will always be greater than those of ML models. It can be observed that the multi-layer perceptron model takes the highest time compared to the other models, while LSTM takes less time than MLP.


Fig. 7 Evaluating models using performance metrics after SMOTE

Fig. 8 Evaluating models using performance metrics

Fig. 9 Evaluating models using processing time taken


5 Conclusion We have successfully detected botnets using various ML and DL techniques and compared the models using various performance metrics on the CTU-13 dataset. Here, the focus is on binary classification of the dataset. We have seen that all the models show good results without overfitting issues, and the LSTM model with SMOTE shows the best results. In future work, we can apply the same model to other datasets and evaluate the models; we also intend to apply the proposed model to real-time datasets. This methodology is limited to binary classification; however, researchers can extend this work to identify the type of attack the botnet is attempting and to mitigate those attacks.

References 1. McKenzie H, Salvatore G, Amanda D, Muhammad R, Lyle U, Andrew SH, David HE, Lorenzo L, Brenda C (2021) Bots and misinformation spread on social media: a mixed scoping review with implications for COVID-19. J Med Internet Res 2. Xing Y, Shu H, Zhao H, Li D, Guo L (2021) Survey on Botnet detection techniques: classification, methods, and evaluation. Math Prob Eng. Article ID: 6640499 3. Anagha P, Arti D (2022) A comprehensive review of social Botnet detection techniques. In: International conference on augmented intelligence and sustainable systems (ICAISS), Trichy, India, pp 950–957. https://doi.org/10.1109/ICAISS55157.2022.10010877 4. García S, Grill M, Stiborek J, Zunino A (2014) An empirical comparison of botnet detection methods. Comput Secur 45:100–123 5. Sainath G, Ahmed D, Rasha SA, Ali A (2020) Bot detection using machine learning algorithms on social media platforms. In: 5th International conference on innovative technologies in intelligent systems and industrial applications. IEEE 6. Beny N, Anshitha N, Thomas B (2020) Performance evaluation of botnet detection using deep learning techniques. IEEE Explore 7. Suleiman Y, Mohammed KA, Annette S, Vinod P (2021) Deep learning techniques for android botnet detection. Electronics 10:519 8. Wan NHI, Syahid A, Ali S, Ondrej K, Ruben GC, Enrique H, Hamido F (2021) Multilayer framework for botnet detection using machine learning algorithms. IEEE Access 9. Mohammed A-S, Faisal S, Eman HA, Norah S (2022) An aggregated mutual information based feature selection with machine learning methods for enhancing IoT botnet attack detection. Sensors 22:185 10. Afnan A, Khalid A (2021) Botnet detection approach using graph-based machine learning. IEEE (2021) 11. He H, Bai Y, Garcia EA, Li S (2008) ADASYN: adaptive synthetic sampling approach for imbalanced learning. In: Proceedings of IJCNN, Hong Kong, pp 1322–1328 12. Nitesh C, Kevin B (2022) SMOTE: synthetic minority over-sampling technique. J Artif Intell Res

Wavelet Selection for Novel MD5-Protected DWT-Based Double Watermarking and Image Hiding Algorithm N. G. Resmi

1 Introduction Digital watermarking is the process of embedding bits of information in the form of a digital signature, a logo, or any kind of identity proof into a text, image, audio, or video file usually for ownership verification. Data shared across a communication channel requires protection at two levels: channel level and receiver level. Cryptography deals with securing the data at channel level by encrypting the data using highly secure encryption algorithms so that the data seems useless for a third-party attacker but could be properly decrypted by the receiver using appropriate decryption algorithms. However, the already decrypted data may be misused at the receiver end. Watermarking can be effectively applied in such situations to prevent any sort of misuse of data [1]. Apart from authentication, digital watermarking has a variety of applications. Watermarking is commonly used for copy protection and transaction monitoring, which employs a unique watermark for each copy of the image sent. Another use is broadcast monitoring, in which the broadcast material embeds sufficient data to detect when and where the specific content is broadcasted. A wide variety of other related security applications also benefit from the peculiar properties of watermarking which include robustness, the ability of a watermark to survive the most common image processing operations [1]. In this research work, a simple and efficient double watermarking scheme based on pseudorandom number sequence and DWT is proposed for image authentication which along with cryptographic hash function MD5 guarantees to protect the data during transmission and sharing. This study also analyzes different wavelet techniques and chooses the one which gives the best results. Rest of the paper is arranged as follows: Sect. 3 describes the image watermarking and protection scheme used N. G. Resmi (B) Muthoot Institute of Technology and Science, Ernakulam, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 I. J. Jacob et al. (eds.), Data Intelligence and Cognitive Informatics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-7962-2_4


in the proposed system. Results and discussion are given in Sect. 4. Section 5 gives conclusion and future work.

2 Related Works Several watermarking techniques have been devised so far [2–10], which include spatial domain and frequency or transform domain watermarking, fragile, semifragile and robust watermarking, visible and invisible watermarking, reversible and non-reversible watermarking, blind and non-blind watermarking, etc. One such categorization is based on whether the watermark embedding is done in the spatial or frequency domain. Spatial domain techniques [11, 12] directly manipulate the pixel values, and transform domain techniques deal with the image after conversion to any transform domain such as discrete Fourier transform (DFT), discrete Cosine transform (DCT) [13], discrete wavelet transform (DWT) [14, 15], integer wavelet transform (IWT), Arnold transform, and singular value decomposition (SVD) [16]. Spatial domain techniques are considered less robust compared to those using transform domain since they are highly prone to attacks and also create watermarks that are easily extractable compared to the latter. Hybrid methods, a combination of different techniques [17, 18], are applied by many researchers to obtain better results. Some of these include combining DWT and DCT [19], DWT and SVD [20–22], IWT and Arnold transform [19], IWT and SVD [21, 23], DWT, DCT and SVD [24], finite Ridgelet transform (FRT), DWT, SVD, particle swarm optimization (PSO), and Arnold transform [25], etc. Cryptographic hash functions are also used in some watermarking algorithms [12].

3 Methodology Used 3.1 Architecture of Proposed System The proposed system shown in Fig. 1 mainly consists of a watermark embedder and a hash code generator at the sender side, and a hash code generator and a tampering detector at the receiver side. It also has a watermark extractor, which may be used by the sender to trace any misuse by the receiver and may or may not be used at the receiver end based on the application. The watermark extractor is hence shown in Fig. 2, separate from the architecture of the proposed system. The input image, or unwatermarked image, is first given as input to the watermark embedder, which inserts a simple visible watermark in the image. It then performs DWT on the resultant image and embeds it with an invisible watermark, based on a watermarking key, in the transform domain. The watermarked image obtained is then given as input to a hash code generator, which uses the MD5 algorithm and generates a hash code. This hash


Fig. 1 Architecture of the proposed system

Fig. 2 Visible watermark embedding algorithm

code is shared with the receiver. The watermarked image is transmitted across the channel. At the receiver side, a hash code generator is used to compute hash code of the received watermarked image. If the hash codes do not match, then it indicates that the image was tampered during transmission; whereas if the hash code generated is same as that shared by the sender, then the image has been transmitted untampered, and therefore, the visible watermark undoubtedly verifies the owner of the image. This type of communication is private since the secret hash code is shared between the sender and receiver.
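A minimal sketch of this hash comparison using Python's hashlib is shown below, with illustrative file names; it shows only the tamper check, not the watermarking itself.

```python
# Hedged sketch of the MD5-based tamper check: the sender shares the hash of the
# watermarked image, and the receiver recomputes and compares it.
import hashlib

def md5_of_file(path):
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()

sender_hash = md5_of_file("watermarked.png")   # computed and shared by the sender
receiver_hash = md5_of_file("received.png")    # recomputed at the receiver side
print("Untampered" if sender_hash == receiver_hash else "Image was tampered in transit")
```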

3.2 Watermark Embedder
The watermark embedder takes the unwatermarked input image, inserts a visible watermark in the spatial domain, and then converts the result into the transform domain using DWT. A watermark image chosen by the sender is also converted into the transform domain using DWT. The watermark embedder has a pseudorandom number generator which takes as input a seed, or watermarking key, to generate a pseudorandom sequence of locations where the invisible watermark transform coefficients are embedded in the transform of the visible watermarked image. The seed used to generate the sequence is derived from the approximation coefficients of the visible watermarked image. Alpha is a constant between 0 and 1 which controls the degree of imperceptibility: the higher the value of alpha, the more visible the watermark. The watermarked image transmitted across the channel thus carries a visible watermark embedded in the spatial domain and an invisible watermark embedded in the transform domain. The invisible watermark embedding algorithm is shown in Fig. 3.

Fig. 3 Invisible watermark embedding algorithm
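A hedged Python sketch of this keyed, alpha-scaled embedding is given below; it is an assumed reconstruction rather than the authors' MATLAB code. PyWavelets supplies the 'bior3.1' DWT, the watermarking key seeds NumPy's generator to choose pseudorandom coefficient locations, and the watermark's approximation band is added into one detail band of the cover (the exact band and blending rule used in the paper may differ):

import numpy as np
import pywt

def embed_invisible(cover_gray, watermark_gray, key, alpha=0.1, wavelet="bior3.1"):
    # Transform both images; cA/cH/cV/cD are the approximation and detail bands.
    cA, (cH, cV, cD) = pywt.dwt2(cover_gray.astype(float), wavelet)
    wA, _ = pywt.dwt2(watermark_gray.astype(float), wavelet)

    # The watermarking key seeds the pseudorandom choice of embedding locations.
    rng = np.random.default_rng(key)
    flat = cH.reshape(-1)
    idx = rng.choice(flat.size, size=min(wA.size, flat.size), replace=False)

    # Additive embedding scaled by alpha (smaller alpha -> less perceptible).
    flat[idx] += alpha * wA.reshape(-1)[:idx.size]
    marked = pywt.idwt2((cA, (flat.reshape(cH.shape), cV, cD)), wavelet)
    return marked, idx

In the paper the seed itself is derived from the approximation coefficients of the visible-watermarked image rather than being supplied externally as it is in this sketch.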

3.3 Watermark Extractor
The watermark extractor, shown in Fig. 4, may be used by the sender to check whether illegal copies of the image were created and transmitted or distributed by the receiver after removal of the visible watermark. Since the image already carries a visible watermark, the receiver may not suspect the presence of a second, invisible watermark and may therefore subject the image to some kind of misuse. The sender, who still has the original image, may use the watermark extractor and try to extract the watermark that was embedded in the image. This scheme is a non-blind watermarking scheme since it uses original image information to retrieve the watermark (Fig. 5).

Fig. 4 Invisible watermark extractor
Fig. 5 Watermark extraction algorithm
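A matching non-blind extraction can be sketched as follows, again as an assumed Python reconstruction rather than the published method; the same key regenerates the coefficient locations, the sender's copy of the image supplies the reference coefficients, and wm_shape is an assumed watermark size that must agree with the embedding step:

import numpy as np
import pywt

def extract_invisible(received_gray, original_gray, key, alpha=0.1,
                      wavelet="bior3.1", wm_shape=(64, 64)):
    # Transform the received image and the sender's reference image.
    _, (rH, _, _) = pywt.dwt2(received_gray.astype(float), wavelet)
    _, (oH, _, _) = pywt.dwt2(original_gray.astype(float), wavelet)

    # Regenerate the same pseudorandom locations from the watermarking key.
    rng = np.random.default_rng(key)
    n = wm_shape[0] * wm_shape[1]
    idx = rng.choice(rH.size, size=min(n, rH.size), replace=False)

    # Coefficient difference rescaled by alpha gives the watermark's approximation band.
    wA = ((rH.reshape(-1)[idx] - oH.reshape(-1)[idx]) / alpha).reshape(wm_shape)
    zeros = np.zeros_like(wA)
    return pywt.idwt2((wA, (zeros, zeros, zeros)), wavelet)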

4 Experimental Results
4.1 Pseudorandomness in Watermark Embedding
The invisible watermark embedding process embeds the watermark at random positions in a cover image. The randomness (in fact, pseudorandomness) is introduced to decrease the probability of detection of the watermark by an unauthorized person. The seed used to generate the pseudorandom sequence serves as the watermarking key. Figure 6a shows an example of an image and a watermark image used for watermarking, Fig. 6b shows the watermarked image obtained by applying the 'bior3.1' wavelet and the watermark extracted without using the watermarking key, and Fig. 6c shows the watermarked image and the watermark extracted using the key.

Fig. 6 Extraction of watermark: a original image and watermark image, b watermarked image and watermark extracted without using the key, and c watermarked image and watermark extracted using the key

4.2 Performance Evaluation Metrics
To evaluate the performance of the algorithm, mean square error (MSE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM) values are computed for the watermarked image and the extracted watermarks [21] of size M x N. MSE is calculated as

\mathrm{MSE} = \frac{\sum_{M,N} [I(m,n) - D(m,n)]^2}{M \times N}   (1)

where I represents the original image and D represents the distorted image. PSNR, considering a bit depth of 8 bits, and SSIM are calculated as

\mathrm{PSNR} = 10 \log_{10}\left(\frac{255^2}{\mathrm{MSE}}\right)   (2)

\mathrm{SSIM} = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}   (3)

where \mu_x, \mu_y, \sigma_x, \sigma_y, and \sigma_{xy} are, respectively, the local means, standard deviations, and cross-covariance for images x and y, C_1 = (0.01 L)^2, C_2 = (0.03 L)^2, and L is the specified dynamic range of the pixels. For an MSE value of 1, the PSNR obtained is

\mathrm{PSNR} = 10 \log_{10}\left(\frac{255^2}{1}\right) = 10 \log_{10}(65025) = 10 \times 4.813 = 48.13\ \mathrm{dB}

Hence, an MSE value as low as 1 already yields a PSNR of 48.13 dB, and as the MSE decreases from 1 toward 0, the PSNR increases from 48.13 dB toward infinity. Higher PSNR values are considered better for many applications; however, for the present application, PSNR values above 20 dB are acceptable, as seen from the experiments.
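For reference, these three metrics can be computed as in the following Python sketch (the paper's own evaluation was done in MATLAB; NumPy and scikit-image are assumed here, and the random test images are placeholders):

import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def mse(original, distorted):
    # Eq. (1): mean of squared pixel differences between two equally sized images.
    return np.mean((original.astype(float) - distorted.astype(float)) ** 2)

I = np.random.randint(0, 256, (256, 256), dtype=np.uint8)                   # stand-in original
D = np.clip(I.astype(int) + np.random.randint(-3, 4, I.shape), 0, 255).astype(np.uint8)

print("MSE :", mse(I, D))
print("PSNR:", peak_signal_noise_ratio(I, D, data_range=255))               # Eq. (2)
print("SSIM:", structural_similarity(I, D, data_range=255))                 # Eq. (3)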

4.3 Selection of Wavelet
Different wavelet families were used to watermark the images in the DWT domain. MSE and PSNR values were calculated for each wavelet in each of the wavelet families, for both the watermarked image and the watermark. The results thus obtained are shown in Figs. 7 and 8. The wavelet which produced the minimum MSE, or equivalently the maximum PSNR (around 30 dB without denoising), was found to be 'bior3.1', and it was hence selected to watermark the images.
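The selection procedure can be pictured with a loop of the following kind, a hedged sketch that scores each discrete wavelet offered by PyWavelets with a simple additive embedding and keeps the one giving the highest PSNR; the paper's actual MATLAB loop and embedding details are not published, so the function below is only illustrative:

import numpy as np
import pywt
from skimage.metrics import peak_signal_noise_ratio

def watermarked_psnr(cover, watermark, wavelet, alpha=0.1):
    # Embed the watermark's approximation band into the cover's horizontal detail
    # band with a simple additive rule and return the PSNR of the watermarked image.
    cA, (cH, cV, cD) = pywt.dwt2(cover.astype(float), wavelet)
    wA, _ = pywt.dwt2(watermark.astype(float), wavelet)
    h, w = min(cH.shape[0], wA.shape[0]), min(cH.shape[1], wA.shape[1])
    cH[:h, :w] += alpha * wA[:h, :w]
    marked = pywt.idwt2((cA, (cH, cV, cD)), wavelet)[:cover.shape[0], :cover.shape[1]]
    return peak_signal_noise_ratio(cover.astype(float), np.clip(marked, 0, 255), data_range=255)

cover = np.random.randint(0, 256, (256, 256)).astype(np.uint8)              # stand-in images
mark = np.random.randint(0, 256, (128, 128)).astype(np.uint8)
scores = {wv: watermarked_psnr(cover, mark, wv) for wv in pywt.wavelist(kind="discrete")}
print("best wavelet:", max(scores, key=scores.get))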

4.4 Watermark Embedding and Extraction The proposed method was tested using a set of randomly selected cover images and a set of watermarks and using ‘bior3.1’ wavelet. Figure 9 shows the visible and invisible watermarks used for watermarking the ‘Desert.jpg’ image. Figure 10 shows the original image, image obtained after embedding the visible watermark, final watermarked image, and the watermark extracted from the untampered watermarked image.


Fig. 7 PSNR values obtained using different wavelet families for watermark image. a Daubechies orthogonal wavelets, b biorthogonal wavelets, c reverse biorthogonal wavelets, d Coiflets, e Fejer-Korovkin wavelets, f Symlets and g highest PSNR values obtained (db1 is same as Haar)


Fig. 8 MSE and PSNR values obtained using different wavelet families for cover image. a Daubechies orthogonal wavelets, b biorthogonal wavelets, c Reverse biorthogonal wavelets, d Coiflets, e Symlets, f Fejer-Korovkin wavelets and g least MSE/ highest PSNR values obtained


Fig. 9 a Visible watermark, b invisible watermark used for watermarking ‘Desert.jpg’ image

Fig. 10 Original image, visible watermarked image, final watermarked image, and extracted watermark

Figure 11 shows the image received by the owner of the image from a different source with its visible watermark removed, together with the watermark extracted from it. The proposed method is able to extract the watermark from the received, visible-watermark-removed image with an acceptable PSNR without denoising. The MSE and PSNR values obtained for the watermarked image and for the watermarks extracted from tampered and non-tampered images are given in Table 1.

Fig. 11 Image received without visible watermark and extracted invisible watermark

Table 1 MSE and PSNR values obtained for watermarked image and watermarks extracted (without median filtering) for tampered and non-tampered images
Image                                       MSE      PSNR
Watermarked image                           45.962   31.507
Extracted watermark (non-tampered image)    243.09   24.618
Extracted watermark (tampered image)        256.19   24.424

The experiments were executed using the MATLAB online platform. The cover images used initially were mainly of natural scenes, and the watermarks used were simple text written over a white background. The choice of wavelet for performing DWT was made after executing the code for wavelets belonging to different wavelet families available for use in MATLAB. The algorithm was then tested using a set of color images from the USC-SIPI image database (Volume 3: Miscellaneous) [26] and later for all color and grayscale
images in it. The database consists of 44 images of which 16 are color and 28 are monochrome (4.2.04 (lena), 4.2.02 (tiffany), elaine.512, numbers.512, and testpat.1k. were removed). The PSNR values obtained for the watermarked image, watermark extracted from untampered watermarked image and watermark extracted from visible watermark removed image without applying median filter on the extracted watermarks for a set of color images from USC-SIPI database are shown in Table 2. Median filtering was applied on watermark extracted from untampered watermarked image and watermark extracted from visible watermark removed (tampered) image to remove noise for all the 39 images in the database, and the PSNR and SSIM values obtained are shown in Table 3. It can be noted that the PSNR values were improved upon using median filter on the extracted watermarks. Figure 12 shows the comparison of PSNR and SSIM values obtained by watermarking color and grayscale images in the USC-SIPI database.

Table 2 PSNR values without applying median filter on the extracted watermarks
Image         PSNR (Cover Image)   PSNR (Non-tampered watermark)   PSNR (Tampered watermark)
4.1.01.tiff   31.44                28.22                           27.60
4.1.02.tiff   31.43                28.67                           28.22
4.1.03.tiff   31.58                18.61                           9.11
4.1.04.tiff   31.75                15.50                           9.42
4.1.05.tiff   31.52                27.01                           26.47
4.1.06.tiff   31.56                24.04                           22.75
4.1.07.tiff   31.53                28.89                           28.89
4.1.08.tiff   31.53                28.89                           28.89
4.2.01.tiff   31.57                21.99                           20.56
4.2.03.tiff   31.66                21.25                           17.81
4.2.05.tiff   31.44                29.15                           29.15
4.2.06.tiff   31.53                26.45                           25.83
4.2.07.tiff   31.44                28.92                           28.91

4.5 Image Hiding
Rather than a text image watermark, this algorithm may also be used to hide a secret image inside another image, as evident from Fig. 13. In this example, a visible
watermark is embedded on image '4.1.01.tiff', and then image '4.2.07.tiff' is hidden in it. Figure 14 shows the hidden image (watermark) extracted from the visible-watermark-removed image (Table 4). The PSNR values obtained are acceptable (since even an MSE value of 1 yields a PSNR of only 48.13 dB, as shown in Sect. 4.2), and the extracted watermark is very similar to the original, as seen from the difference image in Fig. 15a, which is the difference between the original secret image and the extracted secret image (for the visible-watermark-removed image). Figure 15b shows the normalized 2D cross-correlation between the original and extracted watermarks. Figure 16 is an example of hiding a color image inside a grayscale image. It can be noted that the hidden image can be extracted effectively from the cover image without much degradation.

Fig. 13 Watermark embedding and extraction using invisible color watermark (secret image)
Fig. 14 Color watermark extracted from visible watermark removed image

Table 4 MSE and PSNR values obtained for watermarked image and color watermarks extracted for tampered and non-tampered images (see Figs. 12 and 13)
Image                                       MSE      PSNR
Watermarked image                           13.396   37.572
Extracted watermark (non-tampered image)    84.198   28.901
Extracted watermark (tampered image)        87.718   28.714

Fig. 15 a Difference image (between original and extracted watermarks) and b normalized 2D cross correlation between original and extracted watermarks
Fig. 16 Hiding a color image inside a grayscale image

Table 3 PSNR values along with SSIM values obtained after applying median filter on the extracted watermarks (for all 39 images)
Image            PSNR1   SSIM1   PSNR2   SSIM2   PSNR3   SSIM3
4.1.01.tiff      31.44   0.98    30.71   0.97    30.38   0.96
4.1.02.tiff      31.43   0.97    31.03   0.97    30.94   0.97
4.1.03.tiff      31.58   0.98    19.63   0.83    15.45   0.79
4.1.04.tiff      31.75   0.99    16.43   0.61    14.22   0.58
4.1.05.tiff      31.52   1.00    29.42   0.93    29.06   0.92
4.1.06.tiff      31.56   0.99    26.80   0.81    26.00   0.80
4.1.07.tiff      31.53   0.99    30.70   0.96    30.70   0.96
4.1.08.tiff      31.53   1.00    30.70   0.96    30.70   0.96
4.2.01.tiff      31.57   1.00    26.42   0.74    25.10   0.72
4.2.03.tiff      31.66   1.00    25.67   0.55    23.21   0.46
4.2.05.tiff      31.44   0.98    36.36   0.98    36.36   0.98
4.2.06.tiff      31.53   0.99    32.64   0.86    32.13   0.84
4.2.07.tiff      31.44   1.00    36.03   0.98    36.03   0.98
5.1.09.tiff      31.51   0.99    30.20   0.96    28.95   0.96
5.1.10.tiff      31.59   0.99    25.52   0.83    24.13   0.81
5.1.11.tiff      31.53   0.97    30.64   0.96    30.64   0.96
5.1.12.tiff      32.06   0.98    18.17   0.55    17.82   0.54
5.1.13.tiff      39.75   1.00    1.88    0.05    0.98    0.01
5.1.14.tiff      31.49   0.99    25.40   0.90    20.52   0.86
5.2.08.tiff      31.55   0.98    22.87   0.91    18.46   0.89
5.2.09.tiff      31.53   0.99    23.43   0.80    20.50   0.76
5.2.10.tiff      31.54   0.99    24.61   0.81    20.14   0.76
5.3.01.tiff      31.48   0.98    36.83   0.97    35.78   0.97
5.3.02.tiff      31.49   0.99    36.89   0.97    36.83   0.97
7.1.01.tiff      31.44   0.98    36.18   0.98    36.17   0.98
7.1.02.tiff      31.45   0.97    36.16   0.98    36.06   0.98
7.1.03.tiff      31.48   0.98    36.35   0.98    36.35   0.98
7.1.04.tiff      31.44   0.98    36.65   0.98    36.63   0.98
7.1.05.tiff      31.44   0.99    36.18   0.98    36.13   0.98
7.1.06.tiff      31.50   0.99    36.05   0.98    35.96   0.98
7.1.07.tiff      31.44   0.99    36.19   0.98    36.19   0.98
7.1.08.tiff      31.48   0.98    36.35   0.98    36.35   0.98
7.1.09.tiff      31.48   0.99    36.35   0.98    36.29   0.98
7.1.10.tiff      31.44   0.98    36.65   0.98    36.65   0.98
7.2.01.tiff      31.48   0.97    37.90   0.98    37.90   0.98
boat.512.tiff    31.49   0.98    34.43   0.96    32.71   0.95
gray21.512.tiff  31.94   0.96    16.10   0.53    10.48   0.41
house.tiff       31.46   0.99    32.94   0.95    30.64   0.94
ruler.512.tiff   41.17   1.00    1.94    0.05    0.94    0.05
1 PSNR and SSIM values for watermarked image and cover image
2 PSNR and SSIM values for extracted non-tampered watermark and original watermark
3 PSNR and SSIM values for tampered watermark and original watermark

Fig. 12 a PSNR and SSIM values obtained for color images and b PSNR and SSIM values obtained for grayscale images

5 Conclusion and Future Work
This research work proposes a simple approach of hashing with double watermarking using DWT for image authentication. The proposed method helps the receiver to effectively identify whether any kind of tampering attempt has been made on the image during transmission, by computing a hash code using the MD5 algorithm and comparing it with the hash code shared by the sender. The visible watermark in an image received untampered helps the receiver verify the owner of the image. The invisible watermark, whose presence is unknown to the receiver, helps the owner of the image to track any sort of misuse of the image by the receiver after removing its visible watermark. The owner can successfully extract the watermark from the received image in an efficient manner. The method can also be applied to successfully and efficiently hide one image inside another. The method presented in the paper has been applied successfully on randomly selected images using a few chosen watermarks and also on a set of images from the USC-SIPI image database. It could be tested on a wider variety of images using watermarks of varying nature in the future. Moreover, the robustness of the proposed watermarking scheme to common image processing operations could be tested using different types of images and watermarks. However, the proposed method can be used effectively by the owner to check whether an image received without a visible watermark is actually owned by him and has been redistributed by the receiver after removal of the visible watermark. The PSNR values obtained were in the acceptable range. The algorithm can also be used to hide one image inside another, which may also facilitate a type of secret communication.

References 1. Bloom J, Fridrich J, Kalker T, Cox I, Miller M (2007). Digital Watermark Steganography. https://doi.org/10.1604/9780123725851 2. Begum M, Uddin MS (2020) Digital image watermarking techniques: a review. Information 11:110. https://doi.org/10.3390/info11020110 3. Caldelli R, Filippini F, Becarelli R (2010) Reversible watermarking techniques: an overview and a classification. EURASIP J Inf Secur 2010:1–19. https://doi.org/10.1155/2010/134546 4. Douglas M, Bailey K, Leeney M, Curran K (2017) An overview of steganography techniques applied to the protection of biometric data. Multimedia Tools Appl 77:17333–17373. https:// doi.org/10.1007/s11042-017-5308-3 5. Hartung F, Kutter M (1999) Multimedia watermarking techniques. Proc IEEE 87:1079–1107. https://doi.org/10.1109/5.771066 6. Lyatsky S (2018) Digital watermarking techniques. Master Thesis, Department of Computer Science, School of Graduate Studies, Alabama A & M University 7. Potdar VM, Song H, Elizabeth C (2005) A survey of digital image watermarking techniques. In: 3rd IEEE International conference on industrial informatics (INDIN 2005), pp 709–716. https://doi.org/10.1109/INDIN.2005.1560462 8. Rey C, Dugelay J-L (2002) A survey of watermarking algorithms for image authentication. EURASIP J Adv Signal Process 2002. https://doi.org/10.1155/s1110865702204047 9. Tao H, Chongmin L, Mohamad Zain J, Abdalla AN (2014) Robust image watermarking theories and techniques: a review. J Appl Res Technol 12:122–138. https://doi.org/10.1016/s1665-642 3(14)71612-8 10. Tyagi S, Singh HV, Agarwal R, Gangwar SK (2016) Digital watermarking techniques for security applications. In: 2016 International conference on emerging trends in electrical electronics & sustainable energy systems (ICETEESES). https://doi.org/10.1109/iceteeses.2016. 7581413 11. Chan C-K, Cheng LM (2004) Hiding data in images by simple LSB substitution. Pattern Recogn 37:469–474. https://doi.org/10.1016/j.patcog.2003.08.007 12. Chang Y-J, Wang R-Z, Lin J-C (2009) A sharing-based fragile watermarking method for authentication and self-recovery of image tampering. EURASIP J Adv Signal Process 2008. https://doi.org/10.1155/2008/846967 13. Lenarczyk P, Piotrowski Z (2013) Parallel blind digital image watermarking in spatial and frequency domains. Telecommun Syst 54:287–303. https://doi.org/10.1007/s11235-0139734-x 14. Guitart Pla O, Lin ET, Delp III EJ (2004) A wavelet watermarking algorithm based on a tree structure. SPIE Proc. https://doi.org/10.1117/12.531459 15. Vaidya SP, Mouli PVSSRC (2015) Adaptive digital watermarking for copyright protection of digital images in wavelet domain. Procedia Comput Sci 58:233–240. https://doi.org/10.1016/ j.procs.2015.08.063 16. Vaishnavi D, Subashini TS (2015) Robust and invisible image watermarking in RGB color space using SVD. Procedia Comput Sci 46:1770–1777. https://doi.org/10.1016/j.procs.2015. 02.130 17. Alshoura WH, Zainol Z, Teh JS, Alawida M, Alabdulatif A (2021) Hybrid SVD-based image watermarking schemes: a review. IEEE Access 9:32931–32968. https://doi.org/10.1109/acc ess.2021.3060861 18. Begum M, Uddin MS (2020) Analysis of digital image watermarking techniques through hybrid methods. Adv Multimedia 2020:1–12. https://doi.org/10.1155/2020/7912690

58

N. G. Resmi

19. Benoraira A, Benmahammed K, Boucenna N (2015) Blind image watermarking technique based on differential embedding in DWT and DCT domains. EURASIP J Adv Signal Process 2015. https://doi.org/10.1186/s13634-015-0239-5 20. Du M, Luo T, Li L, Xu H, Song Y (2019) T-SVD-based robust color image watermarking. IEEE Access 7:168655–168668. https://doi.org/10.1109/access.2019.2953878 21. Durafe A, Patidar V (2022) Development and analysis of IWT-SVD and DWT-SVD steganography using fractal cover. J King Saud Univ Comput Inf Sci 34:4483–4498. https://doi.org/10. 1016/j.jksuci.2020.10.008 22. Poonam, Arora SM (2018) A DWT-SVD based robust digital watermarking for digital images. Procedia Comput Sci 132:1441–1448. https://doi.org/10.1016/j.procs.2018.05.076 23. Alshoura WH, Zainol Z, Teh JS, Alawida M (2020) A new chaotic image watermarking scheme based on SVD and IWT. IEEE Access 8:43391–43406. https://doi.org/10.1109/access.2020. 2978186 24. Ansari A, Saavedra G, Martinez-Corral M (2020) Robust light field watermarking by 4D wavelet transform. IEEE Access 8:203117–203133. https://doi.org/10.1109/access.2020.303 5912 25. Cheema AM, Adnan SM, Mehmood Z (2020) A novel optimized semi-blind scheme for color image watermarking. IEEE Access 8:169525–169547. https://doi.org/10.1109/access. 2020.3024181 26. SIPI Image Database—Misc. http://sipi.usc.edu/database/database.php?volume=misc. Last accessed 05 Jun 2021

Chaotic Map Based Encryption Algorithm for Secured Medical Data Analytics S. Sumathi, S. Gopika, S. Nivedha, and R. Kalaimathi

1 Introduction
Almost all companies that operate online now use the internet to transmit digital information, making it a prevalent medium of exchange. To provide assurance of information security, it is crucial to protect information that travels across open, vulnerable networks [1]. In particular, patient medical information stored in the cloud and transmitted between two hospitals should be secured. It is therefore crucial to protect sensitive data using cryptography. Cryptography alters data by performing encryption with an algorithm that only the intended parties can comprehend; this algorithm is commonly driven by a key. Encrypted data is the end product of the process, and decryption restores the original data. Cipher systems range from traditional ones, such as the Caesar and Trifid ciphers, to modern schemes; with the advent of digital computers, cryptography has come to cover a variety of digital formats, such as text, video, and others. Attackers have attempted to intercept medical photographs in order to determine whether a patient has any abnormalities. With the assistance of steganography techniques, one can conceal sensitive information within various forms of communication; "cover object" is the name given to the medium that provides the camouflage. The cover object is frequently any digital object with the ability to conceal data, such as a photograph, audio file, or video, as opposed to a plain text that cannot mask the existence of confidential data. Text, video, music, or an image is frequently the primary piece of information (in digital form) that we would like to conceal within the cover object. The cover image together with the masked information is called a stego image. Image steganography is a type of steganography where an image serves as the cover object. The most crucial factor to pay attention to when creating an image steganography technique is that the appearance of the image may change significantly after concealing the secret message within the cover image, and even an untrained eye may notice that the image differs from the original. An image steganography technique is said to be excellent if the cover image does not visually reveal the modifications made to it when concealing a secret message. Nonetheless, statistical analysis can be used to detect the changes. The two main areas where steganography is frequently employed are the transform domain and the spatial domain. In the transform domain, transformed coefficients are produced by translating the cover object into a different frequency domain; these coefficients are altered to hide the private information and then transformed in the reverse direction to create the stego signal. In the spatial domain, the confidential data is concealed by modifying the intensity values of pixels directly. Techniques in the spatial domain, as opposed to those in the transform domain, are more susceptible to attacks because of variations in the actual sample values.

2 Related Work
The relevant research on cryptography and its associated methods is covered in this section. The idea of cryptography, a procedure by which an end user secures data so that only the intended recipient can read it, was presented by Kester et al. [2], where both public and private keys are utilized. The terms plaintext, cipher text, and other terminology related to data security, such as confidentiality, integrity, and availability, were defined by Kester et al. in their paper published in 2013 [3]. In [2, 3], ECG in combination with the Hill cipher and AES is proposed as a hybrid technique; a hybrid algorithm combines several different kinds of algorithms. Dixit et al. [4] employed the AES algorithm, with the cost and security of AES as the evaluation criteria, and explained the AES encryption and decryption processes. Several kinds of symmetric and asymmetric algorithms were covered in [4, 5], where AES and DES are compared using several factors presented in tabular form. The weaknesses of conventional symmetric and asymmetric algorithms were discussed by Mishra et al. [6], along with a new method that combines ECC and Blowfish to offer a high level of security and confidentiality for data. A multilevel security approach that gives higher protection than single-level encryption was presented in [7]; the suggested algorithm addresses both speed and safety. Several cloud model types, with their benefits and drawbacks, were discussed in [7, 8], where concerns about data security when user data resides in the cloud are also examined using other criteria; the authors described how Blowfish works, improved RSA, compared the new method in terms of execution time, and showed that it provides quick encryption and effective key management. In order to achieve secure cloud storage of healthcare information, Chinnasamy and Deepalakshmi [9] explored a mixed approach and proposed improvements to the RSA algorithm's key generation mechanism; the speed of the recommended method is quicker than that of existing RSA methods. A hybrid design for a medical system based on the one-time pad and the RSA method, which is also concerned with the cryptographic approach, was described in [10, 11]; in addition to the encryption technique for data storage and the decryption procedure for data retrieval, the authors developed a structure for cloud-related data storage and retrieval. The authors of [12, 13] discussed security using various cryptographic approaches; they evaluate many methods and suggest an elliptic curve cryptography-based method to guarantee security with a shorter key length.

3 Existing Techniques
The program first examines each matrix component to build an image object, i.e., the photographs are taken as input. The different types of medical images are shown in Fig. 1. The size of the image is then determined, and after the size computation the entropy is estimated, which makes it tougher for an intruder to decipher the real image. The mean of the plain image is then determined, and the secret key (SK) is formed from it. The red, green, and blue components are indicated separately. The image is then reshaped and altered: the r, g, and b arrays are combined into arrays of the same dimensions as the original image's r, g, and b components, and the data is subsequently converted back into image format and blurred.

3.1 Algorithm
An algorithm is a collection of rules, methods, or techniques used for solving a problem. A predefined mathematical procedure combined with a set of encryption rules is used to implement encryption: data is turned into cipher text, and the same set of rules is called upon to remodel the data back to its distinctive shape. Algorithms of this kind belong to the field of cryptography, which is widely used for securing and transferring data.

3.2 Cryptography
For the receiver to understand the data as it was transmitted, it must be stored and carried in a specified format based on advanced mathematical ideas. Encryption is the central concept of cryptography. A cryptographic approach describes the action of encoding information: the sender's plain material is encrypted into cipher text and processed via a data exchange channel, so that the plaintext content is not exposed to an eavesdropper. The cipher text is decoded at the other end to reveal the information that was originally transmitted.

Fig. 1 Different types of medical images: X-ray, CT scan, MRI scan, thermograph, and PET scan



3.3 Encryption
Encryption is the technique of encoding data with the aid of cryptography; it is performed to protect the information.

3.4 Decryption
Decryption is the process that uses a cryptographic technique to recover the original data from the encrypted data [1].

3.5 Key
For data encryption and decryption, a secret piece of information, such as a password, is used; this is the key. Several different types of keys are used in cryptography.

3.6 Steganography
Steganography aims to protect data from unauthorized persons and outsiders. It differs from encryption in that snoopers will not be able to detect or decipher the information hidden in a photograph, text, or other piece of media.

3.7 Symmetric Encryption
The ability to encipher and decipher data with just a single secret key makes this encryption approach very efficient. Symmetric encryption is a time-tested and widely used technique. The secret key may be a number, a phrase, or a random string of characters; it is combined with the textual information or message to encapsulate the content in a certain way. Both the sender and the receiver must know the key used to encrypt and decrypt the data. A few examples of symmetric encryption are AES, DES, 3DES, and RC4.
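As a hedged illustration of the single-shared-key principle only (not of the chaotic-map scheme proposed below), the following Python sketch uses the third-party cryptography package's Fernet recipe, which is built on AES; the file name is purely illustrative:

from cryptography.fernet import Fernet

key = Fernet.generate_key()                 # the single shared secret key
cipher = Fernet(key)

with open("mri_scan.png", "rb") as f:       # illustrative medical image file
    plaintext = f.read()

token = cipher.encrypt(plaintext)           # sender side: ciphertext sent over the channel
recovered = Fernet(key).decrypt(token)      # receiver side: the same key decrypts
assert recovered == plaintext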

4 Proposed Methodology
The overall process undergone by the medical data during transmission is shown in Fig. 2.

Fig. 2 Process flow diagram (original image, secret key, and user-defined key → encryption technique → QR code → decryption technique → reconstructed image)

4.1 Secret Key Generation
In preparation for pixel rearrangement, all of the image's pixels are first stored in an array, and this array is then ordered. Using the classifying method, the pixels are mixed and sorted in increasing order of their (R, G, B) values, with (0, 0, 0) pixels, if any, appearing at the top and (255, 255, 255) pixels, if any, appearing in the last position. Because the goal of sorting is to lessen the correlation between pixel values, the resulting order is independent of the original arrangement of the R, G, and B values. The specified correlation technique, which sorts the pixel values, is superior to block shifting; the key generation process is shown in Fig. 3. The encryption method becomes robust when a proportionally large key can be used. Because the key used in the suggested technique is composed of many blocks (i.e., RB), each containing a certain number of bytes, the key is quite large. This increases the number of bits used by the proposed technique to represent the secret key. The complexity of the key is determined by the degree of randomness and by the combined use of the key in the execution of operations in the encryption technique. The suggested method makes use of an extremely random key that is taken directly from the image. Additionally, it employs keys at two different stages of the operations' implementation.

Fig. 3 Secret key generation (pixel swapping followed by secret key generation)
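The paper does not spell out how the sorted pixels are assembled into key blocks, so the following Python sketch is only one possible, hypothetical reading of the description: pixels are sorted lexicographically by their (R, G, B) values, and evenly spaced blocks of the sorted bytes are concatenated into a long, image-dependent key:

import numpy as np

def generate_secret_key(rgb_image, block_bytes=16, n_blocks=8):
    # Sort every pixel by (R, G, B) to break the correlation between pixel values.
    pixels = rgb_image.reshape(-1, 3)
    order = np.lexsort((pixels[:, 2], pixels[:, 1], pixels[:, 0]))  # R primary, then G, then B
    sorted_bytes = pixels[order].astype(np.uint8).tobytes()

    # Take evenly spaced blocks of the sorted bytes and join them into the key.
    step = max(1, len(sorted_bytes) // n_blocks)
    blocks = [sorted_bytes[i * step : i * step + block_bytes] for i in range(n_blocks)]
    return b"".join(blocks)

img = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)        # stand-in image
key = generate_secret_key(img)
print(len(key), "key bytes derived from the image")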


4.2 Hahn's Discrete Orthogonal Moment
Discrete orthogonal polynomials serve as the moment kernels for Hahn moments. We employ the weighted versions of the polynomials to calculate the moments, which increases numerical stability. As shown in Eq. (1), the polynomials are employed in a separable manner, with one set of polynomials for each dimension. Given a digitized image f(x, y) of size N x N, the (m + n)th order Hahn moment of the image is

H_{mn} = \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x, y)\, \bar{h}_m^{(\mu,\nu)}(x, N)\, \bar{h}_n^{(\mu,\nu)}(y, N)   (1)

This indicates that the image can be completely reconstructed by calculating its discrete orthogonal moments up to order 2N − 2. This property makes the discrete orthogonal moments superior to the conventional continuous orthogonal moments. To extract the image moment set {H_{mn}} (0 ≤ m, n ≤ N − 1), we can simply use the matrix notation

H = H_x^T f H_y

where f denotes the N x N image matrix, H_x denotes Hahn's discrete orthogonal moment matrix along the x-axis, and H_y denotes Hahn's discrete orthogonal moment matrix along the y-axis. Hahn moments can be set into a global feature extraction mode. In this experiment, the Hahn moments of the image are first calculated and, subsequently, their image representation power is verified by reconstructing the image from the moments and measuring the difference between the original image and the reconstructed image using the mean squared error (MSE). The images are converted to gray-scale format and are each resized to M x N = 128 x 128. A sample image and its reconstructed versions using Hahn moments are shown in the figure. This makes Hahn moments a unique set of feature descriptors in their own right. This paper aims to highlight the generalization property of Hahn moments and to show how this property can be properly exploited to make Hahn moments a useful set of image feature descriptors. In addition, we have also shown how Hahn moments can be incorporated into the framework of normalized convolution to analyze local structures of irregularly sampled signals. This is built upon the fact that the set of Hahn polynomials spans a weighted space defined by the related weight function, which for the case of Hahn polynomials resembles the Gaussian function.
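In matrix form, the moment computation and the reconstruction check reduce to two products, as in the Python sketch below; hahn_basis is a hypothetical helper that would have to return the N x N matrix of weighted Hahn polynomial values (it is not defined in the paper), and orthonormality of that basis is assumed:

import numpy as np

def moments_and_reconstruction(f, hahn_basis, mu=0, nu=0):
    # f: N x N grayscale image; hahn_basis(N, mu, nu)[x, m] = weighted Hahn polynomial
    # of order m evaluated at x (columns assumed orthonormal).
    N = f.shape[0]
    Hx = hahn_basis(N, mu, nu)          # basis along the x-axis
    Hy = hahn_basis(N, mu, nu)          # basis along the y-axis
    H = Hx.T @ f @ Hy                   # moment matrix, H = Hx^T f Hy
    f_rec = Hx @ H @ Hy.T               # inverse transform from the full moment set
    mse = np.mean((f - f_rec) ** 2)     # reconstruction error used to verify the moments
    return H, f_rec, mse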


4.3 QR Code
The QR code is a two-dimensional barcode. Its strong qualities make it handy in a variety of applications. The advantages of the QR code include improved data-encoding capacity, readability from a distance and from any angle (0 to 360°), compact size, and robust resistance to noise and damage. For these reasons, various industries produce QR codes for different services and products. The field of information security is progressively using QR codes for image copyright protection. QR codes are applied to secure the patient diagnostic information available during remote transmission and the ECG signals recorded in medical facilities. Strong encryption and zero-watermarking strategies use this code for image copyright protection. In this project, the information about the protected image is encoded into a QR code.
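A QR code carrying such information can be produced with a couple of lines, for example with the third-party qrcode package (an assumption; the paper does not state which tool was used), and the payload below is purely illustrative:

import qrcode

info = "patient-id:12345;image-hash:9f2c..."     # illustrative protected-image metadata
qr_img = qrcode.make(info)                       # build the two-dimensional barcode
qr_img.save("protected_image_qr.png")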

4.4 Modified Logistic Map
The logistic map is a one-dimensional chaotic system based on the discrete form of a straightforward nonlinear equation:

x_{n+1} = r x_n (1 - x_n)

The state x_n takes values in [0, 1]. When the parameter r is equal to 4, the map remains on [0, 1] and behaves chaotically. The bifurcation diagram of the logistic map is shown in Fig. 4. A modification of the logistic map F: (−1, 1) → (−1, 1) is considered here: the initial value may be a positive or a negative real number, and each iteration likewise produces alternately positive and negative real values. The modified logistic map is divided into two functions, g1(x_n) and h1(x_n). Figure 5 shows the Lyapunov exponent graph for the modified logistic map.
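Since the exact form of the modified map (its g1 and h1 branches) is not given in full, the sketch below only iterates the base recurrence x_{n+1} = r x_n (1 − x_n) with r = 4 to produce a chaotic keystream; it is an illustrative assumption, not the proposed modified map:

import numpy as np

def logistic_sequence(x0, r=4.0, n=1000, burn_in=100):
    # Iterate the logistic map; r = 4 gives fully developed chaos on [0, 1].
    x = x0
    seq = []
    for i in range(n + burn_in):
        x = r * x * (1.0 - x)
        if i >= burn_in:                # discard transients before using the values
            seq.append(x)
    return np.array(seq)

keystream = (logistic_sequence(x0=0.6180339887) * 255).astype(np.uint8)  # e.g. an XOR mask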

Fig. 4 Bifurcation diagram
Fig. 5 Lyapunov exponent graph

5 Results and Discussions
A large number of randomly selected medical images of the types represented in Fig. 1 were tested, and the encryption time was calculated for different medical images of various sizes. These encryption times are presented in Table 1, and the decryption times for the six randomly selected images are presented in Table 2. It can be seen that the encryption and decryption times for the different medical images are all lower than those of the other existing methods considered, namely the DES, AES, and Blowfish algorithms. Therefore, the feature vector of the medical image can be obtained through Hahn's discrete orthogonal moments and associated with watermarking using a QR code, so as to realize an encryption and decryption algorithm based on the modified logistic map that reduces the processing time effectively.

Table 1 Encryption time for various-sized medical images
Size/algorithm   DES          Blowfish     AES          Proposed
1 KB             0.00099682   0.00199651   0.00199604   0.00086945
10 KB            0.00199508   0.00199556   0.00298547   0.00097705
100 KB           0.00997257   0.00798106   0.00797772   0.00786342
1 MB             0.04088544   0.02692794   0.03390717   0.03166244


Table 2 Decryption time for various-sized medical images
Size/algorithm   DES           Blowfish      AES          Proposed
1 KB             0.000994682   0.00099206    0.00098604   0.000846821
10 KB            0.002992392   0.00299263    0.00186456   0.001862124
100 KB           0.004985809   0.001995087   0.00657245   0.004653216
1 MB             0.034943819   0.023936033   0.01563871   0.031543674

References 1. Kester Q (2013) A visual cryptographic encryption technique for securing medical images. Int J Emerg Technol Adv Eng 3(6):496–500 2. Kester QA (2013) Image encryption based on the RGB PIXEL transposition and shuffling. Int J Comput Netw Inform Secur 5(7):43–50. https://doi.org/10.5815/ijcnis.2013.07.05 3. Kester Q, Nana L, Pascu AC (2013) A novel cryptographic encryption technique for securing digital images in the cloud using AES and RGB pixel displacement. Europ Modell Sympos Manchester 293–298. https://doi.org/10.1109/EMS.2013.51 4. Dixit P, Gupta AK, Trivedi MC, Yadav VK (2018) Traditional and hybrid encryption techniques: a survey. In: Networking communication and data knowledge engineering, Springer, pp 239– 248 5. Chowdhary CL, Patel PV, Kathrotia KJ, Attique M, Perumal K, Ijaz MF (2020) Analytical study of hybrid techniques for image encryption and decryption. Sensors 20(18):5162. https:// doi.org/10.3390/s20185162 6. Mishra S, Dastidar A (2018) Hybrid image encryption and decryption using cryptography and watermarking technique for high security applications. Int Conferen Curr Trends Converg Technol (ICCTCT) 2018:1–5. https://doi.org/10.1109/ICCTCT.2018.8551103 7. Abdullah A (2017) Advanced encryption standard (aes) algorithm to encrypt and decrypt data. Cryptograph Netw Secur 16:1–11 8. Zeebaree SR (2020) DES encryption and decryption algorithm implementation based on FPGA. Indonesian J Electric Eng Comput Sci 18(2):774–781. https://doi.org/10.11591/ijeecs.v18.i2. pp774-781 9. Chinnasamy P, Padmavathi S, Swathy R, Rakesh S (2021) Efficient data security using hybrid cryptography on cloud computing. InInventive communication and computational technologies, Springer, pp 537–547 10. Hidayat T, Mahardiko R (2020) A Systematic literature review method on aes algorithm for data sharing encryption on cloud computing. Int J Artific Intell Res 4(1):49–57 11. Semwal P, Sharma MK (2017) Comparative study of different cryptographic algorithms for data security in cloud computing. In: 2017 3rd international conference on advances in computing, communication & automation (ICACCA) (Fall), pp 1–7. https://doi.org/10.1109/ICACCAF. 2017.8344738 12. Al-gohany NA, Almotairi S (2019) Comparative study of database security in cloud computing using aes and des encryption algorithms. J Inform Secur Cybercrimes Res 2(1):102–109 13. Yassein MB, Aljawarneh S, Qawasmeh E, Mardini W, Khamayseh Y (2017) Comprehensive study of symmetric key and asymmetric key encryption algorithms. Int Conferen Eng Technol (ICET) 2017:1–7. https://doi.org/10.1109/ICEngTechnol.2017.8308215

Gold Price Forecast Using Variational Mode Decomposition-Aided Long Short-Term Model Tuned by Modified Whale Optimization Algorithm Sanja Golubovic, Aleksandar Petrovic , Aleksandra Bozovic , Milos Antonijevic , Miodrag Zivkovic , and Nebojsa Bacanin

1 Introduction
The economic and financial health of banks and stock markets is strongly related to the price of gold. Investment risk constantly increases as a result of its price alterations, which are influenced by a variety of factors, making it difficult and complex to anticipate gold's future price [6]. To make informed decisions and reduce potential risks, accurate gold price prediction is essential for the economy. This is especially important during crises, such as the one currently ongoing. The field of artificial intelligence (AI) offers techniques that can effectively address highly challenging problems, like gold price prediction, with greater accuracy. Among these techniques, deep learning has proven to be the most effective due to its ability to facilitate faster learning, higher adaptability, and more accurate predictions, particularly when analyzing time-series data. The long short-term memory (LSTM) deep learning model, an enhanced instance of the recurrent neural network (RNN), has proven to be extremely efficient in processing time-series datasets.


While using these techniques can offer significant benefits, they have the downside of requiring hyperparameter tuning for every particular problem, which is a non-deterministic polynomial-time hard (NP-hard) problem. This process is subject to the no free lunch (NFL) theorem, according to which it is not possible to determine a single solution that can solve all problems. As a result, the optimization process has to be fine-tuned for each particular challenge. It must be noted that problems of this level of complexity cannot be addressed by deterministic approaches, as they are not feasible with respect to the temporal and computational resources they would require; therefore, stochastic methods must be employed. Since metaheuristic optimization methods are known as robust NP-hard problem solvers, within the approach proposed in this manuscript a redesigned version of the whale optimization algorithm (WOA) [23] has been devised and applied to LSTM tuning for gold price forecasting. Additionally, since the gold price time series is non-stationary, variational mode decomposition (VMD) has been employed to decompose the gold price time series before inputting it to the LSTM. Moreover, evaluation was performed for one-step, two-step, and three-step ahead forecasting. The contributions of the research presented herein may be summarized in the following way:
• An ameliorated variant of the WOA optimization algorithm has been devised, aiming to address the known deficiencies of the elementary WOA.
• This algorithm was used as a component of the forecasting infrastructure with the specific aim of optimizing the LSTM network's hyperparameters for the gold price anticipation problem.
• The suggested method has been validated on the gold price dataset and collated with the performance attained by other well-known metaheuristics used for the same task.
The arrangement of the manuscript is the following: Sect. 2 describes the technologies that are the backbone of the proposed solution; the original algorithm is provided in Sect. 3, together with the description of the improvements made to it; the experimental process is given in Sect. 4; lastly, Sect. 5 provides final observations and ideas for the work ahead.

2 Background and Related Works
This part of the manuscript first gives a short background on LSTM networks, followed by a description of the variational mode decomposition methodology. Afterwards, a survey of metaheuristic optimization is provided, together with successful applications of metaheuristic algorithms in tuning various machine learning models. Finally, this section provides a short overview of the AI-based applications that deal with gold price predictions.


2.1 LSTM Overview
Neural networks are a sort of artificial intelligence experiencing a growing surge in popularity over the past few years. They may be employed for a range of purposes, including speech and image recognition and natural language processing [21]. Deep learning is a branch of neural networks that focuses on training deep neural networks with numerous layers [12]. The RNN is one kind of neural network that has received a lot of attention lately. Processing sequential data, like voice or text, is where RNNs excel. They function by passing data from one time step to the next, which enables them to retain data from earlier time steps [16]. Well-known examples of recurrent neural networks are LSTM and BiLSTM. LSTM is particularly useful for tasks that involve predicting sequences of data, such as predicting the next word in a sentence or the next value in a time series [10]. Because of their distinctive design, LSTM networks can preserve long-term dependencies within the data [13]. They include an input gate, an output gate, and a forget gate, among other gates that supervise the data flow across the network. This behavior allows the network to choose either to retain or to omit information from previous phases. The cell state is altered by the gates at each time step [16] and runs through the entire network; it acts as a type of memory for the network, enabling it to save data throughout lengthy data sequences. The LSTM gate formulas are as follows:

Input gate:   i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)   (1)

Forget gate:  f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)   (2)

Output gate:  o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)   (3)

Cell state:   c_t = f_t \cdot c_{t-1} + i_t \cdot \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)   (4)

Hidden state: h_t = o_t \cdot \tanh(c_t)   (5)

In these formulas, x_t denotes the input at time step t, h_{t-1} represents the hidden state at the preceding time step, and W and b are the weight and bias matrices, respectively. LSTM networks are an effective tool for forecasting data sequences because of their capacity to preserve long-term dependencies in the data, and their design has been expanded and altered in many ways to fit a range of purposes [13].
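A compact NumPy rendering of Eqs. (1)-(5) is given below for reference; the stacked weight shapes are an assumption made purely to keep the sketch self-contained:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    # W["i"], W["f"], W["o"], W["c"] have shape (hidden, hidden + input); b[...] shape (hidden,).
    z = np.concatenate([h_prev, x_t])                        # [h_{t-1}, x_t]
    i_t = sigmoid(W["i"] @ z + b["i"])                       # input gate, Eq. (1)
    f_t = sigmoid(W["f"] @ z + b["f"])                       # forget gate, Eq. (2)
    o_t = sigmoid(W["o"] @ z + b["o"])                       # output gate, Eq. (3)
    c_t = f_t * c_prev + i_t * np.tanh(W["c"] @ z + b["c"])  # cell state, Eq. (4)
    h_t = o_t * np.tanh(c_t)                                 # hidden state, Eq. (5)
    return h_t, c_t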


2.2 Variation Mode Decomposition Details
Variational mode decomposition is recognized as an efficient signal processing technique that can divide a signal into its different frequency components [9]. VMD is a data-driven method that can handle complicated and non-stationary signals [7], unlike conventional signal processing approaches that necessitate prior knowledge of the signal or its frequency components. It operates by breaking a signal down into a finite number of modes or components, each mode corresponding to a certain frequency range. The highest-energy mode is iteratively retrieved from the signal by the VMD algorithm, which then repeats the procedure until all the modes are extracted [22]. VMD is a useful tool in various fields, including biomedical signal processing [27], speech recognition [20], and image processing, owing to its adaptability in handling many types of data. The capability of VMD to handle non-stationary signals is one of its main benefits. The frequency components of non-stationary signals shift over time, which makes them difficult to evaluate; due to its data-driven methodology, VMD is able to accurately decompose such signals into their frequency components while also adapting to these changes. The capability of VMD to efficiently manage noise is another benefit: the regularization term in the algorithm works to stabilize the decomposition [31] and lessen the impact of noise. All things considered, variational mode decomposition is a useful method that can aid researchers and practitioners in a variety of situations where signal separation is necessary.
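In Python, such a decomposition can be obtained, for instance, with the third-party vmdpy package (an assumption; the paper does not name its implementation). The sketch below splits a one-dimensional price series into K modes plus a residual; the file name and parameter values are illustrative:

import numpy as np
from vmdpy import VMD

signal = np.loadtxt("gold_price.txt")            # illustrative 1-D daily price series

# alpha: bandwidth constraint, tau: noise tolerance, K: number of modes,
# DC: whether to keep a DC mode, init: frequency initialization, tol: convergence tolerance.
u, u_hat, omega = VMD(signal, alpha=2000, tau=0.0, K=3, DC=0, init=1, tol=1e-7)

# u has shape (K, signal length); whatever the K modes do not capture is the residual.
residual = signal[: u.shape[1]] - u.sum(axis=0)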

2.3 Metaheuristics Optimization
Metaheuristic optimization refers to a class of algorithms used to handle challenging optimization problems [30]. These algorithms are heuristic approaches aimed at discovering a workable solution within a reasonable period of time [5]. Many optimization problems, including continuous and discrete, single- and multi-objective, and static and dynamic ones, can be handled using metaheuristics. Metaheuristics come in a variety of forms, each with its own approach to optimization. Examples include the simulation of natural selection and evolution, the swarm metaheuristic arithmetic optimization algorithm (AOA) for LSTM tuning [26], and firefly algorithm (FA) tuning for adaptive boosting (AdaBoost) [25]. Examples of nature-adapted algorithms include tabu search, which uses a memory structure to avoid revisiting previously explored solutions [11], and ant colony optimization, which is motivated by the behavioral patterns of ants while they search for the shortest path to a food source [8]. Swarm metaheuristics have been proven to discover adequate solutions in a fair amount of time [17], so metaheuristics continue to be a popular option for optimization problems despite the difficulty of parameter tuning. This is of high importance due to the difficulty of the problems that swarm intelligence (SI) is successful in solving: real-world problems frequently exhibit non-deterministic polynomial-time (NP-hard) complexity. Examples of real-world applications of metaheuristic optimization include security and spam email classification [18], recent COVID-19 pandemic-related applications [35], cryptocurrency forecasts [28], energy production predictions [29], air pollution detection [3, 19], intrusion detection [2], and tuning of machine learning models in general [4].

2.4 AI Applications for Gold Price Forecasting
The price of gold is a topic of interest for investors, traders, and economists worldwide. Accurate gold price forecasting can help investors make well-informed decisions by revealing important market patterns. Predicting gold prices is now easier and more accurate than ever thanks to developments in artificial intelligence (AI) and ML. Firstly, several standard machine learning structures, such as random forest, support vector machine (SVM), and ANN [32], have been utilized for predicting the price of gold. To forecast future prices, these models examine previous gold prices as well as other market variables like interest rates, inflation, and currency exchange rates. However, the quality of the data and the feature engineering have a significant impact on how accurate these models are. Secondly, various deep network models such as BiLSTM, LSTM, and others have been used for gold price forecasting. In time-series forecasting, and particularly in predicting the price of gold, LSTM and BiLSTM models have demonstrated outstanding performance; it has been shown that these models outperform conventional machine learning models and are capable of grasping long-term dependencies and patterns in the data [26]. Lastly, for the purpose of forecasting the price of gold, various studies have merged metaheuristics, deep learning, and conventional machine learning models. For instance, LSTM and particle swarm optimization (PSO) were combined in a study by Gupta et al. [14] to predict the price of gold. Extreme gradient boosting (XGBoost), BiLSTM, and the gray wolf optimization algorithm were used in a different study by Nes et al. [24] to estimate gold prices. These hybrid structures are capable of enhancing the accuracy of gold price predictions by combining the benefits of deep learning and metaheuristics. In conclusion, the advancements in AI and ML have opened up new possibilities for predicting the price of gold, ranging from conventional machine learning models to deep learning and hybrid models. The combination of deep learning and metaheuristics has shown promising results and can be explored further in future research to enhance the accuracy of gold price forecasting.


3 Whale Optimization Algorithm This section commences by briefly introducing the elementary variant of the whale optimization algorithm (WOA). Afterward, the suggested enhanced WOA variant that has later been used to tune the LSTM is described.

3.1 Elementary WOA
WOA [23] is a metaheuristic optimization algorithm that imitates humpback whales' hunting techniques. It was suggested by Mirjalili et al. [23], and the algorithm has been widely adopted due to its usefulness and simplicity. WOA searches for the optimal solution using two essential components: exploration and exploitation [23]. During the exploration phase, WOA looks for prospective prey by examining the search space using random solutions. During the exploitation phase, the method leverages the current best solutions to converge toward the optimal solution. Initialization, search agent updating, encircling prey, and spiral updating are the algorithm's four primary operators. An initial population of candidate solutions is generated during the initialization stage, and each candidate solution's location is updated during the search agent updating step using the current best solution and a random vector. The encircling prey operator resembles the encircling behavior of whales, where each whale's location is updated toward the current best solution, and the spiral updating operator mimics the spiraling motion of whales around that best solution; during exploration, a whale's location is instead updated toward a randomly selected individual.

3.2 Proposed Improved WOA Algorithm Used in Time-Series Forecasting Framework
Despite the fact that WOA has demonstrated success in optimizing a variety of problems, it has some potential drawbacks, including premature convergence caused by a lack of diversity in the initial population, limited scalability in high-dimensional problems, sensitivity to parameter settings, and a lack of theoretical analysis of its optimization behavior. In the approach taken in this manuscript, a refined technique is proposed to enhance the elementary exploitation mechanism. By incorporating search components from the famous FA, performance may be significantly improved, leveraging the FA's highly efficient exploitation process. The components of the new FA are detailed in Eq. (6):

y_i = y_i + \zeta_0 e^{-\gamma r_{ij}^2} (y_j - y_i) + \nu_t   (6)


where y_i represents the position of the i-th agent, r_{ij} is the distance between two agents, and ν_0 represents the attraction between insects at range r = 0. Gamma is the medium absorption coefficient, and rand is a value chosen at random from [0, 1]. Lastly, zeta is a randomization parameter that is steadily decreased over rounds. To increase exploration power, the suggested method incorporates the FA search techniques over the later iterations. This novel approach has been utilized to leverage the exploration capabilities during later stages of the tuning process. To do this, a new parameter named psi is proposed, specifying the count of rounds after which the FA mechanism should become active; empirically, a threshold value of T3 was discovered. Each agent also receives a parameter omega. In case t > psi, each agent chooses a value for omega from a uniform distribution within the limits [0, 1]. Every individual conducts a standard WOA search in case omega < 0.5, or else employs the FA search. The described altered metaheuristic is named the improved whale optimization algorithm (IWOA).

Algorithm 1 IWOA
Generate the starting populace of whales Y_i (i = 1, 2, ..., N)
Calculate fitness and identify the best search agent X*(t)
repeat
    for k ← 1 to M (total count of whales) do
        if ψ < T3 then
            Update solution's position conducting WOA search
        else
            Select random number within [0, 1] for solution ω
            if ω < 0.5 then
                Update solution's position conducting WOA search
            else
                Update solution's position conducting FA search
            end if
        end if
    end for
    Increment iteration counter t = t + 1
until t > Imax or the stopping condition reached
Output: best fitness and position.
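The hybrid update can be pictured with the schematic Python sketch below. It follows the structure of Algorithm 1 but deliberately simplifies both moves: only the WOA encircling rule is shown (the spiral branch and the shrinking coefficient are omitted), and the FA move is the plain form of Eq. (6), so this illustrates the switching logic rather than the authors' implementation:

import numpy as np

def iwoa_step(pop, best, t, psi, gamma=1.0, zeta0=1.0, nu=0.1, a=2.0):
    # pop: (N, dim) array of agent positions; best: current best position.
    new_pop = pop.copy()
    for i, y in enumerate(pop):
        use_woa = (t <= psi) or (np.random.rand() < 0.5)   # omega is drawn only after round psi
        if use_woa:
            # Simplified WOA encircling move: X(t+1) = X* - A * |C * X* - X|
            A = a * (2 * np.random.rand(y.size) - 1)
            C = 2 * np.random.rand(y.size)
            new_pop[i] = best - A * np.abs(C * best - y)
        else:
            # FA move of Eq. (6) toward a randomly chosen agent j.
            j = np.random.randint(len(pop))
            r = np.linalg.norm(pop[j] - y)
            new_pop[i] = (y + zeta0 * np.exp(-gamma * r ** 2) * (pop[j] - y)
                          + nu * (np.random.rand(y.size) - 0.5))
    return new_pop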

4 Experimental Setup The following section describes the used dataset and the experimentation assessment measures. Following this, the specifics of the experimental process are given. For this analysis, a real, publicly accessible dataset on daily changes in the price of gold from 2015 to 2022 was employed. Yet for the sake of these experiments, only the period from January 1, 2020, to August 5, 2022, has been taken into account. As


Fig. 1 Visualization of data splitting for the day-to-day gold price dataset

As shown in Fig. 1, 70% of the data was utilized for model training, with 10% allocated for evaluating the technique and the remaining 20% set aside specifically for testing purposes. This study employs an empirical approach to compare the effectiveness of several cutting-edge metaheuristic algorithms in optimizing LSTM network parameters for three-step ahead predictions based on six lags. The VMD technique is used with K = 3, which means the input signal is decomposed into three subsignals and one residual, utilizing four columns, as shown in Fig. 2. The LSTM network is optimized by the proposed IWOA method using the following control parameters: the learning rate (lb_lr = 0.0001, ub_lr = 0.01), the number of training epochs (lb_epochs = 300, ub_epochs = 600), the dropout rate (lb_dropout = 0.05, ub_dropout = 0.2), the number of hidden layers (lb_layers = 1, ub_layers = 3), and the number of neurons in the hidden layers (lb_nn = 100, ub_nn = 200, lb_nn1 = 100, ub_nn1 = 200). These parameters were selected for tuning based on empirical tests highlighting their significant impact on performance. Additionally, the observed experiment can be classified as a hybrid NP-hard task due to the combination of continuous value ranges for some parameters and discrete value ranges for others. The optimization goal is minimizing the MSE, although other indicators such as R2 and MAE are also captured. The performance of the introduced IWOA metaheuristic has been evaluated against the outcomes of four other powerful algorithms observed in the comparative analysis: the basic WOA implementation, the firefly algorithm (FA) [34], Harris hawks optimization (HHO) [15], and the reptile search algorithm (RSA) [1], all implemented independently and evaluated in an identical simulation setup. Over 8 rounds, the metaheuristic algorithms were tasked with optimizing the LSTM structures, each algorithm being allocated a population of five solutions. Furthermore, to account for the stochastic behavior of metaheuristic techniques and their significant processing demands, the experiments were conducted across 30 individual executions.
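The mixed continuous/discrete search space quoted above can be written compactly as a bounds table plus a step that decodes a candidate solution from the unit hypercube into concrete LSTM hyperparameters. The sketch below is illustrative only, under the assumption that each metaheuristic agent encodes six values in [0, 1]; the dictionary keys and the decode() helper are not the authors' code.

search_space = {
    "learning_rate": (0.0001, 0.01),   # continuous, lb_lr / ub_lr
    "epochs":        (300, 600),       # integer, lb_epochs / ub_epochs
    "dropout":       (0.05, 0.2),      # continuous, lb_dropout / ub_dropout
    "layers":        (1, 3),           # integer, lb_layers / ub_layers
    "neurons_l1":    (100, 200),       # integer, lb_nn / ub_nn
    "neurons_l2":    (100, 200),       # integer, lb_nn1 / ub_nn1
}

def decode(solution):
    """Map a vector of six values in [0, 1] onto the mixed search space."""
    params = {}
    for x, (name, (lo, hi)) in zip(solution, search_space.items()):
        value = lo + x * (hi - lo)
        params[name] = round(value) if isinstance(lo, int) else value
    return params

# Example: one candidate solution decoded into an LSTM configuration.
print(decode([0.5, 1.0, 0.0, 1.0, 0.09, 0.48]))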


Fig. 2 Gold price—VMD decomposition of price

Table 1 Performance metrics of regarded methodologies in terms of overall objective function

Method           Best       Worst      Mean       Median     Std        Var
VMD-LSTM-IWOA    0.000575   0.000678   0.000623   0.000644   4.68E-05   2.19E-09
VMD-LSTM-WOA     0.000605   0.000699   0.000639   0.000635   3.31E-05   1.09E-09
VMD-LSTM-FA      0.000636   0.000665   0.000656   0.000660   9.70E-06   9.40E-11
VMD-LSTM-HHO     0.000621   0.000675   0.000642   0.000638   2.01E-05   4.04E-10
VMD-LSTM-RSA     0.000615   0.000685   0.000647   0.000642   2.67E-05   7.13E-10

4.1 Results and Discussion Using the decomposed signal inputs, all metaheuristic algorithms were evaluated, and their prediction performance was graded using the criteria provided. In all tables, the best metrics are denoted in bold. The statistics of the objective function over the conducted runs for each analyzed metaheuristic are shown in Table 1. The comprehensive Table 2 provides an overview of the measurements of the top-performing LSTM structure for three steps forward. Lastly, the most appropriate collection of parameters that each metaheuristic generated in its best run is shown in Table 3. Table 1 indicates that the introduced IWOA-grounded technique attained the best scores for the best and mean values among all implemented techniques. The basic version of the whale optimization algorithm obtained the best median value, where it also reaches low standard deviation and variance scores. For a better understanding of the achieved improvements, detailed metrics are provided in Table 2. The best performance was achieved by VMD-LSTM-IWOA for two-step ahead predictions across all metrics, as well as the best overall results for all metrics.


Table 2 Evaluation metrics for the LSTM structure that yielded the best results for forecasting three steps ahead

                  Error indicator  VMD-LSTM-IWOA  VMD-LSTM-WOA  VMD-LSTM-FA  VMD-LSTM-HHO  VMD-LSTM-RSA
One-step ahead    R2               0.700250       0.679444      0.716704     0.702338      0.656462
                  MAE              0.021615       0.022369      0.020555     0.021537      0.023039
                  MSE              0.000789       0.000844      0.000746     0.000783      0.000904
                  RMSE             0.028089       0.029047      0.027307     0.027991      0.030070
Two-step ahead    R2               0.808667       0.788911      0.773175     0.789218      0.794171
                  MAE              0.017304       0.017932      0.018757     0.017981      0.017686
                  MSE              0.000504       0.000556      0.000597     0.000555      0.000542
                  RMSE             0.022441       0.023571      0.024434     0.023554      0.023276
Three-step ahead  R2               0.835998       0.842261      0.785385     0.800152      0.848511
                  MAE              0.016250       0.015433      0.018256     0.017634      0.015244
                  MSE              0.000432       0.000415      0.000565     0.000526      0.000399
                  RMSE             0.020777       0.020376      0.023767     0.022935      0.019968
Overall results   R2               0.781638       0.770205      0.758421     0.763903      0.766381
                  MAE              0.018390       0.018578      0.019189     0.019051      0.018656
                  MSE              0.000575       0.000605      0.000636     0.000621      0.000615
                  RMSE             0.023974       0.024594      0.025216     0.024929      0.024797

Table 3 Choice of hyperparameter values established by every metaheuristics method

Method           Layer 1 neurons  Learning rate  Epochs  Dropout   Layer count  Layer 2 neurons
VMD-LSTM-IWOA    109              0.010000       600     0.050000  3            148
VMD-LSTM-WOA     176              0.010000       600     0.052703  1            100
VMD-LSTM-FA      108              0.010000       553     0.200000  3            200
VMD-LSTM-HHO     185              0.006805       575     0.070399  3            114
VMD-LSTM-RSA     191              0.010000       600     0.069048  1            100


Fig. 3 Visualizations of the convergence and box plots for both objective and R 2 indicator

One-step ahead predictions proved to be most suited for the firefly algorithm-based solution, while the reptile search algorithm-based solution provided the best results for the three-step ahead prediction. VMD-LSTM-FA and VMD-LSTM-RSA obtained the best solutions for their respective time-step predictions across all compared metrics. According to the no free lunch theorem [33], there is no single technique that can be universally excellent for every optimization task; hence, the obtained results are expected, and the true improvement is reflected in the overall results, which VMD-LSTM-IWOA dominates.


Fig. 4 Visualizations of violin plot diagrams and KDE plots for both objective and R 2 indicator


Fig. 5 Gold price—VMD-LSTM-IWOA

To facilitate visualization of the attained scores, the convergence diagrams, box and violin plots, as well as kernel density estimation (KDE) diagrams for both the MSE (used as the fitness function) and the R2 indicator are provided in Figs. 3 and 4. These diagrams clearly show the fast convergence exhibited by the suggested approach when compared to the other competitors. Lastly, the results for the top-performing model in comparison with the real values are provided in Fig. 5.

5 Conclusion The contribution of this manuscript is a powerful model that addresses a pressing issue in the global banking system. The introduced model has been trained on real-world data; hence, it is suitable for real-world use. The high complexity and volatility of the gold price result in a need for a robust model capable of reliable predictions. The problem tackled in this research is of a univariate time-series nature, hence the application of the LSTM model, which has proven to provide good results for these types of problems. The hyperparameter tuning of the LSTM was performed by the IWOA, while the VMD handled data complexity. The improvements to the original WOA include a modified search strategy hybridized with the FA metaheuristic. The goal of this change is to boost exploration in the later iteration phases. The predictions performed include one-step, two-step, and three-step ahead forecasts. Overall objective performance values are provided for the best, worst, mean, median, standard deviation, and variance. The method was compared against three other strong metaheuristic optimizers, as well as the original WOA, in terms of the R2 score, MAE, MSE, and RMSE. Future research should concentrate on further enhancing the aforementioned method and applying it to address additional significant real-world issues in the financial industry. Modern metaheuristic applications will be further studied in an effort to enhance the provided method.


References 1. Abualigah L, Abd Elaziz M, Sumari P, Geem ZW, Gandomi AH (2022) Reptile search algorithm (RSA): a nature-inspired meta-heuristic optimizer. Expert Syst Appl 191:116158 2. Bacanin N, Petrovic A, Antonijevic M, Zivkovic M, Sarac M, Tuba E, Strumberger I (2023) Intrusion detection by xgboost model tuned by improved social network search algorithm. In: Modelling and development of intelligent systems: 8th international conference, MDIS 2022, Sibiu, Romania, Oct 28–30, 2022, Revised Selected Papers. Springer, pp 104–121 3. Bacanin N, Sarac M, Budimirovic N, Zivkovic M, AlZubi AA, Bashir AK (2022) Smart wireless health care system using graph LSTM pollution prediction and dragonfly node localization. Sustain Comput Inf Syst 35:100711 4. Bacanin N, Stoean C, Zivkovic M, Jovanovic D, Antonijevic M, Mladenovic D (2022) Multiswarm algorithm for extreme learning machine optimization. Sensors 22(11):4204 5. Blum C, Roli A (2003) Metaheuristics in combinatorial optimization: overview and conceptual comparison. ACM Comput Surv (CSUR) 35(3):268–308 6. Chen Y, Liu J, Wang J (2020) Research on gold price forecasting based on grey relational analysis and support vector machine. J Appl Math 7. Cong F, Ren Y, Wu H (2018) A review of variational mode decomposition. Measurement 127:286–301 8. Dorigo M, Stützle T (2010) Ant colony optimization. MIT press 9. Dragomiretskiy K, Zosso D (2014) Variational mode decomposition. IEEE Trans Signal Process 62(3):531–544 10. Gers FA, Schmidhuber J, Cummins F (2000) Learning to forget: continual prediction with LSTM. Neural Comput 12(10):2451–2471 11. Glover F (1989) Tabu search-part i. ORSA J Comput 1(3):190–206 12. Goodfellow I, Bengio Y, Courville A (2016) Deep learning, vol 1. MIT press 13. Graves A (2012) Supervised sequence labelling with recurrent neural networks. Springer 14. Gupta A, Mittal M, Aggarwal A (2021) Prediction of gold price using LSTM and particle swarm optimization. J Ambient Intell Humanized Comput 12(5):4985–4998 15. Heidari AA, Mirjalili S, Faris H, Aljarah I, Mafarja M, Chen H (2019) Harris hawks optimization: algorithm and applications. Future Gener Comput Syst 97:849–872 16. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780 17. Hutter F, Hoos HH, Leyton-Brown K (2014) An efficient approach for assessing hyperparameter importance. In: Proceedings of the 31st international conference on machine learning (ICML14). pp 754–762 18. Jovanovic D, Antonijevic M, Stankovic M, Zivkovic M, Tanaskovic M, Bacanin N (2022) Tuning machine learning models using a group search firefly algorithm for credit card fraud detection. Mathematics 10(13):2272 19. Jovanovic L, Jovanovic G, Perisic M, Alimpic F, Stanisic S, Bacanin N, Zivkovic M, Stojic A (2023) The explainable potential of coupling metaheuristics-optimized-xgboost and shap in revealing vocs’ environmental fate. Atmosphere 14(1):109 20. Kim JH, Lee H (2016) An efficient feature extraction method for speech emotion recognition using variational mode decomposition. Appl Sci 6(3):67 21. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444 22. Li X, Yang Y (2018) Adaptive vibration signal decomposition based on variational mode decomposition and hilbert transform. Mech Syst Signal Process 101:270–288 23. Mirjalili S, Mirjalili SM, Lewis A (2016) The whale optimization algorithm. Adv Eng Softw 95:51–67 24. 
Nesa SF, Ahmed S, Islam MA (2021) Forecasting of gold price using hybrid models of machine learning and metaheuristic optimization algorithms. Soft Comput 25(6):4403–4420 25. Petrovic A, Bacanin N, Zivkovic M, Marjanovic M, Antonijevic M, Strumberger I (2022) The adaboost approach tuned by firefly metaheuristics for fraud detection. In: 2022 IEEE world conference on applied intelligence and computing (AIC). IEEE, pp 834–839


26. Petrovic A, Jovanovic L, Zivkovic M, Bacanin N, Budimirovic N, Marjanovic M (2023) Forecasting bitcoin price by tuned long short term memory model. In: 1st International conference on innovation in information technology and business (ICIITB 2022). Atlantis Press, pp 187– 202 27. Sannino G, Ciaramella A, Bifulco P (2021) ECG signal denoising by variational mode decomposition. Comput Biol Med 129:104187 28. Stankovic M, Jovanovic L, Bacanin N, Zivkovic M, Antonijevic M, Bisevac P (2022) Tuned long short-term memory model for ethereum price forecasting through an arithmetic optimization algorithm. In: Innovations in bio-inspired computing and applications: proceedings of the 13th international conference on innovations in bio-inspired computing and applications (IBICA 2022) Held During Dec 15–17, 2022. Springer, pp 327–337 29. Stoean C, Zivkovic M, Bozovic A, Bacanin N, Strulak-Wójcikiewicz R, Antonijevic M, Stoean R (2023) Metaheuristic-based hyperparameter tuning for recurrent deep learning: application to the prediction of solar energy generation. Axioms 12(3):266 30. Talbi EG (2009) Metaheuristics: from design to implementation. Wiley 31. Tian Y, Bai Z, Chen Y (2018) An improved variational mode decomposition algorithm based on total variation regularization. Signal Process 146:92–101 32. Wang Y, Li H, Li X (2017) Gold price prediction using machine learning: a study towards an application of artificial intelligence in finance. J Financ Data Sci 1(1):14–23 33. Wolpert DH, Macready WG (1997) No free lunch theorems for optimization. IEEE Trans Evol Comput 1(1):67–82 34. Yang XS (2009) Firefly algorithms for multimodal optimization. In: Watanabe O, Zeugmann T (eds) Stochast Algorithms: Found Appl. Springer, Berlin Heidelberg, Berlin, Heidelberg, pp 169–178 35. Zivkovic M, Bacanin N, Venkatachalam K, Nayyar A, Djordjevic A, Strumberger I, Al-Turjman F (2021) Covid-19 cases prediction by using hybrid machine learning and beetle antennae search approach. Sustain Cities Soc 66:102669

Requirements for a Career in Information Security: A Comprehensive Review Mike Nkongolo, Nita Mennega, and Izaan van Zyl

1 Introduction The demand for IS professionals has surged due to the escalating cyber-threats [1–5]. This study focuses on the shortage of skilled IS experts and the specific challenges faced by women in entering this field. Additionally, it examines the diverse nature of cyber-threats, including their impact on data security, financial vulnerabilities, and instances of cyberbullying [6]. Recent research conducted by the IS Workforce has revealed a significant deficit of 3,000,000 IS professionals [7]. This scarcity is primarily attributed to the growing number and sophistication of cyber-threats. The proliferation of internet connectivity and the widespread use of wireless networks have expanded the attack surface for hackers, resulting in increased security breaches. The consequences of cyber-threats extend beyond financial impacts. Zero-day attacks, for example, can cause substantial data loss and financial harm. Furthermore, cyberbullying poses a psychological threat to individuals, as it can lead to embarrassment and potentially prompt drastic actions. Home automation, involving IoT devices, introduces another avenue for cyber-threats, allowing malicious actors to gain control over household systems. A concerning scenario includes unauthorized manipulation of residential gate controls by hackers [7]. While there is a shortage of IS professionals in general, women encounter distinct challenges in entering the field. Factors such as long and irregular working hours for maintenance windows contribute to the underrepresentation of women in the IS profession. However, there is growing recognition of the need to foster greater gender diversity in the IS landscape [8], as it brings unique perspectives and skills to address the evolving threatscape (Fig. 1).


Fig. 1 Field of cybersecurity

To effectively establish and maintain robust cybersecurity systems, professionals in the field of Information Technology must possess a comprehensive understanding of the interplay between hardware and software components [9]. Experience and technological knowledge are crucial in developing the requisite skills necessary for success in the IS domain. Furthermore, individuals specializing in IS should embody specific qualities [10]. A comprehensive review of the literature was conducted to acquire a more profound comprehension of the requirements and potential challenges encountered by individuals aspiring to pursue careers in the field of IS. It is worth noting that the specific requirements may vary depending on the particular job specifications. Moreover, this study explores an approach to fostering IS awareness and encouraging greater participation in the IS field. The article is structured into distinct sections, with Sect. 2 providing a concise overview of the IS landscape. Section 3 delineates the research methodology employed in this study, while Sect. 4 presents the findings derived from the literature review. Finally, Sect. 5 concludes the study by succinctly summarizing the key findings and their implications.

2 Cybersecurity Foundation Numerous private and public organizations face significant security challenges arising from the utilization of wireless communications [11]. The repercussions of such attacks can be severe, as they have the potential to disrupt governmental systems


by introducing viruses that encrypt sensitive data. For instance, malicious codes can be deployed to instruct a National Database to crash, leading to disruptions in national security and causing substantial harm [11, 12]. A vishing attack tricks entities into divulging sensitive information [13]. According to reports from the Federal Bureau of Investigation (FBI), scams have emerged as some of the most prevalent crimes since 2021 [14]. In addition, the IC3 (Internet Crime Complaint Center) has reported an overwhelming number of complaints, exceeding 28,000, that are related to vishing themes. Given the seriousness of these challenges, the field of IS demands a diverse set of IS skills to effectively safeguard critical infrastructures against cyberattacks [15, 16].

2.1 Information Security Expertise To safeguard critical infrastructures against an ever-growing array of threats, it is imperative to automate data privacy and security measures. Information security skills encompass a range of areas related to deep packet inspection [16, 17]. Additionally, soft skills such as critical thinking, adaptability, and communication play a crucial role in ensuring comprehensive IS measures. Within the realm of cybersecurity careers, limited research has been conducted on the IS landscape. However, a study by [18] identified several essential skills, including teamwork and ethical decision making. Another study [19] highlighted the significance of technical skills, specifically experience in detecting network attacks and proficiency in utilizing various products for deep packet inspection. Furthermore, [20] stated that many IS-related job positions emphasized the importance of prior work experience. In summary, a comprehensive approach to cybersecurity requires both technical expertise and a range of soft skills to effectively protect critical infrastructures from threats. The duties of an IS professional could encompass the following:
• Providing support at Tier 1, Tier 2, and Tier 3 levels for network Traffic Management Function (TMF) solutions.
• Conducting regular monthly reviews of client systems to identify and address any issues.
• Planning, coordinating, and implementing resolutions for the identified issues on client systems.
• Carrying out monthly backups of each client's configuration [21, 22].
Moreover, one needs to possess the capability to deploy diverse user-friendly interfaces aimed at monitoring network traffic and identifying potential malicious intrusions. The interface should be intuitive, while also offering advanced features that show the necessary level of detail to monitor the network traffic [21, 22]. A comprehensive understanding of networking is crucial to detect potential network anomalies [22]. The company may undertake the responsibility of providing relevant training to the cybersecurity employee. This training could be conducted internally or, when necessary, through training programs offered by trusted vendor partners.


However, it is worth noting that such vendor training programs often involve significant financial investments on the part of the company.

2.2 Duties and Tasks of an IS Professional A key focus for the IS professional involves analyzing the network traffic to identify and thwart network threats. Their responsibilities encompass various tasks such as assisting in the implementation of deep packet inspection systems [23]. Data analysis and utilization of Machine Learning (ML) skills are essential for creating insightful reports and dashboards used for operational purposes to effectively identify and combat malicious intrusions [24]. Troubleshooting proficiency is crucial for identifying and resolving system issues and ensuring the smooth delivery of network services. Efforts should be made to enhance the efficiency, quality, and value of traffic management systems [25]. To succeed in this position, candidates must demonstrate a strong ability to think creatively and proactively in developing solutions that enhance intrusion detection.

2.3 Job Nature and Requirements An IS professional’s responsibilities include designing, developing, and optimizing network systems, ensuring system reliability, and implementing new policies for network management. They utilize their knowledge to provide support, analyze statistics, and make recommendations for more efficient network attack detection [26, 27]. To excel in this role, one should have a strong understanding of networking protocols [27], proficiency in networked applications, familiarity with operating systems, skills in database administration, data analysis, and knowledge of artificial intelligence and machine learning.

3 Research Approach To identify pertinent research articles, a systematic literature review methodology was employed. The research questions guiding this review are depicted in Fig. 2.

3.1 Words Used to Search To find relevant published articles, the search was conducted using the keywords depicted in Fig. 3.


Fig. 2 Investigation queries

Fig. 3 The search terms employed to identify pertinent articles

3.2 Selection Criteria This study incorporated academic articles and conference proceedings that focused on IS topics.


3.3 Rejection Criteria Non-peer-reviewed and inadequately described articles were excluded from this study.

3.4 Data Gathering Figure 4 depicts the databases employed for data retrieval, while Fig. 5 outlines the article selection process using a PRISMA diagram. PRISMA stands for "Preferred Reporting Items for Systematic Reviews and Meta-Analyses"; the diagram enables authors to demonstrate the quality of the literature review. Following the exploration of the literature using the aforementioned keywords (depicted in Fig. 3) and the acquisition of a total of 1,520 citations, we initially eliminated any duplicate entries. Subsequently, we evaluated the titles and abstracts of the articles to determine their relevance in addressing the research questions.

Fig. 4 Academic repositories used


Fig. 5 PRISMA flowchart depicting the article screening process

The search process identified 31 relevant research articles, listed in Table 1. Figure 6 showcases a Wordcloud that highlights the key terms extracted from the cybersecurity literature. The analysis of the Wordcloud indicates an increasing use of machine learning (ML) in the IS industry, highlighting the significance of ML skills for IS professionals. ML, a subset of artificial intelligence (AI), involves training algorithms to recognize patterns in data [28, 29]. In the field of IS, ML has diverse applications, particularly in threat detection and automated response. One notable application is malware classification [2, 3, 28], where ML classifiers assign scores to network traffic samples based on their maliciousness, representing the confidence level of the classification or prediction process [29–31].

3.5 Quality Appraisal The quality of the 31 research articles was assessed using four QA questions, and each article was given a subjective score to indicate its relevance.


Table 1 Finalized articles (Key: South Africa-SA, Australia-Aus)

Nr  Citation                            Country         Focus of paper
1   Yair Levy (2013)                    USA             Skills needed for a career in cybersecurity
2   Von Solms and Van Niekerk (2013)    SA              Threats and risks involved in the cyber environment and how it impacts individuals
3   Reeves et al. (2021)                Aus             Training is needed to acquire the appropriate skills for cybersecurity
4   Li et al. (2019)                    USA             Training is needed to acquire the appropriate skills for cybersecurity
5   Hoffman et al. (2012)               USA             Training is needed to acquire the appropriate skills for cybersecurity
6   Hadlington (2017)                   UK              The focus is on the skills needed for a career in cybersecurity
7   Gratian et al. (2018)               USA             Intentions of an individual when working in a cyber domain
8   Furnell et al. (2017)               UK              Skills needed for a career in cybersecurity
9   Furnell (2021)                      UK              Skills needed for a career in cybersecurity
10  Dawson and Thomson (2018)           USA             Workforce and the work environment of a cybersecurity specialist
11  Catota et al. (2019)                USA             Training is needed to acquire the appropriate skills for cybersecurity
12  Caldwell (2013)                     USA             Skills needed for a career in cybersecurity
13  Burley et al. (2014)                USA             Workforce and the work environment of a cybersecurity specialist
14  Shahriar et al. (2016)              USA             Skills that need to be improved for a cybersecurity career
15  Jeong et al. (2019)                 Aus             Personality of an individual working in a cybersecurity career
16  Chowdhury et al. (2018)             Aus             Time pressure and how it impacts individuals with tasks to be completed in a certain timeframe
17  Paulsen et al. (2012)               USA             Workforce and the work environment of a cybersecurity specialist
18  Bagchi-Sen et al. (2010)            USA             Skills needed for a career in cybersecurity
19  Liu and Murphy (2016)               USA             Education is needed to acquire the appropriate skills for cybersecurity and which gender is mostly dominant in the cybersecurity career
20  Javidi and Sheybani (2018)          USA             Training is needed to acquire the appropriate skills for cybersecurity
21  Sharevski et al. (2018)             USA             Training skills for cybersecurity
22  Besnard and Arief (2004)            UK              Computer security and factors that influence cybersecurity
23  Martin and Rice (2011)              Aus             Cybercrime and the impact it has on individuals and communities
24  Bauer et al. (2017)                 USA             Making people and communities aware of the impact of cybercrime so that they are more careful with their information
25  Abawajy (2014)                      Aus             Make people aware of information security in different ways
26  Albrechtsen and Hovden (2009)       Norway          Emphasize the different viewpoints between specialists and users of information security
27  Christopher et al. (2017)           USA             Identify the cybercrime trends and educate specialists
28  Baskerville et al. (2014)           USA, Italy      Identify and explain the threat paradigm versus the response paradigm of a cyber-attack
29  Rajan et al. (2021)                 India, UK       Management of cybersecurity in an organization
30  Hong and Furnell (2021)             China, UK, SA   Behavior of individuals exposed to cybercrime and how they respond to it
31  Kam et al. (2020)                   USA             Learning is required to improve skills of the cybersecurity workforce

The QA questions are depicted in Fig. 7, and the corresponding QA ratings for each paper are recorded in Table 2. Table 2 demonstrates that four papers achieved a QA rating score of three or higher out of four, whereas three papers received a score of one or lower. These findings highlight the high relevance of the majority of the 31 final papers, suggesting their valuable contribution to addressing the research questions.


Fig. 6 Terms employed in the cybersecurity domain

Fig. 7 Questions used to assess the quality of research articles

4 Results Figure 8 showcases the publication trend of IS research starting from 2004. The figure reveals a consistent upward trajectory in interest in and focus on cybersecurity over the years [32–34]. Notably, there was a noticeable decrease in the number of publications during 2020, which can be attributed to the global shift toward remote work due to the COVID-19 pandemic; the subsequent renewed surge indicates the continued importance and relevance of the subject in the field of research [34–36]. Figure 9 presents a column chart depicting the distribution of published articles across various IS concepts. The chart highlights that the most prominent concept explored in the articles was the skill set needed for an IS career [27]. Following closely behind were topics related to IS education and training, as well as the IS workforce. The top contributors to published articles on IS are Australia and the United States. Addressing cyber-attacks that pose a threat to critical infrastructures remains a significant challenge. To effectively tackle these issues, IS professionals must possess specific skills and expertise [17].


Table 2 Ratings applied to the selected research papers

Citation                            QA1      QA2      QA3      QA4      Score
Yair Levy (2013)                    Partial  Yes      No       Partial  2.0
Von Solms and Van Niekerk (2013)    No       Yes      No       Partial  1.5
Reeves et al. (2021)                No       Yes      Yes      Partial  2.5
Li et al. (2019)                    No       Yes      No       Yes      2.0
Hoffman et al. (2012)               Partial  Yes      Yes      No       2.0
Hadlington (2017)                   No       Yes      No       Yes      2.0
Gratian et al. (2018)               No       Yes      Partial  Yes      2.0
Furnell et al. (2017)               No       No       Partial  No       0.5
Furnell (2021)                      Yes      Yes      Partial  Partial  3.0
Dawson and Thomson (2018)           Yes      Yes      Partial  No       2.5
Catota et al. (2019)                Yes      Yes      Yes      No       3.0
Caldwell (2013)                     Partial  Yes      Partial  No       2.0
Burley et al. (2014)                No       Yes      Partial  No       1.5
Shahriar et al. (2016)              No       Partial  No       No       0.5
Jeong et al. (2019)                 No       Yes      Partial  Yes      2.5
Chowdhury et al. (2018)             No       Yes      Partial  Yes      2.5
Paulsen et al. (2012)               Partial  Yes      Yes      No       2.5
Bagchi-Sen et al. (2010)            Yes      Yes      Yes      Partial  3.5
Liu and Murphy (2016)               Yes      Yes      Partial  Partial  3.0
Javidi and Sheybani (2018)          Yes      Yes      Yes      Partial  3.5
Sharevski et al. (2018)             Partial  Yes      Yes      No       2.5
Besnard and Arief (2004)            No       Yes      Partial  No       1.5
Martin and Rice (2011)              No       Yes      No       No       1.0
Bauer et al. (2017)                 No       Yes      Partial  No       1.5
Abawajy (2014)                      No       Yes      Partial  No       1.5
Albrechtsen and Hovden (2009)       Partial  Yes      No       No       1.5
Christopher et al. (2017)           No       Yes      Partial  Partial  2.0
Baskerville et al. (2014)           No       Yes      No       No       1.0
Rajan et al. (2021)                 No       Yes      Partial  No       1.5
Hong and Furnell (2021)             No       Yes      Partial  Yes      2.5
Kam et al. (2020)                   Partial  Yes      Yes      No       2.5

Organizations can bridge the skills gap by implementing IS awareness training programs such as paper-based materials, interactive games, simulations, and video-based training. This study investigates the prerequisites for pursuing a career in IS and examines the relative importance of technical and soft skills.


Fig. 8 Annual tally of cybersecurity publications

Fig. 9 Key themes extracted from the analyzed papers

The findings reveal that technical aptitude assumes a more significant role in this domain, with specific proficiencies like troubleshooting and familiarity with vendor products emerging as vital. Moreover, the study highlights that full-time employment in IS typically encompasses an average of 40 working hours per week, occasionally necessitating after-hours assistance for clients.

5 Conclusion This research brings novelty by conducting a comprehensive systematic literature review that examines the requirements, skills, and knowledge necessary for a career in IS. It employs rigorous methodologies, including specific keyword-based searches in bibliographic databases, to collect relevant data for analysis. Through thematic analysis, the study uncovers valuable insights into IS skills, education, training,


and awareness initiatives, contributing to the understanding of the field. Notably, it sheds light on the demand for female professionals in IS and proposes policies and game theory strategies to enhance data security and privacy. The intention to develop an IS awareness game further demonstrates a novel approach to promoting data protection and privacy. Lastly, the emphasis on encouraging females to enter the field and progress through experience and qualifications highlights an important aspect of fostering diversity and inclusivity in the IS field.

References 1. Kam H-J, Menard P, Ormond D, Crossler RE (2020) Cultivating cybersecurity learning: an integration of self-determination and flow. Comput Secur 96:101875 2. Nkongolo M, Van Deventer JP, Kasongo SM (2021) Ugransome 1819: a novel dataset for anomaly detection and zero-day threats. Information 12(10):405 3. Nkongolo M, Van Deventer JP, Kasongo SM, Zahra SR, Kipongo J (2022) A cloud based optimization method for zero-day threats detection using genetic algorithm and ensemble learning. Electronics 11(11):1749 4. Nkongolo M, van Deventer JP, Kasongo SM (2023) The application of cyclostationary malware detection using boruta and pca. In: Smys S, Lafata P, Palanisamy R, Kamel KA (eds) Computer networks and inventive communication technologies, Springer Nature Singapore, Singapore, pp 547–562 5. Nkongolo M, Van Deventer JP, Kasongo SM, Van Der Walt W, Kalonji R, Pungwe M (2022) Network policy enforcement: an intrusion prevention approach for critical infrastructures. In: 2022 6th international conference on electronics, communication and aerospace technology, pp 686–692 6. Hadlington L (2017) Human factors in cybersecurity; examining the link between internet addiction, impulsivity, attitudes towards cybersecurity, and risky cybersecurity behaviours. Heliyon 3(7):e00346 7. Von Solms R, Van Niekerk J (2013) From information security to cyber security. Comput Secur 38:97–102 8. Bagchi-Sen S, Rao HR, Upadhyaya SJ, Chai S (2010) Women in cybersecurity: a study of career advancement. IT Profess 12(1):24–31 9. Choi M, Levy Y, Hovav A (2013) The role of user computer self-efficacy, cybersecurity countermeasures awareness, and cybersecurity skills influence on computer misuse. In: WISP 2012 proceedings. Retrieved from https://aisel.aisnet.org/wisp2012/29 10. Graham CM, Lu L (2022) Skills expectations in cybersecurity: semantic network analysis of job advertisements. J Comput Inform Syst 1–13 11. Li Y, Liu Q (2021) A comprehensive review study of cyber-attacks and cyber security; emerging trends and recent developments. Energy Reports 7:8176–8186 12. Snehi M, Bhandari A (2021) Vulnerability retrospection of security solutions for softwaredefined cyber–physical system against ddos and iot-ddos attacks. Comput Sci Rev 40:100371 13. Bordoff S, Chen Q, Yan Z (2017) Cyber attacks, contributing factors, and tackling strategies: the current status of the science of cybersecurity. Int J Cyber Behav Psychol Learn (IJCBPL) 7(4):68–82 14. Kumar S, Agarwal D (2018) Hacking attacks, methods, techniques and their protection measures. Int J Adv Res Comput Sci Manage 4(4):2253–2257 15. Rege A, Bleiman R (2023) A free and community-driven critical infrastructure ransomware dataset. In: Onwubiko C, Rosati P, Rege A, Erola A, Bellekens X, Hindy H, Jaatun MG (eds) Proceedings of the international conference on cybersecurity, situational awareness and social media, Springer Nature Singapore, Singapore, pp 25–37


16. Posey C, Roberts TL, Lowry PB (2015) The impact of organizational commitment on insiders’ motivation to protect organizational information assets. J Manag Inf Syst 32(4):179–214 17. Furnell S (2021) The cybersecurity workforce and skills. Comput Secur 100:102080 18. Sussman LL (2021) Exploring the value of non-technical knowledge, skills, and abilities (ksas) to cybersecurity hiring managers. J High Educ Theory Pract 21(6) 19. Peslak A, Hunsinger DS (2019) What is cybersecurity and what cybersecurity skills are employers seeking? Issues Inform Syst 20(2) 20. Brooks NG, Greer TH, Morris SA (2018) Information systems security job advertisement analysis: skills review and implications for information systems curriculum. J Educ Bus 93(5):213–221 21. Vogel R (2016) Closing the cybersecurity skills gap. Salus J 4(2):32–46 22. Adams M, Makramalla M (2015) Cybersecurity skills training: an attacker-centric gamified approach. Technol Innov Manage Rev 5(1) 23. Nkongolo M, van Deventer JP, Kasongo SM, van der Walt W (2023) Classifying social media using deep packet inspection data. In: Ranganathan G, Fernando X, Rocha A (eds) Inventive communication and computational technologies, Springer Nature Singapore, Singapore, pp 543–557 24. Nkongolo M (2023) Using arima to predict the growth in the subscriber data usage. Eng 4(1):92–120. Retrieved from https://www.mdpi.com/2673-4117/4/1/6 25. Nkongolo M, van Deventer JP, Kasongo SM (2022) Using deep packet inspection data to examine subscribers on the network. Procedia Comput Sci 215:182–191. 4th international conference on innovative data communication technology and application. https://www.scienc edirect.com/science/article/pii/S1877050922020920 26. Pompili D, Akyildiz IF (2009) Overview of networking protocols for underwater wireless communications. IEEE Commun Mag 47(1):97–102 27. Bukauskas L, Brilingaite A, Juozapavicius A, Lepaite D, Ikamas K, Andrijauskaite R (2023) Remapping cybersecurity competences in a small nation state. Heliyon e12808 28. Xin Y, Kong L, Liu Z, Chen Y, Li Y, Zhu H, Gao M, Hou H, Wang C (2018) Machine learning and deep learning methods for cybersecurity. IEEE Access 6:35365–35381 29. Nkongolo M (2018) Classifying search results using neural networks and anomaly detection. Educor Multidisciplin J 2(1):102–127 30. M. Bada and J. R. Nurse, The social and psychological impact of cyberattacks, in Emerging cyber threats and cognitive vulnerabilities. Elsevier, 2020, pp. 73–92. 31. Potter LE, Vickers G (2015) What skills do you need to work in cyber security? a look at the australian market. In: Proceedings of the 2015 ACM SIGMIS conference on computers and people research, pp 67–72 32. Kilincer IF, Ertam F, Sengur A (2021) Machine learning methods for cyber security intrusion detection: datasets and comparative study. Comput Netw 188:107840 33. Gohil M, Kumar S (2020) Evaluation of classification algorithms for distributed denial of service attack detection. In: 2020 IEEE third international conference on artificial intelligence and knowledge engineering (AIKE), IEEE, pp 138–141 34. Dotcenko S, Vladyko A, Letenko I (2014) A fuzzy logic-based information security management for software-defined networks. In: 16th international conference on advanced communication technology, IEEE, pp 167–171 35. Alsamiri J, Alsubhi K (2019) Internet of things cyber attacks detection using machine learning. Int J Adv Comput Sci Applicat 10(12) 36. 
Cobb S (2016) Mind this gap: Criminal hacking and the global cybersecurity skills shortage, a critical analysis. In: Virus bulletin conference, pp 1–8

Intrusion Detection Using Bloom and XOR Filters R. Manimegalai, Batul Rawat, S. Naveenkumar, and M. H. N. S. Sriram Raju

1 Introduction Network security is essential to protect the network and the data flowing through it. The goal of an intrusion detection system is to prevent data breaches, intrusions, and cyber threats. It protects client data, ensures reliable access, and increases network performance. A few of the traditional network security measures available are firewalls, access control, antivirus software, gateways, and remote access VPNs [1]. Security breaches can result in data loss or system damage. If installed software applications are not regularly patched and updated, hackers can attack user PCs; hackers and intruders have an easier time accessing outdated software [2]. It is necessary to install bug fixes that are released in vendor security patches and updates, such as fixes for memory storage violations and buffer overflows. Such attacks are vicious and dismal: information as well as equipment are held hostage in exchange for money or other assets. Since it is possible to download dangerous code [3], intrusions into life-saving medical devices may lead to life-threatening situations. Ransomware may either permanently block access or lock the system without damaging any files. Hacktivism is carried out by anonymous groups of attackers, who also initiate Distributed Denial of Service (DDoS) attacks to alter traffic or to cause extensive damage to the entire system [4]. For example, DDoS attacks such as email bombing can bring down performance. Intrusion detection is the process of looking at various events that take place in a network, analyzing those events, and then identifying intrusive behavior to take appropriate action. Monitoring systems may be set up to help detect suspicious activities and notify relevant parties.


Intruders break into the system to acquire access to a particular computer or network for their own purposes. An intrusion detection system is a piece of hardware or software that recognizes possible intrusions in the network. In real-time circumstances, an IDS tracks these assaults, attempts to thwart them, and produces a thorough report for administrators. Modern cybersecurity requires the use of NIDS, which provides businesses with real-time visibility into threats and enables them to swiftly identify and respond to security events. Network traffic is examined by a NIDS to search for patterns and behaviors that indicate an attack. A variety of network-based threats, including port scans, denial-of-service attacks, malware infections, and unwanted access attempts, can be found using a NIDS. It operates by examining the data packets being transmitted over a network and comparing them to a set of predetermined rules or signatures that denote known attack patterns or unusual behaviors. NIDS are available in various forms, such as hybrid, anomaly-based, and signature-based.

2 Literature Survey Cuckoo filters [2] are used for set membership tests. They have three main advantages: support for dynamic deletion, better lookup performance, and better space efficiency. Though comparisons show that the Cuckoo filter performs better than the variants of the Bloom filter, its performance is slower than that of the XOR filter, and a few challenges in Cuckoo filters can be overcome by the XOR filter approach. Ezzaki et al. [5] have indicated that Bloom filters and their variants perform well in various cases using different algorithms. However, Bloom filters are mainly considered best for the pre-filtering process rather than being the only way to filter intruders, due to their limitations of higher space consumption and lower lookup performance. Kumar et al. [6] have presented a review of network-based intrusion detection systems, their approaches, and the most common datasets used to evaluate IDS models. Network-based intrusion detection approaches are split into four categories, namely statistical, bio-inspired, knowledge-based, and machine learning-based. A network-based intrusion detection system can be implemented using data mining techniques [7], which automatically detect attacks against computer networks and systems. This work mainly focuses on two specific contributions, namely an unsupervised anomaly detection technique and an association pattern analysis-based module. It is observed that the anomaly-based module is successful, but when there is a high volume of connections, the association-based module is used. Misuse-based detection uses well-known attack signatures to match and identify attacks. The need for a hybrid approach is emphasized because the combined strength is required for better accuracy and detection rates. Ely et al. [8] have proposed a way to store dictionaries that relate keys to values in the style of Bloom filters. Comparatively speaking, such a dictionary takes up far less space than a hash table. It permits storing n keys, each of which is mapped to a value that is a string of k bits. The benefit of this method is that only pairwise independent hash functions are necessary, negating the need for a complex hash function.


Examples include databases that store differential files, dictionaries for incorrect passwords, internet caching, and distributed storage systems. Other data structures such as sets consume linear memory, whose size is determined by the number of items in the set. When the elements stored in the set do not have a concise representation, this presents a huge advantage. Consider strings, which are typically 500 bits long: a hash table storing 10,00,000 such strings needs at least 500 × 10,00,000 bits, necessitating the use of a hard disc and making lookups expensive. In contrast, the roughly 16.875 Megabytes needed to implement a simple Bloom filter-based structure can easily be accommodated in main memory. Bloom filters do not support the deletion of elements and are suited only for membership tests. Since they are based on a probabilistic model, they have a false positive probability: a Bloom filter may treat an element that does not belong to the set as if it were present. The multi-pattern-matching algorithm based on the Bloom filter data structure [9] has a false positive probability which needs to be reduced for the hardware device to give accurate results. Dharmapurikar and Lockwood [10] have developed a unique hash table data structure, called the extended Bloom filter, that performs better than a naive hash table and minimizes the number of memory accesses required for the most time-consuming lookups. This hash table enables designers to obtain higher lookup performance for a given memory bandwidth, due to which the need for buffering in the lookup engine is eliminated. The suggested approach supports exact matching using a Bloom filter data structure, and the proposed work in [10] provides better throughput for router applications that use these hash tables, making them quicker than a naive hash table. The Bloom filter is used mainly for set membership testing, and the XOR filter is an advanced and improvised version of it. Weaver et al. [11] have proposed the XORSAT filter, one of the first simple filter constructions used for very large datasets whose size goes beyond 100 million rows. Efficiency is greatly increased and queries are processed at high speed using the methodology proposed in [11]. According to Liu and Lang [13], machine learning models detect attack variations and novel attacks when a huge set of training data is available. Additionally, machine learning-based IDSs are simple to design and build as they do not rely on domain expertise. Deep learning approaches use deeper networks that can further improve IDS performance. Since deep learning techniques are independent of feature engineering and domain knowledge, they offer a considerable advantage over shallow machine learning models [12]. Bio-inspired methodologies are being used to expand the statistical IDS approach with new techniques; these techniques mostly rely on the swarm intelligence approach or evolutionary theory [15]. To determine the best-fitting and most appropriate bio-inspired algorithm, it is necessary to consider factors such as convergence, intensification, diversification, and CPU time. Besides the usage of data structures such as Bloom and XOR filters, an IDS can be implemented using the Cuttlefish Feature Selection Algorithm and a Support Vector Machine [16]. The proposed approach in [16] uses a feature selection algorithm based on CRF to select the most relevant features from the network data, and the MSVM classifier is applied to the selected features to classify the network traffic as either normal or anomalous.


The implementation is presented as a case study using the NSL-KDD dataset and shows how high accuracy rates and low false positive rates can be achieved when compared to a traditional IDS. Zuech et al. [17] have observed that a NIDS is helpful in the forensic process by finding the footprints of breaches. Attacks spread from one computer to another via routers and switches in a network, and a NIDS monitors network traffic data at routers or switches in the network layer. Based on pattern matching of the network traffic data, NIDS can be further divided into anomaly-based and misuse-based IDS. Anomaly detection uses pattern-based traffic flow analysis: when a pattern deviates from expected behavior, intrusive activity is inferred. In contrast, misuse detection compares traffic against a preset set of rules using parametric analysis of attack features and signatures. The Bloom filter is very popular for membership testing and has been used in a variety of applications, such as high-speed packet routing for IP-based networks. When Bloom filters are used in packet routing [18], they decrease the number of lookups and pre-process the incoming routing requests or queries. To improve performance and usability, optimized hash functions can be used, partitioned into a few buckets or sets [19]; it then becomes easy to work with a dense routing table, and a collision-free approach is possible at minimal cost. Bloom filters and related data structures are used to implement a NIDS to deny unnecessary network access to intruders, with filters sent from one computer on the network to another [20]. A Network-based Intrusion Detection System is mainly used to identify the intruder using various efficient data structures; the Bloom and XOR filters are used for implementing the NIDS in the proposed system. The NIDS implementations using the Bloom and XOR filter data structures are discussed in Sects. 3 and 4.

3 NIDS Implementation Using Bloom Filter A Bloom filter is a space-efficient probabilistic data structure that can quickly test for membership in a collection and is used here to implement the NIDS. It can easily determine whether a given input is definitely not in a set or is likely to be in the set [14]. Due to their ability to swiftly filter out innocuous traffic from further analysis and concentrate on potential threats, Bloom filters are useful for NIDS. The NIDS implementation using the Bloom filter uses a dataset that contains a variety of usernames. The dataset contains more than ten lakh usernames and is stored in a Comma Separated Values (CSV) file. The strings present in the dataset are evaluated using hash functions. The number of hash functions is obtained based on the size of the bit array and the number of elements present in the dataset. According to the hash value obtained for each username present in the dataset, the corresponding bits are set to 1 in the bit array, whose size is determined by the number of elements and the assigned false positive probability. The algorithm then takes another string as input from the user. The input string undergoes the same process, and the corresponding bit positions are derived from its hash values. After matching the bits of the input string with the bits set for the usernames present in the dataset, the input can be classified as present, not present, or a false positive.


The username dataset is meticulously pre-processed: all unnecessary whitespace is removed, duplicates are eliminated, and null characters are removed. The dataset contains 1,048,575 values, i.e., rows, and one column. Each row contains a single username, and none of the rows in the dataset contain null values. The proposed system implements the NIDS using the Bloom filter data structure for the application of selecting usernames, as shown in Fig. 1. The false positive probability value is set to 0.01 based on Fig. 2. Figure 2 shows the graph between the false positive probability and the size of the bit array required when the number of elements is fixed at 10 lakhs. From Fig. 2, it can be observed that when the false positive probability is less than 0.01, the size of the bit array is very large, and for values greater than 0.01 the number of false positives increases. Therefore, the false positive probability is fixed at 0.01 for the proposed algorithm. Let n be the number of usernames in the dataset, k the number of hash functions, m the size of the bit array, and p the false positive probability. The number of hash functions and the size of the bit array are computed using Eqs. (1) and (2), and Eq. (3) gives the resulting false positive probability.

m = −(n · ln p) / (ln 2)²   (1)

k = (m / n) · ln 2   (2)





p = (1 − (1 − 1/m)^(k·n))^k   (3)

After the evaluation of k and m, the dataset is passed through MurmurHash version 3, which is available as a module in Python; the seed value is based on the number of hash functions, i.e., k. The bit array is filled according to the hash values obtained for each value in the dataset. The user then gives an input username, which is passed through the same murmur hash functions, and the bits obtained for the input are compared with those stored in the bit array for the dataset elements. If the bits at those particular index positions all match, the output displayed is "present", which indicates that the username has matched; however, there is a possibility that this is a false positive, meaning the username is not actually present in the dataset even though it is reported as already existing. If the bits do not all match the bits present in the bit array, the output shows "not found", which allows the user to go ahead with the username.
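The workflow just described can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: it assumes the third-party mmh3 (MurmurHash3) and bitarray packages, and it derives the k hash functions by varying the murmur seed, which is one common convention.

import math
import mmh3
from bitarray import bitarray

n, p = 1_000_000, 0.01
m = math.ceil(-(n * math.log(p)) / (math.log(2) ** 2))   # Eq. (1): size of the bit array
k = round((m / n) * math.log(2))                         # Eq. (2): number of hash functions

bits = bitarray(m)
bits.setall(0)

def add(username):
    # Set the k bit positions derived from murmur3 hashes of the username.
    for seed in range(k):
        bits[mmh3.hash(username, seed) % m] = 1

def probably_contains(username):
    # All k positions set -> "present" (possibly a false positive); any unset -> "not found".
    return all(bits[mmh3.hash(username, seed) % m] for seed in range(k))

add("alice_2024")
print(probably_contains("alice_2024"))    # True
print(probably_contains("new_user_42"))   # False, except for rare false positives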


Fig. 1 Steps involved in implementing NIDS using bloom filter

4 NIDS Implementation Using XOR Filter A data structure that efficiently checks for set membership is the XOR filter. XOR filters, in contrast to conventional data structures, use a bitwise XOR operation to save memory space while delivering fast query speed. The effectiveness and accuracy of intrusion detection systems are increased by using XOR filters.


Fig. 2 False positive probability versus size of bit array for n = 10 L

XOR filters provide a quicker and more effective solution for NIDS by using a bitwise XOR operation to quickly evaluate incoming data against a pre-built list of malicious traffic signatures. The data required to build the proposed system is a set of string values provided by the Snort dataset. It is made up of incoherent strings that resemble patterns that might be discovered through network packet evaluation. The dataset, containing 4055 values, is fed as the input, together with another string value given as input by the user, to implement this efficient data structure for content analysis. The implementation of the XOR filter revolves around Eq. (4), where h(x) is the hash value of the element x, and h1(x), h2(x), and h3(x) are three independent hash functions that generate values for each element x; ⊕ represents the XOR operation, and mod m ensures that the hash value falls within the range of the filter size, which is typically a power of two.

h(x) = (h1(x) ⊕ h2(x) ⊕ h3(x)) mod m   (4)

B-array construction is an essential part of the XOR filter implementation; it is mainly used to evaluate the unique fingerprint values for each string present in the dataset. The size of the B-array is fixed to 1.23 * (size of the dataset) + 32.

b[unique_arr[i]] = b[dhash[i][0]] ⊕ b[dhash[i][1]] ⊕ b[dhash[i][2]] ⊕ fingerprint[dataset[i]]    (5)

Equation (5) is used to find the values to be filled at the index position of each string's hash value from the dataset. Fingerprint is the data structure whose initial values are set to 0 for all the string indices. The unique value acts as the index at which the fingerprint of a string is stored in the B-array. Once the fingerprint values are obtained, the user is prompted for a string as input. The input string undergoes the same processing using the three hash functions, and the fingerprint value of the input is obtained. A comparison takes place between the input fingerprint and the dataset fingerprints. If the fingerprints match, the intruder's presence is confirmed; if the fingerprint values do not match, then there is no intruder in the network.


Fig. 3 Steps involved in XOR filter implementation

Steps involved in the implementation of the XOR filter are illustrated in Fig. 3.
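The following Python sketch illustrates how a B-array satisfying Eqs. (4) and (5) can be built and queried. It is a simplified stand-in for the proposed implementation: mmh3 supplies the three hash functions, an 8-bit fingerprint is assumed, and the standard peeling procedure decides the order in which Eq. (5) fills the B-array. Variable names, the seed handling and the retry-on-failure step are assumptions, not the authors' code.

```python
import mmh3  # MurmurHash3 bindings, as in the Bloom filter sketch


def _slots(key, seed, size):
    """Three hash positions, one in each third of the B-array."""
    third = size // 3
    return (mmh3.hash(key, seed) % third,
            third + mmh3.hash(key, seed + 1) % third,
            2 * third + mmh3.hash(key, seed + 2) % third)


def _fingerprint(key, seed):
    return mmh3.hash(key, seed + 3) & 0xFF  # 8-bit fingerprint


def build_xor_filter(keys, seed=0):
    """Fill the B-array so Eq. (4)/(5) recovers every key's fingerprint.
    Assumes the keys are distinct strings."""
    size = int(1.23 * len(keys)) + 32            # B-array size used in the paper
    while True:                                   # retry with a fresh seed if peeling fails
        slots = [set() for _ in range(size)]      # which keys hash to each slot
        for i, key in enumerate(keys):
            for s in _slots(key, seed, size):
                slots[s].add(i)
        stack = []
        queue = [s for s in range(size) if len(slots[s]) == 1]
        while queue:
            s = queue.pop()
            if len(slots[s]) != 1:
                continue                          # slot is stale
            (i,) = slots[s]
            stack.append((i, s))                  # key i is "peeled" at slot s
            for t in _slots(keys[i], seed, size):
                slots[t].discard(i)
                if len(slots[t]) == 1:
                    queue.append(t)
        if len(stack) == len(keys):
            break
        seed += 7
    b = [0] * size
    for i, s in reversed(stack):                  # assign in reverse peeling order
        h = _slots(keys[i], seed, size)
        b[s] = _fingerprint(keys[i], seed) ^ b[h[0]] ^ b[h[1]] ^ b[h[2]]
    return b, seed, size


def probably_contains(b, seed, size, key):
    """True -> signature probably present; False -> definitely absent."""
    h = _slots(key, seed, size)
    return _fingerprint(key, seed) == (b[h[0]] ^ b[h[1]] ^ b[h[2]])
```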

5 Experimental Results and Discussions

A Network-based Intrusion Detection System has been implemented using the content analysis technique with Bloom filters. A Bloom filter uses a bit array and a varying number of hash functions to describe a collection of objects. The proposed system implements an algorithm which checks whether a particular username is present in the given dataset or if it is a false positive.


Fig. 4 Bloom filter execution

Bloom filter implementation snapshots are shown in Figs. 4 and 5. Since the number of elements is more than 1,000,000, the size of the bit array is also large, as presented in Table 1. In Fig. 5, the username cannot be allocated because the filter indicates that it is likely to be present, i.e., the value is already present in the dataset. The values that are marked as false positives are not actually in the dataset, but the user still cannot use them as usernames. The space complexity of a Bloom filter can be calculated as follows. Let n be the number of elements in the input set, p the desired false positive rate, m the size of the bit array, and k the number of hash functions used. The space complexity of a Bloom filter is given by Eqs. (6) and (7). A larger bit array and more hash functions result in a lower false positive rate, but also increase the space complexity.

Fig. 5 Snapshot of checking username for intrusion

Table 1 Number of hash functions and size of bit array for n = 1,000,000

False positive probability | Number of hash functions | Size of bit array (MB)
0.1                        | 3                        | 4.5705
0.01                       | 6                        | 9.1410
0.001                      | 9                        | 13.7115
0.0001                     | 13                       | 18.2820
0.000001                   | 16                       | 22.8525

m = −(n · ln p) / (ln 2)² bits    (6)

k = (m / n) · ln 2    (7)

The time complexity of a Bloom filter depends on the number of hash functions used and the size of the bit array. The time complexity of adding an element to the Bloom filter is O(k), where k is the number of hash functions used. The time complexity of testing whether an element is in the set is also O(k), since all the bits in the bit array that correspond to the hash values of the query must be checked. Therefore, both adding an element and testing for membership are constant with respect to the size of the input set, but proportional to the number of hash functions used. The proposed NIDS implementation using Bloom filters classifies the network users. It may not give the desired results when the number of users in a network is large, because Bloom filters use k hash functions and the value of k depends on the size of the input dataset. The accuracy of the results is inversely proportional to the size of k, because it is very difficult to find unique hash codes every time from the set of (k * size of input dataset) hash codes. Even after reducing the value of k, the results are not accurate, which is why there is a need to move to an advanced algorithm that uses a fixed value of k equal to three. Lookups in the Bloom filter are also slower than in XOR filters. The Snort dataset, which contains around 4055 string values, is given as the input. The XOR filter uses exactly three hash functions to get the hash digests for the strings present in the dataset. The digest values produced are evaluated based on their frequency and tested to see whether the value has already been visited. Based on the result, the B-array construction is done efficiently and the user input string is evaluated: the fingerprint value produced for the input is compared with the dataset fingerprint values. The frequency of these digest values is determined in order to build the B-array and determine the fingerprint value. The accuracy of the XOR filter is 94%, obtained by evaluating the number of right and wrong answers. Figure 6 illustrates the success probability of the mapping step for different experimental sizes of the B-array, which is partitioned into 180 unique sets. The purpose of this experiment was to determine the optimal size of the B-array for XOR filters by experimentally calculating the probability of mapping the generated unique hash values into the B-array for three different B-array sizes. In the first scenario, where the size of the B-array is |S| + 16, the cumulative experimental analysis yielded a probability measure of 0.77, indicating that choosing this size would result in less memory usage but poor mapping. In the second scenario, where the B-array size is |S| + 32, the analysis resulted in a probability measure of 0.93, implying that this size gives moderate memory usage and good mapping. In the third scenario, where the B-array size is |S| + 64, the analysis resulted in a probability measure of 0.96, suggesting that this size would result in high memory usage but good mapping.


Fig. 6 Probability of mapping step versus generated sets for XOR filter

Considering the memory usage and mapping results, it is clear that the size |S| + 32 is optimal, as it provides a good balance between memory usage and mapping performance. However, the optimal size may vary depending on the specific use case and dataset, so similar experiments should be conducted to determine the optimal B-array size for different scenarios. The space complexity of the NIDS implementation using the XOR filter is influenced by the number of hash functions it employs and the desired false positive rate. When compared to a Bloom filter or a hash table, an XOR filter typically requires less space. An XOR filter has O(m) space complexity, where m is the number of elements in the set. The time complexity of an XOR filter lookup is O(k), where k is the number of hash functions, and the time complexity of creating an XOR filter with m items and k hash functions is O(mk). The XOR filter guarantees faster cache performance and saves 15% memory usage.

6 Conclusions and Future Work

Intrusion Detection Systems are crucial for maintaining network security by quickly detecting and responding to security threats. Network-based Intrusion Detection Systems use Bloom filters for content analysis, which employ a bit array and a varying number of hash functions to check for the presence of usernames in datasets and avoid false positives. However, Bloom filters have limitations when dealing with a large number of users. As a result, XOR filters have been adopted to replace Bloom filters due to their faster lookup times and fixed number of hash functions, making them a promising alternative for NIDS.


XOR filters are probabilistic data structures that outperform Bloom filters in space usage and query performance. A NIDS implementation using XOR filters can detect intruders by filtering through a dataset of strings with an accuracy of 94%. To construct the B-array efficiently, the XOR filter employs three hash functions to produce hash digests that are evaluated based on frequency. The size of the B-array is critical to the success probability of the mapping step, and its optimal size is determined experimentally. The XOR filter guarantees faster cache performance, saves 15% memory usage, and is independent of false positives, with a space complexity of O(m) and a lookup time complexity of O(k). Future work on Bloom filters may focus on reducing false positives or improving the scalability of the filter to handle large datasets. Additionally, future work may focus on optimizing the filter for specific types of attacks and improving hash functions to minimize collisions. Machine learning can also be used by NIDS to detect more complex and sophisticated attacks. Reducing false positives, improving scalability, and developing dynamic filters that can adapt to changing network traffic patterns are some interesting open problems. Integrating machine learning techniques can further enhance the effectiveness of NIDS, ultimately leading to better network security and more robust protection against cyber threats.


A Model for Privacy-Preservation of User Query in Location-Based Services

V. Sravani, O. Krishnaveni, and Anila Macharla

1 Introduction Location-based services (LBS) have become more prevalent as a result of the expansion of mobile networks in our daily lives. Billions of people use LBS all over the world for a variety of purposes, including finding points of interest, monitoring delivery, and even keeping track of loved ones. Because of its accessibility and simplicity, LBS has grown to be a very popular service with customers. The advantages of LBS must be balanced out, nevertheless, by a few challenges. One of the key issues is the issue of security. Users of LBS have access to vital location information that could reveal a lot of personal information about the user. If a user often uses LBS to query hospitals, their current state of health may be made known. Similar to this, if a person frequently searches for eateries in the same area, their residence can be determined. These kinds of very specific data can be utilised by criminals for a wide range of illegal actions, such as identity theft, stalking, and targeted advertising. To preserve the privacy of users, it is crucial to develop a system that can prevent the dissemination of such information. Another challenge faced by LBS providers is the considerable computational load. LBS providers must carry out significant computational tasks to support a sizable user base. To get around this problem, LBS providers commonly outsource their service-related data (LBS data) to cloud servers with powerful computational capabilities. The cloud server frees up processing resources for the LBS provider by handling user requests in this manner. Because it is the foundation of the business, LBS providers are hesitant to share LBS data with cloud servers. As a result, LBS data is never exported without being first encrypted and sent to a cloud server. In order to protect user privacy, the query from an LBS user should also be encrypted so that the location information is hidden V. Sravani · O. Krishnaveni · A. Macharla (B) Department of CSE, Chaitanya Bharathi Institute of Technology (A), Hyderabad, Telangana, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 I. J. Jacob et al. (eds.), Data Intelligence and Cognitive Informatics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-7962-2_9


from the cloud server. Several privacy-preserving techniques have recently been developed to solve the security and computational challenges of LBS. One popular tactic is to use cryptographic techniques like differential privacy, safe multi-party computation, and homomorphic encryption. In order to protect user privacy, these methods enable LBS providers to process encrypted data without disclosing it to any other parties. Another way is to use techniques for location obfuscation including location cloaking, fake queries, and perturbation-based methods. Despite how important LBS services have become to our daily lives, there are still certain drawbacks, including security and computational load. These methods alter the location information of queries so that accurate service delivery is still possible but it is more challenging for other parties to guess the user’s precise location. Numerous privacypreserving strategies have been developed that employ location obfuscation and encryption to get around these issues. With the help of these programmes, LBS vendors may provide consumers with precise, efficient services without endangering their privacy. As LBS services advance, it is crucial to prioritise user privacy and make sure that these services are developed with privacy in mind. LBS providers must also handle the challenging computational processes necessary to sustain a sizable user base. In order to serve a high number of clients, LBS providers must complete challenging computing jobs. Sending LBS data to a cloud server (CS) with lots of processing power is one popular tactic. The LBS provider would frequently outsource service-related data (LBS data) to a cloud server (CS), which has powerful computational power, as a solution to the issue. As a result, CS will respond to user inquiries in place of the LBS provider. The LBS supplier is reluctant to give it to CS because the LBS data is the main source of the service. Thus, LBS data will never be transferred to CS without first being encrypted. The query from an LBS user should be safeguarded such that the location information of the inquiry is unclear to CS in order to protect user’s privacy.

2 Related Work and Literature Survey Location-based services (LBS) inquiries protect privacy. Spatial range search, which allows users to look for data inside a particular geographic area, is one of the frequently used LBS searches [1]. In a brief discussion of earlier research on the subject, Zhu et al.’s [2] proposal for an effective and privacy-preserving LBS query technique employing homomorphic encryption is notably mentioned. This method enables distance estimations while protecting user privacy by enabling circular range queries over encrypted LBS data. Time-Aware Boolean Spatial Keyword Queries (TABSKQ), which consider both the spatial and temporal aspects of data to produce more precise query results, are a novel query type introduced by Gang Chen et al. [3]. By generating fictitious query sequences, Wu et al. [4] suggested a technique for protecting location and query privacy in LBS. It demonstrates that in terms of privacy protection and query accuracy, the strategy performs better than existing privacy preservation techniques. However, the technique needs more storage and computing


power. The paper cites Dargahi et al.’s [5] use of the k-anonymity approach to safeguard consumer’s privacy in location-based services (LBS). By ensuring that LBS providers cannot tell target users apart from at least k other users, this approach guards against the compromise of the user’s location. In other words, the approach makes it harder for the LBS provider to identify the user based just on their position by adding a layer of anonymity to the user’s location data. This aids in safeguarding user privacy and preventing unauthorised use of or tracking of their location. An effective and trustworthy location privacy-preserving method is proposed by Jin Luo et al. [6] utilising a cryptographic approach, which guarantees the confidentiality, integrity, and anonymity of user’s identities for LBS providers and fog servers. The scheme’s resilience to internal, external, and coordinated attacks is further shown in the paper through a security analysis. Santhosh Kumar et al.’s [7] study on K-Anonymity Techniques and their efficacy in preserving data privacy, which involves anonymizing data so that the individual’s identity is kept secret, demonstrates their capacity to thwart re-identification attacks, and provides a thorough analysis of the technique’s efficacy. Users can perform encrypted location-based service (LBS) data searches using a predetermined search region and a Boolean keyword expression to establish their search preferences using P3 GQ [8], a unique framework created by Zeng et al. This approach builds a secure circular range query using the K-nearest neighbours (KNN) algorithm and offers privacy-preserving Boolean keyword searches using an encrypted inverted index strategy. Zhu et al.’s publication from [9] presents in-depth research on a privacy-preserving framework created for outsourced location-based services (LBS). Their suggested approach uses Key-Policy Attribute-Based Encryption (KP-ABE) to encrypt the LBS data and then performs a semantic keyword search on it directly. The authors use a Bloom filter to represent the location attribute before encrypting it using the function-hiding inner product encryption (FHIPE) method. The cloud infrastructure independently checks the number of matching bits for both the location and the attributes to confirm compatibility between the query vector and an index vector. FHIPE is also used to encrypt the query itself [4, 10–14, 15]. This architecture enables safe and efficient location-based searches while safeguarding user privacy because the data is encrypted and the cloud is unable to access any personal information or user location. Abaka et al. provide location data for users with k-anonymity and privacy preservation [6]. It performs better than the currently available alternatives in terms of computing overhead [8, 16, 17].

3 Methodology

The suggested system aims to address privacy concerns by enabling users to look up POIs based on keywords without disclosing their location or the terms they are looking for. This is done by using a spatial keyword search strategy that protects user privacy. The Privacy-Preserving Indexing (LPPI) at the centre of the proposed system encrypts search queries and uses quad-tree-based indexing to quickly and precisely search the POI database.
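As a plain, unencrypted illustration of the quad-tree indexing idea, a minimal point quad-tree is sketched below. In the proposed system the index is built and searched over encrypted data via LPPI, so the class name, capacity, coordinates and sample POIs here are purely illustrative assumptions.

```python
class QuadTree:
    """Tiny in-memory point index over (x, y); a node splits into four
    children once it holds more than `capacity` points (assumes points are
    not all identical)."""

    def __init__(self, x0, y0, x1, y1, capacity=4):
        self.box = (x0, y0, x1, y1)
        self.capacity = capacity
        self.points = []        # (x, y, payload) stored at this node
        self.children = None    # four sub-quadrants after a split

    def _contains(self, x, y):
        x0, y0, x1, y1 = self.box
        return x0 <= x <= x1 and y0 <= y <= y1

    def insert(self, x, y, payload):
        if not self._contains(x, y):
            return False
        if self.children is None:
            if len(self.points) < self.capacity:
                self.points.append((x, y, payload))
                return True
            self._split()
        return any(child.insert(x, y, payload) for child in self.children)

    def _split(self):
        x0, y0, x1, y1 = self.box
        mx, my = (x0 + x1) / 2.0, (y0 + y1) / 2.0
        self.children = [QuadTree(x0, y0, mx, my, self.capacity),
                         QuadTree(mx, y0, x1, my, self.capacity),
                         QuadTree(x0, my, mx, y1, self.capacity),
                         QuadTree(mx, my, x1, y1, self.capacity)]
        for p in self.points:                      # push stored points down
            any(child.insert(*p) for child in self.children)
        self.points = []

    def range_query(self, qx0, qy0, qx1, qy1, found=None):
        """Collect payloads of all points inside the query rectangle."""
        found = [] if found is None else found
        x0, y0, x1, y1 = self.box
        if qx1 < x0 or qx0 > x1 or qy1 < y0 or qy0 > y1:
            return found                           # no overlap with this node
        for x, y, payload in self.points:
            if qx0 <= x <= qx1 and qy0 <= y <= qy1:
                found.append(payload)
        if self.children:
            for child in self.children:
                child.range_query(qx0, qy0, qx1, qy1, found)
        return found


# usage: index hypothetical POIs by (longitude, latitude), then search a box
tree = QuadTree(-180, -90, 180, 90)
tree.insert(78.4867, 17.3850, "Hospital A")
tree.insert(78.4801, 17.3912, "Cafe B")
print(tree.range_query(78.45, 17.35, 78.50, 17.40))
```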


It first introduces PrivSTL, which enables spatiotemporal keyword searches over encrypted data in outsourced LBS while respecting privacy, and then introduces PrivSTG, an upgraded technique based on an index tree to increase service quality over multiple ciphertexts. It improves the security and effectiveness of trapdoor generation and search algorithms to produce a more potent and effective privacy-preserving spatial keyword search approach that can be used in a number of LBS applications while protecting user privacy and upholding the rights of data owners. The LBS provider, users, and two cloud servers (A and B) make up the LBS system. Location-based data are compiled by the supplier, who also offers query services. Prior to outsourcing the LBS data content and user profiles to cloud servers, the supplier encrypts them using PrivSTL to protect data privacy and take advantage of cloud computing advantages. This strategy makes use of the advantages of cloud computing to secure location-based interactions and give users a secure experience. System Model The processing and storage resources for mobile users are provided by two cloud servers, A and B, in this model’s Fig. 1. Cloud server A retains encrypted indexes, whereas cloud server B holds a substantial amount of encrypted LBS data. Users can make requests to the cloud server by constructing a custom trapdoor using a spatiotemporal keyword query and their retrieve key. The cloud servers in this paradigm do not collude, which means they do not work together to access or trade data, and this is a key point to make. This security measure has been put in place to stop data from being leaked or accessed by unauthorised people.

Fig. 1 System model


Fig. 2 Proposed model for privacy-preserving location-based system

System Design The proposed model for the privacy-preserving spatiotemporal keyword search framework can be seen in Fig. 2.
Data preparation: The location-based services (LBS) data is encrypted using LPPI, trapdoor generation, and secure search algorithms to improve the security and efficiency of the privacy-preserving spatial keyword search technique. The spatiotemporal data is divided into more manageable portions, each subset is encrypted using a secure encryption method, and the encrypted LBS data is sent to and kept on a cloud server.
User inquiry: A user can send an LBS enquiry. The query also includes an access policy to ensure that the search is authorised.
Query processing: The cloud server searches the encrypted LBS data for ciphertexts that match the search criteria. The ciphertexts are decrypted and matched against the access rules for a precise and authorised search. The search query and its results are secured to maintain query privacy and spatiotemporal keyword profile privacy.
The user creates an LBS query with a spatial range, time period, Boolean keyword expression, and an access policy for authorised search, as shown in Fig. 3. This is the first stage in the suggested methodology's flow of actions. The cloud server gets the query and access policy and then looks for ciphertexts that fit the query requirements in the encrypted LBS data. The necessary ciphertexts are then decrypted by the cloud server and compared against the access policy for a precise and authorised search. The cloud server optimises the search by using Geohash to divide the locations into grids, searching only over the ciphertexts in the grids around the mobile user, and encrypting the query and search results to preserve anonymity. The user receives the encrypted query and search results, decrypts them using their personal encryption key, and gains access to the relevant information without jeopardising their privacy.
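A simplified, plaintext sketch of this grid-based search optimisation is shown below. Instead of real Geohash prefixes it buckets coordinates by rounding, and the POI records are hypothetical; the point is only that the server restricts matching to the cells around the mobile user instead of scanning the whole dataset.

```python
from collections import defaultdict

# Hypothetical POI records: (name, latitude, longitude)
POIS = [("Hospital A", 17.3850, 78.4867), ("Cafe B", 17.3912, 78.4801)]


def cell_id(lat: float, lon: float, cell_deg: float = 0.01) -> tuple:
    """Bucket a coordinate into a fixed-size grid cell (roughly 1 km here).
    A real deployment would use Geohash prefixes instead of plain rounding."""
    return (round(lat / cell_deg), round(lon / cell_deg))


# Index POIs by grid cell once, on the server side.
grid = defaultdict(list)
for name, lat, lon in POIS:
    grid[cell_id(lat, lon)].append((name, lat, lon))


def candidates_near(lat: float, lon: float):
    """Collect POIs only from the user's cell and its 8 neighbours, so the
    (encrypted) matching step never has to scan the whole dataset."""
    cx, cy = cell_id(lat, lon)
    found = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            found.extend(grid[(cx + dx, cy + dy)])
    return found


print(candidates_near(17.3861, 78.4850))
```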


Fig. 3 Proposed methodology

The activity diagram emphasises the value of privacy protection across the full LBS query submission and search procedure in its entirety.


4 Results

Search optimisation: The grids of locations in the LBS data are created using Geohash, and the produced encrypted index tree is delivered to the cloud server. The cloud server only scans the ciphertexts in the grids around the mobile user in order to improve service efficacy.
Security analysis: PrivSTL is tested for resistance to chosen-plaintext, chosen-keyword, and outside keyword-guessing attacks in a generic bilinear group model. The analysis proves that PrivSTL guarantees the confidentiality of both queries and spatiotemporal keyword profiles (Figs. 4, 5 and 6).

Fig. 4 Screenshot of user dashboard interface

Fig. 5 User search query page screenshot


Fig. 6 Screenshot of admin dashboard interface

Displaying Results in the Web Application The development of a Web application is covered in this paper. User and Admin are the two modules that make up the web application. A user must first register and then log in to utilise the application. Once logged in, as shown in Fig. 4, the user can check their search history and search for places as shown in Fig. 5. The search history of the user is kept in encrypted form to protect the privacy of personal information, as shown in Fig. 8. As depicted in Fig. 7, the Admin module enables the administrator to add new location data sets and categories to the server. All user’s search histories are accessible to the admin as well, although this information is encrypted and inaccessible to the admin. As illustrated in Fig. 9, the encryption of user data guarantees that neither the administrator nor any other third party will be able to read the user’s search queries or other data, giving them a high level of privacy and security. The application makes sure that the user’s search history and other data are hidden from unauthorised third parties by storing the encrypted data on the server. The admin module gives the administrator options for customization and flexibility. Performance Evaluation: The application successfully meets the objective of providing a secure, privacy-preserving solution for users searching for location-based services. The encryption algorithm used in our project (Spatio-Temporal) effectively encrypts the user’s search query and protects their data from unauthorized access. The encrypted data is stored on the server, ensuring that the user’s search history and other data are not visible to unauthorised third parties shown in Figs. 8 and 9. The location-based data of the user is very private thanks to this encryption, making it difficult for unauthorised parties to access or interpret the data. A variety of search queries and datasets are used to test the web application, and it is discovered that it consistently performs effectively. The application is fast, responsive, and achieves accurate search results.
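The idea of keeping the stored search history unreadable to the admin and to the server can be illustrated with a symmetric cipher, as in the sketch below. PrivSTL itself relies on different primitives (FHIPE, CP-ABE), so the Fernet key and the stored token here are only an assumption-level stand-in for the encryption used in the web application.

```python
from cryptography.fernet import Fernet  # pip install cryptography

# Hypothetical per-user key; only the user (not the admin) holds it.
user_key = Fernet.generate_key()
cipher = Fernet(user_key)

query = "restaurants near Gachibowli"
token = cipher.encrypt(query.encode())   # what the server and admin store/see
print(token)                             # opaque ciphertext in the history table
print(cipher.decrypt(token).decode())    # only the key holder recovers the query
```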


Fig. 7 Admin page for adding location categories and data

Fig. 8 User view of search history

Fig. 9 Admin view of users search history


Fig. 10 Search result display

Results Display After the user performs a search (Fig. 4), the application will display a list of location results. Each result will include an image of the location, along with its address. This provides the user with the necessary information to decide which location to select without revealing too much detail (Fig. 10). For example, if a user searches for “cbit or post office”, the application will be displayed to the user in the form of a location image address, providing the necessary information without revealing too much detail as mentioned before. EPQ and EPLQ These schemes provide data confidentiality and location privacy preservation, but do not support fine-grained authorised search or user query protection at either the user or admin side. FINE On the user or admin side, this plan enables user query protection; however, it does not offer data confidentiality, spatial range search, fine-grained authorised search, or location privacy preservation. KP-ABTKS This system provides data confidentiality and uses KP-ABE for access control, however, it is incompatible with spatial range search, fine-grained authorised search, or user query protection on the admin or user sides. PrivSTL On the user or admin side, this plan enables user query protection; however, it does not offer data confidentiality, spatial range search, fine-grained authorised search, or location privacy preservation. PrivSTL with user query encryption This solution delivers all of the benefits of PrivSTL while also offering user query protection on both the user and admin sides using ciphertext-policy attribute-based encryption (CP-ABE). With capabilities including data confidentiality, spatial range search, fine-grained authorised


Table 1 Comparison with existing schemes

Scheme     | Data confidentiality | Spatial range search | Fine-grained mechanism | LPP | UQP
EPQ        | Yes                  | Circular             | No                     | Yes | No
EPLQ       | Yes                  | Circular             | No                     | Yes | No
FINE       | Yes                  | Rectangular          | CP-ABE                 | Yes | No
PrivSTL    | Yes                  | Circular             | CP-ABE                 | Yes | No
PrivSTL UQ | Yes                  | Circular             | CP-ABE                 | Yes | Yes

search, location privacy preservation, and user query protection on both the user and admin sides, this project delivers the most comprehensive collection of tools for safeguarding the privacy of geographic data. Each of the current privacy plans for spatial data has pros and cons of its own. The several elements of our suggested technique are contrasted in Table 1. The results are consistently accurate, and the application is quick and responsive. By expanding the number of users and including a map in search results, the suggested method’s user experience can be enhanced. Our initiative has the ability to provide consumers seeking location-based information with a helpful service while guaranteeing the security and privacy of their data.

5 Conclusion

By using the described strategy, location-based services (LBS) can be provided while maintaining user privacy. To do this, two spatiotemporal keyword search frameworks, PrivSTL and PrivSTG, are offered. Authorised mobile users can utilise these frameworks to perform location-based queries without risking their privacy. The key-homomorphic PRF, a collaborative algorithm, and a k-anonymity algorithm are used to encrypt every position, ensuring user privacy and allowing the majority of the computation to be sent to a semi-trusted cloud server. The schemes also encrypt the location attribute using Bloom filters and function-hiding inner product encryption (FHIPE) to prevent the cloud server from obtaining the data. PrivSTG uses spatial indexing based on Geohash when working with many encrypted data sets to improve search performance. We have carefully considered the viability and security of our proposed ideas, but we still think there is potential for improvement. Through this project, a web application was developed that allows users to view their search history and look up locations while still protecting their personal data. The web application's admin module enables the administrator to add new location data sets and categories to the server while maintaining the privacy and security of user data.


Ultimately, this project provides an efficient, feature-rich, and privacy-preserving search engine for outsourced LBS, saving clients' computing costs and preventing the abuse of critical data. Our techniques and web application demonstrate the potential to provide LBS services that protect user privacy.

6 Future Work

The Privacy-Preserving Location-Based Services Web Application is a robust and useful tool right now, but there are a few things we want to do better in the future. These consist of:
Adding a map display: The search results are currently presented as a location image with an address. In the future, models that let users view location results on a map could be implemented.
Expanding the user base: Only a few people are currently using the application. The user base of the programme can be increased by advertising the application and incorporating user feedback to enhance its usefulness.
Integrating additional datasets: The programme currently only makes use of a few location datasets. Future dataset integration will give consumers access to a wider variety of search choices.
Enhancing the encryption techniques: The security of user data can be further improved by investigating additional encryption techniques, even though the current encryption methods are effective.
Improving the user interface: The current user interface is functional; however, it could be made more user-friendly and straightforward.
The aforementioned future improvements will enhance the functionality and usability of the Privacy-Preserving Location-Based Services Web Application, making it even more useful as a tool for users who want to search for location-based services while keeping their personal information secure.

References 1. Wang CX, Song Y, Tay WP (2021) Arbitrarily strong utility-privacy tradeoff in multi-agent systems 16 2. Zhu H, Lu R, Huang C, Chen L, Li H (2016) An efficient privacy-preserving location-based services query scheme in the outsourced cloud. IEEE Trans Veh Technol 65(9):7729–7739 3. Chen G, Zhao J, Gao Y, Chen L, Che R (2016) Time-aware boolean spatial keyword queries 4. Wu Z, Li G, Shen S, Lian X, Chen E, Xu G-D (2020) Constructing dummy query sequences to protect location privacy and query privacy in location-based services, part of Springer Nature 2020


5. Dargahi T, Ambrosin M, Conti M, Asokan N (2016) Abaka: a novel attribute- based k-anonymous collaborative solution for less. Comput Commun 85:1–13 6. Pu Y, Luo J, Wang Y, Hu C, Huo Y, Zhang J (2018) POSTER: privacy-preserving scheme for location based services using cryptographic approach 7. Santhosh Kumar B, Daniya T, Sathya N, Cristin R (2020) Investigation on privacy pre-serving using K-anonymity techniques, (ICCCI-2020), January 22–24, 2020 8. Zeng M, Zhang K, Chen J, Qian H (2018) P3GQ: a practical privacy-preserving generic location-based services query scheme. Pervasive Mobile Comput 51:56–72 9. Zhu X, Ayday E, Vitenberg R (2021) A privacy-preserving framework for out-sourcing locationbased services to the cloud. IEEE Trans Dependable Secure Comput 18(1):384–399 10. Zhang Y, Li M, Yang D, Tang J, Xue G, Xu J (2020) Tradeoff between location quality and privacy in crowdsensing: an optimization perspective. 7(4) 11. Feng T, Wong W-C, Sun S, Zhao Y, Zhang Z (2019) Location privacy preservation and locationbased service quality tradeoff framework based on differential privacy 12. Agarwal P, Kumar A, Yamaguchi RS (2019) Preserving user’s privacy for location-based services 13. Xu L, Jiang C, He N, Qian Y, Ren Y, Li J (2018) Check in or not? It a stochastic game for privacy preserving in point-of-interest recommendation system. 5(5) 14. Wang J, Yan D (2017) Achieving effective-anonymity for query privacy in location-based services 15. Dargahi T, Ambrosin M (2016) ABAKA: a novel attribute-based k-anonymous collaborative solution for LBSs 2016 16. Ameri MH, Delavar M, Mohajeri J, Salmasi Zadeh M (2018) A key-policy attribute-based temporary keyword search scheme for secure cloud storage 17. Sen AAA, Alnsour A, Aljwair SA, Aljwair SS, Alnafisah HI, Altamimi BA (2021) Fog mixzone approach for preserving privacy in IoT. In: Proceedings IEEE International Conference Computing Sustainable Global Development (INDIACom), March 2021, pp 405–408

A GPS Based Bus Tracking and Unreserved Ticketing System Using QR Based Verification and Validation

M. Suresh Kumar, S. Niranjan Kumar, Murari Reddy Sudarsan, P. Gunabalan, and R. Varsha

1 Introduction According to [1] in India, Bus rapid transit systems (BRTS) exist in several cities. Buses take up over 90% of public transport in Indian cities and serve as an important mode of transport. Services are mostly run by state government owned transport corporations. In today’s fast-paced world, everyone is hurrying to reach their destinations. Looking forward to the buses and waiting for them is not a reliable alternative in this situation. In both rural and urban India, buses are one of the most preferable forms of public transportation after auto rickshaws according to the information on transportation spending. The current method of public bus transportation only offers manual ticketing options, which entails buying the ticket directly from the conductor, to accommodate the large number of passengers to be used every day. Bus transit has not been able to keep up with the rising demand for travel. Bus services have the drawback of being unreliable. The public bus transportation industry has several issues, such as excessive paper waste and the use of cash to pay for tickets. Some issues that commuters frequently encounter with bus transportation include excessive waiting time, insufficient time to purchase tickets, balance not being refunded, failing to provide other travellers a place, etc. The main problem for those who utilize public transportation, like buses, is figuring out where the bus is in real time and how long it will take to reach the destination. Hence, a tool that gives comprehensive information, such as the number of buses that run between two stops, their itineraries, and maps that direct users, passengers, and most crucially, tracking the real-time location coordinates of the bus and to show the time it will take the bus to arrive at different bus stops.

M. Suresh Kumar (B) · S. Niranjan Kumar · M. R. Sudarsan · P. Gunabalan · R. Varsha Sri SaiRam Engineering College, West Tambaram, Chennai—44, Tamil Nadu, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 I. J. Jacob et al. (eds.), Data Intelligence and Cognitive Informatics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-7962-2_10


2 Literature Survey This section discusses about the existing research works related to U-Bus. In [2], the authors propose RFID based smart cards to digitize the travelling tickets by replacing conventional paper ticketing systems. Scanning the QR once is enough. In [3], ticketing is done through rechargeable Radio Frequency Identification Card which is paperless and cardless. Everyone by default will be carrying a Smart phone which is the only necessity. Tracking the location of the bus, ticket booking system, Pre-booking system for bus seats in city buses and tracking the current location of the buses, U-Bus app focusing on unreserved tickets [4]. The crowd density and its management in the bus transport system are clearly described in [5]. The system developed in [6] aims to provide better bus management capable of bus tracking, route scheduling, and E-ticketing via IoT by using both hardware and software. The paper [7] details the RFID based ticketing and passenger identification in the public transport using IoT and it should be physically swiped in the bus while boarding and authorizing the passenger travelling. The research here discusses about the ticketing system using RFID card via IoT where it should be booked before through mobile application and then the RFID card should be physically swiped while boarding the bus where it verifies and authorizes the ticket as illustrated in [8]. Existing research studies detail about the cashless payment instead of paper ticketing system [9–12].

3 Proposed System

The proposed system helps in digitalizing the whole unreserved ticketing system for bus transport in metropolitan cities. The passenger should register in the application, which can be used to book unreserved e-tickets by entering the source (the nearby source place that appears) and the destination they want to reach. They then need to make the payment through a digital payment service, namely Razorpay. After payment, a QR code is generated which contains the details of the journey and which also has a time lapse for the ticket to expire before boarding the bus. The passenger can check the nearby bus terminals for boarding the bus. After boarding the bus, the conductor scans the QR code, which displays the details of the journey and verifies the ticket quickly and easily; the crowd count is incremented at the same time, so the other passengers who are about to board the bus can check the number of people in the buses and plan their journey accordingly. With the help of this, crowd management can be done easily. There are different categories of buses and passengers, by which the ticket cost and count are also easily calculated. The generated ticket automatically expires after the passenger reaches the destination, which is tracked with the help of GPS (Global Positioning System). The accurate availability and tracking of the buses on the route the passenger has opted for are easily displayed, along with the number of seats and vacancies. Through this, an average of 25 lakh people will benefit, as mentioned in [1].


Fig. 1 System architecture


4 System Architecture

Figure 1 shows the system architecture.

5 System Functionality

5.1 QR Code

A QR code (quick response code) is a type of matrix barcode, or two-dimensional barcode. A barcode is an optical label that can be read by a computer and contains data about the object to which it is attached. In practice, QR codes frequently contain information for a tracker, locator, or identifier that directs users to a website or application. To store information efficiently, QR codes use four specified encoding modes: numeric, alphanumeric, byte/binary, and kanji; extensions may optionally be used.
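A minimal sketch of generating such a ticket QR code in Python is shown below. The qrcode package, the JSON payload fields, the HMAC signature and the shared operator key are all illustrative assumptions; the paper does not prescribe a specific payload format.

```python
import hashlib
import hmac
import json
import time

import qrcode  # pip install qrcode[pil]

SECRET = b"operator-secret-key"  # hypothetical key shared with the conductor app


def make_ticket_qr(source: str, destination: str, fare: float, valid_minutes: int = 90):
    """Build a signed journey payload and render it as ticket.png."""
    payload = {
        "src": source,
        "dst": destination,
        "fare": fare,
        "expires_at": int(time.time()) + valid_minutes * 60,  # time lapse before expiry
    }
    body = json.dumps(payload, sort_keys=True)
    sig = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    qrcode.make(body + "|" + sig).save("ticket.png")  # image shown in the passenger app
    return body, sig


make_ticket_qr("Tambaram", "Guindy", fare=25.0)
```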


5.2 Crowd Management

The crowd management system helps in managing the crowd in the bus by using the Global Positioning System (GPS). The GPS in the conductor's mobile verifies the current location against the destination of the passenger, and the generated ticket expires when the GPS location and the passenger's destination are the same. This lets the upcoming passengers who are waiting at the next bus terminals check and know about the crowd in the bus.

5.3 GPS Location

The Global Positioning System (GPS) is a satellite-based radio navigation system. A GPS receiver, which is part of a global navigation satellite system (GNSS), requires an unobstructed line of sight to four or more GPS satellites. GPS positioning does not require the user to transmit any data and can be used without a telephone or internet connection; however, the use of these technologies increases its functionality.

5.4 Automatic Ticket Expiry

Automatic ticket expiry is a key feature in the proposed system for digitizing unreserved ticketing in bus transport. It ensures proper resource utilization and prevents ticket misuse. The QR code generated upon booking contains a time lapse for ticket expiry based on estimated travel time. The system automatically expires tickets that have exceeded the expiry time, leading to efficient utilization of resources and improved crowd management.
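A tiny sketch of the expiry check is given below. It reuses the hypothetical "expires_at" field from the booking sketch above, and the GPS-based part of the check is represented only by a boolean flag supplied by the conductor application.

```python
import json
import time


def ticket_expired(ticket_json: str, reached_destination: bool = False) -> bool:
    """Expire a ticket either when its time lapse has passed or when the
    conductor-side GPS check reports that the destination has been reached."""
    payload = json.loads(ticket_json)
    return reached_destination or time.time() > payload["expires_at"]
```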

5.5 Passenger Application

The passenger application is used by the passenger to book unreserved tickets and to track the upcoming bus to the terminal where they are waiting, with the help of the GPS location from the conductor on the bus. Here the passenger can select the source and destination for their journey from the locations that are displayed.

5.5.1 Bus Schedule

The unreserved ticketing in bus transport features an accurate bus schedule. Through GPS tracking, passengers can access real-time bus location and estimated time of arrival, enabling efficient travel planning. The system also displays the number of available seats and categorizes buses by route, capacity, and fare, leading to better resource utilization and enhanced passenger experience. Overall, the bus schedule feature is a key component of the proposed system for improved bus transport in metropolitan cities.

5.5.2 Bus Tracking System

Bus tracking is a key feature of the proposed digitized unreserved ticketing system for bus transport in metropolitan cities. Through GPS, passengers can access real-time bus location and availability of seats, allowing for efficient travel planning. This feature also enables efficient crowd management by providing data on passenger count in each bus. Overall, bus tracking enhances the passenger experience and improves the efficiency of the bus transport system.

5.5.3 Nearest Bus Terminals

The system proposed by us for digitizing unreserved bus ticketing in metropolitan cities also includes a feature for finding the nearest bus terminals. After registering in the application, passengers can easily search for nearby terminals based on their location. This helps them plan their travel route and make their way to the bus terminal in a timely manner. With this feature, passengers can avoid delays and manage their time better.
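The nearest-terminal lookup can be sketched with the haversine great-circle distance, as below; the terminal names and coordinates are hypothetical placeholders rather than data from the proposed system.

```python
from math import asin, cos, radians, sin, sqrt

# Hypothetical terminal coordinates: (name, latitude, longitude)
TERMINALS = [("Tambaram", 12.9249, 80.1275), ("Guindy", 13.0067, 80.2206)]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two GPS fixes, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))


def nearest_terminal(lat, lon):
    """Pick the terminal closest to the passenger's current GPS position."""
    return min(TERMINALS, key=lambda t: haversine_km(lat, lon, t[1], t[2]))


print(nearest_terminal(12.95, 80.14))
```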

5.5.4 Ticket Payment

The proposed system for digitizing the unreserved ticketing system for Bus transport in metropolitan cities allows passengers to easily make payments through digital payment services such as Razorpay. After registering in the application and selecting the source and destination, the passenger can make the payment for the e-ticket which generates a QR code with journey details. The QR code has a time lapse for the ticket to expire before boarding the bus. This payment process is quick and easy, eliminating the need for physical currency and reducing the time spent waiting in line for ticket purchase. Additionally, the system calculates ticket cost and count based on different categories of buses and passengers. Through this process, passengers can conveniently make their ticket payments without any difficulty and inconvenience.

5.5.5 Ticket Booking System

For booking a ticket in our system, passengers need to first register in the application and enter their source and destination locations. They can then choose their preferred bus category and make the payment using the digital payment service, Razorpay. Once the payment is made, a QR code will be generated with the details of the journey and a time lapse for the ticket to expire. Passengers can then check the nearby bus terminals for boarding and present the QR code to the conductor after boarding the bus. The conductor will scan the QR code to verify the ticket, and the passenger can take their seat. With the help of the model under consideration, the booking process for unreserved bus transport is made easy and convenient for passengers, and they can plan their travel with ease while ensuring crowd management on the buses.

5.6 Conductor Application

The conductor application plays a crucial role. After the passenger books an e-ticket through the passenger application, the conductor can verify the ticket using a QR code scanner on their application. The conductor application not only displays the details of the journey but also adds to the crowd count, enabling better crowd management. The conductor application also helps in tracking the number of passengers on the bus, which is particularly useful during peak hours. Additionally, the conductor can update the number of seats available on the bus, which is reflected in the passenger application. Overall, the conductor application streamlines the ticket verification process, enhances crowd management, and enables better tracking and monitoring of the bus's occupancy.

5.6.1 Bus Availability

The metropolitan cities offer very accurate and real-time information about bus availability to the passengers. Passengers can easily check the availability of buses and view the number of seats and their vacancies on the route where they plan to travel. This helps passengers plan their journey accordingly and ensures a hassle-free experience. The system’s GPS integration allows for accurate tracking of buses, ensuring that passengers can track the bus’s location and estimated time of arrival. With this information, passengers can avoid unnecessary waiting and ensure that they reach their destination on time. The system also offers different categories of buses and passengers, making ticket cost and count calculations simple and easy. Overall, this system offers a convenient and efficient solution for managing bus availability in metropolitan cities.

5.6.2 Crowd Management System

A highly effective approach to handling crowd management in public transport. The system enables passengers to book e-tickets using a mobile application and make digital payments through Razorpay. The QR code generated after payment displays journey details and has a time lapse before boarding the bus. After boarding, conductors can easily scan the QR code to verify the ticket and add to the crowd count. This count is displayed in real-time to passengers who can plan their travel schedules accordingly, and also to the authorities who can monitor the crowd levels and take necessary measures to manage them. The system’s GPS integration allows for accurate tracking of buses, the number of seats, and their vacancy. With this information, passengers can avoid overcrowding and ensure a comfortable and safe journey. Overall, the system offers an effective crowd management solution, making travel convenient and hassle-free. An average of 25 lakh people will be benefitted from it.

5.6.3 Ticket Verification System

Ticket verifying system utilizes QR codes generated after booking and payment. The conductor of the bus scans the QR code displayed on the passenger’s mobile device, which displays the details of the journey and verifies the ticket. This system enables quick and easy verification of tickets, adding to the crowd count and facilitating crowd management on buses. Other passengers who are about to board the bus can also check the number of people already on board and plan their travel accordingly, thanks to the availability of real-time crowd data. Additionally, the system automatically expires the ticket once the passenger reaches their destination, which is tracked through GPS, ensuring a smooth and hassle-free ticket verification process for both passengers and conductors.
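A conductor-side verification sketch matching the hypothetical QR payload from the booking sketch is shown below; the signature scheme, field names and shared key are assumptions, not the authors' specification.

```python
import hashlib
import hmac
import json
import time

SECRET = b"operator-secret-key"  # same hypothetical key as in the booking sketch


def verify_scanned_qr(qr_text: str):
    """Check a scanned payload of the form '<json>|<hmac>'; return the journey
    details if the signature is valid and the ticket has not yet expired."""
    body, _, sig = qr_text.rpartition("|")
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None            # tampered or forged ticket
    payload = json.loads(body)
    if time.time() > payload["expires_at"]:
        return None            # ticket already expired
    return payload             # valid: display details and increment crowd count
```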

6 Backend

The backend database stores information about all the buses running across the city, including their schedules, routes, and stops, as shown in Fig. 2.

6.1 Seat Availability

The database tracks the availability of seats on each bus and updates this information in real time as bookings are made.
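A minimal in-memory stand-in for the seat-availability update is sketched below; a real deployment would persist this in the backend database of Fig. 2, and the bus identifier and capacity are hypothetical.

```python
# In-memory stand-in for the seat-availability store.
seats = {"bus_21G": {"capacity": 40, "booked": 0}}


def book_seat(bus_id: str) -> bool:
    """Reserve one seat if available; the updated count is what the
    passenger application displays in real time."""
    bus = seats[bus_id]
    if bus["booked"] >= bus["capacity"]:
        return False          # bus full; passenger can pick another service
    bus["booked"] += 1
    return True
```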


Fig. 2 Flowchart of the backend of the application

6.2 Passenger Information

The database stores personal information about each passenger, including their name, age, gender, and contact details.


6.3 Administrative Information

The backend database also stores administrative information related to the management of the bus transport system, such as information about bus maintenance, crew schedules, and route maintenance schedules.

7 Conclusion and Future Scope In conclusion, the proposed system for digitalizing unreserved ticketing system for bus transport in metropolitan cities has the potential to revolutionize public transportation. By allowing passengers to book e-tickets through a user-friendly app, make digital payments, and obtain a QR code for ticket verification, our system guarantees a smooth and convenient experience for passengers. The crowd management system incorporated in the app helps passengers plan their travel schedule based on the number of people on board, and the GPS tracking feature ensures that the ticket automatically expires once the passenger reaches their destination. The availability and tracking of buses, along with the calculation of ticket costs and counts for different categories of buses and passengers, make it a highly efficient system. Overall, our proposed system will benefit an average of 25 lakh people, who rely on public transportation in metropolitan cities and make their regular daily trips easier and more convenient. There are several potential future developments for our proposed digitalized unreserved ticketing system for bus transport. One possibility is to expand the crowd management system to include sensors in the bus which helps to count and update the real crowd in the bus. Another possibility is to incorporate real-time traffic and weather data into the system, which can help optimize bus routes and schedules and reduce passenger wait times. Acknowledgements We wish to thank Dr. M. Suresh Kumar for his guidance throughout the research.

References 1. Article of Transport in India, From wikipedia, the free encyclopedia by Wikimedia foundation, 2023 2. Bin Alam MJ, Zahra F, Khan MM (2021) Automatic bus ticketing system Bangladesh. In: 2021 12th international conference on computing communication and networking technologies (ICCCNT) 3. Khedekar T, Jamdar V, Waghmare S, Dhore ML (2021) Fid automatic bus ticketing system. In: 2021 international conference on artificial intelligence and machine vision (AIMY), 2021


4. Sharma K, Pandey R, Tarafdar S, Dubey S (2021) Towards smart mobility in cities—bus tracking and booking system. In: 2021 9th International conference on reliability, infocom technologies and optimization (Trends and Future Directions) (ICRITO), 2021 5. Deepthi VS, Venkat Chavan N, Shanbhag S, Dandappala SS (2022) Crowd density estimation and location prediction in public transport system. Int J Eng Res Technol (IJERT) 11(07) 6. Deore MK, Raj DB, Srinivasan N, Vandana P, Vignesh M (2021) Smart bus and bus stop management system using IoT technology. In: 2021 International conference on design innovations for 3Cs compute communicate control (ICDI3C), Bangalore, India 7. Kaushik, Jain N (2021) RFID based bus ticket generation system. In: 2021 International conference on technological advancements and innovations (ICTAI), Tashkent, Uzbekistan 8. Punarvit Y, Sawant K, Shankar KPKR, Kumar V (2021) Implementation of cashless bus ticketing system using RFID and IoT. In: 2021 International conference on advances in technology, management and education (ICATME), Bhopal, India 9. Girsa R, Srivastava K, Jain A, Biyani P (2021) Demo abstract: contactless E-ticketing in public transport buses. In: 2021 International conference on communication systems and networks (COMSNETS), Bangalore, India, 2021 10. Jimoh OD, Ajao LA, Adeleke OO, Kolo SS (2020) A vehicle tracking system using greedy forwarding algorithms for public transportation in urban arterial. vol 8. IEEE Access 11. Sree J, Mamatha T, Sreekanth B, Noor M (2021) Integrated college bus tracking system. Int J Scient Res Sci Technol 12. Gomathy CK (2022) RFID and GPS based bus tracking system. In: International conference on research advancement and changes in engineering sciences 2022

FileFox: A Blockchain-Based File Storage Using Ethereum and IPFS
Kavya N. Naik, Arnica R. Patil, Kinnari N. Patil, and Shraddha S. More

K. N. Naik (B) · A. R. Patil · K. N. Patil · S. S. More St. John College of Engineering and Management, Palghar, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 I. J. Jacob et al. (eds.), Data Intelligence and Cognitive Informatics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-7962-2_11

1 Introduction Blockchain technology has become increasingly popular due to its ability to provide secure and decentralized systems. It is a decentralized digital ledger that records transactions securely and irrevocably. Blockchain stores data in blocks. Every piece of data entering a particular node of the blockchain network is copied and replicated to all the other nodes in the network, which makes blockchain a distributed technology. Data is encrypted and hashed in every block of the blockchain network. If somebody tries to tamper with the data on one node, they would have to compromise all the nodes in the network, which is practically impossible because the number of nodes can be very large. For this reason, blockchain is considered one of the safest and most secure technologies for data storage. Blockchain is decentralized in nature, meaning there is no central authority to manage transactions and records. Decentralization is the distribution of functions, control, and information instead of centralizing them in a single entity [1]. It provides transparency to users, as all the people connected to the network have details of the transactions and data present in it, and they get notified if there is any tampering. The Interplanetary File System (IPFS) is a promising technology that offers decentralized data storage and a content-addressed hypermedia protocol, enabling files to be distributed across a network of nodes in a peer-to-peer manner. Users do not have to rely on a centralized server for accessing and storing files. IPFS uses content-addressed storage, which means that files are stored based on their content rather than their location or name. This makes it more resilient to censorship and improves data integrity. There are many data storage applications available in the market, but the main issue with such applications is the security of the stored files.


There are numerous files uploaded by users on a daily basis, and in a traditional file storage application this storage is handled entirely on the backend. The user is largely unaware of how secure his or her file is, since it is a centralized system. Although such applications provide a level of security for users' data, there are chances of data breaches, hacks, and tampering. Even in organizations that hold numerous valuable data files, personnel with malicious intent can tamper with the data for their own or a competitor's benefit, which can even cause financial losses. Another big issue with storage applications is file size: many applications impose a limit on file sizes, so files exceeding the set limit cannot be stored or uploaded, and handling and storing big files becomes a tedious task. There are various existing file storage applications that provide security using different cryptography techniques such as AES, DES, RSA, and hybrid cryptography [2]. However, these applications do not provide the transparency, tamper-proof environment, and level of security that blockchain provides. By using blockchain as the technology for file storage, commendable results can be obtained. Because it is a tamper-proof technology, any data stored cannot be silently altered: if any alteration occurs, it can be easily detected, since the data is stored in a distributed network where a copy of each record is kept at various nodes and is cryptographically hashed. Therefore, even if a hacker tries to tamper with or steal any data, they would need to alter each and every copy of the file stored at the different nodes; otherwise it will be evident from the hash value that the file was tampered with. Ethereum, on the other hand, uses the Proof-of-Stake consensus algorithm, which provides more security than the traditional Proof-of-Work consensus algorithm. As Proof-of-Stake requires less computational power, attacking such a network becomes more difficult and expensive, and it also provides a higher degree of decentralization and participation, making data more secure and safe. Thus, using blockchain for file storage can address data security issues more efficiently. Along with its advantages, blockchain also has disadvantages, such as immutability: even though immutability is the key feature blockchain relies on, it can be a drawback in specific cases, because if there is a mistake and the data needs to be edited or modified, this is impossible in blockchain [3]. IPFS also has certain disadvantages: it consumes a lot of energy and bandwidth, and it has only a few built-in security mechanisms to protect data from vulnerabilities [4]. To solve such security and file storage related issues, FileFox, a blockchain-based file storage application, is built. This application combines two prominent technologies, blockchain and IPFS, to build a decentralized environment where files can be uploaded and viewed securely without any centralized body controlling the data. It also provides transparency to the user, and all the stored data remains in the control of the user, i.e., the user can actually verify whether the data or file he or she has uploaded is secure. Additionally, IPFS solves a major file storage issue by allowing users to store and upload larger files. The purpose of this project was to develop a blockchain-based file storage system.
The proposed system stores files in a decentralized environment and helps users store their files with transparency, tamper-proofness, and security. Its technology stack also includes Solidity, React, and Web3.js.


2 Background 2.1 Blockchain Storage Blockchain is a distributed ledger technology that enables multiple parties to store and share data in a secure way. It is a chain of blocks into which data is stored; the data is broken down into pieces and encrypted before distribution. Participants referred to as "miners" validate transactions and add them to the blockchain, which ensures that the data is not tampered with. However, there are also some limitations to blockchain storage, including scalability issues and higher energy consumption due to the computational power required for mining. As such, while blockchain storage has the potential to revolutionize data storage, it is still in the early stages of development and implementation. Figure 1 shows how data is stored in blocks in a blockchain network and depicts how each block is connected and replicated.
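To make the block-linking idea above concrete, the following is a minimal Python sketch (illustrative only, not the FileFox implementation; the block fields and function names are assumptions) of how each block can carry the hash of its predecessor, so that tampering with any stored record invalidates every later hash:

```python
import hashlib
import json
import time

def block_hash(block: dict) -> str:
    # Hash the block's contents deterministically with SHA-256.
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def add_block(chain: list, data: str) -> dict:
    prev = chain[-1]["hash"] if chain else "0" * 64
    block = {"index": len(chain), "timestamp": time.time(),
             "data": data, "prev_hash": prev}
    block["hash"] = block_hash(block)   # hash computed over the fields above
    chain.append(block)
    return block

def is_valid(chain: list) -> bool:
    # Recompute every hash; any tampered block breaks the link.
    for i, block in enumerate(chain):
        body = {k: v for k, v in block.items() if k != "hash"}
        if block["hash"] != block_hash(body):
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True

chain = []
add_block(chain, "file-record-1")
add_block(chain, "file-record-2")
print(is_valid(chain))           # True
chain[0]["data"] = "tampered"
print(is_valid(chain))           # False: the change is detected from the hashes
```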

2.2 INFURA INFURA is a Web3 backend provider that offers a range of services and tools for developers [5]. With this suite of tools, developers can quickly create decentralized applications (dApps) on the Ethereum blockchain. Additionally, INFURA provides access to the Interplanetary File System (IPFS) and Filecoin, which enable decentralized data storage and distribution. In the proposed system, INFURA is used for storage.
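As a rough sketch of how a backend could push a file to IPFS through an INFURA-style gateway (the endpoint URL, the credential placeholders, and the helper name are illustrative assumptions, not details from the paper):

```python
import requests

# Hypothetical INFURA project credentials; replace with real values.
PROJECT_ID = "your-project-id"
PROJECT_SECRET = "your-project-secret"
IPFS_ADD_URL = "https://ipfs.infura.io:5001/api/v0/add"   # assumed endpoint

def upload_to_ipfs(path: str) -> str:
    """Upload a file and return its content identifier (CID)."""
    with open(path, "rb") as fh:
        resp = requests.post(IPFS_ADD_URL,
                             files={"file": fh},
                             auth=(PROJECT_ID, PROJECT_SECRET),
                             timeout=30)
    resp.raise_for_status()
    return resp.json()["Hash"]   # the IPFS add API reports the CID as "Hash"

# cid = upload_to_ipfs("report.pdf")
# print(f"stored at ipfs://{cid}")
```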

Fig. 1 Data storage in blockchain


2.3 Truffle Truffle is a development framework used to build and deploy smart contracts on Ethereum, and it is popular, with over 1.5 million lifetime downloads [6]. Using languages such as Solidity, developers can create smart contracts with Truffle and then use the provided tools to compile, migrate, and test the contracts on a local or remote blockchain network. It offers an easy-to-use graphical interface as well as a command line interface for deploying smart contracts.

2.4 MetaMask To engage with Ethereum blockchain applications (dApps), users can use the popular web3 wallet MetaMask. It can be added as a browser extension to widely used web browsers such as Chrome, Firefox, and Brave, and was developed by ConsenSys Software Inc. [7]. By connecting to the Ethereum network through MetaMask, users can access a variety of dApps, such as lending platforms, gaming applications, and decentralized exchanges (DEXs). Additionally, it enables easy management of multiple Ethereum accounts and switching between them.

3 Related Work Sari et al. [8] developed a blockchain-based file sharing application designed for sharing files among closed or private group members. Ether, the Ethereum cryptocurrency, was used to transact in the blockchain environment. The group members were able to control access to the uploaded files, and thanks to IPFS the system also allowed users to store large data files. The system additionally supported group operations such as inviting a new member to the group and group key management. The application allowed group members to share a file and to control its access using grant and revoke commands. This system provided better privacy, security, and transparency; the transaction and storage costs it generated were nominal, but it required more computational resources than regular client-server applications. Zheng et al. [9] built a blockchain-based storage model for the Bitcoin environment. To address the high storage and bandwidth demands of the Bitcoin blockchain ledger, the proposed system introduced IPFS to reduce the transaction data size; after compression, the data was sent to the blocks in the blockchain network. In this scheme, the transaction data is deposited by the miners into the IPFS network and the returned IPFS hash of the particular transaction is packed into the blocks.


This system provided better efficiency and results by using the IPFS network as the intermediary, and it reduced the data size considerably, down to a compression ratio of 0.0817. The solution provided by the proposed system can also be applied to other blockchain-based cryptocurrencies. Steichen et al. [10] presented a paper discussing the integration of a modified IPFS into blockchain for storage. To address transparency and size issues in blockchain, the suggested system used a modified version of IPFS named acl-IPFS. acl-IPFS leverages Ethereum smart contracts to provide access-controlled file sharing: whenever a file is uploaded, downloaded, or transferred, acl-IPFS interacts with the smart contract. This system takes more time than standard IPFS. Chen et al. [11] constructed a P2P file system using blockchain and IPFS. The proposed system combined both technologies and was designed to share files over a peer-to-peer network; it provided a new solution for content service providers and also improved their user experience. The proposed model was inspired by Blockstack and used a client-server methodology. It allowed content service providers as well as customers to connect with the network without maintaining a fully functional node. This system provided security and also improved the IPFS architecture; high-throughput issues were addressed and large files were stored properly. Kang et al. [12] implemented an IPFS- and blockchain-based file storage and sharing system. The authors presented a blockchain file storing and sharing technique based on a Named Data Network (NDN) by combining NDN technology with a distributed blockchain and the Interplanetary File System (IPFS). To enhance the performance of the forwarding process, the required data are transferred on a reverse path after being encrypted with an NDN signature. To ensure the integrity of the entire forwarding process and the traceability of the knowledge file transfer process, the suggested model simultaneously records the forwarding transaction processes on the blockchain. Nizamuddin et al. [13] implemented decentralized document version control using IPFS and the Ethereum blockchain. The system offers a method for document sharing and version control that enables multiple parties to share and track changes in a reliable, secure, and decentralized manner. An Ethereum smart contract is used to regulate and control the versioning, and IPFS is used to store the documents. The suggested system removes the need for a third party between the approvers and the developers and automates the version control logic and workflow; however, the system is only checked for known bugs. Lu et al. [14] developed a file storage application using Hyperledger Fabric and the Interplanetary File System (IPFS). This system stores files using a k-r allocation scheme; additionally, mathematical formulations of file security and file availability are developed and optimal parameters are discussed. The system uses two storage schemes, minimum slices number (MSN) and minimum number of nodes (MNN), and the results conclude that the file allocation strategy of the k-r technique performs better with MNN than with MSN. It also uses techniques such as AES and RSA to encrypt files. Hyperledger Fabric is used as the blockchain layer, and the system requires further improvement in the future.


Naz et al. [15] articulated a blockchain-based secure data sharing platform that leverages the benefits of the Interplanetary File System (IPFS). Access control and security are achieved by executing the access roles specified by the owner in a smart contract, and users are authenticated with the RSA algorithm. The system collects a review from the user after delivering the data successfully; the reviews are verified using the Watson Analyzer and are used to authenticate the data. The model also provides a solution to the bloating problem at the user's end, and its encryption scheme uses less computation time. Zhu et al. [16] proposed a blockchain-based decentralized storage scheme. The system lets the user upload encrypted data to a storage provider through a middleman and informs the user of the data storage location. After the data integrity certificate between the user and the storage provider is completed, the user pays the storage fee to the provider using Lightning Network technology. Anusree et al. [17] developed a system using blockchain technology that provides file sharing services with lower risks and greater security thanks to IPFS. The proposed system uses the characteristics of IPFS to build a secure file sharing environment; it builds on Solana and provides functionality such as adding files, granting access permissions, retrieving files, and account creation. The access control list is managed by Solana smart contracts, IPFS software is used for file storage, and the model uses acl-IPFS, a blockchain-based extension to IPFS. The proposed system provides a secure, distributed, tamper-proof environment for file storage.

4 Proposed System To solve the problem of data tampering, we have proposed a blockchain- and IPFS-based file storage system named "FileFox" that allows users to store their data securely. The interface has two options: viewing and uploading files. Uploaded files are shown to the user in a table format. The table has a Share Link column through which the user can copy the link of a file and share it with other users securely. Since the link could otherwise be viewed by unauthorized users, an encryption key is added between sender and receiver to increase the level of security and trust. The block diagram of the proposed system is shown in Fig. 2. The user connects to the system via a browser front end built with ReactJS; React is a JavaScript library for building easy-to-use user interfaces [18]. The browser is connected to the blockchain: Ganache is the blockchain used for the project, Truffle is used to create the smart contract, and MetaMask serves as the digital wallet. The browser is also connected to IPFS, which is used for storage. A path link for the uploaded file is generated and made visible to the users, and this link can be used to share the file with others. Because files can be shared with other users, the proposed system uses the Advanced Encryption Standard with 256-bit keys (AES-256), a symmetric block cipher for encryption and decryption.


Fig. 2 Block diagram of FileFox

The encryption is applied to the generated share link, which can be used to share the file with other users. After the file has been shared, a passcode is requested to open the file. The passcode is the secret key through which the other user can access the file, and it acts as an extra security checkpoint to verify the outside user's authenticity; a minimal sketch of this share-link protection is given after the list below. Figure 3 depicts the system's flowchart for the blockchain-based file storage application. When users operate the proposed system, they are taken to the home page, where they can learn about the application. The user must select the Get Started button in order to use the system, at which point they are directed to the main page, which has two sections on a single page. The option to upload data is offered in the first part, and the uploaded data can be viewed in a table in the second part. The table holds the files in stack style: the most recently uploaded items occupy the first row. The table has several columns, as listed below.
• ID—A serial number generated automatically by the system.
• Name—File name of the uploaded file.
• Description—Specification of the file provided by the user.
• Type—The data type.
• Hash—The path of the stored data, which can then be used to share it with others.
• Size—The size of the data.
• Address—The account address of the user.
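The share-link protection described above can be sketched as follows: the IPFS hash behind the Share Link is encrypted with AES-256 (AES-256-GCM from the cryptography package here), and only a receiver who knows the passcode can recover it. This is a minimal illustration under assumed parameters; the key derivation via PBKDF2 and the field names are not taken from the paper.

```python
import os
import hashlib
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def key_from_passcode(passcode: str, salt: bytes) -> bytes:
    # Derive a 256-bit key from the shared passcode (iteration count is illustrative).
    return hashlib.pbkdf2_hmac("sha256", passcode.encode(), salt, 200_000, dklen=32)

def protect_link(ipfs_hash: str, passcode: str) -> dict:
    salt, nonce = os.urandom(16), os.urandom(12)
    key = key_from_passcode(passcode, salt)
    token = AESGCM(key).encrypt(nonce, ipfs_hash.encode(), None)
    return {"salt": salt.hex(), "nonce": nonce.hex(), "token": token.hex()}

def open_link(payload: dict, passcode: str) -> str:
    key = key_from_passcode(passcode, bytes.fromhex(payload["salt"]))
    plain = AESGCM(key).decrypt(bytes.fromhex(payload["nonce"]),
                                bytes.fromhex(payload["token"]), None)
    return plain.decode()

payload = protect_link("QmExampleCid123", passcode="team-passcode")
print(open_link(payload, "team-passcode"))   # QmExampleCid123
```

A wrong passcode makes the GCM authentication check fail, so the receiver cannot silently decrypt to garbage; the decryption simply raises an error.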


Fig. 3 Flowchart of FileFox

5 Implementation The proposed system's home page, depicted in Fig. 4, serves as the starting point for users to access the application. Figure 5 displays the application view that appears upon clicking the 'Get Started' button; it showcases a simple interface for file uploading and viewing. Figure 6 illustrates the view that appears upon clicking the 'Choose File' button, which opens the file browsing window. Figure 7 shows the file chosen by the user to upload onto the proposed file storage application. Figure 8 displays the view of the application after selecting the file. Upon completing this step, the file may be uploaded by selecting the 'Upload' button; it is essential for the user to add a file description before clicking it.


Fig. 4 Home page of FileFox application

Fig. 5 Share file screen

The transaction confirmation window is depicted in Fig. 9. It is crucial to confirm the transaction prior to uploading the file; once the transaction is successfully completed, the file is uploaded into the file storage application. Figure 10 depicts the uploaded file table and displays the uploaded file's image. By clicking on the Share Link, users can view the file, and the user who uploaded the file can access their transaction details by clicking on Account Detail. The successful upload of the file is shown as the first row in the figure. Figure 11 displays the transaction details of a specific user after clicking on Account Detail; only authorized users who have completed the transaction from the same account can view this information.


Fig. 6 Choose file window

Fig. 7 File selection

Figure 12 illustrates how the proposed system allows users to view uploaded files using the Share Link option. The file can only be viewed through this feature, ensuring that it cannot be downloaded or tampered with.


Fig. 8 File uploading
Fig. 9 Confirming Ethereum transaction


Fig. 10 Upload table gets updated after successful transaction

Fig. 11 Ethereum transaction details after clicking on account details

Fig. 12 Viewing uploaded file by clicking on share link


6 Result and Discussion This research developed a blockchain-based file storage application that combines blockchain with IPFS to provide hassle-free storage of large files. The proposed model showed strong results by providing a decentralized environment that stores an organization's files in a transparent and tamper-proof manner. The SHA-256 algorithm is used to generate the path shown as the Share Link in the table; this link is the path to the data stored in the system. To counter the drawback of blockchain's full transparency, AES encryption is embedded into the link path, which ensures that the link is accessible to others only after entering the key and provides extra security to the system. All the data uploaded by the user is stored in IPFS (via INFURA). Ether is used to transact in the blockchain environment, and every transaction consumes approximately 0.007 Ether, which is quite a nominal cost. The proposed system uses the characteristics of the Ethereum blockchain and IPFS and has delivered good results in terms of security and functionality.

7 Conclusion and Future Scope The study aimed to develop a secure file storage application that overcomes the limitations of traditional file storage applications, and the proposed system addresses these issues effectively. The storage of large files has become easier with the proposed system, and the level of security and authorization it provides is commendable. All of an organization's files are kept within its boundary thanks to the file sharing application developed; sharing files outside the organization requires authorization on the receiver's side, and only an authorized user can view the shared file. Authorization is done with the help of a passcode. In the future, this system can be integrated with other applications used in the organization, such as distributed version control software. Additional functionality such as sorting files by date, name, and modification time can be added. Profiles can be assigned to users to generate a list of the files uploaded by a particular user in the organization. A communication portal can be added for external viewers to communicate with members of the organization, and the application can also be extended to provide an end-to-end file storage solution within an organization.

References 1. What is Decentralization in Blockchain? shorturl.at/wIOR0. Accessed 15 Dec 2022 2. Bharathi P, Annam G, Kandi JB, Duggana VK, Anjali T (2021) Secure file storage using hybrid cryptography. In 2021 6th international conference on communication and electronics systems (ICCES). IEEE, pp 1–6


3. Advantages And Disadvantages Of Blockchain Technology, shorturl.at/szWX9. Accessed 17 Dec 2022 4. What are the advantages and disadvantages of using IPFS instead of a normal website/app? https://www.quora.com/What-are-the-advantages-and-disadvantages-of-usingIPFS-instead-of-a-normal-website-app. Accessed 21 Dec 2022 5. Infura explained-what is Infura? https://url1.io/s/4ao3q. Accessed 25 Dec 2022 6. Get from idea to dapp quickly and easily. http://surl.li/fkcir. Aaccessed 29 Dec 2022 7. MetaMask. https://en.m.wikipedia.org/wiki/MetaMask. Accessed 02 Jan 2023 8. Sari L, Sipos M (2019) FileTribe: blockchain-based secure file sharing on IPFS. In: European wireless 2019; 25th European wireless conference. VDE, pp 1–6 9. Zheng Q, Li Y, Chen P, Dong X (2018) An innovative IPFS-based storage model for blockchain. In: 2018 IEEE/WIC/ACM international conference on web intelligence (WI). IEEE, pp 704– 708 10. Steichen M, Fiz B, Norvill R, Shbair W, State R (2018) Blockchain-based, decentralized access control for IPFS. In: 2018 IEEE international conference on internet of things (iThings) and IEEE green computing and communications (GreenCom) and IEEE cyber, physical and social computing (CPSCom) and IEEE smart data (SmartData). IEEE, pp 1499–1506 11. Chen Y, Li H, Li K, Zhang J (2017) An improved P2P file system scheme based on IPFS and Blockchain. In: 2017 IEEE international conference on big data (Big Data). IEEE, pp 2652–2657 12. Kang P, Yang W, Zheng J (2022) Blockchain private file storage-sharing method based on IPFS. Sensors 22(14):5100 13. Nizamuddin N, Salah K, Azad MA, Arshad J, Rehman MH (2019) Decentralized document version control using ethereum blockchain and IPFS. Comput Electr Eng 76:183–197 14. Meng L, Sun B (2022) Research on decentralized storage based on a blockchain. Sustainability 14(20):13060 15. Naz M, Al-zahrani FA, Khalid R, Javaid N, Qamar AM, Afzal MK, Shafiq M (2019) A secure data sharing platform using blockchain and interplanetary file system. Sustainability 11(24):7054 16. Zhu Y, Lv C, Zeng Z, Wang J, Pei B (2019) Blockchain-based decentralized storage scheme. J Phys Conf Ser 1237(4):042008 17. Anusree K, Vadekkat JS, Dev AR, Abhinav (2022) Decentralized file transfer system blockchain-based file transfer. Int J Eng Res Technol (IJERT) 11(05) 18. React.js introduction and working. https://www.geeksforgeeks.org/react-js-introductionworking/amp/. Accessed 15 Jan 2023

Minimizing Web Diversion Using Query Classification and Text Mining
Smrithi Agrawal, Kunal Kadam, Jeenal Mehta, and Varsha Hole

1 Introduction Compared to traditional question-answering, analyzing user intent in web content is complicated by its chaotic nature. However, the interlinked structure of web pages offers valuable features that aid this analysis. Most search sessions have a domain or a target set, but distractions can hamper or delay reaching the target. As the World Wide Web continues to scale up rapidly, there is an increasing demand for automated web categorization support for users. Automated web categorization is useful for organizing the vast amount of information returned by keyword-based search engines or for constructing hierarchical collections of web documents for cataloging purposes. Chen and Dumais' research [1] suggests that users prefer browsing pre-classified content catalogs. Web page classification, though resembling text classification from machine learning, is a complex problem. Web pages have an underlying structure in the HTML language and often contain distracting elements such as sponsored ads, sticky navigation bars, and exit points for visitors. When a pure-text classification approach is applied to these pages, significant bias can be introduced into the classification algorithm, causing it to overlook important topics and essential content. Therefore, developing an intelligent pre-processing technique that can extract the primary topic and improve the overall user experience has become a critical issue. S. Agrawal · K. Kadam · J. Mehta (B) · V. Hole Sardar Patel Institute of Technology, 400058 Mumbai, Maharashtra, India e-mail: [email protected] S. Agrawal e-mail: [email protected] K. Kadam e-mail: [email protected] V. Hole e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 I. J. Jacob et al. (eds.), Data Intelligence and Cognitive Informatics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-7962-2_12


2 Literature Review The reviewed papers covered a range of topics, including web query classification, big data analysis, text summarization, and the relevance of web queries to user intent. Query enrichment is a widely used technique in information retrieval that involves mapping short queries to intermediate objects and subsequently mapping them to target categories, and its effectiveness has been demonstrated in several research papers. For instance, in a recent study [2], the authors propose a new method called BridgeLDA to address the challenges of web query classification. BridgeLDA incorporates information from related documents to improve classification accuracy, and the authors also introduced category selection, which improved the success rate of their model. They evaluated BridgeLDA on two benchmark datasets and showed that it outperforms existing methods. Broder et al. [3] aimed to build a robust classification architecture for rare queries by leveraging web knowledge. Since advertisements are presented in the form of web links, their main motivation was to find rare queries that are hardest to locate. They used a blind feedback technique, which gives a higher classification accuracy than previously reported. Beitzel et al. [4] demonstrated an approach for the automatic classification of web queries using a large corpus of training data. The authors consider accurate topical classification of queries to be an important factor contributing to web search. They used a support vector machine classifier with a linear kernel for the classification task; the methodology involved preprocessing the queries, extracting features, and training the SVM model, and the proposed approach achieved high accuracy in classifying queries into various categories. The work by Pan et al. [5] highlights the effectiveness of using a semantic search engine for information retrieval in digital libraries. The authors proposed a new algorithm named "Mellom," built by combining two ranking algorithms; according to them, the ranking of a web page helps users find the information they are looking for. They then enhanced the coefficient values by applying a genetic algorithm. The algorithm's recall increased by nearly 54% and its precision reached 58% over state-of-the-art algorithms. Shen et al. considered queries to be ambiguous, which can lead to incorrect mapping of web queries to user intent, so they presented a query enrichment technique that matches each query to an intermediate object which is further mapped to the respective category. The methodology in this paper [6] consisted of two phases: data collection for training the classifiers and query classification based on the classifiers. In the first stage, the queries were enriched by searching related web pages for relevant text and category information; in the second stage, ensemble classifiers were developed to classify the queries based on the data collected in stage one. Experimental results showed significant improvement in classification performance, and future research will explore finding more valuable information to build base classifiers. Finally, Shen et al. [7] put forward a new algorithm based on the summarization of web pages.


The authors extracted the most relevant information using different summarization techniques and used it as keywords for web categorization. Their results demonstrated an 8.8% improvement when using the proposed summarization-based classification algorithm compared to pure-text-based algorithms. Other reviewed papers offer valuable insights into this approach, which involves utilizing both local [8] and contextual features to forecast the structure of abstracts [9]. This approach has proven effective in predicting the local structure of abstracts in slightly over 60% of cases [10]. However, the unregulated nature of web content introduces additional challenges to its classification. The hyperlinked nature of hypertext, on the other hand, presents features that can aid the classification process [11]. Link information alone can boost the F1 score by up to 46 points when compared to a conventional content-based classifier [12]. In conclusion, the review highlights the importance of query enrichment techniques that map short queries to intermediate objects for effective classification. The review also showcases the approaches adopted by different researchers for query classification, such as building a bridging classifier on an intermediate taxonomy, using a blind feedback technique, and classifying queries without using external sources of information. Furthermore, the review discusses the importance of summarization as a form of feature selection and how it can be used to predict the structure of abstracts. The interconnected nature of hypertext is also noted, with link information alone being able to significantly improve classification results. Overall, this literature review provides valuable insights into the various techniques and methods used in web query classification and highlights the importance of query enrichment and summarization for effective classification. The reviewed research papers provide useful contributions to the field, and their findings can be leveraged to develop efficient query classification systems in the future.

3 Methodology To provide a positive user experience, it is important for websites to understand the intent behind a search query and guide the user to relevant outcomes. A thorough classification of user intent can help in achieving this goal. Additionally, time management during search sessions can be improved with the help of extensions that have already computed search results based on the user's intent. Furthermore, aligning the user's search query with the present domain can help accomplish the required task efficiently. Overall, by focusing on user intent and utilizing tools to streamline the search process, websites can enhance the user's experience and increase their satisfaction. The methodology followed for the implementation was as follows:
1. Fetched the dataset containing the queries and clicked URLs.
2. Collected the top ten results for each query from search engines, e.g., Google and Microsoft Bing.


3. Applied feature engineering and tokenization to capture the semantics of the data.
4. Calculated a semantic matching value for each web URL and assigned a score.

3.1 Dataset In this research, 800,000 web queries collected from the SIGKDD KDD Cup 2005 Internet user search query categorization dataset, with 67 predefined categories, have been used. The meaning and intention of the search queries in the dataset are subjective: a query such as "Bark" might mean tree bark to some people and a dog's bark to others. Therefore, the data is classified manually into subsets of queries, and each query is tagged with up to five categories. The dataset (as shown in Table 1) was built using a web scraper to collect important text-based content from each web page, such as the title, type, description, keywords, and HTML paragraph tags. Google Search and Beautiful Soup were used to fetch links from the Google Search engine; after obtaining the links, relevant tags such as "p," "title," "type," "description," and "keywords" were collected to generate metadata, which helped in gaining insights about the content of the links. The dataset also contains details such as the polarity, subjectivity, and length of the query text after data cleaning, and each row contains a category specific to the entire query. Since executing algorithms on such a huge dataset was time-consuming, a random sample of 20% of the labeled queries was used, together with the top ten relevant web pages from the search engine for each query, distributed among the top seven categories (as shown in Fig. 1).
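A minimal sketch of the page-collection step described above, using requests and Beautiful Soup (the helper name and the assumption that the result links have already been fetched from the search engine are illustrative, not taken from the paper):

```python
import requests
from bs4 import BeautifulSoup

def scrape_page(url: str) -> dict:
    """Collect title, meta description/keywords, and paragraph text for one link."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    meta = {m.get("name", "").lower(): m.get("content", "")
            for m in soup.find_all("meta")}
    return {
        "link": url,
        "title": soup.title.get_text(strip=True) if soup.title else "",
        "description": meta.get("description", ""),
        "keywords": meta.get("keywords", ""),
        # Concatenate all <p> tags as the page's text body.
        "text": " ".join(p.get_text(" ", strip=True) for p in soup.find_all("p")),
    }

# rows = [scrape_page(u) for u in top_ten_links]   # top_ten_links: hypothetical list
```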

3.2 Feature Extraction The data collected from the web pages was filtered to extract meaningful clean data for training models. The nltk library was used to remove the punctuation marks, convert the entire text to lowercase, discard unnecessary noise and stop words, and lemmatize the text to extract a root word. After cleaning the text, null entries were removed from the dataset. Metadata was also generated from the text data to determine the sentiment of the text (i.e., polarity and subjectivity) and the length of the words after cleaning. To input data into deep learning models, an embedding was built using tokenizer methods to establish a limited vocabulary set, and the words were converted to numbers in a sequence of fixed length. The process of transforming the data to a sequence of fixed length is necessary for projects based on deep learning.
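A rough sketch of this cleaning and metadata step is shown below; the sentiment scores are computed with TextBlob, which exposes polarity and subjectivity values, although the paper does not name the library, so that choice and the helper names are assumptions:

```python
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from textblob import TextBlob

nltk.download("stopwords")
nltk.download("wordnet")
STOP = set(stopwords.words("english"))
LEMMA = WordNetLemmatizer()

def clean(text: str) -> str:
    # Lowercase, drop punctuation/digits, remove stop words, lemmatize.
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    tokens = [LEMMA.lemmatize(t) for t in text.split() if t not in STOP]
    return " ".join(tokens)

def features(text: str) -> dict:
    cleaned = clean(text)
    blob = TextBlob(text)
    return {"text": cleaned,
            "pol": blob.sentiment.polarity,        # sentiment polarity
            "subj": blob.sentiment.subjectivity,   # sentiment subjectivity
            "len": len(cleaned.split())}           # word count before normalization
```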

Table 1 Processed data format used in the analysis, based on data collected from the sources mentioned (Ind-index, Pol-polarity, Subj-subjectivity, Len-length, Cat-category). Columns: Ind | Query | Link | Text | Pol | Subj | Len | Cat. The five sample rows all correspond to the query "Cumberland times newspaper" and the category "Living", pairing the retrieved links https://www.times-news.com/, https://en.wikipedia.org/wiki/Cumber_Times, https://www.facebook.com/Cumberland/, https://www.thepaperboy.com/newspaper.cfm?Paper, and https://www.relib.net/dbcumberland-times-news with their scraped page text and with polarity values between 0 and 0.25, subjectivity values between 0 and 0.38, and normalized length values between 0.39 and 0.57.


Fig. 1 Data abstraction performed on queries taken from KDD Cup 2005

3.3 Semantic Matching Calculating semantic matching generally involves using natural language processing (NLP) techniques to represent text as vectors and then comparing these vectors to determine their similarity. We use term frequency-inverse document frequency (TF-IDF) models to represent text as vectors and cosine similarity to compare them. TF-IDF weights each word by its frequency in the query web page and its inverse frequency in the corpus, giving more weight to rare words that are more informative. Cosine similarity measures the cosine of the angle between two vectors: the closer the vectors are to each other, the more similar they are considered to be.
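A minimal version of this scoring step with scikit-learn (function and variable names are illustrative):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def semantic_scores(query: str, page_texts: list[str]) -> list[float]:
    """Score each retrieved page by TF-IDF cosine similarity with the query."""
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform([query] + page_texts)
    # Row 0 is the query vector; compare it against every page vector.
    return cosine_similarity(matrix[0], matrix[1:]).ravel().tolist()

scores = semantic_scores("women ipl final",
                         ["ipl final match report ...", "gardening tips ..."])
print(scores)   # a higher score means a more semantically relevant page
```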

3.4 Web Query Classification In this part, the complex structures found in web pages were analyzed and this data was used for web page classification. The proposed technique involved retrieving the pertinent content from the web and feeding it to a conventional text classification algorithm. Two distinct methods for the classification task were explored: a machine learning-based approach using an SVM classifier and an SGD classifier, and a second approach using recurrent neural networks such as LSTM and GRU to extract the meaning from the features and classify them.

3.5 Machine Learning Models Support Vector Machine: It is a supervised learning algorithm that addresses both classification and regression problems. The algorithm determines the best hyperplane that can segregate the n-dimensional space into classes.


Fig. 2 This was created based on the data collected during research and has not been previously published or cited. It illustrates the architecture of machine learning models used in the research

Vectors belonging to a specific category lie in the same subspace created by the hyperplane, while vectors belonging to a different category lie in a different subspace. The algorithm relies on the extreme vectors that separate the two categories, and these extreme cases are called support vectors. To tackle the web page classification challenge, the SVM algorithm can be leveraged by first finding vector representations of the textual data that capture the relevant information from the texts. Stochastic Gradient Classifier: The SGD classifier is one of the best-known optimization-based algorithms for finding optimal parameter values, and it has drawn considerable interest for large-scale classification problems. Specifically, it is well suited for multi-class classification tasks, utilizing a "one versus all" (OVA) approach by integrating several binary classifiers. This method is particularly effective for the kind of machine learning problems commonly found in NLP and text classification. A binary classifier that distinguishes between a given class and all other K-1 classes is trained for each of the K classes; to determine which class has the highest confidence level, a confidence score is calculated for each classifier at test time (Fig. 2).
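A compact sketch of the two machine learning classifiers on TF-IDF features is given below; the pipeline layout and the tiny toy data are illustrative, and the hyperparameters are library defaults rather than the paper's tuned values:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.linear_model import SGDClassifier

texts_train = ["ipl final match score", "cricket world cup highlights",
               "buy running shoes online", "discount laptop deals"]
y_train = ["Sports", "Sports", "Shopping", "Shopping"]
texts_test = ["cricket match schedule today"]

# Linear SVM on TF-IDF features.
svm_clf = make_pipeline(TfidfVectorizer(), LinearSVC())
# SGD with hinge loss trains a linear SVM-style model; multi-class uses one-vs-all.
sgd_clf = make_pipeline(TfidfVectorizer(), SGDClassifier(loss="hinge", max_iter=1000))

svm_clf.fit(texts_train, y_train)
sgd_clf.fit(texts_train, y_train)
print(svm_clf.predict(texts_test), sgd_clf.predict(texts_test))
```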

3.6 Deep Learning Models See Fig. 3. Recurrent neural network (RNN): An RNN takes a series of embedded input vectors with no fixed size, whereas a vanilla neural network can only process inputs of fixed size; there is therefore no set size restriction on the input vectors that an RNN can accept. RNNs are useful for classifying web pages represented as sequences of word embeddings because they can handle variable-length input sequences using internal state memory. Long short-term memory (LSTM): It is a special type of recurrent neural network that can detect long-term dependencies within data. This is achieved through a repeating module that consists of four interacting layers.


Fig. 3 This was created based on the data collected during research and has not been previously published or cited. It illustrates the architecture of deep learning models used in the research

The LSTM module, which comprises a cell state and three gates, allows for selective learning, unlearning, or retention of information at each unit. The cell state enables uninterrupted information flow across units by permitting only a few linear interactions. To regulate the information flowing into the cell state, the input gate uses point-wise multiplication of "sigmoid" and "tanh" activations, and the output gate ultimately determines which data should be passed to the next hidden state. Attention LSTM: This model utilizes a modified form of the long short-term memory (LSTM) network that enables it to concentrate on the essential segments of the input sequence while making predictions, rather than processing the entire sequence equally. This shortens computation time and helps the model handle lengthy input sequences successfully. In contrast to the standard encoder-decoder approach, the attention mechanism looks at all hidden states from the encoder sequence when producing predictions. Gated Recurrent Units (GRU): GRUs are well suited to web page summarization tasks due to their ability to handle sequential data effectively. A GRU incorporates gates that control the flow of information, allowing the model to choose which information to retain and which to discard. This makes them highly effective at processing and summarizing long sequences of text, such as a web page. Additionally, GRUs have fewer trainable parameters than other recurrent models such as LSTM networks, which reduces the risk of overfitting. Overall, GRUs provide a balanced combination of performance, computational efficiency, and robustness. GRU is faster than LSTM and uses less memory, whereas LSTM tends to be more accurate on datasets with longer sequences.
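A minimal Keras sketch of the sequence models described above follows; the vocabulary size, sequence length, and layer sizes are illustrative assumptions, and the GRU variant is obtained by swapping the recurrent layer:

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

VOCAB, MAXLEN, NUM_CLASSES = 20000, 200, 7   # illustrative sizes

def encode(texts, tokenizer=None):
    # Map words to integers and pad to a fixed-length sequence for the embedding.
    if tokenizer is None:
        tokenizer = Tokenizer(num_words=VOCAB, oov_token="<unk>")
        tokenizer.fit_on_texts(texts)
    seqs = tokenizer.texts_to_sequences(texts)
    return pad_sequences(seqs, maxlen=MAXLEN), tokenizer

model = Sequential([
    Embedding(VOCAB, 128),
    LSTM(64),                       # replace with GRU(64) for the GRU variant
    Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# X, tok = encode(train_texts); model.fit(X, train_labels, epochs=5)
```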


3.7 Evaluation Metrics An essential phase in this paper's experimental workflow is computing the model's accuracy using an appropriate metric. Accuracy is used to determine how well text is categorized into the correct labels. To compute accuracy, the accuracy_score function from the sklearn.metrics module is used; it is one of the most widely used metrics for measuring the performance of classification models and compares the true and predicted labels to determine the fraction of correct predictions.
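For example, with made-up labels:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = ["Sports", "Sports", "Living", "Information"]
y_pred = ["Sports", "Living", "Living", "Information"]

acc = accuracy_score(y_true, y_pred)   # fraction of correct predictions -> 0.75
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)
print(acc, prec, rec, f1)
```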

4 Results and Discussion The label categories are shorten to five main categories—“Living,” “Computers,” “Sports,” “Information,” “Entertainment,” “Shopping,” “Online Community.” When you enter a query, the system takes top ten links and converts them. Let’s say the input query is “Women IPL final.” The results are shown in Table 2 The data is normalized to bring it on a similar scale and subsequently presented in the form of confusion matrix (as seen in Figs. 4, 5, 6, and 7). The diagonal entries represent the number of web pages that are correctly classified by the model, while

Table 2 Predictions based on SVM, SGD, attention LSTM and GRU models
Index | Links | Prediction_SVM | Prediction_SGD | Prediction_Attention_LSTM | Prediction_GRU
0 | https://www.forbes.com/sites/tristanlavalette | Sports | Sports | Sports | Information
1 | https://thesportsrush.com/cricket | Sports | Sports | Sports | Information
2 | https://www.youtube.com/watch?v=cXwT7MRQNs | Living | Living | Information | Living
3 | https://en.wikipedia.org/wiki/IPL_ | Sports | Sports | Sports | Sports
4 | https://en.wikipedia.org/wiki/IPL_Trail | Sports | Sports | Sports | Sports
5 | https://en.wikipedia.org/wiki/IPL_Velocity | Sports | Sports | Sports | Sports
6 | https://en.wikipedia.org/wiki/2022_Women | Sports | Sports | Sports | Sports
7 | https://byjusexamprep.com/current-affairs/womens-ipl-schedulepoints | Sports | Sports | Sports | Sports


Fig. 4 Confusion matrix of GRU obtained through experiments conducted by the authors

The diagonal entries represent the number of web pages that are correctly classified by each model, while the off-diagonal entries represent the web pages that were wrongly classified. Higher diagonal values in the confusion matrix indicate better model performance, i.e., many accurate classifications. Since finding the relationship between the variables is not the goal here, constructing a correlation matrix is not necessary; a correlation matrix aids in dimensionality reduction and is commonly used during exploratory data analysis (EDA). The evaluation of the work is based on three metrics widely used in text classification tasks, namely precision, recall, and F1-measure. Precision refers to the proportion of predicted positive samples that are correctly categorized, while recall refers to the proportion of target labels that are accurately predicted; the F1-measure is the harmonic mean of precision and recall. As the results in Table 3 indicate, the machine learning models categorize web pages more accurately than the deep learning models.


Fig. 5 Confusion matrix of attention LSTM obtained through experiments conducted by the authors

The SGD classifier achieves the best accuracy compared with the other models, which suggests that the feature engineering in the machine learning pipeline is able to classify web pages accurately. The results also demonstrate that the NLP techniques used for preprocessing the data can improve the accuracy of the web classification problem considered.
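The row-normalized confusion matrices of Figs. 4, 5, 6 and 7 can be reproduced along these lines (the class names mirror the paper, but the example predictions are placeholders, not the paper's data):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

classes = ["Living", "Computers", "Sports", "Information",
           "Entertainment", "Shopping", "Online Community"]
y_true = ["Sports", "Living", "Sports", "Information"]
y_pred = ["Sports", "Living", "Information", "Information"]

cm = confusion_matrix(y_true, y_pred, labels=classes)
# Row-normalize so each row sums to 1, skipping empty rows to avoid division by zero.
row_sums = cm.sum(axis=1, keepdims=True)
cm_norm = np.divide(cm, row_sums, out=np.zeros_like(cm, dtype=float),
                    where=row_sums != 0)
print(np.round(cm_norm, 2))
```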

5 Future Scope In a subsequent step, the system will require the user to input the target category for their search session, such as "shopping" if the user intends to browse e-commerce sites.


Fig. 6 Confusion matrix of support vector classifier obtained through experiments conducted by the authors

If the system recommends a website that does not fall under the "shopping" category, it will alert the user accordingly. Thus, instead of merely classifying websites, the system will use the predicted probability of a website belonging to a particular category to trigger an alert message for the user. Performance can be improved further by using a wider variety of complex HTML tags, and modifying the sequence length of the deep learning models can also affect their accuracy. Performing classification based on the visual image data of web pages would give a new direction to this project. The categorization metrics can be enhanced even further by ensembling the models and using more sophisticated techniques such as BERT, ELMo, and FastText.


Fig. 7 Confusion matrix of stochastic gradient descent classifier obtained through experiments conducted by the authors

Table 3 Accuracy metrics for various models. Data presented are from experiments conducted by the authors
Model | Precision | Recall | F1 | Accuracy
Support vector classification | 0.801 | 0.834 | 0.809 | 0.863363
Stochastic gradient classifier | 0.872 | 0.794 | 0.824 | 0.867117
Attention LSTM | 0.787 | 0.709 | 0.743 | 0.815315
Gated recurrent units (GRU) | 0.793 | 0.678 | 0.723 | 0.785285


6 Conclusion In this study, the feasibility of using AI-assisted aids to prevent procrastination while browsing the web was explored. To achieve this goal, a comparative study of various deep learning and machine learning architectures was conducted. The results suggest the potential for artificial intelligence (AI) to play a role in this area despite the limitations posed by the small dataset used in this study. One of the key findings was the overfitting or underfitting that the deep models frequently displayed; the balance between the size of the training sample and the number of trainable parameters in a model could have a significant impact on how much of this happened. The results also showed that shallower models with fewer trainable parameters often exhibited higher accuracy than deeper models. Moreover, the aim of this paper was to reduce lost time by providing users with a relevancy score for the URL links generated for their search query. This score can help users quickly identify and access the most relevant information, thereby reducing the likelihood of procrastination and wasted time while browsing the web. In conclusion, the findings of this study highlight the potential of AI-assisted aids in preventing procrastination while browsing the web. However, more research is needed to further refine the models and expand the dataset used in the study; by doing so, the aim is to build a more robust and effective solution to this problem in the future.

References 1. Chen H, Dumais ST (2000) Bringing order to the Web: automatically categorizing search results. Proceedings of CHI2000, pp 145–152 2. Shen D et al (2018) Building bridges for web query classification, microsoft research. Available at: https://www.microsoft.com/en-us/research/publication/building-bridges-for-web-queryclassification/ Accessed Dec 8 2022 3. Broder AZ, Fontoura M, Gabrilovich E, Joshi A, Josifovski V, Zhang T (2007) Robust classification of rare queries using web knowledge. In: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR ’07). Association for Computing Machinery, New York, NY, USA, pp 231–238. https://doi.org/10. 1145/1277741.1277783 4. Beitzel SM, Jensen EC, Lewis DD, Chowdhury A, Frieder O (2007) Automatic classification of web queries using very large unlabeled query logs. ACM Trans Inf Syst 25(2):9-es. https:// doi.org/10.1145/1229179.1229183 5. Pan Z (2020) Optimization of information retrieval algorithm for digital library based on semantic search engine. Int Conf Comput Eng Appl (ICCEA) 2020:364–367. https://doi.org/ 10.1109/ICCEA50009.2020.00085 6. Shen D et al (2006) Query enrichment for web-query classification. ACM Trans Inf Syst 24(3):320–352. https://doi.org/10.1145/1165774.1165776


7. Shen Dou et al (2004) Web-page classification through summarization. ACM SIGIR conference on research and development in information retrieval 27(2):7. https://doi.org/10.1145/1008992. 1009035 8. Qi X, Davison BD (2009) Web page classification: features and algorithms. ACM Comput Surv 41(2):1–31. https://doi.org/10.1145/1459352.1459357 9. Jansen BJ, Booth DL, Spink A (2008) Determining the informational, navigational, and transactional intent of web queries. Inf Process Manag 44(3):1251–1266. https://doi.org/10.1016/ j.ipm.2007.07.015 10. Li J et al (2022) Graph enhanced bert for query understanding. arXiv:2204.06522 [cs]. Accessed 20 Oct 2022. [Online]. Available: https://arxiv.org/abs/2204.06522 11. Bharat K (2000) SearchPad: explicit capture of search context to support web search. Comput Netw 33(1–6):493–501. https://doi.org/10.1016/s1389-1286(00)00047-5 12. Xia C, Wang X (2015) Graph-based web query classification. In: 2015 12th web information system and application conference (WISA), pp 241–244. https://doi.org/10.1109/WISA.2015. 68 13. Kurian A, Jayasree M (2014) Analyzing and classifying user search histories for web search engine optimization. In: 2014 3rd international conference on eco-friendly computing and communication systems, pp 39–44. https://doi.org/10.1109/Eco-friendly.2014.83 14. Berry MW, Dumais ST, O’Brien GW (1995) Using linear algebra for intelligent information retrieval. SIAM Review 37:573–595 15. Attardi G, Gulli A, Sebastiani F (1999) Automatic web page categorization by link and context analysis. In: Hutchison C, Lanzarone G (eds) Proceedings of THAI’99, pp 105–119 16. Lam W, Han YQ (2003) Automatic textual document categorization based on generalized instance sets and a metamodel. IEEE Trans Pattern Anal Mach Intell 25(5):628–633 17. Chakrabarti S, Dom B, Indyk P (1998) Enhanced hypertext categorization using hyperlinks. Proc ACM SIGMOD 37:307–318. https://doi.org/10.1145/276305.276332

Connect: A Secure Approach for Collaborative Learning by Building a Social Media Platform
Sonali Lunawat and Vaidehi Pawar

1 Introduction Social media refers to social networking sites such as WhatsApp, Facebook, and Twitter. Social networks [1], as featured in Fig. 1, are among the most widely used platforms for information sharing on the Internet. They allow users to control who can view their profile, upload images, add multimedia content or change the appearance and feel of their profile, create blogs, comment on postings, and share contact lists. Social networking sites can be defined as collaborative web-based applications that enable users to connect with relatives and acquaintances, meet new people, join interesting communities, communicate, exchange images and event details, and network with others in their real-life groups.

1.1 Critical Characteristics of Social Media The critical characteristics of social media are as follows:
• Connectedness: It connects people interested in the same areas of work or domains. Through the media, they are connected 24 × 7 using access devices to like, comment, share, update their profile, and follow others.
• Collaboration: People collaborate to create knowledge, which can be either open or closed, e.g., Wikipedia.

S. Lunawat (B) · V. Pawar Department of Computer Engineering, Pimpri Chinchwad College of Engineering and Research, Ravet, Pune, India e-mail: [email protected] V. Pawar e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 I. J. Jacob et al. (eds.), Data Intelligence and Cognitive Informatics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-7962-2_13


Fig. 1 Features of social networking sites [1]

• Community: Connectedness and collaboration help to form communities. These communities can create awareness about various privacy and security issues, policy making, and feedback.

1.2 Social Media Platforms
• Facebook: Facebook was launched in 2004 as Harvard University's social networking site and by 2009 had become the largest social networking platform [2]. It is especially popular as a photograph-sharing platform.
• Twitter: Odeo launched Twitter in 2006, and it became a public network in the same year. Twitter is a real-time, web-based service that allows users to send messages in the form of 140-character tweets, and it is well known as an online microblogging application [2].
• YouTube: It is a video-sharing platform where users may upload, watch, and share their own videos, and it is especially popular for short video sharing. It was founded in 2005 and has evolved into a platform where users build channels using their Google account [2].
• LinkedIn: It is a professional network where professionals participate and collaborate with their networks. Groups of professionals with similar interests can be formed, and it is also used by organizations to find new hires [2].
• MySpace: It is a social networking site for advertisers who pay for page views, with many options. It is used in the United Kingdom, Australia, and other countries [2].

1.3 Need for Using Collaborative Learning • Enhancing outreach for own development • Real-time connections with domain experts


• Peer-to-peer interactions • Discussion forums

1.4 Challenges and Issues to Build Social Networking • Fundamentals: While building a social networking site, one has to deal with large and dynamic data, heterogeneous connections, and the maintenance of large-scale social networks [3]. • Technologies: Models, methods [4], and tools used for community formation with big data, behavior analysis, and storage of huge amounts of data [3]. • Security: Detecting malicious information, securing social networking applications, monitoring and preventing different malware in social networks [5], novel security solutions, threat and vulnerability analysis, and security in social networking applications with big data [3]. • Trust: Models, methodologies, and tools for measuring the trust of social networks with big data [3]. • Privacy: Privacy in social networking management and analysis with big data [3].

1.5 Cloud Computing Cloud computing uses internet-connected devices for resource sharing and can serve different operations such as storage, security, scaling, and many more [6]. A social network is a virtual space where people may communicate with family and friends and share whatever information they want. With the growing demand for social networking services, a scalable, cost-effective, and efficient cloud computing architecture is becoming increasingly popular.

1.6 Encryption In end-to-end encryption, the encryption and decryption of messages take place between the two communicating users. Cryptography is used to secure data storage and transmission so that only the intended user has access to the information; data should be completely hidden from third parties [7]. New cryptosystems are developed every day to provide faster and more efficient security mechanisms using end-to-end encryption [8].


1.7 Importance of Social Media Platforms for Collaborative Learning Technology is used to strengthen learning and overcome the limitations of conventional methods in many respects. Social media platforms such as YouTube, Instagram, and many more are each built for specific purposes. Figure 2 shows the number of active users on each popular platform. However, there is no social media platform dedicated to faculty-to-faculty collaboration [10]. Connecting faculty members is very important so that they can upgrade themselves and give the best teaching and guidance to their students.

Fig. 2 Most popular social networking sites [9]


1.8 Advantages of Collaborative Learning for Faculty-To-Faculty Interaction A. Creates long-term and sustainable joint initiatives for all disciplines at each university in various areas as required. B. Introduces students to global citizenship, helping to prepare them, as tomorrow's world leaders, to make personal and professional decisions with global relevance. C. Provides a forum for diverse points of view and encourages the diversity of ideas. D. Distributes student talent. E. Educates the next generation of faculty on global research methodologies, provides access to specialized research facilities such as specific equipment and laboratories that are not available at the home institution, university, or country, and leverages financing from numerous national funding sources. F. Enables large, long-term projects. G. Forming SIGs (special interest groups). H. Collaborative learning. I. Extracurricular events.

1.9 Security Concerns While Building a Social Media Platform A very serious drawback of social media platforms is the lack of privacy and security [5]. Although social media security is a well-studied topic, there is no agreement on how security should be provided in these platforms. It has been found that the majority of users are generally uninformed of the various privacy dangers involved in uploading personal information to social networking Web sites [11]. Security and privacy issues in social networking Web sites are shown in Fig. 3.

1.10 Conventional Threats Spam Attack: A spam attack occurs when an attacker sends unsolicited electronic messages in bulk. Malware Attack: Malware is a harmful program that is specifically designed to access a computer system without the user's knowledge. Phishing: A phishing attack is a type of attack in which the attacker obtains sensitive and confidential information such as usernames, passwords, and credit card information through fake Web sites and emails [11]. Identity Theft: In this type of attack, attackers use other people's identities, such as mobile numbers, credit card numbers, and addresses, without permission.


Fig. 3 Security and privacy issues in social networking Web sites [1]

1.11 Modern Threats Cross-site Scripting Attack: Cross-site scripting injects a malicious script into a page so that it executes in the victim's browser. Profile Cloning Attack: In this attack, a fake profile identical to an existing one is created. Hijacking: In hijacking, the attacker gains control of computer systems on a network. Inference Attack: In this attack, confidential information that the user does not want to expose is derived from data stored on or added to a social networking site (SNS) using machine learning algorithms. Sybil Attack: The Sybil attack is a destructive attack in which forged identities are used in place of genuine ones.


Clickjacking: Clickjacking is a technique in which an attacker tricks a user into clicking disguised or unwanted links, thereby obtaining private information without the user realizing what was actually intended. De-anonymization Attack: This attack recovers personally identifiable information by combining anonymized data with other available information. Cyberespionage: Cyberespionage uses cyber skills to obtain sensitive information with the goal of passing it to opposing parties.

1.12 Targeted Threats Cyberbullying: In this attack, an attacker harasses a person through the phone or online social networks. Cybergrooming: In this attack, an emotional bond is created with the victim with the goal of sexual abuse. Cyberstalking: Cyberstalking is the practice of observing a victim via the internet, email, or other means in order to disturb the person mentally.

1.13 Reasons Behind Online Social Media Security (a) The scale of social media: As large amounts of personal data are stored on social media for different purposes, they can be abused by attackers; hence, security is the topmost challenge of concern. (b) Trusted nature of social media: People accept unfamiliar friend requests on the basis of mutual friends because of the trusted nature of social media. They readily click on dangerous links blindly, without concern for security. (c) Invisibility to the security team: People spend most of their time on social media networks despite the lack of security, making it tough for security professionals to manage this massive data, dive deep into the security challenges associated with social media, and find solutions.

2 Literature Review Table 1 reviews different social media platforms and the security issues in their architecture.


Table 1 The literature work analysis related to the architecture of different social media platforms and security issues in this architecture

1. [12] The work focused on the problems and opportunities presented by social media in the future. According to the author, education has an unparalleled opportunity to monitor and enhance its own practices. Collaborative learning will improve the efficacy of both teaching and learning, and students will discover new methods to share their work.

2. [10] The author emphasized the significance of collaborative learning, which is one of the most successful learning methodologies for postgraduates in science education. Based on the author's findings, it was advised that online collaborative learning enables researchers or students to cope with study, work, and family and to enhance the sharing of resources by improving performance.

3. [13] The author showed that collaborative learning is a successful approach that should be implemented in education, provided an appropriate environment is built for learning. The author also found that the combination of collaborative learning and e-learning benefits students by increasing their enthusiasm and interest.

4. [14] The study demonstrated that the Telegram® social media application is very effective for improving students' reading skills, introducing audio-visual material, explaining topics, motivating students to search for knowledge and information, and supporting concentration, in line with modern learning theories, to achieve the desired outcomes.

5. [2] The research discussed social networking and how it has revolutionized the way individuals communicate with each other regardless of their location. It elaborated on the popularity of social networking sites in the development of new applications.

6. [15] The study explained the technologies Facebook uses, some of which are self-invented technologies and languages while others are open source and customized. Facebook helped to connect people without requiring them to create their own Web sites.

7. [16] In this work, the author demonstrated how Facebook's Memcache architecture is scaled for growing demands. Memcache was utilized to separate cache and persistent storage systems, as well as to increase monitoring, debugging, and operational efficiency.

8. [17] The author of this research examined the relational and NoSQL databases utilized by the most popular social network sites (Facebook, YouTube, Twitter, Instagram, and LinkedIn) and the primary reasons for using NoSQL for big data. A NoSQL database is an appropriate option for distributed, cluster-oriented, horizontally scalable systems and for consistency concerns. NoSQL helps with data analysis and works with unstructured data.

9. [18] The authors determined that Facebook has grown in popularity over the last three years and has become the most successful social networking Web site. The architecture and scalability elements of Facebook are detailed, which helps in grasping how Facebook operates.

10. [6] In this research, the author compared MySpace with Facebook, where efficient cloud computing technologies were employed to derive benefits and increase income.


11. [3] The author provided insights into using big data in social networking. The issues identified also provide future research directions for developing applications using big data.

12. [19] The work analyzed the fact that social media poses significant security and privacy issues because of its centralized infrastructure and personally identifiable data, which can be exploited by hackers.

13. [5] There are numerous options, but they all provide limited privacy. As a result, many opportunities remain for improving privacy, data security, cost, adaptability, and user performance.

14. [1] The author analyzed various case studies of online social networks to identify dangers and solutions by comparing various models, frameworks, and encryption approaches that safeguard social network members from various attacks. It was also stated that there is a need to address security and privacy concerns through the use of hybrid techniques and threat detection systems.

15. [8] The author stated that end-user security is a very important aspect of the system's privacy and security. The study explained end-to-end encryption, which makes WhatsApp less vulnerable. It also elaborated on the WhatsApp functionality that shows when a user was last logged in, without revealing other information such as the IP address, computer name, or location of the computer.

16. [4] The author explained how Instagram is utilized for photograph sharing. The research focused on Instagram's privacy issues and the threats associated with it. Several ways of dealing with threats to users' personal information were offered. Future social media should focus on security requirements in order to design applications that are error-free for users.

17. [20] The work explained how WhatsApp makes use of end-to-end encryption. It further summarized the various advanced cryptography protocols in WhatsApp's security architecture that address its security and privacy issues.

18. [11] The author investigated many privacy and security issues linked with online social media as well as third-party data collectors. The study's major goal was to advise users on how to protect themselves against these dangers when using social media.

19. [21] The paper studied the growing popularity of the internet along with the increasing risk to users' privacy and security. As per the results analyzed, there should be a method to verify a user's identity.

2.1 Encryption Techniques Used Two types of encryption, symmetric and asymmetric, are used on social media. Many social media platforms use cryptography and steganography as safeguarding methods. AES encryption provides secure data communication and transmission over these platforms. Various applications, such as Telegram and Facebook, use end-to-end encryption, and end-to-end encryption (E2EE) is also used in shared-media applications. Innovative encoding systems are developed every day, and hence conventional enciphering tools need to be upgraded to provide faster and stronger security.
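As a concrete, deliberately generic illustration of the symmetric, authenticated encryption mentioned above, the following minimal Python sketch uses AES-GCM from the cryptography package; the message handling, key handling, and function names are illustrative assumptions rather than any platform's actual implementation.

# Minimal sketch: authenticated symmetric encryption of a chat message
# using AES-GCM from the `cryptography` package. Key distribution and
# storage are out of scope and would be handled by the platform.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_message(key: bytes, plaintext: str, associated_data: bytes = b"") -> tuple[bytes, bytes]:
    """Encrypt a UTF-8 message; returns (nonce, ciphertext-with-tag)."""
    nonce = os.urandom(12)  # 96-bit nonce, unique per message
    ciphertext = AESGCM(key).encrypt(nonce, plaintext.encode("utf-8"), associated_data)
    return nonce, ciphertext

def decrypt_message(key: bytes, nonce: bytes, ciphertext: bytes, associated_data: bytes = b"") -> str:
    """Decrypt and verify; raises InvalidTag if the message was tampered with."""
    return AESGCM(key).decrypt(nonce, ciphertext, associated_data).decode("utf-8")

if __name__ == "__main__":
    key = AESGCM.generate_key(bit_length=256)  # 256-bit symmetric key
    nonce, ct = encrypt_message(key, "hello, collaborator")
    print(decrypt_message(key, nonce, ct))

In an end-to-end setting, only the communicating users would hold such a key, and the server would store ciphertext only.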

3 Summary of Literature Review The importance of collaborative learning and the impact of current social media trends are summarized above. Building a social media platform that serves faculty in upgrading themselves across multiple disciplines is challenging. To build such a platform, one must be aware of, and focus on, security and privacy.

3.1 Importance of the Study Over the last few decades, a rapid change in new media, which includes digital, computerized, and networked information and communication technologies, has been observed. Every section of society has established a virtual presence on social media platforms [14]. All available information on the subject has been studied to demonstrate the importance, characteristics, and roles played by social media, in addition to the key challenges of social media security and privacy [19]. The useful data and the associated problems [13] should be analyzed before developing any tool for educational purposes.

3.2 The Opportunities that Will Be Provided Among the Users of the System Are as Follows

• Multicultural exposure and learning opportunities.
• Enhanced media/digital literacy.
• Increased user motivation to use the system.
• Academic and personal identity growth.
• Integration of formal with informal learning.
• Content discovery and creation.
• Research work expansion.
• Collaboration between institutions and graduates for supporting lifelong learning.
• Alumni support.

4 Proposed System As shown in Fig. 4, the system consists of:


Fig. 4 Proposed architecture

1. Users: Users can be any faculty member or student who wants to use the services provided by the platform. They can log in to the web application with their credentials. 2. Proposed Algorithm: To address the various challenges in social media platforms, implementing lightweight algorithms that provide high security is proposed. 3. Load Balancer: It acts as the "traffic police" in front of the web servers, routing client requests across the servers that can fulfill them, so as to maximize speed and balance server utilization. 4. Memcache: It uses a key-value storage system for the given piece of data. Unlike databases that store data on disks or SSDs, Memcached keeps all its data in memory [16]. It is a free, open-source, high-performance, distributed memory object caching system. Memcached is simple but has powerful features and is mostly used in real-time applications such as web, mobile apps, gaming, ad-tech, and e-commerce. Memcached is easy to scale out by adding new nodes, which is why it is used in distributed applications [15] (a short cache-aside sketch follows this list). 5. Content Delivery Network: A content delivery network (CDN) is a collection of servers from which the server closest to the requesting user is selected. A CDN reduces data transfer time from geographically dispersed servers. CDNs help to improve web performance by reducing response time, reducing network latency in streaming video, and optimizing broadcasts. 6. Hadoop: A large-scale distributed batch processing framework that leverages parallel data processing across different nodes to assist in building distributed applications that support big data. HDFS (Hadoop Distributed File System) replicates small chunks of data and stores them to ensure that the data are available from another node if any node fails [18].


7. Kafka: It is a real-time distributed publish-subscribe messaging system with a durable queue that can manage a large amount of data and allows users to pass messages from one end to the other. Kafka may be used to ingest messages both offline and online. To prevent data loss, Kafka messages are kept on a persistent disk, with a replica of the data stored in the cluster. 8. NoSQL: It is a non-tabular, unstructured database [17]. NoSQL databases can store all types of unstructured data such as images, files, and videos. Given below are the reasons for using NoSQL. • Flexible schema • Auto-scaling • Transaction support • Auto-sharding
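As referenced in item 4 above, the following is a minimal cache-aside sketch showing how the Memcache layer typically fronts the persistent store. The pymemcache client is used here for illustration, and the key naming and the fetch_profile_from_db helper are hypothetical placeholders rather than parts of the proposed system.

# Minimal cache-aside sketch for the Memcache layer (item 4 above).
import json
from pymemcache.client.base import Client

cache = Client(("localhost", 11211))
CACHE_TTL = 300  # seconds

def fetch_profile_from_db(user_id: str) -> dict:
    # Placeholder for the real database/NoSQL lookup.
    return {"user_id": user_id, "role": "faculty"}

def get_profile(user_id: str) -> dict:
    key = f"profile:{user_id}"
    cached = cache.get(key)
    if cached is not None:                    # cache hit: skip the database
        return json.loads(cached)
    profile = fetch_profile_from_db(user_id)  # cache miss: read through to the store
    cache.set(key, json.dumps(profile), expire=CACHE_TTL)
    return profile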

4.1 Proposed Algorithmic Process As analyzed in this study, the numerous scenarios linked to online social network issues and their solutions show that various models, frameworks, and encryption approaches can safeguard social network members from various attacks. Hence, there is a need to address security and privacy concerns using a hybrid encryption algorithm, and the algorithmic steps shown in Fig. 5 are followed as the proposed algorithm in the proposed architecture. 1. The password is stored encrypted with two levels of security on the chat server [20] (a small illustrative sketch follows this list). 2. A secure session over transport layer security (TLS) has a unique key for every session. 3. It is ensured that the communication is with the correct person and that there is no attacker in between. 4. Messages act as secrets and are maintained privately and securely. 5. Secrets, i.e., messages, are protected by encryption on the storage server. 6. Secrets are not allowed to be exchanged between users who are not friends.
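One common way to realize step 1 (a password protected with more than one layer of security on the server) is to store a salted, iterated hash instead of the plaintext. The sketch below uses PBKDF2 from Python's standard library; it is an illustrative assumption, not the exact scheme prescribed by the proposed algorithm.

# Illustrative sketch: salted, iterated password hashing (not the prescribed scheme).
import hashlib
import hmac
import os

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key); both are stored, the plaintext never is."""
    salt = os.urandom(16)
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 200_000)
    return salt, derived

def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 200_000)
    return hmac.compare_digest(candidate, stored)  # constant-time comparison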

Fig. 5 Proposed algorithm


The Poly1305 is a cryptographic message authentication code (MAC) created by Daniel J. Bernstein. It can be used to verify the data integrity and the authenticity of a message. Poly1305 takes a 32-byte one-time key and a message and produces a 16-byte tag. This tag is used to authenticate the message.
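The tag computation described above can be illustrated with the Poly1305 primitive exposed by the cryptography package; note that in practice Poly1305 is usually used inside an AEAD construction such as ChaCha20-Poly1305 rather than invoked directly, so the sketch below only demonstrates the key and tag sizes involved.

# Illustration of Poly1305 tagging: a 32-byte one-time key and a message
# produce a 16-byte tag that authenticates the message.
import os
from cryptography.hazmat.primitives import poly1305

one_time_key = os.urandom(32)  # must never be reused for another message
message = "secret shared between two collaborators".encode("utf-8")

p = poly1305.Poly1305(one_time_key)
p.update(message)
tag = p.finalize()             # 16-byte authentication tag
assert len(tag) == 16

# The receiver recomputes the tag and verifies it; a mismatch raises InvalidSignature.
verifier = poly1305.Poly1305(one_time_key)
verifier.update(message)
verifier.verify(tag)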

5 Discussion Although the architecture can be built using the proposed system, there are many challenges in building it. Server Utilization: As a CDN is used, it is entirely the CDN's job to check utilization and serve users without any downtime. Speed: As the system uses the cloud together with a CDN, Kafka, and Memcached, the required speed can be achieved. Security: As per the study of different social media platforms, security should be the top priority when building a social media application. Many of these platforms use AES, Triple DES, CAST-128, IDEA, and RC2. The current encryption schemes have also been studied, and a new way of encryption that provides more security has been proposed.

6 Conclusion Social media has become the digital way for the world to communicate over the Internet. Based on an extensive literature survey in the field of collaborative learning, a system for faculty-to-faculty interaction is proposed to fulfill many research opportunities, such as multicultural exposure and learning, intensive growth in academics, and many more. The proposed system connects faculty members from various universities to collaborate virtually for knowledge sharing. As security is the major concern in building trust among collaborators, an encryption algorithm is proposed. The overall architecture can be modified to increase the learning curve and improve different skill sets.

References 1. Jain AK, Sahoo SR, Kaubiyal J (2021) Online social networks security and privacy: comprehensive review and analysis. Complex Intell Syst 7:2157–2177 2. Sadiku MN, Omotoso AA, Musa SM (2019) Social networking. IJTSRD 3 3. Peng S, Yu S, Mueller P (2018) Social networking big data: Opportunities, solutions, and challenges. Future Gener Comput Syst 4. Sai Lakshmi Harichandana Nandyala (2018) privacy impact assessment: Instagram. Project Privacy Big Data 5. Sushama C, Sunil Kumar M, Neelima P (2021) Privacy and security issues in the future: a social media. Mater Sci Technol Eng


6. Dudi P (2013) Cloud computing and social networks: a comparison study of Myspace and Facebook. J Glob Res Comput Sci 4(3) 7. Sharma N, Yadav S, Bohra B (2016) Review on data encryption techniques used for social media on internet. Int J Adv Comput Eng Netw 4(9), ISSN: 2320-2106 8. Quist SC (2018) Data security and privacy in mobile technology: a case of Whatsapp web. Texila Int J Acad Res 5(1) 9. https://www.broadbandsearch.net/blog/most-popular-social-networking-sites 10. Ajayi PO, Ajayi LF (2020) Use of online collaborative learning strategy in enhancing postgraduates’ learning outcomes in science education. Educ Res Rev 15(8):504–510 11. Ali S, Islam N, Rauf A, Din IU, Guizani M, Rodrigues JJ (2018) Privacy and security issues in online social networks. J Future Internet 12. Anderson T, Challenges and Opportunities for use of social media in higher education. J Learn Develop 6(1):6–19 13. Al-kaabi AF, Effects of collaborative learning on the achievement of students with different learning styles at Qatar University (QU), PhD Thesis 14. Al Momani AM (2020) The effectiveness of social media application “telegram messenger” in improving students’ reading skills: a case study of EFL learners at Ajloun University College/ Jordan. J Lang Teach Res 11(3):373–378 15. Abdullah HM, Zeki AM (2014) Frontend and backend web technologies in social networking sites: Facebook as an example. In: 2014 3rd international conference on advanced computer science applications and technologies 16. Nishtala R, Fugal H, Grimm S, Kwiatkowski M, Lee H, Li HC, McElroy R, Paleczny M, Peek D, Saab P, Stafford D, Tung T, Venkataraman V, Scaling Memcache at Facebook. In: 10th USENIX symposium on networked systems design and implementation (NSDI ’13) 17. Gaspar D, Mabic M (2017) NoSQL databases as social networks storage systems. In: 2017 ENTRENOVA conference proceedings 18. Barrigas H, Barrigas D, Barata M, Furtado P, Bernardino J (2014) Overview of Facebook scalable architecture. ISDOC 2014, May 16–17, Lisbon, Portugal 19. Perera S, Fernando H, Investigation of social media security: a critical review 20. Rastogi N, Hendler J (2017) WhatsApp security and role of metadata in preserving privacy. J Cryptography Secur 21. Abeer AM, Maha H, Nada AS, Hemalatha M (2016) Security issues in social networking sites. Int J Appl Eng Res 11 ISSN 0973-4562

Smart Analytics System for Digital Farming K. Sumathi, Kundhavai Santharam, and K. Selvarani

1 Introduction The technological world took off with big data analytics and has now entered the smart data analytics era. Big data analytics has assumed a lot of importance in every sector at a global level: when a massive amount of data is collected by organizations on a daily basis and analyzed, it is called big data analytics, and when the data are refined and presented to decision-makers, leading to more data efficiency, they are known as smart data. This current study attempts to aid smart farming activities with the help of a smart analytics system. In the study conducted by Kawthankar et al. [1], it is noted that the work aimed at providing a smart solution in the field of Medicare in India; the outcome focused on paving the way for physicians to analyze various records, predict diseases, and amalgamate other observations. Similarly, this proposed study attempts to aid farming automation, whether as a matter of device management or connectivity management, and finally achieve better productivity with a remote management system through a smart analytics system. A novel methodology was put forth by Chetan Dwarkani et al. [2] with the goal of connecting a smart sensing system with a smart irrigation system. In this proposed work, the architecture that best suits the farming workspace and that results

K. Sumathi (B) Department of BCA, The American College (Autonomous), Madurai, Tamil Nadu, India e-mail: [email protected] K. Santharam Department of Business Administration, Kalasalingam Academy of Research and Education, Krishnan Koil, Tamil Nadu, India K. Selvarani Department of Horticulture, Kalasalingam Academy of Research and Education, Krishnan Koil, Tamil Nadu, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 I. J. Jacob et al. (eds.), Data Intelligence and Cognitive Informatics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-7962-2_14


in efficient farming activities with the help of smart analytics system is taken into consideration.

2 Need for the Study The motivation for this research is the lack of available models for smart analytics systems, and of appropriate structure formulation, to ensure production rate and quality in the field of agriculture in particular. The same point is highlighted in the study conducted by Triantafyllou et al. [3], where the researchers focused on the adoption of smart technologies using cloud computing and the internet of things and discussed the various layers of the architectural model. A survey-based study conducted by Xing Yang et al. (2021) [4] presented the development of three different modes of smart agricultural systems; the main challenges highlighted were security challenges from two main perspectives, namely agricultural production and information technology. In another work by Gangwar et al. [5], an agroecological resource management system that disseminates the operational information relevant to precision farming was showcased; the challenges faced in that study include bottleneck problems and energy-efficient data transmission that prolongs overall network lifetime performance. In this proposed study, aspects from the perspectives of cloud, knowledge, and productivity are drawn upon, and a suitable architectural model of smart analytics is devised. Most research studies devise a model suited to a client-server level architecture, as in the study by Chan [6], where the architecture highlights a framework consisting of a real-time NoSQL database that enables distributed processing of tasks and data from the client's end to the server cluster. In this proposed study, the model framework highlights cloud-based technology, with insights at the knowledge and business management levels suited to societal requirements. Similar work has been done by Kune et al. [7], who focus on the anatomy of big data computing; the need for and importance of big data analytics were highlighted, given the advent of massive data that need to be processed in a systematic manner while allowing users to deal with disseminated data with ease.

3 Related Works The relevant literary works are showcased here, and the various researches related to the current study mainly on those aspects, such as the introduction of big data analytics and its transformative impact across all industries, the smart way of handling challenges faced by agriculturalists and especially effective ways of farming with the use of technological innovations and trends, digital farming techniques that exist and


their advantageous usage, and finally smart analytics system that ensures effective productivity in farming technology, are highlighted. Fugini et al. [8] have presented the big data approach in a project SIBDA (Sistema Innovativo Big Data Analytics). The IoT sensor networks are utilized in order to implement the designed architecture and provided with supportive points based on their experiments conducted and consideration for using the same in smart cities and smart enterprises and communities. In this proposed study, the focus is specific to the farming stakeholders in the agricultural sector. The use of smart analytics system in enhancing the farming outcome is considered as insights to those who implement and for future users and relevant stakes. Wolfert et al. [9] have elaborately discussed about the smart farming system under result section and the various aspects such as drivers, pull factors, farm process, farm management, network management, and the technology involved. In this current study, the possibility of smart analytics system with a suitable architecture that would suit the farming technology has been explored. The focus of the current study is on the outcome or results that the farmers and stakes who would be the beneficiary. While analyzing the specific factors related to external environment, the main factor is the risk factor in farming. Other factors include coping and adapting strategies based on the climatic conditions which also becomes crucial in farming arena. The farmers find it difficult to manage the strategies based on the varied climatic conditions on one hand and the climatic conditions on the other hand. In this preview, the agrienvironmental indicators need to be monitored as a major strategy to address the issue(s) of external factors. The evidence of smart farming as an instance of key to developing sustainable agriculture was discussed by Walter et al. [10]. It was demonstrated that with the help of technical revolutions, generations of disruptive changes in agriculture field are possible. Similarly, Dagar et al. [11] showcased the possibility of implementing IoT in the field of agriculture for better crop management, resource management, cost-efficient way of evolving resources in agriculture, and improved crop quality after analyzing farmers’ problems. In this proposed study, the data analytics system also ensures that the improvement in resources being evolved by the farmers is experienced with the model created suiting to their needs. Alfred et al. [12] discussed the impact of machine learning, big data, and IoT, particularly in achieving production outcomes in the agricultural sector, by devising a suitable framework that maps the gaps and achieves results that are positively related to the production outcomes and postproduction outcomes of paddy rice process. In addition, the challenges faced at varied stages of process have also been elaborated in an effective manner. Hence, keeping the basis of the gaps and the challenges faced by the farming stakes, the current study effectively has tapped the need for a model that tries to eradicate the challenges to certain extent. Balducci et al. [13] have explained how the information is exploited from the sensors or intelligent systems or IoTs which are received from various sources. Hence, this work paves way for the proposed work to detect the transformation of data into formulating an effective model of agriculturists.


This study involved identification of farming challenges like in the study conducted by Stringer et al. [14], where the challenges were related to soil, farming, land management, and devising a model for ensuring sustainable technologies. In this proposed study, the three layers suggested in the model have been formulated keeping the challenges of farming in mind. In the current study, the challenges and key issues faced in agricultural farming are studied, as in the work by Elijah et al. [15] which contributed in terms of including various aspects such as land or crop management, and types of agricultural technologies that can be utilized with IoT and data analytics to resolve the key challenges and issues in farming. Opportunities of exploring the possibilities of using IoT with data analytics to resolve farming issues, current trends, and future trends in developing an IoT ecosystem and creating a sustainable environment in the agricultural field have been explored. There are other related studies that involve resolving farming issues using smart technologies particularly data analytics such as the work conducted by Sumathi et al. [16]. It discussed how to convert unstructured/unformatted data into structured/ formatted data before feeding it into machine learning algorithms. The study had indicated a pathway for future research in terms of smart and intelligent agriculture. Similar to that study, the current study focusses on the identifying the issues related to farming and devising an architecture that best suits the stakes in agricultural field. A study by Coelli and Battese [17] highlighted the factors that are instrumental to hinder the production in agricultural farming. The stochastic frontier production function was used by means of incorporating a model on the effects of technical inefficiency. In this proposed study, the data from the climatic forecasting aired on the governmental link on weather forecast for farmers are taken and fed to test the conditions of varying weather conditions and the conditioned prompting that would occur and be communicated to the farmers so that they can effectively increase the production conditions. Similarly, there are socioeconomic factors which need to be considered while discussing about the farming production. Assogbadjo et al. [18] concentrated on biodiversity and socioeconomic factors that play a supportive role for the farming members. The case of farmers’ choice of wild edible trees was considered. Especially the study has been conducted with the questionnaire tool through a field exploration and semi-structured survey. The study is taken into consideration as it validates the point about the end outcome or the socioeconomic effects. Hence, the proposed study also aims in reaching the end result of taking into consideration the socioeconomic effects among the farming members. The study by Defrancesco et al. [19] postulates on the agricultural measures based on the issues. The rationale of the farmers and their decision were discussed elaborately. In this proposed study, the rationale of farmers and their decision are given due importance while also the choice of the irrigation system, the choice of crops, etc.


4 Proposed Work In the process of farming, the first and foremost task is to understand the aspects relevant to it. Hence, in this study, the aspects relevant to farming were formulated by the researchers, suitably organized, and then grouped into three layers, and the architectural modeling was done on the basis of these three main aspects layered one after the other. As indicated, the proposed architecture is divided into three layers, namely data consumption, analytics, and end user. The data-gathering context in this study covers climatic condition forecasting, the user (the farmer) and their previous beliefs or practices, and ecosystem-based strategy suggestions to farmers; on this basis, the three-layer design is proposed. The proposed model aims to resolve the performance issues faced by farmers while taking into account the factors discussed in the literature review. The measures that align with the study include the identification of factors that hinder production, which supports effective positioning of the proposed model, the socioeconomic factors and their outcomes, the sustainability of the proposed model, the IoT system and its usage, and data transformation and the formulation of an effective model. An attempt is made to analyze all of these factors, and the most crucial one is choosing an appropriate system of farming, which depends on the decision of the farmer. In this study, farmers are supported with the proposed model, enabling them to take decisions effectively so as to ensure the best outcome. The agro-economical factor analysis suggested by various researchers would help in devising a suitable decision support system so that the main external environmental factors, such as climatic deviations, are forecasted ahead of time by farmers and prior measures are adopted. Figure 1 shows the working process of the smart analytics system for digital farming. Data Consumption Layer: This is the first layer; it is responsible for acquiring data from various data sources, such as IoT devices mounted on the field and crops, the weather forecast database, the farmers database, eco-partner databases, and the cloud, and for converting those data into a format suitable for analysis. It also integrates data from various sources to provide the unified data required by machine learning algorithms. Analytics Layer: This layer provides the various operations required to modularly realize the solution and is in charge of providing all necessary information to the appropriate end users. It performs the necessary preprocessing, which includes the removal of noisy and irrelevant data; the required data are provided by agricultural and external intelligent datasets. Solutions such as normalization, machine learning algorithms, cognitive computing, and benchmarking may be used for data transformation and analytics. To consolidate and integrate the data, Python and Google TensorFlow are used. Linked open data and cloud-based wireless platforms are employed for data transfer. This layer provides essential services to farmers, peripheral partners, and the general public, and employs a set of machine learning algorithms.


Fig. 1 Smart analytics system for digital farming

End-User Layer: This layer facilitates farmer enrollment in ecosystems by providing advisory services such as pest infestation corrective actions and drought management tips. Delivering all these solutions in real time via mobile devices that can be consumed anywhere, at any time, as well as facilitating partner ecosystem enrollment into the system with their details, constraints, and limitations, such as weather forecast, pesticide vendors, irrigation partners, and workforce providers, and notifying other participants to stock up on pesticides or agricultural equipment based on weather forecasts and crop types are facilitated. Farmers can use open data services to share the best harvesting time with their agri-marketing partners in order to project a good purchase price and cut out the middleman. This study proposes a data analytics architecture built with a leveraging effect on modern digital technologies to create a positive impact on the farmer fraternity by bringing all supporting elements together in one place and offering insights throughout crop cultivation. The five phases of the proposed data analytics architecture are data gathering, preparation, contextualization, insights, and communication with relevant stakeholders. Data can be collected from stakeholders such as farmers, pesticide vendors, and irrigation partners, as well as field sensors and national information centers. Sensors mounted on crops or in isolated locations below and above ground level can collect relevant data on soil moisture, leaf wetness, solar radiation, soil temperature, wind direction, and greenness, rainfall level, and other variables. These sensors can work in tandem with a Global Positioning System.


Data are also collected and stored from weather forecasts, pesticide vendors, irrigation partners, and workforce providers. Data can be collected and stored in the cloud using base stations and appropriate application programming interfaces. Preprocessing is performed by removing noisy and irrelevant data. Required data from agriculture and external intelligent datasets are extracted and standardized into a format suitable for further processing. To consolidate and integrate the data, Python and Google Tensor Flow are used. The usage of feature engineering is mainly to create and host machine learning models that crunch data. When the farmers are given inputs in line with the development of this analytics platform by big data analytics, cloud computing, and advanced machine learning techniques, it would become helpful for them to get better produce. The systematic way of solution finding through normalization, machine learning algorithms, cognitive computing, and benchmarking may be used for data transformation and analytics. Machine learning models ensure in giving us with insights that are in an enhanced manner and that are also useful to farming communities, such as guidance related to farming tips that ensure high-yield crops, irrigation, right actions to be performed during pest attack, right time for sowing and harvesting, and so on. The stage of collaboration tends to occur when the system receives analysis results and passes them on to the appropriate output layer. The system provides APIs for farmers and partners, as well as data services for effective collaboration within the farmer ecosystem. The outcome is next to be visualized, and this is where the analytical layer becomes critical or difficult. The main purpose of data visualization techniques is used to present the results of the analytical layer, which will help one to market in an agricultural market. The system sends alerts to other participating partners to stockpile pesticides or agri-equipment based on weather forecasts and crop nature, shares the appropriate harvesting time with agri-marketing partners to project a good purchase price, and delivers all of these solutions in real time, anywhere, and at any time consumption enabled mobile devices. All related stakes such as farmers, partners, and others would benefit from the result of this collaborative data analytics architecture: increased farmer awareness of market demand and crop selection insights by considering various parameters such as season, soil conditions, weather forecasting, and market demand for produce, usability in terms of crop spacing, watering techniques, and moisture control, advice on reusing water, resources, and plant products, and so on. Accurate forecasting and the use of optimization techniques to manage their businesses will benefit pesticide and agro-equipment service providers. Figures 2 and 3 shows a sample plots for wind speed and humidity. The level of humidity will be high if there is a lot of water vapor in the air. The higher the humidity, the more soaked it feels outside. In weather reports, relative humidity is commonly used to describe humidity. Accurate wind speed prediction improves wind power generation planning, lowering costs, and optimizing resource use. When retrieving the weather forecast, the Timeline Weather API seamlessly combines historical and prediction data in a single API request; historical weather data may be added by simply expanding the API query. An accurate wind speed
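A minimal sketch of this preprocessing step is shown below, using pandas and scikit-learn as stand-ins; the column names, value ranges, and CSV source are illustrative assumptions, since the actual platform consolidates many more sources with Python and TensorFlow.

# Minimal preprocessing sketch for field-sensor records (column names are illustrative).
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# 1. Ingest raw sensor readings collected via base stations / cloud APIs (hypothetical export).
readings = pd.read_csv("sensor_readings.csv")

# 2. Remove noisy and irrelevant rows: drop missing values and impossible readings.
features = ["soil_moisture", "soil_temperature", "humidity", "wind_speed"]
readings = readings.dropna(subset=features)
readings = readings[readings["soil_moisture"].between(0, 100)]

# 3. Normalize the numeric features to [0, 1] before feeding a learning model.
readings[features] = MinMaxScaler().fit_transform(readings[features])

print(readings[features].describe())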


Fig. 2 Plot of wind speed days

Fig. 3 Plot of humidity days

prediction can lower costs and improve resource utilization. Agriculture and farming rely heavily on nature and the seasons. Temperature is critical in the cultivation of various fruits, vegetables, and pulses. Farmers had to rely on estimates to do their jobs in the past because they lacked a better understanding of weather forecasts. They do, however, experience losses from


Fig. 4 Weather prediction report 1

time to time due to inaccurate weather forecasts. Farmers will now be able to get all the weather forecasts on their smartphones using this analytics system. Figures 4 and 5 show the weather forecast for the next three days. Using this proposed system, farmers and ecosystem partners will be able to work together more effectively to provide services and guidance. The middleman culture in agriculture is eliminated, ensuring harmony among farmers and their lives. Farmers are given high-precision decisions and actions to maximize crop productivity, as well as good offers to sell their produce at higher prices. Irrigation can be used intelligently by identifying soil moisture contents using geographical maps, weather forecasting, and intelligence gathered through sensors deployed.
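The following sketch shows how a short forecast for a field location might be pulled over HTTP and reduced to the fields the advisory layer needs; the endpoint, parameters, and response fields are assumptions made purely for illustration, so the real integration should follow the chosen weather provider's documentation.

# Sketch of fetching a multi-day forecast for a field location.
# The endpoint, query parameters, and response fields are illustrative assumptions.
import requests

API_KEY = "YOUR_API_KEY"   # hypothetical credential
LOCATION = "Madurai,IN"    # field location

def fetch_forecast(location: str, days: int = 3) -> list[dict]:
    url = f"https://example-weather-api.test/timeline/{location}"
    resp = requests.get(url, params={"key": API_KEY, "days": days}, timeout=10)
    resp.raise_for_status()
    payload = resp.json()
    # Keep only the fields the advisory layer needs (assumed field names).
    return [
        {"date": d["datetime"], "temp": d["temp"],
         "humidity": d["humidity"], "windspeed": d["windspeed"]}
        for d in payload.get("days", [])[:days]
    ]

if __name__ == "__main__":
    for day in fetch_forecast(LOCATION):
        print(day)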


Fig. 5 Weather prediction report 2

5 Conclusion A smart analytics system for digital farming is proposed in this research. Through the suggested analytics system, farmers in rural areas can use this platform to receive farming advisory services on personalized basis that would be given to them (for instance, monitoring system of crop, choices of pesticides, best practices in harvesting, seeds that are best sowed based on internal soil features, and tips on irrigation). Farmers can enhance the farming activities with the help of analytics system. Support partners send necessary alerts to help farmers at right time. Farmers and partners will benefit from the following aspects as a result of this collaborative data analytics architecture. The study also paves way to farmers in terms of better awareness about the existing demand/prevailing demand in market and also choice of crop(s), and other related insights may be achieved by considering important parameters like weather or season changes, soil conditions based on the weather, forecasting of weather changes if any, produce market demand, analyzing the usability in terms of crop spacing, watering techniques, and moisture control, advice on reusing water, resources, and plant products, and so on. Accurate forecasting and the use of optimization techniques to manage their businesses will benefit pesticide and agro-equipment service providers.


References 1. Kawthankar S, Joshi R, Ansari E, D’Monte S (2018) Smart analytics and predictions for Indian Medicare. In: 2018 International conference on smart city and emerging technology (ICSCET), 2018, pp 1–5. https://doi.org/10.1109/ICSCET.2018.8537383 2. Chetan Dwarkani M, Ganesh Ram R, Jagannathan S and R. Priyatharshini, “Smart farming system using sensors for agricultural task automation,” 2015 IEEE Technological Innovation in ICT for Agriculture and Rural Development (TIAR), 2015, pp. 49–53, https://doi.org/10.1109/ TIAR.2015.7358530 3. A. Triantafyllou, D. C. Tsouros, P. Sarigiannidis and S. Bibi, An Architecture model for Smart Farming. In: 2019 15th International conference on distributed computing in sensor systems (DCOSS), 2019, pp 385–392. https://doi.org/10.1109/DCOSS.2019.00081 4. Xing Yang, Lei Shu, Jianing Chen, Mohamed Amine Ferrag, Jun Wu, Edmond Nurellari and Kai Huang (2021) A survey on smart agriculture: development modes, technologies, and security and privacy challenges. IEEE/CAA. J Autom Sin 8(2):273–302. https://doi.org/10.1109/JAS. 2020.1003536 5. Gangwar DS, Tyagi S, Soni SK (2019) A conceptual framework of agroecological resource management system for climate-smart agriculture. Int J Environ Sci Technol 16:4123–4132 6. Chan JO (2013) An architecture for Big Data analytics. Commun IIMA 13(2), Article 1 7. Kune R, Konugurthi PK, Agarwal A, Chillarige RR, Buyya R (2016) The anatomy of big data computing. Softw Pract Exper pp 46:79–105. https://doi.org/10.1002/spe.2374 8. Fugini M, Finocchi J, Locatelli P (2021) A Big Data analytics architecture for smart cities and smart companies. Big Data Res 24, Art. No. 100192 9. Wolfert S., Ge L., Verdouw C., Bogaardt M.-J.(2017), ‘Big Data in Smart Farming – A review’, Agricultural Systems, Vol: 153., Pp.69–80. 10. Walter.A,Finger.R, Huber.R&Buchmann.N., (2017), ‘Smart farming is key to developing sustainable agriculture’, PNAS., Vol:114 (24)., Pp:6148–6150 11. R. Dagar, S. Som and S. K. Khatri., (2018)„ “Smart Farming – IoT in Agriculture,” International Conference on Inventive Research in Computing Applications (ICIRCA), Pp.1052–1056 12. Alfred R, Obit JH, Chin CP-Y, Haviluddin H, Lim Y (2021) Towards paddy rice smart farming: a review on Big Data, machine learning, and rice production tasks. IEEE Access (9):50358–50380 13. Balducci F, Fomarelli D, Impedovo D, Longo A, Pirlo G (2018) Smart farms for a Sustainable and optimized model of agriculture. In: AEIT international annual conference, pp 1–6 14. L. C. Stringer, L. Fleskens, M. S. Reed, J. de Vente, M. Zengin., (2013),’Participatory Evaluation of Monitoring and Modeling of Sustainable Land Management Technologies in Areas Prone to Land Degradation, Environmental Management, Vol: 54(5)., Pp:1022–1042 15. Elijah O, Rahman TA, Orikumhi I, Leow CY, Hindia MN (2018) An overview of Internet of Things (IoT) and data analytics in agriculture: benefits and challenges. IEEE Internet Things J 5 (5):3758–3773 16. Sumathi K, Kundhavai S, Selvalakshmi N (2018) Data analytics platform for intelligent agriculture. In: proceedings of the 2018 2nd international conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud), pp 647–650 17. Coelli TJ, Battese GE (1996) Identification of factors which influence the technical inefficiency of Indian farmers. Aust J Agric Econ 40 (2):103–128 18. Assogbadjo AE, GlèlèKakaï R, Vodouhê FG (2012) Biodiversity and socioeconomic factors supporting farmers’ choice of wild edible trees in the agroforestry systems of Benin (West Africa). 
J Sci Direct 14(1):41–49 19. Defrancesco E, Gatto P, Runge F, Trestini S (2008) Factors affecting farmers’ participation in agri-environmental measures: a Northern Italian perspective. J Agric Econ 59(1):114–131

Sarcasm Detection for Marathi and the role of emoticons Pravin K. Patil and Satish R. Kolhe

1 Introduction Sarcasm detection is an important problem to solve in almost every language, due to the complexity and subjectivity of identifying the true sentiment of the author or speaker. Social media platforms have become popular forums for discussing ideas and interacting with people, and Twitter has become one of the biggest web destinations for users to express their opinions and ideas. However, it is difficult to understand users’ opinions and conduct analysis due to the informal language and character limits. The interactions on Twitter are imbued with strong emotions on various topics, which can serve as a basis for detecting sarcasm. Sarcasm is a common form of communication that relies on the use of irony to express the opposite of what is said. Detecting sarcasm is a challenging task for natural language processing algorithms, as it often requires understanding the context and tone of the message. However, the use of emojis may provide additional clues for detecting sarcasm, as they can convey raw emotions and sentiments. In recent years, researchers have explored the potential of utilizing emojis as a relevant feature for sarcasm detection. By analyzing the use of emojis in combination with other linguistic and textual features, researchers aim to improve the accuracy of sarcasm detection algorithms. English is one of the most studied languages for detecting sarcasm, but there are also attempts to approach this problem in other popular foreign languages such as

Supported by SOCS,KBC NMU Jalgaon, Maharashtra, India. P. K. Patil (B) · S. R. Kolhe School of Computer Sciences, Kavayitri Bahinabai Chaudhari North Maharashtra University, Jalgaon, Maharashtra, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 I. J. Jacob et al. (eds.), Data Intelligence and Cognitive Informatics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-7962-2_15


Chinese [1], Czech [3], Dutch [4], Italian [2], Indonesian [5], Hindi, Bengali, and Tamil [6–10]. Marathi has the third largest native-speaker population in India [23] and is one of the most popular indigenous languages in India due to its morphological diversity. Therefore, it becomes important to study and explore the role that sarcasm plays in the context of Marathi communication and to develop efficient sarcasm detection algorithms focused on the Marathi language. In this work, 'MarathiSarc' [29], a well-curated dataset of Marathi-scripted tweets, is used, with corresponding annotations classifying each tweet as sarcastic or non-sarcastic. The importance of features extracted using emojis on this dataset is explored, and a comparative analysis is presented, clearly showing the efficacy in terms of performance improvement. Our major contributions include: • Study of the role of emojis in tweets in the context of sarcasm detection for the Marathi language, and analysis of the efficacy of using emojis as features fed to machine learning algorithms. • A comparative analysis of combining textual features, extracted using different techniques, with emojis. The consistent improvement obtained by adding emojis as a feature is demonstrated. • Experiments with different machine learning classifiers to check the consistency of the performance improvement obtained by using a combination of textual features and features extracted using emojis.

2 Related Work The ample availability of large annotated datasets in English language allows researchers to examine and discover superior strategies for sarcasm detection. Recently, there has been a strong focus toward exploring the problem and publishing the datasets for non-English languages as well. Lieu et al. [1] proposed a Chinese dataset and accomplished social media sarcasm detection by dealing with the data imbalance problem. The authors first have a look at the capabilities of English and Chinese sarcasm and introduce a few capabilities mainly for detecting sarcasm on social media and proposed a unique approach to cope with the imbalance problem. Ptacek et al. [3] created a big Czech Twitter corpus of 7000 manually tagged tweets and made a primary attempt to explore sarcasm in Czech. Babiette et al. [2] provides a corpus of 25,450 Italian tweets, which include satirical and non-sarcastic tweets. The satirical tweets have been gathered from posts of Spinozait and LiveSpinoza Twitter bills, that are famous Italian blogs for satirical political posts, whereas the non-sarcastic tweets have been gathered from the Twitter bills of a few distinguished Italian newspapers. Lunando et al. [5] proposed extra capabilities to detect sarcasm with the help of a variety of bad messages and interjections. The authors extensively utilized translated SentiWordNet in sentiment category.


Swami et al. [11] provided a blended English-Hindi code dataset of tweets with relevant sarcasm tags, and every tweet additionally tagged with a language. The authors also performed a set of supervised baseline experiments on the dataset. Charalampaki et al. [13] proposed a dataset of 61,427 Greek tweets associated with the Greek election. Librecht et al. [4] demonstrated sarcasm detection on a corpus of 78,000 Dutch tweets having a Dutch phrase for sarcasm. As far as sarcasm detection in English is concerned, Khodak et al. [14] offered a massive corpus of 1.3 million tweets with writer comments, inclusive of the topic, user, and context of the verbal exchange for every tweet. This dataset proves to be beneficial for both balanced and imbalanced classification. Filatova [15] designed a test to create a corpus of satirical and non-sarcastic tweets of Amazon merchandise and services. The authors additionally completed the qualitative and quantitative evaluation of the corpus. Abercrombie and Hovy [16] created an imbalanced dataset of 2240 manually annotated Twitter conversations. Hindi is also considered to be one of the low-resourced Indian regional language. Researchers scrambled social media texts for language anglicization of Hindi (transliteration primarily based totally on pronunciation in preference to meaning) [22]. Research on simple Hindi texts is very limited. One of the seminal work mentioned by Desai and Dave [31], where the authors created a dataset of satirical Hindi sentences and used numerous lexical features (e.g., emoji, punctuation polarity lists) to train an SVM classifier to classify sentences into five classes of various sarcasm. Bharti et al. [17] proposed a pattern-based framework that exploits the contradiction among temporal information and corresponding tweets to predict sarcasm in Hindi tweets. Katyayan et al. [26] additionally tested Hindi sarcasm detection on sentences extracted from social media systems inclusive of Facebook, Instagram, and Twitter. The authors makes use of POS taggers and bag-of-phrases strategies for feature extraction, and examines the performance of two classifiers, particularly Naive Bayes and SVMs. As far as Marathi language is concerned, Kulkarni et al. presented L3CubeMahaSent [12],which is a dataset consisting of Marathi tweets extracted from twitter accounts of maharashtrian politicians to perform the task of sentiment analysis. The ironic nature of text can trick both the humans and machine classifiers since it expresses the opposite of its literal meaning. To detect sarcasm, numerous researchers have experimented with using sentiment analysis and emojis. Subramanian et al. [25] trained a deep learning model using word and emoji embeddings for sarcasm detection. Pamungkas and Patti [26] made use of structural features, such as presence of hashtags, links, emojis, quotes, for sarcasm detection. Lemmens et al. [27] used an ensemble approach to train the classifier so as to classify the tweets as sarcastic or non-sarcastic. Sundararajan and Palanisamy [28] proposed a rule-based classifier with 20 different features. They observed that the features based on sentiment can predict sarcasm better in combination with contradictory features. Kumar et al. [24] demonstrated the use of a hybrid deep learning model trained using two embeddings, namely word and emoji embeddings to detect sarcasm. 
To the best of our knowledge, this is the first work that studies the efficiency of using features extracted from emojis for the task of sarcasm detection in the Marathi language.


3 Dataset and Annotation Twitter is a popular platform for scraping data to study sarcasm detection in low-resource languages like Marathi. Users post tweets about several topics and express themselves using different markers like emoticons and hashtags while writing the tweets in their regional languages. To leverage the richness of information available in tweets, the ‘MarathiSarc’ dataset is used, which is a well-curated, mono-lingual dataset consisting of a collection of tweets written in the Marathi language and focused on the study of sarcasm detection. The tweets are scraped using a filtering mechanism based on language and a list of relevant hashtags that can potentially indicate a sarcastic tweet. The time period for the entire corpus is from December 2011 to September 2022. The authors of the dataset follow a two-stage approach for annotating the collected tweets–hashtag-based filtering and manual labeling. Instead of relying completely on hashtags to decide whether a tweet is sarcastic or not, the authors suggest that manual annotation performed using a clearly laid-out policy is necessary to avoid mistakes in the tweet classification. Table 1 shows examples of sarcastic tweets with their corresponding English translations. Table 2 shows examples of non-sarcastic tweets with their corresponding English translations. To ensure uniformity and consistency in labeling, the authors use specific criteria for annotating tweets collected through hashtag-based filtering. The entire dataset is manually divided into three categories as follows:
– Tweets that depict a sarcastic intent, irrespective of the core sentiment, intensity or polarity of the author, are labeled as sarcastic.
– Tweets that do not clearly depict any sarcastic intent by the author–either from the words, emoticons, punctuation marks or any other markers–irrespective of the core sentiment, intensity or polarity of the author, are labeled as non-sarcastic.

Table 1 Examples of sarcastic tweets


Table 2 Examples of non-sarcastic tweets

– Tweets that can potentially fall into the sarcastic category but need further conversational context to make this decision are labeled as unsure.

3.1 Distribution of Tweets The class-wise distribution of the dataset is depicted in Fig. 1, and the exact numbers are presented in Table 3.

Fig. 1 Class distribution in the dataset


Although the dataset consists of tweets categorized into three classes, the problem is posed as a binary classification task to detect the presence or absence of sarcasm in a tweet. Hence, only the categories labeled as sarcastic and non-sarcastic are considered from the annotations provided with the dataset.

3.2 Emoji Analysis Users express their emotions while writing a tweet in the form of emojis. These emojis can be seen from two perspectives–type and count. Every emoji has a specific intent or emotion associated with it; hence, certain patterns of emoji occurrence can be expected, which can help to identify a correlation with the presence of sarcasm. Also, the intensity of the emotion can be quantified by the frequency of the emojis in a tweet. All the emojis present in every tweet of the dataset are extracted. The occurrence of emojis in the tweets is studied, and the emojis are ranked by their number of occurrences. The most frequently occurring emojis in sarcastic tweets are identified. Figure 2 shows the top 10 frequently occurring emojis from the identified sarcastic tweets. From the statistics, it is evident that most of the sarcastic tweets are associated with emojis depicting a laughing or happy emotion. The emojis follow a long-tailed distribution, indicating that a certain set of emojis dominates the sarcastic class of tweets.
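As an illustration of this analysis, the following is a minimal sketch of how emojis can be extracted and counted per class. It assumes the third-party emoji package and a small list of (text, label) pairs; the sample tweets are illustrative and not part of the original study.

```python
from collections import Counter

import emoji  # third-party package: pip install emoji


def emoji_frequencies(tweets):
    """Count emoji occurrences per class from (text, label) pairs."""
    counts = {"sarcastic": Counter(), "non-sarcastic": Counter()}
    for text, label in tweets:
        # emoji.emoji_list returns one entry per emoji occurrence,
        # so repeated emojis also contribute to the intensity count
        for match in emoji.emoji_list(text):
            counts[label][match["emoji"]] += 1
    return counts


# hypothetical usage with a tiny sample
tweets = [("वा! काय निर्णय 😂😂", "sarcastic"), ("आज छान पाऊस पडला 🌧", "non-sarcastic")]
freqs = emoji_frequencies(tweets)
print(freqs["sarcastic"].most_common(10))  # top 10 emojis in sarcastic tweets
```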

4 Pre-processing of Tweets To transform the raw tweets into a usable format, some standard as well as task- and language-specific pre-processing steps are performed to discard irrelevant components of the tweet and retain only the information that can help in better feature analysis.

Table 3 Number of samples in each category

Category         Count of tweets
Sarcastic        758
Non-Sarcastic    2122
Unsure           263
Total Tweets     3143


Fig. 2 Top 10 frequently used emojis

4.1 Cleaning of Tweets The tweets are cleaned to discard the components that do not add any significant value toward understanding the sarcastic intent behind the tweets, using the following steps (a minimal code sketch of this pipeline is given after the list):
– Removal of stopwords: Stop words occur very frequently and thus provide little unique information that can be used for classification. Therefore, the filtered tweets are cleaned to remove stop words using an extensive, publicly available list of Marathi stop words. All words in a given tweet are compared to the words in this stopword list, and any matching words are removed.
– Removal of Twitter handles, tags and hyperlinks: The name of a Twitter handle is not important for discerning the irony of a tweet. Additionally, most tweets contain some HTML tags and hyperlinks to other tweets or related information. These HTML tags and hyperlinks do not contribute anything to the semantics of the tweet; therefore, if such components exist as part of the tweet, they are simply removed.
– Removal of digits and special characters: Special characters and numbers often provide context-sensitive information but do not clearly indicate whether a tweet is meant to be sarcastic. It has also been observed that numbers are usually written or spelled in English and Marathi; therefore, all special characters and numbers are removed.
– Removal of hashtags and emoticons: Hashtags are used only to filter relevant tweets and are not used as features, as the hashtags can orient the proposed


algorithm to a specific context or topic. Likewise, emojis are removed from the tweet text itself, although they are later used to provide a richer semantic representation of tweets.
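The following is a minimal sketch of such a cleaning pipeline in plain Python with regular expressions; the stop-word list and the example tweet are placeholders rather than the resources used by the authors.

```python
import re

# placeholder: in practice this would be loaded from a public Marathi stop-word list
MARATHI_STOPWORDS = {"आणि", "आहे", "की", "तर"}


def clean_tweet(text):
    """Apply the removal steps described above to a single raw tweet."""
    text = re.sub(r"@\w+", " ", text)                    # Twitter handles
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # hyperlinks
    text = re.sub(r"<[^>]+>", " ", text)                 # HTML tags
    text = re.sub(r"#\w+", " ", text)                    # hashtags
    text = re.sub(r"[0-9०-९]", " ", text)                # English and Devanagari digits
    text = re.sub(r"[^\w\s]", " ", text)                 # special characters (also drops emojis)
    tokens = [w for w in text.split() if w not in MARATHI_STOPWORDS]
    return " ".join(tokens)


print(clean_tweet("@user वा! काय निर्णय 😂 https://t.co/xyz #sarcasm 2022"))
```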

4.2 Tokenization The preprocessed, cleaned tweets are then used for further analysis. To obtain tokens, the tokenizer provided by iNLTK [18] is used, which has been built by creating a sub-word vocabulary for the Marathi language. This tokenizer is essentially a SentencePiece tokenization model [19] trained on a large corpus of Marathi Wikipedia articles.
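As an illustration, a hedged sketch of this tokenization step with the iNLTK library is shown below; the function names follow iNLTK's documented interface and may differ across versions, and the sample sentence is only a placeholder.

```python
from inltk.inltk import setup, tokenize

# one-time download of the Marathi ('mr') language model used by iNLTK
setup("mr")

# sub-word tokenization with the SentencePiece-based Marathi tokenizer
tokens = tokenize("आज खूप छान दिवस आहे", "mr")
print(tokens)  # sub-word pieces, e.g. ['▁आज', '▁खूप', ...]
```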

5 Feature Extraction and Sarcasm Detection To enrich the textual features with additional semantic information, proper and meaningful feature extraction plays a major role in improving the performance of any algorithm. Along with language-dependent features, some important language-independent features also play a significant role, especially in the case of small or imbalanced datasets. Emojis are one such language-independent feature and occur in most of the tweets in the dataset.

5.1 Sentence Embedding Using Language Model For rich encoding of the text, there has been an upsurge in using pre-trained language models to obtain feature vectors. ULMFit [20], which is an LSTM-based model, is used as one of the language encoding models for our experiments. A ULMFit model pre-trained for the Marathi language on Marathi Wikipedia articles has been made publicly available by iNLTK.
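A hedged sketch of obtaining sentence-level features through iNLTK is given below. It assumes iNLTK's get_embedding_vectors helper, which returns one vector per token of the pre-trained encoder, and mean-pools the token vectors into a fixed-size sentence embedding; the pooling choice is made here for illustration only and is not stated in the paper.

```python
import numpy as np
from inltk.inltk import setup, get_embedding_vectors

setup("mr")  # loads the pre-trained Marathi language model


def sentence_embedding(text):
    """Mean-pool token-level embeddings into one fixed-size sentence vector."""
    vectors = get_embedding_vectors(text, "mr")  # one vector per token
    return np.mean(np.asarray(vectors), axis=0)


vec = sentence_embedding("आज खूप छान दिवस आहे")
print(vec.shape)  # fixed-size textual feature used by the classifiers
```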

5.2 Embeddings for Emojis in the Tweet Emoticons are crucial components of a tweet, as they clearly convey the emotion or feelings of the author. Every emoticon has a semantic meaning attached to it; therefore, it is important to embed it without losing its relationship with other emoticons. Emojis are extracted from tweets using a pre-defined list of emojis. If a particular emoji occurs multiple times in a tweet, its number of occurrences is retained so that the intensity of the emotion expressed by the user is not lost. The Emoji2vec [21]


Table 4 Results for machine learning classifiers with different feature embeddings

Models           Textual            Textual + Emojis
                 AUROC   F1-score   AUROC   F1-score
Logistic regr.   0.68    0.61       0.81    0.70
Random forest    0.71    0.46       0.83    0.62
SVM-RBF          0.72    0.64       0.82    0.71
XGBoost          0.73    0.62       0.84    0.71

The best results are obtained when the XGBoost classification algorithm is used with the combined feature embedding of textual and emoji representations

pre-trained model, which is based on word2vec embeddings, is used to obtain the encoding for the emoticons in the tweet. Every emoji is encoded into a fixed-size embedding, which is then used as a feature for classification.
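A hedged sketch of building such an emoji feature vector is shown below. It assumes the publicly released emoji2vec vectors in word2vec binary format (the file name emoji2vec.bin is an assumption) loaded through gensim, and it weights each emoji vector by its occurrence count so that intensity is preserved.

```python
import numpy as np
from gensim.models import KeyedVectors

# assumption: path to the pre-trained emoji2vec vectors in word2vec binary format
E2V = KeyedVectors.load_word2vec_format("emoji2vec.bin", binary=True)


def emoji_feature(emoji_counts, dim=300):
    """Count-weighted average of emoji2vec vectors for one tweet.

    emoji_counts: dict mapping emoji -> number of occurrences in the tweet.
    """
    vec, total = np.zeros(dim), 0
    for emo, count in emoji_counts.items():
        if emo in E2V:                    # skip emojis missing from the vocabulary
            vec += count * E2V[emo]
            total += count
    return vec / total if total else vec  # zero vector when no known emoji occurs


print(emoji_feature({"😂": 2, "🙏": 1}).shape)
```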

6 Experiments and Result Analysis Word embeddings are used to learn the representation of each word, and emoji embeddings are used to capture sentiments in the sentence that are not easily learned by word embeddings alone. A combination of these embeddings is used to demonstrate the efficacy of an ensemble of features for accurately capturing the sarcastic intent in a tweet. Classification experiments are performed using four classical machine learning models–gradient-boosted decision trees (XGBoost [30]), Random Forest, Logistic Regression, and SVM with a radial basis function (RBF) kernel–and their performance is compared using the AUROC and F1-macro metrics. The performance of each of these algorithms is shown in Table 4. It is observed that the gradient-boosted decision tree (XGBoost) model outperforms the other classification models. Most importantly, the performance of all the classifiers improves consistently when emojis are used for feature extraction. This can be attributed to the strong correlation between the occurrence and frequency of emojis and sarcastic tweets, which, when combined with contextual word embeddings, enhances the classification performance of the machine learning models. In a nutshell, emojis indeed turn out to be important features for sarcasm detection in Marathi language tweets.
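For concreteness, the following is a minimal sketch of one such experiment with an XGBoost classifier evaluated by AUROC and macro-F1. The randomly generated feature matrices stand in for the textual and emoji embeddings described above, and the split ratio and hyperparameters are illustrative, not those of the original study.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, f1_score
from xgboost import XGBClassifier

# placeholder features standing in for the textual and emoji embeddings
rng = np.random.default_rng(0)
X_text = rng.normal(size=(400, 400))    # e.g. ULMFiT sentence embeddings
X_emoji = rng.normal(size=(400, 300))   # e.g. emoji2vec features
y = rng.integers(0, 2, size=400)        # 1 = sarcastic, 0 = non-sarcastic

X = np.hstack([X_text, X_emoji])        # "Textual + Emojis" feature set
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

clf = XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")
clf.fit(X_tr, y_tr)

proba = clf.predict_proba(X_te)[:, 1]
print("AUROC:", roc_auc_score(y_te, proba))
print("F1-macro:", f1_score(y_te, (proba > 0.5).astype(int), average="macro"))
```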

7 Conclusion Sarcasm detection for low-resource languages like Marathi is a relatively challenging and open problem, and this work focuses on one of the important questions toward solving it. While posting micro-blogs like tweets, users provide a lot more information


in tweets than just the text. This additional information is usually overlooked, and we miss out on some important components that can help in analyzing the tweets. Emojis, being one such component present in tweets, can reveal the semantic meaning or the intent of the author behind writing a tweet. The role of emojis in the context of sarcasm and their impact on the performance of machine learning models for detecting sarcasm is studied here in the context of the Marathi language. Classification experiments are performed, and a comparative analysis of the impact on performance when a combination of textual features and emoji-based features is used for training the machine learning models is presented. It is clearly demonstrated that the use of emoji-based features consistently improves the performance of different classification algorithms, indicating their efficacy for the task of sarcasm detection. As a part of future work, the intention is to explore different techniques for generating emoji embeddings that best capture the contextual representation with respect to the text in a tweet. Other important components of a tweet, apart from emojis, that can enrich the feature representation can also be explored to further improve sarcasm detection accuracy for Marathi tweets.

References 1. Liu P, Chen W, Ou G, Wang T, Yang D, Lei K (2014) Sarcasm detection in social media based on imbalanced classification. In: Li F, Li G, Hwang Sw, Yao B, Zhang Z (eds) Webage information management. WAIM 2014. Lecture Notes in Computer Science, vol 8485. Springer, Cham. https://doi.org/10.1007/978-3-319-08010-9_49 2. Barbieri F, Ronzano F, Saggion H (2014) Italian irony detection in twitter: a first approach. In: Basile RA (eds) The first italian conference on computational linguistics cLiC-it 2014 and the fourth international workshop EVALITA, Italy, pp 28-32 3. Ptacek T, Habernal I, Hong J (2014) Sarcasm detection on czech and english twitter. In: Oden JT,JH (eds) Proceedings of COLING 2014, the 25th international conference on computational linguistics: technical papers, Dublin, Ireland, August 23–29, pp 213-223 4. Liebrecht C, Kunneman F, van den Bosch A (2013) The perfect solution for detecting sarcasm in tweets #not. In: Proceedings of the 4th workshop on computational approaches to subjectivity, sentiment and social media analysis, Atlanta, Georgia. Association for Computational Linguistics, pp 29–37 5. Lunando E, Purwarianti A (2013) Indonesian social media sentiment analysis with sarcasm detection In: Proceedings of the international conference on advanced computer science and information systems (ICACSIS), Sanur Bali, Indonesia, pp 195–198 6. Kulkarni DS, Rodd SS (2022) Sentiment analysis in hindi-a survey on the state-of-the-art techniques ACM transactions on Asian and low-resource language information processing vol 21(1). pp 1–46. https://doi.org/10.1145/3469722 7. Braja P, Dipankar D, Amitava D (2018) Sentiment analysis of code-mixed Indian languages: an overview of SAIL_code-mixed shared task @ICON-2017 8. Akhtar MS, Kumar A, Ekbal A, Bhattacharyya P (2016) A hybrid deep learning architecture for sentiment analysis. In: Proceedings of COLING 2016, the 26th international conference on computational linguistics: technical papers, Osaka, Japan, pp 482–493 9. Mukku SS, Mamidi R (2017) ACTSA: annotated corpus for telugu sentiment analysis. In: Proceedings of the first workshop on building linguistically generalizable NLP systems, Copenhagen, Denmark, Association for Computational Linguistics, pp 54–58


10. Ravishankar N, Raghunathan S (2017) Corpus based sentiment classification of tamil movie tweets using syntactic patterns. IIOAB J: A J Multidisciplinary Sci Technol 8(2):172–178 11. Swami S, Khandelwal A, Singh V, Akhtar SS, Shrivastava M (2018) A corpus of english-hindi code-mixed tweets for sarcasm detection. In: The proceedings of 19th international conference on computational linguistics and intelligent text processing (CICLing-2018) 12. Kulkarni A, Mandhane M, Likhitkar M, Kshirsagar G, Joshi R (2021) L3CubeMahaSent: a marathi tweet-based sentiment analysis dataset. In: The Proceedings of the 11th workshop on computational approaches to subjectivity, sentiment and social media analysis, pp 213–220 13. Charalampakis B, Spathis D, Kouslis E, Kermanidis K (2016) A comparison between semisupervised and supervised text mining techniques on detecting irony in greek political tweets, Eng Appl Artif Intell 51:50–57. ISSN 0952-1976. https://doi.org/10.1016/j.engappai.2016.01. 007 14. Khodak M, Saunshi N, Vodrahalli K (2018) A large self-annotated corpus for sarcasm. In: Proceedings of the eleventh international conference on language resources and evaluation (LREC 2018), Miyazaki, Japan, European Language Resources Association (ELRA) 15. Filatova E (2012) Irony and sarcasm: corpus generation and analysis using crowdsourcing. In: Proceedings of the eighth international conference on language resources and evaluation (LREC’12), Istanbul, Turkey, European Language Resources Association (ELRA), pp 392–398 16. Abercrombie G, Hovy D (2016) Putting sarcasm detection into context: the effects of class imbalance and manual labelling on supervised machine classification of twitter conversations. In: Proceedings of the ACL student research workshop, Berlin, Germany, Association for Computational Linguistics, pp 107–113 17. Bharti SK, Babu KS, Jena SK (2015) Parsing based sarcasm sentiment recognition in twitter data. In: The proceedings of IEEE/ACM international conference on advances in social networks analysis and mining (ASONAM), Paris, France, pp 1373–1380 18. Arora G (2020) inltk: natural language toolkit for indic languages. In: The proceedings of second workshop for NLP open source software (NLP-OSS), Virtual Conference, pp 66–71 19. Kudo T, Richardson J (2018) SentencePiece: a simple and language independent subword tokenizer and detokenizer for neural text processing. In: Proceedings of the 2018 conference on empirical methods in natural language processing: system demonstrations, Brussels, Belgium, Association for Computational Linguistics pp 66–71 20. Howard J, Ruder S (2018) Universal language model fine-tuning for text classification. In: Proceedings of the 56th annual meeting of the association for computational linguistics (Volume 1: Long Papers), Melbourne, Australia, Association for Computational Linguistics, pp 328–339 21. Eisner B, Rocktäschel T, Augenstein I, Bošnjak M, Riedel S (2016) Emoji2vec: learning emoji representations from their description. In: Proceedings of the fourth international workshop on natural language processing for social media, Austin, TX, USA, Association for Computational Linguistics, pp 48–54 22. Jain D, Kumar A, Garg G (2020) Sarcasm detection in mash-up language using soft attention based bi-directional LSTM and feature-rich CNN, Appl Soft Comput 91:106198. ISSN 15684946. https://doi.org/10.1016/j.asoc.2020.106198 23. https://censusindia.gov.in/nada/index.php 24. 
Kumar A, Sangwan SR, Singh AK, Wadhwa G (2022) Hybrid deep learning model for sarcasm detection in Indian indigenous language using word-emoji embeddings. ACM Trans Asian Low-Resour Lang Inf Process. https://doi.org/10.1145/3519299 25. Subramanian J, Sridharan V, Shu K, Liu H (2019) Exploiting emojis for sarcasm detection. In: Thomson R, Bisgin H, Dancy C, Hyder A (eds) Social, cultural, and behavioral modeling. SBP-BRiMS 2019. Lecture notes in computer science, vol 11549. Springer, Cham. https://doi. org/10.1007/978-3-030-21741-9_8 26. Pamungkas EW, Patti V (2018) # nondicevosulserio at semeval-2018 task 3: exploiting emojis and affective content for irony detection in english tweets. In: International workshop on semantic evaluation, Association for Computational Linguistics, pp 649–654 27. Lemmens J, Burtenshaw B, Lotfi E, Markov I, Daelemans W (2020) Sarcasm detection using an ensemble approach. In: Proceedings of the second workshop on figurative language processing, Online. Association for Computational Linguistics, pp 264–269


28. Sundararajan K, Palanisamy AK (2020) Multi-rule based ensemble feature selection model for sarcasm type detection in twitter. In: Computational intelligence and neuroscience, vol 2020. Article ID 2860479, pp 17. https://doi.org/10.1155/2020/2860479 29. Patil PK, Kolhe SR (2022) MarathiSarc: a marathi tweets dataset for automatic sarcasm detection of marathi tweets, In: 13th International conference on advances in computing, control, and telecommunication technologies, ACT 2022, vol 8. pp 108–114 30. Chen T, Guestrin C (2016) XGBoost: a scalable tree boosting system. In: KDD ’16: proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp 785–794. https://doi.org/10.1145/2939672.2939785 31. Desai N, Dave AD (2016) Sarcasm detection in Hindi sentences using support vector machine. Int J Computat Linguistics 4(7):8–15

Fleet Management Mobile Application Using GPS Shortest Path Dhiraj Patil, Sitaram Mane, Sumit Biradar, Swapnil Rankhamb, and Mansi Bhonsle

1 Introduction Today’s world is rapidly industrializing and commercializing, and fleets play an important role in these areas. The industry’s growth necessitates increased utilization of fleets with properly optimized processes. The effective maintenance of actively growing fleets is becoming a challenging task. Fleets always require well-documented records of maintenance, drivers, trips, and assigned tasks. The fleet manager must work hard to keep track of the fleet, which becomes more difficult and more expensive as the number of vehicles in an organization grows [5]. For the better utilization of fleets, more optimized techniques are required. Fleet management reduces the cost of maintenance and helps to produce accurate results for the fleet. Fleet management uses technologies and software to manage and optimize the vehicles of an organization or business. Typically, these types of systems involve the use of a GPS tracking system for monitoring the location of the vehicles, and some of the more advanced ones use an application that helps managers optimize fleet operation by analyzing fleet data [14]. A common myth about fleet management systems is that they are too expensive and that only large companies can afford them, but as technology advances, the prices of these systems are coming down [15–20]. The fleet management system offers pay-per-feature plans for small- and medium-sized businesses. This application helps the vehicle owner achieve better control over the vehicles and obtain more accurate results. Most importantly, the app keeps track of vehicle travel history, driver information [2], and vehicle information, along with scheduling and maps, to optimize efficiency from both ends of the


app, i.e., fleet owners and staff. RESTful APIs present in the application help it perform its functionality efficiently over the network. The application also provides reversion and scheduling functions. This project targets not only small organizations but also large fleet organizations. One of the key features of a fleet management system is route optimization, which uses a shortest path algorithm to find the most efficient route between two points. Fleet management is the process of overseeing and controlling a company’s vehicles, including cars, trucks, vans, and other types of vehicles used for business purposes. The goal of fleet management is to ensure that vehicles are properly maintained and used efficiently and cost-effectively, and to reduce the risks associated with vehicle use [7]. The shortest path algorithm is a computational method that finds the shortest path between two nodes in a graph. In the case of fleet management, the nodes can represent different locations or waypoints on a route, and the edges between them can represent the roads or paths connecting those locations [1]. By using the shortest path algorithm, a fleet management system can optimize the routes taken by drivers, reducing travel time, fuel consumption, and vehicle wear and tear. This can result in significant cost savings for fleet operators, as well as improved service levels for customers. The fleet management application can use real-time traffic data, weather conditions, and other factors to adjust the route optimization algorithm on the fly, ensuring that drivers are always taking the most efficient route to their destination. Additionally, the system can provide real-time updates to drivers and dispatchers, enabling them to make informed decisions about route changes and other issues that may arise. A fleet management system with shortest path optimization can help improve efficiency and reduce costs for fleet operators, while also improving service levels and customer satisfaction.

2 Literature Survey Research papers and journals were used for this project. Reference [1]: Vinay Kukreja, Anu Marwaha, Bhavna Sareen, and Aditi Modgil published a research paper on AFTSMS, an Automatic Fleet Tracking & Scheduling Management System, for scheduling fleets on roads and budgeting under time constraints. The benefits allow employees and administration to effectively control the fleet. It helps to provide an alternate path for a vehicle if there is any obstruction on the existing route, to check whether the mentioned route contains any obstruction so that a clearer route can be obtained for the fleet, to be more secure against unauthorized drivers, and to reduce the wear and tear on the vehicle caused by the fleet driver. Reference [2]: F. Miao et al. published research whose main goal is to maintain the data collected from the fleet. A NoSQL database server helps to store the data so that it is readily available as necessary; along with this, it helps to provide better customer service, maintain customer comfort, and securely move the user’s fleet from one location to the next.


Reference [3]: Nikolina Bodiroga, Marija Antic, Petar Zecevic, and Milan Bjelica published research evaluating a Cassandra database back end for fleet management data collection. The paper presents a data collection, storage, and analytics module constructed on the fleet management cloud, together with the architecture of the fleet management system. This module implements a Cassandra controller, which oversees all the data and can edit it, and it includes a Cassandra database. Spark processes the gathered data. For the creation of new scenarios and user features, fleet management data collection is a prerequisite. The gathered data can be used for many industrial purposes as well as for systems that offer cars to end users as a service. Reference [4]: Delays occur when National Airspace System (NAS) flight demand (such as flight operator operations) exceeds capacity (such as airport-, weather-, or airspace-related constraints). The Federal Aviation Administration (FAA) balances capacity and demand across NAS resources using a variety of time-based management (TBM) technologies to ensure an effective NAS. This technology helps to provide the resources for the flight system. When managing demand and capacity at arrival airports and departure fixes/flows, for instance, the FAA employs the Time-Based Flow Management (TBFM) system to distribute delays across airborne and ground-based planes. To balance traffic demand with the available capacity, TBFM assigns delay that already exists within the NAS. Reference [5]: When the number of vehicles is large and spread across all branches of an organization, it is difficult to keep records up to date and manage corporate vehicle fleets. By automating the procedure, blockchain and IoT offer a contemporary solution that lowers operating expenses and boosts productivity. The IoT-friendly algorithm PoAh suggests a combined strategy for blockchain and the Internet of Things (IoT), which lowers latency and boosts energy efficiency. In comparison with conventional blockchain systems, the IoT devices provide the fleet-related information, which is stored block by block, confirmed by PoAh in under 30 ms, and requires less than 45 mJ of energy. Reference [6]: The project’s main goal was to find the shortest route within the first line of parking. Both an intricate and a straightforward process were applied, and both yielded the same result. The Floyd–Warshall method and Dijkstra’s algorithm both produced results that led the vehicles to an intermediate parking place as efficiently as feasible, following the company’s request. Dijkstra’s algorithm was successful in finding the shortest path. Reference [10]: Eliminating the drawbacks of the existing system, authentication is handled in the paper [2] “Android Application for Fleet Management System”; along with authentication, the application is able to assign a vehicle to a driver automatically. The research paper is published by Shivani Gadekar, Dhiraj Shetty, Swapnil Dhekane, Vrushali Gave, and Pravin Hole. Reference [12]: “Leveraging Technologies for Business Fleet Applications: A Case Study of Fleet Management System Implemented in Kenya and Lighting Company Limited” investigated this unexplored area. The study evaluates more than 500 vehicles by applying a fleet management system which has features like a tracking system, driver monitoring, fleet security, and control through a global positioning system in


Kenya. The study involves data collection, data analysis, research findings, and recommendations. Reference [13]: The primary purpose of this research was to identify a GPS-based fleet management system for transport companies. The paper “Identifying Key Factors for Introducing GPS-Based Fleet Management Systems to the Logistics Industry” was published by Yi-Chung Hu, Yu-Jing Chiu, Chung-Sheng Hsu, and Yu-Ying Chan. A detailed study of driver behavior in terms of speed, travel time, and illegal routes taken by drivers is done. Reference [14] is a book on the fundamentals of fleet management. The book covers the connection between insurance companies and risk management and the methods for reducing maintenance costs.

3 Proposed System The fleet management system consists of different stages, and each stage performs a specific functionality. The initial stage uses React Native, which acts as the bridge between the front end and the back end via REST APIs. The back end is implemented with NodeJS, Apollo, and MongoDB. The architecture diagram is shown in Fig. 1.

3.1 Modules To provide an optimized fleet management system, the application requires the following modules to be implemented.

Fig. 1 Architecture diagram

3.1.1 Vehicle Management

This module manages all vehicle categories, unassigned vehicles, assigned vehicles, and under-maintenance vehicles. GPS tracking is an advanced component in the vehicle management system used to provide accurate and up-to-date information like route and location.

3.1.2 Driver Management

This module contains all the driver-related options. The application can check whether a driver is licensed and can view occupied and unoccupied drivers in a list view format. This makes it easy for the admin to assign roles to drivers.

3.1.3 Chatbot

Chatbot is an advanced feature specially built to support and assist the driver. The chatbot can give real-time information to the admin, and drivers can send requests to the admin for maintenance and any kind of technical issue with the vehicle.

3.1.4 Maintenance Scheduling

Maintenance scheduling is the broad term used for preventive maintenance scheduling, predictive maintenance scheduling, and work order management, all of which can be handled through the maintenance schedule.

3.1.5 GPS Tracking

This module uses a maps API that allows the developer to add maps and location-based services to the application. It is used to provide maps, directions, and location-based search functionality in the fleet management system. GPS tracking makes it easy to keep track of vehicles and helps management keep a log of vehicle departure and arrival times.

3.2 Algorithm Used The most suitable algorithm to calculate the shortest path between two points is Dijkstra’s algorithm. In addition, the Google Maps API applies various machine learning algorithms that help find the shortest path by avoiding traffic, and


this makes the algorithm most useful for the application. Dijkstra’s algorithm is used to find the shortest path between two nodes in a graph. The algorithm calculates the distance from the start node to each neighbor through the current node. If this distance is less than the neighbor’s current tentative distance value, it is updated to the new, shorter distance. This process continues until all nodes have been visited or the destination node has been reached. Once the algorithm has finished, the shortest path between the start node and the destination node can be reconstructed by following the path of nodes with the smallest tentative distance values. Dijkstra’s algorithm is widely used in many applications, including routing protocols in computer networks, transportation planning, and robotics. It is a very efficient algorithm, with a time complexity of O(|E| + |V| log |V|), where |E| is the number of edges and |V| is the number of vertices in the graph. Suppose s and t are the source and destination nodes in the graph; the steps are as follows (a minimal code sketch is given after the list):
• Initialization: (a) Set the distance of the starting node s to zero: d(s) = 0. (b) Set the distance of all other nodes to infinity: d(v) = ∞ for all v ≠ s. (c) Mark all nodes as unvisited and initialize the set of visited nodes: S = {s}.
• Visit Node: Select the unvisited node u with the smallest known distance: u ∈ V \ S such that d(u) is minimized.
• Update Distances: For each neighboring node v of u: (i) calculate the distance to v as d(u) + w(u, v), where w(u, v) is the weight of the edge from u to v; (ii) if this distance is less than the current distance of v, update it: d(v) = d(u) + w(u, v).
• Mark Visited: Mark the visited node u as visited by adding it to the set S: S = S ∪ {u}.
• Termination: If the destination node t has been visited, stop. Otherwise, repeat steps 2 to 4 for the unvisited node with the smallest known distance.
• Pathfinding: Once the destination node t has been visited, backtrack from t to s by following the path of the smallest distances. This gives the shortest path between s and t.
In this representation, V is the set of all nodes in the graph, and d(v) represents the distance from the starting node s to node v. The algorithm updates the distance of each node as it visits neighboring nodes and marks nodes as visited once they have been explored. The algorithm terminates when it has found the shortest path to the destination node and backtracks to find the actual path.

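The following is a minimal, self-contained Python sketch of Dijkstra’s algorithm as described above; the small sample graph and node names are purely illustrative and not part of the proposed system.

```python
import heapq


def dijkstra(graph, s, t):
    """Shortest path from s to t; graph maps node -> list of (neighbor, weight)."""
    dist = {node: float("inf") for node in graph}
    prev = {}                      # predecessor map used to backtrack the path
    dist[s] = 0
    pq = [(0, s)]                  # priority queue of (tentative distance, node)
    while pq:
        d, u = heapq.heappop(pq)
        if u == t:                 # destination reached: stop early
            break
        if d > dist[u]:            # stale queue entry
            continue
        for v, w in graph[u]:
            if d + w < dist[v]:    # relaxation step: found a shorter route to v
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(pq, (dist[v], v))
    # backtrack from t to s to recover the actual path
    path, node = [], t
    while node != s:
        path.append(node)
        node = prev[node]
    path.append(s)
    return dist[t], list(reversed(path))


# illustrative road network: nodes are waypoints, weights are travel costs
graph = {"A": [("B", 4), ("C", 1)], "C": [("B", 2), ("D", 5)], "B": [("D", 1)], "D": []}
print(dijkstra(graph, "A", "D"))   # (4, ['A', 'C', 'B', 'D'])
```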

3.3 Database The MongoDB database is used to store the data of the fleet management system. MongoDB is open source, is used to store large-scale data, and allows working with data in a very efficient manner. It is a NoSQL database that stores data in a document-oriented format and has drivers for C, C++, Python, Scala, and other languages. One of the key benefits of a fleet management system is improved visibility and control over the fleet. With vehicle tracking and monitoring modules, fleet managers can track the location and status of vehicles in real time, allowing them to make informed decisions about routing, scheduling, and maintenance. This can help to reduce costs and improve efficiency by minimizing idle time, optimizing routes, and reducing fuel consumption. Another important component of a fleet management system is maintenance and repair management. With this module, fleet managers can schedule and track maintenance activities, monitor vehicle performance and health, and track repairs and associated costs. This helps to ensure that vehicles are kept in good condition and reduces the risk of unexpected breakdowns or downtime. Finally, reporting and analytics modules can be used to provide fleet managers with detailed reports and insights into key performance indicators such as vehicle utilization, fuel efficiency, and maintenance costs. This allows fleet managers to make data-driven decisions and continuously improve the performance of their fleet.
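As an illustration of the document-oriented storage described above, the following is a minimal sketch using the pymongo driver; the connection string, database name, and document fields are assumptions chosen for this example rather than the schema of the proposed application, whose back end uses NodeJS.

```python
from datetime import datetime
from pymongo import MongoClient

# assumed local MongoDB instance and database/collection names
client = MongoClient("mongodb://localhost:27017")
db = client["fleet_db"]

# each vehicle is stored as one flexible document (field names are hypothetical)
db.vehicles.insert_one({
    "registration_no": "MH-12-AB-1234",
    "status": "assigned",
    "driver": {"name": "S. Mane", "license_verified": True},
    "last_location": {"lat": 18.5204, "lon": 73.8567},
    "updated_at": datetime.utcnow(),
})

# query all vehicles that are currently under maintenance
for v in db.vehicles.find({"status": "under_maintenance"}):
    print(v["registration_no"])
```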

4 Result and Discussion As part of the results, the proposed fleet management application is expected to handle all fleet-related tasks efficiently. The admin can add new vehicle data and new drivers and make changes to the vehicle information as well as the driver information. The admin can see live statuses of all fleets, such as the current list of vehicles, live vehicle locations, assigned vehicles, current drivers, the list of unassigned drivers, and vehicle maintenance. The admin can create a new trip, assign it, view its current status, and cancel it. On the admin dashboard, the user can see the monthly report of fleet operations. The yearly revenue report is shown in Fig. 2. The dashboard shows an assigned-task tab, vehicle details, the route of the trip, task notifications, and information about the next trip. The dashboard functionality for the admin as well as the driver is provided in the application. The monthly fuel cost report is shown in Fig. 3. Fleet management systems reduce fuel costs by improving route optimization and vehicle maintenance; as a result, reduced fuel costs can improve the overall profitability of the fleet management system.


Fig. 2 Yearly revenue report (revenue in lakh, Jan-23 to Apr-23)

Fig. 3 Fuel cost report monthly (in thousands, Jan-23 to May-23)

5 Conclusions Maintaining fleet records is a tedious task that requires more time and cost, which decreases the speed of management. There is an imbalance between storing fleet-related information and assigning roles within the fleet. For the effective management of fleets, the proposed fleet management system uses React Native for the front end and NodeJS with MongoDB for the back-end services. This leads to the proper assignment of vehicles to drivers and helps to track vehicles on maps. The fleet data can also be used to study new scenarios and trends in the fleet. This research study has proposed fleet management software along with the architecture diagram of the fleet management system.


References 1. Vinay Kukreja, Anu Marwaha, Bhavna Sareen, Aditi Modgil: AFTSMS: Automatic Fleet Tracking & Scheduling Management System: 2020 8th International Conference, Amity University, India 2. Miao F et al (2019) Data-Driven Robust Taxi Dispatch Under Demand Uncertainties. IEEE Trans Control Syst Technol 27(1):175–191 3. Nikolina Bodiroga, Marija Antic, Petar Zecevic, Milan Bjelica: Evaluation of fleet management data collection backend using Cassandra database: 2021 Zooming Innovation in Consumer Technologies Conference (ZINC). 4. Sudip Maitra, Venkata P. Yanambaka, Ahmed Abdelgawad, Kumar Yelamarhi: Securing a Vehicle Fleet Management Through Blockchain and Internet of Things: 2020 IEEE International Symposium on smart Electronics System (iSES). 5. Carlos M. Fernandez, António A. Freitas, António N. Morais, Tânia M. Lima, Pedro D. Gaspar: Fleet Management Optimization in Car Rental Industry: Decision id Models for Logistics Improvement; Department of Electromechanical Engineering University of Beira Interior Covilhã, Portugal. 6. Ayima Zahra, Muhammad Asif, Arfan Ali Nagra, Muhammad Azeem, Syed A. Gilani: Vulnerabilities and Security Threats for IoT in Transportation and Fleet Management: 2021 4th International Conference on Computing & Information Sciences (ICCIS). 7. Predrag Belek, Ivan Cvitkovic, Nikola Kolarevic, Katarina Stojanovic Zlatko Sovreski: Application of fleet management in intelligent transport systems: 2022 57th International Scientific Conference on Information, Communication and Energy Systems and Technologies (ICEST). 8. A. Theiss, D. C. Yen, and C.-Y. Ku; Global positioning systems: an analysis of applications, current development, and future implementations: Computer Standards and Interfaces, vol. 27. 9. SANTANI, Darshan, BALAN, Rajesh Krishna, and WOODARD, C. Jason: Spatio-Temporal Efficiency in a Taxi Dispatch System: Research Collection School of Information Systems (SMU Access Only). Paper 9. 10. Shivani Gadekar, Dhiraj Shetty, Swapnil Dhekane, Vrushali Gave, Pravin hole: Android Application for Fleet Management System: IRJET International Research Journal of Engineering and Technology. 11. Rong H et al (2017) Mining Efficient Taxi Operation Strategies from Large Scale Geo-Location Data: in IEEE. Access 5:25623–25634 12. Edward Chege Waiyaki: Leveraging Technologies for Business Fleet Applications: A Case Study of Fleet Management System Implemented in Kenya and Lighting Company Limited’ University of South Africa. 13. Yi-Chung Hu, Yu-Jing Chiu, Chung-Sheng Hsu, and Yu-Ying Chan: Identifying Key Factors for Introducing GPS-Based Fleet Management Systems 14. Budapest, Hungry; Fundamentals of fleet management system: Publish date – 26 September 2016. 15. Franjieh El Khoury (Author), Antoine Zgheib (Author): Building a Dedicated GSM GPS Module Tracking System for Fleet Management; publish year - 14 February 2018. 16. Design and Implementation of a Fleet Management System for a Logistics Company: S. O. Adewumi and B. O. Olatunji (2020) 17. Evaluation of Fleet Management Systems: A Multi-Criteria Decision-Making Approach: H. Boudriga, W. Amroun, and M. T. Haddar (2021) 18. A Comparative Study of Fleet Management Systems for Fuel Efficiency and Emission Reduction: H. Shen, X. Chen, and W. Zhang (2020) 19. A Framework for the Design of a Real-Time Fleet Management System Using Big Data Analytics: H. A. Hassan and A. A. Elsayed (2020) 20. A Review of Fleet Management Systems for Effective Maintenance and Repair Operations: J. L. Nwosu, O. C. Ogbonna, and J. C. 
Onyejiuwa (2021).

Finger Vein Biometric System Based on Convolutional Neural Network V. Gurunathan, R. Sudhakar, T. Sathiyapriya, T. Gokul, R. Vasuki, M. Sabari, and G. Uvan Veera Sankar

1 Introduction Biometrics is a branch of science that deals with identifying and validating particular physical or behavioral characteristics. It is a system that recognizes and authenticates persons based on their distinguishing physical or behavioral traits. Voice recognition, iris recognition, fingerprints, facial authentication, and behavioral biometrics such as typing patterns and signature analysis are a few examples of biometric data. The ability of biometric technology to provide high levels of security and accuracy in a variety of applications, including access control, border control, and financial transactions, has increased its popularity recently [1]. However, biometric technology raises concerns about personal data privacy and security, which must be addressed through appropriate regulations and security measures. Biometric systems use an individual’s unique physical or behavioral characteristics to verify their identity and provide a high level of security, because biometric traits are difficult to fake or duplicate, they are unique to each individual, and biometric systems provide accurate identification. Biometric systems can improve efficiency in a variety of applications, including attendance tracking, border control, and financial transactions [2]. They reduce the time required for identification and authentication by eliminating the need for manual verification. Finger vein biometrics is a method for identifying people based on the individual patterns of veins in their fingers. Because the vein patterns in a person’s fingers are unique and difficult to replicate, finger vein biometrics has a very high accuracy rate. It requires a live finger with blood flow to capture the vein pattern, which makes finger vein biometrics difficult to spoof. This makes it more secure than other biometric methods, such as facial recognition or fingerprint


scanning, which can be tampered with by using photographs or fake fingerprints. Finger vein identification verifies a person’s identity by using the vein patterns in their fingers. Because each person has a distinct vein pattern, finger vein recognition is a highly secure method of authentication [3]. Encryption is the procedure of changing information (plaintext) into an unreadable form (ciphertext) using an algorithm and a key to protect the information’s confidentiality and integrity. Only someone with the correct key or password can decrypt the ciphertext back to plaintext. The encryption process consists of a series of mathematical operations that use a secret key to transform plaintext into ciphertext. Both the sender and the receiver of the message use the secret key to encrypt and decrypt the data [4, 5].

2 Related Work Liu et al. [6] presented a method for learning personalized binary codes, an original local binary learning approach for finger vein images. Linear discriminant analysis is combined with class-sparse constraints. Multi-dimensional pixel differences are computed on a training database of finger vein images, and the function learns from the difference vectors. The objective function accounts for quantization error, class variance, and variance maximization. An alternating direction method is used to solve the optimization problem, and the model is then used for histogram extraction. Yang et al. [7] proposed methods for finger vein ROI extraction and image enhancement, and a powerful hyper-sphere model is constructed using granular computing. Ma et al. [8] proposed a new vein feature representation scheme based on pyramid histograms of oriented gradients and local phase quantization. To extract vein features, local phase quantization is needed to handle blurring in the images, and a pyramid local phase quantization histogram (PLPQ) is used for feature extraction. Vein images can be encoded in the frequency domain using different scales and orientations. Ma et al. [9] studied contactless finger vein identification because it is user friendly; the images are captured under contactless conditions. The proposed method uses an ROI method and oriented-element features for feature extraction. The ROI is rectified for rotation, which reduces the effects of rotation and translation on the finger vein images. The proposed method achieves high accuracy, and the results show that the algorithm is effective and feasible for identification. Ren et al. [10] presented finger vein recognition with a convolutional neural network. They ran a series of extensive experiments on four public datasets, which show that their system can achieve a rank-1 recognition accuracy of more than 96% on the four open datasets, with template protection used to hide the user’s original finger vein image. They proposed a new finger vein template protection scheme which uses Rivest–Shamir–Adleman (RSA) encryption technology. Kun et al. [11] noted that models using a single trait for personal identification cannot satisfy the needs of real applications; ECG signals have their unique merits but also shortcomings. Their system overcomes the limitations of single-modality approaches by jointly using these modalities. The results show that the multimodal system is secure in recognition and its feature extraction


is accurate. Qin et al. [12] proposed a novel finger vein extraction method that detects valley-like structures using curvature in the Radon space. Eight patches are obtained around a given pixel by rotating through eight different orientations; these patches are processed with the Radon transform, and the vein pattern generates prominent valleys. Results obtained on contact-based and contactless finger vein datasets show that the system provides accurate finger vein verification. Lu et al. [13] explored finger vein based personal authentication for secure IoT and introduced a finger vein authentication method based on competitive orientation and magnitude information. For feature extraction, they used local descriptors, including a histogram of competitive orientations. These features are extracted on the MMCBNU-6000 database, on which local descriptors are also commonly used.

3 Proposed System The proposed finger vein system using CNN is shown in Fig. 1. The proposed system consists of (i) image enhancement, (ii) feature extraction, (iii) encryption, (iv) decryption, and (v) classification. We have adopted a convolutional neural network classifier for person authentication.

Fig. 1 Proposed finger vein system


3.1 Image Enhancement Image enhancement modifies images so that the results are better suited for subsequent image analysis [14]. Figure 2 represents the image enhancement flow. Histogram Equalization. Histogram equalization is a technique used to enhance contrast; it converts the image’s pixel intensity distribution into a new distribution. The histogram of an image is like a graph that shows the occurrence of every intensity value in the image. Histogram equalization transforms the original image into a new image by mapping the original intensity values to new intensities that are distributed uniformly. The mathematical equation is given in Eq. (1):

h = E(r), 0 ≤ r ≤ L − 1    (1)

Contrast Limited Adaptive Histogram Equalization. Contrast limited adaptive histogram equalization is used to avoid over-amplification of noise. The image is divided into small blocks known as tiles. One of the most important merits of CLAHE is that it enhances the image, reduces the effect of lighting

Fig. 2 Image enhancement


in each area, and safeguards the local details of the image. The mathematical equation is given in Eq. (2):

g(x, y) = g_ij(x, y)    (2)
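A brief sketch of both enhancement steps using OpenCV is shown below; the input file name and the CLAHE parameters (clip limit, tile size) are illustrative choices, not the exact settings used by the authors.

```python
import cv2

# load a finger vein image in grayscale (path is illustrative)
gray = cv2.imread("finger_vein.png", cv2.IMREAD_GRAYSCALE)

# global histogram equalization: maps intensities to a uniform distribution
hist_eq = cv2.equalizeHist(gray)

# CLAHE: tile-wise equalization with a clip limit to avoid over-amplifying noise
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
clahe_eq = clahe.apply(gray)

cv2.imwrite("hist_eq.png", hist_eq)
cv2.imwrite("clahe_eq.png", clahe_eq)
```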

Modified SUACE Algorithm. Speeded-up adaptive contrast enhancement (SUACE) is an image processing technique designed to improve the visibility and quality of finger vein images [17]. The SUACE algorithm is assessed on several criteria, such as visual appearance, time efficiency, and overall quality, in comparison with other similar approaches. The performance of the modified SUACE algorithm is assessed based on the clarity of the vein pattern visible in the enhanced image and the visual quality of the output. The effects of different dynamic ranges and illumination map resolutions are also analyzed. The time efficiency of the SUACE algorithm is evaluated by calculating the number of frames that it can process per second relative to the ideal frame rate. This provides a measure of how fast the algorithm is in practice and allows comparisons with other similar methods. The output of the modified SUACE algorithm is used for the extraction process.

3.2 Features Extraction Feature extraction is necessary because it allows us to reduce the complexity of an image and extract only the relevant information needed for a specific task. Feature extraction is the process of identifying and selecting relevant information that helps to improve the accuracy and efficiency of the models. Histogram of Oriented Gradients (HOG) Algorithm. The algorithm works by computing the gradient orientation and magnitude at each pixel in an image. The fundamental principle behind HOG feature extraction is that the gradient direction density can effectively convey the local feature information of an image. The mathematical equation is given in Eq. (3) and shown in Fig. 3:

G(x, y) = √(Gx(x, y)² + Gy(x, y)²)    (3)
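A minimal sketch of extracting HOG features with scikit-image is shown below; the cell, block, and orientation parameters are common defaults chosen for illustration and are not stated in the paper.

```python
import cv2
from skimage.feature import hog

# enhanced finger vein image (path is illustrative)
image = cv2.imread("clahe_eq.png", cv2.IMREAD_GRAYSCALE)

# gradient-orientation histograms over 8x8 cells, normalized over 2x2 blocks
features = hog(
    image,
    orientations=9,
    pixels_per_cell=(8, 8),
    cells_per_block=(2, 2),
    block_norm="L2-Hys",
)
print(features.shape)  # fixed-length HOG descriptor used as the feature vector
```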

Modified Difference of Gaussians (DOG) Algorithm. The MDOG algorithm works by convolving an image with two Gaussian filters of different standard deviations and subtracting one from the other. The resulting image highlights the edges and other high-frequency features in the original image. The mathematical equation is given in Eq. (4); by varying σ1 and σ2, the finger vein patterns are extracted:

D(x, y, σ) = L(x, y, k1σ1) − L(x, y, k2σ2)    (4)
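The following is a short sketch of a difference-of-Gaussians computation with OpenCV in the spirit of Eq. (4); the two sigma values follow one of the settings reported in Fig. 8, and the input path is illustrative.

```python
import cv2

image = cv2.imread("clahe_eq.png", cv2.IMREAD_GRAYSCALE)

# blur the image at two different scales (sigma1 = 1, sigma2 = 0.5, one of the tried pairs)
blur1 = cv2.GaussianBlur(image, (0, 0), sigmaX=1.0)
blur2 = cv2.GaussianBlur(image, (0, 0), sigmaX=0.5)

# subtracting the two blurred images keeps edge-like, vein-line structures
dog = cv2.subtract(blur1, blur2)
cv2.imwrite("dog_veins.png", dog)
```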


Fig. 3 HOG algorithm

3.3 AES Encryption and Decryption The AES algorithm can be used to encrypt and protect the biometric data captured from a finger vein scanner. The use of AES in finger vein authentication ensures that the biometric data captured during enrollment and verification is protected from unauthorized access and cannot easily be tampered with. In finger vein authentication, the unique vein pattern of each person is used to verify their identity. AES is a widely used symmetric algorithm that uses the same key for the encryption and decryption processes. The secret key is stored and shared with the sender over the network before encryption. The encryption process uses the secret key to change the given plaintext into ciphertext, which cannot be accessed without the key. The key should be kept private so that unauthorized persons cannot access the data; encryption and decryption take place with the same key. AES encryption and decryption are applied to the finger vein data so that the data remains private and only the particular user is able to access it [15].
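A minimal sketch of AES encryption and decryption of an extracted feature image is shown below, assuming the PyCryptodome library and its EAX mode; the mode, key size, and file name are illustrative, as the paper does not specify them.

```python
from Crypto.Cipher import AES
from Crypto.Random import get_random_bytes

key = get_random_bytes(16)            # 128-bit secret key shared by sender and receiver

# bytes of the extracted vein-pattern image (placeholder path)
feature_bytes = open("dog_veins.png", "rb").read()

# encryption: plaintext -> ciphertext plus an authentication tag
cipher = AES.new(key, AES.MODE_EAX)
ciphertext, tag = cipher.encrypt_and_digest(feature_bytes)

# decryption with the same key and nonce recovers the original bytes
decipher = AES.new(key, AES.MODE_EAX, nonce=cipher.nonce)
plaintext = decipher.decrypt_and_verify(ciphertext, tag)
assert plaintext == feature_bytes
```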

3.4 Convolution Neural Network A classifier is an algorithm that learns from input data that has already been categorized. The main aim of the classifier is to assign a label to the input data. In this paper, we use a convolutional neural network as the classification algorithm. The CNN is used to classify blocks of the vein image, where the images are classified


Fig. 4 Convolution neural network

for identification [16]. The training dataset of vein images is labeled with the corresponding authentication identity. While training on the images, the algorithm learns the relevant features of each individual’s veins, and these features are used to recognize the image. The CNN compares the classified output with the images stored in a given file or folder and decides whether the person is identified or not. Figure 4 represents the process of the CNN algorithm.
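For illustration, a minimal sketch of such a CNN classifier in Keras is given below; the input size, layer widths, and training setup are assumptions for a small match/not-match setup and do not reproduce the exact architecture of the paper.

```python
import tensorflow as tf

# small CNN over enhanced/extracted vein images (assumed 64x128 grayscale inputs)
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 128, 1)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # match (1) vs. not match (0)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# x_train: (n, 64, 128, 1) float arrays, y_train: 0/1 labels -- assumed to be prepared earlier
# model.fit(x_train, y_train, epochs=10, validation_split=0.2)
```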

4 Result and Discussion 4.1 Image Enhancement The input images for a single person are shown in Fig. 5a and b. The results of the various enhancement techniques are shown in Fig. 6a–e.

Fig. 5 a Input finger vein, b Grayscale image


Fig. 6 a Contrast enhancement, b sharpening, c histogram equalization, d contrast limited adaptive histogram equalization and e modified speeded-up adaptive contrast enhancement

4.2 Feature Extraction We extracted the vein pattern using HOG, and the results are shown in Fig. 7a and b. The MDOG results for different values of σ1 and σ2 are shown in Fig. 8a–d.

Fig. 7 a Modified speeded-up adaptive contrast enhancement, b Histogram of oriented gradients

Fig. 8 a DOG σ 1 = 8, σ 2 = 10, b DOG σ 1 = 1, σ 2 = 0.5, c DOG σ 1 = 2, σ 2 = 3 and d DOG σ 1 = 4, σ 2 = 6


4.3 AES Encryption and Decryption The extracted image is taken through the encryption and decryption process. The results of AES are shown in Fig. 9a–f. The various performance metrics of the proposed system using CNN are given in Table 1. We have used various distance measures, such as the Hamming distance and the Euclidean distance, to perform matching. The performance metrics of the distance measures are given in Tables 2 and 3. Figures 10 and 11 show the comparison of the proposed work with other works.

Fig. 9 a Difference of Gaussians, b encrypted image, c decrypted image, d histogram of oriented gradients, e encrypted image, f decrypted image

Table 1 Performance metrics of the proposed system (using CNN classifier)

Performance metrics      HOG feature extraction   MDOG feature extraction
False acceptance rate    0.05                     0.07
False rejection rate     0.07                     0.09
Equal error rate         0.35                     0.39
Accuracy (%)             96.65                    95.50

Table 2 Performance metrics of the proposed system (using Hamming distance measure)

Performance metrics      HOG feature extraction   MDOG feature extraction
False acceptance rate    0.15                     0.18
False rejection rate     0.25                     0.3
Equal error rate         0.45                     0.49
Accuracy (%)             91.65                    93.50


Table 3 Performance metrics of the proposed system (using Euclidean distance measure)

Performance metrics      HOG feature extraction   MDOG feature extraction
False acceptance rate    0.25                     0.2
False rejection rate     0.25                     0.3
Equal error rate         0.48                     0.43
Accuracy (%)             90.65                    90.50

Fig. 10 Comparison graph-1: accuracy (%) of Gabor filter + Hamming distance, Gabor filter + Euclidean distance, Gabor filter + SVM classifier, Gabor filter + KNN classifier, HOG + SVM classifier, and MDOG + SVM classifier

Fig. 11 Comparison graph-2: accuracy (%) of HOG and MDOG feature extraction combined with Hamming distance, Euclidean distance, SVM, KNN, and CNN classifiers

5 Conclusion A finger vein based person authentication system was proposed in this paper. The modified speeded-up adaptive contrast enhancement (MSUACE) algorithm was employed to enhance the vein pattern. We used the histogram of oriented gradients (HOG) and modified difference of Gaussians (MDOG) algorithms to extract the vein line pattern present in the finger vein image. The AES encryption algorithm was incorporated to ensure the security of the finger vein pattern.


Finally, a convolutional neural network (CNN) was used for classification. The proposed system results were compared with other techniques. The proposed biometric system provides an EER of 0.35 with an accuracy of 96.65% when HOG is used as the feature extractor and CNN as the classifier.

References 1. Jain AK, Ross A, Nandakumar K (2008) An introduction to biometrics. In: 19th International conference on pattern recognition. USA, pp 1–1 2. Zhao D, Ma H, Yang Z, Li J, Tian W (2020) Finger vein recognition based on lightweight CNN combining center loss and dynamic regularization. Infrared Phys Technol 105:103221 3. Na H, Hui M, Tao Z (2020) Finger vein biometric verification using block multi-scale uniform local binary pattern features and block two-directional two-dimension principal component analysis. Optik 208:163664 4. Evangelin LN, Lenin F (2021) Securing recognized multimodal biometric images using cryptographic model. Multimedia Tools Appl 80:18735–18752 5. El-Rahiem Basma A,·El Samie Fathi, EA, Amin M (2022) Multimodal biometric authentication based on deep fusion of electrocardiogram (ECG) and finger vein. Multimedia Syst 28:1325– 1337 6. Liu H, Yang G, Yang L, Yin Y (2019) Learning personalized binary codes for finger vein recognition. Neurocomputing 365:62–70 7. Yang J, Wei J, Shi Y (2019) Accurate ROI localization and hierarchical hyper-sphere model for finger-vein recognition. Neurocomputing 328:171–181 8. Ma H, Hu N, Fang C (2021) The biometric recognition system based on near-infrared finger vein image. Infrared Phys Technol 116:103734 9. Ma H, Zhang SY (2019) Contactless finger-vein verification based on oriented elements feature. Infrared Phys Technol 97:149–155 10. Ren H, Sun L, Guo J, Han C, Wu F (2021) Finger vein recognition system with template protection based on convolutional neural network. Knowl-Based Syst 227:107159 11. Tang S, Zhou S, Kang W, Wu Q, Deng F (2019) Finger vein verification using a Siamese CNN. IET Biometrics 8:306–315 12. Qin H, He X, Yao X, Li H (2017) Finger-vein verification based on the curvature in Radon space. Expert Syst Appl 82:151–161 13. Lu Y, Wu S, Fang Z, Xiong N, Yoon S, Sun Park D (2017) Exploring finger vein based personal authentication for secure IoT. Futur Gener Comput Syst 77:149–160 14. Gurunathan V, Bharathi S, Sudhakar R (2015) Image enhancement techniques for palm vein images. In: International conference on advanced computing and communication systems. Coimbatore 15. Mohsin AH, Zaidan AA, Zaidan BB, Albahri OS, Albahri AS, Alsalem MA, Mohammed KI (2019) Based blockchain-PSO-AES techniques in finger vein biometrics: a novel verification secure framework for patient authentication. Comput Standards Interfaces 6:103343 16. Ismail B, Mohamed OZ, Hamza H, Bakhtiar AR (2022) Finger vein identification using deeplyfused convolutional neural network. J King Saud Univ Comput Inf Sci 34:646–656 17. Finger vein database: http://mla.sdu.edu.cn/sdumla-hmt.html

Embedding and Extraction of Data in Speech for Covert Communication Vani Krishna Swamy, R. Arthi, M. S. Srujana, M. Sushmitha, and J. Vaishnavi

1 Introduction Revolutionary data communication techniques have changed the way in which humans exchange data, in its multitude of forms, such as text, images, audio, and video. This very exchange of information has become ubiquitous. It is done by governments, military organizations, law enforcement agencies, and many more. The availability of such data on the Internet exposes it to various risks and vulnerabilities. Data security and privacy still remain a challenge, and the variety of hacking tools and their consistent development ask for an improved security system. Cryptography is one such information-protecting mechanism that alters sensitive information, so that it cannot be accessed by unauthorized parties. It transforms the original text (plain text) into ciphertext using encryption algorithms such as Data Encryption Standard (DES), Advanced Encryption Standard (AES), Rivest–Shamir– Adleman algorithm (RSA) (unreadable form). At the receiver’s end, a decryption algorithm converts the ciphertext back to its original form. But, the use of cryptography may draw the attention of an attacker and is hence susceptible to being modified or hacked, making it an ineffective way of providing data security. This V. K. Swamy (B) · R. Arthi · M. S. Srujana · M. Sushmitha · J. Vaishnavi School of Computer Science Engineering, REVA University, Bangalore, India e-mail: [email protected] R. Arthi e-mail: [email protected] M. S. Srujana e-mail: [email protected] M. Sushmitha e-mail: [email protected] J. Vaishnavi e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 I. J. Jacob et al. (eds.), Data Intelligence and Cognitive Informatics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-7962-2_18


disadvantage led to the development of data-hiding techniques like steganography, in which the very existence of the important data is hidden. This is done by embedding the secret message in digital media, and depending on the multimedia used (text, image, audio, video, or network), it is categorized into five major types. Steganography makes detecting the data quite difficult, but if the method is detected, the sensitive data can be easily recovered. Hence, the data is preprocessed with a cryptographic algorithm and a compression algorithm, which also helps maintain the quality of the cover media. The use of cryptography along with speech steganography provides dual-layer security for the data. This paper extends the research done on the subject by using spread spectrum representation-based speech steganography, which utilizes the discrete wavelet transform (DWT) to break down the cover audio file into approximate and detailed coefficients, i.e., low-frequency and high-frequency subbands. The proposed technique is an enhanced version of the FFT-based method intended to improve imperceptibility. The wavelet transform (WT) performs better than the Fourier transform (FT) because of its ability to describe any type of signal in both the time and frequency domains, which provides effective access to localized signal information. The proposed technique has applications in the secure handling of confidential communication and secret data storage, protection against unauthorized data modification, access control mechanisms for secure digital content distribution, and the management of media database systems. Its usefulness can be seen in a wide range of sectors including multimedia, military, navy, civil, and government organizations, and more. This paper discusses the process of creating a stego-speech that contains the encrypted sensitive data hidden within the cover speech for covert communication, along with the advantage of using DWT over FFT, and facilitates further research in the area of data hiding. The subsequent sections of the paper are structured as follows. Section 2 includes the literature survey. Section 3 presents an analysis of the existing methodology, and Sect. 4 presents the proposed methodology. The results obtained and the evaluation metrics are put forth in Sect. 5, and the paper concludes with the scope for future research in Sect. 6.

2 Literature Survey The process of obscuring data in an audio stream so that the hidden data is imperceptible to the human ear is known as speech steganography. The main goal of speech steganography is to maintain the quality of the original speech signal while hiding the data, so that the data can be recovered at the receiver side. It differs from cryptography, where it is understood that a secret message is being sent [1]. Steganography has been primarily categorized into five types over the years based on the type of digital media being used as the cover object: text steganography, picture steganography, audio steganography, video steganography,


and network steganography. In order to minimize the difference between the original medium and the one received after the concealed data is embedded, these steganography techniques take advantage of both human beings' inherent limits in their aural and visual perception and the characteristics of digital media [2]. Audio steganography has received significant consideration due to its multiplicity of uses in various fields such as the Internet [3]. In recent years, several studies have been conducted on the real-time implementation of speech steganography. The majority of these studies have focused on digital audio signals [4], with most of the proposed algorithms being based on quantization error and psychoacoustic models [5]. These algorithms work by modifying the least significant bits of the quantized audio signal, thereby hiding the data. Additionally, steganography is categorized into two types based on the methods used to conceal data in multimedia, namely the spatial domain and the frequency domain. The frequency domain is further divided into various transforms such as the discrete Fourier transform (DFT), discrete cosine transform (DCT), and discrete wavelet transform (DWT). In [6], the cover is initially encrypted using a hyperchaotic map, and the encrypted output is then converted into YCbCr channels. The DCT method divides the first channel (Y) into non-overlapping blocks. Huffman coding is used to quantize the resulting discrete cosine transform (DCT) blocks, and hidden information is then encoded into their frequency coefficients. The existence of the data can be inferred from the output stego-speech, and the preprocessing's potential to enhance data security has not been considered, although this method has produced good voice quality with a sizable payload capacity [6]. Data is encoded in speech using the discrete cosine transform (DCT) to conceal it [7]. Using the difference between two DCT coefficients with three moduli, two hidden data bits are added to the speech coefficient values. This improves the voice embedding space and considerably lowers speech distortion. However, by applying encryption techniques to the secret information before embedding, the system's data security can be increased. A collaborative approach is "more secure" than using the steganography technique alone, because incorporating cryptography improves the security of the data; the secret data is vulnerable to hacking and manipulation the moment an unwanted user discovers the steganographic pattern. The existing approach to audio or speech steganography is an FFT-based spread spectrum representation for a steganographic system [8]. However, various other research has shown the wavelet transform (WT) to be a better approach to decomposing signals. Since these signals are non-stationary, small variations may go unnoticed by the Fourier transform, and the outcome may differ based on the length of the data. Thus, when it comes to spectral analysis, the WT is a more appropriate choice than the Fourier transform [9]. Due to their larger size and ability to conceal more information than other cover signals, audio signals are utilized as covers more frequently than the others [10].


3 Existing Methodology The existing methodology of FFT-based spread spectrum representation for speech steganography involves preprocessing the speech signal, which is then transformed into the frequency domain using the fast Fourier transform (FFT) [8]. This transforms the signal from the time domain into the frequency domain, where it can be manipulated using various techniques. The sensitive data is then encoded into the speech signal in the frequency domain using a spread spectrum technique. This involves adding a small amount of noise to the signal at specific frequencies that are chosen based on the secret message. The amount of noise added is small enough to be imperceptible to humans but can be picked up by a decoding algorithm. To extract the secret message from the encoded speech signal, a decoding algorithm is used. This involves applying a matched filter to the encoded signal, which amplifies the noise added during the encoding process. The hidden data can then be retrieved from the filtered signal. The advantage of using FFT-based spread spectrum representation for speech steganography is that it is a robust and efficient technique that can hide secret messages in speech signals without affecting the quality of the speech signal. However, the FFT assumes that the signals are linear and stationary, which is not always the case for speech signals; hence, this work proposes the use of the DWT over the FFT.

4 Proposed Methodology Figure 1 shows the proposed methodology, which makes use of the discrete wavelet transform to divide the input carrier media signal into its detailed and approximate coefficients. The lower frequency subband is termed the approximate layer, and the higher frequency subband is cited as the detail layer. Practically, the approximate layer is nearly identical to the original speech signal. This wavelet decomposition provides several advantages for speech signal analysis, including multiresolution analysis, efficient compression, noise reduction, feature extraction, and robustness to nonstationary components. The secret message (M) that is going to be embedded in the speech signal is first compressed by employing the Lempel–Ziv–Welch (LZW) lossless compression algorithm. By using this method, the effective capacity of the cover speech is increased while the hidden message payload is reduced.

LZW(M) = M′   (1)

The compressed secret message is then encrypted using the AES algorithm to provide an extra layer of security.


Fig. 1 Proposed methodology

This reduces the reliance on security by obscurity and therefore makes the system more robust to threats. A hash key is generated using the SHA256 algorithm and used to create the cipher/encrypted text. The security that can be attained by utilizing steganography to hide sensitive data inside a cover medium depends on the assumption that no one suspects that any data is hidden. However, if someone notices a change in the cover medium, the sensitive data may be revealed. Therefore, it is preferable to utilize another method, such as cryptography, to encrypt the sensitive data before hiding it in the cover media. Because it is encrypted, this guarantees that even if the hidden text is found, no one will be able to decipher its contents. As a result, for increased security, we can benefit from combining the two approaches to make sure that even in the case of a security breach, the confidential information is protected and not misused [11–15].

AES(M′) = M″   (2)

Now, the compressed and encrypted text is ready to be embedded into the cover speech. The speech signal/cover speech (C) has already been preprocessed using the DWT function which decomposes the signal into detailed and approximate coefficients, or cD and cA, respectively.

C = cA + cD   (3)

The compressed and encrypted hidden message (M″) is then embedded in the detailed coefficients part of the cover speech.


The approximate coefficient is more like the original audio than the detailed coefficient; hence, by embedding the hidden message in the detailed coefficient of the audio, more similarity can be achieved between the original carrier media and the stego-speech. This ensures that the cover speech is not changed much and avoids suspicion of any secret message being hidden. In the least significant bit (LSB) steganography technique, the secret data is encoded by progressively replacing the least significant bit of the detailed coefficients with the bits of the confidential information.

LSB(cD, M″) = cD′   (4)

The stego-speech (S) is constructed by the inverse DWT technique, combining the modified detailed coefficient and the approximate coefficient of the original speech.

S = cA + cD′   (5)

The extraction procedure is the reverse of the embedding procedure. The discrete wavelet transform is performed on the stego-speech, decomposing it into detailed and approximate coefficients. The encrypted secret message is extracted from the detailed coefficients by running the LSB algorithm: first the least significant bit of each byte is extracted and converted into a string, and then the message is recovered from the string by cutting off the filler characters. This gives the encrypted message. Decrypting it with the AES algorithm reveals the secret message that was hidden in the speech signal.
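A minimal sketch of the embed/extract cycle described above is given below, assuming PyWavelets and a 16-bit integer cover signal; it omits the LZW compression and AES steps and quantizes the detail coefficients before bit replacement, so it illustrates the DWT + LSB idea rather than the authors' exact implementation.

```python
import numpy as np
import pywt

WAVELET = "haar"

def embed(cover: np.ndarray, bits: list[int]) -> np.ndarray:
    """Hide message bits in the LSBs of the (quantized) detail coefficients."""
    cA, cD = pywt.dwt(cover.astype(np.float64), WAVELET)
    cD_q = np.round(cD).astype(np.int64)
    for i, b in enumerate(bits):
        cD_q[i] = (cD_q[i] & ~1) | b
    return pywt.idwt(cA, cD_q.astype(np.float64), WAVELET)

def extract(stego: np.ndarray, n_bits: int) -> list[int]:
    """Recover the hidden bits by re-running the DWT and reading the LSBs."""
    _, cD = pywt.dwt(stego.astype(np.float64), WAVELET)
    cD_q = np.round(cD).astype(np.int64)
    return [int(c) & 1 for c in cD_q[:n_bits]]

cover = np.random.randint(-30000, 30000, size=1024, dtype=np.int16)  # stand-in cover speech
message = [1, 0, 1, 1, 0, 0, 1, 0]                                   # already compressed/encrypted bits
stego = embed(cover, message)
assert extract(stego, len(message)) == message
```

Note that writing the stego signal back to a 16-bit WAV file would re-quantize the samples and could corrupt the embedded bits, so in practice the float samples (or a higher-precision container) would need to be preserved; the paper does not detail this step.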

5 Results and Evaluation 5.1 Results Running the DWT and LSB code in Python showed no noticeable variation between the cover speech and the stego-speech, which implies the effectiveness of the proposed method: it can perform data hiding without bringing about any perceptible difference in the original audio file, distorting the signal, or adding noise. The matplotlib library in Python was also used to visualize the DWT function and the decomposition of the carrier signal into approximate and detailed coefficients. As shown, the approximate coefficient is similar to the original audio file, while the detailed coefficient carries the secret message. On reconstruction (combining the approximate and detailed coefficients), the stego-speech/reconstructed file is obtained, which contains the secret message hidden within the cover speech. Figure 2 represents the cover speech, or the carrier sound wave, in terms of amplitude versus time. The time axis is derived from the number of frames and the frame rate of the cover speech. This is a single-channel audio file, and hence only one channel is used to plot the graph.


Fig. 2 Cover speech

Fig. 3 Approximate coefficients

Figure 3 visualizes the approximate coefficients obtained by decomposing the original cover speech using the discrete wavelet transform. The lower frequency subband of the original audio is called the approximate layer, and this layer closely resembles the initial speech signal, which is evident when comparing Figs. 2 and 3. Figure 4 shows the graphical representation of the detailed coefficients, which form the upper-frequency subband of the original audio. As can be seen, there are noticeable differences between the original speech signal and its detailed coefficients. The time frame is also reduced by half, and the amplitude ranges from −10,000 to 10,000 as opposed to −30,000 to 30,000 for the original audio. Figure 5 is the graph of the final stego-speech, reconstructed by combining both the detailed and approximate coefficients, along with the secret message encoded within it. The actual speech signal and the stego-speech differ only minimally, and no noise is added. This demonstrates the efficiency of the proposed methodology.


Fig. 4 Detailed coefficients

Fig. 5 Stego-speech

5.2 Evaluation Metrics Additionally, the performance of steganographic techniques can be evaluated on several metrics such as imperceptibility, capacity, robustness, and security. Imperceptibility refers to the extent to which the stego-speech is perceptually indistinguishable from the original audio signal. In contrast, the quantity of hidden data that can be camouflaged in the stego-speech signal is termed the capacity. Security relates to the level of protection afforded to the classified material against unlawful access, whereas robustness is the ability of the stego-audio signal to resist various threats such as noise addition, compression, and filtering. The effectiveness of least significant bit (LSB) embedding is assessed here using the criteria of imperceptibility, capacity, and robustness. Imperceptibility. This evaluation metric depends on the number of LSBs that are replaced with secret information bits. Generally, replacing one or two LSBs does not significantly affect how the stego-audio signal is perceived, but replacing more LSBs can result in audible distortions. Therefore, the imperceptibility of LSB embedding can be good or bad depending on how many LSBs are used for embedding.


Capacity. The size of the media file and the number of least significant bits used for embedding determine the capacity of LSB embedding. For example, if the audio signal is a 10-s recording sampled at 44.1 kHz and one LSB is used for embedding, it can hide approximately 1.5 kilobits of secret information. Robustness. LSB embedding is not robust against threats such as noise addition, compression, and filtering. Adding noise or applying compression can destroy the embedded secret information or make it difficult to recover. Filtering can also remove the embedded secret information if the filter removes the LSBs.
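The paper describes imperceptibility qualitatively; one common way to quantify it, shown below as a hedged sketch (not a metric reported by the authors), is the signal-to-noise ratio between the cover and stego signals, where a higher SNR means the embedding distortion is smaller relative to the cover.

```python
import numpy as np

def snr_db(cover: np.ndarray, stego: np.ndarray) -> float:
    """SNR of the embedding: cover signal power over cover-minus-stego (distortion) power."""
    cover = cover.astype(np.float64)
    stego = stego.astype(np.float64)
    noise_power = np.mean((cover - stego) ** 2)
    if noise_power == 0:
        return float("inf")          # identical signals: no measurable distortion
    return 10.0 * np.log10(np.mean(cover ** 2) / noise_power)

# Example with a synthetic cover and a slightly perturbed "stego" version.
cover = np.random.randint(-30000, 30000, size=44100).astype(np.float64)
stego = cover + np.random.normal(scale=0.5, size=cover.shape)
print(f"Embedding SNR: {snr_db(cover, stego):.1f} dB")
```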

6 Conclusion and Future Work In this paper, the real-time implementation of speech steganography for data hiding is addressed. The suggested solution combined the least significant bit (LSB) encoding and discrete wavelet transform (DWT) methods to incorporate data into voice signals. In comparison with the currently used FFT methodology, the testing findings demonstrated that the proposed method achieved a high data-concealing capacity while keeping distortion in the speech signal low. The proposed method has several advantages, including real-time implementation, high data-hiding capacity, and imperceptibility of the hidden data. The resulting stego-speech is shown in Fig. 5. In conclusion, this work demonstrates that speech steganography can be an effective method for data hiding in real-time applications. Future studies can emphasize investigating the usage of deep learning models for speech steganography and enhancing the suggested method's resistance to various attacks. In general, the suggested technique has the potential to be used in several applications, including secure communications, multimedia watermarking, and digital rights management. The challenges in speech steganography include maintaining the robustness of the hidden information against attacks and distortions, increasing the amount of information that can be hidden within a speech signal while maintaining imperceptibility, the need for standardized evaluation metrics to compare different speech steganography techniques, and integration with other technologies such as cryptography, watermarking, and digital signal processing. Future work in speech steganography will focus on improving the robustness, capacity, evaluation metrics, real-world applications, and integration with other technologies to enhance the security and quality of speech steganography systems. Future work could also focus on developing practical applications for speech steganography in fields such as forensics, military, and healthcare.


References 1. Kahn D (1996) The history of steganography. In: International workshop on information hiding. Springer, Berlin, Heidelberg, pp 1–5 2. Djebbar F et al (2011) A view on the latest audio steganography techniques. In: 2011 International conference on innovations in information technology. IEEE 3. Elham Z, Avaz N (2017) Audio steganography to protect confidential information: a survey. Int J Comput Appl 22–29 4. Dutta H et al (2020) An overview of digital audio steganography. IETE Tech Rev 37(6):632–650 5. Huang YF et al (2017) Steganography in low bit-rate speech streams based on quantization index modulation controlled by keys. Sci China Tech Sci 60:1585–1596 6. Abdel-Aziz MM, Hosny KM, Lashin NA (2021) Improved data hiding method for securing colour images. Multimedia Tools Appl 80:12641–12670 7. Attaby AA, Mursi Ahmed MFM, Alsammak AK (2018) Data hiding inside JPEG images with high resistance to steganalysis using a novel technique: DCT-M3. Ain Shams Eng J 9(4):1965–1974 8. Chen J, Carlos J (2015) A spread spectrum representation based FFT domain speech steganography method. IEEE Trans Audio, Speech Lang Lett 23(1) 9. Akin M (2002) Comparison of wavelet transform and FFT methods in the analysis of EEG signals. J Med Syst 26(3):241–247 10. Dutta H, Das RK, Nandi S, Mahadeva Prasanna SR (2020) An overview of digital audio steganography. IETE Tech Rev 37(6):632–650 11. Al-Juaid N, Gutub A (2019) Combining RSA and audio steganography on personal computers for enhancing security. SN Appl Sci 1:830 12. Gera A, Vyas V (2023) Securing data using audio steganography for the internet of things. EAI Endorsed Trans Smart Cities 6(4):e5 13. Gera A, Vyas V (2023) Encrypted, compressed, and embedded text in audio WAV file using LSB audio stenography. In: Awasthi S, Sanyal G, Travieso-Gonzalez CM, Kumar Srivastava P, Singh DK, Kant R (eds) Sustainable Computing. Springer, Cham 14. Singh P (2016) A comparative study of audio steganography techniques. J Int Res J Eng Technol 3(4):581–585 15. Malik A (2021) Steganography: step towards security and privacy of confidential data in insecure medium by using LSB and cover media (December 12, 2020). In: Proceedings of the International conference on innovative computing & communication (ICICC)

A Machine Learning Based Model for Classification of Customer Service’s Email Javed Akhtar, Md Tabrez Nafis, Nafisur Rahman, and Aksa Urooj

1 Introduction With the growth in Internet users, email has become the most commonly and widely preferred communication medium worldwide [1]. On average, an individual receives 40–50 email messages on a daily basis [2]; for others, it may be a hundred or more. As a result, individuals may spend a substantial portion of their business hours managing and responding to their inboxes [3, 4]. Therefore, mailbox management is a highly important area of concern for email users, and there is an urgent need for a process that classifies emails intelligently and manages this issue. Email classification [5] is an important tool for managing emails. It helps to separate the SPAM messages [6] from the HAM messages and correctly categorize emails. This is done with the use of machine learning models, which are trained using large datasets of emails. Such machine learning models can detect patterns in the content of emails and classify them according to predefined categories. Email management is a routine and important part of daily life; it helps to keep a person organized and productive. A great way to manage emails is to categorize them as official or personal. This allows for the quick identification of emails and helps prioritize which emails to answer first. Additionally, an automated email management system provides an easy-to-use interface that takes the guesswork out of organizing emails and simplifies the task of responding to them quickly. Further, from an organizational standpoint, it can be a difficult task for supervisors and managers to keep track of the email communications that individual employees are sending out. To make this process easier, companies can invest in software for tracking email communication. J. Akhtar (B) · M. T. Nafis · N. Rahman Department of Computer Science and Engineering, Jamia Hamdard, New Delhi 110062, India e-mail: [email protected] A. Urooj Department of Information Technology, NIT Srinagar, Srinagar, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 I. J. Jacob et al. (eds.), Data Intelligence and Cognitive Informatics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-7962-2_19


This type of software can be used to monitor and analyze employee emails sent from work accounts. It can also be used to detect potential issues with email content, such as violations of company policy or inappropriate language.

2 Problem Statement Most organizations use email as a channel for customer service, although email is not built to handle customer support [7]. A single customer support email address is created by an organization so that its customers can raise the inquiries, complaints, and grievances they face while using the products and services the organization offers. The organization then shares the email inbox with multiple team members to work on the emails received, or sets up forwarding of the emails to all the support team members. Managing customer service out of a shared inbox can be challenging.

2.1 Shared Inboxes Fail in Managing the Emails as the Number Grows Using an email inbox for customer support is difficult. Email users rely on confusing color-coded tags to track issues raised over email, and as the number of support requests increases, collaboration becomes a nightmare. Soon, managing multiple mailboxes for different teams and products means less efficiency. The team will miss important customer communications that need to be addressed immediately, resulting in decreased customer satisfaction and, eventually, a decline in service quality.

3 Proposed Solution 3.1 Advantages of Email Multi-folder Categorization for Better Customer Support A system with the email multi-folder categorization introduced here automatically converts an incoming email into a database record that is easy to monitor, track, and manage. With a database record entry, the system can offer customizable views of the inbox, each member of the team has a separate login, and it removes the possibility of two team members picking up or working on the same email from a customer, or of an email being missed altogether for lack of a methodological tracking mechanism [8]. The main benefit is that holding emails in a database system brings transparency about the status, priority, type, and owner of each email to every member of the team.


This helps to keep all support team members on the same page, which improves clarity and makes sure that no customer communication or conversation slips through the cracks.

3.2 Email with Multi-folder Categorization Works Well with a Database System Using a database system to store the emails in multiple folders makes it easier to prioritize, track, and follow up on customer inquiries from one place. This allows the support team to provide faster, more efficient, and more powerful customer service support.
• Assign Ownership: Each email is assigned a unique record number and then allocated to a member of the customer service team. That member becomes the owner of the email and is accountable for its further course of action.
• Prioritize Email: Each email record is prioritized based on exigency or need into four categories: Low, Medium, High, and Urgent.
• Set Deadlines: Performance criteria are set for all incoming emails. An agreement defined at the service level acts as a custom timer for each email request and improves support efficiency.
Manual activities such as tagging, routing, and tracking support emails can take a lot of effort and time. The support team's effort and time can be freed up by automating these activities with multiple rules that run with different parameters when an email arrives in the inbox, assigning the emails to available support team members. Team members get notified when an email has been assigned, is nearing its deadline, or is updated, enabling them to act quickly and provide timely support. The email multi-folder categorization database system also helps in making sure that no two support team members end up working on the same email.
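As an illustration of the record structure described above, the sketch below defines a hypothetical ticket data model and a simple round-robin assignment rule in Python; the field names, priority levels, SLA hours, and assignment policy are assumptions chosen to mirror the text, not the system actually built by the authors.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from itertools import cycle
from typing import Optional

PRIORITY_SLA_HOURS = {"Low": 72, "Medium": 24, "High": 8, "Urgent": 2}  # assumed SLA timers

@dataclass
class EmailTicket:
    record_id: int
    sender: str
    subject: str
    priority: str = "Medium"
    owner: Optional[str] = None
    status: str = "Open"
    deadline: datetime = field(default_factory=datetime.utcnow)

agents = cycle(["agent_a", "agent_b", "agent_c"])  # hypothetical support team members

def open_ticket(record_id: int, sender: str, subject: str, priority: str) -> EmailTicket:
    """Convert an incoming email into a tracked record with an owner and a deadline."""
    ticket = EmailTicket(record_id, sender, subject, priority)
    ticket.owner = next(agents)  # simple round-robin ownership rule
    ticket.deadline = datetime.utcnow() + timedelta(hours=PRIORITY_SLA_HOURS[priority])
    return ticket

ticket = open_ticket(1, "customer@example.com", "Order not delivered", "High")
print(ticket.owner, ticket.priority, ticket.deadline.isoformat())
```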

3.3 Work Together to Solve Problems at a Fast Pace Customer support service teams need to collaborate within and across teams to resolve specific issues. The system allows the members of the team to collaborate even if they are working on different emails, while also allowing stakeholders to quickly get the context of a customer's issue.


3.4 Provide Support in the 100% Context In today’s time, context matters. Customers hate repetition and expect to be provided relevant information on their order history and past interactions. The system can provide a separate timeline for each customer and provide full history and context of conversations with the customer service representatives to improve the customer experience. It discourages people from replying to customers without knowing the complete picture and recent developments.

3.5 Measure Individual Performance with Intuitive Reports Analytics and reporting are essential for identifying loopholes in the customer service strategy. System-generated one-click reports are used to track relevant metrics for a better understanding of how quickly the team is solving problems or when the team is busiest. A better bird's-eye view of customer service can be obtained with live dashboards that provide real-time insights into the team's actual performance and customer satisfaction ratings. These reports and dashboards can be generated from the system once emails are converted into database records.

3.6 Architecture of Automatic Classification of Email Figure 1 showcases the architecture of automatic email classification. As shown in the diagram, the email classification and categorization process is divided into three discrete levels: preprocessing, learning, and classification. To build an automatic email classifier, first, suitable email datasets must be made available; e.g., to create a spam classifier model, one needs a spam email dataset that contains both spam emails and ham (non-spam) emails for training the classifier. Second, once the data collection is done, the next step is to clean the data; data preprocessing is the term used to describe this cleaning process. Email data are converted into word tokens during preprocessing. The preprocessing level additionally removes undesirable words, or "stop" words, to decrease the amount of information that must be inspected. Finally, in the email data preprocessing process, the token words are subjected to lemmatization and stemming in order to convert them into their root forms (such as changing "retrieving" into "retrieve"). The features are extracted, and the feature sets are established during the second stage, learning.


Fig. 1 Architecture of automatic classification of email

Indicators that measure a single aspect of a user's email activity or behavior are referred to as features. The effective abstraction of a feature set is crucial to the learning task's efficiency and accuracy in email classification. In order to enhance the classifier's efficiency and accuracy, the most distinguishing features are selected for classification following feature abstraction. A classifier is then developed and retained to categorize future incoming messages. Finally, at the classification level, the built classifier is used for the classification of incoming emails into the classes related to multi-folder classification. Classification is done by automatically checking the subject line and the content of the email body, as well as attachments. The system classifies each email based on the criteria configured in the system and the learned records. Instead of relying on individuals to manually classify emails as potential leads, email classification systems use technology to do 99% of the work, fitting seamlessly into business and compliance workflows.


When the system flags or identifies an email that may require investigation, it can use user and company permissions to determine whether the received email should be marked as a potential lead. Currently available reviews [9] focus either on phishing or on spam/ham email classification, but no detailed study has been found on multi-folder organization in email classification [10]. Nonetheless, email classification is used in various application areas, such as classifying emails into complaint or non-complaint, personal or official, and suspicious or normal activity, as well as classifying incoming emails into relevant directories.
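As a concrete illustration of the preprocessing level (tokenization, stop-word removal, and stemming), the following is a hedged Python sketch; the tiny stop-word list and example text are placeholders, and the actual system may use different tooling.

```python
import re
from nltk.stem import PorterStemmer  # pip install nltk; PorterStemmer needs no extra downloads

STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "for", "in", "on"}
stemmer = PorterStemmer()

def preprocess(text: str) -> list[str]:
    """Lowercase, tokenize, drop stop words, and stem the remaining tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [stemmer.stem(t) for t in tokens if t not in STOP_WORDS]

email_body = "I am retrieving the invoices for the orders placed in January."
print(preprocess(email_body))
# e.g. ['i', 'am', 'retriev', 'invoic', 'order', 'place', 'januari']
```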

4 Research Methodology 4.1 Classification of Email Using the Technique of Machine Learning Classification is the process of assigning given data to different classes. Both structured and unstructured data can be used in this way. Predicting the class of a given data point is the first step in the process. Targets, labels, and categories are all common names for classes. Fitting a mapping function from discrete input variables to discrete output variables is the goal of classification predictive modeling. The primary objective is to determine the class or category of new data [11].

4.2 Machine Learning Terminologies Used in Classification
• Classification Model—The model determines the data's class or category based on the input set used in the training process.
• Classifier—An algorithm used to assign each piece of input data to a certain class or category.
• Binary Classification—A discrete classification technique with only two possible outcomes, such as true or false.
• Multi-class Classification—Classification with more than two classes, where only one label or target is assigned to each input sample.
• Feature—A distinct, quantifiable characteristic of the phenomenon under observation.
• Multi-label Classification—This type of classification assigns one or multiple labels or targets to each sample.
• Initialize—Used to allocate the classifier so that it can be used further.
• Train the Classifier—Every classifier fits the model to the training set X and training labels y using the fit(X, y) method.
• Predict the Target—For an unlabeled observation X, the predict(X) method returns the predicted label y.


Fig. 2 Sample representation of support vector machine in 2D graph

• Evaluate—In essence, this refers to the model’s evaluation, including the classification accuracy score, report, etc. [11].

4.3 Support Vector Machine A support vector machine [12] is a classifier that represents the training data as points in space, separated into categories by the widest possible gap. New points are then mapped into the same space and assigned to a category depending on which side of the gap they fall [13]. A sample representation of a support vector machine in a 2D graph is shown in Fig. 2.

4.4 Advantages and Disadvantages for Using Support Vector Machine The support vector machine (SVM) [14] is memory efficient, works well in high-dimensional spaces, and employs only a portion of the training points in the decision function. The techniques used in support vector machines do not directly estimate probabilities, which is their main drawback.

4.5 Validation Tool/Dataset Used A dataset with a wide variety of genuine emails must be considered. After working with several datasets, the Enron corpus was selected. The Enron corpus contains over 500,000 emails produced by the employees of Enron Corporation, which is large enough to serve as a training dataset [15].


The support vector machine was first used to categorize each email's folder based solely on a specific field of data. The fields that were utilized are "From," "Subject," "Body," and "To, CC." Since date information is not text, it was not used, and the issue of applying date information to email classification was not fully investigated. SVM was then applied to each email treated as one single bag of words; the analysis that follows gives this approach the label "All." For this representation, the fields from the previous experiments were combined and classified together. As a result, if the same term appears in both the message's body and subject, it is regarded as multiple instances of the same feature. For the final method, referred to as "linear combination," the SVM scores from the "From," "Subject," "Body," and "To, CC" classifiers were linearly combined. Using the training data, ridge regression was used to learn the weights for each section for each folder belonging to a specific user. The data for each user was sorted chronologically and divided into halves in order to create datasets for training and testing: training was done with the first half of the messages, and testing with the second half. The list of terms was created by applying standard text parsing routines to each email field, and the message's body was also subjected to stemming. The standard "LTC" formula was used to assign the terms their weights, and the one-vs-rest multi-class classification method was used to feed the terms to the SVM. The best thresholds for binary decisions were chosen for each category. Precision was the only relevant evaluation metric in the earlier experiments because the binary decisions were based on selecting only the top-ranked category for every message. However, presenting multiple potential assignments for each email could be beneficial to the user. For this, thresholds were obtained using score-based local optimization (SCut) and assessed using F1 scores, which take both precision and recall into account. For these tests, the category hierarchy was flattened by assigning one category to one email. To put it another way, the email was stored in the lowest-level category that was considered to be the "correct" category for that particular message. This is because a correct classification only receives one point of credit; otherwise, higher-level categories, which contain numerous other categories, would significantly raise the scores.
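A hedged sketch of this kind of per-field experiment is shown below using scikit-learn; the toy messages, folder labels, and TF-IDF weighting stand in for the Enron data and the paper's LTC weighting, so it illustrates the one-vs-rest SVM setup rather than reproducing the reported experiments.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-ins for one email field (e.g., "Subject") and the target folders.
subjects = [
    "meeting agenda for monday",
    "invoice for october services",
    "family dinner this weekend",
    "quarterly budget review meeting",
    "payment overdue notice",
    "weekend trip photos",
]
folders = ["work", "finance", "personal", "work", "finance", "personal"]

# TF-IDF features fed to a linear SVM wrapped in one-vs-rest, as in the described setup.
clf = make_pipeline(TfidfVectorizer(), OneVsRestClassifier(LinearSVC()))
clf.fit(subjects, folders)

print(clf.predict(["budget meeting moved to friday", "photos from the trip"]))
# expected on this toy data: ['work' 'personal']
```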

5 Results The body section of the email was, on average, the most useful feature, but it was not much better than the "Subject Line" and "From" fields. The email's "To" and "CC" fields are unquestionably the least useful feature. This is understandable, given that the address in the "To" field of the majority of messages sent to a user is not very discriminating. For the methods that used multiple fields of the emails' data, using ridge regression to linearly combine the individual scores was significantly more effective than treating the fields as a single bag of words.


The fact that no single feature dominates the others demonstrates that the Enron corpus users use a combination of multiple data points in their organizational scheme rather than relying solely on a single field to organize their emails.

6 Conclusion This research has proposed a machine learning based model for the classification of customer service emails. It is evident that there is not a strong correlation between the classification performance for a user's email and the number of messages the user has. However, this outcome is reasonable. Classification is easy for a user with many messages in the same category; if the messages are dispersed, the user's categorization strategy influences the classifier's performance. To put it another way, how easily a user's email can be automatically classified ought to be significantly influenced by the number of categories that user keeps.

7 Limitations and Future Scope In addition to the approaches attempted in this research, there are many more methods for modeling email. The use of relationships between emails to build up information about a specific message requires further examination. One of these relationships is thread membership: how to apply threads to the task of email classification has not been looked into, and thread detection itself has received little attention. Time information was also left out of these experiments, despite the fact that it appears to be useful. On the other hand, time cannot be utilized in the same way as the other fields; consequently, research is required to determine how time influences a user's folder management strategies.

References 1. Mujtaba GAG (2017) Email classification research trends: review and open issues. IEEE Access 5:9044–9064. https://doi.org/10.1109/ACCESS.2017.2702187 2. R. Team (2015) Email statistics report, 2015–2019. The Radicati Group, Inc., Palo Alto, CA, USA 3. Brutlag JD, Meek C (2000) ‘Challenges of the email domain for text classification. In: Proceedings of ICML, pp 103–110 4. Stich J-FT (2019) E-mail load, workload stress and desired e-mail load: a cybernetic approach. Emerald Insight 32:430–452. Retrieved from https://www.emerald.com/insight/content/doi/ 10.1108/ITP-10-2017-0321/full/html 5. Brutlag JD (2000) Challenges of the email domain for text classification, pp 103–110. Retrieved from https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/AF1-1.pdf 6. Cormack GV (2007) Email spam filtering: a systematic review’. Found Trends Inf Retr 1:335– 455


7. HelpDesk Management (n.d.) Retrieved from freshworks.com: https://www.freshworks.com/ freshdesk/helpdesk-management/team-inbox/ 8. Al-Emran M, Chalabi HA (2014) Developing an IT help desk troubleshooter expert system for diagnosing and solving IT problems. In: Proceedings of the 2nd BCS International IT conference 2014 2, pp 1–5. https://doi.org/10.14236/ewic/bcsiit2014.16 9. Blanzieri EB (2008) A survey of learning-based techniques of email spam filtering. Springer, pp 63–92. https://doi.org/10.1007/s10462-009-9109-6 10. Abu-Nimeh S (2007) A comparison of machine learning techniques for phishing detection. ICPS Proc 60–69. https://doi.org/10.1145/1299015.1299021 11. Classification in Machine Learning (n.d.) Retrieved from https://www.edureka.co/: https:// www.edureka.co/blog/classification-in-machine-learning/ 12. Support Vector Machine (SVM)—A complete guide for beginners. Retrieved from Analyticsvidhya: https://www.analyticsvidhya.com/blog/2021/10/support-vector-machinessvm-acomplete-guide-for-beginners/ 13. All machine learning algorithms. Retrieved from terenceshin.medium: https://terenceshin.med ium.com/all-machine-learning-algorithms-you-should-know-for-2023-843dba11419c 14. Renuka DK, Visalakshi P (2014) Latent semantic indexing based SVM model for email spam classification. J Sci Ind Res 73:437–442 15. Klimt BY (2004) The enron corpus: a new dataset for email classification research. In: Machine learning: ECML 2004, pp 217–226. https://doi.org/10.1007/978-3-540-30115-8_22

Intelligent Identification and Multiclass Segmentation of Agricultural Crops Problem Areas by Neural Network Methods Aleksey F. Rogachev and Ilya S. Belousov

1 Introduction Among the artificial intelligence (AI) methods that can be used for computer neural network detection of the state of crops, it is possible to distinguish the segmentation of their color images [1, 2] by the required classes. In the Decree of the Government of the Russian Federation dated 07/14/2021 No. 717 “On the state program for the development of agriculture and regulation of agricultural products markets …” it is justified that ensuring the development of the agro-industrial complex through the introduction of digital information technologies is a popular task. The need to develop and implement “a system for automated recognition of the specifics of the state of plant surface elements…” is also mentioned [3]. The task of identifying and assessing the condition of agricultural fields on significant areas, recorded and analyzed, including in various parts of the spectrum, is high-tech and very labor-intensive [4]. The initial information flows are obtained by remote sensing methods and using data from other functional systems, including unmanned aerial vehicles [5, 6]. AI methods such as machine learning and pattern recognition are used to process input data. The most versatile technologies are deep artificial neural networks (DNN). For the segmentation of images of agrophytocenoses, in order to identify their biological state during vegetation, various computer systems are known [7], which are built according to U-Net and FCN architectures [4] and allow neural networks to be trained on the corresponding labeled images. The specific complexity of the task of assessing the condition and development of crops of various agricultural cultures is the multicomponent structures of the analyzed multicolored images. This leads to the use of a systematic approach in the A. F. Rogachev (B) · I. S. Belousov Volgograd State Agrarian University, Russian Federation, Volgograd, Russia e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 I. J. Jacob et al. (eds.), Data Intelligence and Cognitive Informatics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-7962-2_20


process of computerized intellectual research, including the formation of databases, their preprocessing, and analysis directly (patents RU 2,688,234, IPC G06Q 10/06, SEC A01G 9/00; 2,723,189, IPC A01G 7/00, G06N 5/00, G06N 20/00, G06Q 50/ 02, etc.). Solving the problems of image segmentation provides an increase in the efficiency of agricultural production management [8–10]. Well-known studies, including those carried out by the authors, were based on solving the problem of classification of high-resolution color images. This allowed us to get only a general idea of the problem when the detected defective areas of the fields were of considerable size. The complexity of image processing increases quadratically with increasing linear size of images, which increases the requirements for the hardware of the system being developed. The obtained results of the neural network solution of the problem of image analysis and recognition in relation to the state of agricultural fields [5, 11] show a certain limitation in the formulation and solution of the problem of classification of agricultural fields. This circumstance led to the transition to the formulation and solution of a more complex task of semantic segmentation [12], or “pixel-by-pixel classification,” which allows obtaining results that are more in demand by agricultural production [13]. At the same time, a number of specific issues, including the methodology for choosing the optimal image dimension and database structure, justifying the effective combination of hyperparameters of neural networks, require modification of approaches and additional solutions. The solution of such tasks in agricultural production provides intellectual support for managerial decision-making.

2 Methods and Materials To create a deep segmentation-type ANN, the capabilities of the PyTorch framework were used [14]. The architecture of the segmentation neural network was selected from modules embedded in the PyTorch libraries. In the process of researching the capabilities of neural network architectures, the accuracy of the compared architectures was evaluated on the basis of the well-known dataset "COCO train2017" [23]. In order to test the hypothesis put forward about the advantages of the segmentation network, the DeepLabV3 ResNet50 architecture was chosen. The mentioned deep ANN architecture is characterized by an accuracy comparable to DeepLabV3 ResNet101, but it functions noticeably faster. The ANN configuration represents the DeepLabV3 neural network based on ResNet50 and includes sequences of convolutional blocks whose outputs are directly aligned with the previous blocks. To assess the quality of the segmentation during training, the well-known "Dice coefficient" metric (1) was used

Dice(X, Y) = 2|X ∩ Y| / (|X| + |Y|)   (1)


Fig. 1 Segmentation of images of agricultural fields

where |X| is the cardinality of the predicted set X and Y is the pre-marked mask, together with the «Jaccard coefficient» (2)

J(A, B) = |A ∩ B| / |A ∪ B|   (2)
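A minimal NumPy sketch of these two metrics for binary masks is given below; it assumes Boolean arrays of the same shape and is only an illustration of formulas (1) and (2), not the authors' evaluation code.

```python
import numpy as np

def dice(pred: np.ndarray, target: np.ndarray) -> float:
    """Dice(X, Y) = 2|X ∩ Y| / (|X| + |Y|) for boolean masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return 2.0 * inter / (pred.sum() + target.sum())

def jaccard(pred: np.ndarray, target: np.ndarray) -> float:
    """J(A, B) = |A ∩ B| / |A ∪ B| (also known as IoU)."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return inter / union

pred = np.zeros((4, 4), dtype=bool); pred[:, :2] = True       # predicted "defective field" mask
target = np.zeros((4, 4), dtype=bool); target[:, 1:3] = True  # pre-marked reference mask
print(dice(pred, target), jaccard(pred, target))              # 0.5 and 0.333...
```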

To solve the problem of segmentation of agricultural fields, when forming databases and datasets, RGB color images of agricultural fields with a size of 500 × 500 pixels were used, obtained using unmanned aerial vehicles (UAVs) [15]. RGB color images were placed in four classes: (1) a quality field, (2) a defective field, (3) a field after cultivation, (4) a non-agricultural field (other objects). In the process of expert markup of the dataset, a segmenting mask was formed in a graphical editor (Fig. 1). DNN training was implemented on a core i5 CPU with a graphics card using multithreaded technology. The type of GPU is Nvidia RTX 2080Ti, which supports the CUDA Toolkit 11.4 library. In the process of DNN training, various methods of augmentation of source images were used to expand the volume of the training dataset, for example, the “imgaug” method.
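The following hedged sketch shows how a DeepLabV3 model with a ResNet50 backbone of the kind described here could be instantiated and trained for the four field classes in torchvision/PyTorch; the class count, batch content, and hyperparameters are assumptions for illustration rather than the authors' actual training script.

```python
import torch
import torch.nn as nn
from torchvision.models.segmentation import deeplabv3_resnet50

NUM_CLASSES = 4  # quality field, defective field, field after cultivation, non-agricultural

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = deeplabv3_resnet50(weights=None, num_classes=NUM_CLASSES).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(images: torch.Tensor, masks: torch.Tensor) -> float:
    """One optimization step on a batch of RGB images and integer class masks."""
    model.train()
    images, masks = images.to(device), masks.to(device)
    logits = model(images)["out"]    # shape: (B, NUM_CLASSES, H, W)
    loss = criterion(logits, masks)  # masks: (B, H, W) with values in [0, NUM_CLASSES)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Smoke test with random data standing in for 500x500 UAV image crops and their masks.
imgs = torch.randn(2, 3, 500, 500)
msks = torch.randint(0, NUM_CLASSES, (2, 500, 500))
print(train_step(imgs, msks))
```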

3 Results and Discussion 3.1 Justification of the Segmentation DNN Architecture U-Net and DeepLabV3 were used as the architecture in the search experiments (Fig. 2). DeepLabV3 is an architecture that uses blocks of convolutional layers with 3 × 3 convolution cores [16] to highlight specific “features” of images that can be used to solve the segmentation problem. After multiple compressions of images with convolutional layers, the DNN forms a “new” segmented image from images obtained at various stages of processing the original one. After that, the information is concatenated with the final result of the backbone. In the experiments conducted, DNN training was carried out both for compressed images and without compression. The latter option was implemented by splitting


Fig. 2 DNN architectures for segmentation: a U-Net; b DeepLabV3

Table 1 Comparison of segmentation quality using DNN of various architectures

Neural network configuration   IoU value   Network accuracy, %
FCN ResNet50                   61          91
FCN ResNet101                  64          92
DeepLabV3 ResNet50             66          92
DeepLabV3 ResNet101            67          92

the original image into blocks of 500 × 500 pixels. Comparison of the recognition efficiency of a test dataset by intersection over union (IoU) metrics characterizing the recognized quality of various DNN architectures on a test sample is presented in Table 1. The conducted search experiments revealed that the DeepLabV3 architecture based on ResNet50 and ResNet101 demonstrates a higher quality of work compared to FCN ResNet10. Numerical studies of the quality of recognition of agricultural fields using the developed neural networks were carried out on author’s datasets.

3.2 Learning Outcomes Developed by DNN A diagram of a typical learning process of a deep ANN being developed is designed for image segmentation of agricultural fields, according to the number of image batches submitted to its input, which is shown in Fig. 3. Instantaneous loss values are shown in blue. Due to their significant variability, for the convenience of analysis, the average loss values are shown in orange. The



Fig. 3 Neural network training for segmentation of agricultural fields

The average loss values decrease monotonically, starting at 1.613 and practically stabilizing at 0.351 after about 4000 batches have been processed. Figure 4 shows how two segmentation metrics (Jaccard score and accuracy score) change during neural network training, as a function of the number of training epochs. Note that, although the dynamics of the two metrics coincide qualitatively during training, the "accuracy score" metric is characterized by both larger absolute values and more pronounced ranges of change. This makes the "accuracy score" metric more convenient for visual analysis of the neural network learning process. As a result of training the segmentation-type DNN, sufficiently detailed masks were obtained for the submitted images of agricultural fields (Fig. 5). The use of neural networks based on the segmentation approach, with sufficient recognition accuracy, has reduced the load on processors due to the possibility of image compression.
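The averaging used for the orange loss curve in Fig. 3 can be reproduced with a simple sliding-window mean; a minimal sketch (assuming a list of per-batch loss values) is shown below.

```python
import numpy as np

def smooth_losses(batch_losses, window: int = 100):
    """Sliding-window average of instantaneous per-batch loss values."""
    losses = np.asarray(batch_losses, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(losses, kernel, mode="valid")
```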

3.3 Discussion of the Results

As noted in Sect. 3.1, U-Net and DeepLabV3 were used as the architectures in the search experiments, with DeepLabV3 relying on blocks of convolutional layers with 3 × 3 kernels [16] to extract the image features needed for segmentation (Fig. 2). The improvement of the accuracy and quality of segmentation of images of recognized agricultural fields at different stages (phases) of the growing cycle [17, 18] is constrained by the limited resolution and quality of the manual markup of the dataset used for training, as well as by the hyperparameters of the DNN [19–21].



Fig. 4 Comparison of learning metrics of the segmentation DNN being developed: a Jaccard score; b accuracy score

In subsequent studies, it is necessary to improve the database and the hyperparameters of the DNN being developed. It is also possible to recognize retrospective images of agrocenoses during plant vegetation by time series (TS) analysis, including solving the problem of neural network regression. This approach will make it possible to obtain an intelligent assessment of the dynamics of crop development.

4 Conclusions

Analysis of the results of the study of the use of neural networks for segmentation of agricultural fields showed that the considered DNN architecture DeepLabV3 in combination with ResNet50 provides a solution to the problem of segmentation of agricultural fields.


Fig. 5 Segmentation of the image of agricultural fields: a The original image; b processing with clusters of 500 × 500 size; c image processing when scaling up to 200 × 200

At the same time, the identification of the reclamation state of agricultural fields is carried out with a simultaneous assessment of the level of plant development during the growing season. The family of developed neural networks can be used as an algorithmic core to create systems for intelligent assessment of the reclamation state of agricultural fields according to the SaaS model, in which the performance of the DNN used is decisive. The improvement of the developed DNNs will provide additional opportunities for analyzing time series of agricultural field images during the growing season in order to intelligently assess the dynamics of plant development.

Acknowledgements The publication has been prepared with the financial support of the RFBR project No. 20-37-90142.


References 1. Badrinarayanan V, Kendall A, Cipolla R (2016) SegNet: a deep convolutional encoder-decoder architecture for image segmentation. https://doi.org/10.48550/arXiv.1511.00561. Retrieved from http://docs.cntd.ru/document/902361843 2. Garcia-Garcia A, Orts-Escolano S, Oprea S, Villena-Martinez V, Garcia-Rodriguez J (2017) A review on deep learning techniques applied to semantic segmentation. https://doi.org/10. 48550/arXiv.1704.06857 3. Vakulenko D, Kravets A (2021) Reengineering of business processes of agro-industrial enterprises in conditions of end-to-end digital transformation // Bulletin of the Astrakhan State Technical University. Series: management, computer engineering and computer science, pp 115–125. https://doi.org/10.24143/2072-9502-2021-3-115-125 4. Saiz-Rubio V From smart farming towards agriculture 5.0: A review on cropdata management. https://www.mdpi.com/2073-4395/10/2/207/htm. 5. Rogachev AF, Simonov AB, Ketko NV, Skiter NN (2023) Fuzzy algorithmic modeling of economics and innovation process dynamics based on preliminary component allocation by singular spectrum analysis method. Algorithms 16(1):39 6. Komarova AF, Zhuravleva IV, Yablokov VM (2016) Open multispectral data and basic methods of remote sensing in the study of vegetation cover. Principl Ecol 1:40–74 7. Rogachev AF, Simonov AB (2022) Systematic analysis of retrospective crop yields time series based on their structure identification. IOP Conferen Series Earth Environ Sci 1069(1):012014 8. Lila VB (2012) Algorithm and software implementation of adaptive method of training artificial neural networks // Engineering Bulletin of Don. Retrieved from http://www.ivdon.ru/magazine/ archive/n1y2012/626 9. Chursin IN, Filippov DV (2018) Gorokhova I. N. Recognition of agricultural crops by highresolution multispectral satellite images. Vestn a computer and information technologies. 11:22–27 10. Filippov DV, Chursin IN (2018) Evaluation of the quality of digital aerial photographs // Vestn a computer and information technologies, pp 34–39 11. Kurganovich KA, Shalikovsky AV, Bosov MA, Kochev DV (2021) Application of artificial intelligence algorithms for flood-prone territories control // Water management of Russia: problems, technologies, management, pp 6–24. https://doi.org/10.35567/1999-4508-2021-3-1 12. Soloviev RA, Telpukhov DV, Kustov AG (2017) Automatic segmentation of satellite images based on the modified convolutional neural network UNET. In: Engineering bulletin of the Don, No. 4. Retrieved from ivdon.ru/ru/magazine/archive/n4y2017/4433 13. Alekseev PP, Kvyatkovskaya IY (2022) Application of neural networks in the recognition system of commercial hydrobionts in conditions of increased fluctuation. In: Bulletin of the Astrakhan State Technical University series: management, computer engineering and computer science, no. 2. pp 76–86. https://doi.org/10.24143/2072-9502-2022-2-76-86 14. Meleshko IV, Prokhorenko VA (2019) Development of an application for semantic segmentation of images using Python, PyTorch, OpenCV and Albumentations. In: New mathematical methods and computer technologies in design, production and scientific research, materials of the XXII Republican Scientific Conference of Students and postgraduates. Gomel, pp 142–143 15. COCO train2017. Retrieved from http://cocodataset.org/#home (02.02.2023). 16. Fezan (2020) Review DeepLabv3 (Semantic Segmentation). Retrieved from https://medium. com/swlh/review-deeplabv3-semantic-segmentation-52c00ddbf28d. (22.07.2022). 17. 
He K, Zhang X, Ren XS, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the 29th IEEE conference on computer vision and pattern recognition—CVPR 2016, Las Vegas, Nevada, USA, 26 June–1 July 2016, pp 770–778 18. Melikhova E, Rogachev A (2023) Computer Optimization of ANN Hyperparameters for Retrospective Information Processing. Lect Notes Netw Syst 509:723–730 19. Alekseev AV, Rozaliev VL, Orlova YA, Zaboleeva-Zotova AV (2016) Context-sensitive image analysis for coloring nature images. Adv Intell Syst Comput 451:133–141


20. Skripachev VO, Guida MV, Guida NV, Zhukov AO (2022) Investigation of convolutional neural networks for object detection in aerospace images. Int J Open Inform Technol 10(7):2307–8162 21. Rosebrock A Intersection over Union (IoU) for object detection. Retrieved from https://www. pyimagesearch.com/2016/11/07/intersection-over-unioniou-for-object-detection 22. Sik-Ho T (2022) Review: DeepLabv3. Atrous convolution (Semantic Segmentation). Retrieved from https://towardsdatascience.com/review-deeplabv3-atrous-convolution-semantic-segmen tation-6d818bfd1d74. (22.07.2022)

Perishable Products: Enhancing Delivery Time Efficiency with Big Data, AI, and IoT Saâdia Chabel and El Miloud Ar-Reyouchi

1 Introduction

Transport without advanced technologies such as Big Data (BD), Artificial Intelligence (AI), and the Internet of Things (IoT) can be referred to as conventional transport (CT). These terms signify transportation methods [1] that do not incorporate sophisticated digital systems or data analytics. Here are some ways these technologies can be utilized to enhance transportation.

BD platforms [2] can collect and analyze real-time data from various sources, including GPS tracking devices and IoT sensors [3], to provide accurate and up-to-date information on the location and condition of products during transit. BD analytics can analyze historical shipping data, traffic patterns, weather conditions, and other relevant factors to identify the most efficient and reliable routes for product delivery. AI algorithms can then continuously optimize these routes based on real-time data [4], minimizing delays and ensuring timely deliveries. AI can monitor the condition of delivery vehicles and equipment, detecting potential issues or failures before they occur. By analyzing data from sensors and machine learning models [5], predictive maintenance algorithms can schedule maintenance and repairs proactively, reducing the risk of breakdowns that could cause delays. This enables stakeholders to monitor and manage the delivery process effectively, intervening promptly in case of any issues that may lead to delays.

AI revolutionizes logistics and supply chain activities [6] through various applications. It enhances operations and improves efficiency in key areas such as demand forecasting, warehouse automation, supply chain visibility, predictive maintenance, risk management, chatbots and virtual assistants, supplier selection and relationship


management, quality control, and reverse logistics. The protocol in [7] uses AI to enhance wireless baggage tracking and monitoring, optimizing baggage claims at airport terminals. AI enables better decision-making, operational efficiency, and customer satisfaction, and its continuous advancement holds the potential for further innovation and optimization of supply chain processes. AI-powered computer vision systems [8] can assess the condition of fragile and sensitive products during packaging and handling, and an attractive BD-based proposal for sentiment analysis using AI is presented in [9]. In our context, by leveraging BD and AI technologies, businesses can gain valuable insights, optimize operations, and improve decision-making processes, ultimately reducing delivery delays and ensuring that fragile and sensitive products reach their destinations on time and in optimal condition.

Enhancing the efficiency of product delivery is crucial, especially when dealing with fragile and sensitive items. By leveraging the power of IoT technology, significant advancements can be made in ensuring the safe and timely transportation of such products, as well as in other domains such as medicine [10, 11]. IoT enables real-time monitoring and tracking of shipments, providing valuable insights into various aspects of the delivery process. Sensors and connected devices can collect temperature, humidity, shock, and vibration data. This information can be continuously analyzed to identify deviations from optimal conditions and take prompt corrective actions. IoT empowers logistics companies to proactively address transportation issues, optimize routes, and efficiently deliver fragile and sensitive products. Real-time monitoring, route optimization [12], and enhanced visibility provided by IoT enable businesses to overcome challenges and ensure secure and timely transportation of these items, benefiting efficiency, safety, and customer satisfaction.

This article explores the progress that can be attained by integrating BD, AI, and IoT in transporting fragile and sensitive products. It places particular emphasis on the goal of minimizing delivery time (DT) to enhance efficiency and customer satisfaction.

The rest of this paper is organized as follows. Section 2 introduces DT in our context, including its objective, advantages, challenges, and a description of DT. Section 3 presents an optimization approach detailing the model system and the proposed optimization methodology designed to minimize the DT for fragile and sensitive products. Section 4 focuses on analyzing the optimization results, comparing CT with integrated recent transportation technologies to highlight the advantages of the proposed optimization approach. Section 5 concludes the paper by summarizing key findings, discussing implications, and exploring future research opportunities in integrating optimization techniques to minimize the DT for fragile and sensitive products.


2 Delivery Time in Our Context

2.1 Optimizing Delivery Time in Transportation: Exploring Algorithmic Approaches

In our context, there are several algorithms that can optimize DT for the transportation of fragile and sensitive products, leveraging BD, AI, and IoT.

Genetic algorithms [13] are optimization techniques inspired by natural selection and genetics. In the context of DT optimization, genetic algorithms can generate and evolve a set of possible delivery routes based on various factors, such as distance, traffic conditions, and delivery constraints. The algorithm iteratively evaluates and evolves these routes to find the most efficient route that minimizes DT (a toy illustration of this idea is sketched at the end of this subsection).

Reinforcement learning algorithms [14] can optimize DT by training an AI agent to make decisions based on a reward system. The agent learns from historical delivery data and real-time information to take actions that minimize DT while ensuring the safety and integrity of fragile and sensitive products. The algorithm continuously learns and adapts its decision-making process to improve delivery efficiency.

Particle swarm optimization (PSO) [15] is a population-based optimization algorithm that mimics the behavior of bird flocking or fish schooling. In the context of DT optimization, PSO can be used to simulate the movement of particles (representing potential delivery routes) in a search space. The particles communicate and update their positions based on their own best-known position and the best-known positions of other particles. This iterative process helps find the optimal route that minimizes DT while considering the fragility and sensitivity of the products.

Ant colony optimization (ACO) [15] is an algorithm inspired by the foraging behavior of ants. In the context of DT optimization, ACO can be used to find the shortest path between the origin and destination by simulating the pheromone trail left by ants. The algorithm optimizes the delivery route by adjusting the pheromone trail based on factors such as traffic conditions, delivery constraints, and the fragility of the products. The route with the strongest pheromone trail represents the optimal path that minimizes DT.

These algorithms can be applied to optimize DT for fragile and sensitive products. The choice of algorithm will depend on the specific requirements, the available data, and the complexity of the delivery system.
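As a toy illustration of the genetic-algorithm idea described above (not a production routing system), the sketch below evolves delivery orders for a hypothetical distance matrix using a mutation-only evolutionary search with swap mutations.

```python
import random

def route_length(route, dist):
    """Total travel cost of visiting the stops in the given order."""
    return sum(dist[route[i]][route[i + 1]] for i in range(len(route) - 1))

def evolve_route(dist, population_size=50, generations=200, seed=0):
    """Mutation-only evolutionary search over delivery orders (toy sketch)."""
    rng = random.Random(seed)
    n = len(dist)
    population = [rng.sample(range(n), n) for _ in range(population_size)]
    for _ in range(generations):
        # Keep the shortest half of the routes as survivors.
        population.sort(key=lambda r: route_length(r, dist))
        survivors = population[: population_size // 2]
        # Each survivor produces one child by swapping two random stops.
        children = []
        for parent in survivors:
            child = parent[:]
            i, j = rng.sample(range(n), 2)
            child[i], child[j] = child[j], child[i]
            children.append(child)
        population = survivors + children
    return min(population, key=lambda r: route_length(r, dist))

# Hypothetical symmetric distance matrix (e.g., minutes between four stops).
distances = [
    [0, 10, 15, 20],
    [10, 0, 35, 25],
    [15, 35, 0, 30],
    [20, 25, 30, 0],
]
best_route = evolve_route(distances)
print(best_route, route_length(best_route, distances))
```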

2.2 Objective

The objective of minimizing the DT for transporting fragile and sensitive products is to ensure timely and efficient transportation while maintaining the integrity and quality of these goods. The primary goal is to reduce the time it takes for the products to reach their destination, ensuring that they remain in optimal condition and meet the required delivery deadlines.


2.3 Advantages

Minimizing DT provides several advantages in transporting fragile and sensitive products. It helps to minimize the exposure of these goods to potential hazards, such as temperature fluctuations, vibration, or rough handling, which could potentially result in damage or degradation. Reducing the transit time increases the likelihood of preserving the quality and freshness of perishable items, ensuring they reach the end consumer in the best possible state. Additionally, minimizing DT enhances customer satisfaction by meeting or exceeding their expectations regarding prompt delivery.

2.4 Challenges

Minimizing DT for fragile and sensitive products can present certain challenges. These challenges include ensuring adequate handling and packaging techniques to protect the products throughout transportation. Fragile items require careful handling, and sensitive products may have specific temperature or humidity requirements that must be maintained during transit. Coordinating efficient logistics operations, optimizing routes, and overcoming potential obstacles like traffic congestion or unforeseen delays are also challenges in minimizing DT. Balancing speed with the need for careful handling and quality control can be complex, requiring coordination and collaboration among the stakeholders involved in the transportation process.

2.5 Delivery Time Description

DT for the transportation of fragile and sensitive products refers to the time it takes to transport these items from the point of origin to the intended destination while ensuring their integrity and preserving their quality. It is an important factor to consider in the sustainability assessment of logistics solutions [16]. It involves considering these products' specific handling requirements and potential risks to minimize delays and ensure timely delivery. Efficient route planning, streamlined logistics processes, careful packaging, and real-time monitoring are crucial in optimizing DT for transporting fragile and sensitive goods. The goal is to minimize transit time while prioritizing the safety and condition of the products throughout the transportation process. Consider the following quantities:

D_L: the delivery time without emerging technologies.
D_{Lp}: the planned delivery date, i.e., the date on which the product or order is expected to be delivered.


D_{Lr}: the actual delivery date, i.e., the specific date on which the product or order was successfully delivered.
N_c: the number of orders, i.e., the total quantity of orders to be delivered.

The delivery time for conventional transport is then given by

D_L = \frac{D_{Lp} - D_{Lr}}{N_c}    (1)

The DT with BD, AI, and IoT is determined by deducting the processing times of the BD (T_{BD}), AI (T_{AI}), and IoT (T_{IoT}) transactions from the conventional delivery time D_L. This calculation provides a more precise estimation of the delivery duration, empowering companies to enhance their transportation and logistics operations and elevate customer service. The delivery time incorporating the emerging technologies, D_{LT}, can be computed using formula (2):

D_{LT} = \frac{D_{Lp} - D_{Lr} - T_{BD} - T_{AI} - T_{IoT}}{N_c}    (2)

As shown by Eq. (2), advanced technologies such as BD, AI, and IoT can enhance DT by improving the quality of transportation, particularly when dealing with fragile and sensitive products. These technologies enable real-time monitoring, data analysis, and decision-making, leading to optimized routes, reduced handling risks, and enhanced overall efficiency. By leveraging these advancements, businesses can ensure timely and secure deliveries, minimizing the chances of damage or spoilage.
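A direct, illustrative implementation of formulas (1) and (2) is sketched below; it assumes the planned and actual delivery dates and the processing times have already been converted to a common numeric unit such as days, and all names simply follow the notation above.

```python
def conventional_delivery_time(d_lp: float, d_lr: float, n_c: int) -> float:
    """D_L = (D_Lp - D_Lr) / N_c, with all dates expressed in days."""
    return (d_lp - d_lr) / n_c

def technology_assisted_delivery_time(d_lp: float, d_lr: float, n_c: int,
                                      t_bd: float, t_ai: float, t_iot: float) -> float:
    """D_LT = (D_Lp - D_Lr - T_BD - T_AI - T_IoT) / N_c."""
    return (d_lp - d_lr - t_bd - t_ai - t_iot) / n_c
```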

3 The Proposed Optimization

3.1 Model System

Figure 1 depicts the synchronization architecture for departure and arrival and the loading and unloading operations of goods in diverse transportation modes within logistics, including road transport, rail transport, and maritime transport. The synchronization system comprises a tracker [17] equipped with all the necessary functionalities to track transportation means effectively. It ensures accurate geolocation and detects movements, shocks, and changes in angle, enabling the gathering of valuable information about the goods where the device is installed. Additionally, it collects data through various sensors, such as temperature and shock sensors, transmitting them to a platform that promptly provides real-time results based on BD analysis.
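A hypothetical example of the kind of sensor payload such a tracker might transmit to the analytics platform is shown below; the field names, values, and identifier are illustrative assumptions, not part of the system described in the paper.

```python
import json

# Illustrative tracker reading: geolocation plus condition sensors.
payload = {
    "shipment_id": "SHIP-0001",           # hypothetical identifier
    "timestamp": "2024-01-15T08:30:00Z",
    "latitude": 35.5785,
    "longitude": -5.3684,
    "temperature_c": 4.2,
    "humidity_pct": 61.0,
    "shock_g": 0.8,
    "tilt_deg": 3.5,
}

message = json.dumps(payload)
# In a real deployment this message would be published to the platform,
# e.g. over MQTT or HTTPS, for Big Data analysis.
print(message)
```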


Fig. 1 Synchronized arrival and departure times enhanced by Big Data, Artificial Intelligence, and Internet of Things technologies

The synchronization devices are crucial in accurately calculating the geographical position transmitted to the destination platform. By offering users a comprehensive real-time overview of their activities and assets, this technology enhances efficiency and productivity by synchronizing arrival and departure times. Consequently, it mitigates delays in queues and improves overall DT. This amalgamation of cutting-edge technologies facilitates the acquisition of reliable and precise location data applicable across multiple domains, including navigation, monitoring, and fleet management.

3.2 The Proposed Optimization for Delivery Time

The limitations of conventional transport optimization, which does not incorporate advanced technologies, are widely acknowledged. These limitations include extended DT, slow convergence, low optimization efficiency, and delays in final or intermediate delivery station queues. Such challenges significantly impede the improvement of delivery efficiency for fragile items and contribute to queue delays. To overcome these issues, this paper introduces a novel approach: a constrained synchronized optimization method. This method aims to address the limitations mentioned above and enhance delivery efficiency. The flowchart in Fig. 2 illustrates the sequential steps involved in this optimization method.

Advanced technology is appropriately utilized to optimize DT. The proposed method introduces a novel cyclic synchronization approach to address turnaround delays resulting from departure and arrival time coordination across all modes of transportation for finished goods. The optimization parameters are carefully monitored and controlled at each loading and unloading stage based on the results of a significance analysis.


Fig. 2 Detailed sequence of steps in the proposed optimized method

Furthermore, various methods and surrogate models can be employed in the optimization process, considering the significance of each level.

3.3 Specification Parameter

The parameters used to determine the delivery delays, whether with or without emerging technologies, are summarized in Table 1, which provides an overview of the parameters and tools used to determine DT. These parameters help protect and secure fragile and sensitive products throughout transportation. By accurately assessing and monitoring these factors, organizations can enhance the efficiency and reliability of their delivery operations, minimizing potential risks and ensuring the safe arrival of goods.


Table 1 Parameters for determining DT with and without emerging technologies

Parameters for determining DT                       The tool used to calculate the DT
Duration of study                                   24 h
The emerging technologies employed in the study     Big Data, the Internet of Things (IoT), and Artificial Intelligence (AI)
Means of transportation used                        Car, as it pertains to road transportation
The estimated measured delays                       D_{Lp}, D_{Lr}, N_c, T_{BD}, T_{AI}, and T_{IoT}


4 Optimization Results Analysis

Utilizing BD, AI, and IoT technology in transportation and logistics can significantly enhance the DT of fragile and sensitive products. This section will assess and analyze this critical parameter for these three technologies.

4.1 Comparing CT with Integrated Recent Transportation Technologies

Figure 3 illustrates the impact of BD on DT by evaluating its performance.

Fig. 3 Comparison between CT and transport utilizing BD technology


As depicted in Fig. 3, BD is pivotal in enhancing and optimizing DT, improving efficiency and effectiveness across the entire delivery process. When comparing CT with transport leveraging BD technology, notable differences emerge in efficiency, accuracy, and overall delivery performance.

Figure 4 provides a visual representation of the impact of AI on DT, depicting how AI enhances and optimizes the delivery process and leads to remarkable improvements in efficiency, accuracy, and overall performance. The figure illustrates how AI technologies contribute to streamlining operations and achieving significant levels of effectiveness throughout the entire delivery journey.

Figure 5 provides a visual representation of the impact of IoT on the DT process.

Fig. 4 Comparison between CT and the utilization of AI technology in the transportation industry

Fig. 5 Comparison between CT and the utilization of IoT technology in the transportation industry


Figure 5 serves as a visual representation of the profound impact of IoT on enhancing and optimizing the delivery process, resulting in notable advancements in efficiency, accuracy, and overall performance. The figure vividly showcases how integrating IoT technologies effectively streamlines operations and ensures significant levels of effectiveness throughout the delivery journey. By leveraging IoT, companies can achieve improved operational efficiency, enhanced real-time tracking and monitoring capabilities, and seamless coordination among various stakeholders involved in the delivery process. These benefits contribute to delivering a superior customer experience and fostering greater trust and reliability in the transportation industry.

4.2 Evaluation and Comparison of Optimization Results

Figure 6 illustrates the evaluation of findings and comparison, assessing the impact of BD, AI, and IoT technologies on the transportation and logistics industry. It provides a comprehensive analysis and a contrasting perspective with CT. The figure highlights the significant advancements achieved by integrating these emerging technologies, showcasing the transformative effects on efficiency, accuracy, and overall performance in the transportation sector. Table 2 presents a comparative analysis of performance values derived from Fig. 6 between CT and the studied approach in terms of DT for different emerging technologies.

Fig. 6 Evaluation of findings and comparison: Assessing the impact of BD, AI, and IoT technologies on the transportation industry and contrasting with CT

Table 2 Comparative analysis of performance values using various emerging technologies

Emerging technologies               BD     AI     IoT    BD + AI + IoT
Optimization of DT in percentage    6%     11%    13%    30%


The results obtained from Table 2 demonstrate the significant advantages of optimizing the three emerging technologies, leading to improvements in DT. Specifically, the findings indicate DT enhancements of more than 6% for BD, 11% for AI, and 13% for the IoT, compared to CT. It is important to note that these improvements in DT are achieved while ensuring a high level of security.

5 Conclusion

In conclusion, this article underscores the substantial benefits of implementing BD, AI, and IoT in transporting delicate and sensitive goods, particularly in reducing DT. The study conducted over a defined period demonstrates the positive outcomes of integrating these innovative technologies, reducing delivery delays and improving punctuality. The evaluation and comparison with CT reveal significant advancements achieved by utilizing BD, AI, and IoT. These implementations have yielded remarkable improvements in DTs, surpassing those of CT while upholding stringent security measures. Moreover, the seamless synchronization of different modes of transportation further enhances overall efficiency and reliability in the transportation process. These advancements exemplify the transformative impact of emerging technologies on the efficiency, accuracy, and overall performance of the transportation and logistics industry. In future research, we intend to extend our focus to explore the effects of other emerging technologies, such as cloud computing, mobility simulation, and blockchain, on DT.

References 1. Dong C, Akram A, Andersson D, Arnäs P-O, Stefansson G (2021) The impact of emerging and disruptive technologies on freight transportation in the digital era: current and future trends. The International Journal of Logistics Management 32(2):386–412. https://doi.org/10.1108/ IJLM-01-2020-0043 2. Loukili Y, Lakhrissi Y, Ali SEB (2022) Geospatial Big Data Platforms: A Comprehensive Review. KN J. Cartogr. Geogr. Inf. 72:293–308. https://doi.org/10.1007/s42489-022-00121-7 3. Aboushelbaya, R., Aguacil, T., Huang, Q., Norreys, P.A. (2022). Efficient Location-Based Tracking for IoT Devices Using Compressive Sensing and Machine Learning Techniques. In: Nikeghbali, A., Pardalos, P.M., Raigorodskii, A.M., Rassias, M.T. (eds) High-Dimensional Optimization and Probability. Springer Optimization and Its Applications, vol 191. Springer, Cham. https://doi.org/10.1007/978-3-031-00832-0_12 4. Abduljabbar R, Dia H, Liyanage S, Bagloee SA (2019) Applications of Artificial Intelligence in Transport: An Overview. Sustainability 11(1):189. https://doi.org/10.3390/su11010189 5. Day, R.J., Salehi, H., Javadi, M. (2019) IoT Environmental Analyzer using Sensors and Machine Learning for Migraine Occurrence Prevention. In: Proceedings of the 18th IEEE International Conference on Machine Learning and Applications (ICMLA), Boca Raton, FL, USA, 2019, pp. 1460–1465. doi: https://doi.org/10.1109/ICMLA.2019.00239. 6. Walter S (2023) AI impacts on supply chain performance: a manufacturing use case study. Discov Artif Intell 3:18. https://doi.org/10.1007/s44163-023-00061-9


7. Chabel, S., Ar-Reyouchi, E.M. (2023). Artificial Intelligence: An Effective Protocol for Optimized Baggage Tracking and Reclaim. In: Shakya, S., Balas, V.E., Haoxiang, W. (eds) Proceedings of Third International Conference on Sustainable Expert Systems. Lecture Notes in Networks and Systems, vol 587. Springer, Singapore. https://doi.org/10.1007/978-981-197874-6_56. 8. Matsuzaka Y, Yashiro R (2023) AI-Based Computer Vision Techniques and Expert Systems. AI 4(1):289–302. https://doi.org/10.3390/ai4010013 9. Sefraoui, O., Bouzidi, A., Ghoumid, K., Ar-Reyouchi, E.M. (2023). An Attractive Proposal Based on Big Data for Sentiment Analysis Using Artificial Intelligence. In: Jacob, I.J., Kolandapalayam Shanmugam, S., Izonin, I. (eds) Data Intelligence and Cognitive Informatics. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-19-60048_26 10. Ar-Reyouchi EM, Ghoumid K, Ar-Reyouchi D, Rattal S, Yahiaoui R, Elmazria O (2022) Protocol wireless medical sensor networks in IoT for the efficiency of healthcare. IEEE Internet of Things J 9(13):10693–10704. https://doi.org/10.1109/JIOT.2021.3125886 11. Rattal, S., Ghoumid, K., Ar-Reyouchi, E.M. (2022). A Flexible Protocol for a Robust Hospitals Network Based on IoT. In: Pandian, A.P., Fernando, X., Haoxiang, W. (eds) Computer Networks, Big Data and IoT. Lecture Notes on Data Engineering and Communications Technologies, vol 117. Springer, Singapore. https://doi.org/10.1007/978-981-19-0898-9_69 12. Ichoua1, S., Gendreau, M., Potvin, JY. (2007). Planned Route Optimization For Real-Time Vehicle Routing. In: Zeimpekis, V., Tarantilis, C.D., Giaglis, G.M., Minis, I. (eds) Dynamic Fleet Management. Operations Research/Computer Science Interfaces Series, vol 38. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-71722-7_1. 13. W. Ongcunaruk et al.Genetic algorithm for a delivery problem with mixed time windows Computers & Industrial Engineering (2021). 14. Yuan Y, Li H, Ji L (2021) Application of Deep Reinforcement Learning Algorithm in Uncertain Logistics Transportation Scheduling. Comput Intell Neurosci 25(2021):5672227. https://doi. org/10.1155/2021/5672227.PMID:34608384;PMCID:PMC8487393 15. Dzalbs I, Kalganova T. Accelerating supply chains with Ant Colony Optimization across a range of hardware solutions. Comput Ind Eng. 2020 Sep;147:106610. doi: https://doi.org/10. 1016/j.cie.2020.106610. Epub 2020 Jun 29. PMID: 32834426; PMCID: PMC7323691. 16. Dai K, Zhu Z, Tang Y et al (2021) Position synchronization tracking of multi-axis drive system using hierarchical sliding mode control. J Braz Soc Mech Sci Eng 43:204. https://doi.org/10. 1007/s40430-021-02906-9 17. Papoutsis K, Dewulf W, Vanelslander T et al (2018) Sustainability assessment of retail logistics solutions using external costs analysis: a case study for the city of Antwerp. Eur Transp Res Rev 10:34. https://doi.org/10.1186/s12544-018-0297-5

CNN Approach for Identification of Medicinal Plants Tushar Kumar Maurya, Aryaman Singh, and V. Pandimurugan

1 Introduction

Around the planet, there are almost countless plant species. From ancient times to the present, several medications have been produced from certain plants that are utilized in medicine. There are around 449 recognized medicinal plants in India, and many types of traditional and contemporary medicine can be made from these plants. Even for seasoned botanists, categorizing medicinal plants is a challenging and time-consuming undertaking because of their enormous number and because it heavily depends on the inherited wisdom of an experienced botanist. In this study, computer vision and machine learning techniques are used to present a fully automated system for the identification of medicinal plants. Some characteristics of this chapter include:

• Traditional medicine has long made use of plants due to their nutritional and medicinal properties.
• Multiclass classification: The project involves identifying multiple plant species, making it a multiclass classification problem.
• Image processing: Utilizing convolutional neural networks, the project involves processing and analyzing images of various plant species.
• Feature extraction: The CNN model's ability to learn how to extract pertinent information from the photographs makes it simpler to identify various plant species.



1.1 Motivation

The motivation behind the CNN-based identification for medicinal plants project is driven by the need to accurately identify and classify various species of medicinal plants. The traditional methods of identifying medicinal plants, based on visual and manual analysis, are time-consuming and require specialized knowledge, which makes the process inefficient and error prone.

• CNN-based identification can automate the identification process, enabling quick and accurate identification of medicinal plants. This can have significant implications for the pharmaceutical industry, as identifying new plant species with medicinal properties could lead to the development of new drugs and medicines.
• The initiative also has the potential to increase the effectiveness of conventional medical procedures by making it possible to identify uncommon therapeutic plants. This can result in the identification of novel, more potent medicines and remedies that can be applied to a variety of ailments and disorders.
• Moreover, the project can contribute to the preservation of natural resources by enabling the identification and monitoring of endangered plant species. This can lead to the development of conservation strategies aimed at preserving these important plant species for future generations.

Overall, the CNN-based identification for medicinal plants project has the potential to make a significant impact on the fields of medicine, traditional medicine practices, and environmental conservation.

1.2 Scope

The CNN-based identification for medicinal plants project has significant scope in the field of botany and medicine. The project can assist botanists and researchers in accurately identifying different species of medicinal plants, which can be used for the development of new drugs and medicines. It can also be used to identify the presence of medicinal plants in agricultural fields, helping farmers to make more informed decisions. Moreover, the project has potential applications in environmental conservation efforts, enabling the identification and monitoring of endangered plant species. Overall, the project has the potential to advance our understanding of medicinal plants and contribute to the development of new medicines and environmental conservation efforts.


2 Problem Statement

The use of plants as a source of medicine has been practiced for centuries, and there is increasing interest in identifying plants with potential medicinal properties. However, identifying medicinal plants accurately and efficiently is a tough task due to the huge variety of species and the differences in their appearance. This is where an automated, CNN-based approach can potentially offer a solution. With the increasing demand for medicinal plants, it is essential to have a reliable and efficient method of identification to ensure that the correct species are being used for medicinal purposes. It has been demonstrated that CNNs work well for a variety of image recognition tasks, and this project seeks to leverage this technology for plant identification.

However, there are issues with this project that must be resolved. The variation in the appearance of plants due to elements like sunlight, color, and shape is one of the main issues. These variations can make it difficult for the CNN to accurately classify plants. Additionally, there is a need for a comprehensive dataset of medicinal plant images that can be used to train the CNN effectively. Another challenge in developing a CNN-based identification system for medicinal plants is the potential for misclassification due to similarities between different plant species. Some species of plants may have similar features, making it difficult to distinguish between them using just visual characteristics. Therefore, the CNN model developed in this project must be capable of recognizing subtle differences in plant features to ensure accurate classification. Additionally, the CNN model's robustness to changes in environmental factors, such as lighting and background, needs to be addressed. Environmental factors can significantly affect image quality, making it challenging to extract features for classification. One way to enhance the model's accuracy and efficiency is to fine-tune its hyperparameters, such as the number of convolutional layers and the batch size. Another approach is to augment the training dataset by adding more images or by applying transformations, such as rotation, cropping, and flipping, to the existing images.

3 Literature Review

The plants are also challenging to distinguish due to their almost identical shape and color. In a study by Zhang et al. (2019), titled "Plant identification based on deep convolutional neural networks" [2], the authors developed a plant identification system that used a CNN to classify images of plant leaves. The study used a dataset of 102 plant species and achieved an accuracy of 97.18%. The authors concluded that CNN-based identification systems have the potential to provide reliable and accurate plant identification, which can be beneficial for agriculture and environmental monitoring. A review article by Kumar et al. (2020), titled "Plant disease diagnosis using deep learning: a review," explored the use of deep learning techniques, including CNNs,


in plant disease diagnosis [5]. The authors highlighted the potential of CNN-based models to accurately identify plant diseases based on plant images, which can aid in early detection and treatment. The study emphasized the need for more research on CNN-based plant disease diagnosis to improve its efficiency and accuracy.

"Identification of Different Plants through Image Processing Using Different Machine Learning Algorithms" by Vikhao et al. (2020) reviews several studies that have utilized image processing techniques for plant identification [13]. It discusses the use of feature extraction methods, such as shape, texture, and color descriptors, to characterize plant images. Additionally, it explores the effectiveness of different machine learning algorithms in classifying plant species based on these extracted features. The survey aims to provide insights into the current state-of-the-art approaches and identify potential areas for future research in this field [1–10]. The survey summarizes the existing research and highlights the strengths and limitations of different approaches [7, 10–19].

"A novel framework for automatic plant species identification using deep learning" by Cheng et al. (2020) developed a plant identification system that utilized a CNN to classify plant images [1]. The study used a dataset of ten plant species and achieved an accuracy of 97.6%. The authors concluded that CNN-based identification systems have the potential to revolutionize plant identification by improving its efficiency and accuracy. "Deep learning-based plant identification system using convolutional neural network" by Jang et al. (2019) proposed a plant identification system that utilizes deep learning techniques such as CNNs to classify plants based on their images [6]. The authors used a dataset of 185 plant species and achieved an accuracy of 98.3%. The study showed the potential of CNN-based plant identification systems to accurately classify plants and improve the efficiency of plant identification.

"Plant recognition by AI - Deep neural nets, transformers, and kNN in deep embeddings" by Picek et al. (2022) presents a review of plant recognition using AI techniques, specifically deep neural networks, transformers, and k-nearest neighbors (kNN) in deep embeddings [11]. It discusses the application of these algorithms in plant classification, emphasizing their strengths and limitations. The review aims to provide a comprehensive understanding of the current advancements in plant recognition through AI-based approaches [15, 20]. They come to the conclusion that a competitive alternative that outperforms direct classification is to train an image retrieval system and then classify the results using nearest neighbors.

All things considered, these experiments show how effective and precise plant identification, disease diagnosis, and recognition systems based on CNN can be. The experiments highlight the need for additional study to create more reliable and precise CNN-based plant identification systems.


4 Proposed System

The process consists of preliminary processing of the dataset, data preprocessing, application of classification algorithms to properly classify the plant images, and evaluation of performance using several metrics. The proposed system architecture is shown in Fig. 1.

4.1 Dataset

Several deep learning models were trained and evaluated on the provided datasets in order to construct the real-time plant species identification system. Below are descriptions of the datasets, the preprocessing procedures, and the training and testing of the models.

The PlantCLEF 2015 dataset (Table 1) is a collection of images specifically curated for the purposes of the ImageCLEF Plant Identification Task in 2015. It is a subset of the larger PlantCLEF dataset, which aims to promote research and development in plant species recognition and identification using machine learning and computer vision techniques. This dataset consists of a diverse range of plant images, including various species and growth stages, and serves as a benchmark for evaluating plant recognition algorithms and systems.

The term "UBD Herbarium" refers to the herbarium collection connected to the institution in Brunei known as University Brunei Darussalam (UBD). A herbarium is a collection of many different kinds of plants, including pressed and dried plants, as well as related data like taxonomy, location, and collection date.

Fig. 1 Proposed system architecture


Table 1 PlantCLEF dataset

              Total     Leaf    Leaf-scan
Train         18,949    6122    9522
Test          2380      767     1190
Validation    2379      781     1134

Table 2 UBD Botanical Dataset

Train         1691
Test          157
Validation    249
Total         2097

The UBD Botanical Dataset is summarized in Table 2.

4.2 Data Preprocessing

• A common image processing method used to lower picture noise and enhance image quality is Gaussian filtering. The method blurs the image by convolving it with a Gaussian kernel, which is a bell-shaped curve. Various computer vision algorithms use this technique to improve the image structure at various scales. The Gaussian filter is modeled on the human visual perception system and blurs images more naturally than other filters.
• Otsu's threshold method is also used to determine the best threshold value, which maximizes the separation between the two classes, using a histogram of pixel intensities. This technique is frequently employed in image processing tasks like edge detection, image segmentation, and object recognition (a minimal sketch of these two preprocessing steps is shown after this list). More selected data may be available than is actually needed.
• Algorithms may require more computing and memory, as well as significantly longer processing times, for larger volumes of data. A smaller amount of data helps to obtain accurate results in less time and makes it easier to test ideas, rather than always working with the complete dataset.
• Data augmentation is a machine learning technique used to produce extra variations of the original data to artificially enlarge a training dataset. It entails creating new data from old data by subjecting the original data to various transformations, such as flipping, rotating, scaling, and cropping the photographs. It aims to increase the generalizability of machine learning models and prevent overfitting, which happens when a model becomes overly complicated and memorizes the training data instead of discovering general patterns that can be applied to new data. It is particularly useful in object detection, where the size of the training dataset is often limited, and collecting additional data can be expensive


and time-consuming. By augmenting the training dataset, the model can learn to recognize objects under different lighting conditions, angles, and perspectives, which increases its robustness and accuracy when tested on new, unseen data.
• There are several techniques used for data augmentation, including geometric transformations, color transformations, and noise injection. Geometric adjustments, such as scaling, rotating, and cropping, entail altering the image's shape and orientation. Color transformations alter the color of the image, for example by changing the brightness, contrast, and saturation. Noise injection adds random noise to the image, simulating different conditions that the model may encounter in the real world. Overall, these techniques improve the model's performance.
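A minimal sketch of the Gaussian filtering and Otsu thresholding steps, assuming OpenCV and a leaf image on disk (the file name is illustrative), is given below.

```python
import cv2

# Load a leaf image (hypothetical path) and convert to grayscale.
image = cv2.imread("leaf.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Gaussian filtering to reduce noise before segmentation.
blurred = cv2.GaussianBlur(gray, (5, 5), 0)

# Otsu's method automatically selects the threshold that best
# separates foreground (leaf) from background.
threshold_value, mask = cv2.threshold(
    blurred, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU
)
print("Otsu threshold:", threshold_value)
```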

4.3 Pretrained CNN Architecture

We will first choose the input picture size for the CNN architecture for plant identification, which will be based on the dimensions and resolution of the images in our dataset. The number of convolutional layers, their matching filter sizes, and the number of filters will all be determined next. The kind and number of pooling layers to be employed will also be decided. The size and number of fully connected layers, as well as the activation function, will then be decided. Finally, in order to train the network, we will choose a suitable optimizer and loss function. The main components of a CNN are the convolutional layers, pooling layers, fully connected layers, and the output layer. Each layer performs a specific function in the process; a minimal sketch of such an architecture is given after the following list.

• The input layer, which is the initial layer in a CNN, receives raw picture pixels as input. The convolutional layer, which comes after the input layer, is in charge of extracting features from the input image. Numerous filters, also referred to as kernels, make up the convolutional layer. These filters convolve over the input image and compute dot products to extract features.
• Each filter detects specific patterns in the input image, such as edges, curves, and shapes. The filters are learned through backpropagation, where the model adjusts the filter weights to minimize the loss function during training. The convolution operation decreases the input image's spatial dimensionality while maintaining its key characteristics.
• After the convolutional layer, the output is passed through a rectified linear unit (ReLU) activation function, which applies an element-wise threshold to the output values. ReLU introduces nonlinearity to the model and helps to model more complex relationships between the input and output. The feature maps are then downscaled, and the computational complexity of the model is decreased, as the pooling layer processes the ReLU layer's output. The pooling layer commonly employs either maximum (max) or average (avg) pooling, where the output is the maximum or average value of a subregion of the feature map.


• The fully connected layers follow the pooling layer and are responsible for classifying the features into the desired classes. A group of neurons that are connected to every neuron in the layer before form the completely connected layer. In order to produce the output, each neuron in the fully connected layer computes a weighted sum of the inputs and then applies an activation function. The output layer of the CNN is the final layer that produces the prediction for the input image.
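A minimal PyTorch sketch of the conv–ReLU–pool–fully-connected structure described above is shown below; the layer sizes, input resolution, and number of classes are illustrative assumptions, not the exact architecture used in the study.

```python
import torch
import torch.nn as nn

class PlantCNN(nn.Module):
    """Small CNN: two conv/ReLU/max-pool blocks followed by fully connected layers."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 56 * 56, 128), nn.ReLU(),
            nn.Linear(128, num_classes),   # output layer produces class scores
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

model = PlantCNN(num_classes=10)
logits = model(torch.randn(1, 3, 224, 224))   # dummy 224 x 224 RGB input
print(logits.shape)                           # torch.Size([1, 10])
```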

5 Result and Discussion

After training is finished, the model is ultimately assessed. Discussions and an analysis of the objective results are included in this section. The effectiveness of the implementation is assessed using standard metrics including accuracy, specificity, sensitivity, precision, and F1 score. Their equations are given below, where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively.

Accuracy = \frac{TP + TN}{TP + TN + FP + FN}    (1)

Sensitivity = \frac{TP}{TP + FN}    (2)

Precision = \frac{TP}{TP + FP}    (3)

Specificity = \frac{TN}{TN + FP}    (4)

F1\;score = \frac{2 \cdot (Sensitivity \cdot Precision)}{Sensitivity + Precision}    (5)
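For illustration, these metrics can be computed directly from the confusion-matrix counts as sketched below; the example counts are placeholders, not results from the study.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute the metrics of Eqs. (1)-(5) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)
    f1 = 2 * sensitivity * precision / (sensitivity + precision)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "specificity": specificity,
        "f1": f1,
    }

# Placeholder counts for a single class, for illustration only.
print(classification_metrics(tp=90, tn=85, fp=10, fn=15))
```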

We tested the effectiveness of our method using three freely available dataset configurations for identifying plant species: the PlantCLEF dataset, the UBD Botanical dataset, and their combination. It has been found that the proposed CNN performs better than all other methods. The use of Otsu's method, which captures important information that a plain CNN loses during training, is responsible for this result. The accuracy rates of the different models are given in Table 3. The proposed CNN model outperforms the other machine learning algorithms, scoring 89.3% in predicting the correct plant.

Table 3 Different model accuracy

Model name                 Accuracy (%)
Random forest              86.1
Support vector machine     81.5
K-nearest neighbors        80.0
Proposed CNN model         89.3

CNN is so effective at identification because of the following features:

• Hierarchical feature extraction: CNNs are adept at automatically learning hierarchical features from raw pixel data because they were originally created for processing images. Their convolutional layers, which extract significant features at various levels of abstraction, can capture local patterns and spatial correlations. The complexity and variety found in images may not be fully captured by classic algorithms like random forest, SVM, and kNN, which often rely on handcrafted features.
• End-to-end learning: CNNs are capable of end-to-end learning, which enables them to learn from the underlying raw data (images) while simultaneously improving their internal representations and classifiers. As a result, manual feature engineering is no longer necessary, increasing the flexibility and dataset adaptability of CNNs.
• Scalability: Thanks to their ability to take advantage of parallel processing, effective memory management, and optimization approaches like mini-batch training, CNNs have shown good scalability to big datasets.

The performance of CNN in image classification is shown in Fig. 2. Despite the widespread usage of random forest, SVM, and kNN in machine learning applications, CNNs are more competitive in image recognition tasks due to their greater ability to capture complicated image patterns and their lack of reliance on manual feature engineering.

Fig. 2 Performance of CNN in image classification


It is important to keep in mind that the choice of algorithm depends on the particular problem, the data that is available, and the computing power available. In certain cases, conventional algorithms may still be a good choice for specific image recognition jobs.

6 Conclusion

In conclusion, the proposed CNN-based model for the identification of medicinal plants has shown promising results in accurately classifying plant species. The use of convolutional layers and data augmentation techniques has helped to extract relevant features and improve the model's performance. The dataset used in this project was carefully curated and preprocessed to ensure its quality and suitability for the task at hand. However, there is still room for improvement in this project. Expanding the dataset to include more plant species and varieties is one way to make improvements. This could help the model learn more diverse features and enhance its accuracy. Another possible improvement is to experiment with different CNN architectures and hyperparameters to find the optimal combination for this specific task, which could further improve the accuracy of the model. The proposed model has demonstrated the potential to revolutionize the identification of medicinal plants. Further improvements and advancements in this field could have significant implications for the development of new medicines and the conservation of plant species.


OBSERVO: Teaching Strategy Recommendation by Monitoring Student Behavior Patterns Rishaan Jacob Kuriakose, Sanchit Raj, M. Suguna, and C. U. Om Kumar

1 Introduction

1.1 Problem Statement

The majority of teachers deal with students' wandering attention regularly. Students are frequently spotted dozing off, napping, looking away from the front of the room, texting, or working on assignments for other classes. This disregard for the class is an issue, and instructors frequently find it difficult not to take it personally. It was discovered that 80% of the pupils in a classroom were not paying attention, and a sizable portion had dyslexia or poor anger management. Although education is a vital human right, current efforts to increase the number of students attending schools and to make lectures enjoyable are adequate at best. Even with the development of technology, the digitalization of education, educational startups, workshops, and the availability of online tutors, the rate of progress is insufficient. School dropouts are becoming more and more common. Few children who are handicapped or have differing levels of ability attend school, because their guardians are concerned about their performance and safety in public settings. Even the students who do attend school have difficulty staying focused in class.

R. J. Kuriakose (B) · S. Raj · M. Suguna · C. U. Om Kumar School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, India e-mail: [email protected] S. Raj e-mail: [email protected] M. Suguna e-mail: [email protected] C. U. Om Kumar e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 I. J. Jacob et al. (eds.), Data Intelligence and Cognitive Informatics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-7962-2_23


It was necessary to research the numerous situations that students encounter in lectures, starting with a few particular questions. How many pupils in a class are not paying attention at any one time? How long do these breaks last? How many such breaks occur throughout each class period? When during the class hour are students' thoughts more prone to wander? Do some teaching techniques correspond to more or less attentiveness? A study of inattention carried out in three separate chemistry classes (URL [1]) addressed all of these issues. Students in those classes utilized clickers to self-report attention lapses to the study team rather than to the course instructor. Students marked each lapse by pressing one of three buttons: one for lapses of one minute or less, another for lapses of two to three minutes, and a third for lapses of five minutes or more. Only after realizing their focus had strayed did students report lapses in attention. Because of this, the student data gathered for this study only includes the moments just after attention gaps, not the lapses themselves. The results of this study reveal that, contrary to popular opinion, students do not listen attentively throughout the whole 10–20 min of a lecture. Instead, during the lecture section, their attention shifts in shorter cycles between being interested and not engaged. When professors used "non-lecture related themes and concepts," such as demonstrations, group work, and clicker questions, students consistently reported fewer lapses. This outcome supports previous research showing that students involved in activities other than instructor lectures are more focused and engaged. The statistics demonstrating that there are noticeably fewer attention lapses during lecture parts immediately after a demonstration or clicker question are equally noteworthy. Students seem to be reinvigorated by the tempo shift and the opportunity for activity, which makes it simpler for them to pay more attention when the lecture continues. It was also observed that when lecturers used audio and visual aids in their lectures, students tended to respond to questions following the lectures with greater assurance and sophistication. The research's evident teaching implications are that instructional techniques, particularly those that engage students, can increase students' attention spans. The topic may be presented to students in a variety of ways through these exercises, which also help them focus when the activity is over. But how can a teacher tell when a certain pupil is paying attention and when he or she is not? It is important to keep in mind that a teacher may be handling classes of 20–50 pupils. While it is true that experience tends to make it simpler to comprehend students' general behavior, there are numerous instances in which pupils struggle to grasp the ideas that even experienced teachers attempt to teach.

1.2 Aims

Research was therefore initiated with the following aims in mind:


• Understand the causes of students' lack of attention, both in general and in specific scenarios.
• Analyze the ways in which it affects students' learning processes.
• Find an optimal solution that can resolve, or compensate for, students' lack of attention in class.

1.3 Competitive Analysis

While figuring out Observo's purpose and design, extensive research was conducted on existing systems and products intended to help the cause. The products listed below highlight the same issues mentioned above but are not analogous to the proposed product, nor are they notable competitors in the market, as they differ in their means of addressing the problem, cost, materials used, and so on.

Interactive Whiteboards (IWB). Interactive whiteboards have demonstrated a positive impact on students with special education needs and encourage a multisensory method of learning. The IWB offers a tangible interface intended to improve the learning capacities of children with special educational needs. However, their placement within the classroom, in the teacher's 'territory', affects the children negatively as users. In practice, interactive whiteboards are often used for low-level academic instruction and in limited ways, mainly for review games and teacher-led interactive websites. Teachers therefore think it is essential to develop their expertise in the area of IWBs, so that they can use these tools more widely and derive a greater degree of depth from them.

Flipped Classroom. Flipped learning is a pedagogical strategy in which the traditional idea of classroom-based learning is turned upside down, according to URL [2]. Students are introduced to the topic outside of class, and class time is used for peer discussions and teacher-led problem-solving exercises rather than for first exposure to the material. This concept enables the student to grasp the concept more easily in class and gives him/her more time for extended learning, extra problems, and additional sources. However, this may not match the teacher's preferences, owing to differences between the teacher's material and the sources referred to by the student, and it can lead to a disruptive relationship between the two parties. In addition, the risk of letting students take on the conceptual part of the subject on their own, even in a home environment with or without the presence of guardians, must not be overlooked.

Media: Scape LearnLab™. The LearnLab™ is a learning environment that combines furniture, technology, and work tools to facilitate various teaching and learning approaches. The screens are positioned in a triangular formation to provide equal access to the content and sightlines for everyone. There is no traditional front or back of the classroom, enabling all students to remain actively involved in the learning process. Face-to-face seating encourages engagement and team collaboration. Display screens and fixed and movable whiteboards offer information permanence and enable students to create, record, and share their work. However, students can tend to misuse this abundance of technology and deviate their attention from the core content of the subject.

After reviewing these products, Observo was fine-tuned to deal with the flaws observed in them. The device focuses on incorporating the benefits of technical advancements into education and consists of a thermal camera used to detect the movements of students in class and to help in the analysis of students' behavior. The device's straightforward design also makes it inexpensive and simple to implement in classrooms. In addition to analysis of the student's movements, analysis of his/her attention span helps one understand the student's interests and weaknesses, enabling the provision of teaching strategies suited to the particular child. This helps the teacher analyze the students' behavior from a third-person perspective. The system ensures that the teacher teaches the student better and, at the same time, does not make the teacher's contribution seem insignificant. Existing competing products are shown in Fig. 1.

Fig. 1 Existing competing products a IWB b Flipped classroom c Media: Scape LearnLab™. Source URL [2]


2 Theoretical Background

2.1 Background Technology

Thermal Cameras and Image Processing. Thermal cameras (Fig. 2), when combined with video analytics, have long been considered the best way to detect people's movements. They act as heat-vision cameras rather than cameras that rely on reflected light. Thermal images also help determine the body temperature of the subject and so help ascertain whether health is a possible cause of challenging student behavior. Vardonne [3] notes that, in order to present heat in a format appropriate for human vision, thermal security cameras convert the temperature of objects into shades of gray that are darker or lighter than the background.

• Thermographic cameras usually detect radiation in the long-infrared range of the electromagnetic spectrum (roughly 9000–14,000 nm or 9–14 µm) and produce images of that radiation, called thermograms.
• Thermal imagery is very rich in data, sensing small temperature variations down to 1/20th of a degree. The camera converts that infrared data into an electronic image that shows the apparent surface temperature of the object being measured. Warmer temperatures can be assigned a shade of red, orange, or yellow, whereas colder temperatures are typically given a shade of blue, purple, or green.
• Inside a thermal camera there is an array of tiny measuring devices that capture infrared radiation, called micro-bolometers, one for each pixel. The micro-bolometer records the temperature and then assigns that pixel an appropriate color.
• In short, infrared energy is emitted in proportion to the temperature of an object. The infrared energy from the objects is focused by the lens and passes through to the infrared detector, and all this information is passed to the computer for processing as an image.
• Secure footage, image quality, and cost-effectiveness are some of the other notable features.

Fig. 2 Thermal camera


Fig. 3 Images processed by thermal camera

For analysis of the data, the thermal images need to be processed. All thermal images are produced by the computer after associating the different temperature readings with their respective colors. Images processed by the thermal camera are shown in Fig. 3.

Wi-Fi Networking. One of the many proposed advantages of Observo is that the thermal cameras also provide a Wi-Fi connection, allowing the teachers to view the analyzed data from any device via the Internet. Wi-Fi's wavebands are best used in line-of-sight because of their relatively high absorption levels. As of 2019, at close range, some versions of Wi-Fi running on suitable hardware can achieve speeds of over 1 gigabit per second.

2.2 Face Detection and Recognition

As described by Eriksson and Anna [4] and Zhang et al. [5], face detection using the Haar feature-based cascade classifier is an efficient way to detect faces. It is a technique that relies on machine learning to train cascade functions by looking at many positive and negative images, and it is then used to detect objects in new images. To train the classifier, the algorithm first needs a large number of positive images (images of faces) and negative images (images without faces); features are then extracted from them. A visual representation of the relation between selection, positive and negative instances, and true and false positives is shown in Fig. 4.

As explained by Chilap et al. [6], the local binary pattern histogram (LBPH) is a simple yet very efficient texture operator, which labels the pixels of an image by thresholding the neighborhood of each pixel and treating the result as a binary number. The local binary pattern (LBP) is used for face recognition, which means identifying the captured image against the images already stored in the database. The algorithm, which is more suited to the application under consideration (as it is found to be faster and more efficient) than the multi-block LBP and camshift algorithms explained in [3, 4, 9], makes use of four main parameters to recognize a face.


Fig. 4 Visual representation of the relation between selection, positive and negative instances, and true and false positives. Source [4], p. 23

The local binary pattern operator is applied to the image by comparing each pixel's neighborhood against its central pixel, after which the histogram values of the resulting image are calculated. The LBPH algorithm typically makes use of four parameters:

1. Radius: the distance from the center pixel to the circumference of the circular local binary pattern; it usually takes a value of 1.
2. Neighbors: the number of sample points within the circular local binary pattern; usually 8.
3. Grid X: the number of cells in the horizontal direction; usually 8.
4. Grid Y: the number of cells in the vertical direction; usually 8.
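As a concrete illustration of the detection and recognition pipeline above, the following is a minimal sketch using OpenCV (it requires the opencv-contrib-python package for the LBPH recognizer). The file names, label ids, and the pre-trained model path are hypothetical placeholders, not artifacts of the proposed system.

```python
# Haar-cascade face detection followed by LBPH recognition (OpenCV sketch).
import cv2

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
recognizer = cv2.face.LBPHFaceRecognizer_create(
    radius=1, neighbors=8, grid_x=8, grid_y=8)   # the four LBPH parameters listed above

# One of the two lines below must be enabled before predict() can be used:
# recognizer.train(face_crops, labels)           # grayscale crops + integer student ids
# recognizer.read("observo_lbph.yml")            # or load a previously trained model

frame = cv2.imread("classroom_frame.jpg")        # hypothetical input frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
for (x, y, w, h) in detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5):
    face = cv2.resize(gray[y:y + h, x:x + w], (200, 200))
    label, distance = recognizer.predict(face)   # smaller distance means a closer match
    print(f"student id {label}, match distance {distance:.1f}")
```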

2.3 Concept Details

Hardware Design. Observo's components are as follows:

• Thermal camera;
• Digital camera;
• NAND flash memory;
• Cables;
• Processing equipment;
• Wi-Fi networking.

The design of the thermal camera will be similar to that of the FLIR products described in URL [7], with subtle algorithmic changes to allow for the detection of certain movements of students. The cameras will be placed at the top-most left corner of the side facing the students, so as to perceive the entire class in one frame. A graphic concept representation is shown in Fig. 5.

Cameras. The thought of being under surveillance may create an uneasy atmosphere among students in the classroom. They may tend to alter their actions to present a "plastic" or "ideal" behavior that covers their true intentions. Therefore, the size of the cameras is kept as small as possible, so as not to disturb the atmosphere of the classroom. It is also planned to finish the cameras in different colors, matching the paint of the interior walls of the classroom, so that students do not easily notice them or get distracted by them during a lecture. Moreover, these cameras need not remain in place throughout the semester. They can be taken away after a week or two from the start of the academic year, as the product will have the data required for analysis by then.

Fig. 5 Graphic concept representation


They can be put back again for re-analysis after a break, to check the students' behavior once again after the newly suggested teaching strategies have been adopted. Assuming each session lasts approximately 50 min, it is also suggested that classroom scans be produced every 2 min, so as to obtain the required number of inferences about the behavior patterns of the students. The patterns inferred from the processed scans are then further analyzed using the proposed behavior analysis algorithm.

Analysis of Behavior Patterns. Given the limited resources for practical research, Observo considers only the following causes of challenging behavior: health issues, behavioral difficulties, home environment, distraction by external stimuli, school anxiety, lack of understanding of the material, and lack of routine.

Customized Modules. There are currently seven learning styles, as mentioned in URL [8]:

• Visual (Spatial)—Prefers pictures, images, and spatial understanding;
• Aural (Auditory or Musical)—Interested in sound and music;
• Verbal (Linguistic)—Prefers the use of words, both in speech and writing;
• Physical (Kinesthetic)—Prefers using the body, hands, and sense of touch;
• Logical (Mathematical)—Gives importance to logic, reasoning, and systems;
• Social (Inter-personal)—Prefers learning in groups or with other people;
• Solitary (Intra-personal)—Likes to work alone and do self-study.

For years, teachers and students have struggled with how to teach and how to learn. Each teacher has their particular style, but then so do most students. Problems develop when the styles of the teachers and the students do not match. Students' reservations as to why some teachers were better teachers than others, or why they liked a certain subject over another, are important observations for fine-tuning the product. Using the customized modules, Observo can map the analyzed causes of challenging behavior to the learning style of the particular student. If the learning style of the student can be understood, teaching strategies fine-tuned to that student can be provided.

Let us take an example. Suppose a student behaves in such a way that he/she progresses more using visual and solitary techniques of teaching. Observo, using the thermal image processors, senses these learning capabilities in the child and reports that the student is comfortable with passive or deflective teaching, depending on which learning style is more comfortable for the class teacher. Along with that, the device also suggests new and adaptive revision techniques suited to the particular child, such as private tuition at a separate time of the day, various techniques for improving body language while teaching, and suggestions on how and when to ask questions of that particular child during lectures.

Teaching Strategies. Since the teacher cannot, or does not have the time to, analyze the behavior of each student carefully and create teaching patterns crafted for a


specific student for a class of about 10–30 children with different personalities and attitudes, this work is done by Observo for the teacher. For example, let us suppose a student who is more suited to the social style of learning. He/she may be interested in taking up group activities and can grasp the concept more easily. Observo, in this case highlights the social quality of the child after observing that he/she interacts with her peers more often during lectures, seems to get frustrated while doing individual projects and at the same time readily takes part in discussions with the teacher. Observo also mentions some useful group activities such as origami work, presentations, etc., that can lead to enhancing the child’s inner talents as well as improve her social skills. Some of the teaching strategies that can be suggested by Observo based on the analyzed behavior pattern of a student in class, based on findings in URLs [12, 13], are listed below: • Passivity—Provide students with options and chances to engage in their learning. Instead of simply transcribing notes for long periods, why not divide reading tasks into a puzzle-like activity? Explore different methods to involve learners in their own education in significant and applicable manners. Instead of offering directions, it would be more beneficial to inquire about their approach to a task. • Emotional—Emotions have a significant impact on attention. Factors such as peer relationships, family interactions, anxiety, fear, and excitement can all preoccupy the mind and hinder focus on important information. When informed that teachers are available to allocate time or offer emotional support, students can potentially alleviate their distractions and concentrate on the present task. • Solitary—Certain students tend to process information more when taught alone, as they may not tolerate the atmosphere of a classroom with several students. Teachers can therefore ask certain questions specifically to the student to make sure they are following and may even conduct private tuitions for the student. • Deflection—Students with a destructive personal life may be worried about their personal problems and not focus on the content taught in class, throughout the lecture. The teacher, here can bring his/her attention by discussing some facts that might be interesting to him/her, or by creating fun-and-play environment of teaching.

3 Methodology A library of rudimentary algorithms is proposed for the small-scale testing of the device. Observo uses both digital and thermal image processing to receive the data required for behavioral analysis. The digital images processed for further analysis are sent as inputs for the face detection and recognition algorithm, wherein if a face is detected, the face is recognized among the students of the class. The histogram values are determined for each image based on the formulas and calculations described further below in this section. The face recognition feature helps to collect and store data of the particular student for future analysis in behavior patterns.


Subsequently, the image is subjected to behavior analysis, where the image analysis algorithm determines the state of the detected subject and adds the inferred data to a dataset. Once the session is over, the recommended teaching strategy for the student is generated. In the teaching strategy algorithm, the dataset containing the states of the subject is processed to obtain the frequency of each state, and based on the calculated data, the most probable causes of challenging behavior are determined. These causes are then fed into a mapping that gives the learning styles suited to the student and the best-fitting teaching strategy for him/her. The proposed algorithms are explained in detail below:

3.1 Thermal Image Processing Algorithm

Following Vardonne [3] and Lin et al. [9], thermal imaging is used for observing student health; it is a non-invasive technique for health detection, as it enables the detection of differences in body temperature. The following steps are involved in processing thermal images for health detection:

1. Image acquisition: Getting thermal photographs of the test subject is the first step. Thermal cameras pick up the infrared radiation that the body emits, which is proportional to its temperature.
2. Image preprocessing: Once acquired, the thermal pictures must be preprocessed to remove any noise or artifacts that could affect how accurately the temperatures are measured. Filters such as median filters or Gaussian filters may be used to smooth the picture and lower noise.
3. Selection of region of interest: The area of the body that will be the focus of the analysis must then be chosen as the region of interest (ROI) in the picture.
4. Temperature measurement: Once the ROI has been set, the temperature of the region is measured by averaging the temperature data inside it. This temperature measurement is then stored in the database and incorporated into the behavioral analysis.
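A brief sketch of steps 2-4 is given below. It assumes the thermal camera SDK already returns a two-dimensional array of per-pixel temperatures in degrees Celsius; the frame size and ROI are illustrative assumptions.

```python
# Denoise a thermal frame, select an ROI, and average its temperature (steps 2-4 above).
import numpy as np
import cv2

def roi_temperature(thermal_frame: np.ndarray, roi: tuple) -> float:
    """roi = (x, y, w, h) around the detected face or forehead region."""
    denoised = cv2.GaussianBlur(thermal_frame.astype(np.float32), (5, 5), 0)  # step 2: preprocessing
    x, y, w, h = roi                                                          # step 3: region of interest
    return float(np.mean(denoised[y:y + h, x:x + w]))                         # step 4: mean temperature

# Hypothetical usage on a simulated 240x320 frame with one warmer, face-sized patch:
frame = np.random.normal(loc=30.0, scale=0.5, size=(240, 320))
frame[60:100, 140:180] += 6.0
print(round(roi_temperature(frame, (140, 60, 40, 40)), 2))   # roughly 36 degrees C
```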

3.2 Face Detection and Recognition Algorithm

For face detection, Haar Cascade classification is used, a popular computer vision technique that detects faces in an image using Haar features and a cascade classifier, as in Chilap et al. [6]. The following are the steps involved in this process:

1. Collect training data: The first step in training a Haar Cascade classifier is to collect positive and negative images. The object to be detected (a face) is present in the positive images, while the negative images contain no such object.


2. Feature extraction: In this step, Haar features are extracted from the positive and negative images, which are rectangular regions of the image that indicate different brightness levels. They help in distinguishing between the object and the background. 3. Training: The classifier must then be trained using the extracted features. A machine learning algorithm is applied to learn to recognize positives and negatives. 4. Cascade classifier: Once trained, the classifier may be used to detect faces in fresh images. At each step, the classifier applies the learnt classification algorithm while sliding a window over the picture. The classifier works on the next window once a face is detected in the current window. This process is called the sliding window approach. 5. Post-processing: The detected faces can then be processed to increase the detection accuracy. To do this, it may be necessary to eliminate overlapping detections, remove false positives, and improve the positioning as well as the dimensions of the identified faces. A widely used methodology for face identification in computer vision, local binary patterns histograms (LBPH), is described in [1, 2]. It employs a texture-based feature extraction technique and works by encoding the local texture patterns of an image. The procedures for facial recognition using LBPH are as follows: 1. Dataset collection: Creating a dataset of images of the faces that need to be identified is the first stage. Each individual should appear in many photographs from various perspectives and lighting conditions in the collection. 2. Face detection: The faces in the images are then identified using the Haar Cascade classifier. 3. Face alignment: In this step, the images are aligned such that the faces have the same size and are in the same orientation. 4. Feature extraction: LBPH is used to extract features from the faces after they have been aligned. The face is divided into a grid of cells. For each cell of the grid, a local binary pattern histogram is calculated. The LBPH method converts each cell’s texture pattern into a binary integer, which is then used to create a histogram of each cell’s patterns. 5. Training: This stage involves training a machine learning system to recognize the faces using the LBPH characteristics. The dataset’s extracted facial features and accompanying labels are used to train the algorithm. 6. Recognition: Once trained, the algorithm may be used to recognize faces in new pictures. This involves face detection, alignment, and extraction of its LBPH characteristics and comparison with the faces in the dataset. The closest match in the dataset is then used by the algorithm to assign a label to the face. Figure 6 details the steps of the algorithm. Once the camera is opened, the algorithm reads each frame sequentially. It reads the next frame only after the postures of the subjects are detected and saved in the session dataset. The algorithm computes LBPH values for parts of the frame and compares them to find faces. Once a face is


detected, the frame is aligned to straighten the face. The values are then compared with those of saved records of subjects for recognition.
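The sequential frame-reading loop described above can be sketched as follows. The detector comes from OpenCV; the recognition and state-inference calls are left as commented placeholders because they stand in for the LBPH comparison and the posture/eye-state analysis of the proposed system rather than a confirmed implementation.

```python
# Per-frame capture -> detect -> (recognize, infer state) -> log loop (sketch).
import time
import cv2

def run_session(camera_index=0, max_frames=1500):
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture(camera_index)
    session_log = []
    for _ in range(max_frames):                  # e.g. one scan every 2 s over a 50-min session
        ok, frame = cap.read()                   # the next frame is read only after the
        if not ok:                               # current one has been fully processed
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        for (x, y, w, h) in detector.detectMultiScale(gray, 1.1, 5):
            # face = align_face(gray, (x, y, w, h))      # hypothetical alignment helper
            # student_id, _ = recognizer.predict(face)   # LBPH comparison (see Sect. 2.2)
            session_log.append({"t": time.time(), "box": (x, y, w, h)})
    cap.release()
    return session_log
```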

3.3 Database Training Algorithm To train the algorithm, one can employ an app that contains data on the postures of each student in several classes. The model can be trained over these images any number of times to improve face detection and recognition accuracy. Once again, the same database can store the data for further training or reference. The database training algorithm is explained in Fig. 7.

3.4 Image Behavior Pattern Analysis Algorithm

Following Lin et al. [9], Clark and Kruse [11], and Yu and Eizenman [12], after face detection, information about the subject's body temperature is collected from the color pattern of the thermal image. Based on the state of the eyes and the position of the head, inferences are obtained and stored in a dataset that is forwarded to generate teaching strategies. The temperature is also checked, and based on the conditional parameters, a possible inference indicating "Health concerns" may be added. Figure 8 displays the working of the behavior pattern analysis algorithm. For detecting the position of the head, the following steps are incorporated:

1. Image preprocessing: The image must first go through preprocessing to enhance the areas around the head and lower noise. The image can be smoothed, and the noise lowered, by applying filters such as Gaussian filters.
2. Head region detection and feature extraction: The area of the picture corresponding to the head must then be chosen as a region of interest (ROI). After selecting the head region, features need to be extracted to identify the head; this can be done automatically using the Haar Cascade classifier technique.
3. Head position estimation: Once the head has been detected, its position can be estimated based on the features extracted in the previous step.
4. Validation and refinement: Finally, the head positions can be validated and refined to improve accuracy. This can involve comparing the estimated positions to known head positions for the same person, or using iterative techniques to refine the positions until they converge.
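A simple rule-based version of the inference step can be sketched as below. The thresholds and the state labels are illustrative assumptions, not values taken from the paper.

```python
# Map eye state, head position and temperature to behavior inferences (rule-based sketch).
def infer_state(eyes_closed: bool, head_offset_ratio: float, temperature_c: float) -> list:
    """head_offset_ratio: horizontal offset of the head from the board direction, in -1..1."""
    inferences = []
    if eyes_closed:
        inferences.append("dozing/napping")
    if abs(head_offset_ratio) > 0.35:      # head turned well away from the front of the room
        inferences.append("looking away")
    if temperature_c >= 37.5:              # assumed fever threshold
        inferences.append("Health concerns")
    return inferences or ["attentive"]

print(infer_state(False, 0.5, 36.8))       # ['looking away']
```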

3.5 Algorithm to Generate Teaching Strategy

The following proposed algorithm determines the possible causes of challenging behavior based on the frequencies of the different inferences gathered, using several

Fig. 6 Face detection and recognition algorithm. Source [10]


Fig. 7 Database training algorithm. Source [10], p. 71

conditions detailed in Fig. 9. Note that the cause “Health” is checked for once all other causes are checked for as it can be considered exclusive to the others. Based on the learning style in Fig. 10, the algorithm suggests a set of teaching strategies suited to the particular student based on the behavior patterns exhibited during the session and the learning styles assumed. Throughout several sessions, the report on teaching strategies evolves, and finally, the best combination of teaching strategies is submitted. The analysis of each session should be available for reference by the teacher anytime.
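The frequency counting and the two-stage mapping can be sketched as follows. The two dictionaries are small hypothetical stand-ins for the behavior-cause, learning-style, and teaching-strategy map of Fig. 10, not the actual mapping used by Observo.

```python
# Count session inferences, map dominant causes to learning styles, then to strategies.
from collections import Counter

CAUSE_TO_STYLE = {                 # hypothetical subset of the Fig. 10 mapping
    "looking away": "Visual",
    "talking to peers": "Social",
    "dozing/napping": "Solitary",
}
STYLE_TO_STRATEGY = {              # hypothetical subset of suggested strategies
    "Visual": "use diagrams, demonstrations and other visual aids",
    "Social": "schedule group activities and discussions",
    "Solitary": "offer individual check-ins or private tuition",
}

def recommend(session_states, top_k=2):
    freq = Counter(state for states in session_states for state in states)
    causes = [c for c, _ in freq.most_common() if c != "attentive"][:top_k]
    styles = {CAUSE_TO_STYLE[c] for c in causes if c in CAUSE_TO_STYLE}
    return sorted(STYLE_TO_STRATEGY[s] for s in styles)

print(recommend([["looking away"], ["looking away", "talking to peers"], ["attentive"]]))
```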


Fig. 8 Image behavior pattern analysis algorithm

3.6 Behavior Pattern—Teaching Strategy Mapping Using Learning Styles All the causes of challenging behavior exhibited by students can be assumed to stem from their different learning styles. Also, the learning styles suited to the students help understand the apt teaching strategy to be employed by the teacher. These relations can be drawn into a simple map as shown in Fig. 10. Different behavior patterns detected can lead to determining more than one learning style suited to a student and can suggest a combination of the suited teaching strategies. Throughout various sessions, the results evolve and become more precise for the particular student.


Fig. 9 Teaching strategy algorithm

4 Results and Discussions

4.1 Haar Cascading

Observations from using Haar cascading in our proposed face detection algorithms are discussed here, following [10] and URL [13]. The area covered by the eyes is darker than the region just above the cheekbones, so the Haar-like rectangle c in Fig. 11 is used to identify the eyes on a human face (Fig. 12).


Fig. 10 Behavior pattern—teaching strategy map

Because the junction area of the nose is brighter than either of its two cheek sides, the Haar feature f in Fig. 11 can be utilized to identify the nose (Fig. 12). This algorithm's ability to swiftly compute the rectangle features by creating an integral image is essential. The integral image is built in such a way that its value at coordinates (x, y) contains the sum of all the pixels above and to the left of (x, y) in the original image. The algorithm employs a sequence of stages known as a cascade classifier, which uses a set of rectangular features to scan sub-windows and determine whether they contain a face or not.

Fig. 11 Haar block diagram. Source [10], p. 65


Fig. 12 Relevant face detection Haar features. Source [10], p. 66

The rectangular features slide over the sub-windows, and if a region is not deemed a face candidate, it is rejected. The algorithm also utilizes a threshold check to determine whether a sub-window should be pushed to the next stage for further processing. To detect faces of varying sizes, the algorithm uses a pyramid of scaled images with the same set of rectangle features at different sizes, scanning the image until all faces are identified. Finally, red rectangles are used to mark the detected faces in the original test image.
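The integral image mentioned above is what makes the rectangle features cheap to evaluate: after one cumulative pass over the image, the sum of any rectangle costs four array lookups. The short sketch below illustrates the idea on a toy array; it is a generic illustration, not OpenCV's internal implementation.

```python
# Integral image and O(1) rectangle sums (toy illustration).
import numpy as np

def integral_image(img: np.ndarray) -> np.ndarray:
    ii = img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return np.pad(ii, ((1, 0), (1, 0)))          # zero row/column so (0, 0) has an empty prefix

def rect_sum(ii: np.ndarray, x: int, y: int, w: int, h: int) -> int:
    # sum of img[y:y+h, x:x+w] from four corner lookups
    return int(ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x])

img = np.arange(16, dtype=np.uint8).reshape(4, 4)
ii = integral_image(img)
assert rect_sum(ii, 1, 1, 2, 2) == int(img[1:3, 1:3].sum())
```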

4.2 Recognition Rates

Tables 1 and 2 offer detailed information on the suggested facial recognition system. The reference system demonstrated face identification at a minimum of 35 px with 90% accuracy, whereas our proposed system can recognize faces in real time. The angle at which the head is held in relation to the camera affects the recognition accuracy: the recognition rate declines as the deflection angle increases. Face images recognized at different resolutions are shown in Fig. 13.

Table 1 Recognition rates

Particulars                                                 Ours
Method                                                      LBPH + CLAHE + alignment
Lowest image resolution                                     15 px
Accuracy at 15 px                                           78.4%
Accuracy at 35 px                                           96.60%
Accuracy at 45 px                                           98.05%
Accuracy with 30° angular deflection                        72.25–81.85% @ 45 px
Use of Android app for database                             Yes
Auto-update of the database and auto-restart of process    Yes
Number of images per person in the database                200

Source [10], p. 76

Table 2 Detailed information of recognition rates at different resolutions (using database LRD200)

Recognition     Correct times    Wrong times    Recognition rate (%)
At 15 pixel     1568             432            78.40
At 20 pixel     1842             158            92.10
At 30 pixel     1919             81             95.95
At 35 pixel     1932             68             96.60
At 45 pixel     1961             39             98.05

Source [10], p. 74

Fig. 13 Face image recognized at different resolutions. Source [10]

4.3 Behavior Analysis and Teaching Strategy Generation

A survey was prepared enquiring about student behavior patterns and presented to 25 students of a class in grade 8. The results were compared with the results of manual movement detection of students in a session of the same class, whose inferences were subjected to the proposed behavior analysis and teaching strategy generating algorithm. A similarity of 83% was observed. The behavior patterns were input to the learning style map, whose results were compared with those of a second survey among teachers and parents, wherein details regarding learning styles and potentially beneficial teaching strategies for the students were obtained. A similarity of 77% was observed, owing to the fact that Observo suggests a combination of learning styles and teaching strategies, while the survey results were more precise. The study's findings demonstrated that instructors may more effectively identify and meet the requirements of their pupils by employing behavior tracking of students in the classroom that uses image processing techniques. The computer vision algorithms incorporated were successful in correctly identifying a variety of behavioral signs, including attentiveness, engagement, and participation. The novel teaching methods recommended on the basis of the analyzed behavioral cues, which the teacher may employ to enhance student learning outcomes, were well received by both instructors and students.


5 Conclusion

Observo, aiming to provide teaching tailored to individual students, not only improves their interest in school and in their studies but also builds up their confidence in institutions. Self-confidence is essential to a student's character and is a quality many fail to achieve; this product helps students obtain such qualities. As children become more confident about their performance, their attendance rates increase, encouraging previously disinterested parents to send their children to school and providing good-quality primary education. Using image processing techniques for behavior tracking in the classroom has several advantages over conventional methods: it provides a way to track student behavior in real time without interrupting teacher–student interaction, it is less prone to error than manual observation, and it can offer valuable insights into student behavior that traditional approaches might not be able to provide. One aspect of this study that must be considered is that cameras must be strategically placed in the corners of the classrooms to capture every student's presence within the classroom. This may require additional resources and equipment, which may not be feasible in all classroom settings. In addition, there may be privacy concerns with regard to the classroom cameras that need to be addressed. Using image processing techniques to monitor student behavior in the classroom might give teachers insightful knowledge on how to better understand and meet the requirements of their pupils. To fully explore the potential of this strategy and resolve any privacy and resource-related concerns, more study is required in this area. One must note that Observo is proposed to be merely an aid and not a replacement for the prevalent institutional arrangements. Observo, with its innovative concept of applying face detection, recognition, and behavior analysis algorithms in education, together with thermal image processing, redefines ways of imparting knowledge and aims to be every school's prerequisite in ensuring academic excellence.

Acknowledgements I acknowledge the use of ChatGPT [https://chat.openai.com/] to generate ideas and material for background research and project planning in the drafting of this research study.

References

1. Weimer M (2014) Students and attention: an interesting analysis. https://www.facultyfocus.com/articles/teaching-and-learning/students-attention-interesting-analysis/
2. Wikipedia. Flipped classroom. https://en.wikipedia.org/wiki/Flipped_classroom
3. Vardonne (2011) Algorithms for image processing
4. Eriksson J, Anna L (2015) Measuring student attention with face detection: viola-jones versus multi-block local binary pattern using OpenCV
5. Zhang L, Chu R, Xiang S, Liao S, Li SZ (2007) Face detection based on multi-block LBP representation. In: Proceedings of advances in biometrics: international conference, ICB 2007, Seoul, Korea, Aug 27–29, 2007. Springer Berlin Heidelberg, pp 11–18
6. Chilap P, Chaskar N, Amup V, Pawar APS (2022) Face recognition using machine learning. In: Haar cascade algorithm and local binary pattern histogram LBPH algorithm in face recognition


7. Teledyne FLIR (2021) Professional tools. https://www.flir.in/applications/professional-tools/
8. Start School Now (2023) 7 Different learning styles. https://www.startschoolnow.org/7-different-learning-styles/
9. Lin JW, Lu MH, Lin YH (2019) A thermal camera based continuous body temperature measurement system. In: Proceedings of the IEEE/CVF international conference on computer vision workshops
10. Paul KC, Aslan S (2021) An improved real-time face recognition system at low resolution based on local binary pattern histogram algorithm and CLAHE. arXiv preprint arXiv:2104.07234
11. Clark VL, Kruse JA (1990) Clinical methods: the history, physical, and laboratory examinations. JAMA 264(21):2808–2809
12. Yu LH, Eizenman M (2004) A new methodology for determining point-of-gaze in head-mounted eye tracking systems. IEEE Trans Biomed Eng 51(10):1765–1773
13. HumanoidX-VITCC (2021) Prismanoid. https://github.com/HumanoidX-VITCC/Prismanoid
14. Harahap M, Manurung A, Prakoso A, Tambunan MF (2019) Face tracking with camshift algorithm for detecting student movement in a class. J Phys Conf Ser 1230(1):012018
15. Time4Learning. How to embrace different learning styles in homeschooling. https://www.time4learning.com/learning-styles.shtml

Earthquake Magnitude and Depth Prediction Based on Hybrid GRU-BiLSTM Model Abhiraj, Amit Rathor, Avaneesh Kumar Yadav, and Ranvijay

1 Introduction

An earthquake is a vigorous shaking of the earth's surface. The outer layer of the earth, known as the crust, is divided into a jigsaw-like puzzle whose pieces are known as tectonic plates, and these plates float on the semi-solid inner layer of the earth known as the mantle. The movements of the tectonic plates can build up strain, and when these strain forces overcome the friction that was holding the plates in place, there is a sudden outburst of energy, like ripples in a pond. This energy is known as seismic energy. Seismic energy radiating outwards causes destruction of infrastructure; further, earthquakes of greater magnitude can cause fires, tsunamis, and other disasters. The energy released by an earthquake is a measure of its magnitude. Instruments known as seismographs, which are sensitive to movements of the earth's surface, are used to measure the size of earthquakes. From their recordings (seismograms), the velocity of seismic waves can be measured, which depends on the elasticity and density of the medium. Different types of seismic waves take different times to travel, and the velocity of a seismic wave changes with depth; this information is used to pinpoint the hypocenter of an earthquake. Since the velocity of seismic waves depends on the medium, seismogram readings are also used to determine the type of surface.

Abhiraj (B) · A. Rathor · A. K. Yadav · Ranvijay Department of Computer Science and Engineering, Motilal Nehru National Institute of Technology, Prayagraj 211004, Allahabad, India e-mail: [email protected] A. Rathor e-mail: [email protected] A. K. Yadav e-mail: [email protected] Ranvijay e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 I. J. Jacob et al. (eds.), Data Intelligence and Cognitive Informatics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-7962-2_24


The size of the wave measured by a seismogram is affected by the amount of slip, the wave amplitude, and the fault size. An earthquake is perhaps the most dangerous natural calamity and can potentially cause enormous damage. There is much debate in the scientific community about what actually causes an earthquake: some believe earthquakes are due to temperature anomaly changes, while others say earthquakes are due to changes in the concentration of carbon dioxide, methane, and hydrogen released from deep within the earth, and that this hydro-geological phenomenon causes the earthquake. Interestingly, the seismic energy released in the form of electromagnetic radiation from earthquakes easily affects animals such as dogs or cows, and so some animals can sense earthquakes in advance.

1.1 Motivation

According to the World Health Organization (WHO), there were 750,000 deaths globally from 1998 to 2017 due to earthquakes, and over 125 million people were injured or displaced during this period. An earthquake on 26 December 2004 in Indonesia alone caused 227,898 fatalities. Figure 1 shows a rise in the severity of earthquakes with each passing year.

1.2 Contribution

The aim of this paper is to develop a machine learning model capable of accurately predicting the magnitude and depth of an earthquake based on previous earthquake data.

Fig. 1 Graph showing increase in earthquake severity each year. Source Nievas et al. [1]


This advance knowledge of the occurrence of an earthquake can save lives and prevent potential infrastructure loss. The further contributions of this paper are as follows:

• In-depth analysis of 12,461 significant earthquake events that occurred in Japan from 1960 to 2019.
• Detailed study of the various methodologies used by researchers, including precursor analysis, statistical modelling, Principal Component Analysis (PCA), Artificial Neural Networks (ANN), Dense Layers (DL), Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM) Networks, Gated Recurrent Units (GRU), and Bidirectional Long Short-Term Memory (BiLSTM) Networks.
• Feature engineering of various seismic features to establish the relationship between the features and the magnitude of an earthquake.
• Proposal of a hybrid GRU-BiLSTM architecture for earthquake prediction. To the best knowledge of the authors, no such model has been developed to date.
• Comparison of accuracy and other metrics with previously existing models, establishing the superiority of the proposed model.

1.3 Paper Organization The following section discusses all the related work that has been already done in this field. Section 3 consists of detailed discussion of working of structural components of proposed architecture. Section 4 provides the details of implementation. Section 5 consists of results obtained and thorough analysis. Lastly, at end is Sect. 6 providing the conclusion of this paper.

2 Related Work

The earliest significant study relating to earthquake magnitude prediction dates back to 1944, when the Gutenberg-Richter law was proposed, which states a linear logarithmic relationship between the magnitude of an earthquake and its frequency of occurrence: simply put, the higher the magnitude, the lower the chance of occurrence. Earthquake occurrence has also been studied in detail using Poisson regression, which maps a linear logarithmic relation between two variables and is very helpful for modelling count data per unit time or space, as in the case of predicting earthquake frequency over a period of time. The methodologies used for earthquake prediction to date can be divided into four categories: the first is prediction based on statistical models and mathematical analysis, the second is prediction based on precursor information, the third is ML classifiers, and the fourth is deep learning and neural networks.
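As a quick numerical illustration of the Gutenberg-Richter relation log10 N = a - bM, the sketch below fits the b-value on a synthetic catalog; the magnitudes are simulated, not real observations.

```python
# Fit the Gutenberg-Richter relation log10(N) = a - b*M on simulated magnitudes.
import numpy as np

rng = np.random.default_rng(1)
mags = rng.exponential(scale=1.0 / np.log(10), size=5000) + 3.0   # b ~ 1.0 above Mmin = 3.0

bins = np.arange(3.0, 7.0, 0.1)
counts = np.array([(mags >= m).sum() for m in bins])              # cumulative frequency N(>= M)
mask = counts > 0
slope, intercept = np.polyfit(bins[mask], np.log10(counts[mask]), 1)
print(f"estimated b-value: {-slope:.2f}")                         # close to 1.0
```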


Alvan et al. [2] found that there are some precursors which are affected by even earthquake of slight magnitude and these precursors can be used for predicting earthquake. Such precursors include, temperature anomaly, strange cloud formation, water level changes, electromagnetic waves, radon gas concentration, crust deformation and many more. Kannan et al. [3] used Poisson’s distribution along with spatial connection theory between different tectonic plates and concluded that a pattern exists in occurrence of earthquakes. The research is based on an important assumption that for any fault zone, future earthquakes are dependent on previous earthquakes. The author establishes a relationship angle between epicentres and density stratum. Low density stratum have obtuse angle and high density stratum have acute angles between epicentres. The author uses triangulation methods and then subdivision until one point is left for identifying the exact epicentre. Akhoondzadeh et al. [4] performed time-series analysis of optical depth of aerosol. The authors use Dark dense vegetation (DDV) Algorithm on Satellite captured images, to determine aerosol optical depth which shows changes before and after earthquake. This change is theorized to be due to sudden release of energy in atmosphere. Other algorithms were also used such as SYNTAM method, which overcomes the limitations of DDV. The authors proposed that other precursor can be combined with this to form a multi precursor analysis model. Su et al. [5] select time, magnitude, latitude, longitude and focal depth as main features and try to predict magnitude of all these features separately. The work is different from previous models because, difference between adjacent data points is fed to the network, for improving accuracy. CRF layer uses Viterbi algorithm to figure out the highest path score. The authors use dataset of Japan for training and testing. There are 100 neurons used for each LSTM layer and data staggering is used for solving problem of insufficient data. Fuentes et al. [6] uses CNN with LSTM, and feed two parallel inputs of 20 × 20 grids, after performing a logarithmic operation, to CNN-LSTM model. For tuning this model, Keras Tuner is used. The results obtained are better compared to simple FFNN model. This shows that LSTM are better at handling time sequence data compared to simple neural networks. The authors conclude that multi column ConvLSTM models are better than single column ConvLSTM models. Al Banna et al. [7] used attention mechanism with BiLSTM, for predicting occurrence and location of earthquake. 100 and 50 neurons are used for BiLSTM layer. There were several dense layers, flatten layer along with single LSTM layer. Overall accuracy of 0.7467 was achieved. In traditional encoder-decoder architecture, hidden state of the previous encoder is used as a context vector for next decoder. Hence, LSTM which are supposed to be better than RNN at long term dependency problem, still fail. The authors claim that with help of attention mechanism, the model can focus on specific inputs. This can improve accuracy of model in long term dependencies and since earthquake data is sequential and time-series based, there is a good chance of long term dependencies. The authors observed that best accuracy was achieved when learning rate was 0.01 and even though a BiLSTM model takes a little more time to train than a LSTM model, but the boost in accuracy achieved is


worth the trade-off. Although the authors predicted the location of earthquakes for the next month, they were unable to predict the exact time period of occurrence, which makes earthquake disaster management difficult. Kavianpour et al. [8] used CNN, BiLSTM, and fully connected layers, with Adam as the optimizer. The model was trained with 14,187 records of earthquakes in Japan. Further, for better prediction results, the region of Japan was divided into 49 equal regions; this helps to separate areas with less seismic activity from areas with more seismic activity. Berhich et al. [9] used RNN for earthquake prediction. The dataset was modified into a clustered dataset to produce a location-dependent prediction model. PCA was used for feature extraction, followed by RNN for prediction, and the model gives better accuracy than simple ML classifiers. The authors used datasets of Morocco, Turkey, and Japan for training and testing. Berhich et al. [10] used LSTM with attention for earthquake magnitude, region label, and timestamp prediction, developing different models for each feature with slight changes in architecture depending on the feature. Sonthalia et al. [11] used LSTM on spatio-temporal data from an Indonesian dataset. Berhich et al. [10] and Sonthalia et al. [11] show that LSTM models can achieve good accuracy in earthquake prediction.

3 Proposed Work

The proposed model is a hybrid GRU and BiLSTM model, because earthquake data is sequential and complex, and LSTM-based models such as GRU and BiLSTM are efficient at dealing with sequential data. GRU achieves accuracy similar to LSTM with the added advantage of a simplified architecture. BiLSTM is used because it is able to relate the current context to both past and future information, and it generally achieves better accuracy than a unidirectional LSTM. Figure 2 shows the proposed workflow. The first step in the workflow is preprocessing; after that, the processed data is fed to the hybrid GRU-BiLSTM model for training. After the model has been trained, testing data is used to check whether the model makes accurate predictions.
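A minimal sketch of this hybrid stack in TensorFlow/Keras is shown below. The window length, layer widths, and the two-value output (magnitude and depth) are illustrative assumptions rather than the tuned configuration of the proposed model.

```python
# Hybrid GRU + BiLSTM regressor for (magnitude, depth) prediction (sketch).
from tensorflow.keras import layers, models

def build_gru_bilstm(window=30, n_features=4):
    model = models.Sequential([
        layers.Input(shape=(window, n_features)),    # a sliding window of past events
        layers.GRU(64, return_sequences=True),       # GRU stage
        layers.Bidirectional(layers.LSTM(64)),       # BiLSTM stage
        layers.Dense(32, activation="relu"),
        layers.Dense(2),                             # predicted magnitude and depth
    ])
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    return model

# model = build_gru_bilstm()
# model.fit(X_train, y_train, validation_split=0.1, epochs=50, batch_size=64)
```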

Fig. 2 Workflow of proposed architecture


The following section contains a detailed description of functionality of Preprocessing, GRU and BiLSTM.

3.1 Preprocessing

Preprocessing consists of cleaning the data, randomizing it, normalization, feature engineering, and feature scaling. The first step is analysis of the data; for this, each time interval is converted into seconds for simplicity. Then, all features having more than 50% missing values are found and dropped. Next, numerical and categorical imputation is performed, that is, substituting proper values in place of missing values, followed by feature engineering. For categorical features, only 0.0001% of the data is missing, so those records can simply be removed. For numerical features, all missing values are replaced with median values, which helps avoid the influence of outliers. One-hot encoding is used for all categorical features. Feature scaling is used to convert the data into a standard normal distribution using normalization and standardization, meaning the data will have zero mean and unit variance. This makes sure that features measured at different scales contribute equally to the learning of the model and no bias is created. After that, LASSO regression is used for feature selection; it shrinks coefficient estimates towards zero (reducing variance), which helps avoid over-fitting. Note that simple features such as time, latitude, longitude, magnitude, and depth are used for training the model rather than seismological features such as b-value and seismic energy, because Bhatia et al. [12] found that the accuracy obtained in both cases was the same. The explanation for this behaviour is that all seismological features are derived from the simple features themselves; while such derived features may benefit a simple model, a neural network is perfectly capable of computing all significant features by itself.
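A sketch of this preprocessing chain with scikit-learn is shown below; the column names and the catalog file name are illustrative assumptions.

```python
# Imputation, scaling and LASSO-based feature selection (scikit-learn sketch).
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Lasso
from sklearn.feature_selection import SelectFromModel
from sklearn.pipeline import Pipeline

df = pd.read_csv("japan_earthquakes.csv")                       # hypothetical catalog file
features = ["timestamp_s", "latitude", "longitude", "depth"]    # simple features only
X, y = df[features], df["magnitude"]

prep = Pipeline([
    ("impute", SimpleImputer(strategy="median")),               # median fill for numeric gaps
    ("scale", StandardScaler()),                                # zero mean, unit variance
    ("select", SelectFromModel(Lasso(alpha=0.01))),             # shrink weak coefficients to zero
])
X_ready = prep.fit_transform(X, y)
```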

3.2 Gated Recurrent Units The Gated Recurrent Unit, introduced by Cho et al. [13], is an extension of the RNN architecture designed to simplify the LSTM unit. As pointed out by Chung et al. [14], a GRU is faster to compute than an LSTM because it replaces the three-gate architecture of the LSTM with a two-gate architecture, without any significant decrease in performance. Figure 3 shows the architecture of a GRU cell. The two gates of the GRU are the reset gate and the update gate. The reset gate chooses which previous hidden states will be forgotten, and the update gate decides how much of the candidate information is used to update the hidden state; thus the reset gate focuses on capturing short-term dependencies and the update gate on capturing long-term dependencies. The update gate also helps mitigate the vanishing gradient problem. Both gates use the sigmoid activation function at their output.


Fig. 3 Architecture of GRU cell

After the outputs of these two gates are obtained, the candidate hidden state (also called the current memory gate) is computed, with tanh as the activation function. Finally, the hidden state is updated using the output of the update gate, the previous hidden state, and the candidate hidden state. Suppose that at any time t, X_t and H_t denote the input and the hidden state, respectively. Then the output of the reset gate R_t and the update gate Z_t can be computed as follows:

R_t = \sigma(X_t W_{xr} + H_{t-1} W_{hr} + b_r)    (1)

Z_t = \sigma(X_t W_{xz} + H_{t-1} W_{hz} + b_z)    (2)

Here, W_{xr} and W_{xz} are the weight parameters of the input state in the reset and update gates, W_{hr} and W_{hz} are the weight parameters of the hidden state in the reset and update gates, and b_r and b_z are the corresponding bias parameters. The candidate hidden state \hat{H}_t can be computed as follows:

\hat{H}_t = \tanh(X_t W_{xh} + (R_t \otimes H_{t-1}) W_{hh} + b_h)    (3)

where X_t is the input state, R_t is from Eq. (1), W_{xh} and W_{hh} are weight parameters for the input and hidden states, H_{t-1} is the hidden state at time t-1, b_h is a bias parameter, and \otimes denotes the Hadamard (element-wise) product. Finally, the hidden state at time t, H_t, can be computed as follows:

H_t = Z_t \otimes H_{t-1} + (1 - Z_t) \otimes \hat{H}_t    (4)

where Z_t is from Eq. (2) and \hat{H}_t is from Eq. (3).
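A minimal NumPy sketch of a single GRU step implementing Eqs. (1)-(4) is given below; the dimensions, parameter initialisation, and toy input sequence are illustrative, not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, params):
    """One GRU step following Eqs. (1)-(4); params holds the weight/bias arrays."""
    r_t = sigmoid(x_t @ params["W_xr"] + h_prev @ params["W_hr"] + params["b_r"])   # Eq. (1) reset gate
    z_t = sigmoid(x_t @ params["W_xz"] + h_prev @ params["W_hz"] + params["b_z"])   # Eq. (2) update gate
    h_cand = np.tanh(x_t @ params["W_xh"] + (r_t * h_prev) @ params["W_hh"] + params["b_h"])  # Eq. (3)
    return z_t * h_prev + (1.0 - z_t) * h_cand                                      # Eq. (4)

# Toy dimensions: 3 input features (latitude, longitude, timestamp), 8 hidden units.
rng = np.random.default_rng(0)
d_in, d_hid = 3, 8
shapes = {"W_xr": (d_in, d_hid), "W_hr": (d_hid, d_hid), "b_r": (d_hid,),
          "W_xz": (d_in, d_hid), "W_hz": (d_hid, d_hid), "b_z": (d_hid,),
          "W_xh": (d_in, d_hid), "W_hh": (d_hid, d_hid), "b_h": (d_hid,)}
params = {name: rng.normal(scale=0.1, size=shape) for name, shape in shapes.items()}

h = np.zeros(d_hid)
for x in rng.normal(size=(5, d_in)):   # a short illustrative input sequence
    h = gru_step(x, h, params)
```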


Fig. 4 Architecture of LSTM cell

3.3 Bidirectional Long Short-Term Memory Networks BiLSTM is a neural network consisting of two parallel LSTMs whose data flows in opposite directions: one LSTM propagates the sequential information forward and the other propagates it backward. This makes BiLSTM more accurate than a single LSTM, in which propagation can only be unidirectional. The flow of information in both directions gives BiLSTM access to the sequential information before and after any point; in simple words, future information as well as past information helps in understanding the present context. The basic unit of BiLSTM is still an LSTM cell. LSTM is an extension of the RNN. In theory, an RNN is capable of connecting any past information to the present; in practice, however, it fails when the dependency on past information is very long. LSTM solves this problem by making a few changes to the cell architecture, which enable it to retain a cell state. Figure 4 shows the architecture of an LSTM cell. An LSTM has three types of gates: the forget gate, the input gate, and the output gate. The forget gate determines what information each cell is going to retain in the cell state and what is going to be forgotten. The input gate determines which part of the information is used for the update, and hence how much the current memory cell internal state will change. The output gate determines the contribution of the memory cell to the output at any particular time step. All of these gates use sigmoid as the activation function. Let the input at time t be X_t, the hidden state from the previous step H_{t-1}, and the memory cell internal state from the previous step C_{t-1}. The outputs of the input gate I_t, the forget gate F_t, and the output gate O_t can then be computed by the following formulae:

I_t = \sigma(X_t W_{xi} + H_{t-1} W_{hi} + b_i)    (5)

F_t = \sigma(X_t W_{xf} + H_{t-1} W_{hf} + b_f)    (6)

O_t = \sigma(X_t W_{xo} + H_{t-1} W_{ho} + b_o)    (7)

where W_{xi}, W_{xf}, and W_{xo} are the input weight parameters for the input, forget, and output gates, respectively; W_{hi}, W_{hf}, and W_{ho} are the hidden weight parameters for the input, forget, and output gates, respectively; and b_i, b_f, and b_o are the corresponding bias parameters. The input node \hat{C}_t can be computed as

\hat{C}_t = \tanh(X_t W_{xc} + H_{t-1} W_{hc} + b_c)    (8)

where W_{xc} and W_{hc} are the weight parameters for the input and hidden states, b_c is a bias parameter, and X_t and H_{t-1} are the input and hidden states. The memory cell internal state C_t is given by

C_t = F_t \otimes C_{t-1} + I_t \otimes \hat{C}_t    (9)

where F_t is from Eq. (6), I_t is from Eq. (5), and \hat{C}_t is from Eq. (8). Note that keeping the memory constant forever requires the input gate to be 0 and the forget gate to be 1. Lastly, the hidden state H_t is computed as

H_t = O_t \otimes \tanh(C_t)    (10)

where O_t is from Eq. (7) and C_t is from Eq. (9).

4 Experiment 4.1 Dataset The choice of a proper dataset is very important for training any machine learning model. For the proposed model, the United States Geological Survey (USGS) catalog is the source of the dataset. The area of analysis is Japan, since it is located in an earthquake-prone zone; the defined area of interest spans 31N to 38N latitude and 136E to 143E longitude. All earthquake events of magnitude greater than 4.0 from 1960 to 2019 are taken into account. The dataset contains 12,461 earthquake events, of which 80% are used for training and the remaining 20% for testing.


4.2 Model Architecture As per the workflow, after preprocessing, data of dimensions (N, 1, 3) is fed to the model, which in turn generates output of dimensions (N, 2); here 3 and 2 represent the input features (longitude, latitude, and timestamp) and the output features (magnitude and depth), respectively, and N is the number of earthquake events. The proposed model has a GRU layer of 128 neurons with tanh activation in the first layer, followed by a dropout layer in which 20% of the neuron outputs are dropped, a Bidirectional LSTM layer of 128 neurons, and a dense layer of 32 neurons with ReLU activation. This is followed by another dropout layer with 20% of values dropped and, finally, a dense layer with 2 neurons that predicts both output features. The dropout layers are added so that the model does not overfit. The optimizer used is Stochastic Gradient Descent (SGD) and the loss function used is the squared hinge function.
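One possible Keras realization of the layer stack described above is sketched below. The layer sizes, dropout rates, optimizer, and loss follow the text; everything else (framework choice, input shape handling) is an illustrative assumption rather than the authors' implementation.

```python
from tensorflow.keras import Sequential, layers

def build_gru_bilstm(input_shape=(1, 3)):
    """Hybrid GRU-BiLSTM model: (timesteps=1, features=3) -> (magnitude, depth)."""
    model = Sequential([
        layers.GRU(128, activation="tanh", return_sequences=True,
                   input_shape=input_shape),
        layers.Dropout(0.2),
        layers.Bidirectional(layers.LSTM(128)),
        layers.Dense(32, activation="relu"),
        layers.Dropout(0.2),
        layers.Dense(2),                      # magnitude and depth
    ])
    # SGD optimizer and squared hinge loss, as stated in the text.
    model.compile(optimizer="sgd", loss="squared_hinge")
    return model

model = build_gru_bilstm()
model.summary()
```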

5 Results and Analysis Accuracy is defined as the ratio of the number of accurately predicted records to the total number of records:

Accuracy = \frac{TP + TN}{TP + TN + FP + FN}    (11)

Here, TP denotes True Positives, TN denotes True Negatives, FP denotes False Positives, and FN denotes False Negatives. The achieved accuracy is 0.9955 on training data and 0.9928 on testing data. While accuracy measures how often the model predicts the correct outcome overall, other metrics such as precision, recall, and F1-score capture other properties of the model. Precision is a measure of how many of the positive predictions made are actually correct; in this case, how many of the predicted earthquake magnitude and depth values are actually correct.

Precision (p) = \frac{TP}{TP + FP}    (12)

Recall is a measure of how many correct positive predictions are made over all positive cases in the data; in this case, how many earthquakes' magnitude and depth are correctly predicted over all earthquake data.

Recall (r) = \frac{TP}{TP + FN}    (13)


Fig. 5 Metrics comparison between models

Fig. 6 Graph showing training time for an LSTM model and proposed GRU-BiLSTM model


The precision and recall values obtained are 0.9997 and 0.4999 on training data and 1.0000 and 0.5000 on testing data. To combine precision and recall into a single metric, the F1-score is used; it is defined as the harmonic mean of precision and recall:

F1\text{-}score = \frac{2rp}{r + p}    (14)

The F1-score is 0.6700 on training data and 0.6666 on testing data. A comparison is made between the proposed model, the model proposed by Narayanakumar et al. [15], an LSTM-based model, and the model proposed by Su et al. [5]; all of these models target earthquake magnitude prediction. Narayanakumar et al. [15] used a backpropagation-trained artificial neural network and Su et al. [5] used a BiLSTM with CRF. Figure 5 shows the performance of the proposed model in comparison to the other models. However, this boost in accuracy comes with a drawback, namely an increase in training time. Figure 6 shows a comparison between the training time for an LSTM and for the proposed architecture.

6 Conclusion In this paper, a new and promising method for earthquake magnitude and depth prediction has been discussed, and the results verify that the proposed model outperforms previously available models. Earthquake magnitude is difficult to predict, but with a proper dataset and the correct model a high level of accuracy can be achieved, and this early information can be used to take preventive measures against serious losses. However, the scarcity of data for high-magnitude earthquakes makes it difficult to train models for such predictions; in such cases, Generative Adversarial Networks can be used. For further improvement in performance, an attention mechanism can be used, and Transformers seem very promising in this respect. Ultimately, earthquake prediction is a time-series analysis problem, and there is an important need for robust models that can predict earthquakes with minimal error.

References 1. Nievas CI, Bommer JJ, Crowley H, van Elk J (2020) Global occurrence and impact of smallto-medium magnitude earthquakes: a statistical analysis. Bullet Earthquake Eng 18(1):1–35 2. Alvan HV, Azad FH (2011) Satellite remote sensing in earthquake prediction. A review. In: 2011 National postgraduate conference, IEEE, pp 1–5 3. Suganth K (2014) Innovative mathematical model for earthquake prediction. Eng Failure Anal 41:89–95


4. Mehdi A, Chehrebargh FJ (2016) Feasibility of anomaly occurrence in aerosols time series obtained from MODIS satellite images during hazardous earthquakes. Adv Space Res 58(6):890–896 5. Zhittong S, Zhang Q (2020) Earthquake prediction based on Bi-LSTM+ CRF model and spatiotemporal data. In: 2020 IEEE 9th joint international information technology and artificial intelligence conference (ITAIC), vol 9. IEEE, pp 1190–1195 6. Fuentes AG, Nicolis O, Peralta B, Chiodi M (2021) ConvLSTM neural networks for seismic event prediction in Chile. In: 2021 IEEE XXVIII international conference on electronics, electrical engineering and computing (INTERCON), IEEE, pp 1–4 7. Al Banna MH, Ghosh T, Al Nahian MJ, Taher KA, Shamim Kaiser M, Mahmud M, Hossain MS, Andersson K (2021) Attention-based bi-directional long-short term memory network for earthquake prediction. IEEE Access 9:56589–56603 8. Parisa K, Kavianpour M, Jahani E, Ramezani A (2021) Earthquake magnitude prediction using spatia-temporal features learning based on hybrid cnn-bilstm model. In: 2021 7th international conference on signal processing and intelligent systems (ICSPIS), IEEE, pp 1–6 9. Asmae B, Belouadha F-Z, Kabbaj MI (2022) A location-dependent earthquake prediction using recurrent neural network algorithms. Soil Dyn Earthquake Eng 161:107389 10. Asmae B, Belouadha F-Z, Kabbaj MI (2023) An attention-based LSTM network for large earthquake prediction. Soil Dyn Earthquake Eng 165:107663 11. Ankit S, Pasari S, Devi S (2023) Earthquake prediction using long short term memory on spatiotemporally segmented data. In: 2023 Third international conference on artificial intelligence and smart energy (ICAIS), IEEE, pp 1378–1382 12. Anmol B, Sumanta P, Anand M (2018) Earthquake forecasting using artificial neural networks. In: The international archives of photogrammetry, remote sensing and spatial information sciences, vol 42. pp 823–827 13. Kyunghyun C, Van Merriënboer B, Bahdanau D, Bengio Y (2014) On the properties of neural machine translation: Encoder-decoder approaches. arXiv preprint arXiv:1409.1259 14. Junyoung C, Gulcehre C, Cho KH, Bengio Y (2014) Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555 15. Narayanakumar S, Raja K (2016) A BP artificial neural network model for earthquake magnitude prediction in Himalayas, India. Circuits and Syst 7(11):3456–3468

Homomorphic Encryption to Improve Pharmaceutical Data Security on the Cloud and Blockchain N. Aravindhraj, V. C. Mahavishnu, V. M. Aswin Vishal Kumar, N. T. Nethiran, and P. Manoranjith

1 Introduction With the current growth in cybercrime, security has become an important issue. Companies may store and manage their information independently, yet this is not cost-effective, and many are moving to the cloud for both efficiency and convenience. Because of its popularity, healthcare organizations are storing sensitive information, such as electronic medical records, in cloud-based storage solutions. Homomorphic encryption (HE) is among the most practical techniques for ensuring privacy. Cloud computing has become a significant field of research for providing unlimited storage and speedy data access, where efficient information storage and accessibility are key to the functioning of any application. Yet, in addition to its advantages, cloud storage presents a number of concerns, most of which are associated with user information security and privacy preservation. The pharmaceutical industry generates a large amount of private and sensitive information in the form of medication formulas, personnel information, and medical data, which must be given a high level of defence against attackers. Recent cloud services research has mostly focused on developing and deploying encryption and decryption methods that enable users to remain accountable for the security of their outsourced data. HE enables information to be processed while it is encrypted, with no need to decode it. Pharmaceutical industries face a high risk of



security breaches, which may be mitigated by employing Homomorphic Encryption [1–4].

2 Literature Survey Bhagadia [5] has described the types of data present in pharmaceutical companies and the ways in which this data can be secured, and has clearly explained cloud computing and the methods by which homomorphic encryption can be implemented. Biksham [6] has given detailed information about cryptography and the methods used to encrypt and decrypt data, and has also explained cryptographic techniques in cloud computing. Ogburn et al. [7] have given detailed information about homomorphic encryption and its significance; they provide examples of where homomorphic encryption is used, a detailed explanation of how it works, and a discussion of its other applications. Mahmood [8] has provided information on a fully homomorphic encryption scheme, giving an idea of how fully homomorphic encryption is implemented in a multistage partial homomorphic encryption setting in cloud computing.

3 Workflow of the Proposed System HE permits computations to be performed on encrypted data without decoding the information; it allows computation on ciphertext, which is the encrypted version of plaintext. There are three major types of homomorphic encryption: fully homomorphic encryption (FHE), partially homomorphic encryption (PHE), and somewhat homomorphic encryption (SWHE) [9–12]. FHE permits any computation to be performed on the encrypted information, whereas PHE supports only a limited set of operations, such as addition or multiplication. HE has several benefits for privacy and security, since sensitive information can be processed without being exposed in plaintext. This makes it particularly useful in settings where data privacy is critical, such as health care or finance [13–16]. However, homomorphic encryption is still an emerging technology and comes with certain challenges, such as increased computational complexity and the need for specialized hardware. Despite these challenges, it is capable of providing new levels of data security and privacy. With sensitive information, such as medical records, homomorphic encryption may be employed to improve the security of current services. The proposed system uses the AES algorithm to encrypt textual data and the Paillier algorithm, an additively homomorphic encryption scheme, to encrypt integer data.


3.1 Paillier Cryptosystem Algorithm HE is a form of encryption that allows logical or mathematical functions to be carried out on encrypted information. Assume there are two integers, a1 and a2, that need to be encrypted using an asymmetric encryption scheme with a private and a public key. We obtain two ciphertexts, cip1 = Enc_public(a1) and cip2 = Enc_public(a2). Encryption typically seeks to make the encoded data unrecognizable to anybody who lacks the secret key necessary to decrypt it, yet certain relations between plaintexts are retained under HE. For instance, if a scheme is additively homomorphic, there is a public operation Add that anybody can apply to cip1 and cip2 such that the result decrypts to the sum of the plaintexts: a1 + a2 = Dec_private(Add_public(Enc_public(a1), Enc_public(a2))). For a long time, there have existed partially homomorphic encryption methods that allow just a restricted set of operations on encoded information, such as adding or multiplying. Over the last decade or two, fully homomorphic encryption techniques that permit arbitrary computations on encrypted data have been created.

3.2 Working of Paillier Algorithm The security of the Paillier cryptosystem is determined by the unpredictability of the prime integers used. The steps of the Paillier algorithm are as follows:
• Two large prime numbers, p and q, are chosen.
• The product n = p * q is computed, ensuring gcd(n, φ(n)) = 1, where φ is Euler's totient function.
• A random integer g is chosen whose order is a multiple of n (g = n + 1 is a common choice).
• Private key: (p, q, λ), where λ = lcm(p - 1, q - 1); public key: (g, n).
• To encrypt a message m: c = (g^m * r^n) mod n², where r is a random integer with gcd(r, n) = 1.
• To decrypt the cipher: m = L(c^λ mod n²) * (L(g^λ mod n²))^(-1) mod n, where L(u) = (u - 1)/n.
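A small pure-Python sketch of the steps above is given below, using toy primes and the common g = n + 1 variant. It is illustrative only (not the authors' code, and not production-grade key generation); it also demonstrates the additive homomorphism on which the proposed system relies: multiplying two ciphertexts decrypts to the sum of the plaintexts.

```python
import math
import secrets

def lcm(a, b):
    return a * b // math.gcd(a, b)

def L(u, n):
    return (u - 1) // n

def keygen(p, q):
    n = p * q
    lam = lcm(p - 1, q - 1)
    g = n + 1                                    # simple common choice for g
    mu = pow(L(pow(g, lam, n * n), n), -1, n)    # modular inverse (Python 3.8+)
    return (n, g), (lam, mu)

def encrypt(pub, m):
    n, g = pub
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)   # c = g^m * r^n mod n^2

def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    return (L(pow(c, lam, n * n), n) * mu) % n

# Toy demonstration of the additive homomorphism used for quantity/price updates.
pub, priv = keygen(1789, 2003)                   # toy primes; real keys use large primes
c1, c2 = encrypt(pub, 120), encrypt(pub, 35)
c_sum = (c1 * c2) % (pub[0] ** 2)                # multiply ciphertexts = add plaintexts
assert decrypt(pub, priv, c_sum) == 155
```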

4 Proposed System 4.1 Methodology The suggested approach aims to address the deficiencies in pharmaceutical industry security. Managers and employees in the pharmaceutical industry are the two key customers of the programme. Employees/Researchers conduct experiments when


developing medication, resulting in frequent modifications to the quantities of the components present in the medicines. Only authorized employees, such as the manager, should have access to secret information. The manager has complete control over everything, including hiring and firing personnel, reviewing thorough drug descriptions, and so on. The keys used for encryption and decryption are themselves stored in encrypted form, providing an additional layer of protection. Let us look at how each type of user interacts with the system.

4.2 Manager The manager of a certain branch is responsible for keeping track of the progress in the creation of a specific drug as well as the specifics of its components. When a manager logs in, the keys needed to access the concealed data are retrieved; after retrieval, the data is decrypted and the content is presented to the manager. Figure 1 depicts the actions taken by the manager. Fig. 1 Working of manager


4.3 Employee The next type of user is the researcher/employee, who is in charge of the development of a medicine. These individuals should not be shown all of the information about the medication, since this would result in a privacy breach. The flow of stages begins with the researcher entering the details of their research, such as component, amount, and price. Following that, an algorithm is executed which first determines whether or not the database already contains that particular component. If yes, the amount and price are homomorphically added to the stored values; otherwise, a new entry is produced. The keys needed to complete the operation are obtained, and then the flowchart protocol shown in Fig. 2 is followed.

4.4 AES Algorithm The system uses AES to encrypt text. AES is one of the most widely used encryption algorithms and is considered one of the most secure methods for encrypting information. In AES the same key is used to encrypt and decrypt the information, so it is a symmetric key encryption algorithm. The strength of AES lies in its ability to resist attacks such as brute-force attacks, where an attacker tries all possible keys until the correct one is found. AES is widely used to encrypt sensitive data such as financial transactions and is used by organizations such as the U.S. government to protect classified information. The following functions are used:
• key_gen(): random 16-byte keys are generated for encryption and decryption.
• enc(key, message): the message is encrypted using the generated key.
• decrypt(ciphertext, key): the ciphertext is decrypted using the generated key.
• enc_file(name_of_file, key_used): a file is encrypted using the key.
• dec_file(name_of_file, key_used): a file is decrypted using the key.
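The paper does not specify an AES mode or library; the sketch below uses the Python `cryptography` package's AES-GCM primitive, with function names loosely mirroring the list above. It is illustrative only and should not be read as the authors' implementation.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def key_gen() -> bytes:
    # 16-byte (128-bit) AES key, as described above.
    return AESGCM.generate_key(bit_length=128)

def enc(key: bytes, message: bytes) -> bytes:
    nonce = os.urandom(12)                       # GCM nonce, stored with the ciphertext
    return nonce + AESGCM(key).encrypt(nonce, message, None)

def dec(key: bytes, blob: bytes) -> bytes:
    nonce, ciphertext = blob[:12], blob[12:]
    return AESGCM(key).decrypt(nonce, ciphertext, None)

key = key_gen()
token = enc(key, b"Paracetamol")                 # e.g., encrypting a component name
assert dec(key, token) == b"Paracetamol"
```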

5 Design 5.1 Manager Once the manager registers by entering their credentials, two text files are produced: m-text and e-text. For the Paillier homomorphic encryption, a private and a public key are created, along with one AES key. All of these keys are stored in the m-text file, and the e-text file is used for encrypting the string names. The manager's password decrypts the key file when the manager registers or logs in. The manager can then see details such as the name, quantity, and price of each component; the keys and all other information are extracted and displayed in relational form.


Fig. 2 Working of employee

Employees can be added to the manager's department. On this page, the manager adds an employee by name and email. When the form is completed, the provided email address is used to send the employee an email containing their login credentials, which the employee may then use to log in.


5.2 Employee The employee gains access to a page where they may enter information about the pharmaceutical components. String data such as the component name is encrypted using AES, while the price and quantity of the components are encrypted with the homomorphic encryption algorithm. By operating directly on encoded information, homomorphic encryption simplifies later analysis. When an employee submits data, there are two possibilities:
• If the cloud does not currently hold a record with that encrypted name, a fresh record is generated with the encrypted values of the component name, quantity, and cost.
• If the record is already present in the cloud, the newly encrypted quantity and cost are homomorphically added to the stored values.

6 Experimentation and Results Several obstacles arise while analysing the performance of a homomorphic algorithm. Because the Paillier technique employs random integers during encryption, comparing the performance of encrypting different data becomes challenging. Numerous studies have been conducted to evaluate and compare the algorithm with others; Paillier performs well with respect to the avalanche effect, where a change of a few bits in the input affects many bits of the output. An algorithm is considered secure if changing even a single bit of the input affects about half of the entire output, meaning that each ciphertext bit has a 50% probability of flipping. The avalanche score x is computed as

x = (number of swapped bits) / (total number of bits)

Different set-ups of the Paillier cryptosystem and their avalanche scores are given in Table 1.

Table 1 Different set-ups for Paillier cryptosystem with its avalanche score

S. No.     Number of bits changed in plain text
           1        2        3        4
1          0.478    0.533    0.518    0.495
2          0.500    0.478    0.539    0.484
3          0.479    0.527    0.524    0.502
4          0.524    0.486    0.523    0.498
5          0.513    0.493    0.498    0.513
Average    0.498    0.503    0.52     0.489

ElGamal is more suitable for multiplicative homomorphic applications. ElGamal can support additive homomorphic functionality by changing its parameters, although it does not work for big input numbers. The Paillier cryptosystem, on the other hand, is well suited for the implementation of additive homomorphic operations. The goal here is to abstract the process of making a medication, which necessitates additive homomorphism, so we favour Paillier over ElGamal because of its ease of use in offering additive homomorphism.
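A small sketch of how an avalanche score like those in Table 1 can be measured is shown below: flip one plaintext bit, re-encrypt, and count differing ciphertext bits. Holding the randomness r fixed is an assumption made here only so the comparison isolates the plaintext change; the parameters are toy values, not the paper's experimental set-up.

```python
def paillier_encrypt(m: int, r: int, n: int, g: int) -> int:
    # c = g^m * r^n mod n^2 (r is fixed so only the plaintext change is measured).
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def avalanche_score(m: int, bit: int, n: int, g: int, r: int) -> float:
    c1 = paillier_encrypt(m, r, n, g)
    c2 = paillier_encrypt(m ^ (1 << bit), r, n, g)   # flip one plaintext bit
    total_bits = (n * n).bit_length()
    swapped = bin(c1 ^ c2).count("1")                # differing ciphertext bits
    return swapped / total_bits                      # x = swapped bits / total bits

# Toy parameters (n = p*q with small primes, g = n + 1); real keys use large primes.
p, q = 1789, 2003
n, g = p * q, p * q + 1
print(round(avalanche_score(m=123456, bit=3, n=n, g=g, r=97), 3))
```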

7 Conclusion Using a functional prototype, we were able to carry out homomorphic operations precisely as anticipated. As previously stated, we created an application that allows workers at a particular pharmaceutical firm to update sensitive information on the cloud server without having to decode any of the encrypted data stored there. Given the importance of the keys in completing such homomorphic operations, it was critical to protect the keys that were distinct to each user of the service. To do this, we created a key creation and retention mechanism that keeps the keys encrypted on the server and makes them available only via login information that is itself distinct, offering an extra degree of protection to the application.

References 1. Zhao (2019) Homomorphic encryption technology for cloud computing. Proc Comput 2. Gentry C (2009) Fully homomorphic encryption using ideal lattices. In: Proceedings of the annual ACM symposium on theory of computing 3. Brakerski Z (2014) Efficient fully homomorphic encryption from (standard) (LWE). SIAM J Comput (2014) 4. Wang D (2017) A faster fully homomorphic encryption scheme in big data. In: IEEE 2nd international conference on big data analysis, ICBDA2017 5. Bhagadia1 D (2020) Securing pharmaceutical data using homomorphic encryption. Int J Future Gener Commun Netw (2020) 6. Biksham V (2017) Homomorphic encryption techniques for securing data in cloud computing. Int J Comput Appl 7. Ogburna M (2020) Homomorphic encryption. Science Direct 8. Mahmood ZH (2018) New fully homomorphic encryption scheme based on multistage partial homomorphic encryption applied in cloud computing. In: Annual informational conference on information and science 9. Paillier P (2010) Public-key cryptosystems based on composite degree residuosity classes. In: Eurocrypt 10. Sobitha Ahila S, Shunmuganathan KL (2014) State of art in homomorphic encryption schemes. Int J Eng Res Appl 4(2) 11. Gentry C (2009) A fully homomorphic encryption scheme. Doctoral dissertation. Stanford University 12. Poteya MM, Dr. Dhoteb CA, Mr. Sharmac DH (2016) Homomorphic encryption for security of cloud data, computing and virtualization. In: 7th International conference on communication, procedia computer science vol 79, pp 175–181 13. Tebaa M, El Hajii S (2013) Secure cloud computing through homomorphic encryption. Int J Adv Comput Technol (IJACT) 5(16) 14. Hayward R, Chiangb C-C (2015) Parallelizing fully homomorphic encryption for a cloud environment. J Appl Res Technol 13


15. Armknecht F et al (2015) A guide to fully homomorphic encryption. IACR 16. Smart NP, Vercauteren F (2010) Fully homomorphic encryption with relatively small key and ciphertext sizes. Public Key Cryptography-PKC, Springer, Berlin, Heidelberg

Medical Reimbursement Prediction Using Artificial Intelligence Monica Gaur, Suman Pal, Rupanjali Chaudhuri, Oshin Benny Anto, and R. Kalaivanan

1 Introduction According to a survey conducted by the American Hospital Association (AHA), healthcare providers are significantly underpaid by the major public payers, which may jeopardize access to healthcare services in the populations they operate in [1]. In 2017, there was a shortage of $76.8 billion in reimbursement toward patient care expenses by Medicare and Medicaid. Apart from underpayment, hospitals spent $38.4 billion toward care that was uncompensated, meaning hospitals did not get reimbursed by the insurer or the patient [2]. Healthcare reimbursement is the payment that hospitals, diagnostic facilities, or other healthcare providers receive for providing a medical service [3]. Often, the payers cover the cost of all or part of the healthcare service provided to the patient. Depending on the health plan, the patients may also be responsible for some of the cost, and in case they do not have any healthcare coverage, they will be responsible for reimbursing the entire cost of the medical services rendered to the healthcare providers. Suman Pal, Rupanjali Chaudhuri, Oshin Benny Anto, and R. Kalaivanan contributed equally to this work.


A healthcare provider sends hundreds of claims daily to insurance companies to get paid for the treatment provided to patients. Often, the reimbursement is far lower than the billed amount because of the underlying rules and regulations. Also, the reimbursement gets delayed or denied by the insurance companies due to reasons like medical services rendered not supported by the payer, the time to file the claim expired, etc. Hence, it is beneficial for healthcare providers to know the reimbursement amount early in the workflow and keep track of their upcoming reimbursements to avoid revenue leakage. Figure 1 depicts the Revenue Cycle Management (RCM) Accounts Receivable (AR) process followed by US healthcare providers to track the revenue collected in return for the services rendered to patients. In RCM, the digital communication between a hospital and a payer is done using Electronic Data Interface (EDI) transactions [4]. Hospitals electronically verify the patient’s eligibility for a healthcare service using EDI transactions 270 and 271 [5]. Post verification, the patient is admitted for diagnosis and appropriate procedures. The provided treatment is documented as a medical transcript, which is later converted to the relevant industry standards codes. Charges are assigned in accordance with the services provided, and an electronically generated claim file (EDI 837) is transmitted. The claim files identified as flaws in the in-house provider-side scrubbing process are labeled as invalid claims and are sent back to the edit queue for rectification. The valid claims from the provider’s system then pass through the payer-side scrubber. The claims that are unable to clear at this step are called rejected claims and the response is sent to the provider using EDI 277CA. The clean claims that pass the payer-side scrubber are then moved for the claim adjudication process. Here reimbursement amount would be determined, the claim may be paid in full or partial amount or may not receive any payment in case of denied claims. Following that, the payer shares the Explanation of Benefits (EOB) document that provides information on how the bill is broken down between the healthcare provider(s), the insurance provider, and the patient. An EDI 835, also called a remit file, is used to send the provider the reimbursement amount and EOBs. Lower healthcare reimbursement is one of the unaddressed problems in the healthcare industry. The authors of [6] aimed to discover variables influencing medical reimbursement rates for Medicare and Commercial health insurance holders in certain geographic regions of the US. Cerner Corporation has developed a claims denial prediction model using the standard EDI data with the aim to increase hospital revenue by reducing the claim denial rate [7]. In this study, a novel hospital reimbursement prediction framework is proposed that utilizes a boosting-based ML model to predict the reimbursement amount. Also, explainable AI is utilized to fetch the most informative features that led to lower reimbursement. The proposed model should be executed in the RCM workflow before the claim exits the system of the healthcare provider. Reviewing the claim at this stage can help the healthcare provider to proactively address the errors and avoid or minimize huge revenue loss.


Fig. 1 Detailed RCM workflow

2 Data Summary The reimbursement prediction model was trained on EDI 837 and 835 data. These are industry-standard files used for the electronic submission of healthcare claims and payment information [8]. The professional (EDI 837P) and institutional claims (EDI 837I) were used as data input for the model. Institutional claims are filed for services provided by hospitals and skilled nursing homes, whereas professional claims are used for services provided by doctors, suppliers, and other non-institutional providers for both outpatient and inpatient services. Every claim file contains details specific to each patient and patient encounter whereas a remit file contains details specific to claim payment information. The claim files used were from April 2019 to August 2020 for a single healthcare provider. A total of 87,383 claim files and 51,874 remit files were extracted as shown in Table 1. The claim submitter id is a unique claim identifier. A claim may contain one or more line items and each line item corresponds to a service rendered to the patient. Each claim line was mapped with its corresponding remit line based on specific mapping criteria.

Table 1 Data summary

Type                                  Total      Professional   Institutional
Claim files                           87,383     41,895         45,474
Remit files                           51,874     24,215         24,899
Total line items in claims            254,153    62,001         192,152
Total line items in remits            131,934    33,423         98,511
Mapped claim-remit line items         101,136    30,509         70,627
Mapped claim-remit line items (%)     76.67      91.28          71.69


3 Methodology The healthcare reimbursement prediction for each filed claim was solved as a regression problem. CATegorical Boosting (CATBoost) regression model was utilized for predictions [9]. The proposed advantages of the reimbursement model are claim edits and Work Queue Prioritization (WQP) as shown in Fig. 2. The claim edits require identifying claims with lower predicted reimbursement when compared with the historical reimbursement pattern by the payer for a specific service. Thereafter, valid edits can be made to a claim before the claim leaves the system in order to get the expected amount of reimbursement. In WQP, the predicted reimbursement can help in streamlining claim queues by prioritizing claims with higher expected reimbursement and lower time to response for follow-up [10]. The demonstration of the proposed advantages is shown in the results section.

3.1 Ground Truth The reimbursement amount, also known as the allowed amount, was used as ground truth for the proposed model’s development. The allowed amount can be defined as the maximum reimbursement amount that a member’s health insurance plan can permit for a certain service. The allowed amount is calculated by adding patient responsibility and the payment amount from the insurance for each line in the remit. Line-level Patient Responsibility (PR) is the amount that a patient is supposed to pay out of pocket. It may be present in the form of deductibles, co-insurance, copayments, or non-covered charges. Figure 3 shows the different components that add to the allowed amount.

Fig. 2 Expected reimbursement model methodology


Fig. 3 Allowed amount

3.2 Data Preparation and Transformation The data utilized were EDI transactions (837P, 837I, and 835) that were originally present in x12 data format (version 5010) and were converted into structured data tables. These EDI transactions are as per American National Standards Institute (ANSI) transaction set. The utilization of standard files as the data source for the model makes it a vendor-agnostic product, i.e., it can be utilized by all the healthcare providers who generate claims as per ANSI standards. The claims were later aligned with their corresponding remits based on certain factors that were determined after analyzing the timelines of sent claims and received remits. It was observed that the remit files of certain claims were not available. The potential cause behind this was the unavailability of electronic remits as few insurers prefer paper-based responses as remits. It is quite common for a claim to be represented in various claim lines where each line relates to the information about every service provided during the patient’s stay. A claim level charge can be determined by summing up the charges at the line level. Similarly, a remit can also be represented in various remit lines where each line corresponds to the payment for service and EOBs to justify the payment. In the professional claims dataset, it was observed that the payer responses along with their payments ($0 payments in case of denial) were transmitted at the line level by the payers. Hence, predictions were made at the line level. In the institutional claim dataset, payments are received at the claim level as well as at the line level. A claim-level payment is mostly done when a bundled payment is involved. A bundled payment is a comprehensive payment made for a solitary episode of care instead of paying for each individual service [11]. For bundled payments, it was observed that only the payments and adjustments were done at the claim level where the entire


payment was received within the initial few lines (one or two line items) of the claim with a Claim Adjustment Reason Code (CARC) of 94, described as processed in excess of charges. The remaining line items of the claim with bundled payment had $0 payment with a CARC code of 97, described as 'benefit for this service is included in the payment/allowance for another service/procedure that has already been adjudicated'. For model explainability purposes, bundled payments were removed from the data set, and predictions were done at the line level for institutional claims, as the services rendered by the provider are present at the line level. The claim files are passed through a scrubbing process before being sent to the payers. Scrubbing is a rule-based process that scans the claims for human errors, such as alphanumeric characters present in the patient's name, which can lead to claim rejection by the payer. The dataset utilized for model development was relatively clean since the claim files were extracted following the scrubbing procedure. There could be more than 900 fields in a claim file; therefore, rigorous cleaning procedures were used to ensure that only the most relevant features were passed further to the model build pipeline. The following steps capture the data cleaning process:
• Denied claims were removed from the data as they represent $0 reimbursement and would contribute noise to the model build.
• Features representing similar information were eliminated. For instance, a physician's ID and first, middle, and last names all refer to the same individual.
• Features with a variance of zero were dropped.
• Sparse columns were converted into indicator fields that signify the presence of field values. Some of the sparse fields in claim files are situational fields, i.e., they are only populated to provide extra information that is crucial for the payer to know.
• Based on thorough data analysis alongside domain knowledge, some features were engineered.
• Missing values were handled using imputation. In the categorical attributes, nulls were replaced with 'unknown', and in the numeric attributes, nulls were replaced with '0'.
• Data anomalies were removed from the dataset.
Figure 4 shows the standard set of input features, post data cleaning, that were considered for feature selection. The engineered features were created using the domain knowledge provided by Subject Matter Experts (SMEs). One such feature is 'days to claim submission', which represents the time taken to submit a claim to the insurer. Another engineered feature, 'repeated procedures in a claim', is a dummy variable that indicates the presence of a procedure that was repeated within a day.


Fig. 4 Standard input features

3.3 Feature Selection It is quite common for randomly selected attributes from a large number of features to prevent the model from generalizing well on unseen data. This issue can be addressed with gradient boosting, as it provides a significance score that shows the importance of each feature in building the model. A condensed collection of attributes was extracted using the eXtreme Gradient Boosting (XGB) classifier [12] that captured the best-represented information for reimbursement prediction. A 10-fold stratified cross-validation approach provided an impartial evaluation of XGB model performance on the test set, which helped in selecting the features that generalized well on unseen data. The data were stratified by patients to ensure no information about the patients present in the train sets flows to the test sets. Figure 5 explains the feature selection process. The key factors used to select relevant features were rank and negative mean score: a feature's rank denotes its usage frequency across all folds, and its negative mean score denotes its negative contribution frequency across all folds. A threshold feature rank and negative mean feature score were used to filter the relevant features. The numbers of features selected through XGBoost feature selection for the professional and institutional prediction models are 14 and 31, respectively.
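A rough sketch of fold-wise, patient-grouped feature selection with XGBoost is given below. It approximates the rank criterion described above by counting how often a feature receives non-zero importance across folds; the negative-mean-score criterion, the use of a classifier versus a regressor, and all thresholds are simplifications, so this should be read as an assumption-laden illustration rather than the authors' procedure. It assumes the features have already been numerically encoded as described earlier.

```python
import pandas as pd
from sklearn.model_selection import GroupKFold
from xgboost import XGBRegressor

def select_features(X: pd.DataFrame, y: pd.Series, patient_ids: pd.Series,
                    n_folds: int = 10, min_folds_used: int = 8) -> list:
    """Keep features that XGBoost assigns non-zero importance in most folds.

    Grouping by patient keeps a patient's rows out of both the train and test
    sides of the same fold, mirroring the stratification described above.
    """
    counts = pd.Series(0, index=X.columns)
    for train_idx, _ in GroupKFold(n_splits=n_folds).split(X, y, patient_ids):
        model = XGBRegressor(n_estimators=200, max_depth=6, learning_rate=0.1)
        model.fit(X.iloc[train_idx], y.iloc[train_idx])
        counts += (model.feature_importances_ > 0).astype(int)
    return list(counts[counts >= min_folds_used].index)
```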

3.4 Model Development The reimbursement prediction model was built using the CatBoost regressor with optimized hyper-parameters after feature selection. Bayesian optimization [13] was utilized to tune the hyper-parameters. Other methods, such as Grid Search [14], try out all the parameter combinations, while Random Search [15] tries only a few 'random' combinations; both can be extremely time-consuming with less luck in finding the best


Fig. 5 Feature selection using XGB

parameters. On the other hand, Bayesian optimization provides the best estimates, constantly learning from earlier evaluations to discover the ideal parameter list; moreover, fewer samples are required to learn the best hyper-parameter values. The hyper-parameters used for tuning were the number of iterations, the learning rate, the depth of the tree, and the L2 regularization term. RMSE was used as the loss function [16]. Boosting algorithms are still the go-to tool when compared to deep learning algorithms, since deep learning requires an enormous amount of data and computational power and its superiority on tabular data is unclear [17]. The CatBoost algorithm was utilized for the predictions as our dataset contained around 95% categorical variables and it outperformed other boosting algorithms such as XGBoost and LightGBM. CatBoost also comes with a number of advantages:
• Categorical variable encoding: CatBoost does not require explicit encoding of categorical features; it utilizes a target encoding process to convert categories into numbers.


• Better outcomes: CatBoost offers cutting-edge results and can compete in terms of performance with any top ML algorithm.
• Robustness: It lowers the need for intensive hyper-parameter tuning and lessens the possibility of overfitting, which results in more generalized models.
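A minimal sketch of fitting such a regressor is shown below. The hyper-parameters named in the text (iterations, learning rate, tree depth, L2 term) and the RMSE loss are mirrored; the data frame, column names, and parameter values are illustrative placeholders that a Bayesian search would normally choose, not the authors' configuration.

```python
import pandas as pd
from catboost import CatBoostRegressor

# Tiny illustrative claim-line table; real features come from the EDI 837 fields.
train = pd.DataFrame({
    "payer": ["Medicaid", "Medicare", "Commercial", "Medicaid"],
    "procedure_code": ["29807", "99213", "99213", "29807"],
    "line_item_charge_amount": [1200.0, 150.0, 180.0, 1100.0],
    "allowed_amount": [540.0, 92.0, 110.0, 515.0],       # ground truth
})
X, y = train.drop(columns="allowed_amount"), train["allowed_amount"]

model = CatBoostRegressor(iterations=200, learning_rate=0.05, depth=4,
                          l2_leaf_reg=3.0, loss_function="RMSE", verbose=False)
model.fit(X, y, cat_features=["payer", "procedure_code"])
print(model.predict(X)[:2])
```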

3.5 Explainability Using SHAP Machine learning and artificial intelligence have unleashed a lot of power for the organizations utilizing them but they fail to provide interpretability and explainability of the predictions that end users can easily grasp and comprehend. The reimbursement model explainability was achieved using the SHapley Additive exPlanations (SHAP) package. It is a game-theoretic approach that is capable of explaining both global and local interpretability [18]. The global interpretability of each predictor attribute describes how much it contributes positively or negatively to the ground truth. Local interpretability provides the capability of feature importance at the individual line level, aiding administrative staff in understanding the drivers of the low prediction value and avoiding a machine learning ‘black box’ approach. Figure 6 shows the SHAP summary plot that explains the feature importance and its contribution to prediction. The features are shown on the y-axis, while the SHAP values are shown on the x-axis. The feature ‘line_item_charge_amount’ is the most important feature at the dataset level and it has a high range of SHAP values. The color gradient on the

Fig. 6 SHAP summary plot


Fig. 7 Individual SHAP decision plot

vertical scale indicates the SHAP value range from high to low. Overlapped points are scattered in the y-axis direction to aid in understanding the spread of SHAP values for each feature. The attributes are ranked in order of significance. The SHAP decision plot effectively demonstrates the model’s decision-making process [19]. Figure 7 shows the decision plot of a predicted reimbursement value. The base value, indicated by the grey vertical line, is the average of the actual reimbursement in the training dataset. The blue line running through the plot captures the direction of the feature effect in moving the prediction value to the higher or lower end. This blue line which starts at the bottom of the plot shows how the SHAP values diverge from the base value to the model’s final prediction, which is shown at the top of the plot. As decision plots directly represent SHAP values, they are easy to interpret.
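A short sketch of producing such plots with the SHAP package is given below; it assumes the fitted CatBoost `model` and feature frame `X` from the earlier sketch, and the exact plot arguments used by the authors are not known.

```python
import shap

# TreeExplainer supports gradient-boosted tree models such as CatBoost.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Global view (Fig. 6 style) and a single-line decision plot (Fig. 7 style).
shap.summary_plot(shap_values, X)
shap.decision_plot(explainer.expected_value, shap_values[0], X.iloc[0])
```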

4 Results The model’s performance was evaluated by grouped 10-fold cross-validation stratified by patients. Table 2 displays the performance metrics of the reimbursement model. It represents the average of performance metrics across 10 folds. Adjusted R-Squared (also known as R2 or the coefficient of determination) and Root Mean Squared Error (RMSE) were used in model evaluation. R-Squared is a statistical metric that quantifies the extent to which an independent variable can explain all the

Table 2 Model performance metrics

Metric               Professional    Institutional
Line-level RMSE      42.60           314.31
Claim-level RMSE     56.48           629.43
Adjusted R-Squared   0.905           0.795

variation in the dependent variable. It was observed that the Adjusted R2 value for the professional dataset was comparatively better than that for the institutional dataset. RMSE measures the absolute error between the actual and the predicted values and applies only to the dataset from which it was derived; the RMSE of the institutional dataset is higher than that of the professional dataset because the average reimbursement amount is higher for institutional claims than for professional claims. Identification of lower reimbursement claims: The reimbursement range, calculated from historical claims and remits, can be used as a reference range by healthcare providers to monitor whether the reimbursement received against a particular claim is unusually lower than expected. A sample output of a claim falling in the lower range of reimbursement is shown in Fig. 8; it represents a claim with payer 'Medicaid Outpatient' and procedure '29807' that is estimated to have a lower reimbursement than the minimum reimbursement observed in the past (as shown in the field reference_range_statistics). Estimated reimbursement in Work Queue Prioritization: In the Patient Accounting system, administrative personnel manually decide on work queue follow-up. This leads to a decrease in cash flow when low-priority claims are followed up, and further leads to wastage of healthcare providers' time and resources. According to hospital benchmarks, AR days for healthcare providers range between 30 and 70 days [20]; however, the majority of specialists concur that an average AR day count exceeding 50 denotes an issue with medical billing or collection procedures. The notion behind developing a WQP model is to prioritize claims that maximize hospital cash flow and facilitate AR follow-up by healthcare staff. WQP can be accomplished by knowing the Time To Response (TTR) of a claim and the expected reimbursement from that claim. Table 3 shows that the priority of a claim can be calculated by summing the TTR probability [10] and the weighted log of the expected reimbursement, where the weight is assigned to increase or decrease the impact of the reimbursed amount on claim priority.

5 Strengths The use of standard EDI files for model development and prediction makes this solution vendor agnostic as the metadata is standard across all the US healthcare


Fig. 8 Claim with lower reimbursement

Table 3 Claim prioritization using reimbursement and TTR

Concept       Urgency                    Impact                       Priority
Source        Output of ML model         Output of ML model           Calculation
AKA           Time to response           Net reimbursement            Log claim amount + Time to response
Calculation   Probability of response    LOG10 (claim amount) * 5     Time to response + Log claim amount
Value         81.3                       $712.34 (impact 14.26)       95.56
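A one-line sketch of the priority calculation described by Table 3 is shown below; the weight of 5 is taken from the LOG10(claim amount) * 5 term in the table, and the function name is illustrative.

```python
import math

def claim_priority(ttr_probability: float, expected_reimbursement: float,
                   weight: float = 5.0) -> float:
    """Priority = TTR probability + weight * log10(expected reimbursement).

    Matches the example in Table 3: 81.3 + 5 * log10(712.34) ~= 95.56.
    """
    return ttr_probability + weight * math.log10(expected_reimbursement)

print(round(claim_priority(81.3, 712.34), 2))   # -> 95.56
```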

providers and does not require customized integration with the hospital systems. The model is no longer a ‘black box’ because of the explainable AI’s contribution to the model’s interpretability. The administrative personnel can use it to make the necessary modifications and prevent underpayment of claims. Additionally, it can assist in expediting the review process for claims that need to be corrected at the earliest stage of the claim submission cycle which makes it the least expensive.

6 Conclusion We developed an ML model to accurately make predictions of reimbursement amounts that can help US healthcare providers overcome their biggest problems related to underpayment of medical claims. Also, it helps in prioritizing the claims having higher reimbursement for AR follow-up keeping into consideration the response time of the payers. The easy integration of the solution makes it platform friendly. We also selected a reliable set of features to provide the most accurate predictions. Model explainability was utilized to highlight the potential key features that lessen the reimbursement, thereby consuming lesser time to look through the full claim file. We can further enhance the model performance by training it on more consumer data. Also, we can utilize EDI 270 & 271 that captures the health plan coverage details of the patient that will help in more accurate reimbursement prediction. Next, we want to go further upstream to enable early identification of claims that can result in an underpayment.

Medical Reimbursement Prediction Using Artificial Intelligence

339

References 1. Fact Sheet (2023) Underpayment by medicare and medicaid. America Health Association. https://www.aha.org/fact-sheets/2020-01-07-fact-sheet-underpayment-medicare-andmedicaid/. Accessed 6 Feb 2023 2. LaPointe J (2023) Medicare, medicaid reimbursement $76.8b under hospital costs. https:// revcycleintelligence.com/news/medicare-medicaid-reimbursement-76.8b-under-hospitalcosts. Accessed 6 Feb 2023 3. Torrey T (2023) Understanding healthcare reimbursement. https://www.verywellhealth.com/ reimbursement-2615205#. Accessed 6 Feb 2023 4. Electronic Billing & Edi Transactions. https://www.cms.gov/Medicare/Billing/ ElectronicBillingEDITrans. Accessed 6 Feb 2023 5. X12 document list. https://en.wikipedia.org/wiki/X12_Document_List. Accessed 6 Feb 2023 6. Wang QC, Sickler AP, Chawla R, Nigam S (2015) Predicting medical reimbursement amount— what factors drive the medical cost trend 18:269. https://doi.org/10.1016/j.jval.2015.03.1569 7. Pal S et al (2022) Driving impact in claims denial management using artificial intelligence. In: Singh M, Tyagi V, Gupta PK, Flusser J, Oren T (eds) Advances in computing and data science, vol 1613, pp 107–120. https://doi.org/10.1007/978-3-031-12638-3_10 8. Medical Billing. https://en.wikipedia.org/wiki/Medical_billing. Accessed 6 Feb 2023 9. Prokhorenkova L, Gusev G, Vorobev A, Dorogush AV, Gulin A (2018) Catboost: unbiased boosting with categorical features. In: Proceedings of the 32nd conference on NeurIPS. https:// proceedings.neurips.cc/paper/2018/file/14491b756b3a51daac41c24863285549-Paper.pdf 10. Chaudhuri R, Parsa SPK, Nagpal D, Kalaivanan R (2022) Time to response prediction for following up on account receivables in healthcare revenue cycle management. In: Singh M, Tyagi V, Gupta PK, Flusser J, Oren T (eds) Advances in computing and data sciences, vol 1614, pp 124–137. https://doi.org/10.1007/978-3-031-12641-3_11 11. Bundled payments for care improvement (bpci) initiative. https://innovation.cms.gov/ innovation-models/bundled-payments. Accessed 6 Feb 2023 12. Chen T, Guestrin C (2016) Xgboost: a scalable tree boosting system. In: 22nd ACM SIGKDD international conference on KDD, pp 785–794. https://doi.org/10.1145/2939672.2939785 13. Snoek J, Larochelle H, Adams RP (2018) Practical Bayesian optimization of machine learning algorithms. In: 25th International conference on NeurIPS. https://proceedings.neurips.cc/ paper/2012/file/05311655a15b75fab86956663e1819cd-Paper.pdf 14. Tuning the hyper-parameters of an estimator. https://scikit-learn.org/stable/modules/grid_ search.html, Accessed 6 Feb 2023 15. Bergstra J, Bengio Y (2012) Random search for hyper-parameter optimization. http://jmlr.org/ papers/v13/bergstra12a.html 16. Hyndman RJ, Koehler AB (2006) Another look at measures of forecast accuracy, vol 2. https:// doi.org/10.1016/j.ijforecast.2006.03.001 17. Grinsztajn L, Oyallon E, Varoquaux G (2022) Why do tree-based models still outperform deep learning on tabular data?. https://arxiv.org/abs/2207.08815 18. Lundberg S, Lee S-I (2017) A unified approach to interpreting model predictions. In: Proceedings of the 31st international conference on NIPS 19. Decision plot. https://shap.readthedocs.io/en/latest/example_notebooks/api_examples/plots/ decision_plot.html. Accessed 6 Feb 2023 20. Gamble M (2023) 40 hospital benchmarks. https://www.beckershospitalreview.com/hospitalmanagement-administration/40-hospital-benchmarks.html. Accessed 6 Feb 2023

An Intelligent System for Plant Disease Diagnosis and Analysis Based on Deep Learning and Augmented Reality G. A. Senthil, R. Prabha, J. Nithyashri, S. Revathi, and R. Mohana Priya

1 Introduction Crop diseases represent a significant danger to the safety of food. Furthermore, its quick discriminating test is still problematic in many regions of the world. To diagnose disorders, numerous laboratory procedures such as polymerase chain reaction, gas chromatography and mass spectrometry, thermal imaging, and hyper spectral techniques have been applied. This is due to a scarcity of important historical data. However, these technologies are both costly and time-consuming. We primarily focus on the data set generation, collection of features, classification training, and classification implementation processes. HOG is used to extract features (a gradient G. A. Senthil (B) Department of Information Technology, Agni College of Technology, Chennai, India e-mail: [email protected] R. Prabha Department of Electronics and Communication Engineering, Sri Sai Ram Institute of Technology, Chennai, India e-mail: [email protected] J. Nithyashri Department of Computing Technologies, School of Computing, Faculty of Engineering and Technology, SRM Institute of Science and Technology, Kattankulathur, Chennai, India e-mail: [email protected] S. Revathi Department of Computer Science and Engineering, B. S. Abdur Rahman Crescent Institute of Science and Technology, Chennai, India e-mail: [email protected] R. Mohana Priya Department of Electronics and Communication Engineering, Aarupadai Veedu Institute of Technology, Vinayaka Missions Research Foundation (Deemed to Be University), Salem, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 I. J. Jacob et al. (eds.), Data Intelligence and Cognitive Informatics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-7962-2_27

341

342

G. A. Senthil et al.

histogram). Machine learning is generally used to train on huge data sets and provides a clear approach for detecting wide-scale ailments in the leaves of plants. Curling, crinkling, oozing, colour change, and dots on the leaf are the main symptoms that an affected plant exhibits; the formation of brown, red, grey, black, and yellow-coloured rings is also typical. Gall midge infestation, black mould, plant malformation, flesh weevil, stem miner, anthracnose, Alternaria leaf spot, and other parasites are examples of common plant problems. Insects and bacterial, fungal, and viral diseases attack leaves, flowers, fruits, and stems, harming plant yields and causing these ailments [1, 2]. Modern technologies provide various facilities to cultivate crops and increase production to meet demand, but as crop yields increase, so do the diseases caused by pathogens, and it is a nightmare for farmers to have their crops infected. In recent years, deep learning techniques have replaced naked-eye identification of plant disease and have progressively identified various diseases affecting plant species, especially in crop fields, whereas traditional machine learning methods face issues such as varying natural lighting conditions. Using deep learning techniques to recognise disease in plants can overcome these difficulties and improve the system. The proposed system uses the CNN, KNN, and InceptionV3 algorithms for developing the model [3, 4]. The convolutional neural network (CNN), a deep learning technique, is mostly used for image-based tasks including segmentation, image processing, and image classification. Layers of convolution, pooling, concatenation, and dropout make up the convolutional neural network; these layers divide the input pictures into tiny pixels that travel through the network so that each layer's neurons can explore the finest features [5-7]. The K-nearest neighbour algorithm is a classification algorithm [8-10]; it employs one of three distance measures, the Minkowski, Euclidean, or Manhattan distance, to determine the closest neighbours for a classification problem. InceptionV3 is a pre-trained convolutional neural network with 48 layers, trained on a subset of the ImageNet database containing over a million pictures. Augmented reality is an interactive experience in which a real-world environment is enhanced with computer-generated perceptual information, which can cover several sensory modalities, including visual, auditory, haptic, somatosensory, and olfactory [11-16]. The system uses the deep learning model to perform detection and recognition, and then uses augmented reality (AR) to present the associated graphical data overlaid on what was detected. The deep learning model is converted into an ONNX model, fed into the Unity environment, and its output is turned into an AR object. Because the produced deep learning model is not directly compatible with the Unity game engine, the developed model is converted to a compatible format. ONNX is an open format created to express machine learning models; it enables artificial intelligence developers to utilise models with a range of frameworks, tools, runtimes, and compilers by defining a standard set of operators, the essential pieces of deep learning and machine learning models, and a single file format [17-21]. The Unity engine's Barracuda package may be used to do this. Unity Barracuda is a compact, cross-platform library for neural network inference that can run neural networks on both the GPU and the CPU; it is still in the preview development stage. The detection system is developed using various deep learning algorithms, and the accuracies yielded by the algorithms are compared. The resulting model is fed into the Unity engine as an ONNX model, and the output of the model is then displayed as AR output using the Vuforia SDK, a tool for displaying AR objects in Unity. AR is used in the system because it gives the user an immersive and interactive experience, thereby enabling a complete understanding of the result.
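The text above says the trained Keras model is exported to ONNX so that Unity Barracuda can load it, but it does not name the conversion tooling. The following is a minimal sketch assuming the widely used tf2onnx converter; the model file name and the 244 x 244 x 3 input shape are assumptions taken from the surrounding description, not values stated by the authors.

```python
# Minimal sketch: export a trained Keras classifier to ONNX for Unity Barracuda.
# The file names and the fixed input shape are hypothetical.
import tensorflow as tf
import tf2onnx

model = tf.keras.models.load_model("plant_disease_inceptionv3.h5")  # hypothetical file

# Declare a fixed input signature; Barracuda prefers static shapes.
spec = (tf.TensorSpec((1, 244, 244, 3), tf.float32, name="input"),)

onnx_model, _ = tf2onnx.convert.from_keras(
    model, input_signature=spec, opset=13, output_path="plant_disease.onnx"
)
print("Exported plant_disease.onnx for import through the Barracuda package")
```

The exported .onnx file is the artefact that the Unity project consumes; everything after that point (loading the asset, running inference, rendering the AR overlay) happens on the Unity side.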

2 Related Work Kumar et al. [22] proposed a system that estimates the leaf wetness duration (LWD) using deep learning techniques. With the deep learning techniques employed in that work, the data attain a better time resolution of 4 min as opposed to 30 min. With the better temporal resolution, the mean signal-to-noise ratio (SNR) for three test signals rose from 19.36 to 22.16 dB. Similarly, the mean root mean square error (RMSE) of the predicted LWD values decreased from 0.55 to 0.20 after the temporal resolution was improved. Compared with previous time-resolution approaches, the suggested architecture calculates LWD values more accurately. Ahmed et al. [23] harvest features using a hybrid model made up of a MobileNetV2 classifier framework and a filter network. The data set is collected from crop yields in rural areas, and the model is developed accordingly to identify the disease exactly. Runtime augmentation replaces conventional augmentation methods in order to prevent data leaks and handle the problem of class imbalance. The proposed architecture yields 99.30% accuracy with a target size of 9.60 MB and 4.87 M operations, making it an acceptable option for low-end devices, according to evaluation using tomato leaf pictures from the Plant Village data set. Fatimi et al. [24] trained a model using the MobileNetV2 architecture under controlled settings to meet the classification requirements and to investigate whether shorter training times, greater accuracy, and simpler retraining could be obtained. The results demonstrated that their MobileNet model attains good classifier performance for legume leaf disease, with the proposed model's average classification accuracy exceeding 97% on the training data sets and 92% on the test data sets for two unhealthy classes and one healthy class, respectively. Hassan et al. [25] trained and evaluated their model using three separate plant disease data sets that cover various plant characteristics and variations in the kind of disease caused. The plant village data set had an overall performance accuracy of 99.39%, the rice

disease data set had a performance accuracy of 99.66%, whereas the cassava data set had a performance accuracy of 76.59%. Compared to cutting-edge deep learning models, the proposed model achieves higher accuracy while using fewer parameters. Zinonos et al. [26] proposed a system approach that does not require any training data to modify settings and is both effective and adaptable to the characteristics of each leaf disease. It is important to note that, because of cutting-edge advances in the area of Explainable Artificial Intelligence (XAI), end-user confidence in machines and models based on deep learning has grown greatly in recent years. The Grad-CAM approach is used in that work to visualise the judgements of the CNN's output layer; the visualisation results show that the disease spot location is significantly activated, and the network distinguishes between several grape leaf diseases in this way (Table 1).

Table 1 Comparison between existing and proposed work

Author | Existing work | Proposed work
Kumar et al. [22] | Estimates the leaf wetness duration using deep learning techniques; after the temporal resolution was enhanced, the estimated LWD values' average root mean square error (RMSE) decreased from 0.55 to 0.20 | Hybrid model with improved accuracy
Ahmed et al. [23] | A hybrid model made up of a MobileNetV2 classifier framework and a filter network using the Plant Village data set for the detection of plant disease in crop fields | To identify plant illnesses, CNN, KNN, and InceptionV3 models are used
Fatimi et al. [24] | A model trained using the MobileNetV2 architecture under some controlled settings as MobileNet to meet the classification requirements | Uses CNN, KNN and InceptionV3 models to detect the plant diseases with improved accuracy
Hassan et al. [25] | The proposed model was trained and tested on three independent plant disease data sets; the Plant Village data set had an accuracy of 99.39%, the rice disease data set 99.66%, and the cassava data set 76.59% | A homogeneous data set is employed, which enhances system performance
Zinonos et al. [26] | Proposed a system approach that does not require any training data to modify settings and is both effective and adaptable to the characteristics of each leaf disease | Classifies the plant disease based on the overall data set

3 Proposed System The proposed work is broken down into three main modules: creating a deep learning model using the CNN, KNN, and InceptionV3 algorithms; converting the model to an ONNX model and integrating it into Unity using the Barracuda package; and showing the results in augmented reality using the Vuforia SDK. The project's technical needs include the Vuforia SDK for loading AR tools, the Unity engine for creating the application, and Google Colab for creating the deep learning model. The proposed system's deep learning module employs three deep learning algorithms, the most accurate of which is used to identify plant illness. Figure 1 depicts the architecture of the plant disease analysis using augmented reality and illustrates the process carried out during the analysis. When a new image is fed into the application, it passes through image processing techniques and the important features are extracted for analysis; the image is then sent to the InceptionV3 model, which is trained with the different data sets, and the model is converted to an ONNX model so that it fits into the AR application. Finally, the application classifies the plant, finds the region in which the plant is affected, and shows the result of the classification in augmented reality.

Fig. 1 Architecture diagram of the plant disease analysis using AR

3.1 Data Set Description The data set contains approximately 70,000 images collected from 14 different plants, such as apple, tomato, grape, potato, orange, strawberry, cherry, blueberry, pepper bell, and raspberry, covering 26 different diseases together with images of normal plants for classifying healthy leaves. The data set was published on Kaggle (link).
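A quick way to reproduce the per-class counts reported later in Table 2 is to walk the class sub-directories of the data set. The sketch below assumes the usual Kaggle-style layout with one folder per class; the data/train path is a placeholder, not a path given by the authors.

```python
# Sketch: count images per class in a Plant Village-style directory layout
# (one sub-folder per class). The "data/train" path is hypothetical.
from pathlib import Path
from collections import Counter

DATA_DIR = Path("data/train")  # hypothetical location of the training split

counts = Counter()
for class_dir in sorted(p for p in DATA_DIR.iterdir() if p.is_dir()):
    # Count common image extensions only.
    n = sum(1 for f in class_dir.iterdir() if f.suffix.lower() in {".jpg", ".jpeg", ".png"})
    counts[class_dir.name] = n

for name, n in counts.most_common():
    print(f"{name:40s} {n:6d}")
print("total images:", sum(counts.values()), "classes:", len(counts))
```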



3.2 Feature Engineering The goal is to create a deep learning model that classifies 26 distinct illnesses from 14 distinct plant species. The data set is huge, with more than 70,000 images, so the feature engineering step plays a major role and makes a noticeable difference in developing the model. The data set contains two directories: one for training, which holds 80% of the total data for training the deep learning model, and the other for testing the model performance, which holds the remaining 20%. To perform feature engineering, the data set is first explored using exploratory data analysis methods such as describe, shape, unique, group by, and other analysis utilities available in the pandas library, through which the data set information is found and analysed for feature engineering and model creation. The data set covers 14 different plants affected by 26 different diseases, such as spider mite, black measles, and bacterial spot, which affect plant health, cause immature growth, and damage leaves and fruits. To perform smooth classification on such a large data set, the data set should be balanced, so the number of images for each category should be comparable; the performance of the model can also be improved by increasing the amount of data in the data set. From Table 2, which gives the number of images per category, it is inferred that all the categories contain balanced numbers of images with a maximum difference of about 10,000 images, so the data set consistency does not affect the model creation or the accuracy of the classification. As Fig. 2 illustrates, the majority of fungi that produce leaf spot are generally host-specific and do not spread easily across a variety of plant habitats. However, because they all require relatively similar environmental conditions to infect, they frequently arise on diverse hosts at the same time, and to the untrained eye each of them appears to be the same pathogen. The spots on the leaves can be treated with a variety of methods; they can seldom be chemically controlled and are frequently managed adequately by adhering to proper sanitation and cultural techniques. After the exploratory data analysis, the data are visualised to gain knowledge of how the data set looks and to represent the relevant details in visual form. The images, which lie in different folders, are labelled and stored under a single directory named train. Figure 3 shows the leaf of an apple plant affected by scab disease. The data set contains many images of poor quality, with little detail or blur, which would affect the model efficiency; to avoid this, the low-quality images are dropped from the data set. After cleaning the data set, a new train set is generated that can be used for training the deep learning model. Figure 4 shows the different healthy and unhealthy leaves used in the first training batch. After exploratory data analysis and feature engineering, the images are down-sampled to 244 × 244. Images of larger size would have more parameters than necessary, leading to inaccurate classification of classes, loss of image details, and the need for a huge data set; in addition, it

requires more GPU for training the model with large images. Finally, the data set is ready to be fit to an algorithm.

Table 2 Number of plant leaf images in each category

S. No | Tested plant category | No. of images
1 | Tomato_Late_Blight | 1861
2 | Tomato Leaf Healthy | 2900
3 | Grape Leaf Healthy | 2602
4 | Soyabean Leaf Healthy | 2022
5 | Potato Leaf Healthy | 2300
6 | Tomato Early Blight Leaf | 1980
7 | Strawberry Leaf Scorch | 2350
8 | Peach Leaf Healthy | 6500
9 | Apple Scab | 2500
10 | Apple Black rot | 1895
11 | Blueberry Leaf Healthy | 4200
12 | Peach Bacteria Leaf Spot | 1700
13 | Apple Cedar rust | 2700
14 | Potato Bacteria Leaf Curl Virus | 1878
15 | Tomato Yellow Leaf Curl Virus | 1989
16 | Sweet Corn Northern Leaf Blight | 1789
17 | Square Powdery Mildew Leaf | 1980
18 | Jack Fruit Leaf | 1690
19 | Paddy Virus Leaf | 1790
20 | Ground Nut Leaf | 1280

The confusion matrix in Fig. 5 shows the correlation between the different features; it illustrates the correlation of each plant with its diseased and healthy classes.
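As a concrete illustration of the 80/20 directory split and the 244 × 244 down-sampling described above, the following Keras sketch uses flow_from_directory, which is one common way to feed such a layout to a model. The directory names, rescaling, and batch size are assumptions, not values stated in the paper.

```python
# Sketch: feeding the cleaned data set to a model with Keras generators.
# Images are down-sampled to 244x244 as described above; paths and batch
# size are illustrative assumptions.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

IMG_SIZE = (244, 244)
BATCH = 32

train_gen = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
    "data/train", target_size=IMG_SIZE, batch_size=BATCH, class_mode="categorical"
)
test_gen = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
    "data/test", target_size=IMG_SIZE, batch_size=BATCH,
    class_mode="categorical", shuffle=False
)

print("training samples:", train_gen.samples, "classes:", train_gen.num_classes)
print("test samples:", test_gen.samples)
```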

3.3 Augmented Reality To give an immersive and interactive experience for the users to detect the plant diseases, the augmented reality techniques are employed. Augmented reality (AR) uses an attachment such as a smartphone or glasses to give the user with visual elements, audio, and other sensory information. This information is overlaid onto the device to create a seamless interaction in which digital information alters the user’s experience of the physical surroundings. A component of the natural world might be hidden or added to by the superimposed information. The implementation of the AR model is done in the unity engine with the supplement of various components as Vuforia, barracuda, and Cloudinary.



Fig. 2 Classified infected leaf

3.4 Barracuda Package To import the developed deep learning model in unity engine, a package called barracuda is employed. A small, cross-platform library for Neural Net inference is called Unity Barracuda. Both the GPU and the CPU can execute neural networks on Barracuda. Adventures are anticipated as Barracuda is still in the preview development stage. Using the package, the developed keras model can be imported into the unity environment. The Barracuda package uses ONNX format to execute the model in unity. It was created to express machine learning models. ONNX provides a common set of operators, the essential components of machine learning and deep learning models, and a single file format to allow Artificial Intelligence developers to use models with a variety of frameworks, tools, runtimes, and compilers. The



Fig. 3 Leaf of the apple plant affected by scab

Fig. 4 Images for first batch of training

developed deep learning model is converted into ONNX model and imported in the unity engine to display the output in augmented reality.



Fig. 5 Confusion matrix for the plant disease classification

3.5 Vuforia SDK The Vuforia SDK is a software development kit (SDK) that allows you to create augmented reality (AR) apps. With the SDK, you can incorporate cutting-edge computer vision capabilities into your application, allowing it to quickly identify images, objects, and places while interacting with the actual environment through easy set-up options. Using this SDK, Fig. 6 the output of the ONNX model is displayed in augmented reality, the unity engine supports Vuforia to achieve leaf image analysis and classification. A 3D model is created in unity and displayed in augmented reality and the output of the model is displayed efficiently. Vuforia Portal is shown in Fig. 7.



Fig. 6 Loading the model in unity

Fig. 7 Vuforia portal

3.6 Cloudinary With the help of the Unity Cloud Build API, you can easily automate your builds and benefit from incremental cloud builds across several platforms. Developers may design a cloud-based production pipeline using this API, allowing them to preserve a constantly high-quality product and make builds on the go. For exporting the model in cloud, this cloudinary is employed and used. So, the cloud deployment of the model makes the possibility of using the resources anywhere and anytime.

4 Methodology In the model development phase, the data set is fit into the algorithm and the algorithm loss, variance, bias and other metrics are calculated to evaluate the model performance with each algorithm until a model is developed that satisfies the required metrics. Since the data set contains many classes and the data set is in image format, it is hard to use machine learning techniques, so deep learning techniques are used to create an accurate model for the classification of earlier classification of plants with different diseases.



4.1 Convolutional Neural Network (CNN) The convolutional neural network (CNN) is a deep learning technique used mostly for image-based applications such as image classification, image recognition, image processing, and segmentation. The CNN is made up of a set of convolution, pooling, concatenation, and dropout layers. These layers break the input images into smaller pixels and pass them through the network so that the minute details can be explored by every neuron in each layer. This process is executed recurrently to improve the ability of the neural network to recognise and classify images; it is a sequential, layer-by-layer, forward-propagation process of feeding and executing the network. CNNs are used predominantly in image classification deep learning models because the built-in convolutional layers scrutinise the image, give accurate results, and also reduce the number of parameters in the network. Figure 8 shows how a single cell of the input is selected from the collection of cells and passed to the next layer for the convolution process. Six layers of neurons are generated using optimisation techniques, and the result is finally sent to a fully connected neural network, in which every neuron is connected to every other neuron, to create the output.

G[m, n] = (f * h)[m, n] = \sum_{j}\sum_{k} h[j, k]\, f[m - j, n - k]    (1)

This is the starting point of every convolutional neural network. The convolution layer holds the information that is needed, refining each piece of input data; this information is the main source of activation for computation in the neural network. The filtering process can be specified by custom parameters. In the equation, the input image is represented as f and the kernel as h, and the indexes of the rows and columns of the result matrix are marked with m and n, respectively. Figure 9 shows the architecture of the max pooling layers, which operate on the output of the convolution layers and help to down-sample it through its maximum or average input values. This process prevents the model from overfitting to the data set.

Fig. 8 Architecture diagram of convolutional neural network (CNN)



Fig. 9 Architecture of max pooling layers

h \times w \times c = \frac{h - f + 1}{s} \times \frac{w - f + 1}{s} \times c    (2)

h and w are the height and width of the feature map, respectively, c is the number of channels in the feature map, f is the size of the filter, and s is the stride length. The flattening layer turns the two-dimensional arrays extracted from the feature maps into a single continuous vector; for example, if the input to the layer is an H-by-W-by-C-by-N-by-S array (sequences of images), then the flattened output is an (H * W * C)-by-N-by-S array. The dense layer is the final stage of the neural network and classifies images from the output values of the preceding layer. The activation function used in the network is the rectified linear unit (ReLU): the extracted output of a layer is fed as input to produce rectified feature maps for every image, and the output is set to 0 for all negative values.

f(x) = \max(0, x)    (3)

The function f(x) is continuous; the slope of f(x) for all x < 0 is 0, and the slope for all x > 0 is 1. The neural network is built with a number of the above-mentioned layers and trained with the provided leaf image data set. The model is executed over 7 epochs and attains 96% accuracy. The training and validation accuracy for KNN and CNN are shown in Fig. 10.
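The following Keras sketch makes the convolution, max pooling, flatten, and dense stack of this section concrete. The filter counts, dropout rate, and the 38-class softmax output (taken from the output shape listed later in Table 3) are illustrative assumptions and not the authors' exact configuration.

```python
# Minimal sketch of the convolution -> max pooling -> flatten -> dense stack
# described in Sect. 4.1. Filter counts and the 38-class output are illustrative.
from tensorflow.keras import layers, models

def build_cnn(input_shape=(244, 244, 3), num_classes=38):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), activation="relu"),   # Eq. (1): windowed convolution
        layers.MaxPooling2D((2, 2)),                    # Eq. (2): spatial down-sampling
        layers.Conv2D(64, (3, 3), activation="relu"),   # Eq. (3): ReLU, f(x) = max(0, x)
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

cnn = build_cnn()
cnn.summary()
# cnn.fit(train_gen, validation_data=test_gen, epochs=7)  # 7 epochs, as reported above
```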

4.2 K-Nearest Neighbours (KNN) The KNN algorithm is used for the multi-class classification because the k-nearest neighbour approach classifies multi-class problems more accurately with the help of the k-value. The k-nearest neighbour method makes use of three different distance measures for finding the nearest neighbours in classification

problems: the Minkowski distance, the Euclidean distance, and the Manhattan distance. The Euclidean distance is used here because it suits the use case and consumes less time in finding the nearest neighbour.

Fig. 10 Training and validation accuracy for KNN and CNN

d(x, x') = \sqrt{(x_1 - x'_1)^2 + \cdots + (x_n - x'_n)^2}    (4)

The above equation is the Euclidean distance, which measures the distance between data points; the class of the points closest to the new data point is considered, and the point is classified into that class.

P(y = j \mid X = x) = \frac{1}{k} \sum_{i \in A} I\left(y^{(i)} = j\right)    (5)

Finally, the class with the highest probability, calculated with the above equation, is returned as the result. Choosing the k-value is an important part of the KNN algorithm; here, hyperparameter tuning is used for choosing the best k-value for classification. A lower k-value results in low bias but high variance, whereas a higher value produces low variance with high bias, so a k-value in the middle range can be used for good classification. As shown in Fig. 11, the k-values ranging from 5 to 10 produced a lower error rate, which eventually reduces the bias and variance, so the algorithm is trained with a k-value of 8 and produces an accuracy of 96%, which is quite good for real-time classification problems.

Fig. 11 Error rate with respect to different K-values

The proposed solution contains three models. The models developed using CNN and KNN gave a 4% recognition error, but the model developed using the InceptionV3 architecture gave a recognition error of 2%; that is, out of 100 recognitions only 2 will be wrong. The 2% recognition error of the InceptionV3 model is quite good for real-time prediction, but it can be reduced further by training the model for more epochs and by using a larger data set for model training, collected according to the model performance.
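The k-selection procedure described above (sweep k, inspect the error rate, then train with k = 8 and the Euclidean metric) can be sketched with scikit-learn as follows. The feature matrix and labels are assumed to be pre-extracted (for example HOG features); the .npy file names are hypothetical.

```python
# Sketch of the k selection described above: sweep k, inspect the error rate,
# then train with k = 8 and the Euclidean metric. The .npy feature/label files
# are hypothetical stand-ins for pre-extracted features.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X = np.load("features.npy")   # hypothetical feature matrix (e.g., HOG descriptors)
y = np.load("labels.npy")     # hypothetical label vector

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

error_rate = {}
for k in range(1, 21):
    knn = KNeighborsClassifier(n_neighbors=k, metric="euclidean")
    knn.fit(X_tr, y_tr)
    error_rate[k] = 1.0 - knn.score(X_te, y_te)   # corresponds to Fig. 11

best_k = min(error_rate, key=error_rate.get)
print("lowest error rate at k =", best_k)

final_knn = KNeighborsClassifier(n_neighbors=8, metric="euclidean").fit(X_tr, y_tr)
print("accuracy with k = 8:", final_knn.score(X_te, y_te))
```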

4.3 Transfer Learning (InceptionV3) The model developed using CNN produced only 96% accuracy, so transfer learning is used: a pre-trained model is loaded with our custom data to develop a transfer learning model that can predict based on our configuration, and it also retains the capability to predict or classify objects based on the images on which the model was already trained. InceptionV3 is a predefined model with a set of 42 layers. The inception architecture consists of five optimisation stages. In the first stage, the input image is passed through a 1 × 1 convolution, which analyses each pixel of the image and tries to learn the patterns it follows. Next, the image is passed to a 3 × 3 convolution that finds the spatial patterns followed in each class with minute detail, and it then proceeds with a 5 × 5 convolution, by which point the image has been reduced to a small size, so that the spatial patterns can be followed in greater detail to accurately differentiate between the different classes.

Table 3 Layers in the InceptionV3 model

Layers (type) | Output shape | Param #
inception_v3 (Functional) | (None, 5, 5, 2048) | 24,702,784
Flatten | (None, 51,200) | 0
Dense | (None, 512) | 29,214,813
Dropout | (None, 512) | 0
Dense_1 | (None, 38) | 19,494

g(x, y) = \omega * f(x, y) = \sum_{dx=-a}^{a} \sum_{dy=-b}^{b} \omega(dx, dy)\, f(x - dx, y - dy)    (6)

In the above equation, g(x, y) is the image generated after applying the 5 × 5 convolution, obtained by multiplying the kernel filter \omega with the image obtained in the previous step. As a final optimisation step, max pooling and concatenation layers are added to the architecture. In the max pooling layer, the generated image is reduced so that the model produces a lightweight output.

\text{MaxPooling} = \left[\frac{I - P}{S}\right] + 1    (7)

The above equation describes the pooling layer, with I as the input shape, P as the pooling window size, and S as the stride required for minimising the output; finally, these feature maps are concatenated using a concatenation layer to produce the output of the inception module. The architecture mainly has the ability to reduce dimensionality by converting larger convolutions into smaller ones. In this transfer learning model, InceptionV3 is fine-tuned in such a way that the model adapts to the new data passed in for custom training. The layers in the InceptionV3 model are given in Table 3.
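A minimal Keras sketch of the transfer-learning head listed in Table 3 (InceptionV3 base, Flatten, Dense(512), Dropout, and a 38-way softmax) is shown below. Freezing the base, the 0.5 dropout rate, and the optimiser are assumptions. A 224 × 224 input reproduces the (None, 5, 5, 2048) base output and the 51,200-unit flatten listed in Table 3, although the text above quotes 244 × 244.

```python
# Sketch of the fine-tuned InceptionV3 head from Table 3. Freezing the base and
# the 0.5 dropout rate are assumptions, not values stated in the paper.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import InceptionV3

base = InceptionV3(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # keep the pre-trained ImageNet weights fixed at first

model = models.Sequential([
    base,                                   # -> (None, 5, 5, 2048), as in Table 3
    layers.Flatten(),                       # -> (None, 51200)
    layers.Dense(512, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(38, activation="softmax"), # -> (None, 38), 19,494 parameters
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
# model.fit(train_gen, validation_data=test_gen, epochs=10)
```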

5 Result and Discussion The deep learning model is trained with the deep learning algorithms such as CNN, KNN, and InceptionV3. Various aspects of the results are discussed to compare the efficiency and accuracy of the model developed. The accuracy obtained for the classification model by using CNN is 96%, the accuracy obtained by KNN is 96%, and the accuracy obtained by the InceptionV3 algorithm is 98%. By comparing the accuracy obtained by the results, InceptionV3 is the efficient model resulting in better results and accuracy. Thus, the efficient model can be obtained by using the InceptionV3 model. The results obtained by the model are then fed into the unity engine as the ONNX model to display the results in augmented reality. Comparing accuracies obtained using three algorithms is given in Table 4.

Table 4 Comparing accuracies obtained using three algorithms

S. No. | Algorithm | Accuracy (%)
1 | CNN | 96
2 | KNN | 96
3 | InceptionV3 | 98

Fig. 12 Training and validation accuracy for InceptionV3

Figure 12 shows the number of neural network layers like flatten, dropout, and dense which are included in the architecture with the output shape and the parameters. The model started training with a maximum of 10 epochs and it produced an accuracy of 96% with minimum loss. Now, the model is able to perform real-time classification of the leaves affected by different diseases. Figure 12 shows the increase in the accuracy of the model with respect to the increase in epochs of the InceptionV3 model.

6 Conclusion The need of detecting plant disease is important as the increase in plant diseases can result in reduced yield of the crop. There is a need for a system to detect the plant disease. This can be achieved by the use of deep learning techniques by developing a model that utilises algorithms that detects the plant diseases based on the image data sets. In this research, deep learning algorithms CNN, KNN, and InceptionV3



are used to develop a model that detects and analyses the plant diseases with accurate results. The accuracy obtained by the algorithms is 96% by CNN, 96% by KNN, and 98% by InceptionV3. Thus, the resulting model is very accurate as it compares more than one algorithm and suits the best algorithm to yield better results. By comparing the accuracies obtained, the InceptionV3 model yields high accuracy among the three proposed models. Proposed models with precise and expanded results can be yielded with the improved data set. As the need for a model for plant disease prediction increases, the data set for the precise analysis is required. The efficient model for predicting and analysing the plant diseases can be developed with the improved and expanded data set that possesses the inclusion of various other plant diseases. Also, the system can be improved by reducing the weight of the model by using lightweight algorithms.

7 Future Work In future, the plant disease prediction can be upscaled to multiple plants and trees for creating an all-in-one application that predicts and classifies the disease in plants and trees. Since there is a need for the improved plant disease prediction model, the need for a large data set is required. The data set with extended versions of the plant diseases can satisfy this need. The model weight can be reduced by training with lightweight models to provide an application that can run in any device. Using GPU set-up, the model is trained with more data to provide more accurate results. Further, the accuracy of the model can be increased with experimenting the model with various other deep learning algorithms.

References
1. Chen J, Chen J, Zhang D, Sun Y, Nanehkaran YA (2020) Using deep transfer learning for image-based plant disease identification. Comput Electron Agric 173:105393. ISSN 0168-1699
2. Hu Q, Zhao Y, Wang Y, Peng P, Ren L (2023) Remaining useful life estimation in prognostics using deep reinforcement learning. IEEE Access 11:32919-32934. https://doi.org/10.1109/ACCESS.2023.3263196
3. Han J-Y, Kwon J-H, Lee S, Lee K-C, Kim H-J (2023) Experimental evaluation of tire tread wear detection using machine learning in real-road driving conditions. IEEE Access 11:32996-33004. https://doi.org/10.1109/ACCESS.2023.3263727
4. Li X et al (2023) A min-pooling detection method for ship targets in noisy SAR images. IEEE Access 11:31902-31911. https://doi.org/10.1109/ACCESS.2023.3262804
5. Prabha R, Senthil GA, Suganthi P, Boopathi D, Razmah M, Lazha A (2022) Analysis of cognitive emotional and behavioral aspects of Alzheimer's disease using hybrid CNN model. In: 2022 International conference on computer, power and communications (ICCPC), Chennai, India, pp 408-412. https://doi.org/10.1109/ICCPC55978.2022.10072126
6. Senthil GA, Prabha R, Razmah M, Veeramakali T, Sridevi S, Yashini R (2022) Machine learning heart disease prediction using KNN and RTC algorithm. In: 2022 International conference on power, energy, control and transmission systems (ICPECTS), Chennai, India, pp 1-5. https://doi.org/10.1109/ICPECTS56089.2022.10047501
7. Senthil GA, Prabha R, Razmah M, Sridevi S, Roopa D, Asha RM (2022) A big wave of deep learning in medical imaging - analysis of theory and applications. In: 2022 6th International conference on intelligent computing and control systems (ICICCS), pp 1321-1327. https://doi.org/10.1109/ICICCS53718.2022.9788412
8. Roopa D, Prabha R, Senthil GA (2021) Revolutionizing education system with interactive augmented reality for quality education. Mater Today: Proc 46(Part 9):3860-3863. ISSN 2214-7853. https://doi.org/10.1016/j.matpr.2021.02.294
9. Prabha R, Senthil GA, Lazha A, VijendraBabu D, Roopa D (2021) A novel computational rough set based feature extraction for heart disease analysis. In: I3CAC 2021: Proceedings of the first international conference on computing, communication and control system, 7-8 June 2021, Bharath University, Chennai, India. European Alliance for Innovation, p 371. https://doi.org/10.4108/eai.7-6-2021.2308575
10. R-Prabha M, Prabhu R, Suganthi SU, Sridevi S, Senthil GA, Babu DV (2021) Design of hybrid deep learning approach for covid-19 infected lung image segmentation. J Phys Conf Ser 2040(1):012016. https://doi.org/10.1088/1742-6596/2040/1/012016
11. Zhang X (2023) Improved three-dimensional inception networks for hyperspectral remote sensing image classification. IEEE Access 11:32648-32658. https://doi.org/10.1109/ACCESS.2023.3262992
12. Minnie ID, Sun Y (2023) A deep learning ensemble with data resampling for credit card fraud detection. IEEE Access 11:30628-30638. https://doi.org/10.1109/ACCESS.2023.3262020
13. Lui DG, Tartaglione G, Conti F, De Tommasi G, Santini S (2023) Long short-term memory-based neural networks for missile manoeuvres trajectories prediction. IEEE Access 11:30819-30831. https://doi.org/10.1109/ACCESS.2023.3262023
14. Li M, Ma Z (2023) Accurate prediction of electric fields of nanoparticles with deep learning methods. IEEE J Multiscale Multiphys Comput Techn 8:178-186. https://doi.org/10.1109/JMMCT.2023.3260900
15. Lo MH, Liu HL (2023) Utilisation of augmented reality in assisting surgical needle insertion guidance. In: 2023 19th IEEE international colloquium on signal processing and its applications (CSPA), Kedah, Malaysia, pp 59-63. https://doi.org/10.1109/CSPA57446.2023.10087665
16. Feng B, Wang Z, Wu H, Zhu Z, Wang J, Wang G (2023) Research on the visualization technology of diesel engine acoustic state based on augmented reality. In: 2023 IEEE 3rd international conference on power, electronics and computer applications (ICPECA), Shenyang, China, pp 408-412. https://doi.org/10.1109/ICPECA56706.2023.10075965
17. Haouchine N, Juvekar P, Nercessian M, Wells WM III, Golby A, Frisken S (2022) Pose estimation and non-rigid registration for augmented reality during neurosurgery. IEEE Trans Biomed Eng 69(4):1310-1317. https://doi.org/10.1109/TBME.2021.3113841
18. Sereno M, Wang X, Besançon L, McGuffin MJ, Isenberg T (2022) Collaborative work in augmented reality: a survey. IEEE Trans Vis Comput Graph 28(6):2530-2549. https://doi.org/10.1109/TVCG.2020.3032761
19. Wang H, Li G, Ma Z, Li X (2012) Application of neural networks to image recognition of plant diseases. In: 2012 International conference on systems and informatics (ICSAI 2012), Yantai, China, pp 2159-2164. https://doi.org/10.1109/ICSAI.2012.6223479
20. Gobalakrishnan N, Pradeep K, Raman CJ, Ali LJ, Gopinath MP (2020) A systematic review on image processing and machine learning techniques for detecting plant diseases. In: 2020 International conference on communication and signal processing (ICCSP), Chennai, India, pp 0465-0468. https://doi.org/10.1109/ICCSP48568.2020.9182046
21. Wang Q, He G, Li F, Zhang H (2020) A novel database for plant diseases and pests' classification. In: 2020 IEEE international conference on signal processing, communications and computing (ICSPCC), Macau, China, pp 1-5. https://doi.org/10.1109/ICSPCC50002.2020.9259502
22. Kumar A, Saini R, Patel M, Palaparthy VS (2022) Improved estimation of leaf wetness duration using deep-learning-based time-resolution technique. IEEE Sens J 22(24):24276-24285. https://doi.org/10.1109/JSEN.2022.3220712
23. Ahmed S, Hasan MB, Ahmed T, Sony MRK, Kabir MH (2022) Less is more: lighter and faster deep neural architecture for tomato leaf disease classification. IEEE Access 10:68868-68884. https://doi.org/10.1109/ACCESS.2022.3187203
24. Fatimi EL, Eryigit R, El Fatimi L (2022) Beans leaf diseases classification using MobileNet models. IEEE Access 10:9471-9482. https://doi.org/10.1109/ACCESS.2022.3142817
25. Hassan SM, Maji AK (2022) Plant disease identification using a novel convolutional neural network. IEEE Access 10:5390-5401. https://doi.org/10.1109/ACCESS.2022.3141371
26. Zinonos Z, Skelios S, Khalifeh AF, Hadjimitsis DG, Boutalis YS, Chatzichristofis SA (2022) Grape leaf diseases identification system using convolutional neural networks and LoRa technology. IEEE Access 10:122-133. https://doi.org/10.1109/ACCESS.2021.3138050

Power Balancing in Four-Wheel Drive EV Using Carrier-Based PWM with Two-Level Inverter Fed Drive M. Chindamani, K. Sudhiksha Darshini, H. Shoaib Yusuf, and E. Naveen

M. Chindamani, K. Sudhiksha Darshini, H. Shoaib Yusuf and E. Naveen, Department of Electrical and Electronics Engineering, Sri Ramakrishna Engineering College, Coimbatore, Tamil Nadu, India

1 Introduction SOC (state of charge) is a vital metric in battery management systems (BMS), as it plays a crucial role in managing the battery's performance, safety, and lifespan. It represents the remaining energy available in the battery as a percentage of its total capacity. Accurately measuring SOC is essential for maintaining optimal performance by regulating the charging and discharging process; failure to do so can reduce capacity, damage the battery, or even compromise safety [1-5]. The BMS monitors SOC to prevent overcharging or over-discharging, which may lead to thermal runaway, fires, and other hazardous situations. Repeatedly discharging a battery from a low SOC can result in accelerated ageing and a decreased lifespan; on the other hand, overcharging can also reduce the battery's lifespan. SOC is therefore a useful metric for optimizing the battery's energy usage, avoiding wastage, and ensuring efficient charging and discharging. To sum up, SOC is a critical parameter in the BMS that helps monitor and manage the battery's performance, safety, and longevity while maximizing energy efficiency [5-10]. At the same time, balanced power sharing from the battery to the load has its own priority as well. This can be achieved with the help of the two-level inverters connected between the battery packs and the motor. This dual inverter fed drive can

be controlled with the help of various techniques that include the direct torque control technique, the field-oriented control technique, and various PWM techniques. Among these, the PWM technique is considered the most suitable one due to the ease with which it controls the VSI. The main objective of the proposed method is to maintain the battery SOC level and to feed balanced power to the motor by incorporating a dual two-level VSI-fed BLDC drive. The method introduces a modified carrier-based modulation technique to provide a balanced power supply, to equalise the SOC of the battery packs, and to effectually regulate the power sharing between the two inverters depending upon the need of the drive [10-15]. The operation of the proposed scheme has been validated through both hardware and MATLAB simulation.

2 PWM Control Techniques There are many techniques based on pulse width modulation that are used to control the SOC; they include sinusoidal PWM (SPWM), carrier-based PWM (CB-PWM), coupled random PWM, and decoupled pulse width modulation (DC-PWM). Figure 1 represents the block diagram of a basic PWM scheme. There are two voltage source inverters (VSI1 and VSI2); one end of each is connected to a battery pack (represented as BP1 and BP2) and the other end to the open-end winding motor. The gate pulses are the signals generated by the PWM generator and given to the VSI switches; according to these gate pulses, the switches act, controlling the supply from either of the battery packs (BP1 and BP2).

Fig. 1 Block diagram of PWM control



3 Different PWM Techniques 3.1 Sinusoidal PWM Technique In this method, a sinusoidal waveform of varying amplitude and frequency is used along with fixed-frequency triangular carrier waves. It provides improved adaptability in nonlinear control systems, and the available DC bus voltage can be used efficiently in this method.

3.2 Coupled PWM Technique In the CPWM technique, the VSIs operate in a coupled manner to generate the PWM wave, and 3-level pulses are used to obtain the switching pulses for the two coupled inverters. In order to generate the reference voltage, the nearest voltage vectors should be selected, and two conditions must be strictly followed for this vector selection. 1. At each transition, no more than one switching should occur. 2. The initial and final transition switching states must be the same.

3.3 Decoupled PWM Technique This technique generates the pulses directly for both inverters in a decoupled way. Each inverter has 8 possible switching states for its transitions, so in total there are 64 transitional switching states available to generate the voltage vectors required for the induction motor.

3.4 Carrier-Based PWM Technique CB-PWM works based on level-shifted carrier waves. Initially, the zero crossing of the modulating wave lies at the midpoint of the carrier band, which gives uniform power sharing in each cycle when measured as an average value. If there is a discrepancy in the SOC, the modulating wave can be mixed with an offset value that shifts it to either side, depending on the positive or negative sign of the SOC difference.
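To make the level-shifted carrier comparison concrete, the following numpy sketch compares a sinusoidal reference (plus an offset) against an upper and a lower triangular carrier band and derives the gate signals for the two inverters. All waveform parameters, the offset value, and the band assignment are illustrative assumptions, not design values from the paper.

```python
# Illustrative sketch of level-shifted carrier-based PWM: a sinusoidal reference
# plus an offset is compared against upper and lower triangular carrier bands,
# producing the gate signals for the two inverters. All values are illustrative.
import numpy as np
from scipy.signal import sawtooth

f_ref, f_carrier, fs = 50.0, 1000.0, 200_000.0   # reference, carrier, sampling (Hz)
t = np.arange(0.0, 0.04, 1.0 / fs)

m = 0.8                                           # modulation index (illustrative)
offset = 0.1                                      # common-mode shift from the SOC loop
reference = m * np.sin(2 * np.pi * f_ref * t) + offset

tri = sawtooth(2 * np.pi * f_carrier * t, width=0.5)  # triangle wave in [-1, 1]
upper_carrier = 0.5 + 0.5 * tri                       # band [0, 1]  -> VSI1
lower_carrier = -0.5 + 0.5 * tri                      # band [-1, 0] -> VSI2

gate_vsi1 = (reference > upper_carrier).astype(int)
gate_vsi2 = (reference > lower_carrier).astype(int)

# A positive offset pushes the reference into the upper band, so the inverter
# tied to the pack with the higher SOC carries a larger share of the duty cycle.
print("duty VSI1:", gate_vsi1.mean(), "duty VSI2:", gate_vsi2.mean())
```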



Fig. 2 Circuit diagram of two-level voltage source inverter

4 Two-Level Inverter Topology The dual inverter circuit consists of two inverters; one end of each VSI is connected to a battery pack (the battery packs are denoted V dc1 and V dc2), and each pair of MOSFET switches (S1S2, S3S4, S5S6) is connected to one phase of the open-end winding motor, as shown in Fig. 2. This connection is made in the same way for both inverters. The switches are driven by the PWM signals generated from the PI controller using the carrier-based PWM technique; according to these signals, the switches draw the required voltage from the sources V dc1 and V dc2.

5 Block Diagram for Proposed PWM Technique Figure 3 represents the block diagram of the controller in generating the carrier-based PWM. Initially, during the unbalanced state, the difference between the SOCs of the battery1 and battery 2 is given to the power sharing PI controller. This difference is

used to set up the offset value and to calculate the k limit. The equations used for these two calculations are given below:

m = \frac{2 V_R}{\sqrt{3}\,(V_{dc1} + V_{dc2})}

In the above equation, m represents the modulation index, on which the offset sign depends, and V_R is the reference voltage used for the common-mode injection. The k limit is calculated using

k = \frac{1 - m}{2}, \qquad 0 \le m \le 1

Fig. 3 Block diagram for PI power sharing controller

The offset sign with respect to the SOC of the battery packs is given in Table 1. Depending upon the offset sign, the modulating wave is shifted towards the upper or lower carrier band relative to the zero crossing; when the wave sits at the zero crossing of the band, power balancing is naturally achieved. Hence, the carrier comparison together with the common-mode injection creates the power balancing. This signal is given to the PWM generator, and the gate pulses that are generated feed the switches of the voltage source inverters. This is how the PI power sharing controller balances the power. Due to the parallel connection of the VSIs, the voltage contributed by each DC link becomes half (V dc/2). Under this condition, the VSI connected to the BP with the higher charge will draw more power, and similarly, the BP with lower charge will be discharged at a slower rate. As a result, both batteries attain a uniform SOC through this repeated process.

Table 1 Offset values

SOC conditions of BPs | Offset sign
SOC1 > SOC2 | +ve
SOC1 < SOC2 | −ve

Fig. 4 Simulation circuit of the proposed model

Power Balancing in Four-Wheel Drive EV Using Carrier-Based PWM …

367

Different supply was given to both the inverters, and the DC output line voltage was measured and noticed to be same which ensures the balanced power flow to the motor from the inverters. Here, the two VSIs are connected with the open-end BLDC motor. The PWM generator, which takes input values from the controller, gives the gate pulses as output to the switches of the inverter. And the inverter acts according to the provided gate pulses. The scope1 and scope2 show the gate pulses, and scope3 shows the inverter phase output wave form. The balanced SOCs are measured as V dc1 , and V dc2 and has been shown with the voltage graph. The design of the PI controller used is given below in Fig. 5. This controller takes the difference between the SOCs as an input and calculates the offset values, the V dc mention is the common mode injection input which are used to produce the carrier wave in order to balance the power to the load. The simulation output parameters and their values are represented in Table 2 The simulation output waveforms are given in Figs. 6, 7, and 8. Figure 6 represents the 3-phase inverter output waves after the control technique. Figure 7 shows the gate pulse that has been generated from the PWM generator. And the Fig. 8 shows the motor output characteristics.

Fig. 5 Simulation circuit of PI controller

Table 2 Simulation output values

Parameter

Value

Motor

Outputs

Power

1.1 KW

Poles

4

Voltage (L-L)

400 V

Stator resistance and inductance

7.48 , 0.021 H

Frequency

50 Hz

Load torque

6 N-m

Rated speed

1500 rpm

Dual

Inverter

DC bus voltage (V dc1 , V dc2 )

440 V

Carrier frequency

1 kHz

368

Fig. 6 3 Phase inverter output

Fig. 7 Gate pulses for VSIs

M. Chindamani et al.

Power Balancing in Four-Wheel Drive EV Using Carrier-Based PWM …

369

Fig. 8 Motor output characteristics

7 Hardware Setup and Its Result The experimental setup for the proposed system is shown in Fig. 9. This setup is similar to that of the simulation model except for the V dc supply and the PI controller. Here in hardware prototype, two battery packs (BP1 and BP2) are used along with the PIC controller. The PIC controller is programmed in such a way that the required power to draw the BLDC motor is given as the common mode injection and the same power is shared by both the battery packs. Here, the unbalanced case was set up by disconnecting one of the BPs and checking with the input power for the motor. The 3-phase inverter output was monitored through the DSO in both the cases (Figs. 10 and 11).

370

Fig. 9 Experimental setup for the proposed dual inverter fed drive

Fig. 10 Output of the drive connected with the battery packs




Fig. 11 3 Phase inverter output shown in DSO

8 Conclusion The CB-PWM technique has been proposed and carried out with the help of a PI controller that calculates the offset values and k limits; this calculation was used to generate the carrier waves and the PWM gate pulses for the inverter switches. Based on the instantaneous SOC of the battery packs of the two-level VSIs, the VSI connected to the battery pack with the higher SOC was observed to deliver more power to the drive.
1. When the modulation index (m) is less than 0.5, the power is supplied completely by either of the two inverters, depending on the SOC of the V dc supplies.
2. If the SOC of the first V dc supply is greater than that of the second, the entire modulating wave moves to the upper carrier wave band, resulting in no power flow from VSI2 to the load motor.
3. If the SOC of the first V dc supply is lower than that of the second, the entire modulating wave moves to the lower carrier wave band, resulting in no power flow from VSI1 to the BLDC motor.
4. When m is between 0.5 and 1, both inverters supply power according to the demand of the load motor; however, the inverter connected to the battery with the higher SOC delivers more power to the BLDC motor.



References 1. Aktas M, Awaili K, Ehsani M, Arisoy A (2019) Direct torque control versus indirect fieldoriented control of induction motors for electric vehicle applications. Eng Sci Technol Int J 2020(23):1134–1143 2. Al-Mamoori DH, Al-Tameemi ZH, Jumaa FA, Neda OM, Al-Ghanimi MGA (2022) Comparative study of DTC-SVM and FOC-SVM control techniques of induction motor drive. J Eng Appl Sci 2135–2140 3. Ali S, Reddy V, Kalavathi MS (2019).Coupled random PWM technique for dual inverter fed induction motor drive. Int J Power Electron Drive Sys (IJPEDS) 4. Elgbaily M, Anayi F, Alshbib MM (2022) Combined control scheme of direct torque control and field-oriented control algorithms for three-phase induction motor: experimental validation. Mathematics 5. Habib AKMA, Hasan MK, Mahmud M, Motakabber SMA (2021) Energy storage system and balancing circuits for electric vehicle application. Ibrahimya MI. IET Power Electron 6. Hari Krishna U, Rajeevan PP (2021) Development of voltage space vector based switching scheme for dual inverter fed BLDC motor drives with open-end stator windings. In: 2021 IEEE 2nd international conference on smart technologies for power, energy and control (STPEC) 7. Jnayah S, Khedher A (2019) DTC of induction motor drives fed by two and three-level inverter: modeling and simulation. In: 19th International conference on sciences and techniques of automatic control and computer engineering 8. Safsouf K, Sawma J, Kanaan HY (2022) Power sharing algorithm for a dual inverter fed openend winding induction motor in HEVs. In: IEEE 20th international power electronics and motion control conference (PEMC) 9. Rao RK, Srinivas P, Suresh Kumar MV (2014) Design and analysis of various inverters using different PWM techniques. Int J Eng Sci (IJES) ISSN.4 10. Menon R, Azeez, NA, Kadam H, Williamson SS (2018) Carrier based power balancing in three-level open-end drive for electric vehicles. In: IEEE 12th international conference on compatibility, power electronics and power engineering 11. Muduli UR, Behera RK, Hosani KA, Moursi MS (2022) Direct torque control with constant switching frequency for three-to five phase direct matrix converter fed five-phase induction motor drive. IEEE Trans Power Electron 12. Muduli UR, Al Jaafari K, Behera RK, Beig AR, Alsawalhi JY (2021) Predictive control based battery power sharing for four-wheel drive electric vehicle. In: IEEE applied power electronics conference and exposition (APEC) 13. Nalini Devi M, Srinu Naik R (2021) Generalized approach for DCPWM based dual inverter fed OEWIM-DTC drive. Int J Recent Technol Eng (IJRTE) 10(2). ISSN 2277-3878 (Online) 14. Bajjuri NK, Jain AK (2021) An improved dual DTC of double-inverter-fed WRIM drive with reduced torque ripple by emulating equivalent 3L NPC VSC. IEEE Trans Ind Electron 69 15. Petri AM, Petreus D (2020) Vector control of induction machine used in electric vehicle. In: Proceedings of the 43rd international spring seminar on electronics technology (ISSE), Demanovska Valley, Slovakia, pp 14–15

Breast Cancer Detection Using B-Mode and Ultrasound Strain Imaging N. Anusha, Pyata Sai Keerthi, Manyam Ramakrishna Reddy, M. Rishith Ignatious, and A. Ramesh

N. Anusha, Pyata Sai Keerthi, Manyam Ramakrishna Reddy, M. Rishith Ignatious and A. Ramesh, Department of Computer Science and Engineering, Vidya Jyothi Institute of Technology, Hyderabad, Telangana, India

1 Introduction Ultrasound imaging offers a high sensitivity for identifying breast cancer in women with dense breast tissue who are at high risk when utilized as a diagnostic tool [1]. B-mode breast ultrasonography and breast US elastography have grown in favor for identifying breast cancer [2]. Breast US elastography, a cutting-edge imaging technology developed over the last several decades, differentiates between soft and hard lesions by presenting stiffness in a range of colors; most of the time, benign soft lesions outnumber malignant hard lesions [2, 3]. With US elastography, strain and shear-wave imaging are achievable. Shear-wave elastography (SWE) ultrasound determines the speed of the shear wave inside the tumor, while strain elastography (SE) ultrasound is a qualitative method that distinguishes stiffness between soft and hard tissue based on tissue displacement [4]. These approaches are used in clinical scenarios because they outperform individual B-mode imaging in terms of diagnostic accuracy. Breast cancer diagnosis in the United States of America (USA) is difficult due to a shortage of radiologists and poor imaging quality [5, 6]. These concerns have prompted the development of computer-aided detection (CAD) systems to help in diagnostic decisions [6]. Several researchers have created breast cancer CAD systems using machine learning (ML) approaches; despite their promising results, traditional ML algorithms are time-consuming and complex to implement [7]. Since 2015, deep learning (DL) algorithms have been increasingly used in breast imaging in the USA [1, 8]. B-mode pictures are classified using the Breast Imaging-Reporting and Data System (BI-RADS) score [1, 2], and patterns in lesions, such as their orientation, boundary, and echogenicity, are identified using DL algorithms. The convolutional neural network (CNN) is the most often used DL neural network

for image classification [9, 10]. A large amount of labeled image data is required to train a CNN; however, gathering a large amount of US breast data takes time and is often impossible. As a result, transfer learning (TL) has been proposed as one solution to the problem of inadequate breast US pictures for CNN model training [7, 11, 12]. Byra et al. [11] converted grayscale B-mode pictures into red, green, and blue (RGB) images and utilized a Visual Geometry Group (VGG)-19 model that had previously been trained in order to categorize breast masses. Tanaka et al. [12] integrated the CNN architectures VGG19 and ResNet-152 to develop an ensemble TL model that used the average probability value to classify breast US pictures as benign or cancerous. Zhang and colleagues utilized Xception, a convolutional neural network, in a study of how ultrasound pictures identify benign and malignant breast tumors. Eroğlu et al. [13] combined AlexNet, MobileNetV2, and ResNet50 models to produce a hybrid CNN system for detecting breast cancer lesions; the most important features are then selected using the minimum-redundancy, maximum-relevance feature selection approach. Zhang et al. [14] developed a DL approach based on point-wise gated and restricted Boltzmann machines that can discriminate between breast lesions and masses. Fujioka et al. [15] utilized CNN models to differentiate between benign and malignant breast cancers using SWE pictures.
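To illustrate the transfer-learning idea recalled above for the work of Byra et al. [11], the sketch below replicates a single-channel B-mode image across three channels so that an ImageNet-pretrained VGG19 backbone can be reused for a benign/malignant decision. The input size, the small classifier head, and the random stand-in data are assumptions for illustration, not the cited authors' setup.

```python
# Illustrative sketch: grayscale B-mode images replicated to three channels and
# passed through an ImageNet-pretrained VGG19 backbone for binary classification.
# Input size, head, and dummy data are assumptions.
import numpy as np
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG19

def gray_to_rgb(batch_gray: np.ndarray) -> np.ndarray:
    """Stack a (N, H, W, 1) grayscale batch into (N, H, W, 3)."""
    return np.repeat(batch_gray, 3, axis=-1)

base = VGG19(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # reuse the pre-trained convolutional features

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),   # benign vs. malignant
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Example with random stand-in data in place of real B-mode images.
dummy = np.random.rand(4, 224, 224, 1).astype("float32")
print(model.predict(gray_to_rgb(dummy)).shape)   # -> (4, 1)
```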

2 Literature Survey
To understand the need for this research and to get a better idea of the DL models and datasets that can be used, previously published works in the literature were reviewed and are summarized in Table 1. The CNN is the most prevalent DL architecture used for image classification. Training a CNN requires a large amount of labeled data, but gathering large US breast datasets takes time and is frequently impossible. The use of TL has been presented as a possible solution to the shortage of breast US pictures for training CNN models. Byra et al. [11] used red, green, and blue (RGB) images instead of grayscale B-mode images and established a TL-based approach for characterizing breast masses with a previously built Visual Geometry Group (VGG)-19 model.
This research aims at a method for distinguishing between benign and malignant tumors by combining B-mode and SE breast US images with transfer learning. To extract attributes and achieve more precise results, this research used a stack-wise combination of B-mode and SE images. In this case, two classification models are combined into one superior classifier for improved prediction performance. The ImageNet dataset is utilized to pretrain the CNN model [11, 17]. In this research, an ensemble model that combines ResNet [18] and AlexNet is proposed. ResNet, developed by He et al., is a deep residual neural network. AlexNet, developed by Krizhevsky et al. [17], is one of the few CNN models that is well known, accurate at classifying clinical images, and has a low false-positive rate. This ensemble DL model increases detection efficiency by assisting radiologists in identifying breast tumors in US images more accurately.


Table 1 Related work

| Author(s) and year published | Diagnosis/techniques | Dataset | Description | Metrics used for measurement of tumors |
|---|---|---|---|---|
| H. Madjar 2010 [1] | Ultrasound and curative diagnosis | Custom data | Detection and differentiation of breast lesions done through breast US | BI-RADS and ACR breast density |
| A. Elkharbotly, H. M. Farouk et al. 2015 [2] | Receiver operating characteristic, negative predictive value (NPV), area under curve (AUC) | Kaggle | Results in mammography revealed thick glandular breast tissue | B-mode ultrasonography, ultrasound elastography (UE), BI-RADS |
| M. H. Yap et al. 2017 [5] | U-net, FCN-AlexNet, LeNet and TL approach of CNN | Dataset A (2001, didactic media file, B&K Medical Panther, 306 images); Dataset B (2012, UDIAT diagnostic centre of the Parc Taulí Corporation, Sabadell, Spain, 163 images) | Compares and contrasts two conventional ultrasound image datasets (dataset A and dataset B) acquired from different ultrasound systems | F-measure |
| T. Fujioka et al. 2020 [15] | CNN, ultrasound shear-wave elastography, Xception, InceptionV3, InceptionResNetV2, DenseNet121, DenseNet169, NASNetMobile | 158 images of benign masses and 146 images of malignant masses of 153 and 141 patients, respectively; the ultrasonic SWE images were converted to Joint Photographic Experts Group (JPEG) format | Malignant masses were larger than benign masses and patients with malignant masses were significantly older than those with benign masses | Sensitivity, specificity, and AUC of SWE and CNN |
| G. Ayana et al. 2021 [16] | AlexNet, VGG19, VGG16, VGG128, VGG-Net and InceptionV3 | ImageNet dataset; 163 US images from UDIAT; 306 US images from BK Medical Panther and Hawk; these are images of benign and malignant breast masses | Transfer learning is applied to improve the performance of target learners by transferring the knowledge of US breast image classification and detection from the perspective of models | Feature extraction and fine-tuning the TL approaches |

To determine which algorithm performs best at classification, this research uses loss–accuracy curves, enabling better understanding and prediction of breast tumors irrespective of skin type and texture.

3 Methodology
Breast cancer is a treatable disease if it is detected at an early stage (stage 1 or 2) [7]. However, it is difficult to diagnose at early stages using traditional methods in dense breast tissue [1]. To solve this problem, an efficient model is proposed to help doctors and patients diagnose tumor particles in the breast. In addition, a web application is developed by deploying the proposed ensemble model. Registered users of this web application can upload US breast images and receive an initial assessment. The ensemble, or hybrid, model proposed and used in this research was selected through the literature survey (refer Table 1). This model is a combination of the AlexNet and ResNet [18] models and reports the accuracy of tumor detection on the considered datasets [11, 17]. This model helps in diagnosing cancer at early stages.
As illustrated in the framework (refer Fig. 1), the pretrained neural network models (AlexNet and ResNet) are loaded in step-1. In step-2, these AlexNet and ResNet models are fine-tuned using US images. After tuning, any insignificant classification layers discovered in step-2 are removed in step-3. In step-4, the AlexNet and ResNet models are integrated into an ensemble model that helps the system work efficiently. Ensembling combines the two networks so that the system uses the functionalities and advantages of both the ResNet and AlexNet models


Fig. 1 Overall process applied

in one stack. After creating the ensemble model, fine-tuning is performed once again in step-5 to check whether any insignificant layers have arisen. The system (refer Fig. 1) has been trained and tested with different groups of US datasets containing different tumors collected from the Kaggle dataset [11, 17]. Sample US images of normal breasts and of benign and malignant breast tumors used to train and test the DL models in this research are shown in Figs. 3a–c.
SqueezeNet [15], VGG19 [17], VGG16 [16], InceptionResNetV2 [15], MobileNet [11], MobileNetV2 [11], DenseNet121, 169, 201 [5], ResNet50V2, 101V2, 152V2 [11], AlexNet, and the ensemble [19] are the various algorithms used in this research to build the system in Fig. 1. SqueezeNet is an 18-layer deep convolutional neural network. The VGG19 variant of the VGG model has 19 weight layers: 16 convolution layers and 3 fully connected layers, together with 5 MaxPool layers and a SoftMax layer; VGG11 and VGG16 are other variants of VGG [16, 17]. Inception-ResNet-v2 is a CNN trained on more than one million images from the ImageNet collection; the 164-layer network can classify images into 1000 object categories, such as keyboard, mouse, pencil, and many animals [15]. MobileNetV2 is a CNN with a depth of 53 layers; it is based on an inverted residual structure, with the bottleneck layers acting as the residual connections [11]. DenseNet is a CNN that connects each layer to every layer after it: the first layer is connected to layers 2, 3, 4, and so on, while the second layer is linked to layers 3, 4, and so on [5, 15]. ResNet50V2 alters the propagation formulation of the connections between blocks and outperforms the original ResNet50 and ResNet101 on the ImageNet dataset [11, 15]. The CNN is the most often used DL network for image classification, and it also helps in differentiating benign and malignant tumors using SWE pictures [5, 15]. ResNet101 is a 101-layer CNN; a version pretrained on the ImageNet data collection has been trained on more than one million images.


By learning residual representation functions instead of learning the signal representation directly, ResNet can build a very deep network with up to 152 layers [9, 11]. ResNet also offers skip connections, which directly pass the input from a previous layer to a subsequent layer. The first five layers of AlexNet are convolutional, some of them followed by max-pooling layers, and the last three layers are fully connected. AlexNet uses the non-saturating Rectified Linear Unit (ReLU) activation function, which outperformed tanh and sigmoid in terms of training performance [16].

3.1 Ensemble Model
Ensembling is the process of building multiple models, using an assortment of modeling strategies, to predict an outcome; the ensemble model then combines each base model into a single overall prediction for unseen data [5, 19]. As depicted in Fig. 2, different phases of preprocessing are applied to the ImageNet dataset [11, 17] to train and test the model. Image data generation, data resizing or scaling, and converting an image to an array are the pre-processing phases applied to the considered input dataset. Because of its effect on the learning techniques, image scaling is a crucial preprocessing step in the object identification process; the input dataset is scaled to 128 × 128 pixels for this research [11, 17]. The system then builds the model with different algorithms, namely SqueezeNet, VGG19, VGG16, InceptionResNetV2, MobileNet, MobileNetV2, DenseNet121, DenseNet169, DenseNet201, ResNet50V2, ResNet101V2, ResNet152V2, AlexNet, and the ensemble. These algorithms are implemented using Python 3.11. The performance of each algorithm is assessed with a loss–accuracy curve, and the best-performing algorithm is deployed into the web application. A registered user can upload US breast images to this application and get the results predicted by the model. The web application front end is developed using HTML and CSS.
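The paper does not publish its implementation, so the following Keras sketch only illustrates one plausible way to realize the step-wise ensembling described above: two ImageNet-pretrained backbones receive new three-class heads and their softmax outputs are averaged. ResNet50V2 and MobileNetV2 stand in for the ResNet/AlexNet pair (AlexNet is not shipped with Keras), and all layer names and hyperparameters are illustrative assumptions.

```python
# Hedged sketch: ensemble of two pretrained CNN branches by averaging softmax outputs.
# ResNet50V2 and MobileNetV2 stand in for the ResNet/AlexNet pair described in the text.
import tensorflow as tf
from tensorflow.keras import layers, models, applications

NUM_CLASSES = 3          # normal, benign, malignant
IMG_SHAPE = (128, 128, 3)

def branch(backbone_fn, name):
    base = backbone_fn(include_top=False, weights="imagenet",
                       input_shape=IMG_SHAPE, pooling="avg")
    base.trainable = False                          # steps 1-2: load and freeze pretrained weights
    inp = layers.Input(shape=IMG_SHAPE)
    x = base(inp, training=False)
    out = layers.Dense(NUM_CLASSES, activation="softmax", name=f"{name}_head")(x)
    return models.Model(inp, out, name=name)

resnet_branch = branch(applications.ResNet50V2, "resnet_branch")
second_branch = branch(applications.MobileNetV2, "second_branch")

# Step 4: combine the two branches into one ensemble by averaging their predictions.
inp = layers.Input(shape=IMG_SHAPE)
avg = layers.Average()([resnet_branch(inp), second_branch(inp)])
ensemble = models.Model(inp, avg, name="ensemble")

ensemble.compile(optimizer=tf.keras.optimizers.Adam(),
                 loss="categorical_crossentropy", metrics=["accuracy"])
```

A usage note: unfreezing a few top layers of each backbone after the initial training pass would correspond to the additional fine-tuning in step-5, but the exact schedule is not specified in the paper.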

3.2 Loss–Accuracy Curve
The accuracy is the proportion of correct predictions achieved on the validation dataset, whereas the loss is the accumulated error on the training and testing datasets. If the loss is falling while the accuracy is growing, the model is fitting the training data well and generalizing to the testing dataset.
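A minimal sketch of how such loss–accuracy curves can be produced with Keras and Matplotlib (Pyplot), which the paper names as its plotting libraries. It assumes the `ensemble` model from the earlier sketch and hypothetical Keras data generators `train_gen` and `val_gen` for the 80:20 split described in Sect. 4; epoch count and figure layout are assumptions.

```python
# Hedged sketch: plotting training/validation loss and accuracy curves from a Keras History.
import matplotlib.pyplot as plt

history = ensemble.fit(train_gen, validation_data=val_gen, epochs=20)

fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))

ax_loss.plot(history.history["loss"], label="training loss")
ax_loss.plot(history.history["val_loss"], label="validation loss")
ax_loss.set_xlabel("epoch"); ax_loss.set_ylabel("loss"); ax_loss.legend()

ax_acc.plot(history.history["accuracy"], label="training accuracy")
ax_acc.plot(history.history["val_accuracy"], label="validation accuracy")
ax_acc.set_xlabel("epoch"); ax_acc.set_ylabel("accuracy"); ax_acc.legend()

plt.tight_layout()
plt.show()
```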


Fig. 2 Overall system architecture

Fig. 3 a–c Samples of the US dataset collected from Kaggle with normal breast, benign and malignant tumors, respectively

4 Results and Discussions
Normal breast, benign, and malignant tumors are the three kinds of US breast inputs considered in this research to train and test the DL techniques; they are depicted in Figs. 3a–c. The 2546 US breast images downloaded from the Kaggle dataset [11, 17] are categorized into normal breast, benign, and malignant tumor classes using the ImageDataGenerator module in Python.


Fig. 4 Preprocessing of input

Table 2 Accuracy–loss

| Algorithms | Accuracy | Loss |
|---|---|---|
| ResNet50 | 0.56 | No loss |
| InceptionV3 | 0.90 | 51.7 |
| VGG16 | 0.82 | No loss |
| VGG19 | 0.56 | No loss |
| InceptionResNetV2 | 0.90 | No loss |
| CNN | 0.64 | 1.92 |
| MobileNetV2 | 0.42 | 1.06 |
| DenseNet201 | 0.56 | No loss |
| ResNet50V2 | 0.27 | No loss |
| ResNet152V2 | 0.56 | No loss |
| Ensemble | 0.96 | 1.12 |

The resultant categorized images number 1578, as given in Fig. 4. These categorized images are then rescaled to 128 × 128 pixels using the same Python module. The SqueezeNet, VGG19, VGG16, InceptionResNetV2, MobileNet, MobileNetV2, DenseNet121, DenseNet169, DenseNet201, ResNet50V2, ResNet101V2, ResNet152V2, AlexNet, and MobileNet + AlexNet models are trained and tested by dividing the 1578 categorized images in an 80:20 ratio [20]. The obtained accuracy and loss metrics of these algorithms are given in Table 2. It can be noticed from Table 2 that, compared to the other algorithms, the ensemble model gives the highest accuracy of 0.96 with a minimal loss of 1.12.
The loss and accuracy curves of the ensemble model for training and validation are given in Figs. 5 and 6, respectively. These curves (refer Figs. 5 and 6) are generated based on the good and bad predictions of the model trained using the Adam optimizer. The loss–accuracy curves are plotted using the Keras and Pyplot libraries in Python. In the loss curve (refer Fig. 5), the y-axis represents the loss proportion of the model and the x-axis represents the training and validation epochs. In the accuracy curve (refer Fig. 6), the y-axis represents the accuracy proportion of the model and the x-axis represents the training and validation epochs on the preprocessed dataset. The orange-colored curve represents validation and the blue-colored curve represents training accuracy and loss, respectively.
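A hedged sketch of the data pipeline described above: rescaling the categorized images to 128 × 128 pixels and creating the 80:20 split with Keras' ImageDataGenerator. The directory name and one-sub-folder-per-class layout are assumptions, not details published by the authors.

```python
# Hedged sketch: rescaling to 128x128 and an 80:20 train/validation split
# with Keras' ImageDataGenerator, as described in the text.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1.0 / 255, validation_split=0.2)

train_gen = datagen.flow_from_directory(
    "us_breast_images/",            # hypothetical folder with normal/, benign/, malignant/
    target_size=(128, 128),
    batch_size=32,
    class_mode="categorical",
    subset="training")

val_gen = datagen.flow_from_directory(
    "us_breast_images/",
    target_size=(128, 128),
    batch_size=32,
    class_mode="categorical",
    subset="validation")
```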


Fig. 5 Loss curve of ensemble algorithm for training and validation

The ensemble model, which obtained the highest accuracy and a low loss, is deployed in the breast cancer analysis web application (refer Fig. 7). A user can sign up for this application by entering the required details (refer Fig. 8). Any registered user can then log in (refer Fig. 9) with their credentials and upload US breast images using the choose-file and upload options (refer Fig. 10). The ensemble model diagnoses the uploaded input image and, based on the diagnosis, predicts and reports the result as normal breast, benign tumor, or malignant tumor (refer Fig. 11).
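The paper specifies only an HTML/CSS front end, so the back end serving the model is not documented. As an illustration of how the upload-and-predict flow could be wired up, the sketch below assumes a Flask back end; the model file name, route, and class ordering are all hypothetical.

```python
# Hedged sketch: serving a saved ensemble model behind a simple upload endpoint.
# Flask, file names, and routes are assumptions; only the HTML/CSS front end is in the paper.
import numpy as np
from PIL import Image
from flask import Flask, request, jsonify
from tensorflow.keras.models import load_model

app = Flask(__name__)
model = load_model("ensemble_breast_us.h5")         # hypothetical saved model file
CLASS_NAMES = ["benign", "malignant", "normal"]     # assumed to match the training class order

@app.route("/predict", methods=["POST"])
def predict():
    img_file = request.files["scan"]                # uploaded US breast image
    img = Image.open(img_file.stream).convert("RGB").resize((128, 128))
    arr = np.asarray(img, dtype="float32")[np.newaxis] / 255.0
    probs = model.predict(arr)[0]
    return jsonify({"prediction": CLASS_NAMES[int(np.argmax(probs))],
                    "probabilities": probs.tolist()})

if __name__ == "__main__":
    app.run(debug=True)
```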


Fig. 6 Accuracy curve of ensemble algorithm for training and validation

Fig. 7 Home page of the breast cancer analysis web application


Fig. 8 Registration page of application


Fig. 9 Login page of the application

Fig. 10 US breast scan images uploading by the registered user


Fig. 11 Prediction of tumor

5 Conclusion
In this research, clinical B-mode and SE pictures are used to create a TL-based computer-aided diagnosis framework for identifying breast tissue as normal, benign, or malignant. The experimental results demonstrate that the proposed ensemble model outperforms individual models trained solely on B-mode or SE images. This is achieved by combining the AlexNet and ResNet CNN models into a hybrid. To accomplish this model, ImageNet data have been used, which helps increase the performance of the model. The hybrid CNN models detect a wide range of properties in ultrasound image data. The ensemble model created by this system will help detect tumors even in dense breast texture and predict breast cancer more accurately and efficiently. The system also functioned efficiently with the MobileNet DL algorithm. To make it more convenient for the user, the algorithm is linked to a front-end web application page where clients upload the input and are shown which kind of cancer is present in the US image.

References
1. Madjar H (2010) Role of breast ultrasound for the detection and differentiation of breast lesions. Breast Care 5(2):109–114
2. Elkharbotly A, Farouk HM (2015) Ultrasound elastography improves differentiation between benign and malignant breast lumps using B-mode ultrasound and color Doppler. Egyptian J Radiol Nucl Med 46(4):1231–1239
3. Zahran MH, El-Shafei MM, Emara DM, Eshiba SM (2018) Ultrasound elastography: How can it help in differentiating breast lesions? Egyptian J Radiol Nucl Med 49(1):249–258
4. Zhang X, Liang M, Yang Z, Zheng C, Wu J, Ou B, Li H, Wu X, Luo B, Shen J (2020) Deep learning-based radiomics of B-mode ultrasonography and shear-wave elastography: improved performance in breast mass classification. Front Oncol 10:1621
5. Yap MH, Pons G, Marti J, Ganau S, Sentis M, Zwiggelaar R, Davison AK, Marti R (2017) Automated breast ultrasound lesions detection using convolutional neural networks. IEEE J Biomed Health Inform 22(4):1218–1226
6. Cai L, Wang X, Wang Y, Guo Y, Yu J, Wang Y (2015) Robust phase-based texture descriptor for classification of breast ultrasound images. Biomed Eng OnLine 14(1):26
7. Pan SJ, Yang Q (2010) A survey on transfer learning. IEEE Trans Knowl Data Eng 22(10):1345–1359
8. Guo R, Lu G, Qin B, Fei B (2018) Ultrasound imaging technologies for breast cancer detection and management: a review. Ultrasound Med Biol 44:37–70
9. Bardou D, Zhang K, Ahmad SM (2018) Classification of breast cancer based on histology images using convolutional neural network. IEEE Access 6:24680–24693
10. Zhou Y, Xu J, Liu Q, Li C, Liu Z, Wang M, Zheng H (2018) A radiomics approach with CNN for shear-wave elastography breast tumor classification. IEEE Trans Biomed Eng 65(9):1935–1942
11. Byra M, Galperin M, Fournier HO, Olson L, O'Boyle M, Comstock C, Andre M (2019) Breast mass classification in sonography with transfer learning using a deep convolutional neural network and color conversion. Med Phys 46(2):746–755
12. Tanaka H, Chiu S-W, Watanabe T, Kaoku S, Yamaguchi T (2019) Computer-aided diagnosis system for breast ultrasound images using deep learning. Phys Med Biol 64(23):235013
13. Eroğlu Y, Yildirim M, Çinar A (2021) Convolutional neural networks-based classification of breast ultrasonography images by hybrid method with respect to benign, malignant, and normal using mRMR. Comput Biol Med 133:104407
14. Zhang Q, Xiao Y, Dai W, Suo J, Wang C, Shi J, Zheng H (2016) Deep learning-based classification of breast tumors with shear-wave elastography. Ultrasonics 72:150–157
15. Fujioka T, Katsuta L, Kubota K, Mori M, Kikuchi Y, Kato A, Oda G, Nakagawa T, Kitazume Y, Tateishi U (2020) Classification of breast masses on ultrasound shear wave elastography using convolutional neural networks. Ultrason Imag 42(4–5):213–220
16. Ayana G, Dese K, Choe SW (2021) Transfer learning in breast cancer diagnosis via ultrasound imaging. Cancers 13(4):738
17. Krizhevsky A, Sutskever I, Hinton GE (2017) ImageNet classification with deep convolutional neural networks. Commun ACM 60(6):84–90
18. Anusha N, Gupta S, Naidu NY, Ruchitha M, Pandey R (2023) Face mask and social distance detection using deep learning models. In: Computational vision and bio-inspired computing, Proceedings of ICCVBIC 2022, AISC 1439, pp 461–484
19. Zhang X, Li H, Wang C, Cheng W, Zhu Y, Li D, Jing H, Li S, Hou J, Li J, Li Y, Zhao Y, Mo H, Pang D (2021) Evaluating the accuracy of breast cancer and molecular subtype diagnosis by ultrasound image deep learning model. Front Oncol 11:606
20. Cepeda S, García SG, Arrese I, Pérez GF, Casares MV, Puentes MF, Zamora T, Sarabia R (2021) Comparison of intraoperative ultrasound B-mode and strain elastography for the differentiation of glioblastomas from solitary brain metastases. An automated deep learning approach for image analysis. Front Oncol 10:3322

Prompt Engineering in Large Language Models
Ggaliwango Marvin, Nakayiza Hellen, Daudi Jjingo, and Joyce Nakatumba-Nabende

1 Introduction
1.1 Overview of Large Language Models (LLMs)
A Large Language Model (LLM) is an AI algorithm that employs deep learning and vast datasets to comprehend, summarize, generate, and predict new content. LLMs are trained on extensive amounts of text using self-supervised learning and excel at a wide range of tasks [1]. LLMs are constructed using neural networks with very many parameters, usually billions of weights or more. These models are pre-trained on vast amounts of data to help them understand the complexity and relationships within language. By using techniques such as fine-tuning, in-context learning, and zero-/one-/few-shot learning, these models can be tailored for specific tasks [2].
A Large Language Model is essentially a transformer-based neural network that predicts the text that is likely to come next, so its sophistication and performance can be gauged by the number of parameters it possesses. The model's parameters are the factors it takes into account when generating output [3]. LLMs have generated much excitement recently due to their impressive capabilities as general-purpose computers when conditioned on natural language instructions. However, the effectiveness of the model is heavily influenced by the quality of the prompt used to guide it, and most successful prompts have been created manually by humans [4].

G. Marvin (B) · D. Jjingo · J. Nakatumba-Nabende Makerere University, Kampala, Uganda e-mail: [email protected] N. Hellen Muni University, Muni, Uganda © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 I. J. Jacob et al. (eds.), Data Intelligence and Cognitive Informatics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-7962-2_30


This has led to the rise of prompt engineering as an important skill set for NLP and AI engineers, conversational AI researchers and, most importantly, ordinary information seekers in various domains like education and health care, where improved efficiency in the use of LLMs is highly valued.
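To ground the description above of an LLM as a next-token predictor, the sketch below uses the Hugging Face `transformers` text-generation pipeline with the small, openly available GPT-2 model. GPT-2 is only a stand-in for the much larger models discussed in this chapter; the prompt text and generation length are illustrative.

```python
# Hedged sketch: a small open model (GPT-2) illustrating "predict the text likely to come next".
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
completion = generator("Prompt engineering is the practice of",
                       max_new_tokens=30, num_return_sequences=1)
print(completion[0]["generated_text"])
```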

1.2 Importance of Prompt Engineering for LLMs
Prompt engineering is the process of designing and refining input queries, or "prompts," to elicit desired responses from Large Language Models (LLMs). Prompts are crucial in guiding LLMs to generate useful and relevant outputs [5]. By understanding how to create effective prompts, information seekers and developers can improve LLM performance, explore new applications, and save valuable time and resources. This can be achieved by leveraging good-quality prompts to guide the model toward generating more accurate and useful outputs, which in turn allows information seekers and developers to unlock the full potential of LLMs. Prompt engineering permits exploration of new applications for LLMs, saves valuable time and resources, and provides a form of programming that can customize the outputs and interactions with an LLM [6]. It offers reusable solutions to common problems faced in output generation and interaction when working with LLMs. By facilitating the development of more accurate, context-specific, and nuanced responses from LLMs, prompt engineering supports the advancement of research and discoveries in the field of AI, most especially conversational AI.
Prompt engineering provides a framework for documenting patterns for structuring prompts to solve a range of problems and allows information seekers and developers to adapt prompts to different domains. It enables the combination of multiple prompt patterns to improve LLM outputs and facilitates the transfer of knowledge between information seekers and developers working with LLMs [7]. Prompt engineering supports the development of more sophisticated and effective LLM applications, helps developers better understand the behavior and capabilities of LLMs, and enables information seekers and developers to steer LLMs toward truthfulness and/or informativeness. It also improves few-shot learning performance by prepending optimized prompts to standard in-context learning prompts and facilitates the development of more effective chatbots, virtual assistants, domain-specific prompt engineering tools, and other conversational AI systems [8, 9]. Therefore, prompt engineering supports the advancement of natural language processing (NLP) tasks by improving LLM performance.
Prompt engineering is likely to play an increasingly important role in unlocking the full potential of Large Language Models (LLMs) in the near future. If information seekers and developers are empowered to generate specific language output quickly and accurately, prompt engineering will become a marketable element for increasing efficiency and streamlining business operations across various domains. This rapidly growing field may also lead to new job opportunities for those skilled in prompt engineering. As information seekers and developers continue to experiment with more sophisticated prompts [10, 11], we are likely to see the development of more efficient and understandable user interfaces for controlling LLM outputs. This enhanced


control over LLM outputs will enable developers to fine-tune the generated content and unlock new applications for LLMs that were previously impossible.

1.3 Research Objective and Motivation
The problem that this chapter addresses is the need for effective prompt engineering to unlock the full potential of Large Language Models (LLMs). LLMs have generated much hype in recent months due to their impressive capabilities as general-purpose computers when conditioned on natural language instructions [12]. However, the effectiveness of the model is heavily influenced by the quality of the prompt used to guide it, and most successful prompts have been created manually by humans. This has led to the rise of prompt engineering as an important skill set for information seekers, AI engineers, and researchers to improve and efficiently use LLMs. Therefore, this study provides an understanding of prompt engineering, presents an overview of the latest prompting techniques, and provides demonstrations and exercises to practice different prompting techniques. It also discusses the current and future trends of research on LLMs and prompt engineering, including the rise of automatic instruction generation and selection methods for prompt engineering. By understanding how to create effective prompts, information seekers and developers can improve LLM performance, explore new applications, and save valuable time and resources.

2 Prompt Engineering What are Prompts? A prompt is a text-based input that is fed to a language model to guide its output. A prompt can be audio but, in this case, the audio input would be transcribed into text and fed to the language model as a text-based prompt. The language model would then process the text-based prompt and generate an output based on the instructions and context provided in the prompt [13]. The primary purpose of a prompt is to provide the language model with instructions and context for achieving a desired task. Prompt engineering is the practice of developing and optimizing prompts to efficiently use language models (LMs) for a variety of applications. A prompt is composed of an instruction and can have input data, context, and some output indicator (Table 1).
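Before the examples in Table 1, a minimal sketch of how the four components named above (instruction, input data, context, and output indicator) can be assembled into a single text prompt. This is plain string handling, independent of any particular LLM API; the example values are illustrative.

```python
# Hedged sketch: assembling a prompt from its instruction, input data, context,
# and output indicator components. Example values are illustrative only.
def build_prompt(instruction, input_data=None, context=None, output_indicator=None):
    parts = [f"Instruction: {instruction}"]
    if input_data:
        parts.append(f"Input data: {input_data}")
    if context:
        parts.append(f"Context: {context}")
    if output_indicator:
        parts.append(f"Output indicator: {output_indicator}")
    return "\n".join(parts)

prompt = build_prompt(
    instruction="Write a short paragraph explaining the Pythagorean theorem",
    context="Explaining a mathematical concept to a secondary-school student",
    output_indicator="Provide one worked example")
print(prompt)
```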


Table 1 Examples of prompts for large language models (LLMs)

| Prompt | Instruction | Input data | Context | Output indicator |
|---|---|---|---|---|
| Paint a tall African man dancing to cool Ugandan folk songs | Paint | NA | Tall African man dancing to cool Ugandan folk songs | NA |
| "Write a science fiction short story set in a future where humans have colonized Mars, exploring the ethical implications of terraforming and the potential conflicts between Earth and Martian societies." (Creative writing) | Write a science fiction short story | Babies | Set in a future where humans (particularly babies) have colonized Mars | Explore the ethical implications of terraforming and potential conflicts between Earth and Martian societies |
| "Write a short paragraph explaining the Pythagorean theorem and providing an example to demonstrate its use." (Mathematics) | Write a short paragraph | NA | Explaining a mathematical concept | Provide an example |
| "Write a philosophical essay exploring the concept of free will, incorporating arguments from determinism, compatibilism, and libertarianism." (Philosophy) | Write a philosophical essay | NA | Explore the concept of free will | Incorporate arguments from determinism, compatibilism, and libertarianism |
| "Generate a technical report on the feasibility of using nuclear fusion as a sustainable energy source, including an analysis of current technological limitations and potential solutions." (Engineering) | Generate a technical report | NA | Feasibility of using nuclear fusion as a sustainable energy source | Include an analysis of current technological limitations and potential solutions |
| "Write a historical fiction novel set during the French Revolution, incorporating accurate historical details and exploring themes such as social inequality, political upheaval, and personal agency." (Creative writing) | Write a historical fiction novel | NA | Set during the French Revolution | Incorporate accurate historical details and explore themes such as social inequality, political upheaval, and personal agency |
| "Generate a comprehensive review of the current state of research on artificial intelligence and its potential impact on society, including an analysis of ethical considerations and potential risks." (Academia) | Generate a comprehensive review | NA | Current state of research on artificial intelligence and its potential impact on society | Include an analysis of ethical considerations and potential risks |
| "Write a legal brief arguing for the protection of privacy rights in the digital age, incorporating relevant case law and constitutional principles." (Law) | Write a legal brief | NA | Argue for the protection of privacy rights in the digital age | Incorporate relevant case law and constitutional principles |
| "Generate a detailed analysis of the environmental impact of large-scale deforestation, including effects on biodiversity, carbon sequestration, and soil erosion." (Environmental science) | Generate a detailed analysis | NA | Environmental impact of large-scale deforestation | Include effects on biodiversity, carbon sequestration, and soil erosion |
| "Write a screenplay for a romantic comedy set in New York City, incorporating elements of social commentary and exploring themes such as identity and self-discovery." (Entertainment) | Write a screenplay for a romantic comedy | NA | Set in New York City | Incorporate elements of social commentary and explore themes such as identity and self-discovery |
| "Generate a comprehensive treatment plan for someone with chronic pain, incorporating both traditional and alternative therapies and taking into account factors such as comorbid conditions and medication interactions." (Medicine) | Generate a comprehensive treatment plan | NA | For someone with chronic pain | Incorporating both traditional and alternative therapies and taking into account factors such as comorbid conditions and medication interactions |


2.1 The Process of Prompt Engineering
Creating effective prompts involves the following steps:
1. Defining the goal: The first step in creating an effective prompt for a language model is to clearly define the goal of the prompt. What do you want the model to generate as a response? Having a clear goal in mind enables you to focus your efforts and create a more effective prompt [14]. This is important because it provides direction and purpose for the rest of the prompt engineering process.
2. Understanding the model's capabilities: For every language model selected for prompting, it is very important to understand its limitations and abilities, for example being clear about what kind of responses the model generates (text, audio, images, etc.). The strengths and weaknesses of a given model guide the synthesis of prompts that are compatible with the model's capabilities and that work around its limitations [15]. Fortunately, many LLMs today have been built to optimize their performance by making use of online and openly available tools via relevant APIs, e.g., access to diagramming tools.
3. Choosing the right prompt format: The prompt format has a huge impact on the quality of LLM-generated responses. Clear, precise, and concise prompt formats give the model all the details necessary for generating coherent responses [16]. Selecting appropriate prompt formats enhances natural language understanding for LLMs and hence improves the quality of the responses given.
4. Providing context: This is one of the most underestimated stages, yet it is among the most influential in terms of the accuracy of the information generated by the models. Without context, LLMs provide generic, though coherent and relevant, responses [17], and with limited context, LLMs can produce misinformation for contexts that are under-represented in their training data. Context may include additional relevant or related information about the topic, setting, or characters for inference within the prompt. Context gives the LLM deeper grounding for understanding the prompt and the desired output.
5. Testing and refining: Every synthesized prompt should be tested with the appropriate LLMs against the defined goal [18]. The generated LLM response can then be used to evaluate and refine the prompt before it is used in a production setting. The aim of this testing is to obtain feedback for objective refinement before making any data-driven decision about prompt optimization and improvement (see the sketch at the end of this section).

While the above procedure gives a structured approach to creating optimal and efficient prompts for LLMs like ChatGPT, information seekers and developers may choose to deviate from some of the above steps. This usually happens when one has expertise and extensive experience with a particular prompt engineering tool or LLM; in this case, the information seeker or developer can rely largely on their creativity and intuition to synthesize prompts. Divergence may also happen in cases of LLM testing and highly experimental circumstances. In cases of design and innovative thinking, or validation and verification, information seekers may


take exploratory approaches to prompt engineering. Note that the procedure/framework above is not a set of rules and restrictions that must be followed religiously; rather, it is a guide that can help prompt engineers create effective prompts. Ultimately, the most effective approach to prompt engineering will depend on the specific goals, audience, and language model being used. A skilled prompt engineer will use their judgment and expertise to determine the best approach for each situation.
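The sketch below makes the test-and-refine cycle of steps 4 and 5 concrete. The `llm_generate` function is a hypothetical wrapper around whatever LLM API is available, and `meets_goal` and `refine` are toy stand-ins for whatever evaluation and refinement strategy the defined goal requires.

```python
# Hedged sketch of the define-goal / provide-context / test / refine loop.
# `llm_generate`, `meets_goal`, and `refine` are hypothetical placeholders.
def llm_generate(prompt: str) -> str:
    """Hypothetical wrapper around an LLM API; a canned answer keeps the sketch self-contained."""
    return "The capital city of Uganda is Kampala."

def meets_goal(response: str) -> bool:
    return "Kampala" in response                      # toy check for the example goal below

def refine(prompt: str) -> str:
    return prompt + "\nAnswer with the city name only."   # toy refinement strategy

goal_prompt = ("Context: geography question from a Ugandan student.\n"
               "Instruction: What is the capital city of Uganda?")

for attempt in range(3):                              # step 5: test, evaluate, refine, repeat
    response = llm_generate(goal_prompt)
    if meets_goal(response):
        break
    goal_prompt = refine(goal_prompt)

print(goal_prompt, "->", response)
```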

3 Prompt Engineering Techniques
3.1 Techniques for Optimizing Prompts
Optimizing prompts for language models like ChatGPT involves a combination of creativity, intuition, and data-driven approaches. The following practices help to create effective and engaging prompts that enable users to achieve their goals:
- Clearly define the goal of the prompt.
- Understand the capabilities and limitations of the language model.
- Choose a format that is clear and concise.
- Provide context to help the model generate more relevant responses.
- Test and refine your prompts to improve their effectiveness.
- Use engaging language and visuals to capture the user's attention.
- Tailor your prompts to the interests and motivations of your audience.
- Experiment with different formats and styles to see what works best.
- Gather feedback from users to improve your prompts.
- Use data-driven approaches to refine and optimize your prompts.
- Keep your prompts focused and on-topic.
- Avoid using overly complex or technical language.
- Use examples and analogies to help explain complex concepts.
- Break down complex topics into smaller, more manageable chunks.
- Use humor and creativity to make your prompts more engaging.
- Leverage the power of storytelling to make your prompts more memorable.
- Use repetition and reinforcement to help users retain information.
- Provide clear and concise instructions for the user.
- Use visual aids such as images and diagrams to support your prompts.
- Provide additional resources for users who want to learn more.
- Use a conversational tone to make your prompts more approachable.
- Encourage user interaction and engagement with your prompts.
- Use personalization to tailor your prompts to individual users.
- Continually update and refresh your prompts to keep them relevant.
- Most importantly, monitor the performance of your prompts and make adjustments as needed.

It is important to note that optimizing prompts for language models like ChatGPT is an ongoing process. As language models evolve and improve, and as user needs and expectations change, it is important to continually monitor and update your prompts to ensure that they remain effective and relevant, most especially using data-driven techniques. This can involve gathering data on how users interact with your prompts, analyzing the responses generated by the language model, and using this information to make data-driven decisions about how to improve your prompts. By continually


testing and refining your prompts, you can optimize their performance and achieve better results.

3.2 Advanced Techniques for Prompt Engineering
Prompt engineering can be used to create effective prompts for a wide range of natural language processing tasks, including Text Summarization, Question Answering, Text Classification, Role Playing, Code Generation, Reasoning, Text Generation, Text Translation, Sentiment Analysis, Named Entity Recognition, Text Completion, Dialog Generation, Paraphrasing, Text Simplification, Text-to-Speech, Speech-to-Text, Image Captioning, Text-based Game Playing, Poetry Generation, Lyric Generation, Story Generation, Joke Generation, Recipe Generation, Email Generation, Resume Generation, News Article Generation, Text-based Adventure Game Playing, Text-based Puzzle Solving, Text-based Strategy Game Playing, Text-based Simulation Game Playing, Text-based Interactive Fiction, Text-based Virtual Assistants, Text-based Customer Service, Text-based Personal Shopping Assistants, Text-based Personal Finance Assistants, and Text-based Personal Health Assistants, among others [1–30]. Accomplishing such tasks often requires basic and advanced prompt engineering techniques; these techniques are explained in Table 2. The prompt engineering techniques can be used to improve the performance of language models like ChatGPT, allowing them to generate more coherent, relevant, and sophisticated responses to user inputs. Besides those, there are some much more advanced prompt engineering techniques, which include the following.
Automatic instruction generation: This refers to the use of technology to automatically generate instructions or prompts. It can be done in various contexts, such as creating how-to guides or generating prompts for machine learning models. One example of an advanced tool for automatic instruction generation is the Automatic Prompt Engineer (APE), which was proposed for generating and selecting instructions for Large Language Models. APE considers the instruction as a "program" and optimizes it by searching through a pool of instruction candidates suggested by a Large Language Model to maximize a chosen score function [37]. The effectiveness of the chosen instruction is then assessed by measuring the zero-shot performance of another Large Language Model that follows the selected instruction. Experiments on 24 NLP tasks showed that instructions automatically generated using APE outperformed previous benchmarks and achieved better or similar performance to instructions created by human annotators on 19 out of 24 tasks [38]. This demonstrates the potential of automatic instruction generation as an advanced prompt engineering technique.
Program synthesis for prompt engineering: This technique is inspired by classical program synthesis and the human approach to prompt engineering. It relies entirely on automatically generating instructions or prompts using techniques from program synthesis [38]. APE is again a good example for selected LLMs: it searches for and optimizes the instruction as a program within a pool of candidates while continuously maximizing its score function [38].


Table 2 Prompt engineering techniques

| Technique | Description | Most recommended large language models (LLMs) |
|---|---|---|
| Few-shot prompts [19] | It involves using a small number of carefully crafted prompts to enable a language model to perform a new task or adapt to a new domain [19] | GPT-3, GPT-4 |
| Chain-of-thought (CoT) prompting [20] | Using a sequence of interconnected prompts to guide the language model's responses in a coherent and logical manner [20] | GPT-3, GPT-4 |
| Self-consistency [21] | Using prompts to encourage the language model to generate responses that are consistent with its previous responses [21] | GPT-3, GPT-4 |
| Knowledge generation prompting [22] | Using prompts to encourage the language model to generate new knowledge or insights based on its existing knowledge and understanding of the world [23] | GPT-3, GPT-4 |
| Reasoning and acting (ReAct) [24] | Using prompts to encourage the language model to reason about a given situation and generate appropriate actions or responses [24] | GPT-3, GPT-4 |
| Contextual prompting [25] | Providing additional context to the language model to help it generate more coherent and relevant responses [25] | Transformer-based models such as GPT-3, GPT-4, and BERT |
| Dynamic prompting [26] | Dynamically adjusting the prompt based on the language model's previous responses to improve its performance over time [27] | Transformer-based models such as GPT-3, GPT-4, and BERT |
| Multi-modal prompting [28] | Using multiple modalities, such as text and images, to provide richer and more detailed prompts to the language model [29] | Multi-modal models such as CLIP and DALL-E |
| Adversarial prompting [30] | Using adversarial examples to test and improve the robustness of the language model's responses to prompts [30] | Transformer-based models such as GPT-3, GPT-4, and BERT |
| Transfer learning prompting [31] | Using transfer learning to adapt a pre-trained language model to new tasks or domains using carefully crafted prompts [31] | Transformer-based models such as GPT-3, GPT-4, and BERT |
| Meta-learning prompting [32] | Using meta-learning to train a language model to quickly adapt to new tasks or domains using a small number of carefully crafted prompts [32] | Meta-learning models such as MAML and Reptile |
| Zero-shot prompting [33] | Using carefully crafted prompts to enable a language model to perform tasks that it has not been explicitly trained on [33] | Transformer-based models such as GPT-3, GPT-4, and BERT |
| Active learning prompting [34] | Using active learning to iteratively improve the performance of a language model by selecting the most informative prompts for training [34] | Transformer-based models such as GPT-3, GPT-4, and BERT |
| Curriculum learning prompting [35] | Using curriculum learning to gradually increase the difficulty of the prompts used to train a language model, allowing it to learn more effectively [35] | Transformer-based models such as GPT-3, GPT-4, and BERT |
| Reinforcement learning prompting [36] | Using reinforcement learning to train a language model to generate responses that maximize a reward signal based on the quality of its responses to prompts [36] | Reinforcement learning models such as PPO and A2C |

Zero-shot performance is again used to measure the effectiveness of the prompt for overall evaluation with other LLMs. Program synthesis for prompt engineering and automatic instruction generation are related but distinct concepts. Automatic instruction generation refers to the use of technology to automatically generate instructions or prompts; this can be done in various contexts, such as creating how-to guides or generating prompts for machine learning models. Program synthesis for prompt engineering, on the other hand, refers to the use of techniques from program synthesis to automatically generate instructions or prompts [39]. Program synthesis is fundamentally a field of research that focuses on automatically generating programs from high-level specifications, such as examples or natural language descriptions. In the context of prompt engineering, program synthesis techniques are used to generate instructions or prompts that meet certain criteria or specifications. The primary difference between the two concepts is that automatic instruction generation is a broader term that encompasses various techniques for generating instructions automatically, while program synthesis for


prompt engineering refers specifically to the use of program synthesis techniques for this purpose.
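The APE procedure described above can be pictured as a simple search loop: generate candidate instructions with one LLM, score each candidate by the zero-shot accuracy of a second LLM that follows it, and keep the best. The sketch below is only a schematic of that idea, not the published APE implementation; `propose_instructions` and `answer` are hypothetical stubs standing in for the generator and executor LLMs, and the two-example task is illustrative.

```python
# Hedged sketch of APE-style instruction generation and selection [37, 38]:
# candidate instructions are treated as "programs" and ranked by zero-shot performance.
def propose_instructions(task_examples, n=5):
    """Hypothetical stub: ask a generator LLM for n candidate instructions for the task."""
    return [f"Candidate {i}: classify the sentiment of the text as positive or negative."
            for i in range(n)]

def answer(instruction, text):
    """Hypothetical stub: ask an executor LLM to follow `instruction` on `text`."""
    return "positive" if "good" in text.lower() else "negative"

task = [("The movie was really good", "positive"),
        ("Terrible service, never again", "negative")]

def zero_shot_score(instruction):
    hits = sum(answer(instruction, text) == label for text, label in task)
    return hits / len(task)

candidates = propose_instructions(task)
best = max(candidates, key=zero_shot_score)   # keep the highest-scoring instruction
print(best, zero_shot_score(best))
```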

3.3 Demonstration Tasks for Prompt Engineering
Few-shot learning: This technique enables in-context learning, where demonstrations are provided in the prompt to steer the model toward better performance [19]. The demonstrations serve as conditioning for subsequent examples where the model is expected to generate a response.
Exercise 1: Experiment with few-shot learning by providing a series of messages between the user and the assistant in the prompt as few-shot examples. Understand how this technique improves accuracy and response grounding in LLMs (a sketch of such a few-shot message list is given at the end of this section).
Exercise 2: Execute few-shot learning for NLP tasks like text classification, sentiment analysis, and language translation using LLMs.
Non-chat scenarios: This gives a better understanding of how prompt engineering can be applied to specific tasks, e.g., answering questions on specific topics [40].
Exercise 1: Use ChatGPT or GPT-4 for a non-chat scenario task like generating text that answers specific questions, e.g., "What is the capital city of Uganda?"
Exercise 2: Explore how prompt engineering can be applied to non-chat scenarios, such as generating text for a specific purpose or answering questions on a specific topic, with Large Language Models.
Start with clear instructions: Experiment with providing clear instructions to the model in the prompt to see how it affects the generated responses.
Exercise 1: Experiment with providing clear instructions to the model in the prompt to see how it affects the generated responses. Fine-tune the prompt to understand how this technique can be used with alternative LLMs to improve the accuracy and grounding of the responses given.
Exercise 2: Explore different techniques for providing clear instructions to the model in the prompt when working with Large Language Models.
Explore different APIs: For Azure OpenAI GPT models, there are two separate APIs where prompt engineering is important: the Chat Completion API and the Completion API. Each API requires the input data to be structured in a specific way, which affects the overall design of the prompt.
Exercise 1: Experiment with using different APIs for interacting with Azure OpenAI GPT models. Explore how these APIs can be used with Large Language Models to improve the accuracy and grounding of their responses.
Exercise 2: Explore how different APIs require input data to be formatted differently and how this impacts overall prompt design when working with Large Language Models.
Validate responses: Even when using prompt engineering effectively, it is crucial to verify the responses generated by the models.


Exercise 1: Experiment with different techniques for validating responses generated by LLMs. Fine-tune the prompts to understand the limitations of LLMs based on their known strengths and weaknesses.
Exercise 2: Explore different techniques for validating responses generated by LLMs when working with Large Language Models.
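To make the few-shot exercise above concrete, the sketch below builds a chat-style prompt in which two user/assistant demonstration pairs condition the model for a sentiment-classification task. The message list follows the widely used chat-completion message format; the final API call is deliberately left as a commented placeholder because the client object and model name depend on the deployment being used.

```python
# Hedged sketch: few-shot demonstrations expressed as chat messages.
# Sending them to a specific chat-completion API is left as a placeholder.
few_shot_messages = [
    {"role": "system",
     "content": "Classify the sentiment of each review as positive or negative."},
    # demonstrations (the "shots") that condition the model
    {"role": "user", "content": "Review: The clinic staff were kind and efficient."},
    {"role": "assistant", "content": "positive"},
    {"role": "user", "content": "Review: I waited three hours and nobody helped me."},
    {"role": "assistant", "content": "negative"},
    # the actual query the model should complete
    {"role": "user", "content": "Review: The new scanner produced wonderfully clear images."},
]

# response = chat_client.create(model="<your chat model>", messages=few_shot_messages)
# The expected completion for the last message is "positive".
```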

4 Applications, Tools, and Trends of Prompt Engineering
4.1 Applications and Tools
Since prompt engineering is a technique used to improve the performance of natural language processing (NLP) models by providing better and more focused input, several tools and applications are available to facilitate this process. Examples include LangChain, a library aimed at assisting in the development of applications that combine large language models with other sources of computation or knowledge [41]; Dust.tt, a platform that helps build large language model applications as a series of prompted calls to external models [42]; OpenPrompt, a PyTorch-based library that offers a standard, adaptable, and expandable structure for implementing the prompt-learning process [43]; BetterPrompt, a test suite for large language model prompts before pushing them to production [44]; Prompt Engine, an NPM utility library for creating and maintaining prompts for large language models [45]; Promptify, a library aimed at assisting in developing a pipeline for using large language model APIs in production [46]; TextBox 2.0, a modern text generation library that uses Python and PyTorch to create a consistent and standardized process for using pre-trained language models for text generation [47]; ThoughtSource, a central, open, community-centered resource for data and tools for chain-of-thought reasoning in large language models [48]; and GPT Index, a very useful project providing data structures that ease the use of large external knowledge bases with LLMs [49, 50]. Moreover, a community-driven initiative was established to gather numerous prompts for different use cases; it can be accessed via https://huggingface.co/datasets/fka/awesome-chatgpt-prompts.
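As a small illustration of one of the tools listed above, the sketch below defines a reusable prompt template with LangChain's PromptTemplate, assuming the early 0.0.x API referenced in [41]; the template text and variable names are illustrative and not taken from the library's documentation.

```python
# Hedged sketch: a reusable prompt template with LangChain (assuming the 0.0.x API cited in [41]).
from langchain.prompts import PromptTemplate

template = PromptTemplate(
    input_variables=["domain", "question"],
    template=("You are an assistant for {domain}.\n"
              "Answer the following question accurately and concisely.\n"
              "Question: {question}\nAnswer:"),
)

prompt_text = template.format(domain="health care",
                              question="What are common symptoms of malaria?")
print(prompt_text)
```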

4.2 Current Research and Future Trends of Prompt Engineering
We have to acknowledge that the rapid development of conversational AI has been underpinned by many emerging Natural Language Processing (NLP) innovations, one of which is Large Language Models (LLMs). Such innovations have attracted attention, which has particularly facilitated the growth in size, capabilities, and transformative


potential of NLP-driven technologies. We observe how such technologies have been scaled to various domains like education and health care. It is important to note that working with LLMs involves creating and optimizing methods for input (prompts) in order to obtain appropriately good-quality output (responses). This interdependency and the need for good-quality outputs across various domains of application is what has inspired research initiatives and defined various research trends in prompt engineering.
Among the inspired directions of prompt engineering research is the development of better and more sophisticated domain-specific prompt engineering tools and techniques for interfacing with LLMs. Another trend is the utilization of advanced machine learning techniques to automate the synthesis and optimization of prompts for specific tasks. Opportunities are also opening up to integrate prompt engineering with other NLP research areas like transfer learning and domain adaptation; this focuses on developing methods to adapt existing prompts to new domains or tasks, and it also fits into transferring knowledge learned from one task to another prompt-driven task.
As the world works toward Society 5.0, a vision where every technology responsibly connects and interacts with other technologies, research opportunities open up on the responsible utilization of prompt engineering for improved reliability and safety of LLMs. Such research initiatives usually involve developing methods for the detection and mitigation of potential biases and errors in LLM inputs and outputs. They also cover ensuring that LLMs behave in predictable, traceable, trustworthy, and explainable ways when applied in sensitive domains like health care.
Since prompt engineering is becoming a programming paradigm in conversational AI, it opens up opportunities for utilizing prompts to improve the explainability, interpretability, and reliability of conversational AI applications, most especially LLMs. This can involve developing methods for generating prompts that enable users to understand how LLMs arrive at their outputs, which requires providing explainers for LLM output decisions.
Building an equitable and well-connected world also draws attention to the utilization of prompt engineering for the efficient and scalable deployment of LLMs. This opens up research opportunities on methods for reducing the computational costs of LLMs, including how to make use of large-scale datasets and distributed computing environments for LLMs.

4.3 Current Research and Future Trends of Large Language Models (LLMs) We are certain of increasing applications of LLMs across domains. This increase comes with attention to the processes of developing larger and more powerful conversational AI tools that accurately perform a variety of tasks that require human intelligence. This will likely manifest as Conversational Artificial General Intelligence (CAGI).


This opens up research trends in building methods for developing, deploying, and monitoring LLMs. It opens up research opportunities on the equitable training and utilization of LLMs among various social groups. Methods of transfer learning, domain adaptation, and training LLMs on small amounts of data also open up. There are also research trends investigating reasoning, planning, and decision making within various LLM architectures. For generalizability and scalability, future research on LLMs needs to focus on combining technologies (multi-modeling) for NLP, computer vision, robotics, and quantum AI; this requires the development of optimal methods for integrating those technologies for effective operation. Another trend in LLMs is Responsible Conversational AI, which involves interpretability, explainability, inclusivity, trustworthiness, and responsibility in modeling, developing, deploying, and monitoring LLMs as conversational AI systems. Finally, human–machine interaction is another interest developing among multiple disciplines that want to understand domain-specific human language and behavior for effective communication with and control of LLMs across domains.
Whereas LLMs are useful tools, they are still limited by correctness, lack of enterprise context, stale training data, limited controllability, and private data risks. These limitations make them perpetuate stereotypes and harm disadvantaged groups; they also spread misinformation and outright disinformation, and they are still constrained by computing power.

5 Conclusion
The potential and significance of prompt engineering for Large Language Models (LLMs) and conversational AI systems is undeniable. Given the impressive performance demonstrated by LLMs on a wide range of NLP tasks in complex applications and use cases, we call for urgent and extra attention to prompt engineering processes and procedures. Since prompt engineering is now a programming paradigm for conversational AI, information seekers, AI engineers, and researchers should get excited about the new opportunities and challenges that come with developing more powerful and capable conversational AI systems. By leveraging the latest advances in these fields, engineers and researchers can build systems that can perform a wide range of tasks more effectively and efficiently. Part of this work is building domain-specific prompt engineering platforms based on responsible data standards and AI practices. These will be very important for prompt engineering research, digital equality, conversational AI research, and industrial practice.
For ordinary users and information seekers, LLMs and prompt engineering have the potential to significantly improve their interactions with conversational AI systems. By enabling AI systems to better understand and respond to human language and behavior, LLMs and prompt engineering will make it easier for users to communicate with and control these systems. Therefore, through active participation and staying up-to-date with the latest advances in these fields, AI engineers, researchers, and


ordinary information seekers can all benefit from the exciting new opportunities offered by LLMs and prompt engineering.


Activity Identification and Recognition in Real-Time Video Data Using Deep Learning Techniques Anant Grover, Deepak Arora, and Anuj Grover

1 Introduction Human Activity Recognition (HAR) is the task of identifying and recognizing the activities performed by a user based on input received through a sensor. HAR can use different kinds of motion-capturing sensors, such as accelerometers and gyroscopes [1]. These sensors provide temporal data for time-series classification, so HAR can be treated as a machine learning task for a multivariate temporal classification problem. The task involves several steps, which can be grouped into collecting data from the sensor, extracting patterns from the collected data, and classifying the activity performed by the person. The main challenges of sensor-based HAR are that the way a given activity is performed varies from one individual to another and that collecting this type of data at a large scale can be very difficult. Video classification is the task of predicting a label for a video from a predefined set of classes. At a high level, it takes a video as input and analyses its content to predict a class label; it is essentially a classification problem. A video classification model can readily be used for HAR [2]. Here, the video acts as a sensor that captures the motion of the person in the frame. This approach is computationally more expensive than sensor-based approaches but is more practical, since a video camera can easily record a person. Collecting a large HAR video classification data set is also much easier than attaching sensors to people. The approach is robust, generalizes easily to the different ways an activity can be performed, and also accounts for the environmental changes that affect the way an activity is performed.


Fig. 1 Process of human activity recognition

The architectures for Human Activity Recognition are evaluated using the accuracy metric. The block diagram of the steps in video-based Human Activity Recognition is shown in Fig. 1. The HAR system takes a sequence of frames as input and splits and pre-processes the frames so that they all have the same resolution. The frames are fed to a feature extractor, which usually consists of convolutional and recurrent layers that extract spatial and temporal information from the input to produce a feature vector. The feature vector is then passed to the prediction head, which predicts the activity being performed from the predefined set of activities.

1.1 Baseline: Single-Frame Human Activity Recognition The baseline human activity recognition model is a naïve video classification architecture that uses an image classifier as its backbone and performs little additional processing on its results. The baseline uses a convolutional neural network because it captures the environmental context of the input frames along with the activity of the person [3]. Environmental context is very important in predicting the activity; for example, it is needed to decide whether a person is playing football or simply running. The architecture splits the video into multiple frames processed in time-distributed steps. All frames must have the same dimensions because a common architecture is used. Each frame is pre-processed and passed to the convolutional feature extractor, which may use any image classifier, such as a residual network, as the backbone; the prediction head is built on top of it. The feature extractor is shared across all time steps and thus has common weights. The major drawback of this approach is that the predicted activity may change from frame to frame, making it difficult to infer which activity is actually being performed. A simple way to remove these flickering predictions is not to display a prediction for every single frame but to compute the average of the last n results and display that for the n-th frame, as sketched below. The moving-average, image-classification-based architecture is chosen as the baseline for video classification because it is the simplest architecture and can be


implemented very easily. It provides satisfactory results on the validation and test data sets. The major disadvantage of single-frame classification is that it does not capture temporal dependencies between frames.
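As an illustration of the moving-average smoothing described above, the following minimal Python sketch (an assumption about the exact implementation, not the authors' code) averages the last n per-frame softmax outputs before choosing a label for the current frame.

from collections import deque

import numpy as np

def smooth_predictions(per_frame_probs, n=25):
    """Average the last n per-frame class-probability vectors and return one label per frame."""
    window = deque(maxlen=n)
    smoothed_labels = []
    for probs in per_frame_probs:
        window.append(np.asarray(probs))
        averaged = np.mean(window, axis=0)          # rolling mean over the last n frames
        smoothed_labels.append(int(np.argmax(averaged)))
    return smoothed_labels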

1.2 Late Fusion The late fusion approach applies a convolutional neural network to each frame like the single-frame baseline, but is a little more involved [4]. The single-frame approach passed the features produced by the CNN into independent fully connected layers, whereas in late fusion the per-frame features are flattened, concatenated, and then passed to a shared fully connected layer. If the produced feature map has dimension D × H × W, then after flattening and concatenating the feature maps of two frames, the concatenated feature vector has length 2DHW. The activity is predicted by the fully connected layer. This flattening and concatenation is computationally very expensive and impractical for real use, so it is usually replaced by a global average pool over space and time, which takes an input of dimension T × D × H × W and outputs a feature vector of size D. The advantage of this approach is that it can learn global motion characteristics from the input frames, i.e. the motion of the object of interest across the frame window.

1.3 Early Fusion The downside of the late fusion approach for capturing the temporal dependencies is that it is only able to reason about the global information in the input frames and not the low-level features. The early fusion approach can be used for capturing the low-level motion between the frames [5]. The early fusion approach performs the fusion at a very early stage in the network. It performs the fusion at the pixel level by concatenating all the frames. The idea is to take the video frames and stack them together and consider the temporal dimension as another channel like the RGB channels. In an RGB image during convolution operation, the kernel is only slid across the spatial dimension in two dimensions and the three RGB channels are considered part of the tensor. Similarly, in early fusion, the temporal as well as the RGB channels are considered the part of the tensor and the kernel is slid in 2D only. In practice, this is implemented by extending the filters of the first convolutional layer to the full temporal extent of our input like it is extended across all channels when a normal 3D tensor is inputted. The rest of the network is the simple 2D CNN consisting of two-dimensional filters. This can be considered as the input is reshaped from (T × 3 × H × W) to (3T × H × W) and processed by a normal 2D convolutional filter. The early fusion approach has a major drawback which is that it is not temporal shift invariant. Temporal shift invariance means that the output should not change if


the main activity takes place at a different time step. It is because a certain filter is learned for only detecting activity occurring at a specific time step. Another drawback of this approach is that it only consists of one layer of temporal processing which is not enough.
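To make the pixel-level fusion concrete, the following minimal Keras sketch (illustrative layer sizes assumed, not the paper's exact network) stacks the T frames along the channel axis so that the first 2D convolution spans the full temporal extent of the input.

import tensorflow as tf

T, H, W = 20, 64, 64                                   # sequence length and frame size assumed here

inputs = tf.keras.Input(shape=(T, H, W, 3))
x = tf.keras.layers.Permute((2, 3, 1, 4))(inputs)      # (T, H, W, 3) -> (H, W, T, 3)
x = tf.keras.layers.Reshape((H, W, 3 * T))(x)          # temporal axis folded into the channels
x = tf.keras.layers.Conv2D(64, 3, activation="relu")(x)    # first-layer filters see all frames
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(4, activation="softmax")(x)
early_fusion_model = tf.keras.Model(inputs, outputs)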

1.4 Slow Fusion (3D CNN) The Slow Fusion architecture states that the temporal dimension should not be fused completely in a single layer but should carry the temporal dependencies forward to the subsequent layers [6]. This is done using the 3D convolutional filters in contrast to the known 2D kernels. The 3D convolutional network proposes to extend the 2D convolutional and pooling layers so that the temporal data is fused slowly in the complete architecture instead of a single layer. This is achieved by making the convolutional filters operate in the temporal dimension in the same way they operate in the spatial dimension, which means that the kernels will not only slide across the width and height of the input tensor but also across the temporal dimension as well. The 3D filters will thus not only be applied in a single layer but will be applied in different layers throughout the network. An end-to-end 3D Convolutional Network can be trained like the usual 2D Convolutional Network.
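As an illustration of slow fusion, the following minimal Keras sketch (assumed layer sizes, not the authors' exact configuration) uses Conv3D and MaxPooling3D so that the temporal dimension shrinks gradually over several layers instead of being collapsed at once.

import tensorflow as tf

slow_fusion_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(20, 64, 64, 3)),        # (frames, height, width, channels)
    tf.keras.layers.Conv3D(16, (3, 3, 3), activation="relu"),
    tf.keras.layers.MaxPooling3D((1, 2, 2)),              # pool only in space, keep the temporal axis
    tf.keras.layers.Conv3D(32, (3, 3, 3), activation="relu"),
    tf.keras.layers.MaxPooling3D((2, 2, 2)),              # now the temporal dimension is reduced too
    tf.keras.layers.GlobalAveragePooling3D(),
    tf.keras.layers.Dense(4, activation="softmax"),
])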

1.5 CNN–LSTM: Long Range Convolutional Network (LRCN) The previous approaches for video classification and human activity recognition capture spatial and temporal dependencies but are very computationally expensive. Sequential data are best captured by recurrent neural networks. The LSTM network is a type of recurrent neural network that preserves long-term dependencies and can carry information from earlier steps of the sequence to later ones, storing only the relevant parts of the sequence and forgetting those that are not important. The CNN-LSTM architecture [7] takes the sequential frames of the video as input and passes them to convolutional layers wrapped in a time-distributed wrapper (all time steps use the same weights and convolutional architecture in parallel). Multiple convolutional layers extract a feature vector at each time step. The per-time-step outputs are fed to stacked LSTM layers, the output of the final LSTM layer is passed through a SoftMax activation function to give an output at each time step, and these outputs are fused together using the late fusion approach.
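The following minimal Keras sketch illustrates the CNN-LSTM idea with an assumed small custom CNN backbone and, as a simplification, only the final LSTM state feeding the classifier instead of per-time-step outputs fused by late fusion.

import tensorflow as tf

frame_cnn = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(64, 64, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),             # one feature vector per frame
])

lrcn_model = tf.keras.Sequential([
    tf.keras.layers.TimeDistributed(frame_cnn, input_shape=(20, 64, 64, 3)),
    tf.keras.layers.LSTM(64, return_sequences=True),
    tf.keras.layers.LSTM(32),                              # final LSTM summarises the sequence
    tf.keras.layers.Dense(4, activation="softmax"),
])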


1.6 Slow Fast Network The Slow Fast Network consists of two towers: the slow path and the fast path [8]. The slow path inputs the frames from the video at a slow rate (less frames are inputted to the network in each period). The fast path inputs the frames from the video at a faster rate (a greater number of frames are inputted to the network in each period). The slow path is used for capturing the semantics and the spatial understanding of the person in the video, whereas the fast path is used for capturing the motion in between all the frames of the video. The slow path consists of a high number of parameters (high network capacity), while the fast path consists of a smaller number of parameters (low network capacity). The paper states that the people, objects, and the background of the person may not change drastically in a video sequence. Therefore, the semantics of the video should be re-captured and computed slowly but the motion changes fast so it should be computed at a faster rate. The slow pathway uses a slow refreshing rate of the frames and extract the semantics from the input frame. The fast pathway uses a fast-refreshing rate of the frames and extract the motion from the multiple input frames. The fast pathway uses a lightweight architecture because it is not required to extract the spatial information but only the temporal dependencies. Each layer outputs a smaller number of channels. The number of channels in the fast pathway is 1/8 of the number of channels in the slow pathway. The two pathways are connected at certain stages through fusion techniques using lateral connections.

2 Literature Survey The Single-Frame Baseline architecture for video classification was proposed by Andrej Karpathy from Stanford University [3]. The Single-Frame architecture is the perfect baseline for Human Activity Recognition. It performs activity recognition on every individual frame of the video. It passes each frame through a CNN image classification architecture. This approach gives flickering results therefore moving average of the predictions of all the frames is computed. The Late Fusion approach was proposed by Akilan et al. in 2017 [4]. The Late Fusion approach aims at capturing the temporal dependencies which are not captured by the Single-Frame Baseline. It is done by feeding each frame into a CNN feature extractor to produce a sequence of feature matrices. These feature matrices are then average pooled over space and time to give a 1D feature vector which is fed to multiple fully connected layers for predicting the activity performed. The Early Fusion approach for video classification was proposed to capture the low-level motion between the frames which is not captured by the late fusion approach. It was proposed by Williams et al. in 2018 [5]. It works by fusing the sequence of frames at the pixel level and then applying a 2D convolutional operation on it. This is implemented by extending the filters of the first convolutional layer to the full temporal extent of our input like it is extended across all channels when


a normal 3D tensor is inputted. The result is then passed to the 2D CNN for predicting the activity in the video. The Slow Fusion approach proposed the use of a 3D CNN architecture, which works by using 3D convolutional and pooling layers that convolve over both the spatial and temporal dimensions. It was proposed by Tran et al. in 2019 [6]. Hence, the temporal dimension of the input is not collapsed in the first layer itself but is carried forward to the subsequent layers as well. The Long Range Convolutional Network (LRCN) was proposed by Donahue et al. in 2016 [7]. The previous architectures for HAR are very computationally expensive and hard to train. Because RNNs are best at capturing temporal dependencies, the paper proposes an architecture that runs each frame through a CNN for feature extraction before passing the extracted features to multiple LSTM layers for capturing long-term dependencies and predicting the activity. The MoViNet architecture is an end-to-end 3D CNN for HAR proposed by researchers from Google in 2021 [9]. It stands for Mobile Video Networks for Efficient Video Recognition and is a lightweight, computationally inexpensive architecture. The Slow Fast Network was proposed by Christoph Feichtenhofer et al. from Facebook research at CVPR 2018 [8]. The paper proposes two towers: a slow path that inputs frames from the video at a slow rate and a fast path that inputs frames at a faster rate. The slow path is used for capturing the semantics and the spatial understanding of the person in the video, whereas the fast path captures the motion between the frames of the video. The slow path has a high network capacity, whereas the fast path has a low network capacity. The comparison of the accuracy of all the discussed architectures on the Kinetics video classification data set [10], as reported in their respective research papers, is given in Table 1.

Table 1 Different architectures used and their results

S. No.  Author            Architecture used        Accuracy (%)  References
1       A. Karpathy       Single frame (baseline)  56.0          [3]
2       T. Akilan         Late fusion              59.3          [4]
3       J. Williams       Early fusion             57.7          [5]
4       D. Tran           Slow fusion (3D CNN)     60.9          [6]
5       J. Donahue        CNN + LSTM (LRCN)        65.4          [7]
6       D. Kondratyuk     MoViNet                  75.3          [9]
7       C. Feichtenhofer  Slow fast network        77.1          [8]


3 Description of Data Sets and Experimental Set-Up The Human Activity Recognition architecture takes the sequence of frames from the video as input and predicts the activity performed in the video. The architectures are evaluated using the sparse categorical cross-entropy loss and the accuracy metric. All the architectures are trained and evaluated on a subset of the UCF-101 data set and are built using TensorFlow. Instead of training a feature extractor from scratch, a pre-trained ResNet model from TensorFlow Hub is used, which increases the accuracy of the architectures. The architectures implemented are the Single-Frame Baseline, Late Fusion, Early Fusion, 3D CNN (Slow Fusion), Long Range Convolutional Network (LRCN), MoViNet, and Slow Fast Network.

3.1 Methodology of Data Set Description The UCF–101 data set was used for the training and evaluation of all the architectures [11]. UCF–101 is a state-of-the-art benchmark data set for video classification. There are a lot of data sets for video classification but the UCF–101 data set is specific for Activity Classification on the video data. The data set consists of 101 classes of activities being performed. The data set is collected from YouTube videos and other resources with all the proper labelling. The data set was thoroughly checked and no video was mislabelled into the wrong class. It is an in-the-wild data set which means that the architectures trained on this data set are not biased to a certain racial colour, background setting, or gender.

3.2 Implementation and Testing The Human Activity Recognition architectures were implemented on Google Colab using the TensorFlow library. The videos in the data set must be pre-processed so that all of them have the same number of frames and the same resolution. Frames are dropped from each video in a uniformly spaced manner; the data set is therefore pre-processed by reducing the sequence length to 20 frames and the resolution of each frame to 64 × 64. The small resolution speeds up training and makes it practical to evaluate which architecture is best suited for Human Activity Recognition. UCF-101 is a very large data set, and it is difficult to train all the models on the entire data set because it cannot be loaded into the limited RAM provided by Google Colab. Therefore, the architectures are trained on 4 selected classes from the data set: Walking with Dog, Tai Chi, Swing, and Horse Race. A sketch of this pre-processing step is given below.
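The sketch below illustrates this pre-processing under the stated settings (20 uniformly spaced frames resized to 64 × 64 and scaled to [0, 1]); the authors' exact sampling code may differ.

import cv2
import numpy as np

def extract_frames(video_path, sequence_length=20, size=(64, 64)):
    capture = cv2.VideoCapture(video_path)
    total = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
    step = max(total // sequence_length, 1)              # uniform sampling across the video
    frames = []
    for i in range(sequence_length):
        capture.set(cv2.CAP_PROP_POS_FRAMES, i * step)
        ok, frame = capture.read()
        if not ok:
            break
        frame = cv2.resize(frame, size)
        frames.append(frame.astype(np.float32) / 255.0)
    capture.release()
    return np.array(frames)                               # shape: (sequence_length, 64, 64, 3)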


The Single-Frame CNN model [3] is constructed by using a ResNet [12] feature extractor from TensorFlow Hub as the base model for the architecture. The weights of this model are set as not trainable, which speeds up training. The ResNet outputs a feature vector of length 2048 for each frame. A time-distributed dense layer [13] is used, and predictions are made for each time step using the SoftMax activation function. The mean of the per-frame predictions is then computed, and this mean is used for computing the loss that drives the optimization of the weights. Lambda layers are used in TensorFlow to create such a custom averaging layer in the model. The Late Fusion model [4] uses a shared convolutional neural network for all the frames, like the single-frame model. The vectors produced by the CNN are fused together by a pooling layer that averages across both space and time, followed by a couple of dense layers, the last of which uses a SoftMax activation function. The late fusion model is trained on the given data set and converges. The Early Fusion model [5] takes the video feature tensor as input, a four-dimensional tensor of shape (sequence length, height, width, 3). The input is converted into a 3D tensor and convolved with a 2D kernel [14]; this kernel differs from a normal kernel because it spans not only all channels but also the full temporal extent. The resulting tensor is fed to the pre-trained ResNet from TensorFlow Hub, and the feature vector produced by the ResNet is fed to a couple of dense layers to predict the activity. The 3D CNN (Slow Fusion) model [6] uses 3D convolution layers, which are just like 2D convolutions but convolve across both the spatial and temporal dimensions, and 3D pooling layers instead of the usual 2D pooling layers. The deeper we go into the 3D convolutional neural network, the more the temporal and spatial dimensions decrease while the number of channels increases [15]. This architecture can be considered the baseline for 3D convolutional networks and, unlike the rest of the architectures, does not use a pre-trained feature extractor. The Long Range Convolutional Network (LRCN) [7] is implemented by passing the temporal frames of the input into a pre-trained CNN from TensorFlow Hub in a time-distributed manner, like the earlier architectures. The sequence of features output by the CNN is passed to multiple LSTM layers for predicting the activity being performed. The MoViNet architecture [9] is an end-to-end lightweight pre-trained 3D convolutional network built by Google. It is available on TensorFlow Hub for inference only, but can be used from the TensorFlow Models repository on GitHub for fine-tuning by modifying the configuration of the architecture (such as the number of output classes). The TensorFlow Models are accessed for fine-tuning by installing the tf-models-official library, and the architecture is then trained on the data set. The Slow Fast Network [8] is also available on TensorFlow Hub for inference, like MoViNet, and can likewise be fine-tuned from the TensorFlow Models repository by modifying its configuration.
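As an illustration of the single-frame baseline just described, the sketch below uses a frozen ResNet backbone, a time-distributed softmax head, and a Lambda layer that averages the per-frame predictions. The backbone here is the Keras Applications ResNet50 rather than the TensorFlow Hub module used by the authors, and the layer sizes are assumptions for illustration.

import tensorflow as tf

NUM_CLASSES, SEQ_LEN, SIZE = 4, 20, 64

backbone = tf.keras.applications.ResNet50(include_top=False, pooling="avg",
                                           weights="imagenet")    # 2048-d feature per frame
backbone.trainable = False                                         # frozen for faster training

inputs = tf.keras.Input(shape=(SEQ_LEN, SIZE, SIZE, 3))
features = tf.keras.layers.TimeDistributed(backbone)(inputs)               # (SEQ_LEN, 2048)
per_frame = tf.keras.layers.TimeDistributed(
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"))(features)    # per-frame predictions
outputs = tf.keras.layers.Lambda(lambda p: tf.reduce_mean(p, axis=1))(per_frame)
single_frame_model = tf.keras.Model(inputs, outputs)
single_frame_model.compile(optimizer="adam",
                           loss="sparse_categorical_crossentropy",
                           metrics=["accuracy"])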


4 Results and Discussion The evaluation accuracy and loss of all the architectures on the test set (25% of the data) are given in Table 2. There are 4 classes to be predicted, so to confirm that an architecture is learning anything at all, its accuracy needs to be above one-fourth, i.e. 25%. A total of seven architectures are compared, all trained on 4 classes from the UCF-101 data set. The Single-Frame architecture is a strong baseline and gives good accuracy on the data set. The evaluation accuracies of all the architectures are visualized in Fig. 2. The Early Fusion architecture does not give a satisfactory result, so it should not be used. The Slow Fast Network and MoViNet architectures give the best and second-best accuracies, respectively. The evaluation loss of all the architectures is visualized in Fig. 3. The lower the loss, or the higher the accuracy, the better the model.

Table 2 Evaluation accuracy and loss of all the architectures

S. No.  Architecture used      Accuracy  Loss
1       Single frame           94.26     0.213
2       Late fusion            92.26     0.322
3       Early fusion           70.49     0.651
4       LRCN                   89.34     0.379
5       3D CNN (slow fusion)   90.00     0.342
6       MoViNet                95.24     0.124
7       Slow fast network      96.21     0.103

Fig. 2 Comparison of accuracy of all the architectures


Fig. 3 Comparison of loss of all the architectures

The MoViNet architecture required the least training time and is therefore best suited for real-world applications. The training curve for the MoViNet architecture is shown in Fig. 4; it shows that the training and validation accuracy increase as the number of epochs increases, with the reported validation accuracy reached after training for 9 epochs. Since the validation accuracy stays above the training accuracy throughout the curve, the architecture is not overtrained and can be fine-tuned further. MoViNet gives a close second-best result but is considerably faster than the Slow Fast Network. Therefore, the MoViNet architecture was further trained on 10 classes for a greater number of epochs, as its training curve indicated that additional training could yield more accuracy. After this training, the MoViNet architecture reached an accuracy of 98% for 10 classes on the test data set. The confusion matrix for the evaluation of the trained model is shown in Fig. 5. The architecture gives almost perfect results and is only confused between applying eye makeup and applying lipstick for some video samples, which would be natural for a human as well because both belong to the makeup category.


Fig. 4 Total versus validation loss of MoViNet architecture

Fig. 5 Confusion matrix for MoViNet


5 Conclusion and Future Scope The Human Activity Recognition system takes a temporal sequence of frames in the form of a video as input and predicts the activity being performed in the video. The various architectures and their performance have been thoroughly discussed in this paper. The MoViNet architecture proved the most promising, giving state-of-the-art results for activity recognition, and its performance was analysed for each category using a confusion matrix. Because MoViNet is very fast, it can also be used for real-time inference of the activity being performed. Human Activity Recognition has many potential applications, such as automatic monitoring of surveillance videos, where it can be used to detect crime in real time.

References 1. Lara OD, Labrador (2012) A survey on human activity recognition using wearable sensors. IEEE Commun Surv Tutorials 15(3):1192–1209 2. Heilbron FC et al (2015) Activitynet: a large-scale video benchmark for human activity understanding. In: 2015 IEEE conference on computer vision and pattern recognition (CVPR). IEEE 3. Karpathy A et al (2014) Large-scale video classification with CNN. In: Proceedings of the IEEE conference on computer vision and pattern recognition 4. Akilan T et al (2017) A late fusion approach for harnessing multi-CNN model high-level features. In: 2017 IEEE international conference on systems, man, and cybernetics (SMC). IEEE 5. Williams J et al (2018) Recognizing emotions in video using multimodal DNN feature fusion. In: Proceedings of grand challenge and workshop on human multimodal language 6. Tran Du et al (2019) Video classification with channel-separated convolutional networks. In: Proceedings of the IEEE/CVF international conference on computer vision 7. Donahue J et al (2015) Long-term recurrent convolutional networks for visual recognition and description. In: Proceedings of the IEEE conference on computer vision and pattern recognition 8. Feichtenhofer C et al (2019) Slowfast networks for video recognition. In: Proceedings of the IEEE/CVF international conference on computer vision 9. Kondratyuk D et al (2021) Movinets: mobile video networks for efficient video recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition 10. Carreira J, Zisserman A (2017) Quo vadis, action recognition? a new model and the kinetics dataset. In: Proceedings of the IEEE conference on computer vision and pattern recognition 11. Cho H et al (2013) Evaluation of LC-KSVD on UCF101 action dataset. THUMOS: ICCV workshop on action recognition with a large number of classes, vol. 7 12. He K et al (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition 13. Mutegeki R, Han DS (2020) A CNN-LSTM approach to human activity recognition. In: International conference on artificial intelligence in information and communication (ICAIIC). IEEE 14. Tran D et al (2019) Video classification with channel-separated convolutional networks. In: Proceedings of the IEEE/CVF international conference on computer vision 15. Vrskova R et al (2022) Human activity classification using the 3DCNN architecture. Appl Sci 12(2)

Article Summarization Using Deep Learning D. Femi, S. Thylashri, Roshan Baniya, Mukesh Kumar, and Rajat Kumar Bhagat

1 Introduction The ability to summarize information or any article quickly and accurately is a valuable skill in today's fast-paced world. Summarizing a long piece of text can be time-consuming and difficult, especially when dealing with large volumes of data. That is where text summarization comes in: a technique that allows us to quickly condense large amounts of text into a shorter and more manageable summary. In this project, we develop a text summarization tool that not only summarizes the text but also converts it into audio format and generates a PowerPoint presentation. Our aim is to create a comprehensive solution that caters to the needs of people who prefer audio content or visual aids over traditional text-based summaries. Types of Summarization: There are two main types of summarization, extractive and abstractive, as shown in Table 1.


Table 1 Types of summarization

Definition
  Extractive: Selects the most important sentences or phrases from the original text and combines them to form a summary
  Abstractive: Creates a summary by synthesizing new sentences that convey the essential information from the original text
Input
  Extractive: Requires the original text as input
  Abstractive: Requires the original text as input
Output
  Extractive: Outputs a summary composed of sentences or phrases from the original text
  Abstractive: Outputs a summary composed of new sentences not present in the original text
Accuracy
  Extractive: Tends to be more accurate because it uses the exact words from the original text
  Abstractive: Tends to be less accurate because it may introduce errors or biases in the summary
Language
  Extractive: Limited by the vocabulary and syntax of the original text
  Abstractive: Not limited by the vocabulary and syntax of the original text
Complexity
  Extractive: Simple to implement, but may not capture the full meaning of the text
  Abstractive: More complex to implement, but can capture the full meaning of the text
Training data
  Extractive: Requires less training data because it only needs to learn to select and reorder sentences
  Abstractive: Requires more training data because it needs to learn to generate new sentences
Applications
  Extractive: Useful for summarizing news articles, scientific papers, and other factual texts
  Abstractive: Useful for summarizing creative writing, poetry, and other expressive texts

1.1 Extractive Summarization Extractive summarization is a method that selects the most important sentences or phrases from the original text and uses them to create a summary. This technique involves identifying the most relevant information from the source text and reproducing it in a condensed form. Extractive summarization is more commonly used because it relies on actual content from the original text and ensures that the summary remains faithful to the source material.

1.2 Abstractive Summarization Abstractive summarization involves using natural language processing and machine learning algorithms to create a summary that is not restricted to the original text. Abstractive summarization requires the tool to understand the underlying meaning of the text and then create a summary that captures the essence of the text in a new and unique way. Abstractive summarization is a more challenging technique because it requires the tool to understand the context, structure, and intent of the original text, but it has the potential to create more comprehensive summaries.


The text summarization project aims to create a tool that can efficiently condense large volumes of text into shorter, more manageable summaries. By incorporating audio and PowerPoint presentation features, we hope to provide an alternative to traditional text-based summaries. This will be helpful not only for multi-taskers but also for visually impaired people, who can enjoy the comfort of listening to news, articles, and coverage of various sports events.

2 Literature Review The present study deals with the summarization of text extracted from long, descriptive paragraphs. The model is built on the spaCy pipeline, which is backed by the Thinc library and a CNN in the background, and the algorithm used is more accurate than comparable alternatives. The model is also trained with the LSA algorithm, which performs semantic analysis of the text. It consists of four main modules: the first summarizes the text, the second converts the summary to audio, the third performs paraphrasing using a Hugging Face model, and the final phase converts the result to a presentation using the python-pptx library. Model accuracy is measured with ROUGE, and the model achieves higher accuracy than the other models considered. Freitas et al. [1]: here trainable machine learning algorithms are used. The features are of two kinds: statistical, based on the frequency of some elements in the text, and linguistic, extracted from a simplified argumentative structure of the text in which the important points are taken from the given sentences. The Naive Bayes algorithm is used for better results, and precision is increased by almost 10%. Khan et al. [2]: in this paper, text summarization is carried out using abstractive summarization methods and a semantic-based approach. Summarization is performed for two sets of documents, and the performance-judging similarity values are 75.2 and 96.2%, respectively. These methods produce concise, information-rich, coherent, and less redundant summaries and improve the linguistic quality of the summary, with the exception of the information item-based method, in which a few summary sentences contain grammatical mistakes due to incorrect parses. Gaikwad et al. [3]: the methods used in this summarization are natural language processing (NLP), the extractive summarization technique, and the abstractive summarization technique. Extraction techniques merely copy the information judged most significant by the system to the summary (e.g., key clauses, sentences, or paragraphs), while abstraction involves paraphrasing sections of the source document. Jayasudha et al. [4]: in this research paper, analysis of the data is followed by data cleaning, dataset building, dictionary building, and tokenization. The model acts as an encoder consisting of long short-term memory (LSTM) units. The


LSTM network reads the input sentence, processes it, and passes it to another LSTM network working as a decoder. The algorithm used here is the text-to-text transfer transformer (T5), which is pretrained on text-related tasks. Recall-oriented understudy for gisting evaluation (ROUGE) has been used as the evaluation metric, with a score of 0.8; the model was able to perform text summarization with both approaches. Divya et al. [5]: in this paper, they use ROUGE as the validation metric. ROUGE stands for recall-oriented understudy for gisting evaluation and is used for evaluating automatic text summarization and machine translation. It consists of a set of metrics that compare the result automatically produced by the model against human-produced results, with training accuracies of 0.9615 and 0.9114, respectively, and a combined accuracy of 90%. Maurya et al. [6]: in this paper, they used an encoder-decoder architecture (a machine learning approach) in Keras and TensorFlow and evaluated it with ROUGE. The resulting system performs trainable text summarization across multiple languages and multiple documents, and the output sequence is generated using a context vector representation. Yousefi-Azar et al. [7]: in this paper, they presented a query-based single-document summarization scheme using an unsupervised deep neural network. They used a deep auto-encoder (AE) to learn features rather than engineering them manually, and their experiments explore both local and global vocabularies. They investigate the effect of adding small random noise to the local input representation of the AE and propose an ensemble of such noisy AEs, which they call the ensemble noisy auto-encoder (ENAE). The key factor of their model is the word representation. Automatic text summarization systems typically use sparse input representations, which can cause two problems: first, not enough data is observed during training; second, there are too many zeros in the input and output of the AE. The accuracy of the model is around 80%. Mythreagi et al. [8]: in this paper, text summarization is carried out using a deep learning approach with a restricted Boltzmann machine (RBM) for feature extraction together with the RAKE algorithm. The summarization is carried out for three sets of documents, and the performance-judging similarity values are 56.4, 75.2, and 96.2%, respectively, when compared to manual summarization. Mahajan et al. [9]: in this research paper, summarized text is generated after training on a dataset consisting of 98,000 news articles. The model architecture consists of an encoder (with 128 gated recurrent units), an attention layer (to eliminate repeated words), and a decoder (whose numerical output is mapped to the tokenized array). The algorithm used is pretrained Bidirectional Encoder Representations from Transformers (BERT), an NLP model trained by Google. The accuracy obtained by the model is around 60%, far better than the RNN (10%) and encoder-decoder model (40%).


Alubelli et al. [10]: here three kinds of algorithms are used for summarization, namely surface-level algorithms, intermediate-level algorithms, and deep parsing algorithms. These algorithms reduce the document to a precise summary, a representation of the text that seeks to convey the exact idea of its contents. According to the F-measure, the system gives 85% accuracy.

3 Design and Methodologies
• Module 1 – Filtering tokens and summarizing
• Module 2 – Converting summarized text into speech
• Module 3 – Paraphrasing of text
• Module 4 – Converting text to PPT

3.1 Text Summarization Loading the modules. Several modules need to be loaded in order to run the project; the required modules are loaded into the environment to make the implementation smooth. The important modules include spaCy, Counter, punctuation, and the en_core_web_sm model. Passing the string. The project loads the text in two different ways: the first is directly copying the text into the doc file, and the second is providing the link of the article, from which the article is extracted as text and later saved as a document. Filtering tokens. The document contains a large amount of text, much of it unwanted. The text is split into individual words, also called tokens, using word vectors. Words that are present in the text but carry little importance, the stop words, are filtered out in this step. Weighing sentences. After the stop words have been filtered out, the words with the highest weightage, i.e. those the counter counts most often, are taken into


consideration and marked as important. The sentences containing them are then selected as part of the summary. Summary. The final summary is obtained after the text has passed through the various steps described in the architecture diagram. A minimal sketch of this extractive step is given below.
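A minimal sketch of this frequency-based extractive step is shown below; the en_core_web_sm model and the top-30% sentence-selection ratio are assumptions for illustration, not necessarily the settings used by the authors.

import spacy
from collections import Counter
from heapq import nlargest
from string import punctuation

nlp = spacy.load("en_core_web_sm")

def summarize(text, ratio=0.3):
    doc = nlp(text)
    # Keep only content words: stop words, punctuation, and whitespace are filtered out.
    words = [t.text.lower() for t in doc
             if not t.is_stop and not t.is_space and t.text not in punctuation]
    freq = Counter(words)
    max_freq = max(freq.values())
    weights = {w: c / max_freq for w, c in freq.items()}           # normalised word weights
    # Score each sentence by the weights of the words it contains.
    scores = {sent: sum(weights.get(t.text.lower(), 0.0) for t in sent)
              for sent in doc.sents}
    n_keep = max(1, int(len(scores) * ratio))
    best = nlargest(n_keep, scores, key=scores.get)
    return " ".join(sent.text.strip() for sent in best)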

3.2 Converting Summarized Text into Speech The summarized text received from the summarization process is taken as the input for the audio generation step. The conversion is performed with the Google Text-to-Speech (gTTS) module, which is both a Python library and a command-line tool that interacts with Google's service to turn text into audio. The summarized text is given as input, and gTTS converts it into an audio file saved with the (.mp3) extension under the name output.mp3.
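A minimal sketch of this step with the gTTS library:

from gtts import gTTS

def summary_to_audio(summary_text, filename="output.mp3"):
    tts = gTTS(text=summary_text, lang="en")   # contacts Google's text-to-speech service
    tts.save(filename)
    return filename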

3.3 Text Paraphrasing Paraphrasing is the rewriting of sentences in an improved way without losing their main content; put simply, it also corrects any errors present in the text. The tools used for this are the Hugging Face library and a Pegasus model. To access the Pegasus model, the torch, transformers, and sentence_splitter packages need to be loaded. Paraphrasing is done by taking the text as input and passing it through the sentence splitter, which splits the text into sentences; the default language is set to English, and the list of sentences is produced. The loaded Pegasus model from Hugging Face then performs the paraphrasing, and each sentence is appended back to the result once it has been paraphrased. Finally, stray quote and bracket characters are removed from the appended sentences; after this cleaning, the sentences are printed and stored in the paraphrased_text variable.
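The sketch below illustrates this step with the transformers and sentence_splitter packages; the specific Pegasus paraphrase checkpoint named here is an assumed, publicly shared model, not necessarily the one used by the authors.

from sentence_splitter import SentenceSplitter
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

model_name = "tuner007/pegasus_paraphrase"              # assumed checkpoint
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)

def paraphrase(text):
    splitter = SentenceSplitter(language="en")          # default language: English
    rewritten = []
    for sentence in splitter.split(text):
        batch = tokenizer([sentence], truncation=True, padding="longest",
                          return_tensors="pt")
        generated = model.generate(**batch, max_length=60, num_beams=5)
        rewritten.append(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
    return " ".join(rewritten)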

3.4 Conversion of Text to PPT Loading the pptx module. To handle the PPT conversion, the python-pptx module must be installed using pip. The module is downloaded and loaded for later use. Importing the presentation library. The installed library is imported, and a root presentation object is created with the Presentation function.


Adding the first slide. The Presentation object is used to add the first slide, whose layout is chosen with root.slide_layouts[1]. The title of the slide is specified as a string, a new line is started, and the text to be added is passed to the body placeholder. The final command saves the PPT file, and once the first slide has been generated, "done" is printed. The same process is repeated for adding the second slide. A minimal sketch of this step is given below.
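A minimal python-pptx sketch of the slide-generation step described above (the file name is an assumption for illustration):

from pptx import Presentation

def text_to_ppt(title, summary_text, filename="summary.pptx"):
    root = Presentation()
    slide = root.slides.add_slide(root.slide_layouts[1])   # layout 1: title and content
    slide.shapes.title.text = title
    slide.placeholders[1].text = summary_text              # body placeholder
    root.save(filename)
    print("done")
    return filename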

4 Proposed Methodology Summarization deals with shortening larger content without losing its important aspects, which saves time. The first thing the proposed system requires is the data to be summarized. An initial document or a text link is provided, from which the text is extracted and then passed on for data preprocessing. The first preprocessing step is sentence splitting, performed with the sentence splitter, a Python library. The split sentences are then tokenized into small tokens. The general architecture diagram is shown in Fig. 1. Tokenization is the process of converting text into meaningful, understandable units; it is carried out with the SentencePiece library and the spaCy pipeline, which convert the sentences into small tokens or words using word vectors. The obtained tokens are then sent for lemmatization. Lemmatization is a useful technique in text summarization that reduces the dimensionality of the text data and improves the accuracy of the summaries. It involves converting words to their base or dictionary form, known as the lemma, based on each word's part-of-speech tag. For

Fig. 1 General architecture diagram


example, the lemma of the word "teaching" when used as a verb is "teach". This can be done using a pre-built lemmatizer, such as the WordNet lemmatizer or the spaCy lemmatizer; in this project, spaCy is used. By performing lemmatization, the number of unique words in the text is reduced, which can improve the accuracy and speed of the summarization process, and words with similar meanings are grouped together, which can improve the coherence of the summary. Removing stop words is a common technique in text summarization to eliminate common words that add little meaning to the text, such as "the", "and", and "a". The general steps are: create a list of stop words, for example using the pre-built list available in the Natural Language Toolkit (NLTK) used in NLP, then iterate over the tokens and remove any words that appear in that list. The completion of sentence extraction leads to extractive summarization. Its first step is generating a similarity matrix; words with similar values are merged and then used for sentence ranking. Sentences are ranked on the basis of the word counter, and once the weightage of the words is defined, the sentences are ready to be extracted. The extracted sentences are sent for paraphrasing, and once paraphrasing is done, the output is ready.

5 Performance Analysis For the smooth completion of a project, there should be an evaluation metric that can determine its potential. For article summarization, the evaluation metric we have used is ROUGE, which stands for recall-oriented understudy for gisting evaluation and consists of several individual metrics. It evaluates automated text summarization by comparing the output with a human-produced summary: the summary generated by the proposed model is known as the system summary, whereas the summary written by a human for reference is known as the reference summary. The summarized text is evaluated on the basis of the machine-generated text and the human-proposed summary, and the results are supported by the formulas below. Recall is calculated as the proportion of the reference summary that is covered by the machine-generated summary; ROUGE provides the accuracy and precision scores.

Recall = (Overlapping words from the summary) / (Words in reference summary)

Precision = (Overlapping words from the summary) / (Words in system summary)
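A small worked example of these formulas, computing plain unigram overlap (an approximation in the spirit of ROUGE-1 rather than a full ROUGE implementation):

def overlap_metrics(system_summary, reference_summary):
    system_words = system_summary.lower().split()
    reference_words = reference_summary.lower().split()
    overlap = len(set(system_words) & set(reference_words))    # overlapping words
    recall = overlap / len(reference_words)                     # overlap / words in reference summary
    precision = overlap / len(system_words)                     # overlap / words in system summary
    f_measure = (2 * precision * recall / (precision + recall)) if overlap else 0.0
    return precision, recall, f_measure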


Table 2 Performance metrics

Test             Attribute           Min.   Max.    Average
Source text      Sentences           11     94      30
                 Words               205    1537    465
Manual summary   Sentences           4      30      8
                 Words               62     625     162
System summary   Sentences           2      32      9
                 Words               55     637     170
Evaluation       Overlapping words   30     553
                 Precisions          0.54   0.86    0.70
                 Recall              0.64   0.88    0.76
                 F-measure           0.58   0.869   0.72

Table 3 Comparative analysis

Evaluation metric   LSA      TEXTRANK   T5       KL-sum   LUHN'S   Proposed model
Precision           0.6527   0.4637     0.4615   0.4776   0.1829   0.7120
Recall              0.6438   0.4383     0.4109   0.4383   0.2054   0.7610
F-Score             0.6482   0.4507     0.4347   0.4571   0.1935   0.7232

Fig. 2 Representation of metrics (bar chart of the evaluation metrics precision, recall, and F-measure against their scores)

F-measure = 2 * (Precision * Recall) / (Precision + Recall)

Testing was performed on the data given above. The data were collected by counting the sentences, words, and overlapping words, and the average was computed from the collected values. Table 2 contains the minimum and maximum values present in the data, and the accuracy is illustrated by the bar diagram. The comparative analysis is shown in Table 3. Figure 2 presents the evaluation metrics, depicting the precision, recall, and F-measure values of 0.7, 0.76, and 0.72, respectively.


Fig. 3 Comparison of results (bar chart titled "Model Prediction Comparison" showing precision, recall, and F-score for each model type)

Figure 3 compares the proposed model with the existing models. The proposed model has better precision, recall, and F-measure values than the models based on LSA, TextRank, T5, KL-Sum, and Luhn.

6 Conclusion The increasing use of technology has given people access to a huge amount of data, yet people cannot invest their time in reading long, descriptive texts. Although they do not want to waste time on long tasks, they still have to deal with them, so they need something that can summarize articles and present only the important content. With so much data being produced, summarization becomes highly desirable. According to the ABC, the number of digital consumers of magazines increased from 2.5 million to 3.1 million within a year. We conclude that a summarizer is needed that can save a great deal of time by decreasing the quantity of information while preserving its quality. Humans are capable of summarizing data manually, but when the data are very large, manual summarization is not feasible, so automatic summarization must be introduced.


7 Future Enhancement The present module is designed to convert descriptive text into a summary, convert the summary to an audio file, paraphrase the text, and finally convert the paraphrased text to a presentation file (pptx). A further modification would be to use more advanced algorithms. The output of the project is in English, and an option could be added for translating from English to another preferred language. Instead of providing a document or link as input, speech or vocal typing could be added as an input module. These are some of the modifications that can be made to the project.

References 1. Vast A, Mahajan R, Mhaske S, Barahate S (2021) Text summarization using deep learning. Int Res J Eng Technol (IRJET) 08(05):1737–1740 2. Chokshi A, Maurya K, Patel M, Vyas S (2018) Machine Learning approach for automatic text summarization using neural networks. 7(1):298–302 3. Freitas AA, Kaestner CAA, Joel Larocca Neto G (2010) Automatic text summarization using a machine learning approach. Int J Comput Appl Technol 42(02):205–215 4. Sahoo A, Nayak AK (2018) Review paper on extractive text summarization. Int J Eng Res Comput Sci Eng (IJERCSE) 5(4):46–52 5. Baisetti Sowmya G, Rao S, Divya K, Sneha K (2020) Text summarization using deep learning. Int Res J Eng Technol (IRJET) 07(05):3673–3677 6. Namrata Mahender C, Gaikwad DK (2016) A review paper on text summarization. Int J Adv Res Comput Commun Eng 05(03):154–160 7. Jayasudha K, Thanmay D (2022) Text summarization using deep learning. Int J Creative Res Thoughts 10(6):604–608 8. Atlam E-S, Aoe J-I, Morita K, Fuketa M, Hiroya Kitagawa S (2013) Document summarisation on mobile devices using non-negative matrix factorisation. Int J Comput Appl Technol 46(01):13–23 9. Sharon JRT, Deepthi KS, Alubelli ST (2019) Multi-document bi-lingual news summarization. Int J Res Anal Rev 06(1):494–499 10. Hamey L, Yousefi-Azar M (2017) Text summarization using unsupervised deep learning. Expert Syst With Appl 68(1):93–105 11. Atif Khan M, Salim N (2014) A review on abstractive summarization methods. J Theor Appl Inf Technol 59(01):64–72 12. Mridha MF, Nur K, Hasan M (2021) A survey of automatic text summarization progress, process and challenges. J Emerg Technol Web Intell 9(30):156043–156056 13. Manju D, Radhamani V, Dhanush Kannan A, Kavya B, Sangavi S, Srinivasan S (2022) Text summarization—a survey. In: Proceedings of the sixth international conference on inventive computation technologies, vol 21(7). pp 173–182 14. Desai M, Baxi J (2018) A survey on varıous technıques to buıld extractıve text summarızer. Int J Comput Sci Inform Technol 5(12):495–500 15. Mythreagi R1, Yuvaraj N (2019) Automatic document summarization using deep learning mechanism with competent analysis. 14(7):1709–1714 16. Sharma R, Varshney N, Sharma M, Paliwal R (2022) Voice based summary generation using LSTM. Int J Res Appl Sci Eng Technol (IJRASET) 10(06):46–51

Knee Osteoarthritis Severity Prediction Through Medical Image Analysis Using Deep Learning Architectures C. Dymphna Mary, Punitha Rajendran, and S. Sharanyaa

1 Introduction Osteoarthritis (OA) of the knee is a widespread degenerative joint disease that primarily affects persons over 50 but also affects millions of others globally. It is a chronic ailment that develops as the cartilage that protects the joints slowly deteriorates, resulting in discomfort, stiffness, and decreased mobility. Many variables, such as genetics, ageing, obesity, joint traumas, and overuse, might contribute to the development of knee OA. Although there is no known cure for knee OA, a few options can help control symptoms and slow the disease's progression. They include dietary and exercise modifications, as well as medication, physical therapy, and surgery in more serious circumstances. Anybody who may be at risk or already has knee OA should be aware of the causes, signs, and therapies of the disease [1]. Elderly people frequently experience it. It can be roughly categorized into primary and secondary stages. The primary stage covers those who experience the disease with no obvious reason or cause, while those who are subjected to an abnormal concentration of force across the joints, which causes osteoarthritis, are classified as being in the secondary stage. Osteoarthritis is a chronic condition that gradually impairs movement in the joints. The location and extent of the joint deterioration can be used to categorize knee osteoarthritis. X-ray images of a normal knee joint and an arthritic knee joint are shown in Fig. 1. The following systems are commonly used in clinical practice. Radiographic classification: this classification system uses X-ray images of the knee joint to assess the degree of joint injury [2]. C. D. Mary (B) · P. Rajendran · S. Sharanyaa Department of Information Technology, Panimalar Engineering College, Poonamallee, Chennai, Tamil Nadu 600123, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 I. J. Jacob et al. (eds.), Data Intelligence and Cognitive Informatics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-7962-2_33




Fig. 1 X-ray images of normal knee joint and arthritic knee joint

A deep learning algorithm has been developed to automatically classify knee osteoarthritis images, using convolutional neural network designs and comparing CNN algorithms to classify the findings most accurately. The potential of using deep learning to predict knee osteoarthritis is great: it could significantly improve our understanding of the condition, lead to early intervention, and help develop patient-specific treatments. Knee osteoarthritis (OA) is diagnosed using a variety of methods [3], which can include: Clinical evaluation: this involves a physical examination to determine the range of motion, stiffness, edema, and pain in the joints. Blood tests: these can help rule out other illnesses, like rheumatoid arthritis or gout, that could cause joint discomfort. Arthrocentesis: this procedure involves taking a small sample of joint fluid for testing in order to identify any inflammation and rule out other conditions. Analysis of joint fluid: this can be used to identify inflammation and support the diagnosis of knee OA. Testing for biomarkers: scientists are investigating biomarkers in blood, urine, and joint fluid that may aid in identifying knee OA and forecasting the course of the condition. X-rays, MRI scans, and ultrasounds are imaging techniques that can be used to visualize the joint structure and gauge the disease's severity. Examples of radiographic classification systems include the Osteoarthritis Research Society International (OARSI) grading system and the Kellgren-Lawrence grading scale. Clinical classification: the results of the physical examination and the symptoms serve as the basis for this system of classification. The American College of Rheumatology (ACR) criteria, which classify knee OA as definite, probable, or possible based on the presence of joint pain, stiffness, and other clinical symptoms, are among the clinical classification systems. Anatomical classification: the location of the joint injury within the knee joint is the basis for this system of classification. The Outerbridge classification system is one of the anatomical classification schemes; it assigns a score of 0–4 depending on the degree of cartilage destruction and the



presence of visible bone. The current technique focuses on supplying the framework utilizing MRI images, compares the MRI-measured cartilage MTF (Mesenchymal Tropic Factor) thickness, and then forecasts the results. This procedure is expensive when repeated at routine checkups. The current system makes use of various, frequently complex, and confusing grading systems, and there is no way for the patient to carry out a basic examination before visiting a hospital or a doctor. Existing machine learning methodologies have concluded that the VGG architecture is best suited, with an accuracy of 80%.

2 Literature Survey Employing baseline knee radiography, the authors created a deep learning model [4] to predict the development of pain. An artificial neural network was used to build a traditional risk assessment model that uses demographic, medical, and radiological risk factors to anticipate how pain will develop. A subset of knees with and without pain progression over the initial 48 months was utilized, where progression was defined as an increase in pain score of at least 9 points between baseline and two or more later time points. Training (4200 knees, 60% female, average age 60 years) and hold-out test (500 knees, 60% female, average age 60.8 years) datasets were selected randomly from the whole. The deep learning analysis of baseline knee radiographs was coupled with demographic, medical, and radiological risk factors in a combined model. Using the hold-out testing dataset, area under the curve (AUC) analysis was performed to assess the performance of the model. According to the Kellgren-Lawrence (KL) grading system, the authors [2] used two deep convolutional neural networks (CNNs) to autonomously identify the level of knee OA. A modified one-stage YOLOv2 network, which considers the sizes of knees and hips that are widely dispersed in X-ray images, serves as the model's first step to detect knee joints. To classify the detected knee joint images with a novel adjustable ordinal loss, changes are also made to the most well-known CNN models, such as variants of ResNet, VGG, and DenseNet in addition to InceptionV3. When there is a wider gap between the predicted and real KL grades, the classification penalty is increased; this reflects the ordinal structure of the KL grades. The X-ray images of the Osteoarthritis Initiative (OAI) dataset are analyzed. This model successfully recognizes the knee joint with an average Jaccard value of 0.858 and a recall of 92.2%, both superior to the Jaccard index requirement of 0.75. The best classification performance for the knee KL grading test is achieved by the fine-tuned VGG-19 model with the suggested ordinal loss, with a mean absolute error (MAE) of 0.344. Both knee KL grading and knee joint detection provide state-of-the-art performance. The authors [3] propose deep (eight hidden layers) multilayer neural networks trained with the traditional backpropagation algorithm. For each deep learning model that was put to an experiment, a



wide range of hyper-parameters was changed, such as the number of hidden layers, the number of neurons within each layer, the activation function, the optimization strategy, and the normalization techniques. One neuron made up the final layer, which reflected the result of the regression, and the optimization loss function included regularization terms. Each layer of the deep neural network uses the ReLU activation function, and dropout with a probability of 0.2 is applied after each layer. Adam optimization was used for training since it frequently produces excellent empirical results and is more robust to hyper-parameter settings. The batch normalization technique was applied after the first two layers to further reduce over-fitting and offer more stable convergence. The Keras framework and TensorFlow were both used to implement each model. The incidence of osteoarthritis was classified using the binary cross-entropy loss function, and class weighting was applied to the dataset to balance the disparity between the classes; the class weight increased the significance of the minority class. In accordance with the combined American College of Rheumatology (ACR) criteria, the authors [4] systematically investigate several approaches for the prediction of incident radiographic knee osteoarthritis on a testing sample with blinded underlying data over 78 months. They constructed a testing group of 423 knees without symptomatic radiological knee osteoarthritis, using MRI and X-ray imaging data as well as clinical risk indicators at baseline, with the goal of detecting which knees developed symptomatic radiographic knee OA during the course of the follow-up period. The X-ray images were gathered using a variety of techniques and technologies. Images with pixels of 0.104 mm by 0.104 mm were created using a Swissray (ddR Compact System, Hochdorf, Switzerland) radiographic machine, while X-ray images with pixel widths ranging from 0.190 to 0.192 mm were recorded at 60–70 kVp and 3–5 mAs using General Electric (GE) (Thunder Platform, Waukesha, USA) radiography equipment. Detailed information about the X-ray machine manufacturer, tube voltage, exposure, and pixel size was available. For longitudinally tracking knee osteoarthritis severity, the authors [5] created an innovative deep learning architecture called an adversarial evolving neural network (A-ENN). The ENN uses convolution and deconvolution operations to compare an input image to template images of the various KL grades in order to precisely describe the disease as it evolves from a low to a severe level. A discriminator-based reinforcement learning technique is also created to retrieve the evolution traces [6–8]. The generic convolutional image representations are then combined with the evolution traces, which serve as fine-grained domain knowledge for longitudinal grading. It should be highlighted that the ENN can be used with existing deep architectures in a range of learning tasks, with the results defining progressive representations. The proposed approach was thoroughly tested using the Osteoarthritis Initiative (OAI) dataset. The accuracy was 62.7% overall, 64.6% for the baseline, 63.9% for the 12-month, 63.2% for the 24-month, 61.8% for the 36-month, and 60.2% for the 48-month periods.



3 Methodology Over 40 percent of people worldwide have knee osteoarthritis, and regular doctor visits cost them considerable time. The suggested system uses deep learning to anticipate the disease severity and make this process easier [9]. The suggested system offers a framework based on a two-dimensional image and classifies the stages using the convolutional neural network (CNN) method. To discover the best-performing design and increase accuracy, we implemented three architectures in this system: a manual net architecture, the VGG architecture, and the LeNet architecture. The suggested method for this project is to create a deep learning model able to recognize X-ray images of knee osteoarthritis. Data pre-processing, visualization, and feature extraction are performed on the knee X-ray image dataset. For each severity level, a different number of images, divided into training and testing images, was gathered. Shape- and texture-oriented features are the main components of the image. The CNN algorithm is used in the suggested system to increase accuracy and produce appropriate results. The model is delivered using the Django framework, and the system architecture is shown in Fig. 2.

Fig. 2 Architecture of the proposed work



3.1 Data Collection and Pre-processing The dataset was collected from the Knee Osteoarthritis Initiative (OAI), which rates the intensity of knee osteoarthritis in stages using the Kellgren-Lawrence scoring system. Stage 0 depicts a healthy knee, the severity increases progressively through the intermediate stages, and stage 4 has the highest severity. These classifications are based on certain parameters, namely the thickness of the femur bone and tibial plateau due to swelling in the joint and the altered shape of the bone. Other parameters include the joint space, which can decrease along the edges, causing the knee to be asymmetric and indicating osteoarthritis. There are also other signs that can be detected in X-ray images: subchondral cysts, which form in the femur bone joint; subchondral sclerosis, referring to a thickening of the bone that causes its shape to vary; and marginal osteophytes or geodes, which form in the gap of the joint. The clinical assessment includes weight bearing on the affected knee along with 45-degree flexion [10]. Based on all these parameters and attributes, the dataset is classified between stages of health and severity of knee OA: a joint-space thickness > 5 mm is considered a healthy knee, while a reduction to < 3 mm indicates knee osteoarthritis. Collecting a dataset for knee osteoarthritis involves gathering a set of medical images, such as X-rays or MRI scans, and corresponding clinical data from patients with knee osteoarthritis; with the help of this dataset, machine learning algorithms for the automated diagnosis and prognosis of osteoarthritis of the knee can be developed and assessed. Patients with knee osteoarthritis are identified and recruited for the study, and medical images of their knees are acquired using X-rays, MRI scans, or other imaging modalities [11–15]. The images should be of high quality and resolution to enable accurate analysis, and they are annotated by medical experts to identify and label relevant features, such as cartilage thickness, joint space narrowing, and bone spurs, that are indicative of knee osteoarthritis.
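A minimal sketch of loading such a graded X-ray dataset with Keras is shown below, assuming the images are organised in one folder per KL grade; the directory name, image size, and split ratio are illustrative assumptions, not the authors' exact setup.

```python
# Load knee X-ray images organised as knee_xrays/<grade>/<image>.png into training/validation sets.
import tensorflow as tf

train_ds = tf.keras.utils.image_dataset_from_directory(
    "knee_xrays",                # hypothetical root folder with sub-folders 0..4, one per KL grade
    validation_split=0.2,
    subset="training",
    seed=42,
    color_mode="grayscale",
    image_size=(32, 32),
    batch_size=32,
)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "knee_xrays",
    validation_split=0.2,
    subset="validation",
    seed=42,
    color_mode="grayscale",
    image_size=(32, 32),
    batch_size=32,
)
```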

3.2 Model Selection

3.2.1 Convolutional Neural Network

Since convolutional neural networks (CNNs) are made expressly to deal with the spatial relationships and patterns contained in picture data, they are a popular choice for image-based tasks like predicting the severity of knee osteoarthritis. By applying learnt filters to the input image and combining the output from each convolutional layer, CNNs, which are made up of many layers, may automatically create hierarchical representations of the image data. This makes it possible for the network to effectively capture intricate patterns and features at various scales, which is crucial for correctly estimating the severity of knee osteoarthritis.



For picture-based tasks, other architectures can also be employed, such as fully connected neural networks or recurrent neural networks, although they might not perform as well as CNNs since they don’t account for the spatial correlations and patterns seen in the image data. CNNs are therefore frequently the preferable option for jobs involving the processing of image data because of their capacity to record spatial information and patterns.

3.2.2 ResNet Architecture

ResNet is a convolutional neural network (CNN) designed to train very deep networks, which can be useful for applications such as medical imaging. Methods used to prevent overfitting and improve generalizability include data augmentation, dropout, batch normalization, and learning-rate scheduling, and accuracy, precision, recall, and other metrics are analyzed on a separate test dataset. The ability to train very deep neural networks gives ResNet the advantage of extracting useful features from the image input. We used a ResNet model that reached an accuracy of 74%; however, it requires a great deal of computing power and was expensive for us to train.

3.2.3 AlexNet Architecture

Another deep learning model for this work uses the AlexNet framework to predict knee osteoarthritis. AlexNet is a convolutional neural network (CNN) originally developed for image classification and used in various medical image-processing applications. The knee X-ray image dataset is used to train the AlexNet model with supervised learning so that it learns to associate input images with the stages of the disease. Techniques such as data augmentation, dropout, normalization, and choosing the number of classes are used to reduce overfitting and increase generalizability during training. Once the model is trained, its performance is analyzed with different tests to determine recall, accuracy, and other performance measures. Overall, using AlexNet for knee osteoarthritis prognosis is a promising approach, but it requires careful evaluation. Compared with traditional machine learning methods, the accuracy obtained with the AlexNet architecture is 81.5%.

3.2.4 VGG (Visual Geometry Group)

This deep neural network model is used to forecast the severity of knee osteoarthritis with improved accuracy. The VGG network used here has a set of 16 layers, which help the model work better and predict the disease's stages at the appropriate time. The VGG architecture consists of a number of convolutional layers, usually followed



by a max pooling layer, with a few fully connected (FC) layers at the very end. The VGG architecture stands out for two reasons: its regularity and simplicity. All of the network's max pooling layers are 2 × 2 with a stride of 2, while all of its convolution layers have kernel sizes of 3 × 3 and a stride of 1. These 16-layer VGG networks have demonstrated excellent performance in a range of image recognition applications, including image segmentation, object detection, and classification, and here they predict the disease's stages at the appropriate time with an accuracy of 74%. Compared to the manual architecture, this architecture uses the X-ray image to predict the disease stages with greater accuracy and less data loss.
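For reference, a 16-layer VGG backbone of this kind can be instantiated in Keras as a pretrained feature extractor with a small classification head for the five severity grades; the input size and the head layers below are illustrative assumptions, not the exact configuration used in the paper.

```python
# VGG16 backbone with a small classification head for five knee-OA severity grades.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False                      # reuse the pretrained convolutional blocks as-is

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.2),
    layers.Dense(5, activation="softmax"),  # one output per severity grade
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```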

3.2.5 LeNet Architecture

To increase the performance of knee osteoarthritis severity prediction, the LeNet convolutional neural network (CNN) architecture, also known as LeNet-5, is used with modifications to its layers; it is widely used in image recognition. The architecture is made up of convolutional, subsampling (pooling), and fully connected layers, and a 32 × 32 pixel grayscale image serves as the network's input. The LeNet-5 structure has the following levels:
• The first convolution layer applies six 5 × 5 filters to the input image, each producing a 28 × 28 pixel feature map. The following subsampling layer employs max pooling over non-overlapping 2 × 2 regions to reduce the size of the feature maps, giving a 14 × 14 pixel output. This output is then passed through sixteen 5 × 5 filters, each producing a 10 × 10 pixel feature map.
• A second subsampling layer again uses max pooling over non-overlapping 2 × 2 regions, giving a 5 × 5 pixel output. A fully connected layer of 120 neurons is connected to every pixel of the 5 × 5 feature maps from the previous layer, followed by a fully connected layer of 84 neurons.
• The output layer, which in the classic LeNet-5 includes 10 neurons (one for each possible digit, 0–9), generates the final output probabilities using a softmax activation function.
The modified LeNet-5 architecture in our suggested system enhances the model's precision. Compared to the proposed CNN and the VGG architecture, it has more layers that the model must pass through: two convolution layers, two max pooling layers, one fully connected layer, and one output layer. With a 30% loss, it provides 89% accuracy compared to other pre-trained models. The modified LeNet architecture is given in Fig. 3.
• Mathematical formulation: Input shape calculation: the input shape of the CNN is determined by the size of the input images and the number of channels. If the input images are grayscale, the input shape is (height, width, 1); if they are RGB, it is (height, width, 3). From the input shape, the output shape of each layer can be calculated.



Fig. 3 Modified LeNet architecture

Parameter calculation: the number of parameters in a convolutional layer can be calculated using the formula (F × F × C_in × C_out) + C_out, where F is the filter size, C_in is the number of input channels, and C_out is the number of output channels. Pooling layer calculation: the pooling layer downsamples each feature map, reducing its spatial dimensions. Dropout calculation: the dropout layer randomly drops a fraction of the activations during training to reduce overfitting.
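As a concrete illustration of the layer and parameter calculations above, the following is a minimal Keras sketch of a LeNet-style network for five severity classes; the 32 × 32 grayscale input and the exact layer sizes are illustrative assumptions rather than the authors' exact configuration.

```python
# Minimal LeNet-style CNN sketch (illustrative layer sizes, five KL severity classes assumed).
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(32, 32, 1)),                      # 32x32 grayscale input
    layers.Conv2D(6, kernel_size=5, activation="relu"),   # params: (5*5*1*6)+6 = 156
    layers.MaxPooling2D(pool_size=2),                     # 2x2 non-overlapping pooling
    layers.Conv2D(16, kernel_size=5, activation="relu"),  # params: (5*5*6*16)+16 = 2416
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(120, activation="relu"),
    layers.Dense(84, activation="relu"),
    layers.Dense(5, activation="softmax"),                # one neuron per severity grade
])
model.summary()  # prints per-layer output shapes and parameter counts
```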

3.2.6 Training and Testing

The stored images are separated into training and test X-ray sets: the obtained dataset is divided into a training dataset and a testing dataset. The training set is used to train the deep learning algorithm, while the testing set is used to evaluate the performance of the learned model. Once the dataset is partitioned, the deep learning models are fitted on the training set. The target variable is the presence or absence of knee osteoarthritis (and its stage), and the models learn patterns and relationships between the image features and the target variable. After training, the models are validated on the test set, and the predictions of the presence or absence of knee osteoarthritis are assessed with several performance measures such as the F1-score. Several hyper-parameters can be changed, such as the number of classes, the number of hidden layers, and the number of nodes per layer, to improve the model. Once the model is refined, it can be tested against a new dataset to check whether it generalizes well and performs well on unseen data. This stage helps ensure that the model can make accurate predictions on new data and is not overfit to the training set.
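A minimal sketch of this split-train-evaluate loop is shown below, reusing the model defined in the previous sketch; the placeholder arrays and the epoch and batch-size values are illustrative assumptions, not the authors' actual settings.

```python
# Split the data, train the LeNet-style model defined above, and evaluate on the held-out set.
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays; in practice these are the pre-processed X-ray images and KL-grade labels.
x = np.random.rand(200, 32, 32, 1).astype("float32")
y = np.random.randint(0, 5, size=200)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42, stratify=y
)

model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, validation_split=0.1, epochs=20, batch_size=32)

test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
print(f"test accuracy: {test_acc:.3f}")
```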

3.2.7 Creating Web Application

Deploying the machine learning model with the Django framework requires creating a web application that can take user input data, pass it to the machine learning model, and return the model's predictions to the user. The first step is to



establish a new Django project using the "django-admin startproject" command. The machine learning model can be defined with any of the well-known Python packages such as scikit-learn or TensorFlow; once it has been trained and optimized, the model can be stored as a file in the Django app directory. Django's templates specify how the web application should appear and function: both the form that users will use to enter their input data and the template that displays the prediction results need to be defined. Django's URL routes map URLs to views, and separate routes must be defined for the form page and the results page. Finally, with the "python manage.py runserver" command, the Django server can be run and the web application tested by visiting the URL route in a web browser.
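A minimal sketch of such a view is shown below; the app layout, template names, form field name, and model file path are illustrative assumptions, not the authors' actual project. The view would then be wired up in urls.py with, for example, path("predict/", views.predict_severity).

```python
# views.py (illustrative): accept an uploaded X-ray, run the saved Keras model, render the result.
import numpy as np
from PIL import Image
from django.shortcuts import render
from tensorflow.keras.models import load_model

model = load_model("kneeoa_lenet.h5")  # hypothetical path to the trained model file

def predict_severity(request):
    if request.method == "POST" and request.FILES.get("xray"):
        # Convert the upload to the grayscale 32x32 tensor the model expects.
        img = Image.open(request.FILES["xray"]).convert("L").resize((32, 32))
        batch = np.asarray(img, dtype="float32")[None, :, :, None] / 255.0
        grade = int(np.argmax(model.predict(batch), axis=1)[0])
        return render(request, "result.html", {"grade": grade})
    return render(request, "upload_form.html")
```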

3.2.8 Performance Evaluation

Precision, Recall, F1-Score, and Accuracy are some of the performance measurements. Precision is the proportion of correctly predicted positive outcomes among all predicted positives; the higher the precision, the better the predictive model performs.

Precision = True Positive / (True Positive + False Positive)   (1)

Recall is the proportion of correctly predicted positive values among all actual positive values; predictive model performance improves with increasing recall.

Recall = True Positive / (True Positive + False Negative)   (2)

The F1-Score is the harmonic mean of precision and recall and is intended to strike a balance between the two values.

F1-Score = 2 × (Precision × Recall) / (Precision + Recall)   (3)

Among the performance metrics, accuracy is particularly significant. It is the ratio of correct predictions to all predictions, expressed as a percentage between 0 and 100%, and it works well when the data points of each label are evenly distributed.

Accuracy = (True Positive + True Negative) / (TP + TN + FP + FN)   (4)
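These metrics can be computed directly with scikit-learn; a small sketch with illustrative label arrays is given below, using macro averaging because the severity prediction is a multi-class problem.

```python
# Compute precision, recall, F1-score, and accuracy for multi-class severity predictions.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [0, 1, 2, 3, 4, 2, 1, 0]   # illustrative ground-truth severity grades
y_pred = [0, 1, 2, 3, 3, 2, 1, 1]   # illustrative model predictions

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred, average="macro", zero_division=0))
print("recall   :", recall_score(y_true, y_pred, average="macro", zero_division=0))
print("f1-score :", f1_score(y_true, y_pred, average="macro", zero_division=0))
```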



4 Results and Discussion The project on knee osteoarthritis using deep learning showed promising results for the diagnosis and prognosis of knee OA. The deep learning-based model developed in this project demonstrated high accuracy and efficiency in detecting knee OA and predicting disease progression, and it outperformed traditional methods of knee OA diagnosis and prognosis, such as imaging studies and clinical assessments. Based on the project's findings, deep learning-based models have a great deal of promise for improving the accuracy and efficacy of knee OA diagnosis and prognosis, which would enhance patient outcomes and reduce medical expenses. More studies are needed to assess and enhance the deep learning-based approach for knee OA care in clinical practice. Comparing all five architectures, LENET gives the maximum accuracy of about 89.27% and the minimum loss of 30%. Figure 4 shows the accuracy and loss of the VGG (Visual Geometry Group) architecture; it compares the test and training data in a graph in which accuracy and loss can easily be compared and visualized. This model gives an accuracy of about 74%, which is not sufficient to predict the stages of knee osteoarthritis, so further architectures are compared to produce better accuracy with less loss. Figure 5 shows the accuracy and loss of the CNN architecture in the same way; this model gives an accuracy of about 70% and a loss of about 60%, which is also not sufficient, so further architectures are compared. Figure 6 shows the accuracy and loss of the LENET architecture; this model gives an accuracy of about 89% and a loss of about 30%, the best result among the compared architectures. Table 1 gives the comparison of the performance metrics of the various deep learning models, and Fig. 7 shows the comparison chart of the various models.

Fig. 4 Model precision and loss of VGG model



Fig. 5 Model precision and loss of CNN model

Fig. 6 Model precision and model loss of LENET

Table 1 Comparison of evaluation metrics

Architecture | Precision (%) | Recall (%) | F1-Score (%) | Accuracy (%)
CNN | 70.00 | 71.00 | 70.00 | 71.50
RESNET | 73.00 | 70.00 | 73.00 | 74.00
ALEXNET | 80.00 | 81.50 | 80.00 | 81.50
VGG | 85.00 | 74.00 | 81.00 | 82.03
Modified LENET | 87.00 | 88.00 | 89.00 | 89.27

Fig. 7 Comparison chart of various deep learning models




Fig. 8 Simulation tool model for prediction

As shown in Fig. 7, the LENET architecture gives higher accuracy than the other architectures and the least model loss. Hence, the LENET architecture is used to predict the severity stages of knee osteoarthritis. The simulation tool model for prediction is shown in Fig. 8.

5 Conclusion Osteoarthritis of the knee is a common and disabling condition that can adversely affect a patient's quality of life. An accurate prognosis of knee osteoarthritis is essential for early diagnosis and effective disease management. Deep learning has shown promise in predicting knee osteoarthritis from a variety of data sources, including imaging and clinical data. The advantages of using deep learning to predict knee osteoarthritis include the ability to identify complex patterns in data and provide accurate predictions; the challenges and limitations include the need for much more data and the potential for bias in the data. Among the compared CNN architectures, VGG, ALEXNET, RESNET, and the manual architecture fall short of the LENET architecture. Thus, the LENET system gives the highest accuracy of 89.27% compared with all the other systems studied.

6 Future Work A possible future direction for knee osteoarthritis prediction using deep learning is to incorporate additional data sources. Current deep learning techniques for the prediction of knee osteoarthritis rely primarily on X-ray and MRI imaging data,



as well as patient demographics, medical histories, and other clinical data, but other data could also be used, such as gait test data, biomarkers, and genetic data about each patient, so that treatments can be offered according to individual health conditions. Another possible direction is the creation of more individualized strategies. Current deep learning methods for knee osteoarthritis prediction typically use a one-size-fits-all approach, with the same model applied to all patients; tailoring the models to the specific characteristics of an individual patient and taking into account factors such as age, sex, and comorbidities could provide a more personalized prognosis and follow-up. Acknowledgements I acknowledge the use of ChatGPT [https://chat.openai.com/] to generate ideas and material for background research and project planning in the drafting of this research study.

References
1. Dalia Y, Bharath A, Mayya V, Kamath SS (2021) Deepoa: clinical decision support system for early detection and severity grading of knee osteoarthritis. In: 2021 5th international conference on computer, communication and signal processing (ICCCSP), May, IEEE, pp 250–255
2. Guan B, Liu F, Mizaian AH, Demehri S, Samsonov A, Guermazi A, Kijowski R (2022) Deep learning approach to predict pain progression in knee osteoarthritis. Skeletal Radiol 1–11
3. Tolpadi AA, Lee JJ, Pedoia V, Majumdar S (2020) Deep learning predicts total knee replacement from magnetic resonance images. Sci Rep 10(1):6371
4. Kim DH, Lee KJ, Choi D, Lee JI, Choi HG, Lee YS (2020) Can additional patient information improve the diagnostic performance of deep learning for the interpretation of knee osteoarthritis severity. J Clin Med 9(10):3341
5. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105
6. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324
7. Raisuddin AM, Nguyen HH, Tiulpin A (2022) Deep semi-supervised active learning for knee osteoarthritis severity grading. In: 2022 IEEE 19th international symposium on biomedical imaging (ISBI), March, IEEE, pp 1–5
8. Liu F, Zhang X, Zhang B, Zhang Y (2021) A novel deep learning framework for knee osteoarthritis diagnosis and progression prediction. IEEE Trans Biomed Eng 68(2):602–613
9. Zhang J, Li W, Jiang Y, Li J, Li Y, Li L, Zhang X (2021) Multi-task learning for knee osteoarthritis progression prediction using deep learning. IEEE J Biomed Health Inform 25(4):1314–1324
10. Hashmi I, Khan MA, Ashraf A, Khan S, Hussain M (2021) Automated diagnosis of knee osteoarthritis using deep learning techniques. IEEE J Translational Eng Health Med 9:1–9
11. Ma J, Wang Y, Li L, Zhou Y, Li Z, Li Y (2021) Diagnostic performance of a deep learning algorithm for knee osteoarthritis on radiographs compared with expert opinion. IEEE J Translational Eng Health Med 9:1–8



12. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556
13. Yu Y, Li S, Hao Y, Zhang Y, Yu H, Zheng Y (2021) Development and validation of a deep learning model for knee osteoarthritis severity assessment on radiographs. IEEE J Biomed Health Inform 25(4):1304–1313
14. Zagoruyko S, Komodakis N (2016) Wide residual networks. arXiv preprint arXiv:1605.07146
15. Zeiler MD, Fergus R (2014) Visualizing and understanding convolutional networks. In: European conference on computer vision, Springer, Cham, pp 818–833

Prediction of Harmful Algal Blooms Severity Using Machine Learning and Deep Learning Techniques N. Karthikeyan, M. Bhargav, S. Hari krishna, Y. Sai Madhav, and T. Sajana

1 Introduction Harmful algal blooms (HABs) pose a significant threat to both marine ecosystems and human health [1]. These blooms occur when specific types of algae rapidly spread in aquatic environments, releasing toxins that can be harmful to marine animals and humans [2]. Detecting and monitoring HABs is crucial to mitigating their negative impacts, but traditional prediction methods are often costly, time-consuming, and have a limited range of use. However, boosting algorithms such as XGBoost, CatBoost, and LightBoost show promise as an effective approach to detecting HABs using machine learning algorithms. One of the main advantages of using boosting algorithms is their ability to handle high-dimensional and imbalanced datasets. HABs are influenced by various environmental factors, and their occurrence is often rare, making it difficult to obtain a balanced dataset for training and testing the model. Boosting algorithms can efficiently extract relevant features from these complex datasets, allowing for accurate HAB prediction [3]. N. Karthikeyan · M. Bhargav · S. H. krishna · Y. S. Madhav Koneru Lakshmaiah Education Foundation, Vaddeswaram, A.P 522302, India e-mail: [email protected] M. Bhargav e-mail: [email protected] S. H. krishna e-mail: [email protected] Y. S. Madhav e-mail: [email protected] T. Sajana (B) Department of AI & DS, Koneru Lakshmaiah Education Foundation, Vaddeswaram, A.P 522302, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 I. J. Jacob et al. (eds.), Data Intelligence and Cognitive Informatics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-7962-2_34


Table 1 Severity levels of harmful algal blooms

Severity level | Density range (cells/mL)
1 | < 20,000
2 | 20,000 to < 100,000
3 | 100,000 to < 1,000,000
4 | 1,000,000 to < 10,000,000
5 | ≥ 10,000,000

To enhance the accuracy of HAB prediction, researchers have started integrating satellite imagery and metadata into their models. NASA's satellite data can provide information on water temperature [4], ocean currents, and chlorophyll levels, which are all indicators of HABs [5]. Additionally, metadata can include details on the water body's location, weather conditions, and time of day, which can improve the model's ability to detect HABs in various environments and conditions. The proposed method for detecting HABs using boosting algorithms, satellite imagery, and metadata achieves high accuracy and low false alarm rates compared to existing methods, enabling effective management of the harmful impacts of HABs on marine ecosystems and human health [6]. Severity levels for HABs are based on the density range of harmful algal cells in the water, with severity level 1 having a low cell density and severity level 5 having a high cell density; monitoring and early prediction of HABs are crucial for public health and environmental management (Table 1). There are many types of harmful algal blooms in the water ecosystem (Table 2). Table 2 contains information about the features of different kinds of harmful algal blooms, or HABs, including their colour, size, and toxicity. The table includes a variety of HABs, such as dinoflagellates, diatoms, cyanobacteria, and coccolithophores [7]. Each form of HAB is linked to health hazards, including skin irritation, respiratory issues, and, in severe cases, even death. Each type of HAB's geographic distribution and the environmental factors that support its growth may also be listed in the table [8].

2 Literature Survey The impact of harmful algal blooms (HABs), which have emerged as a major environmental concern worldwide, has been the subject of numerous studies in recent years. This review of the literature summarizes the findings of five pertinent publications on the application of data mining techniques for HAB prediction [9]. Zhang, Wang, and Cao present a deep learning-based approach for predicting algal blooms. Their study focuses on the relationship between phytoplankton density and environmental parameters. They construct a five-layered model to extract detailed relationships and compare it with a traditional back-propagation neural network.



Table 2 Different types of harmful algal blooms

S. No. | Harmful algal bloom type | Brief description
1 | Cyanobacteria | Blue-green algae that can produce toxins harmful to humans and animals
2 | Dinoflagellates | Can cause red tides and produce neurotoxins that can harm marine life and humans
3 | Diatoms | Can produce domoic acid which can lead to amnesic shellfish poisoning in humans
4 | Heterosigma akashiwo | Can produce ichthyotoxins that are harmful to fish and other marine life
5 | Pseudo-nitzschia | Can produce domoic acid and cause amnesic shellfish poisoning in humans
6 | Alexandrium | Can cause paralytic shellfish poisoning in humans and other animals
7 | Prymnesium parvum | Can produce toxins harmful to fish and other marine life
8 | Karenia brevis | Can cause red tides and produce brevetoxins that can harm marine life and humans
9 | Cochlodinium | Can produce potent toxins harmful to fish and other marine life
10 | Microcystis | Can produce toxins harmful to humans and animals, including liver damage and respiratory issues

Results demonstrate that the deep learning model outperforms the shallow neural network in terms of generalization and accuracy. However, the complexity of algal bloom dynamics and the need for interdisciplinary collaboration pose limitations [10]. Mohammed [11] et al. propose the use of a long short-term memory (LSTM) artificial neural network model for predicting chlorophyll content in water bodies and assessing the risk of algal blooms. The model considers various factors, such as temperature and nutrient content, as inputs and predicts chlorophyll concentration as an output. By adjusting the neural network’s structure, including adding neurons and hidden layers, the model’s performance can be improved [12]. Aranay and Atrey et al. address the challenge of detecting harmful algal blooms in lakes, specifically in the state of New York. Climate data is used as input, and a spatiotemporal weather data point training sample is created to capture relevant information about harmful and non-harmful bloom classes. The most informative samples are selected based on information entropy and genetic algorithms [13]. Baek et al. propose a deep learning approach to simulate blooms of Alexandrium catenella, a harmful algal species. They utilize classification and regression convolutional neural network (CNN) models to capture the initiation and density of the blooms. The study identifies GoogLeNet and ResNet 101 as the optimal CNN structures for classification and regression, respectively, achieving an accuracy of 96.8% and a root mean square error (RMSE) of 1.20 [log (cells L-1)] [14].



Hill et al. present a machine learning-based approach for detecting and predicting harmful algal blooms (HABs) using remote sensing data and propose a HAB detection system utilizing spatiotemporal data cube representations and various machine learning architectures [15] such as convolutional neural networks (CNNs), long short-term memory (LSTM) components, and random forests. The study focuses on detecting Karenia brevis algae (K. brevis) HAB events in Florida coastal waters and achieves a maximum detection accuracy of 91% with a Kappa coefficient of 0.81 [16], an order of magnitude larger than any previous study [17, 18]. Yu and Gao [19] used a machine learning-based approach to predict algal blooms by considering environmental factors. Employed a combination of techniques to obtain precise predictions of algal concentrations, as conventional statistical methods are inadequate for the complicated and nonlinear nature of algal bloom growth. The study tested the method on two real datasets and showed its effectiveness in short-term concentration prediction. A comparison study was conducted, identifying critical factors contributing to harmful algal bloom occurrence [20, 21].

3 Existing Methodology Zhang, Wang, and Cao utilize a deep learning model to predict algal blooms based on environmental parameters, outperforming traditional neural networks. Saddam Ahmed Saeed Mohammed et al. propose an LSTM-based model to predict chlorophyll content and assess the risk of algal blooms, enhancing its performance through neural network adjustments. Oguz M. Aranay and Pradeep K. Atrey address harmful algal bloom detection in New York using climate data and genetic algorithms for feature selection. Baek et al. simulate Alexandrium catenella blooms using CNN models, achieving high accuracy. Hill et al. develop a machine learning-based HAB detection system with remote sensing data, obtaining a 91% detection accuracy for Karenia brevis algae in Florida coastal waters [18, 22].

4 Proposed Methodology The proposed methodology uses the light gradient boosting machine (LightGBM). It is a free, open-source machine learning framework for distributed gradient boosting. It is based on the decision tree technique and is used to optimize the model's performance and conserve memory. While other boosting techniques grow the tree level-wise, LightGBM grows the tree leaf-wise, choosing to grow the leaf with the largest delta loss. If a leaf-wise tree is grown on a small dataset, the model might become more complex and exhibit overfitting. The proposed machine learning model can also be used for ranking, classification, and regression analysis. The flow chart of the proposed methodology is shown in Fig. 1.



Fig. 1 Proposed methodology—LightGBM flow chart

From Fig. 1, the dataset is split into training and testing data for identifying harmful algal blooms. The training data undergoes label encoding and standard scaling so that it is machine readable. The system is then subjected to base and meta classifiers, such as LightGBM, and the optimal approach is chosen after evaluating each classifier's accuracy and root mean squared error (RMSE) on the testing data. A comparative analysis of the proposed methodology is conducted against various algorithms.
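A minimal sketch of this preprocessing and splitting step is shown below, assuming the tabular features have already been assembled; the file name and column names are illustrative assumptions.

```python
# Label-encode categorical columns, scale numeric features, and split the data for training/testing.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

df = pd.read_csv("hab_features.csv")                         # hypothetical assembled feature table
df["region"] = LabelEncoder().fit_transform(df["region"])    # illustrative categorical column

X = df.drop(columns=["severity"])
y = df["severity"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)   # fit the scaler on training data only
X_test = scaler.transform(X_test)
```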



4.1 Dataset Description In this research, three datasets, namely the meta dataset, satellite images, and elevation data, are used. Metadata the metadata labels used in this study are based on hand-collected “insitu” samples that were then examined for cyanobacteria density. Every measurement consists of a different date and location combination. (Latitude and longitude) Water bodies all around the United State’s continental territory provide samples. The most common method for monitoring cyanobacteria in inland water bodies is “in-situ” sampling. Although accurate, in-situ sampling is time-consuming and challenging to carry out constantly. [4] The metadata can be shown in Table 3, and the metadata is distributed year-wise for training and testing as shown in Fig. 2. Table 3 Meta dataset attributes uid

index | uid | Latitude | Longitude | Date | Split
0 | aabm | 39.080319 | −86.430867 | 2018-05-14 | train
1 | aabn | 36.559700 | −121.510000 | 2016-08-31 | test
2 | aacd | 35.875083 | −78.878434 | 2020-11-19 | train
3 | aaee | 35.487000 | −79.062133 | 2016-08-24 | train
4 | aaff | 38.049471 | −99.827001 | 2019-07-23 | train

Fig. 2 Year-wise distribution of the metadata for training and testing



Fig. 3 Train labels for severity levels from metadata

From Table 3, the data fields refer to an individual uid, latitude, longitude, date, and split for train and test. The database has a unique identifier for each data point in the “Unique uid” column. The “latitude” column provides the coordinates of each data point, measured in degrees from the equator. The “longitude” column also provides coordinates, measured in degrees from the prime meridian. The “date” column contains the date associated with each data point, which could be the date of an event or the date of data collection. Consider the training labels for meta dataset as shown in Fig. 3. Figure 2 provides a year-wise distribution of metadata for training and testing, while Fig. 3 illustrates the train labels for severity levels. These combined insights facilitate accurate model development and evaluation, ensuring a balanced representation of temporal dynamics and severity levels in predicting harmful algal bloom severity. This integrated information enhances strategies for effective management and mitigation of these blooms.

5 Satellite Images Sentinel-2 and Landsat satellites were used in the creation of this design. Sentinel-2's visible-spectrum imagery has higher resolution than Landsat's (10 m compared to 30 m) [23]; however, prior to mid-2016, only Landsat imagery was available for data points. The Sentinel-2 and Landsat satellites independently revisit the same position every five to eight days, so imagery is typically available within a few days of the in-situ sample date. Images taken within a ten-day window still accurately depict the conditions at the sampling site [24].



6 Elevation Data The digital elevation model (DEM) dataset provides several parameters for analysing topography. The made parameter identifies the highest elevation within a 1000-m radius, aiding in locating the highest point in a region. The dife parameter calculates the difference between the highest and lowest elevations within a 1000-m radius, providing insights into terrain characteristics. The elevation parameter indicates the actual elevation at a specific location, essential for determining point height on the Earth’s surface. By combining these parameters, valuable information about a region’s topography can be obtained, benefiting applications like surveying, mapping, and infrastructure planning. The utilized metadata contains unique IDs, longitude, latitude, and dates. The dataset spans 2013 and is split into training and testing sets. Satellite images were obtained to extract RGB values for bounding box analysis, along with the date and time in 2022. The dataset also incorporates elevation data, including parameters such as elevation, minimum, maximum, difference, average, and standard deviation, along with unique IDs, longitude, latitude, and date and time of 2022. By providing these specific details, the clarity and transparency of the dataset used for predicting the severity of harmful algal blooms are enhanced.
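A small NumPy sketch of how these neighbourhood parameters could be derived from a DEM raster is given below; the array size, pixel size, and function name are illustrative assumptions, not the dataset's actual processing code.

```python
# Compute the elevation at a point and the max/min difference within a 1000 m radius of it.
import numpy as np

def dem_stats(dem, row, col, pixel_size_m=30.0, radius_m=1000.0):
    """dem: 2-D elevation array; (row, col): sample location in pixel coordinates."""
    r = int(radius_m // pixel_size_m)
    rows, cols = np.ogrid[:dem.shape[0], :dem.shape[1]]
    mask = (rows - row) ** 2 + (cols - col) ** 2 <= r ** 2   # circular neighbourhood
    window = dem[mask]
    return {
        "elevation": float(dem[row, col]),                    # elevation at the sample point
        "made": float(window.max()),                          # highest elevation within the radius
        "dife": float(window.max() - window.min()),           # elevation range within the radius
    }

# Example with a synthetic DEM tile
dem = np.random.default_rng(0).uniform(200, 400, size=(500, 500))
print(dem_stats(dem, 250, 250))
```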

6.1 Data Extraction Data extraction is a collection step, and EDA is a critical step in the research methodology, where relevant data from APIs and images is collected. The data is collected from a reliable source and extracted in a structured format that can be used for machine learning model training. The data is checked for accuracy and completeness to ensure the model’s effectiveness in making predictions Fig. 3. To simplify parameter extraction for predicting harmful algal bloom severity, various approaches can be employed. Automated feature selection and dimensionality reduction techniques reduce dataset complexity, while regularization eliminates non-contributing parameters. Grid search and hyperparameter optimization automate model configuration, and leveraging domain knowledge provides valuable insights. Satellite image and elevation data extraction can be achieved through Microsoft’s Planetary Computer and the Copernicus DEM dataset. The Planetary Computer STAC API retrieves Sentinel-2 L2A data, rendering the satellite image for the area of interest. The Copernicus DEM GLO-30, a 30-m-resolution digital surface model, is used. Combining these methodologies establishes a robust framework for accurately predicting harmful algal blooms, aiding in effective management.
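A minimal sketch of querying the Planetary Computer STAC API for Sentinel-2 L2A scenes around a sample point is shown below; it assumes recent versions of the pystac-client and planetary-computer packages, and the bounding-box size and date window are illustrative choices rather than the authors' exact settings.

```python
# Query the Planetary Computer STAC API for Sentinel-2 L2A items near a sample point.
import planetary_computer
import pystac_client

catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)

lat, lon = 39.080319, -86.430867           # example point from the metadata
d = 0.05                                    # roughly 5 km half-width bounding box (illustrative)
search = catalog.search(
    collections=["sentinel-2-l2a"],
    bbox=[lon - d, lat - d, lon + d, lat + d],
    datetime="2018-05-04/2018-05-14",       # ten-day window ending at the sample date
)
items = list(search.items())
print(f"found {len(items)} Sentinel-2 scenes")
```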



6.2 Feature Engineering Feature engineering involves processing and selecting relevant features related to algal bloom growth. This phase requires a thorough understanding of the problem domain to select features that have a significant impact on the model’s performance. Feature selection techniques such as correlation analysis, principal component analysis, and mutual information are used to select the most relevant features for the model.

6.3 Image Processing Image processing algorithms, like segmentation, object detection, and feature extraction, are used to extract features such as colour, texture, and shape from the images. These features are combined with the data extraction features to enhance the model's accuracy. Each image is cropped to a small area around the sample location and separated into its RGB bands, and the average and median of each band are calculated, as shown in Table 4; this produces six features per image. The bounding box refers to the boundary coordinates surrounding an image and serves as a collision box, recognition point, and object detection indicator. SCL and water volume data were also calculated. Figures 4 and 5 show image features extracted using the bounding box and Sentinel-2, and Table 4 presents the corresponding RGB values. The bounding box aids in object identification and location determination, while Sentinel-2 provides the geographic information. By combining these features, the essential RGB values can be extracted for improved analysis of harmful algal blooms. Table 4 includes uids such as "umac," "egax," "havx," "loaq," and "ttsk," which aid in studying the colour distribution of the images. The RGB average and median values correspond to different visual characteristics, and the SCL band is used to calculate water volume. "Prop_lake_1000" and "prop_lake_2500" estimate the water area at distances of 1000 and 2500 m, respectively, aiding in identifying water bodies and assessing their extent and distribution. Comparing the new and original feature values can evaluate the impact on model performance.
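A small sketch of the per-channel average and median computation on a cropped image patch is shown below; the file name and crop size are illustrative assumptions.

```python
# Crop a patch around the sample location and compute per-channel RGB averages and medians.
import numpy as np
from PIL import Image

img = np.asarray(Image.open("scene_rgb.png").convert("RGB"))   # hypothetical rendered scene
cy, cx, half = img.shape[0] // 2, img.shape[1] // 2, 50        # 100x100 px crop (assumed size)
patch = img[cy - half:cy + half, cx - half:cx + half]

features = {}
for i, band in enumerate(["red", "green", "blue"]):
    features[f"{band}_average"] = float(patch[..., i].mean())
    features[f"{band}_median"] = float(np.median(patch[..., i]))
print(features)   # six features per image, as in Table 4
```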

6.4 Proposed Model Building LightGBM, a distributed gradient boosting machine learning framework, is used. It supports tasks such as ranking, classification, and regression, built on decision tree methods. The first step involves splitting the data into training and validation sets. The LGBM model is then fitted to the training

Table 4 Features per image: RGB values from image pre-processing. For each sampled image (uids such as umac, egox, havx, laoq, ttsk, …, zzpn, zzrv, zzsx, zzvv, zzzi), the table lists the columns index, uid, red_average, green_average, blue_average, red_median, green_median, blue_median, water_vol, density, and season.




Fig. 4 By utilizing bounding boxes, image features can be extracted.

Fig. 5 By using Sentinel-2, geographic bounding boxes can be obtained

set and scored on the validation set to evaluate performance. Feature engineering, including the addition of the “SCL” band for water volume calculation, can enhance model performance. Input variables encompass region, cluster, image type, date, latitude, longitude, elevation, and dife (difference between highest and lowest elevations



Table 5 Validation part of the training data

index | uid | Region | Severity | pred
14,052 | vgfa | midwest | 4 | 4
9387 | ohmd | northeast | 4 | 3
15,474 | xozz | south | 2 | 2
10,241 | pnof | midwest | 3 | 3
14,062 | vjqh | West | 2 | 3

within 1000 m radius). Other satellite imagery-related variables may be included as well.
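A minimal sketch of fitting and scoring the LightGBM classifier on the split prepared in the earlier preprocessing sketch is shown below; the n_estimators and max_depth values follow the settings reported in the experimental results, while everything else is an illustrative default.

```python
# Fit a LightGBM classifier on the training split and score it on the held-out split.
import numpy as np
from lightgbm import LGBMClassifier
from sklearn.metrics import accuracy_score, mean_squared_error

model = LGBMClassifier(n_estimators=320, max_depth=10, random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))   # severity grades treated as ordinal values
print(f"accuracy = {accuracy:.3f}, RMSE = {rmse:.4f}")
```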

6.5 Evaluate Model Performance The performance of the model is evaluated by testing it on a validation dataset. The validation dataset is a subset of the entire dataset, and it is used to evaluate the model’s accuracy, precision, recall, and F1-score. The evaluation metrics provide insights into the model’s performance and help improve the model’s accuracy. Table 5 depicts the regions affected by harmful algal blooms (HABs) and their corresponding severity levels, comparing actual values with predictions. It provides valuable insights into the distribution and potential impacts of HABs in different areas, aiding in informed decision-making for mitigation and prevention strategies.

7 Experimental Results The model predicts harmful algal bloom severity by considering environmental factors like temperature, pH, and nutrient content. Performance evaluation includes RMSE, accuracy, precision, recall, and F1-score. This study provides insights into key factors influencing bloom growth, aiding control strategy development. The dataset is split into training and test sets, with thorough data processing and analysis. Regression analysis using the Duan Smearing method produces “pred,” “predDens,” and “logDens” predictions, presented in Table 6. Fitting linear regions is vital for modelling data with approximately linear relationships. While algorithms like LightGBM, CatBoost, XGBoost, and CNN are not specifically designed for this purpose, a hybrid approach can address it. Incorporating linear regression explicitly models linear relationships. By combining linear regression with these algorithms, both linear and nonlinear relationships are captured. This hybrid approach maximizes the strengths of each algorithm, enabling a comprehensive analysis that accommodates linear regions while leveraging advanced capabilities.


455

Table 6 Different types of predictions in the Duan smearing method

|   | uid  | pred | predDens     | logDens   |
| 0 | aabn | 4    | 1.438460e+07 | 15.472339 |
| 1 | aair | 4    | 3.597007e+06 | 74.086283 |
| 2 | aajw | 2    | 3.998725e+04 | 9.586837  |
| 3 | aalr | 3    | 7.303734e+06 | 13.070954 |
| 4 | aalw | 4    | 1.954046e+06 | 13.476083 |

By employing these various prediction methods, a better understanding of the model’s performance can be gained, enabling the selection of the appropriate prediction approach for diverse applications. The study proposed a predictive modelling technique using the LightGBM (LGBM) algorithm. RMSE scores of LGBM were compared with XGBoost, CatBoost, and a CNN. LGBM achieved the lowest RMSE, outperforming the other models. Classification performance was evaluated using metrics like RMSE, accuracy, recall, precision, and F1-score, comparing against benchmark algorithms (XGBoost, LightGBM, CatBoost, and CNN) in Table 7. The comparative study of the proposed methodology (LightGBM) against XGBoost, CatBoost, and CNN is shown in Figs. 6, 7, and 8. The study compared the performance of four machine learning algorithms for predictive modelling: XGBoost, CatBoost, LightGBM, and a convolutional neural network (CNN). LightGBM had the lowest root mean squared error (RMSE) value of 0.8736, followed by CatBoost and XGBoost with RMSEs of 0.9486 and 0.9492, respectively. The CNN had an RMSE of 1.0144, which was higher than the other three models. LightGBM was found to be the most accurate model for the dataset and prediction task, with the lowest RMSE value at n_estimators: 320 and max_depth: 10. Overall, all four models performed well. The accuracy scores of LGBM were compared to those of the other machine learning algorithms, including XGBoost, CatBoost, and the CNN, to evaluate its effectiveness. The models were trained and tested on the dataset, and the results revealed that LGBM achieved the highest accuracy among the compared models. Based on Table 7, LightGBM has the highest accuracy, with 94% of predictions being correct. XGBoost and CatBoost also have high accuracy, with 91% and 92% of predictions being correct, respectively.

Table 7 Comparing scores across different algorithms: an analysis

| S. No. | Algorithm | Accuracy (%) | Precision | Recall | F1-score | RMSE   |
| 1      | XGBoost   | 91           | 0.89      | 0.87   | 0.89     | 0.9492 |
| 2      | LightGBM  | 94           | 0.93      | 0.91   | 0.92     | 0.8736 |
| 3      | CatBoost  | 92           | 0.91      | 0.90   | 0.91     | 0.9486 |
| 4      | CNN       | 88           | 0.84      | 0.85   | 0.87     | 1.0144 |
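For concreteness, here is a minimal sketch of the winning configuration reported above (LightGBM with n_estimators=320 and max_depth=10), assuming pre-split training and validation arrays; this is an illustrative reconstruction, not the authors' code.

```python
from lightgbm import LGBMRegressor
from sklearn.metrics import mean_squared_error

# assumed: X_train, y_train, X_val, y_val prepared as described in Sect. 6
model = LGBMRegressor(n_estimators=320, max_depth=10, random_state=42)
model.fit(X_train, y_train)

rmse = mean_squared_error(y_val, model.predict(X_val), squared=False)
print(f"Validation RMSE: {rmse:.4f}")  # the paper reports 0.8736 for LightGBM
```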


Fig. 6 Visualizing the comparison of RMSE scores across different algorithms using a bar graph

Fig. 7 Visualizing the comparison of accuracy scores across different algorithms using a bar graph

CNN has the lowest accuracy, with only 88% of predictions being correct. LGBM’s precision, recall, and F1-scores were also compared with those of other machine learning algorithms such as XGBoost and CatBoost, as well as a convolutional neural network (CNN), as shown in Fig. 8. Based on Table 7, after evaluating the performance of the various models, it was found that LGBM outperformed the other models in recall, precision, and F1-score for predicting harmful algal blooms, with the highest recall score of 0.91, precision score of 0.93, and F1-score of 0.92. CatBoost and XGBoost followed closely with slightly lower scores, while CNN had the lowest scores overall. These results indicate that LGBM is the most effective model, with CatBoost and XGBoost also performing well.


Fig. 8 Visualizing the comparison of precision, recall, and F1-scores across different algorithms using a bar graph

8 Conclusion
The proposed approach in this study aims to predict the severity of algal blooms by utilizing metadata and satellite imagery datasets. Three base classifiers were employed, and the LGBM model demonstrated the lowest average RMSE value of 0.873. Achieving accurate predictions for the severity of harmful algal blooms relies on employing effective strategies. Currently, the LightGBM algorithm demonstrates promising results with an accuracy of approximately 94%. However, it is crucial to acknowledge that accuracy may vary due to factors like data quality, feature selection, and algorithm tuning. To enhance accuracy, various techniques can be considered. Data augmentation, feature engineering, ensemble modelling, and hyperparameter optimization are valuable approaches to improving predictions. These methods uncover additional patterns and optimize models for greater accuracy in predicting the severity of harmful algal blooms.

9 Future Works
Implementing strategies like acquiring diverse datasets, applying feature engineering techniques, exploring algorithms, optimizing hyperparameters, and addressing class imbalance improves the prediction of harmful algal blooms. Regularization prevents overfitting, while careful parameter tuning is essential. Cross-validation and evaluation assess model performance. Future research can enhance algal bloom prediction with deep learning and computer vision for satellite imagery interpretation. Fuzzy control handles data


uncertainty, and new algorithms improve predictability. Big data technologies manage data volume, and cloud computing provides real-time access to data and reports. The proposed methodology can be extended to forecast other chronic disorders using equivalent models. These advancements enable more accurate predictions in algal bloom monitoring and chronic disorder forecasting. Future work should focus on implementing and integrating these technologies to advance the prediction and management of harmful algal blooms. This leads to effective environmental monitoring and decision-making processes.




Comparative Analysis of Classifier Algorithms Based on Sentimental Reviews Santosh Kumar, Swastik Kashyap, and Rakhshan Khalid

1 Introduction
In today’s world, technology is advancing at a significant pace, and more and more things are becoming automated and digital. E-commerce websites have become the major hub for buying and selling products, making it effortless to buy anything without even stepping out of the house. The ratings and reviews for a product also help the buyer investigate its quality and performance. Customer reviews therefore play an important role for both buyers and sellers in e-commerce, and companies usually ask for feedback about their products and/or services. A huge volume of subjective textual information is generated by users (of social networks like Facebook, e-commerce platforms, blogs, and forums) that contains users’ opinions, thoughts, and sentiments. Users also post their views on social media, as many companies sell their products through social media. Due to the sheer amount of data, it is impossible to read and analyze the subjective information manually. Even after manual summarization of reviews for a product, problems remain, such as whether the overall suggestion still meets the current situation’s demand, or whether, with new advancements or updates in the market, the product is still worth buying. Thus, data mining and natural language processing techniques play an essential role in extracting and analyzing opinions and sentiments from the information. Sentiment Analysis (SA) then evaluates a product review or a blog post and labels it as a text expressing a positive or negative opinion about the


product, policy, or person. Recently, deep learning methods have also been applied to product review sentiment classification. The feedback on e-business websites or social platforms comes from numerous domains, and when the volume of feedback increases continuously, almost all previous algorithms need re-evaluation, which wastes time and computation. Sentiment analysis is helpful for immediately obtaining insights from large amounts of text data. Its applications are not restricted to user review analysis (e.g., in e-commerce) and monitoring the emotion of social media posts; it can also be used in stock market trading. Sentiment algorithms can discover specific companies that display positive sentiment in newspaper columns. This presents an outstanding financial opportunity to buy more of the company’s stock and gives traders the data to make decisions before the market reacts. There are many other applications of sentiment analysis (SA), as given in Fig. 1. In this paper, various classification algorithms have been applied to analyze the sentiments implicit in text reviews and to classify the reviews as positive, negative, or neutral. Finally, the classification algorithms giving the best accuracy are identified, so that the best algorithm can be used directly for text-based sentiment analysis.
Fig. 1 Applications of SA


2 Related Work
In [1], the authors tried to find algorithms that monitor real-time activities using mobile data and, on this basis, compared classifier algorithms. In [2], the authors identified the best classifier for categorizing the sentiments present in tweets. They considered four main classifiers, namely SVM, Random Forest, Naïve Bayes, and Logistic Regression, and proposed an ensemble classifier that combines the base learning classifiers into a single classifier, improving performance and accuracy. In [3], the authors used the Apache Spark data processing system for a new evaluation of sentiment analysis on a large dataset of online customer reviews. Using Apache Spark’s machine learning library (MLlib), the following classifiers were applied: Naïve Bayes, logistic regression, and SVM. The evaluation shows that the support vector machine classifier outperforms the other classifiers. In [4], the authors evaluated the effect of TF-IDF word-level and N-gram features for sentiment analysis on the SS-Tweet dataset. The results show that the performance of sentiment analysis increased by 3–4% when applying word-level TF-IDF rather than N-grams. The evaluation was carried out with the following classifiers: SVM, Random Forest, Naïve Bayes, Decision Tree, Logistic Regression, and K-Nearest Neighbor, observing the accuracy, precision, F-score, and recall performance parameters. In [5], the authors examined the feasibility of automatic SA and star rankings for evaluating the general tone of a product or service review. The evaluation of 900 reviews shows that SA is more successful in capturing the underlying tone of the analyzed content and can be used as an alternative to star ratings. In [6], the authors presented an adversarial learning technique for training sentiment word embeddings using sentiment information. The results show that the presented method outperforms existing sentiment embedding training models. In [7], the authors tried to discover a flexible classifier whose performance increases with the volume of the dataset. Different classifiers were evaluated using TF-IDF and CV. The results show that SVM and LR achieve steady and improved performance as the dataset grows when using CV, while MNB performs best with TF-IDF. In [8], the authors proposed a common framework for SA of unstructured information in combination with structured information sources to enhance decision making. The framework tries to address two major drawbacks of previous frameworks: the absence of a framework that is versatile enough to be used easily by analysts yet comprehensive enough to be applied across numerous areas, and the need to evaluate model outcomes with insights that improve decision efficiency in response to sentiment data. In [9], inspired by recent work on artificial neural networks, the authors present lexicon-integrated models merging LSTM/BiLSTM and CNN. Evaluation on the Stanford Sentiment Treebank datasets shows that the presented technique outperforms numerous standard methods. In [10], the authors proposed a continuous naïve Bayes learning framework to solve two major shortcomings of previous frameworks that have limited review sentiment classification in practical applications: (1) to evaluate large-size reviews


with computational efficiency, and (2) the capacity to learn efficiently from growing feedback across numerous domains. The experimental outcomes on Amazon product and movie review datasets show that the proposed model applies what it has learned in past domains to evaluate information in new domains and has improved efficiency in response to feedback that is continuously changing and obtained from diverse fields. In [11], the authors put forward a TSSC model to categorize feedback documents in two stages: (1) a coarse classification stage, in which the algorithm mainly uses consumer–product collaborative data to predict the sentiment inclination of the feedback document without knowing the feedback content, and (2) a fine classification stage, in which the algorithm uses the textual content of the feedback for further analysis based on the sentiment inclination obtained from the coarse stage. Finally, the sentiment is obtained by merging the outputs of the fine and coarse classification. The outcome shows that the TSSC model remarkably exceeds most alternative algorithms (like NSC + UPA or Trigram) on the Yelp and IMDB datasets in terms of classification accuracy. The model also shows low time complexity and strong interpretability in comparison to the HUAPA model. In [12], the authors researched the influence of online feedback on business, showing the influence of textual and numerical reviews on product sales performance. The research results help online vendors concentrate on the important features to increase transactions. In [13], the authors proposed a technique to rank products based on online feedback by using sentiment analysis methods and fuzzy set theory. A procedure is established to identify neutral, negative, and positive sentiment inclination toward different items with respect to the item features in each piece of feedback; based on this, a decision-support system can be produced to help users make buying choices more efficiently. In [14], the authors presented a feature-specific and time-specific method for sentiment analysis. The model uses the review of the product, the time stamps, and a feature dictionary; to increase the significance of the feedback, an aging factor is used. The feature-level details are important when the user is not interested in the product as a whole but analyzes only some of its characteristics. The different approaches proposed in the literature have not explored the comparative analysis of classifiers of various categories, especially on the Weka platform, across several performance parameters on text data to find the sentiments. The classifiers in the proposed approach have also explored the available and applicable filters to get the required attributes for sentiment analysis. With the help of fine-grained sentiment analysis, which includes polarity of sentiments as very positive, positive, neutral, negative, and very negative, emotions of people like happiness, dissatisfaction, irritation, annoyance, sadness, etc. can be detected.


3 Proposed Methodology
In SA, the model links a specific input (i.e., a text) to the corresponding output label during the training process. The input is transformed into a feature vector with the aid of the feature extractor. These feature vectors, paired with the provided tags, are sent to the machine learning algorithm to create a model. During the prediction procedure, unseen text inputs are transformed into feature vectors by the feature extractor, and the model then produces the anticipated labels, such as positive, negative, or neutral. In the second phase, ML text classifiers perform text feature extraction using the traditional methodology of bag of words or bag of n-grams with their frequencies. Word vectors, or word embeddings, are the basis for a newer method of extracting features. The procedure is given in Fig. 2. The steps involved in the analysis of the classifier algorithms are given below; an analogous scripted pipeline is sketched after the list.
Fig. 2 Working of SA
Step-1: Install the required packages (AffectiveTweets 1.0.2 for text classification)
Step-2: Install the packages for training linear models (AffectiveTweets, LibLinear, LibSVM)
Step-3: Load the sentiment data set of product reviews having positive, negative, and neutral classes
Step-4: Apply the filter(s) to extract attributes from tweets useful for sentiments (Filters → unsupervised → attributes → TweetToSparseFeatureVector → TweetToLexiconFeatureVector) [use different filters to extract features from lexicons; use MultiFilter to apply filters next to each other]
Step-5: Do the required settings for the filter
Step-6: Remove the attributes which are irrelevant in context [our classifiers deal with numeric values]
Step-7: Train a classifier using cross-validation (10) or percentage split (66)
Step-8: Use a nominal class
Step-9: Use percentage split instead of cross-validation for faster analysis
Step-10: Select a classifier
Step-11: Get and analyze the results.
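The authors' workflow uses the Weka GUI with the AffectiveTweets filters; as a rough, analogous sketch only (not the Weka setup itself), the same train/evaluate flow can be expressed with scikit-learn, assuming a hypothetical reviews.csv file with "content" and "class" columns.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.pipeline import make_pipeline

# hypothetical dataset: 5000 reviews labeled positive/neutral/negative
df = pd.read_csv("reviews.csv")
X_train, X_test, y_train, y_test = train_test_split(
    df["content"], df["class"], test_size=0.34, stratify=df["class"], random_state=1)

# sparse lexical features plus a linear classifier, loosely mirroring a
# sparse-feature filter followed by a logistic model in Weka
pipeline = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=2),
                         LogisticRegression(max_iter=1000))
pipeline.fit(X_train, y_train)
print(classification_report(y_test, pipeline.predict(X_test)))
```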

4 Experimental Study 4.1 Data Set and Features Description The data set consists of 5000 records of product reviews in.arff file format containing two attributes reviews and class of reviews. Statistics are given in Table 1. Class is either positive, negative, or neutral as shown in Fig. 3. Table 1 Statistics of data set

Fig. 3 Data set class categories

S. No.

Label

Count

Weight

1

Positive

1748

1748.0

2

Neutral

2496

2496.0

3

Negative

756

756.0


Number of Attributes: 2 (Content, Class), Number of Instances: 5000.

4.2 Feature Extraction
Generally, product reviews implicate the customers’ sentiments. The sentiments are found from the words used in the reviews, which may carry positive, negative, or neutral sentiment. These words describe the product quality and features, which may be useful for other new customers, so there is a need to identify the best classifier that can categorize these reviews. Sometimes an apparently positive word may carry negative sentiment and, vice versa, a negative word may carry positive sentiment. After applying the filter TweetToLexiconFeatureVector and removing the string features, the number of attributes is 16,147 from 5000 instances. After applying the filter TweetToSparseFeatureVector and removing the string features, the number of attributes is 16,100 from 5000 instances.

4.3 Classifier Algorithms There are several classifier algorithms available in weka in different categories. And most of the algorithms have been explored but the algorithms which are not giving better accuracy have not been considered. Some of the algorithms from different categories of classifiers which have been considered are listed in Table 2. Table 2 List of algorithms tested according to category Classifier category

Algorithms

Abbreviation

Bayes

Bayes Network Classifier

BNC

Naive Bayes Classifier

NBC

Function

Lazy Meta Trees

NaiveBayesMultinomialText

NBMNT

Logistic(Fun.)

LF

SimpleLogistic

SL

SMO

SMO

Lazy IBk

LI

Lazy LBL

LLBL

meta Bagging

MB

meta.MultiClassClassifier

MMCC

trees.RandomForest

TRF


4.3.1


Bayes Network Classifier

It is based on Bayes’ theorem, which relates a particular feature of a class to any other feature. It is a Probabilistic Graphical Model (PGM) that represents uncertainties using probabilities [15]. It uses a DAG [16] for modeling uncertainties and predicts discrete class features.

4.3.2

Naive Bayes Classifier

It is based on Bayes’ theorem and classifies instances based on the probabilities of the lexicons in each class of the class variable. It is one of the most practical learning Bayesian classifiers: given several feature vectors, it treats each attribute as independent of the others within a class. It evaluates the MAP hypothesis or ML hypothesis. Because of this independence assumption, training of the model is easy and fast [16].

4.3.3

Naïve-Bayes-Multinomial Text

The newer version of Naïve Bayes developed for text records is Multinomial Naïve Bayes. Multinomial models are built explicitly from word counts, whereas the presence or absence of particular words is used to model the document in the standard Naïve Bayes models [16]. The term multinomial indicates that each p(fi | c) is a multinomial distribution, different from other distributions [15].
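As a brief, hedged illustration of the word-count idea (scikit-learn rather than the Weka classifier itself), a multinomial Naïve Bayes text model might look like this; the tiny corpus is invented for the example.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# toy corpus invented for illustration
docs = ["great product, works well",
        "poor quality, stopped working",
        "works as described, nothing special"]
labels = ["positive", "negative", "neutral"]

vec = CountVectorizer()
counts = vec.fit_transform(docs)           # explicit word-count features
clf = MultinomialNB().fit(counts, labels)  # each p(f_i | c) is multinomial

print(clf.predict(vec.transform(["great quality, works well"])))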

4.3.4

Logistic (Fun.)

A logistic function is a sigmoid curve (Fig. 4) with equation f(x) = L / (1 + e^(−k(x − x0))), where x0 is the x value of the sigmoid’s midpoint, L is the curve’s maximum value, and k is the logistic growth rate. The logistic function is applied in a range of fields such as biology, chemistry, demography, economics, probability, and artificial neural networks.
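A minimal numerical sketch of this logistic function (the parameter values are chosen arbitrarily for illustration):

```python
import numpy as np

def logistic(x, L=1.0, k=1.0, x0=0.0):
    """General logistic (sigmoid) curve: L / (1 + exp(-k * (x - x0)))."""
    return L / (1.0 + np.exp(-k * (x - x0)))

x = np.linspace(-6, 6, 5)
print(logistic(x))   # rises smoothly from ~0 to ~L around the midpoint x0
```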

4.3.5

Simple Logistic

It is an algorithm used both in traditional statistics and in machine learning. Logistic regression predicts a binary output, such as True or False or 0 or 1, instead of a continuous quantity such as size. Linearity, independence, and the absence of outliers are some of the assumptions for simple logistic regression. Applications of logistic regression include geographic image processing, image segmentation and categorization, and digit recognition [17–19].


Fig. 4 Sigmoid curve

4.3.6

SMO

SMO (sequential minimal optimization) algorithm is used to train a SV classifier, based on Gaussian and polynomial kernels. The idea behind SMO is to optimize two variables at a time rather than n variables.

4.3.7

Lazy lBk

The Lazy IBk [20] algorithm uses a k-nearest-neighbor classifier, in which an instance is classified by a majority vote of its neighbors and is assigned to the class most common among its k nearest neighbors. It is one of the simplest algorithms in machine learning [15].

4.3.8

Lazy LBL

LBL is an algorithm for factorizing a sparse indefinite linear system of equations. LBL solves Hx = q for x, where q is a dense vector and H is a sparse symmetric indefinite matrix [15].

4.3.9

Meta Bagging

Bagging [15] is a machine learning ensemble meta-algorithm aimed at enhancing the stability and accuracy of algorithms used for regression and classification. It also reduces variance and helps to avoid over-fitting. It is based on bootstrap aggregation: bootstrap samples are drawn from the training data, and for regression the average of the predicted values is taken.
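For illustration only (scikit-learn rather than the Weka meta.Bagging implementation), bagging a base classifier over bootstrap samples can be sketched as:

```python
from sklearn.ensemble import BaggingClassifier

# 10 base learners (decision trees by default), each trained on a bootstrap
# resample of the training data; class predictions are combined by voting
# (for regression, the predicted values would be averaged instead)
bagger = BaggingClassifier(n_estimators=10, bootstrap=True, random_state=0)
# bagger.fit(X_train, y_train); bagger.predict(X_test)   # X_train/y_train assumed
```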


4.3.10


Meta–MultiClassClassifier

This meta-classifier handles multi-class problems using two-class base classifiers; error-correcting output codes can be applied with this algorithm to obtain further accuracy when instances belong to more than two classes [15].

4.3.11

TreesRandomForest

It is an ensemble learning approach for regression, classification, and other tasks that works by generating a collection of decision trees at training time [15].

5 Results and Discussion
The results given here have been obtained on the basis of training and testing performed on Weka. Classifiers from different categories have been trained and tested; the algorithms considered for classification are given in Table 3. Besides these, many other classifiers have also been tested but, due to poor accuracy, they have been ignored. After applying the different classifiers to the sentimental reviews, it can be seen that the logistic and random forest classifiers achieve the highest accuracy in classifying the instances correctly, at 68.71%. The Kappa statistic is used for inter-rater reliability testing; its value ranges from −1 to 1, where 0 indicates the amount of agreement expected from random chance and 1 indicates perfect agreement between raters. The highest Kappa value, 0.466, is achieved by the logistic classifier. The root mean squared error is minimum in the case of the logistic and simple logistic classifiers, with a value of approximately 0.377, while the mean absolute error is minimum in the case of the naïve Bayes classifier. The results are also shown in graphical form in Figs. 5, 6, 7, and 8.

Table 3 Statistical summary

| Classifier | Correctly classified instances | Incorrectly classified instances | Kappa statistic | Mean absolute error | Root mean squared error | Relative absolute error | Root relative squared error | Total instances |
| Bayes Network Classifier  | 62.41% | 37.59% | 0.3987 | 0.2562 | 0.4632 | 63.38%  | 102.91% | 1700 |
| Naive Bayes Classifier    | 63.53% | 36.47% | 0.3988 | 0.2462 | 0.4694 | 60.90%  | 104.29% | 1700 |
| NaiveBayesMultinomialText | 49.12% | 50.88% | 0      | 0.4042 | 0.4501 | 100%    | 100%    | 1700 |
| Logistic (Fun.)           | 68.71% | 31.29% | 0.466  | 0.2711 | 0.3776 | 67.08%  | 83.89%  | 1700 |
| SimpleLogistic            | 68.18% | 31.82% | 0.4578 | 0.2727 | 0.3765 | 67.46%  | 83.65%  | 1700 |
| SMO                       | 68.59% | 31.41% | 0.4639 | 0.2774 | 0.4022 | 68.63%  | 89.36%  | 1700 |
| Lazy IBk                  | 58.41% | 41.59% | 0.3071 | 0.3101 | 0.5263 | 76.71%  | 116.93% | 1700 |
| Lazy LWL                  | 62.94% | 37.06% | 0.3265 | 0.3438 | 0.4183 | 85.06%  | 92.92%  | 1700 |
| meta Bagging              | 67.53% | 32.47% | 0.4542 | 0.2777 | 0.3814 | 68.71%  | 84.74%  | 1700 |
| meta.MultiClassClassifier | 68.65% | 31.35% | 0.4646 | 0.2781 | 0.378  | 68.79%  | 83.98%  | 1700 |
| trees.RandomForest        | 68.71% | 31.29% | 0.4647 | 0.2925 | 0.3775 | 72.36%  | 83.86%  | 1700 |

Fig. 5 Classifiers versus correctly classified instances
Fig. 6 Classifiers versus Kappa statistic
• TP Rate: the true positive rate, i.e., the proportion of instances classified correctly into a given class. The true positive rate (or sensitivity) is the probability that an actual positive will test positive, whereas the false positive rate reflects how often the test errs. TPR = TP / (TP + FN) = Recall [best = 1, worst = 0]
• FP Rate: the percentage of false positives, i.e., instances classified incorrectly into a given class. FPR = FP / (FP + TN) [best = 0, worst = 1]


Fig. 7 Classifiers versus mean absolute error

Fig. 8 Classifiers versus relative absolute error

• Precision: the fraction of instances assigned to a class that truly belong to that class. Precision = TP / (TP + FP) [best = 1, worst = 0]
• Recall: the fraction of instances truly belonging to a class that are assigned to it; this is the same as the TP rate. Recall = TP / (TP + FN) [best = 1, worst = 0]
• F-Measure: the combined measure of precision and recall. F-measure = 2 × (Precision × Recall) / (Precision + Recall) [best = 1, worst = 0]
• MCC: Matthews’s correlation coefficient is used to assess the quality of binary classifications where only two classes exist. It considers true and false positives and negatives and gives a balanced estimate even when the classes are of different sizes. MCC = (TP × TN − FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN)) [best = 1, worst = −1]
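As a small, self-contained sketch (not tied to Weka's output), these per-class metrics can be computed directly from a confusion matrix; the example matrix below is the Bayes Network Classifier one from Table 6.

```python
import numpy as np

# rows = actual class, columns = predicted class (positive, neutral, negative)
cm = np.array([[360, 176,  74],
               [166, 540, 129],
               [ 30,  64, 161]])

for i, name in enumerate(["positive", "neutral", "negative"]):
    tp = cm[i, i]
    fp = cm[:, i].sum() - tp      # predicted as class i but actually another class
    fn = cm[i, :].sum() - tp      # actually class i but predicted otherwise
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)       # identical to the TP rate
    f1 = 2 * precision * recall / (precision + recall)
    print(f"{name}: precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```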

• ROC area: gives a general idea of the performance of a classifier.
• PRC (precision–recall) area: gives more detailed results than ROC, even for binary classification with polarity (imbalance) in the datasets.
The performance of the different classifiers is given in Table 4, which shows that SL, LF, and RF give the best classification accuracy. The ROC plot in Fig. 9 shows the performance of the classifiers in general; from it, it can be seen that SL performs best among all by considering the positive and negative classes equally, while the PRC plot in Fig. 10 shows the behavior of the classifiers for one class.

Table 4 Detailed accuracy by class (weighted avg.)

| Classifiers | TP rate | FP rate | Precision | Recall | F-measure | MCC | ROC area | PRC area |
| Bayes Network Classifier  | 0.624 | 0.222 | 0.639 | 0.624 | 0.628 | 0.397 | 0.771 | 0.675 |
| NaiveBayesMultinomialText | 0.491 | 0.491 | ?     | 0.491 | ?     | ?     | 0.5   | 0.393 |
| Naive Bayes Classifier    | 0.635 | 0.244 | 0.641 | 0.635 | 0.634 | 0.397 | 0.765 | 0.663 |
| SimpleLogistic            | 0.682 | 0.239 | 0.682 | 0.682 | 0.676 | 0.462 | 0.823 | 0.745 |
| Logistic (Fun.)           | 0.687 | 0.236 | 0.687 | 0.687 | 0.681 | 0.471 | 0.822 | 0.743 |
| SMO                       | 0.686 | 0.237 | 0.687 | 0.686 | 0.679 | 0.47  | 0.745 | 0.597 |
| Lazy IBk                  | 0.584 | 0.282 | 0.58  | 0.584 | 0.582 | 0.305 | 0.651 | 0.497 |
| Lazy LWL                  | 0.629 | 0.316 | ?     | 0.629 | ?     | ?     | 0.77  | 0.687 |
| meta Bagging              | 0.675 | 0.231 | 0.673 | 0.675 | 0.672 | 0.454 | 0.812 | 0.722 |
| meta.MultiClassClassifier | 0.686 | 0.236 | 0.686 | 0.686 | 0.679 | 0.47  | 0.82  | 0.74  |
| trees.RandomForest        | 0.687 | 0.238 | 0.687 | 0.687 | 0.68  | 0.47  | 0.821 | 0.735 |

Fig. 9 ROC curve for classifiers

Fig. 10 PRC curve for classifiers


The time required to build the model for the classifiers BNC, NBC, NBMNT, LF, SL, SMO, LLBK, LLWL, MB, MMCC, and RF is given in Table 5. The MB and SL classifiers take a long time to build the model in comparison to the others. From the confusion matrices shown in Table 6, we can conclude that meta Bagging predicts the positive reviews correctly most often. After it, Random Forest gives the next best correct prediction of positive reviews while also achieving the highest classification accuracy. MultiClassClassifier makes the minimum number of incorrect predictions of positive reviews, with Random Forest very close to it.

6 Conclusions
In this paper, different classifiers have been analyzed and tested for their accuracy and other parameters on the Weka platform. It has been found that the logistic and random forest classifiers, among Bayes Network, Naive Bayes, NaiveBayesMultinomialText, SimpleLogistic, SMO, Lazy IBk, Lazy LBL, Bagging, and MultiClassClassifier, give the highest accuracy of classification of sentimental reviews. The highest value of Kappa is achieved by the logistic classifier, at 0.466. The root mean squared error is minimum in the case of the logistic and simple logistic classifiers, with a value of approximately 0.377, while the mean absolute error is minimum in the case of the naïve Bayes classifier.

7 Future Scope
In further work, multiple filters can be applied and tested for their classification accuracy, and the best filter and classifier combination can be selected. The training and testing instances can also be varied for better results. Besides this, different parameter settings for the classifiers can be tested.

Table 5 Model building time (in ms); test mode: n-fold cross-validation

| Folds | BNC  | NBC  | NBMNT | LF   | SL    | SMO   | LLBK | LLWL | MB    | MMCC | RF   |
| 10    | 3.58 | 1.69 | 0.02  | 0.01 | 15.19 | 8.39  | 0.00 | 0.00 | 30.68 | 4.25 | 4.88 |
| 20    | 3.24 | 1.72 | 0.06  | 0.02 | 20.66 | 12.21 | 0.00 | 0.00 | 40.24 | 4.50 | 5.12 |


Table 6 Class-wise confusion matrix

Bayes — Bayes Network Classifier:
a b c ← classified as
360 176 74 | a = positive
166 540 129 | b = neutral
30 64 161 | c = negative

Bayes — Naive Bayes Classifier:
a b c ← classified as
322 229 59 | a = positive
131 609 95 | b = neutral
22 84 149 | c = negative

Bayes — NaiveBayesMultinomialText:
a b c ← classified as
0 610 0 | a = positive
0 835 0 | b = neutral
0 255 0 | c = negative

Function — Logistic (Fun.):
a b c ← classified as
370 214 26 | a = positive
114 680 41 | b = neutral
31 106 118 | c = negative

Function — SimpleLogistic:
a b c ← classified as
369 215 26 | a = positive
116 673 46 | b = neutral
31 107 117 | c = negative

Function — SMO:
a b c ← classified as
361 222 27 | a = positive
103 686 46 | b = neutral
31 105 119 | c = negative

Lazy — Lazy IBk:
a b c ← classified as
355 217 38 | a = positive
201 545 89 | b = neutral
51 111 93 | c = negative

Lazy — Lazy LBL:
a b c ← classified as
359 251 0 | a = positive
124 711 0 | b = neutral
49 206 0 | c = negative

Meta — meta Bagging:
a b c ← classified as
385 195 30 | a = positive
135 640 60 | b = neutral
37 95 123 | c = negative

Meta — Meta.MultiClassClassifier:
a b c ← classified as
372 212 26 | a = positive
113 681 41 | b = neutral
34 107 114 | c = negative

Trees — Trees.RandomForest:
a b c ← classified as
379 210 21 | a = positive
117 678 40 | b = neutral
34 110 111 | c = negative


References 1. Ayu M, Ismail S, Matin A, Mantoro T (2012) A comparison study of classifier algorithms for mobile-phone’s accelerometer based activity recognition. Procedia Eng 41:224–229. https:// doi.org/10.1016/j.proeng.2012.07.166 2. Martínez-Cámara E, Gutiérrez Y, Fernández J, Montejo-Ráez A, Guillena R (2015) Ensemble classifier for Twitter sentiment analysis 3. A large-scale sentiment data classification for online reviews under Apache Spark 4. Ahuja R, Chug A, Kohli S, Gupta S, Ahuja P (2019) The impact of features extraction on the sentiment analysis. Procedia Comput Sci 152:341–348. ISSN 1877-0509. https://doi.org/10. 1016/j.procs.2019.05.008 5. Al-Natour S, Turetken O (2020) A comparative assessment of sentiment analysis and star ratings for consumer reviews. Int J Inf Manag 54:102132 6. Peng B, Wang J, Zhang X (2020) Adversarial learning of sentiment word representations for sentiment analysis. Inform Sci 541:426–441. ISSN 0020-0255. https://doi.org/10.1016/j.ins. 2020.06.044 7. Devi MD, Saharia N (2020) Learning adaptable approach to classify sentiment with incremental datasets. Procedia Comput Sci 171:2426–2434. ISSN 1877-0509. https://doi.org/10.1016/j. procs.2020.04.262 8. Kazmaier J, van Vuuren JH (2020) A generic framework for sentiment analysis: leveraging opinion-bearing data to inform decision making. Decis Support Syst 135:113304. ISSN 01679236. https://doi.org/10.1016/j.dss.2020.113304 9. Li W, Zhu L, Shi Y, Guo K, Cambria E (2020) User reviews: Sentiment analysis using lexicon integrated two-channel CNN–LSTM family models. Appl Soft Comput 94:106435. ISSN 15684946. https://doi.org/10.1016/j.asoc.2020.106435 10. Xu F, Pan Z, Xia R (2020) E-commerce product review sentiment classification based on a naïve Bayes continuous learning framework. Inform Process Manage 57(5):102221. ISSN 0306–4573. https://doi.org/10.1016/j.ipm.2020.102221 11. Ji Y, Wu W, Chen S, Chen Q, Hu W, He L (2020) Two-stage sentiment classification based on user-product interactive information. Knowl-Based Syst 203:106091. ISSN 0950-7051. https:// doi.org/10.1016/j.knosys.2020.106091 12. Li X, Wu C, Mai F (2019) The effect of online reviews on product sales: a joint sentiment-topic analysis. Inform Manage 56(2):172–184. ISSN 0378-7206. https://doi.org/10.1016/j.im.2018. 04.007 13. Liu Y, Bi J-W, Fan Z-P (2017) Ranking products through online reviews: a method based on sentiment analysis technique and intuitionistic fuzzy set theory. Inform Fus 36:149–161. https://doi.org/10.1016/j.inffus.2016.11.012 14. Sharaff A, Soni A (2020)Time and feature specific sentiment analysis of product reviews. In: Sinha GR, Suri JS (eds) Cognitive informatics, computer modelling, and cognitive science. Academic Press, pp 255–272. ISBN 9780128194454. https://doi.org/10.1016/B978-0-12-819 445-4.00013-8 15. Bal R, Sharma S (2016) Review on meta classification algorithms using WEKA. Int J Comput Trends Technol 35:38–47 16. Gladence L, Karthi M, Anu V (2015) A statistical comparison of logistic regression and different Bayes classification methods for machine learning. ARPN J Eng Appl Sci 10(14):5947–5953 17. McDonald JH (1985) Size-related and geographic variation at two enzyme loci in Megalorchestia Californiana (Amphipoda: Talitridae). Heredity 54:359–366 18. Suzuki S, Tsurusaki N, Kodama Y (2006) Distribution of an endangered burrowing spider Lycosa ishikariana in the San’in Coast of Honshu, Japan (Araneae: Lycosidae). Acta Arachnologica 55:79–86 19. 
Tallamy DW, Darlington MB, Pesek JD, Powell BE (2003) Copulatory courtship signals male genetic quality in cucumber beetles. Proc R Soc Lond B 270:77–82


20. Aksoy S (2008) k-Nearest neighbor classifier and distance functions. Technical Report, Bilkent University, Department of Computer Engineering 21. Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco

Enhancing Road Safety: A System for Detecting Driver Activity Using Raspberry Pi and Computer Vision Techniques with Alcohol and Noise Sensors P. Sudarsanam, R. Anand, and Manoj Challa

1 Introduction
The driver fatigue detection system aims to identify driver tiredness, which can lead to road accidents caused by sleepiness or exhaustion. The system is designed to prevent such accidents by notifying the driver when signs of fatigue are detected. According to surveys, fatigue is responsible for around 20% of road accidents and can account for up to fifty percent on certain roads. Driver exhaustion is a significant factor in a considerable number of car accidents, and developing technology to address and prevent drowsiness is challenging but crucial in the field of accident prevention systems. As fatigue can impair a driver’s attention and alertness, it is essential to devise various methods to tackle this problem while driving. Distraction occurs when an object or event grabs the driver’s attention away from driving; driver drowsiness is similar to driver distraction, and to ensure the driver remains attentive and alert, technology is required to detect signs of fatigue and alert the driver accordingly. The method supporting object detection in most applications, and used here for identifying faces in an image, is the Haar cascade algorithm. To train the classifier for face detection, a collection of positive images containing faces and negative images without faces is required. The features of the images can be extracted and analyzed for various applications. The Haar cascade


algorithm calculates the value of each feature by comparing pixel values, and based on these calculations, it identifies areas of interest. These areas are then highlighted as black portions, indicating that the algorithm is performing the detection operation. In addition to the Haar cascade algorithm, a shape predictor helps to locate an object based on its structure. It converts the object, such as the face, into coordinate points (x, y) and extracts the desired features based on the provided inputs. This allows for further analysis and processing of the detected objects. To determine the aspect ratio of the eyes and mouth, the algorithm transforms images into facial landmarks. Only faces are processed; all other items are ignored. Using these algorithms, image processing retrieves crucial information while suppressing noise and distortion. Alcohol and noise levels can be sensed in real time using dedicated sensors integrated into a system that monitors driver behavior and improves road safety. Overall, the driver drowsiness detection system described in the reference paper is an innovative solution to a serious problem that affects road safety. By utilizing computer vision techniques, image processing, and machine learning, the system can detect signs of driver fatigue and alert the driver before an accident occurs. Additionally, the inclusion of alcohol and noise sensors adds an extra layer of safety to the system, making it a comprehensive solution for monitoring driver behavior. With the rising number of road accidents caused by driver fatigue, this technology has the potential to save lives and improve road safety for all.

2 Literature Survey
The exhaustion detection system implemented in this work uses a combination of electroencephalogram (EEG) and eye condition to accurately detect a driver’s level of exhaustion. This is an IoT-based project that utilizes the Neurosky Mindflex headset to study brainwave data. To use this device, the user must wear it on their forehead and connect it using wires. The project involves various tools and resources to detect the onset of drowsiness and can send SMS alerts to a nearby control room via signals. This technology is extensively utilized in IoT devices to continuously monitor brainwave data, specifically focusing on points of meditation and attention. When the driver’s eyes remain closed for approximately three seconds, a signal is transmitted to the Arduino device using radio frequency communication. By analyzing the eye closure ratio obtained from the EEG data, the system can detect drowsiness by comparing it against a predetermined threshold for the eye aspect ratio in the closed state. The system requires the driver to wear the device on their forehead before starting to drive, as it continuously monitors the brain and provides an alert when drowsiness is detected [1]. This paper goes beyond the simple detection of drowsiness and aims to predict the level of impairment that a driver may experience. The project’s main objective is to gather sufficient information from drowsiness detection to assess and forecast the future level of impairment. Real-world scenarios were examined to gather this information. Various models, such as ANN, CNN, and RNN, were implemented to


detect the level of drowsiness at different times. Additionally, tracking the driver’s movements provided insights into the specific situations that trigger drowsiness. Behavioral information emerged as the best approach for detecting and predicting drowsiness. Therefore, the project not only focuses on driver exhaustion but also predicts the likelihood of a driver becoming drowsy in different situations [2]. In order to develop this project, a random forest algorithm was utilized, resulting in a 78.7% accuracy rate for the drowsy-detection alert. Additionally, an ensemble algorithm was employed for hybrid sensing, achieving a better accuracy of close to 82.4% in detecting alert and drowsy states. Furthermore, the classification model accuracy increased to 95.4% for the moderately exhausted driver state. A dataset consisting of 10-s sections of data was created using hybrid evaluation and recorded during the drowsiness detection system experiment. This paper primarily focuses on conducting a feasibility study to maintain a high level of accuracy in driver drowsiness detection. The detection system is entirely based on hybrid sensing, without the need for contact sensors [3]. In this study, the system utilized machine learning models for detecting drowsiness, and an additional CO2 sensor chip was implemented. Moreover, speech recognition technology (STT) was integrated, allowing the driver to request music or make phone calls using voice commands. Multiple features have been incorporated into this project to enhance its functionality [4]. The project leverages machine learning to develop a drowsy detection system by mining real human behavior data. It successfully classifies 30 facial actions, including blinking, yawning, and other facial movements, using a facial action coding system trained on a separate database. Head movement is automatically collected through eye-tracking motion, enhancing the system’s accuracy. However, it is important to note that the prediction may not be completely accurate when it comes to detection [5]. The purpose of this report is to streamline project implementation by minimizing the number of tools required. The focus of the project is on detecting eye blinks and using these data to alert the user. The methodology employed in this project aims to assist users experiencing fatigue-related issues [6]. To improve road safety, machine learning approaches have been developed to anticipate the driver’s current situation and emotions, enabling the system to gather relevant data for providing protection. ANN models have also been utilized to facilitate automatic learning, allowing the system to improve itself without explicit programming. However, achieving 100% efficiency has proven to be a challenge. Considering these limitations, the work became a research endeavor aimed at developing a better solution for the aforementioned issues [7]. This study focuses on features associated with the mouth and eyes, such as frequent eye blinking and yawning. The objective is to classify the level of fatigue. The proposed approach involves dividing the process into four parts: a video camera captures the feed, which is then processed by a computer vision system to detect faces. The detected face images are subsequently passed to a support vector machine (SVM) for classification as fatigue or non-fatigue.
The output of the classifier is represented as + 1 for fatigue and − 1 for non-fatigue, serving as both input and output for the


system. The time-dependent output is then used to categorize the fatigue level into high and low exhaustion levels [8]. The primary focus of this paper revolves around developing a yawning detection system. The mouth region is specifically extracted from the face by employing (sFCM) clustering. The system is designed to detect instances of yawning across multiple frames. If the system identifies that the driver is experiencing fatigue based on the frequency of yawning, it will trigger an alarm to warn the driver [9]. The development of such a project relies on the utilization of the drowsiness detection block-match algorithm, along with various other technologies. Various techniques are employed to effectively eliminate noise interference. The main challenge of this project revolves around detecting the frequency of eye blinks, whereby the user is alerted if the eye blinks exceed a threshold of three times. Key technologies such as the drowsiness detection block-match algorithm and Gabor ordinal measures are crucial to achieving the objectives of this project [10].

3 Methodology
The proposed driver activity detection system aims to enhance driver safety through various functionalities, including drowsiness detection, continuous monitoring of driver behavior and concentration, and identification of the position of the driver’s eyes and mouth for drowsiness detection. Additionally, the system incorporates head movement detection. An embedded sound detector sensor is utilized to detect surrounding sounds. The system has been developed using Python programming, machine learning, and different image processing models. The project involves capturing images of drivers, converting them to grayscale using image processing technology, extracting relevant facial features, and subsequently processing the images to determine the presence of the required objects within the frame. If the aforementioned conditions are met, the system proceeds with the calculation of the Euclidean distance to determine the distance between the left and right eyes, as well as to assess whether the eyes are open or closed. Utilizing Python array concepts, it becomes feasible to extract the coordinates of the eyes and mouth from the facial structure. Additionally, a head movement feature is incorporated to detect driver exhaustion: if the system detects improper head posture or frequent tilting of the head, it will issue a warning alarm to alert the driver. The schematic diagram of the Raspberry Pi integrated with the sensors, designed using Proteus simulation software for effective prototyping and testing of electronic systems, is shown in Fig. 1. The components are: Raspberry Pi, LCD display, Pi camera, GSM module, PIR sensor, microphone, speaker, seat belt detection sensor, alcohol sensor, vibration sensor, and buzzer.
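As an illustrative sketch only (the pin numbers, polling interval, and sensor polarity are assumptions, not from the paper), reading a digital alcohol-sensor output and driving a buzzer on the Raspberry Pi could look like this with the RPi.GPIO library:

```python
import time
import RPi.GPIO as GPIO

ALCOHOL_PIN = 17   # assumed GPIO pin wired to the alcohol sensor's digital output
BUZZER_PIN = 27    # assumed GPIO pin driving the buzzer

GPIO.setmode(GPIO.BCM)
GPIO.setup(ALCOHOL_PIN, GPIO.IN)
GPIO.setup(BUZZER_PIN, GPIO.OUT)

try:
    while True:
        # polarity depends on the sensor module; here HIGH is taken as "alcohol detected"
        if GPIO.input(ALCOHOL_PIN):
            GPIO.output(BUZZER_PIN, GPIO.HIGH)   # sound the buzzer as a warning
        else:
            GPIO.output(BUZZER_PIN, GPIO.LOW)
        time.sleep(0.5)
finally:
    GPIO.cleanup()
```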


Fig. 1 Raspberry Pi with sensors

3.1 Driver Exhaustion Detection Architecture Figure 2 illustrates the architecture of the exhaustion detection system, which encompasses three key features: eye, mouth, and head movement. Once these features detect drowsiness, the system displays a message or activates an alarm. Initially, the system captures images of the driver, followed by face detection and extraction of facial features such as the eye and mouth. If the criteria for exhaustion detection are met, the system flags it as exhaustion detected. Concurrently, it also monitors head movement, and if the driver tilts their head, indicating drowsiness, an alarm is triggered. The architectural representation of the system is designed to provide a visual understanding of the execution flow. It simplifies the project’s stages and facilitates a system-oriented perspective for easier comprehension and task execution. Driver exhaustion detection is an innovative technology designed to identify and mitigate driver fatigue during travel. This technology employs a range of sensors, cameras, and different devices to continuously monitor the driver’s behavior and recognize indicators of exhaustion. When signs of fatigue are detected, the system can promptly alert the driver or even assume control of the vehicle to enhance the safety of all road users. Given that driver fatigue is a significant contributor to accidents, this technology plays an important role in preventing collisions and preserving lives. By effectively detecting and mitigating exhaustion, driver exhaustion detection strives to minimize accident rates and make a substantial impact on road safety. The exhaustion detection system follows a structured processing flow. Initially, it takes a user’s face image as input via a web camera. The image is then converted into coordinates, and only the required features (eye, mouth, and head) are extracted from the facial structure. To determine drowsiness, the system analyzes the eye and mouth using a shape predictor with 68 coordinate points. Specific points related to the eye and mouth are extracted for further processing and calculation. Pre-defined machine learning models are employed to convert the image to grayscale, enabling


Fig. 2 Architecture of driver exhaustion detection system

the detection of eye and mouth aspect ratios. The system detects eye and lip activity and their positional measurements through a Euclidean distance algorithm to meet the necessary requirements. Using these calculations, it generates alerts whenever an inactive state is detected. In the next step, a sequence of images of the driver is captured by the camera and serves as the input data. The system then detects specific regions of interest on the driver's face, namely the eyes and mouth, along with head movement. Once it detects signs of drowsiness in the driver, it generates an alarm as the output. The middle tier of the architecture focuses on processing and managing the user data. The camera is strategically positioned in front of the driver, fixed securely at the corner to ensure continuous monitoring of the driver's behavior. If the system detects a loss of concentration or signs of exhaustion, it promptly issues warnings to the driver. The system then processes the captured images and computes an overall fatigue measure. Once the required features have been extracted through the machine learning algorithms, the exhaustion level assessment begins. In this stage, the system employs a predetermined threshold, allowing the driver a certain amount of time to regain


comfort and focus. If the driver appears inattentive or distracted for an extended period, the system refrains from causing any further distractions. This approach enables the system to accurately gauge the driver’s level of exhaustion and respond accordingly.

3.2 Video Capture The web camera or car dashboard camera captures images within specified frames. The captured images are then processed to determine the presence or absence of the desired object. The video contains frames at a certain frequency, which are utilized for face detection. These images are classified as either negative or positive: a positive image indicates that the expected object is present within the frame, while a negative image signifies its absence. To capture video from a Raspberry Pi camera, the Pi camera Python library can be used. The general steps are as follows: import the necessary libraries, set the camera resolution and frame rate, create a file to store the video, wait for a specified amount of time, and then stop recording and close the camera. Code following these steps will capture 10 s of video at a resolution of 640 × 480 and a frame rate of 24 frames per second and save it to a file called "my_video.h264." The resolution, frame rate, and filename can be changed as desired, and additional code can be added to process the video data or stream it over a network if needed.
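A minimal sketch of these capture steps, assuming the legacy picamera library on a Raspberry Pi, is shown below; the parameters (640 × 480, 24 fps, 10 s, "my_video.h264") follow the description above.

```python
# Sketch of the video-capture steps described in the text, using the
# legacy "picamera" library (assumed to be installed on the Raspberry Pi).
import picamera

camera = picamera.PiCamera()
camera.resolution = (640, 480)            # set the camera resolution
camera.framerate = 24                     # set the frame rate

camera.start_recording('my_video.h264')   # create the output file and start recording
camera.wait_recording(10)                 # record for 10 seconds
camera.stop_recording()                   # stop recording
camera.close()                            # release the camera
```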

3.3 “Haar Cascade” Algorithm Haar cascade is a machine learning technique that supports both training and detection on image sequences or video frames. It is one of the techniques used for detecting faces in images or videos. The algorithm works by analyzing an image at different scales and sizes and identifying features such as edges, lines, and corners. The predefined features are then combined to form a Haar-like feature, which is essentially a simple classifier. The Haar cascade algorithm trains a classifier by feeding it positive and negative samples. Positive samples are images that contain the object of interest, while negative samples are images that do not. The algorithm then constructs a cascade of classifiers, where each classifier in the cascade is more complex than the previous one [11]. Training Phase During the training phase, the Haar cascade algorithm analyzes positive and negative samples to create a cascade of classifiers. The first image shows a positive sample of a face, while the second image shows a negative sample without a face. During the detection phase, the algorithm slides the classifier over the image in a scanning


window. The classifier evaluates each region of the image and determines whether it contains the object of interest. The algorithm then uses a sliding window approach to scan the entire image and returns the regions where the object is detected. Detection Phase The detection phase involves the Haar cascade algorithm, which employs a sliding window technique to scan an image and identify the object of interest. The algorithm evaluates each region of the image using a classifier; if no object is detected, it moves on to the next region. The third image demonstrates the sliding window approach implemented on an image, while the fourth image displays the regions where faces are detected. In the context of this project, our focus is primarily on face detection. To achieve this, we need a combination of positive images that contain faces and negative images that lack faces to train the classifier. Once the classifier is trained, we can extract features from the images. The Haar cascade algorithm calculates the value of each feature on a pixel-by-pixel basis, as illustrated in Fig. 2, highlighting the regions where operations are performed as black portions. Furthermore, the algorithm can calculate features for various window sizes, as there are multiple features available for each individual, with as many as 160,000 features for 24 × 24 windows. Figure 3 is utilized to recognize the eye, mouth, and nose for detection purposes. Each subfigure within Fig. 3 rotates around the captured face image and extracts the necessary features using the Haar cascade algorithm. Each subfigure is designed with a distinct structure to process the specific feature required for drowsiness detection. These rectangular boxes rotate around the facial landmarks, enabling the identification of the desired facial object and its corresponding features. Upon successfully detecting these features, the system proceeds with the calculation specific to that particular feature [11–13]. The face detection algorithm implemented using OpenCV can be summarized as follows:

• Import the required libraries.
• Capture input images using a camera and convert them into grayscale.
• Load the input image using OpenCV.
• Use Haar cascade and LBP classifiers to detect faces in the image.
• Compare the accuracy and time performance of both classifiers.
• Display the output image with detected faces.

For the Haar cascade classifier, the algorithm can be further detailed as follows (a minimal sketch appears after the list):

• Load the input image using the cv2.imread(img_path) function with the image path as an input parameter.
• Convert the input image to grayscale mode and display it.
• Load the Haar cascade classifier.
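The sketch below illustrates these detection steps with OpenCV. The image path and cascade file name are illustrative assumptions; OpenCV ships a frontal-face Haar cascade with its data files.

```python
# Minimal OpenCV sketch of the face-detection steps listed above.
import cv2

img_path = 'driver_frame.jpg'                      # hypothetical input frame
image = cv2.imread(img_path)                       # load the input image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)     # convert to grayscale

# Load the pre-trained Haar cascade for frontal faces
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

# Slide the detector over the image at multiple scales
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

# Draw a rectangle around each detected face and display the result
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imshow('Detected faces', image)
cv2.waitKey(0)
```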


Fig. 3 Features of Haar cascade

3.4 Shape Predictor Figure 4 illustrates the facial coordinates, where each corner of the face is converted into coordinate points for feature division and calculation of drowsiness level. Out of the 68 coordinates, six are dedicated to the eyes and eight to the mouth, which are the specific features extracted for drowsiness detection. This process is facilitated by the OpenCV library, which provides image processing capabilities. The x- and y-axes of the coordinates are utilized to extract the desired features. With the aid of Python technology, fetching these facial landmarks becomes a straightforward task [14]. The facial landmarks of the system are to predict the structure of object and convert it into (x, y) coordinates, allowing for the extraction of the desired object based on user inputs. In this project, the system takes an actual image and converts it into facial landmarks if the specified object is present. The identified facial landmarks are then converted into coordinates, enabling the system to locate the eyes and mouth. Only the coordinates relevant to the eyes and mouth are considered, and the system calculates


Fig. 4 Facial landmarks

the eye and mouth aspect ratios using these coordinates. The shape predictor serves as input training data, as it is pre-trained and provides pre-defined data. To utilize the shape predictor, the find_min_global function from the Dlib library is employed. This function performs a search on a standard grid. The iBUG 300-W dataset is utilized with this method to find optimal solutions for hyperparameters. The iBUG 300-W dataset is an excellent resource for training facial landmark predictors, as it allows for a specific focus on individual facial structures such as the eyebrows, eyes, nose, mouth, and jawline. Due to the large size of shape predictor datasets, it is more practical to concentrate solely on the eyes and mouth rather than all face landmarks. This narrower focus makes it easier to train the shape predictor specifically for the identification of the eyes and mouth. To configure the Dlib environment, several packages need to be installed, including Dlib, OpenCV, Imutils, and Scikit-learn. Following that, the find_min_global function is used to fine-tune hyperparameters through a standard grid search.
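A hedged sketch of extracting the eye and mouth landmarks with Dlib's 68-point shape predictor follows. The model file name is the commonly distributed pre-trained file and is an assumption here, as is the input image name.

```python
# Sketch of landmark extraction with Dlib and imutils.
import cv2
import dlib
from imutils import face_utils

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor('shape_predictor_68_face_landmarks.dat')

frame = cv2.imread('driver_frame.jpg')             # hypothetical input frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

for rect in detector(gray, 0):
    shape = predictor(gray, rect)                  # 68 (x, y) landmark points
    shape = face_utils.shape_to_np(shape)          # convert to a NumPy array

    # Standard 68-point index ranges for the regions used in this work
    left_eye = shape[36:42]
    right_eye = shape[42:48]
    mouth = shape[48:68]
```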

3.5 Image Processing Image processing is a powerful technique used to manipulate and analyze images in order to extract important information. It is analogous to signal processing, where both the input and output features are interconnected with the image. Exhaustion


monitoring technology has seen significant advancements in recent years, providing drivers with increased safety and advantages. By leveraging image processing, the detection of eye and mouth behaviors can be accurately determined from digital images, while minimizing negative disturbances. The image processing workflow generally involves three main steps: importing and loading the images using appropriate tools, processing and modifying the image, and finally displaying the processed results. In the context of detection, resizing the image becomes essential to adjust the pixel dimensions, and image interpolation techniques are used to correct any distortions based on the desired capture frame.

3.6 Calculate EAR In Fig. 5, the coordinates represent the eye landmarks used for the eye aspect ratio (EAR), and these coordinates are used to calculate whether the eye is open or closed. The EAR is calculated from six (x, y) coordinates p1, …, p6 around the eye as

EAR = (‖p2 − p6‖ + ‖p3 − p5‖) / (2 ‖p1 − p4‖),

where the numerator measures the vertical distances of the eye and the denominator measures its horizontal distance; the coordinates therefore consist of one horizontal pair (the eye corners) and two vertical pairs (upper and lower eyelid points). The EAR plays a crucial role in identifying the level of exhaustion. During the EAR calculation, the eye shape is traversed in a clockwise manner, starting from the right to the left, and once the eye is detected the calculation is performed. The EAR is roughly constant and nonzero when the eye is open, but drops close to zero when the eye is closed. It is crucial to accurately differentiate between open and closed eyes, especially during blinking. Blinking introduces small variations in eye shape and movement, making it challenging to maintain a uniform scaling of the image, so careful consideration and adjustment are necessary to account for individual differences in eye blinking patterns.

Fig. 5 EAR (aspect ratio eye)
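The following minimal sketch implements the EAR computation described above; it assumes the six eye landmarks are passed as a (6, 2) array, for example the slices produced by the Dlib landmark extraction sketched earlier.

```python
# EAR over the six eye landmarks p1..p6 (indices 0..5 of the array).
from scipy.spatial import distance as dist

def eye_aspect_ratio(eye):
    # vertical distances between the two pairs of upper/lower eyelid points
    a = dist.euclidean(eye[1], eye[5])
    b = dist.euclidean(eye[2], eye[4])
    # horizontal distance between the eye corners
    c = dist.euclidean(eye[0], eye[3])
    return (a + b) / (2.0 * c)

# Usage: average the two eyes for a per-frame value
# ear = (eye_aspect_ratio(left_eye) + eye_aspect_ratio(right_eye)) / 2.0
```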


3.7 Calculate MAR In Fig. 5, the coordinates represent the mouth aspect ratio (MAR), which is used to determine whether the driver is in a sleepy state according to yawning. The MAR calculation involves eight (x, y) coordinates that define the mouth region. It is the counterpart of the EAR in the sense that it assesses the mouth instead of the eye. By utilizing the Haar cascade algorithm, the mouth region can be extracted from the image, enabling the calculation of the MAR and assisting in identifying the drowsiness of the driver.
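The sketch below shows one common MAR formulation over the eight inner-mouth landmarks (points 60–67 of the 68-point model); the exact point pairing is an assumption, as several variants exist in practice.

```python
# MAR over the eight inner-mouth landmarks (indices 0..7 of the slice).
from scipy.spatial import distance as dist

def mouth_aspect_ratio(mouth):
    # vertical distances between upper and lower inner-lip landmarks
    a = dist.euclidean(mouth[1], mouth[7])
    b = dist.euclidean(mouth[2], mouth[6])
    c = dist.euclidean(mouth[3], mouth[5])
    # horizontal distance between the mouth corners
    d = dist.euclidean(mouth[0], mouth[4])
    return (a + b + c) / (2.0 * d)
```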

3.8 Fatigue Detection with Face Detection and Features Extraction Figure 6 illustrates the process of detecting both eye and face images when frames are processed through the video capture unit. The captured images are in RGB color format, allowing for brighter image capture even in low-light conditions. The algorithm employed ensures that image processing takes place effectively in low-light environments, aiming to enhance the image quality and eliminate unnecessary disturbances in the frames. To achieve this, the image undergoes contrast enhancement, which involves the removal of noise and amplification of the desired image features. This process is divided into two parts: first, to eliminate disturbances, an adaptive superpixel-based de-noising technique is applied; second, to enhance contrast, adaptive luminance adjustment is performed at the pixel level. It is crucial to eliminate image disturbances before applying contrast enhancement techniques. Figure 7 shows the identification of head and face shape using Python, which can be done with computer vision techniques and libraries such as OpenCV, Dlib, and numpy. The steps involved are:

• Load the sequence of images using OpenCV and convert them to grayscale.
• Detect the face region using Haar cascade or Dlib's face detection algorithm.

Fig. 6 MAR (aspect ratio of mouth)


Fig. 7 Face and eye detection

• Identify the face and extract the mouth and eye facial landmarks using Dlib's pre-trained facial landmark detector.
• Use the facial landmarks to identify the shape of the face and head.

By applying this method, the system aims to enhance accuracy by eliminating noise and improving the quality of the images, while avoiding the texture blurring caused by excessive enhancement. The face is then recognized in the images and converted to grayscale for easy visualization, since drowsiness detection does not require color information. To ensure compliance with Indian standards, the image detection process in frames requires a pre-trained machine learning algorithm with fast recognition capabilities. The Haar cascade algorithm is used to detect face images using Haar cascade features with respect to facial landmarks. Specifically for driver drowsiness detection, only the eye and mouth sections need to be extracted from the image. To achieve this, the face image is transformed into a rectangular region measuring 100 × 100 pixels. From this, we obtain a region of 80 × 30 pixels for the EAR and a region of 40 × 40 pixels for the mouth aspect ratio, represented by (x, y) coordinates. The respective rectangle windows for the eye and mouth regions are defined as (10, 20) and (30, 60). Further processing is performed for fatigue detection. Figures 8 and 9 illustrate the detection of yawning and closed eyes in Python, utilizing computer vision techniques and libraries such as OpenCV, Dlib, and numpy. Patterns are identified to match the relevant data within the high-dimensional and potentially correlated dataset. The compressed dataset is then divided into a training set and a test set to determine whether the driver is exhausted or not. Ultimately, the classifier outputs a value of 1 if the driver is detected as fatigued, or − 1 if not, triggering an alert to the driver.
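A hedged sketch of the alerting logic implied by this description is given below; the threshold values and the consecutive-frame count are illustrative assumptions, not values taken from the paper.

```python
# Per-frame drowsiness decision based on EAR and MAR values.
EAR_THRESHOLD = 0.25     # eyes considered closed below this ratio (assumed)
MAR_THRESHOLD = 0.60     # mouth considered yawning above this ratio (assumed)
CONSEC_FRAMES = 20       # frames of closed eyes before raising an alarm (assumed)

closed_counter = 0

def assess_frame(ear, mar, sound_alarm):
    """Return +1 (fatigued) or -1 (alert) for one processed frame."""
    global closed_counter
    if ear < EAR_THRESHOLD:
        closed_counter += 1
    else:
        closed_counter = 0

    if closed_counter >= CONSEC_FRAMES or mar > MAR_THRESHOLD:
        sound_alarm()        # e.g. drive a buzzer or play an audio warning
        return 1
    return -1
```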


Fig. 8 Head and face shape identification

Fig. 9 Yawning and closed eye identification

3.9 Sound Detection A sound sensor module is easily integrated with the Raspberry Pi: the module detects sound levels and outputs signals that can be read by the Raspberry Pi. The general steps to get started are: connect the sound sensor module to the Raspberry Pi using the GPIO pins; install the necessary software packages for reading the sensor data, one popular package being the "gpiozero" Python library; write a Python program to acquire data from the sensor and analyze the sound levels, typically by setting a threshold sound level and comparing the sensor readings to that threshold so that readings exceeding the threshold trigger an alarm; and finally run the Python program on the Raspberry Pi and test it by making loud noises near the sensor.
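A hedged sketch of reading a digital sound-sensor module with the gpiozero library is shown below; the GPIO pin numbers are assumptions, and the module is assumed to expose a digital output whose threshold is set by its on-board potentiometer.

```python
# Reacting to loud sounds with gpiozero on a Raspberry Pi.
from gpiozero import DigitalInputDevice, Buzzer
from signal import pause

sound_sensor = DigitalInputDevice(17)   # sensor DO pin wired to GPIO17 (assumed)
buzzer = Buzzer(27)                     # warning buzzer on GPIO27 (assumed)

def on_loud_noise():
    # The module drives its output active when the sound level crosses
    # the hardware threshold, so beep a short warning.
    buzzer.beep(on_time=0.5, off_time=0.5, n=3)

sound_sensor.when_activated = on_loud_noise
pause()   # keep the program running and reacting to sensor events
```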


Fig. 10 Seat belt detection

3.9.1 Seat Belt Detection

Figure 10 shows that detecting whether a person is wearing a seat belt using a Raspberry Pi camera can be done with computer vision techniques. The general steps are as follows:

• Collect data: Collect a dataset of images or video clips of people wearing and not wearing seat belts; this is used to train the machine learning model.
• Train a model: Use a pre-trained model such as YOLOv3 or train a custom model using a deep learning framework such as TensorFlow or PyTorch. The seat belt detection model uses transfer learning to adapt a pre-trained model to this specific use case.
• Set up the Raspberry Pi: Install the necessary libraries and software on the Raspberry Pi, including OpenCV, TensorFlow, and/or PyTorch.
• Capture video: Use the Raspberry Pi camera to capture live video footage or images of people in a vehicle.
• Run the model: Process the video or images using the trained model to detect whether a person is wearing a seat belt or not; a hedged sketch of this step is given below.
• Alert the driver: If the model detects that a person is not wearing a seat belt, trigger an alert to notify the driver.

It is important to note that this system may not be 100% accurate and should be used as a supplement to traditional seat belt detection systems. In addition, all necessary safety precautions and regulations should be followed when testing this system [15]. Figure 11 shows the detection of different activities during the alarm and alert generation phase.
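The following sketch illustrates only the inference step. It assumes a transfer-learned Keras classifier saved as "seatbelt_model.h5" that outputs the probability that a seat belt is worn; the model file, input size, and threshold are all assumptions, not artifacts from the paper.

```python
# Hedged sketch: run a hypothetical seat-belt classifier on a camera frame.
import cv2
import numpy as np
from tensorflow.keras.models import load_model

model = load_model('seatbelt_model.h5')            # hypothetical trained model

def seat_belt_worn(frame, input_size=(224, 224), threshold=0.5):
    # Resize and normalise the camera frame to match the assumed model input
    resized = cv2.resize(frame, input_size).astype('float32') / 255.0
    prob = model.predict(np.expand_dims(resized, axis=0))[0][0]
    return prob >= threshold

cap = cv2.VideoCapture(0)                          # Raspberry Pi camera stream
ok, frame = cap.read()
if ok and not seat_belt_worn(frame):
    print('Warning: seat belt not detected')       # trigger an alert to the driver
cap.release()
```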

3.9.2 Alcohol Sensor

A Raspberry Pi can interface with external sensors, including alcohol breathalyzers and touch sensors, to detect alcohol in a driver’s breath or sweat. To do this, you would connect the sensor to the Raspberry Pi, set up the GPIO pins, capture the sensor data, and process it. The code may need calibration and normalization steps for accurate readings. Integrating a driver alcohol consumption sensor with a Raspberry


Fig. 11 Activity detection flowchart

Pi can create a low-cost, portable system for detecting drunk driving, useful for law enforcement agencies or as a safety feature in vehicles [16]. Here is a comparison table of the driver activity detection system presented in this paper with some other related techniques as in Table 1.


Table 1 Comparison of proposed approach with other techniques

System: Proposed system
Methodology: Computer vision techniques and Raspberry Pi
Advantages: Cost-effective, easily installed in any vehicle, real-time processing, alerts driver and passengers if any dangerous behavior is detected, sends notifications to authorities if necessary, useful for monitoring professional drivers
Limitations: The system requires a high-quality camera; alcohol and noise sensing accuracy may be affected by external factors such as vehicle vibrations

System: H. Lee et al., 2018
Methodology: Convolutional neural networks and Arduino
Advantages: Can detect driver drowsiness and distraction caused by using a mobile phone while driving, low-cost system
Limitations: Requires a large amount of training data, camera angle and position may affect accuracy, limited to detecting only drowsiness and mobile phone use, not useful for monitoring professional drivers

System: H. A. Abdullah et al., 2020
Methodology: Image processing and Arduino
Advantages: Cost-effective, easily installed in any vehicle, real-time processing, detects drowsiness, distractions, and alcohol consumption
Limitations: Limited to detecting drowsiness, distractions, and alcohol consumption, facial expression detection accuracy may be affected by lighting conditions and camera position, requires a high-quality camera

4 Conclusion Driver activity detection plays a crucial role in safeguarding drivers from road accidents by monitoring their exhaustion levels through the use of cameras and alcohol sensors. The primary objective of this project is to make this technology accessible in all vehicles, including cars and trucks, with the aim of preventing accidents and promoting overall road safety. By implementing such systems, the risk of accidents caused by driver fatigue or impairment can be significantly reduced, ensuring the well-being of drivers and other road users. The proposed system acts as a driver assistant that improves driver safety by reducing road accidents: a Raspberry Pi integrated with a Pi camera and alcohol sensor, combined with machine learning and image processing, was used to develop a next-generation road assistive system. The proposed framework, which utilizes head, face, eye, mouth, seat belt, and alcohol sensing together with the extracted feature sets and a large amount of data, can be highly effective in detecting driver fatigue and overcoming the limitations of recently developed techniques. Additionally, the system requires only a camera to monitor the driver's face, reducing hardware costs. This makes the framework highly productive and cost-effective in detecting signs of exhaustion in drivers.


References

1. Kondapaneni A, Hemanth C, Sangeetha RG et al (2021) A smart drowsiness detection system for accident prevention. Natl Acad Sci Lett 44:317–320. https://doi.org/10.1007/s40009-020-01000-3
2. Jacobé de Naurois C, Bourdin C, Stratulat A, Diaz E, Vercher J-L (2019) Detection and prediction of driver drowsiness using artificial neural network models, accident analysis & prevention. In: 10th international conference on managing fatigue: managing fatigue to improve safety, wellness, and effectiveness, pp 95–104. ISSN 0001-4575. https://doi.org/10.1016/j.aap.2017.11.038
3. Gwak J, Hirao A, Shino M (2020) An investigation of early detection of driver drowsiness using ensemble machine learning based on hybrid sensing. Appl Sci 10(8):2890. https://doi.org/10.3390/app10082890
4. Jang S-W, Ahn B (2020) Implementation of detection system for drowsy driving prevention using image recognition and IoT. Sustainability 12:7. https://doi.org/10.3390/su12073037
5. Vural E, Cetin M, Ercil A, Littlewort G, Movellan J (2007) Machine learning systems for detecting driver drowsiness. https://doi.org/10.1007/978-0-387-79582-9_8
6. Azim T et al (2009) Automatic fatigue detection of drivers through yawning analysis. In: Signal processing, image processing and pattern recognition: international conference, Proceedings, SIP 2009, held as part of the future generation information technology conference, FGIT 2009, Jeju Island, Korea, 10–12 Dec 2009. Springer, Berlin
7. Navya Kiran VB, Raksha R, Rahman A, Varsha KN, Nagamani NP (2020) Driver drowsiness detection. Int J Eng Res Technol (IJERT) NCAIT–2020 8(15). ISSN: 2278-0181
8. Gupta R, Aman K, Shiva N, Singh Y (2017) An improved fatigue detection system based on behavioral characteristics of driver
9. Azim T, Jaffar A, Ramzan M, Mirza A (2010) Automatic fatigue detection of drivers through yawning analysis. Commun Comput Inform Sci 61:125–132. https://doi.org/10.1007/978-3-642-10546-3_16
10. Mary Sefia A, Anitha Gnana Selvi J (2016) Driver state analysis and drowsiness detection using image processing. Int J Sci Eng Appl Sci (IJSEAS) 252–256. ISSN: 2395-3470
11. Shetty AB, Bhoomika D, Jeevan Rebeiro R (2021) Facial recognition using Haar cascade and LBP classifiers. Glob Trans Proc 2(2):330–335. ISSN 2666-285X. https://doi.org/10.1016/j.gltp.2021.08.044
12. Aza VI, Areni IS (2019) Face recognition using local binary pattern histogram for visually impaired people. In: 2019 international seminar on application for technology of information and communication (iSemantic), Semarang, Indonesia, 2019, pp 241–245. https://doi.org/10.1109/ISEMANTIC.2019.8884216
13. Bratu DV, Moraru SA, Georgeta Gușeilă L (2019) A performance comparison between deep learning network and Haar cascade on an IoT device. In: 2019 international conference on sensing and instrumentation in IoT Era (ISSI), Lisbon, Portugal, pp 1–6. https://doi.org/10.1109/ISSI47111.2019.9043714
14. Chakraborty P, Roy D, Zahid Z, Rahman S (2019) Eye gaze controlled virtual keyboard. Int J Rec Technol Eng 8:3264–3269. https://doi.org/10.35940/ijrte.D8049.118419
15. Simeon S, Kasiselvanathan M, Vimal SP (2019) Raspberry-Pi based secure systems in car. IOP Conf Ser Mater Sci Eng 590:012064. https://doi.org/10.1088/1757-899X/590/1/012064
16. McShane J, Douglas M, Meehan K (2021) Using a raspberry Pi to prevent an intoxicated driver from operating a motor vehicle. In: 2021 IEEE 11th annual computing and communication workshop and conference (CCWC), NV, USA, pp 1023–1028. https://doi.org/10.1109/CCWC51732.2021.9376036
17. Wong J, Lau P (2019) Real-time driver alert system using raspberry Pi. ECTI Trans Electr Eng Electron Commun 17:193–203. https://doi.org/10.37936/ecti-eec.2019172.215488

A Decision Support System for Prediction of Air Quality Using Recurrent Neural Network R. Naga Sai Harshini, V. S. V. Jetendra, K. Sravanthi, and T. Sajana

1 Introduction Pollution of the air, damage to ecosystems, and climate change are serious problems that influence people's health. Human activities including transport, manufacturing, and energy generation are amongst the main causes of air pollution. The hazardous gases carbon monoxide (CO), nitrogen dioxide (NO2 ), ozone (O3 ), particulate matter (PM10 and PM2.5), and sulphur dioxide (SO2 ) are a few of those that are regarded as important pollutants. All the problems rooted in polluted air can only be resolved by knowing the amount of pollutants in the air around us. Air quality data are required for mitigating air pollution. Therefore, air quality prediction is an important task in environmental monitoring and public health management [1]. Because of their capacity to understand sequential and temporal connections in data, recurrent neural networks (RNNs) are extensively utilised in forecasting air quality. RNNs are a powerful choice for air quality prediction tasks due to factors such as time series data, handling variable-length sequences, feature extraction, and non-linear modelling, which allow them to capture the temporal dynamics and dependencies present in air quality data and provide accurate predictions for short-term or long-term forecasts. R. Naga Sai Harshini · V. S. V. Jetendra · K. Sravanthi Koneru Lakshmaiah Education Foundation, Vaddeswaram, AP 522302, India e-mail: [email protected] V. S. V. Jetendra e-mail: [email protected] K. Sravanthi e-mail: [email protected] T. Sajana (B) Department of AI & DS, Koneru Lakshmaiah Education Foundation, Vaddeswaram, AP 522302, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 I. J. Jacob et al. (eds.), Data Intelligence and Cognitive Informatics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-7962-2_37


LSTM and GRU are two RNN designs that are particularly good at capturing dependence over time in the information and avoiding the vanishing gradient problem, making them excellent for predicting air quality [2]. The accuracy of pollution prediction can be impacted by input representation and training techniques in addition to choosing the right RNN architecture. The three most popular input representations are CNNs, feature engineering, and raw data [3–8].

2 Literature Survey The conventional feedforward backpropagation technique known as multilayer perceptron (MLP) and the constant artificial neural network [9, 10] system identified as long short-term memory networks (LSTM) were used to predict PM10 hourly using previous information on the chemical and three meteorological variables gathered from five monitoring stations [11]. The models were validated using two strategies: the hold-out and blocked nested cross-validation (BNCV) technique. The modelling findings provide highly accurate forecasting for times when there is moderate PM10 concentration. They thought it would be best to use several deep learning model alternatives that also include incremental learning to introducing self-identifying methods for model identification. Sarkar et al. discussed that in order to estimate the AQI value for particulate matter at a specific location in Delhi, a number of error-prone procedures [12–15], such as r-squared mean absolute error and root-mean-square error methods, are listed in this paper. The proposed method forecasts the AQI of the immediate vicinity by combining several deep learning algorithms, such as long short-term memory or gated recurrent unit. Several stand-alone machine learning and deep learning models [16], such as LSTM, linear regression, GRU, k-nearest neighbour, and support vector machine, have been trained on the same data set to compare their performances with the proposed hybrid model. With a maximum average efficiency (MAE) value of 36.11 and a score of 95%, it is found that the suggested hybrid technique outperforms them all with R2 value of 0.84 [5]. Chen Ding et al. discussed that the data set, which primarily contributes with three phases, determines the adaptive GRU structures that [17] help to solve the problem. First, a method is suggested for the GRU structure to be represented as a fixed-length binary string. The square root of the total sum of individual losses is then used to clarify the fitness coefficient for iterative calculation. The data-adaptive GRU network structure is computed using the genetic method [7], which can increase the precision of pollution predictions. Results of the experiment using three real data sets in Xi’an demonstrate that the suggested technique outperforms the current LSTM-, SVM-, and RNN-based algorithms in RMSE and SAMPE. Their primary flaw was all of the elements affecting the air quality were not taken into account because of insufficient data [4]. The authors of this research suggested the IAQP technique, which integrates indoor air quality predictions based on real-time data, for air quality management


systems [18]. LoRa and IoT sensors are used to forecast the results to assess indoor air quality. The multilevel RNN model beats the LSTM in the prediction process because it produced outstanding results and was feasible [12]. Although the sensors that were considered for the collection of data in this research are expensive and not feasible to use them at all times [6]. Aggarwal et al. discussed that using the air quality data set of 15 sites in India, the suggested model is used. To demonstrate the superiority of the suggested technique, several tests are carried out [19]. Initially, a comparison between deep learning models and conventional sequential models is made. Second, the findings are assessed against various samples of the benchmark data set that already exists. Findings indicate that when compared across a range of performance indicators, the suggested strategy performs better than the current forecasting methods: additionally, preprocessing, segmentation, and feature. In this study, engineering is utilised to comprehend the spatio-temporal correlations of the time series, as well as seasonal and trend features. This explains how LSTM models’ spatio-temporal instabilities are addressed. The goal of this study was to forecast daily and hourly levels of PM2.5 for the upcoming 30 days and 72 h as well as to ascertain how the weather impacted PM2.5 concentrations in Pakistan. For this predicting, LSTM and LSTM encoder–decoder machine learning and deep learning models were used [2]. This analysis also correctly forecasted the suggested day and hourly PM2.5 concentration. The LSTM encoder– decoder displayed the best efficiency with a mean absolute percentage error (MAPE) of 28.2, 15.07, and 42.1% daily, and 11.75, 9.5, and 7.4% hourly for numerous cities in Pakistan. For a fuller model comparison, this study need to have employed statistical tests such as the T-test, ANOVA, and F-test. The authors of this work have reviewed the literature and found various machine learning techniques to predict the AQI [9]. Through the literature review, the linear regression, LASSO regression, ridge regression, and SVR method were determined. They successfully trained the linear regression, LASSO regression, ridge regression, and SVR algorithm after preprocessing the data. Each model is constructed using the same set of data. In this study, the authors evaluated the models’ performance using MAE, RMSE, and r-squared error. With reduced MAE, root-mean-squared error, and greater r-squared, the models ridge regression and LASSO regression have performed better [20]. Here, in Table 1, the tabular format of the survey conducted on the prediction of air quality is shown. Table 1 gives detailed information about various research papers that have been referred. The table contains the authors’ names of various research papers, the data set they have considered, the methodology they have used, and the outcome of every methodology used. It also includes the limitations of their research.


Table 1 Survey on prediction of air quality by various researchers

1. Author: Sugandha Arora
   Input: AQI data set of five cities for 2015–2020
   Method: Recurrent neural network (RNN)
   Output: Comparison between predictions of LSTM and the proposed approach on different fractional orders. The performance of networks is measured using RMSE and MAPE
   Limitations: The authors declare that they have no conflicts of interest

2. Author: Khwaja Hassan, Mehmet Turan, Asadullah Shaikh, Jawad Rasheed
   Input: Air quality data from sensors at US embassies across Pakistan and meteorological data from the World Weather website
   Method: LSTM and LSTM encoder–decoder
   Output: The LSTM encoder–decoder had the best performance and successfully forecasted PM2.5 concentration with a mean absolute percentage error (MAPE)
   Limitations: This study should have used statistical tests such as the T-test, ANOVA, and F-test for a more thorough model comparison. Instead of only using MAPE, mean absolute error and root-mean-squared error would have provided a more in-depth analysis of the research

3. Author: Abdellatif Bekkar
   Input: Air quality data from the Beijing municipal Environmental Monitoring Center
   Method: Deep learning (DL)
   Output: Predicted the PM2.5 of air pollutants in the urban area of Beijing
   Limitations: It can be difficult to debug and be sensitive to initial conditions

4. Author: Avan Chowdary Gogineni, Vamsi Sri Naga Manikanta Murukonda
   Input: The data set, air quality data, and AQI of various cities in India
   Method: Deep learning
   Output: Conducted literature review and identified some machine learning algorithms to predict the AQI
   Limitations: We can further ensemble two or more machine learning algorithms and process large data to get more accurate results

5. Author: Nairita Sarkar, Rajan Gupta, Pankaj Kumar Keserwani, Mahesh Chandra Govil
   Input: Data set comprised of air quality data and meteorological data. 70–80% for training and 20–30% for testing
   Method: Using different hybrid models which are combinations of both ML and DL models
   Output: Proposed that LSTM–GRU models' performance is high in terms of R2, RMSE, MAE parameters
   Limitations: The authors declare that they have no conflicts of interest

6. Author: Ghufram Isam Drewil, Jabbar Al-Bhadili
   Input: Time series data for hours for a group of stations in India from 2017 to 2020
   Method: Designing a hybrid genetic algorithm LSTM
   Output: It will predict air quality for the next day by improving the prediction error in the LSTM algorithm
   Limitations: It can be difficult to debug and be sensitive to initial conditions

7. Author: Chen Ding, Zhouyi Zheng, Sirui Zheng
   Input: Real-time data from 2018 to 2020 recorded at the central square station of Xincheng district
   Method: Using adaptive GRU with a genetic algorithm
   Output: Optimal prediction results which contain a comparison of different pollutants
   Limitations: Due to insufficient data, all the factors influencing the air quality were not considered

8. Author: Jingyang Wang, Xiaolei Li, Lukai Jin, Jiazheng Li, Qiuhong Sun, Haiyong Wang
   Input: Air quality data from 00:00 on 1 January 2017, to 23:00 on 30 June 2021, in Shijiazhuang city, Hebei Province, China
   Method: Using a combined model of CNN and improved LSTM
   Output: Showed the decreased MAE values and the increased R2 values
   Limitations: Does not perform well in extreme values prediction

9. Author: Mauro Castelli
   Input: Large data set with more parameters and measurements, which can support, in particular, NO2 and PM2.5
   Method: Machine learning (SVM)
   Output: Presented a study of support vector regression (SVR) to forecast pollutant particulate levels and to correctly identify the AQI
   Limitations: We intend to improve and investigate the usage of SVR to forecast air quality

10. Author: Sheen Maclean
    Input: Air quality data from the Beijing Municipal Environmental Monitoring Center
    Method: ANN
    Output: Air pollution variables using ANNs have increased dramatically
    Limitations: Each step of the overall model development process of ANNs should be viewed as interconnected entities

11. Author: Chardin Hoyos Cordova, Rodrigo Salas, Paulo Canas, Romino Tores in 2021
    Input: Ten air quality monitoring stations located in the constitutional province of Callao and the north, south, east, and centre of Lima
    Method: MLP, LSTM, hold-out and the blocked nested cross-validation
    Output: LSTM recurrent ANN with BNCV adapts more precisely to critical pollution episodes and has better predictability performance for this type of environmental data
    Limitations: We expect to apply other variants of deep learning models that include incremental learning as well as to introduce self-identification techniques for the model identification

12. Author: Sagat V, Sreenidhi Rajagopal, Ranjani R, Rajasekhar Mohan
    Input: Two data sets from two different cities; Bengaluru being the metropolitan city and Amaravati being a small city
    Method: A long short-term memory (LSTM) recurrent neural network (RNN)
    Output: The forecasting model has an RMS error between 30 and 40 ppm in Bengaluru, whereas Amaravati has an RMS error between 0 and 5 ppm. It is also observed that data in Bengaluru, being a huge city, have high temporal variance compared to Amaravati, which is a small town
    Limitations: The model does not directly take factors such as wind, temperature, humidity, and weather conditions that affect pollutant concentration into account. Taking these factors into account directly would improve the performance of the model

3 Problem Analysis and Proposed Strategy LSTM recurrent neural networks are models that are well suited for time-series prediction tasks. In air quality research, LSTM models are often used to predict concentrations of pollutants such as PM2.5 or NO2 based on historical data. Univariate analysis involves using only the historical concentrations of a single pollutant to make predictions. In this approach, the LSTM model takes as input a time series of past pollutant concentrations and learns to predict the future concentration based on this history. This approach can be effective when the pollutant of interest is the main driver of air quality changes in the area. Bivariate analysis involves using the historical concentrations of multiple pollutants as input to the LSTM model. This approach takes into account the interactions between different pollutants and can provide more accurate predictions when multiple pollutants are affecting air quality. The air quality data set utilised in the study comprises hourly information on the weather and pollution levels at the US Embassy in Beijing, China, over five years. The data contain the date and time, the amount of pollution known as PM2.5, and


meteorological information such as dew point, temperature, pressure, wind direction, wind speed, and the total number of hours of snow and rain. Univariate and bivariate analysis approaches can detect correlations, trends, and patterns, resulting in interpretable and actionable data. Understanding the interdependence of distinct pollutants and meteorological conditions allows us to prioritise certain variables for future inquiry or modelling. These approaches can detect unexpected spikes in pollutant concentrations by comparing current measurements to historical trends or predetermined thresholds, alerting researchers or authorities to suspected pollution sources or hazardous circumstances. Bivariate and univariate analyses can also be combined with more advanced models or machine learning techniques. The RMSE measures the discrepancy between the predicted and actual pollutant concentration values. MAPE measures the deviation between the actual pollutant concentration levels and the predicted values: it is computed as the mean of the absolute differences between the predicted and actual values divided by the actual values, multiplied by 100 to obtain a percentage. A lower MAPE shows that the model is more accurate at forecasting air quality. Here, in Fig. 1, the flowchart of the proposed methodology is demonstrated. From Fig. 1, consider the analysis of the proposed methodology as discussed in the following sections.

3.1 Data Preprocessing Preprocessing data are an important stage in air quality prediction since it ensures that the data are clean, accurate, and fit for use in machine learning models. Generally, the accuracy of the results depends on the quality of the data utilised for air quality forecast. Data preparation approaches can assist guarantee that the data are fit for use in machine learning models, resulting in more accurate predictions as shown in Fig. 1.

3.1.1 Data Preprocessing by Replacing NA Values

In data preparation, replacing NA (not available) values refers to the process of filling in missing values in a data set with a substitute value. NA or missing values in a data set can arise for a variety of reasons, including insufficient data collection, data corruption, or equipment malfunction. Because most machine learning algorithms cannot accept missing data, replacing NA values is a necessary step in data preparation.
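A minimal sketch of this step with pandas is shown below; the file name and the "pollution" column name are assumptions used for illustration.

```python
# Replace missing pollutant readings before further processing.
import pandas as pd

df = pd.read_csv('air_quality.csv')          # hypothetical raw data set

# Fill missing pollutant readings with 0 (interpolation or the column mean
# are common alternatives)
df['pollution'] = df['pollution'].fillna(0)

# Alternatively, forward-fill from the previous hourly reading:
# df['pollution'] = df['pollution'].ffill()
```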


Fig. 1 Proposed methodology—flow chart

3.1.2 Data Preprocessing by Parsing Date–Time into Pandas

The pandas.to_datetime() method may be used to convert a date–time string into a pandas DataFrame index. This method converts strings or other data types into pandas datetime objects that can subsequently be used as the index of a DataFrame. First, an example DataFrame having a date–time column named 'date' and a value column called 'value' is established. Then pandas.to_datetime() is used to convert the 'date'


column to a pandas datetime object, which is then set as the DataFrame's index with df.set_index(). The date–time values are used as the index in the resultant DataFrame, and the 'value' column is kept.
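A minimal sketch of this parsing step is shown below; the example rows are illustrative placeholders, not values from the data set.

```python
# Parse the 'date' column and use it as the DataFrame index.
import pandas as pd

df = pd.DataFrame({
    'date': ['2014-01-01 00:00', '2014-01-01 01:00', '2014-01-01 02:00'],
    'value': [129.0, 148.0, 159.0],
})

df['date'] = pd.to_datetime(df['date'])    # convert strings to datetime objects
df = df.set_index('date')                  # use the datetime column as the index
print(df.head())
```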

3.1.3 Data Preprocessing by Specifying Clear Names for Each Column

In data preparation, specifying unambiguous names for each column refers to the practice of assigning relevant and descriptive labels to each column in a data set. Concise column names help users comprehend and analyse the data more easily, and they can also help to prevent errors or misunderstandings during analysis. When defining column names, it is critical to select names that appropriately reflect the column's contents.

3.2 Data Visualisation In machine learning, data visualisation refers to the use of graphical representations to highlight patterns, correlations, and trends in information. Data visualisation techniques are used to assist people in understanding and analysing big and complicated data sets, which is very beneficial in machine learning. Matplotlib, Seaborn, and Plotly are some prominent Python data visualisation packages for machine learning. These libraries offer a diverse set of visualisation approaches for creating useful and interesting representations of machine learning data.

3.2.1 Data Visualisation Using Boxplot

A boxplot, often called a box and whisker plot, is an illustration of the quartile distribution of a data set. It displays a data set’s five-number summary: the minimum, first quartile (Q1), median, third quartile (Q3), and highest value. Boxplots are useful for examining data distribution and finding outliers. It will display a box with lines extending from it, indicating the first (Q1) and third (Q3) quartiles, respectively. In Fig. 2, the visualisation of the distribution of the air quality data set that is being considered is displayed. In Fig. 2, the median is shown as a horizontal line. The box will include whiskers that extend to the lowest and highest values of the data, and any outliers will be displayed as individual points outside of the whiskers.
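A minimal boxplot sketch with pandas and Matplotlib is shown below; the file and column names reflect the cleaned data set assumed in the preceding steps and are illustrative.

```python
# Boxplots of the main features of the (assumed) cleaned data set.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('pollution_clean.csv', index_col=0)   # hypothetical cleaned data

df[['pollution', 'dew', 'temp', 'press', 'wnd_spd']].plot(
    kind='box', subplots=True, layout=(1, 5), figsize=(12, 4))
plt.tight_layout()
plt.show()
```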

3.2.2 Data Visualisation Using Correlation Matrix

Representation of the data set using correlation matrix is shown in Fig. 3. The coefficients on the relationship between items in a data set are called a correlation matrix.


Fig. 2 Distribution of air quality data set

Each table cell displays the correlation between two variables, with values ranging from − 1 to + 1. A perfect negative correlation is represented by a value of − 1, a perfect positive correlation by + 1, and no correlation by a value of 0. A correlation matrix can be shown graphically as a heat map, in which each cell is coloured in accordance with its value. The representation of the correlation between the variables is shown in Fig. 3.

Fig. 3 Correlation matrix representation of air quality data set


In Fig. 3, the heat map visualises the correlation between variables, making it simpler to see patterns and correlations in the data. Higher correlation coefficient cells will be tinted darker, whilst lower correlation coefficient cells will be coloured lighter.
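A minimal sketch of such a heat map using Seaborn is shown below; the file name is the same hypothetical cleaned data set assumed earlier.

```python
# Correlation heat map of the air quality features.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv('pollution_clean.csv', index_col=0)   # hypothetical cleaned data

corr = df.corr()                          # pairwise correlation coefficients
sns.heatmap(corr, annot=True, cmap='coolwarm', vmin=-1, vmax=1)
plt.title('Correlation matrix of air quality features')
plt.show()
```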

3.3 LSTM Data Presentation Each LSTM cell in an LSTM network has three gates: an input gate, a forget gate, and an output gate. These gates are connected in a chain. The output gate regulates how much of the cell's internal state is exposed, the forget gate regulates how much of the previous cell state is discarded, and the input gate regulates how much of the new input is permitted to flow through. The differentiating characteristic of LSTM is its ability to selectively remember or forget information over extended periods of time.

3.3.1 Normalising Data

Normalising data is a key stage in the preparation of LSTM data. Normalisation converts the data to a common scale, either between 0 and 1 or − 1 and 1, to make training and improving the model easier. Normalisation is vital because it guarantees that the input values are within a specified range and are neither too large nor too small, which otherwise can create problems such as vanishing or exploding gradients. These difficulties can make learning and generalisation from data challenging for the LSTM.
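A minimal sketch of scaling the feature matrix to the [0, 1] range is shown below; the placeholder array stands in for the pre-processed feature matrix.

```python
# Scale all features to [0, 1] before feeding them to the LSTM.
import numpy as np
from sklearn.preprocessing import MinMaxScaler

values = np.random.rand(100, 8).astype('float32')   # placeholder feature matrix

scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(values)               # all columns now in [0, 1]
```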

3.3.2 Transforming Data Set into Supervised Learning Problem

Another critical stage in LSTM data preparation is transforming data into a supervised learning issue. This entails turning a series of raw data into a collection of input– output pairs from which the LSTM model may learn. After transforming the data into a supervised learning problem, it may be utilised to train an LSTM model. By moving the window over the input data and using the learnt parameters to predict the output at each time step, the model may then make predictions on fresh sequences of data.
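A hedged sketch of framing the scaled series as a supervised problem with a one-step look-back is given below: each input window holds the previous time step and the target is the pollutant value at the current step; the window length and target column are assumptions.

```python
# Sliding-window framing of the scaled series for supervised learning.
import numpy as np

def to_supervised(data, n_lag=1):
    X, y = [], []
    for i in range(n_lag, len(data)):
        X.append(data[i - n_lag:i, :])   # window of past observations
        y.append(data[i, 0])             # target: pollutant value at current step
    return np.array(X), np.array(y)

# "scaled" is the normalised matrix from the previous step (placeholder here)
scaled = np.random.rand(100, 8).astype('float32')
X, y = to_supervised(scaled, n_lag=1)
print(X.shape, y.shape)                  # e.g. (99, 1, 8) and (99,)
```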

3.4 Fitting the Model

3.4.1 Splitting Data into Train and Test

This method aims to create systems that can be successfully generalised to new, unrecognised data. If the model works well on test data, it may work well on new


data. Generally, the data are randomly split into training and test sets, with a typical proportion of 80% for training and 20% for testing. However, the split ratio may change depending on the size and complexity of the information and the needs of the application.

3.4.2 Defining Three-Layer LSTM Architecture and Adding Dropout at 20% After Each Layer

Dropout is a regularisation method used in neural networks to assist avoid overfitting. It operates by randomly removing (setting to zero) part of the network’s neurons during training, driving the surviving neurons to acquire more robust representations of the input. Including a dropout layer after each LSTM layer in a neural network model can assist reduce overfitting and enhance the model’s generalisation performance. The dropout rate of 0.2 indicates that during training, 20% of the neurons in the LSTM layer will be randomly dropped out.
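A hedged sketch of such a three-layer LSTM with 20% dropout after each layer is shown below; the number of units per layer and the input shape (time steps, features) are illustrative assumptions consistent with the one-step framing above.

```python
# Three stacked LSTM layers, each followed by a 20% dropout layer.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

model = Sequential([
    LSTM(50, return_sequences=True, input_shape=(1, 8)),  # (time steps, features)
    Dropout(0.2),
    LSTM(50, return_sequences=True),
    Dropout(0.2),
    LSTM(50),
    Dropout(0.2),
    Dense(1),                     # single-value forecast of the pollutant level
])
model.compile(loss='mae', optimizer='adam')
model.summary()
```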

3.5 Evaluating the Model

3.5.1 Making Predictions

To evaluate metrics such as the MAE, the trained model's performance is assessed by generating predictions on a set of test data and comparing the predicted and actual results. The test input data are represented by X_test, whilst the corresponding real output data are represented by Y_test. For the test data, the predict method is used to obtain the anticipated output y_pred. After producing predictions, the model's performance can be assessed using an appropriate evaluation metric, such as mean squared error (MSE) for regression problems. The mean_squared_error function from the sklearn.metrics module may be used to compute the MSE between the actual and predicted outputs.

3.5.2 Invert Scaling

Scaling is a popular preprocessing approach in which the input features are scaled to a certain range, such as 0 to 1 or − 1 to 1. Generally, the model's performance needs to be tested on new, unseen data after training. First, a MinMaxScaler object is created and fitted to the training data X_train. The scaler is then used to convert the test data X_test to the scaled form X_test_scaled. The trained model is then used to generate predictions y_pred_scaled on the scaled test data. To return the predicted output to its original scale, the scaler's inverse_transform method can be used to conduct an inverse scaling operation.
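A hedged sketch of the inverse-scaling step is shown below. A dedicated scaler is kept for the target column (an assumption made here so that inverse_transform receives data with the shape it was fitted on); y_train, y_test, X_test, and the trained model are assumed to come from the earlier steps.

```python
# Map scaled predictions back to the original pollutant units.
import numpy as np
from sklearn.preprocessing import MinMaxScaler

y_scaler = MinMaxScaler(feature_range=(0, 1))
y_train_scaled = y_scaler.fit_transform(y_train.reshape(-1, 1))  # fit on training targets only
y_test_scaled = y_scaler.transform(y_test.reshape(-1, 1))

# Predictions come out on the scaled range used during training ...
y_pred_scaled = model.predict(X_test)

# ... and are mapped back to the original units of the target variable
y_pred = y_scaler.inverse_transform(y_pred_scaled)
y_true = y_scaler.inverse_transform(y_test_scaled)
```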

3.5.3 Calculating RMSE and MAPE

To generate the predicted output y_pred, first use the trained model to make predictions on the test data X_test. RMSE is computed using the mean_squared_error function from the sklearn.metrics package; the sqrt function is then used to take the square root of the mean squared error, yielding the RMSE. Finally, compute the MAPE as the mean absolute percentage error between the actual and predicted outputs, Y_test and y_pred: the np.abs function is used to calculate the absolute difference between the actual and predicted outputs, and the error is scaled to a percentage by dividing by Y_test and multiplying by 100. Both RMSE and MAPE are prominent evaluation measures for regression problems. RMSE represents the root-mean-squared difference between predicted and actual outputs, whereas MAPE represents the average percentage difference between the predicted and actual outputs.
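A minimal sketch of the two metrics, computed on the inverse-scaled arrays y_true and y_pred from the previous step, is given below.

```python
# RMSE and MAPE on the inverse-scaled predictions.
import numpy as np
from sklearn.metrics import mean_squared_error

rmse = np.sqrt(mean_squared_error(y_true, y_pred))
mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100   # mean absolute percentage error

print(f'RMSE: {rmse:.3f}, MAPE: {mape:.2f}%')
```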

4 Experimental Results Experimental results are conducted on air quality data set as shown in the following Figs. 4, 5, 6, 7, 8 and 9. In Fig. 4, the visualisation of the number of pollutants in the data using boxplot is shown. The correlation matrix for different features and the output are shown in Fig. 5. In Fig. 6, the output that represents a boxplot of single pollution attribute is shown. In Fig. 7, the depiction of the actual and predicted values with the help of graph plot along with its block of code is shown.

Fig. 4 Boxplot of various pollutants present in the data set in multivariate analysis


Fig. 5 The correlation matrix of various pollutants depicted from the data set in multivariate analysis

Fig. 6 The boxplot of a single attribute from the data set in univariate analysis since univariate analysis predicts by taking only one attribute from various attributes present

Fig. 7 The graph that is plot between the actual and predicted values in multivariate analysis after calculating the RMSE and MAPE


Fig. 8 Training the model in univariate analysis to assess the model’s performance

Fig. 9 Training the model in multivariate analysis where several attributes are taken into account

In Fig. 8, the model’s performance is shown. In Fig. 9, it is easy to find the pollutant level and the time series of the pollutant using multivariate analysis. This helps us to find the amount of particular pollutant level at a particular time.


5 Conclusion The use of recurrent neural networks for air quality prediction has shown promising results in recent research. RNNs are particularly well suited for time series data, such as air quality measurements, because they can capture temporal dependencies and patterns in the data. RNNs have also been used to model the effects of various factors on air quality, such as weather conditions and emissions from different sources, and recurrent architectures are increasingly capable of predicting time series information. In this article, various RNN algorithms were used to predict air quality. The accuracy of RNN air quality predictions depends on several factors. Here, air quality data sets collected from different cities are used to train an LSTM network to predict future pollutant levels, and better predictions were obtained by using the LSTM model. The analysis can be further improved by using other types of RNN architectures such as HRNN, HNN, and RMLP. One can also improve the quality of the input data and use more advanced preprocessing techniques to obtain uncertainty-free data.


Trust Aware Distributed Protocol for Malicious Node Detection in IoT-WSN S. Bhaskar, H. S. Shreehari, and B. N. Shobha

1 Introduction

An enormous number of communication nodes may connect to the internet using Internet of Things (IoT) technology. These nodes include actuators and/or sensors that enable minimal or no human interaction in processing and recovering data obtained from other systems. Many IoT applications have been deployed to enhance system performance and quality of life in the industrial, transportation, healthcare, and other sectors [1–3]. The growth of IoT has significantly impacted a number of fields. To accomplish the goals set out by smart services, IoT technology requires a number of procedures; smart activities allow gadgets to interact with the physical environment so that consumers can always receive the best service.

The number and variety of attacks have also increased as a result of significant technological advancement. In an effort to reduce the reliability of IoT devices and services, attackers frequently leverage the heterogeneity of IoT to inject false data and capture user behavior. Although reliability, security, and privacy are essential for successful IoT deployment, the diversity and dynamic nature of IoT applications, together with the lack of resources, have created major issues [5]. As a result, the model evaluates reliability and recognizes and distinguishes devices that do not behave alike. This provides a controlled IoT infrastructure process, avoids unpredictability and service interruptions [4–6], and helps to reduce the potential risks involved. Widespread adoption of IoT applications is unlikely without a strong security framework to restrict and reduce the impact of attacks [6, 7]. Thus, authentication and encryption processes are utilized to enable IoT security, and IoT security vulnerabilities can be reduced by utilizing strong authentication and encryption technologies. Authentication and encryption are essential for sending data securely between nodes, and as a result they create a first line of defense against attacks [8]. These systems are able to stop and identify external attacks, but they do not identify internal attacks or hostile nodes: by acquiring access to the network's shared key and initiating several attacks, inside attackers can get past these security measures [7].

By improving the neighbor weight-based approach for assessing trust, the researchers in [9] overcame the difficulty of identifying malicious nodes in WSN. Their algorithm determined the acceptable minimum level of node trust and routinely updated the node trust level. Using the Dempster–Shafer (D-S) evidence theory, a trust mechanism was developed to manage both indirect and direct trust of third-party nodes; it guaranteed the robustness of the network and the integrity of each data packet. In contrast, [10, 11] proposed a trust model that considers both direct and indirect trust levels as well as the internal threats that wireless sensor networks encounter. This model is designed around the computation of the trust degree. By incorporating regular changes to the degree of confidence, the technique reduced the network's energy usage and established a trust threshold; to assure network security and resilience, it can also distinguish malicious nodes. It evaluates message success rate, node latency, and accuracy as trust metrics in order to decrease the uncertainty of the traditional trust mechanism's results. A multi-attribute trust model is presented in [12], wherein each node's ultimate trust value is derived via fuzzy processing. A novel strategy for maintaining trust based on D-S evidence theory was proposed in [13]: by first examining the spatio-temporal correlation of data collected from nearby sensor nodes, the D-S theory is employed and a trust model is developed, and the overall level of trust is finally evaluated to spot malicious nodes.

To address the issues presented by a single detection function and the limitations of malicious node identification systems, a unique malicious node recognition model is developed to resist dangerous node behavior in existing WSNs. This model uses the Beta distribution of reputation and the indirect credibility of third-party nodes, and it additionally includes the trust levels corresponding to different attack strategies to ensure that hostile nodes are correctly recognized. Since the above mechanisms for malicious node detection tend to be complex and costly, this research work develops the TADP mechanism for efficient detection. The further contributions are as follows:

• TADP (Trust Aware Distributed Protocol) aims to develop trust among the nodes for data transmission; secure data aggregation is carried out for data transmission.
• Two distinctive trust-aware protocols are developed for the classification and misclassification of detected nodes in IoT-WSN.

The remainder of the paper is organized as follows: the first section discusses the background of IoT-WSN and its security-related issues along with existing mechanisms, the second section presents the proposed TADP mechanism, and the third section evaluates it.


2 Proposed Methodology

Wireless sensor networks have various IoT applications in which devices combined with sensors transmit data without human intervention. However, the security of data transmission remains an issue that requires constant attention and resolution. The sensors attached to the devices can easily be tampered with, and malicious data packets can be injected, so these packets must be identified. This study develops a TADP-based WSN that handles malicious data packets, leading to secure and verified data aggregation.

2.1 TADP System Model

A number of users $N = \{n_1, n_2, \ldots, n_r\}$ are deployed in the proposed model, each user having their own device. The $R$ users hold data $D = \{d_1, d_2, \ldots, d_r\}$. In this WSN, every device is considered either a malicious device or an honest device with respect to $d_l$. The model computation is calculated using the following equation:

$$M = \frac{1}{R} \sum_{l=1}^{R} d_l \qquad (1)$$

The network contains a Data Information Center (DIC) and two types of devices, namely trusted devices and malicious devices. Two different threats arise in this network: the quality of data aggregation and data privacy. For this, the computation is formulated as:

$$M = \sum_{l=1}^{p} z_l^{0} x_l^{0} \qquad (2)$$

2.2 Trust Aware Distributed and Secure Data Aggregation

The aggregated data is denoted as $y_j$. During security optimization of the data, extra unwanted noise, denoted as $\emptyset_j$, is introduced. The final data held by the WSN is given by Eq. (3):

$$\ddot{y}_j = y_j + \emptyset_j \qquad (3)$$

Considering Eqs. (2) and (3), the model computation that takes the unwanted noise into account is formulated as:

$$\ddot{M} = \sum_{l=1}^{r} z_l^{0} x_l^{0} \qquad (4)$$

This extra noise is approximated as $O^{0} + I^{2}$, where $O$ denotes the genuine data out of the entire data in the WSN. Further noise can be added to the network through a random generation function, denoted as $\theta(\cdot): T$, applied to the original data. This is expressed as:

$$\ddot{y}_j = \theta\!\left(y_j\right) = y_j + \emptyset_j \qquad (5)$$
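A minimal numerical sketch of this perturbed aggregation, assuming Gaussian noise and illustrative device counts (not the paper's implementation), is given below; it computes the clean aggregate of Eq. (1), the perturbed reports of Eq. (3), and the resulting deviation $\chi$ of Eq. (6).

# Minimal sketch (an assumption, not the paper's code): aggregation of device
# readings with per-device perturbation noise in the spirit of Eqs. (1), (3)-(6).
import numpy as np

rng = np.random.default_rng(1)
R = 100                                        # number of devices (illustrative)
d = rng.normal(loc=25.0, scale=2.0, size=R)    # true readings d_l

M = d.mean()                                   # Eq. (1): clean aggregate

noise = rng.normal(scale=0.5, size=R)          # per-device perturbation noise (stand-in for the noise term of Eq. (3))
d_perturbed = d + noise                        # each device reports its reading plus noise

M_noisy = d_perturbed.mean()                   # aggregate over the perturbed reports
chi = M - M_noisy                              # Eq. (6): deviation introduced by the noise

print(f"clean aggregate M = {M:.3f}, perturbed aggregate = {M_noisy:.3f}, chi = {chi:.4f}")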

Identifying the original data becomes challenging when the original and extra data are combined, which creates a trade-off between accuracy and data privacy. To resolve this issue, a parameter $\chi$ is used for the integrated mechanism, expressed as:

$$\chi = M - \ddot{M} \qquad (6)$$

Considering the above equation, the average parametric value yields more efficient accuracy of data aggregation. The number of device users is denoted as $n_l$, the extra parameter is $\Upsilon_j$, and its weights are given as $\omega_l$. The original data of the network, with and without the additional extra data, is calculated as:

$$\ddot{M} = \sum_{l=1}^{r} \omega_l \, n_l \quad \text{(original data without extra data)} \qquad (7)$$

$$\ddot{M} = M + \sum_{l=1}^{r} \omega_l \, \emptyset_j \quad \text{(with extra data)}$$

Further simplification gives:

$$X\!\left(\emptyset_j\right) = 2 \sum_{j=1}^{r} \left(z_j^{0}\right)^{2} I_l^{2} \qquad (8)$$

The value of $\emptyset_j$ is $2 I_l^{2}$; hence the simplification leads to Eq. (8). Furthermore, considering the probability distribution and substituting the probability bound $\vartheta$ with probability $r$ gives:

$$1 - r = 2 \chi^{-2} \sum_{l=1}^{o} I_l^{2} \left(z_l^{0}\right)^{2} \qquad (9)$$

As security is a major concern for the WSN, it is essential to set a security optimization parameter, where $I_j^{2}$ is equivalent to $\mu^{2} \Upsilon_j^{-1}$. This parameter is given as:

$$\chi = \left( \sum_{j=1}^{o} \left(z_l^{0}\right)^{2} \left(2 \Upsilon_j\right)^{-1} \right) \mu \left(1 - r\right)^{-1/2} \qquad (10)$$

The initial parameter used here for further simplification helps to increase the accuracy of the model. An optimization problem is formulated from the equations stated above. The devices upload data on the basis of the solution to this problem, which, simplified using the average parameter value, is given as:

$$\Upsilon_j = \left(z_l^{0}\right)^{2/3} S_j \left( \sum_{k=1}^{o} \left(z_l^{0}\right)^{-1} S_j^{1/3} \right)^{2} E^{-1/2} \qquad (11)$$

After simplification using the nominal parametric value,

$$\wp_{\hat{j}} = C \left(z_l^{0}\right)^{2/3} S_j \left( \sum_{k=1}^{o} \left(z_l^{0}\right)^{-1} S_j^{1/3} \right)^{2}$$

This optimization solution helps to resolve the privacy concerns of the users in the WSN. Data perturbation is a technique for enhancing data security in which extra unwanted data, also termed noise, is added to the original data in order to preserve user confidentiality. A user perturbs the data according to the extent to which the data should be preserved. Without the perturbed data, a device cannot store the initial user data, regardless of whether the WSN can be trusted, due to security concerns. The optimized model is designed around a function $K : \mathbb{R}^{M} \times \mathbb{R} \rightarrow \mathbb{R}$, where the multiplier is denoted as $\varsigma$. The equation is derived considering $M$ with respect to the nominal $\Upsilon_j$, expressed as:

$$M\!\left(\Upsilon_j, \varsigma\right) = \sum_{J=1}^{o} \left(z_l^{0}\right)^{2} \Upsilon_j^{-1} + \varsigma \sum_{h=1}^{N} S_L \left(\Upsilon_j^{2} - E\right) \qquad (12)$$

where $E = \sum_{J=1}^{o} S_L \Upsilon_j^{2}$.

Considering the nominal value, the following equation is proposed:

$$\Upsilon_j = \left( \frac{\left(z_l^{o}\right)^{2}}{2 \varsigma S_j} \right)^{1/3} \qquad (13)$$

On further simplifying the above equation and Eq. (11):

$$E \left(2 \varsigma\right)^{-1/3} = \sum_{l=1}^{o} \left(z_l^{o}\right)^{2/3} S_j^{1/3} \qquad (14)$$

2.3 Trust Protocol Optimizes the Misclassification of Node Identification

It is essential for the data aggregation process to be valid; to achieve this, the quality of data aggregation in the WSN must be verifiable. Malicious devices may well be present alongside trusted devices and thereby corrupt the entire network. To prevent this, malicious WSN devices must be detected and discarded in order to achieve a secure and trustworthy WSN. The detection of malicious devices follows a series of calculations that account for the possibility of malicious devices being mistaken for trustable devices. The energy constraint must also be considered while detecting malicious devices. The parametric value for data aggregation at its highest efficiency is denoted as $I_0$, while the parametric value for inefficient data aggregation is denoted as $I_1$. Devices may also be miscategorized, meaning that malicious devices are identified as trustable:

$$D_g = D\!\left(I_1 \mid I_0\right); \quad \text{honest nodes identified as honest}$$
$$D_l = D\!\left(I_0 \mid I_1\right); \quad \text{malicious nodes identified as honest} \qquad (15)$$

The test statistic and the decision test for categorized and miscategorized data are given by:

$$M = \left\| y_j^{l} - \hat{y}_j^{l} \right\|^{2}; \qquad M \underset{I_0}{\overset{I_1}{\gtrless}} \varphi \qquad (16)$$
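The following minimal sketch, an assumption rather than the authors' code, illustrates the residual test of Eq. (16): each node's reported window is compared with the aggregator's estimate, and nodes whose squared residual exceeds an illustrative threshold $\varphi$ are flagged as malicious.

# Minimal sketch (assumed, not the paper's implementation) of the test in Eq. (16).
import numpy as np

rng = np.random.default_rng(2)
n_nodes, window = 25, 50
estimate = np.sin(np.linspace(0, 6, window))        # aggregator's estimate y_hat

reports = estimate + rng.normal(scale=0.05, size=(n_nodes, window))
malicious = rng.choice(n_nodes, size=5, replace=False)
reports[malicious] += rng.normal(scale=1.0, size=(len(malicious), window))  # injected deviation

M_stat = ((reports - estimate) ** 2).sum(axis=1)    # M = ||y_j - y_hat_j||^2 per node
phi = 1.0                                           # illustrative threshold
flagged = np.where(M_stat > phi)[0]                 # decide I_1 (malicious) when M > phi

print("injected malicious nodes:", sorted(malicious.tolist()))
print("flagged by the test:     ", sorted(flagged.tolist()))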

This section also discusses the energy-related parameter and its constraints for the devices and the data packets being transferred: it is denoted by $E_1$ for honest devices and by $E_0$ for the packets transferred by malicious devices, where $E_1 > E_0 > 0$. When malicious devices are present in the WSN, the possibility of these devices being attacked and the various risks they introduce into the network must also be considered. The probability of attack is denoted by $P$ and the risk factor of the WSN is given as $R$. Therefore, the risk of attack that the WSN suffers is expressed as:

$$S(\varphi, r) = \left( E_1 \left(1 - R_g(\varphi)\right) - D R_g(\varphi) \right) \left( 1 - \sum_{j=1}^{O_o} r_j \right) + \left( E_0 \left(1 - R_0(\varphi)\right) - D R_g(\varphi) \right) \sum_{j=1}^{O_o} r_j \qquad (17)$$

2.4 Trust Aware Protocol to Detect and Remove the Malicious Node

The WSN is highly vulnerable to attacks because it is a wireless network. Since the vulnerability, and hence the threat of attacks, is high, it is essential to resolve this security problem by detecting the malicious devices. Detection alone is incomplete: destroying or eliminating the detected malicious devices is of utmost importance for maintaining the security of the network. This process is performed using sensors that are constantly updated within the $l$-th time interval, represented by the parameter $t_j^{l}$. The original parametric value is given as $z_l^{0}$. The deviating packets, together with their function, are given as $f_{Lw}$ over $\{f_l\}$, which is further abbreviated as $f$. A number of constraints must be followed for destroying data packets in the WSN:

First Protocol: If the function $f_l$ is less than or equal to $f_{Lw}$, the sensor is updated while the value of the function $f_m$ is reduced; this increases the reliability of the node.

Second Protocol: If the function $f_m$ is greater than $f_{Lw}$, the reliability is reduced in proportion to the increase in $f_m - f_{Lw}$. Consequently, if a certain device in the WSN is found to be malicious, its parametric value is nullified and the data packet is destroyed. The sensor update follows these two protocols; a simple sketch of the rule is given below.
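A minimal sketch of one possible reading of these two protocols (not the authors' implementation) follows: a per-node trust value is raised when the deviation function stays within the bound $f_{Lw}$ and reduced in proportion to the excess otherwise, and nodes whose trust reaches zero are removed.

# Minimal sketch (our interpretation of Sect. 2.4, not the authors' code); the
# step size, bound, and node values are illustrative.
def update_trust(trust, f_node, f_bound, step=0.05):
    """Return the updated trust value for one node."""
    if f_node <= f_bound:                    # First Protocol: behaviour within the bound
        return min(1.0, trust + step)
    # Second Protocol: reduce trust in proportion to (f_node - f_bound)
    return max(0.0, trust - step * (f_node - f_bound))

trust = {"n1": 0.8, "n2": 0.8, "n3": 0.8}
deviations = {"n1": 0.2, "n2": 1.5, "n3": 20.0}   # illustrative f_l values
f_bound = 1.0

trust = {k: update_trust(v, deviations[k], f_bound) for k, v in trust.items()}
alive = [k for k, v in trust.items() if v > 0.0]  # drop nodes judged malicious
print(trust, "-> remaining nodes:", alive)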

3 Performance Evaluation

The WSN copes with the enormous amount of redundant data in the network by using data aggregation as a solution. However, securing data aggregation is challenging because the network is exposed to a number of security threats and attacks, so a data aggregation mechanism that is secure, efficient, and reliable is developed. The simulated model consists of 100 sensor nodes along with three particular sets of malicious nodes, namely 10, 20, and 30 devices, which are used to assess the security, proof, and reliability of the proposed model. This performance evaluation considers the energy utilized for efficient data aggregation, and the parametric value for the count of non-functional devices is assumed to be the same throughout the evaluation. A comparative study is also carried out in which factors such as security, reliability, and efficiency are compared with previously existing models. The security and efficiency of the model are analyzed for various numbers of malicious devices, with efficiency assessed on the basis of the energy consumed and the devices that have failed in the WSN. The discarding and destruction of malicious devices is also compared with previously existing methodologies.

Fig. 1 Node identification

3.1 Identification

In this section, the malicious nodes are identified, and a comparison is made between the existing system and the proposed system by evaluating the correct identification of nodes with 15, 20, and 25 compromised nodes. Figure 1 shows this comparison: with 15 compromised nodes, the existing system detects 8 sensor nodes while the TADP model identifies 13; with 20 compromised nodes, the existing system identifies 8 nodes whereas the proposed model identifies 18; and with 25 compromised nodes, the existing system identifies 8 nodes whereas the TADP model identifies 22.

3.2 Misidentification

Figure 2 depicts the misidentification of nodes for 15, 20, and 25 sensor nodes. With 15 nodes, the existing model misidentifies 7 nodes whereas the TADP model misidentifies 2; with 20 nodes, the existing model misidentifies 12 nodes whereas the proposed model misidentifies 2; and with 25 nodes, the existing model misidentifies 17 nodes whereas the proposed model misidentifies 3.


Fig. 2 Node misidentification

Fig. 3 Throughput comparison

3.3 Throughput

Throughput is defined as the amount of work done in a specific amount of time and reflects a model's efficiency. In the case of 15 compromised nodes, the throughput of the existing model is 0.123 while that of the proposed model is 1.000; for 20 compromised nodes, the throughput of the existing model is 0.3 and that of the proposed model is 0.91; and for 25 compromised nodes, the throughput of the existing model is 0.33 and that of the proposed model is 0.93 (Fig. 3).

3.4 Comparative Analysis

This section presents the comparative analysis and shows the percentage improvement of the TADP model over the existing model. For node identification, the improvement is 4.72% for 15 nodes, 7.69% for 20 nodes, and 9.33% for 25 nodes. For misidentification, the improvement is 11.1% for 15 nodes, 14.2% for 20 nodes, and 14.0% for 25 nodes. For throughput, the improvement is 15.6% for 15 nodes, 10.8% for 20 nodes, and 7.09% for 25 nodes (Table 1).

Table 1 Comparison table

Comparison          15 (%)   20 (%)   25 (%)
Identification      4.72     7.69     9.33
Misidentification   11.1     14.2     14.0
Throughput          15.6     10.8     9.52

4 Conclusion

A large number of sensor nodes have been deployed as a result of the Internet of Things' rapid expansion, and the data gathered from these various sensors is integrated to achieve effective data transmission. This research study introduces the TADP model, which verifies sensor nodes and data before aggregation in an effort to protect the aggregation from any type of attack. The model is also assessed on its misclassification behavior by introducing 15, 20, and 25 compromised nodes. The TADP model achieves better performance in terms of throughput when compared with the existing aggregation approach. The use of data integrity solutions such as blockchain will play a crucial role in future research.

References

1. Luong NC, Hoang DT, Wang P, Niyato D, Kim DI, Han Z (2016) Data collection and wireless communication in Internet of Things (IoT) using economic analysis and pricing models: a survey. IEEE Commun Surveys Tuts 18(4):2546–2590. https://doi.org/10.1109/COMST.2016.2582841
2. Oriwoh E, Conrad M (2015) 'Things' in the internet of things: towards a definition. Int J Int Things 4(1):1–5
3. Alhandi SA, Kamaludin H, Alduais NAM (2023) Trust evaluation model in IoT environment: a comprehensive survey. IEEE Access 11:11165–11182. https://doi.org/10.1109/ACCESS.2023.3240990
4. Landaluce H, Arjona L, Perallos A, Falcone F, Angulo I, Muralter F (2020) A review of IoT sensing applications and challenges using RFID and wireless sensor networks. Sensors 20(9):2495. https://doi.org/10.3390/s20092495
5. Mon SFA, Winster SG, Ramesh R (2022) Trust model for IoT using cluster analysis: a centralized approach. Wireless Pers Commun 127(1):715–736. https://doi.org/10.1007/s11277-021-08401-7
6. Xia Y et al (2023) A trust-based reliable confident information coverage model of wireless sensor networks for intelligent transportation. IEEE Transact Veh Technol
7. Pandey D, Vandana K (2023) Impact of security attacks on congestion in wireless sensor networks. In: Intelligent cyber physical systems and internet of things: ICoICI. Springer International Publishing, Cham, pp 721–732
8. Sahu M, Nilambar S, Susanta KD (2023) A survey on detection of malicious nodes in wireless sensor networks. In: 2022 6th International conference on trends in electronics and informatics (ICOEI). IEEE
9. Zawaideh F, Salamah M, Al-Bahadili H (2017) A fair trust-based malicious node detection and isolation scheme for WSNs. In: Proceedings 2nd IT-DREPS, Amman, Jordan
10. Zeng LG, Yuan LY, Wang H (2018) Detecting WSN node misbehavior based on the trust mechanism. J Nat Sci 41(1):39–43
11. Su YX, Gao XF, Lu Y (2018) Credibility based WSN trust model. Electron Opt Control 25(3):32–36. https://doi.org/10.3969/j.issn.1671-637X.2018.03.008
12. Prabha VR, Latha P (2017) Fuzzy trust protocol for malicious node detection in wireless sensor networks. Wireless Pers Commun 94(4):2549–2559. https://doi.org/10.1007/s11277-016-3666-1
13. Zhang W, Zhu S, Tang J, Xiong N (2018) A novel trust management scheme based on Dempster-Shafer evidence theory for malicious nodes detection in wireless sensor networks. J Supercomput 74(4):1779–1801. https://doi.org/10.1007/s11227-017-2150-3

A Review on YOLOv8 and Its Advancements Mupparaju Sohan, Thotakura Sai Ram, and Ch. Venkata Rami Reddy

1 Introduction

Object detection is a fundamental task in computer vision that involves identifying and classifying objects present in images or videos [1]. It has numerous practical applications across various domains such as robotics, autonomous vehicles, surveillance, and augmented reality [2]. It is now widely used in industries such as transportation, mining, and construction, where it has significantly improved safety measures [3]. One such application uses computer vision algorithms to detect whether employees are in potentially dangerous situations by identifying whether they are wearing safety equipment, such as helmets, which can assure safety in dangerous areas [3]. Object detection can also be utilized in autonomous vehicles to detect pedestrians and other cars on the road [4], and in robotics to locate and recognize items for manipulation or interaction [5].

Over time, various techniques have been developed to tackle the object detection challenge, ranging from traditional machine-learning approaches to newer deep-learning models. Initially, object detection was approached as a pipeline consisting of three main steps: proposal generation, feature extraction, and region classification. Traditional machine-learning techniques [6–8], including support vector machines (SVM), were popular due to their success on small training datasets. However, these methods had limited effectiveness, and the improvements in detection accuracy were marginal. The emergence of deep learning has enabled a significant change in object detection, with deep convolutional neural networks (CNNs) playing a crucial role in transforming the field of computer vision and enabling the development of more precise and resilient object detection methods [9]. Deep neural networks can generate hierarchical features, capture information at different scales in different layers, and produce robust and discriminative features for classification.


This research study discusses the most recent YOLO model, YOLOv8, its development and implications for object detection, along with the gains in speed and accuracy that have emerged throughout the framework's development.

2 Existing Object Detection Models

Currently, deep learning-based object detection frameworks can be broadly classified into two families: two-stage detectors and one-stage detectors. Two-stage detectors, such as Region-based CNN (RCNN) and its variants, first generate object proposals and then classify each proposal. One-stage detectors, such as You Only Look Once (YOLO) and its variants, directly predict the presence and location of objects in a single step.

RCNN (Region-based Convolutional Neural Network). Proposed by Ross Girshick et al. in 2014, RCNN is a two-stage object detection model [10]. It first generates region proposals using a selective search algorithm and then extracts features from these regions using a CNN. Finally, the extracted features are fed into an SVM for object classification. The main limitation of RCNN is its slow training and inference speed due to its two-stage approach and the selective search algorithm.

SPPNet (Spatial Pyramid Pooling Network). Proposed by Kaiming He et al. in 2014 [11], SPPNet is an improvement over RCNN that addresses the slow speed issue. It introduces Spatial Pyramid Pooling (SPP) to enable the network to take inputs of arbitrary sizes and output fixed-length feature vectors, thus eliminating the need for cropping or warping the input image. However, it still relies on a selective search algorithm for region proposals, which limits its performance.

Fast RCNN. Proposed by Ross Girshick in 2015 [12], Fast RCNN is an improvement over RCNN and SPPNet that eliminates the separate feature extraction step by introducing an RoI pooling layer that shares the feature map across all RoIs (regions of interest). This significantly reduces the computation time and improves accuracy compared to RCNN and SPPNet.

Faster RCNN. Proposed by Shaoqing Ren et al. in 2015 [13], Faster RCNN is a further improvement over Fast RCNN that replaces the selective search algorithm with a Region Proposal Network (RPN) to generate region proposals in a single forward pass. This leads to a significant reduction in computation time and improves the accuracy of object detection.

FPN (Feature Pyramid Networks). Proposed by Tsung-Yi Lin et al. in 2017 [14], FPN is an extension of Faster RCNN that addresses the issue of detecting objects at different scales. It introduces a top-down pathway and lateral connections that combine features at different levels of a CNN to form a feature pyramid. This enables the network to detect objects at different scales and improves the accuracy of object detection.

YOLOv1. Proposed by Joseph Redmon et al. in 2015 [15], YOLOv1 is a one-stage object detection model that uses a single convolutional neural network to predict object classes and bounding boxes directly from full images. It divides the image into a grid of cells and predicts multiple bounding boxes and object classes for each cell. The main limitation of YOLOv1 is its poor performance on small objects.

SSD (Single Shot MultiBox Detector). Proposed by Wei Liu et al. in 2016 [16], SSD is also a one-stage object detection model that uses a similar approach to YOLOv1. However, it introduces additional convolutional layers to predict object categories and offsets for default boxes of different scales and aspect ratios, which enables it to better handle objects at different scales and aspect ratios.

YOLOv2. Proposed by Joseph Redmon and Ali Farhadi in 2017 [17], YOLOv2 is an improvement over YOLOv1 that addresses its limitations. It introduces anchor boxes and batch normalization to improve the accuracy of object detection: anchor boxes are used to better handle objects at different scales and aspect ratios, while batch normalization improves the stability of the network during training.

YOLOv3. Proposed by Joseph Redmon and Ali Farhadi in 2018 [18], YOLOv3 is a further improvement over YOLOv2 that introduces a number of changes to improve accuracy and speed. It uses a feature pyramid network and predicts object categories and bounding boxes at three different scales to better handle objects of different sizes. It also introduces new techniques such as multi-scale training and dynamic anchor assignment to improve the accuracy of object detection.

YOLOv4. Proposed by Alexey Bochkovskiy et al. in 2020 [19], YOLOv4 is a significant improvement over YOLOv3 that introduces a number of new techniques to improve both accuracy and speed. It uses a CSPDarknet backbone and introduces techniques such as spatial attention, the Mish activation function, and the GIoU loss to improve accuracy. It also introduces new training techniques such as the self-attention mechanism and Mosaic data augmentation to improve the robustness of the network.

YOLOv5. Proposed by Glenn Jocher et al. in 2020 [20], YOLOv5 is another significant improvement over YOLOv3 that introduces a new architecture and new techniques to improve accuracy and speed. It uses a novel backbone architecture and introduces techniques such as focal loss, label smoothing, and auto-augmentation to improve accuracy. It also introduces new training techniques such as CutMix data augmentation and learning-rate schedulers to improve the convergence rate of the network.

YOLOv6. Developed by Meituan researchers in 2022 [21], YOLOv6 is an object detection model designed primarily for industrial applications. Its hardware-efficient design and improved performance surpass YOLOv5 in terms of both detection accuracy and inference speed. It uses an EfficientRep backbone and techniques such as an anchor-free paradigm, SimOTA tag assignment, and the SIoU box-regression loss to improve speed and accuracy.

YOLOv7. Proposed by Chien-Yao Wang and Alexey Bochkovskiy in 2022 [22], YOLOv7 is an improvement over Scaled-YOLOv4 and YOLOR, which are based on YOLOv4. It introduces architectural reforms such as E-ELAN and techniques such as model scaling, re-parameterization planning, and coarse-to-fine auxiliary head supervision to improve its efficiency.


Fig. 1 Timeline of YOLO advancements

3 Overview of YOLOv8

YOLOv8 is the latest version [23, 24] of the YOLO (You Only Look Once) models, which are popular for their accuracy and compact size. It is a state-of-the-art model that can be trained on powerful or low-end hardware, or alternatively trained and deployed in the cloud. The first YOLO model was introduced in a C repository called Darknet in 2015 by Joseph Redmon [15] while he was a PhD student at the University of Washington, and it has since been developed by the community into subsequent versions. The timeline of YOLO advancements is shown in Fig. 1.

YOLOv8 was developed by Ultralytics, the team known for its innovative YOLOv5 model [20], and was introduced on 10 January 2023. YOLOv8 is used to detect objects in images, classify images, and distinguish objects from each other. Ultralytics has made numerous enhancements to YOLOv8, making it better and more user-friendly than YOLOv5. It improves upon the success of YOLOv5 by incorporating modifications that enhance its power and usability in various computer vision tasks. These enhancements include a modified backbone network, an anchor-free detection head, and a new loss function; it also provides built-in support for image classification tasks. YOLOv8 is distinctive in that it delivers unmatched speed and accuracy performance while maintaining a streamlined design that makes it suitable for different applications and easy to adapt to various hardware platforms.

4 Architecture of YOLOv8

As of the recent research literature, there is no peer-reviewed publication on YOLOv8 yet, so detailed insights into the research techniques and studies conducted during its development are unavailable. However, an analysis of the YOLOv8 repository [24] and its documentation [23] relative to its predecessor YOLOv5 [20] reveals several key features and architectural improvements.


5 Architecture Components

The YOLOv8 architecture is composed of two major parts, namely the backbone and the head, both of which use a fully convolutional neural network.

Backbone. YOLOv8 features a new backbone network, a modified version of the CSPDarknet53 architecture [26], which consists of 53 convolutional layers and employs cross-stage partial connections to enhance the transmission of information across the various levels of the network. The backbone consists of multiple convolutional layers organized sequentially that extract relevant features from the input image. The new C2f module integrates high-level features with contextual information to enhance detection accuracy, while the SPPF (spatial pyramid pooling fast) [11] module and the following convolution layers process features at various scales [25].

Head. The head takes the feature maps produced by the backbone and further processes them to produce the model's final output in the form of bounding boxes and object classes. In YOLOv8, the head is designed to be decoupled, meaning that it handles objectness scores, classification, and regression in independent branches. This approach allows each branch to focus on its own task while improving the model's overall accuracy. The U (upsample) layers in Fig. 2 increase the resolution of the feature maps [25]. The head uses a sequence of convolutional layers to analyze the feature maps, followed by a linear layer for predicting the bounding boxes and class probabilities. The head's design is optimized for speed and accuracy, with special consideration given to the number of channels and kernel sizes of each layer to maximize performance.

Fig. 2 YOLOv8 architecture visualization, arrows represent data flow between layers


Finally, the detection module uses a set of convolution and linear layers to map the high-dimensional features to the output bounding boxes and object classes. The entire structure is designed to be quick and effective, yet it maintains high precision in object detection.

6 Architectural Advancements

Anchor-Free Detection. Similar to YOLOv6 and YOLOv7, YOLOv8 does not rely on anchors: it predicts the centre of an object directly rather than the offset from a known anchor box. Anchor boxes were a well-known challenging aspect of earlier YOLO models (YOLOv5 and earlier), since they could represent the target benchmark's box distribution but not the distribution of a custom dataset. Anchor-free detection reduces the number of box predictions, which speeds up Non-Maximum Suppression (NMS), a complex post-processing phase that sifts through candidate detections after inference [27].

New Convolution Layer. The convolutional (conv) layers in the YOLO architecture are responsible for detecting features in input images using learnable filters. These layers detect features at different scales and resolutions, allowing the network to detect objects of varying sizes and shapes. Their output is then passed through further layers to generate bounding boxes and class predictions for each detected object. Unlike YOLOv5, YOLOv8 uses a different convolution layer called C2f, which replaces the C3 layer of YOLOv5. The C2f layer in YOLOv8 concatenates the outputs of all the bottleneck layers, whereas the C3 layer of YOLOv5 utilizes only the output of the last bottleneck layer. The C3 module of YOLOv5, with n bottleneck layers, is shown in Fig. 3 (ConvBNSiLU is a block composed of a Conv, a BatchNorm, and a SiLU layer), and the C2f module of YOLOv8 is shown in Fig. 4 (Conv is a block composed of a Conv2d, a BatchNorm, and a SiLU layer); a simplified sketch of a C2f-style block follows Fig. 4. The bottleneck itself is similar to that of YOLOv5, but the kernel size of its first convolution layer has been increased from 1 × 1 to 3 × 3, similar to the ResNet block described in 2015. In the neck, features are concatenated without requiring them to have the same channel dimensions, which reduces the number of parameters and the total size of the tensors.

Fig. 3 C3 Module of YOLOv5, the number of bottleneck layers is n. ConvBNSiLU is a block composed of a Conv, a BatchNorm, and a SiLU layer

Fig. 4 C2f Module of YOLOv8, Conv is a block composed of a Conv2d, a BatchNorm, and a SiLU layer
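The simplified PyTorch sketch below illustrates a C2f-style block as described above; it is an assumption based on that description, not the exact Ultralytics implementation, and the layer sizes are illustrative.

# Simplified sketch of a C2f-style block (not the exact Ultralytics code).
import torch
import torch.nn as nn

class ConvBNSiLU(nn.Module):
    """Conv2d + BatchNorm + SiLU, as in the module descriptions above."""
    def __init__(self, c_in, c_out, k=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()
    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class Bottleneck(nn.Module):
    """Residual bottleneck whose first convolution uses a 3x3 kernel."""
    def __init__(self, c):
        super().__init__()
        self.cv1 = ConvBNSiLU(c, c, k=3)
        self.cv2 = ConvBNSiLU(c, c, k=3)
    def forward(self, x):
        return x + self.cv2(self.cv1(x))

class C2fLike(nn.Module):
    """Split the input, run n bottlenecks, and concatenate every intermediate output."""
    def __init__(self, c_in, c_out, n=2):
        super().__init__()
        self.c = c_out // 2
        self.cv1 = ConvBNSiLU(c_in, 2 * self.c)
        self.blocks = nn.ModuleList(Bottleneck(self.c) for _ in range(n))
        self.cv2 = ConvBNSiLU((2 + n) * self.c, c_out)
    def forward(self, x):
        y = list(self.cv1(x).chunk(2, dim=1))    # split into two halves
        for m in self.blocks:
            y.append(m(y[-1]))                   # keep every bottleneck output
        return self.cv2(torch.cat(y, dim=1))     # concatenate all of them

print(C2fLike(64, 128, n=2)(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 128, 32, 32])

The key design difference from a C3-style block is visible in the forward pass: every bottleneck output is kept and concatenated before the final 1 × 1 convolution, rather than only the last one.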


7 Training and Inference

7.1 Downloadable Python Package via Pip

YOLOv8 can now be installed through a pip package, making it easy for users to install and manage YOLOv8 for training and inference. This simplifies the installation process and allows for easy updates and compatibility with other Python libraries. Users can simply use the pip package manager to install YOLOv8 and start using it for their computer vision tasks, further increasing the accessibility and usability of the model. YOLOv8 can also be installed from source on GitHub.

8 Command Line Interface (CLI)

One of the most useful features of YOLOv8 is that its ultralytics package ships with a CLI, which supports simple single-line commands without requiring any customization or Python code. The CLI provides options for specifying dataset paths, model architecture, training parameters, and output directories, allowing users to easily customize the training process to their specific requirements. Compared to previous versions of YOLO, the CLI in YOLOv8 offers additional options for fine-tuning, model evaluation, and distributed training, providing enhanced flexibility and control over the training process.
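As a rough illustration, the snippet below launches one such single-line command from Python via subprocess (the same line can be typed directly in a terminal). The yolo detect train argument style follows the Ultralytics documentation, but the dataset file, epoch count, and image size are illustrative assumptions.

# Hedged sketch: invoking the Ultralytics CLI from Python; equivalent to running
# "yolo detect train data=coco128.yaml model=yolov8n.pt epochs=10 imgsz=640" in a shell.
import subprocess

cmd = [
    "yolo", "detect", "train",
    "data=coco128.yaml", "model=yolov8n.pt", "epochs=10", "imgsz=640",
]
subprocess.run(cmd, check=True)   # raises if the yolo CLI is not installed or the run fails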

9 YOLOv8 Python SDK

The YOLOv8 model also comes with a Pythonic Model and Trainer interface, making it easy to integrate the model into custom Python scripts with just a few lines of code. This enables users to leverage YOLOv8 for object detection, image classification, and instance segmentation tasks with minimal effort. This streamlined integration is a significant advancement, as it eliminates the need for complex configurations or lengthy setup procedures, making YOLOv8 more accessible and convenient for developers to use in their Python-based projects.
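A hedged example of this interface is shown below; the API calls follow the Ultralytics documentation at the time of writing, while the weight file and image path are placeholders.

# Hedged example of the Python interface (package installed with: pip install ultralytics).
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                      # load a pretrained nano detection model
results = model.predict("bus.jpg", conf=0.25)   # run inference on an image file

for r in results:
    for box in r.boxes:                         # one entry per detected object
        cls_id = int(box.cls[0])
        print(model.names[cls_id], float(box.conf[0]), box.xyxy[0].tolist())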

10 YOLOv8 Tasks and Modes

The YOLOv8 framework can be used to perform computer vision tasks such as detection, segmentation, classification, and pose estimation, and it comes with pretrained models for each task. The pretrained models for detection, segmentation, and pose are pretrained on the COCO dataset [25, 26], while the classification models are pretrained on the ImageNet dataset. YOLOv8 introduces scaled versions such as YOLOv8n (nano), YOLOv8s (small), YOLOv8m (medium), YOLOv8l (large), and YOLOv8x (extra large). These versions provide different model sizes and capabilities, catering to various requirements and use scenarios. For segmentation, classification, and pose estimation, the scaled versions use the suffixes -seg, -cls, and -pose, respectively. These tasks do not require additional commands or scripts for producing masks, contours, or image classifications. With a well-labelled and sufficiently large dataset, the accuracy can be high; using a GPU rather than a CPU is also recommended for training, as it further enhances performance by decreasing computation time.

YOLOv8 offers multiple modes that can be used either through the command line interface (CLI) or through Python scripting, allowing users to perform different tasks based on their specific needs and requirements (a short usage sketch follows this list). These modes are:

Train. This mode is used to train a custom model on a dataset with specified hyperparameters. During the training process, YOLOv8 employs adaptive techniques to optimize the learning rate and balance the loss function, which leads to enhanced model performance.

Val. This mode is used to evaluate a trained model on a validation set to measure its accuracy and generalization performance. It can help in tuning the hyperparameters of the model for improved performance.

Predict. This mode is used to make predictions with a trained model on new images or videos. The model is loaded from a checkpoint file, and users can supply images or videos for inference; the model predicts object classes and locations in the input.

Export. This mode is used to convert a trained model to a format suitable for deployment in other software applications or on hardware devices, which is useful for production environments. Commonly used YOLOv8 export formats are PyTorch, TorchScript, TensorRT, CoreML, and PaddlePaddle.

Track. This mode is used to perform real-time object tracking in live video streams. The model is loaded from a checkpoint file and can be used for applications such as surveillance systems or self-driving cars.

Benchmark. This mode is used to profile the performance of different export formats in terms of speed and accuracy. It reports the size of the exported format, mAP50-95 metrics for object detection, segmentation, and pose, or accuracy_top5 metrics for classification, as well as the inference time per image. This enables users to select the most suitable export format for their particular use case, considering their requirements for speed and accuracy.
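The short sketch below strings together the Train, Val, and Export modes through the Python interface; the dataset YAML, epoch count, and choice of ONNX as the export format are illustrative assumptions.

# Hedged sketch of the Train, Val, and Export modes via the Python API.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                                # start from a pretrained checkpoint
model.train(data="coco128.yaml", epochs=10, imgsz=640)    # Train mode
metrics = model.val()                                     # Val mode: returns mAP-style metrics
export_path = model.export(format="onnx")                 # Export mode (ONNX chosen as an example)
print(metrics.box.map, export_path)                       # mAP50-95 and path to the exported file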


11 User Experience (UX) Enhancements

YOLOv8 improves on prior versions by introducing new modules such as spatial attention, feature fusion, and context aggregation. Each YOLO task has its own Trainers, Validators, and Predictors, which can be customized to support custom user tasks or research and development ideas. Callbacks are provided as points of entry at strategic stages during the train, val, export, and predict modes; each callback receives a trainer, validator, or predictor object depending on the type of operation. The model's performance, speed, and accuracy are strongly influenced by YOLO settings, hyperparameters, and augmentation, which also affect model behaviour at various stages of development, such as training, validation, and prediction. These can be configured through either the CLI or Python scripts.
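A hedged sketch of the callback mechanism is given below; the add_callback method and the on_train_epoch_end hook name follow the Ultralytics documentation but should be verified against the installed version, and the training arguments are illustrative.

# Hedged sketch: registering a callback that fires at the end of every training epoch.
from ultralytics import YOLO

def log_epoch(trainer):
    # The callback receives the trainer object for the current run, as described above.
    print(f"finished epoch {trainer.epoch}")

model = YOLO("yolov8n.pt")
model.add_callback("on_train_epoch_end", log_epoch)
model.train(data="coco128.yaml", epochs=3, imgsz=640)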

12 Performance Evaluation

12.1 Object Detection Metrics

mAPval. mAPval stands for Mean Average Precision on the validation set and is a popular metric for evaluating the accuracy of an object detection model. Average Precision (AP) is calculated for each class in the validation set, and the mean is then taken across all classes to obtain the mAPval score. A higher mAPval score indicates better accuracy in detecting objects of different classes in the validation set.

Speed CPU ONNX. This relates to the model's speed when running on a CPU (Central Processing Unit) using the ONNX (Open Neural Network Exchange) runtime. ONNX is a prominent deep-learning model representation format, and model speed can be quantified in terms of inference time or frames per second (FPS). Faster inference on a CPU (lower time per image, or higher FPS) can be important for real-time or near-real-time applications.

Speed A100 TensorRT. This refers to the speed of the model when running on an A100 GPU (Graphics Processing Unit) using TensorRT, an optimization library developed by NVIDIA for deep-learning inference. As with Speed CPU ONNX, speed can be measured in terms of inference time or FPS; faster inference on a powerful GPU is beneficial for applications that require high throughput or real-time processing.

Latency A100 TensorRT FP16 (ms/img). This refers to the latency or inference time of the model when running on an NVIDIA A100 GPU with TensorRT optimization, using the FP16 (half-precision floating point) data type. It indicates how much time the model takes to process a single image, typically measured in milliseconds per image (ms/img). Lower values indicate faster inference times, which are desirable for real-time or low-latency applications.

Params (M). Params (M) refers to the number of model parameters in millions. It represents the size of the model; larger models generally have more capacity for learning complex patterns but may also require more computational resources for training and inference.

FLOPs (B). FLOPs (B) stands for floating-point operations in billions. It is a measure of the computational complexity of the model, indicating the number of floating-point operations the model performs during a single inference. Lower FLOPs (B) values indicate less computational complexity and can be desirable for resource-constrained environments, while higher values indicate more computational complexity and may require more powerful hardware for efficient inference.

Table 1 Performance of YOLOv8 pretrained models for detection

Model      Size (pixels)   mAPval 50–95   Speed CPU ONNX (ms)   Speed A100 TensorRT (ms)   Params (M)   FLOPs (B)
YOLOv8n    640             37.3           80.4                  0.99                       3.2          8.7
YOLOv8s    640             44.9           128.4                 1.20                       11.2         28.6
YOLOv8m    640             50.2           234.7                 1.83                       25.9         78.9
YOLOv8l    640             52.9           375.2                 2.39                       43.7         165.2
YOLOv8x    640             53.9           479.1                 3.53                       68.2         257.8

Sourced from the Ultralytics GitHub Repository (https://github.com/ultralytics/ultralytics)

13 Benchmark Datasets and Computational Efficiency

Performance of YOLOv8 on COCO. The COCO val2017 dataset [28, 29] is a commonly used benchmark for evaluating object detection models. It consists of a large collection of more than 5000 diverse images with 80 object categories, and it provides annotations for object instances, object categories, and other relevant information. The dataset is an industry-standard benchmark for object detection performance and for comparing the accuracy and speed of different object detection models. The performance of the YOLOv8 pretrained models is shown in Table 1; the training and results were sourced from the Ultralytics GitHub repository [24] (https://github.com/ultralytics/ultralytics). All scaled versions of YOLOv8, along with the previous YOLO versions YOLOv5 and YOLOv7, were trained on COCO. Here, mAPval values are for single-model, single-scale evaluation on the COCO val2017 dataset, and speed is averaged over COCO val images using an Amazon EC2 P4d instance. A performance comparison of all scaled versions of YOLOv8, YOLOv7, YOLOv6, and YOLOv5 is shown in Fig. 5.


Fig. 5 Performance comparison of all scaled versions of YOLOv8, YOLOv7, YOLOv6, and YOLOv5 (Source: GitHub repository "ultralytics/ultralytics" by Ultralytics, https://github.com/ultralytics/ultralytics)

Performance of YOLOv8 on RF100. The Roboflow 100 (RF100) dataset [30, 31] is a diverse, multi-domain benchmark comprising 100 datasets created from over 90,000 public datasets and 60 million public images from the Roboflow Universe, a web application for computer vision practitioners. The dataset aims to provide a more comprehensive evaluation of object detection models by offering a wide range of real-life domains, including satellite, microscopic, and gaming images. With RF100, researchers can test their models' generalizability on semantically diverse data. YOLOv8 is evaluated on the RF100 benchmark alongside YOLOv5 and YOLOv7. mAP@0.5 is a specific version of the mAP metric that measures the average precision of a model at an intersection-over-union (IoU) threshold of 0.5; in other words, it measures how well the model detects objects when its predicted boxes overlap the ground truth by at least 50%. The process and results were sourced from the Roboflow blog [32] (https://blog.roboflow.com/whats-new-in-yolov8/). Small versions of each model are trained for a total of 100 epochs, and to minimize the effect of random initialization and ensure reproducibility, each experiment is run using a single seed.


Fig. 6 Box plots show each model's mAP@0.5 (Source: "What's New in YOLOv8?" by Jacob Solawetz and Francesco, published on the Roboflow blog on March 27, 2023, https://blog.roboflow.com/whats-new-in-yolov8/)

According to the box plot in Fig. 6, YOLOv8 performed better than YOLOv7 and YOLOv5 on the Roboflow 100 benchmark in terms of mAP and had fewer outliers, indicating more consistent performance. The analytical results, which compare YOLOv5 with YOLOv8 on domain-specific tasks, show that YOLOv8 outperforms YOLOv5: specifically, YOLOv8 obtained a mean average precision (mAP) score of 80.2% on Roboflow 100, while YOLOv5's score was 73.5%. These findings suggest that YOLOv8 is significantly more effective than YOLOv5 for domain-specific tasks. The bar plot in Fig. 7 shows the performance of YOLOv8, YOLOv7, and YOLOv5 evaluated on each category of the RF100 dataset, with small versions of each model trained for 100 epochs, as was done for the whole RF100 dataset above. From the plots in Fig. 7, it is observed that the mAP of YOLOv8 is similar to or significantly better than that of its previous versions, and that YOLOv8 outperforms YOLOv5 and YOLOv7 in each category of the dataset. Overall, YOLOv8 is observed to be not only faster and more accurate than previous versions, but also to require fewer parameters to attain this performance.


Fig. 7 Bar plot shows the average mAP@0.5 for each RF100 category (Source: "What's New in YOLOv8?" by Jacob Solawetz and Francesco, published on the Roboflow blog on March 27, 2023, https://blog.roboflow.com/whats-new-in-yolov8/)

14 Applications and Use Cases

Autonomous Vehicles. YOLOv8's real-time object detection capabilities make it a powerful tool for enhancing the safety of autonomous vehicles. By accurately detecting and tracking other vehicles, pedestrians, and traffic signals, YOLOv8 can help self-driving cars navigate complex traffic scenarios with greater efficiency and safety.

Medical Imaging. YOLOv8 can be used in medical imaging to detect and classify numerous anomalies and diseases such as cancer, tumours, and fractures. It can also be utilized for surgical planning and guidance, as well as tracking medical tools in real time during surgery. YOLOv8's improved accuracy and speed can assist medical practitioners in making faster and more accurate diagnoses, thereby improving patient outcomes.

Manufacturing. In manufacturing, YOLOv8 can identify product flaws by detecting deviations in shape, size, or colour. It can also verify that the appropriate parts are being used in assembly-line processes and monitor inventory levels to prevent shortages.

Security. YOLOv8 can be used for surveillance to identify individuals and objects in restricted areas, enabling the detection of intruders and unauthorized access. It can also monitor crowd movement and traffic flow in congested public spaces such as airports and train stations. Additionally, it can detect potentially threatening behaviour to aid in identifying security risks.

Sports Analysis. The sports analysis field can utilize YOLOv8 to track the movements of players, detect the location of the ball, and classify the actions of players. This data can help assess player performance, formulate game plans, and identify areas that require improvement.

Autonomous Drones. Autonomous drones can leverage YOLOv8 for detecting and tracking objects, including moving targets, to enable applications like surveillance, search and rescue, and structural or infrastructure inspection.

Agriculture. In agriculture, YOLOv8 can track crop growth, detect crop diseases, and recognize pests. It can also facilitate precision agriculture by identifying areas of a field that require varying amounts of water or fertilizer. By providing faster and more precise data, YOLOv8 can support farmers in making more informed decisions, increasing crop yields, and decreasing waste.

Robotics. In the domain of robotics, YOLOv8 can assist robots in recognizing and interacting with objects in their surroundings. It can facilitate object tracking, manipulation, real-time navigation, and obstacle avoidance. By providing greater speed and precision, YOLOv8 can enable robots to undertake more intricate tasks, including warehouse automation, manufacturing, and search and rescue missions.

Environmental Monitoring. YOLOv8 can identify and categorize animals, track deforestation, and monitor pollution levels, making it useful for environmental monitoring. This data can facilitate the development of conservation plans, highlight environmental concerns, and assess the effects of human activity on the environment.

Traffic Management. YOLOv8 can assist traffic management by recognizing and tracking vehicles, monitoring traffic congestion, and managing traffic lights. These applications can contribute to decreasing the occurrence of traffic accidents, improving traffic flow, and reducing the amount of time commuters spend travelling; YOLOv8 can be instrumental in achieving these objectives.

From the above applications, it can be observed that YOLOv8 is well suited to a wide range of computer vision applications because of its improved speed and precision. It can be utilised in both traditional applications such as medical imaging and surveillance as well as emerging ones such as gaming and augmented reality.

15 Conclusion In this study, YOLOv8, its architecture and advancements along with an analysis of its performance has been discussed on various datasets by comparing it with previous models of YOLO. The introduction of YOLO v8 is a noteworthy achievement in the research progress of object detection models. YOLO’s latest edition leverages the advantages of its prior versions and incorporates various novel elements that boost its efficacy and effectiveness. YOLO v8 incorporates cutting-edge techniques that have been shown to improve object detection accuracy and speed while reducing computation and memory requirements, such as the addition of attention modules

544

M. Sohan et al.

and self-attention mechanisms and the use of spatial pyramid pooling and deformable convolutions. Overall, YOLO v8 exhibits great potential as an object detection model that can enhance real-time detection capabilities. This latest version of YOLO is a notable advancement in the field of computer vision and is likely to stimulate additional exploration and progress in this domain.


Assistance for Visually Impaired People in Identifying Multiple Scenes Using Deep Learning

T. P. Divina, Rohan Paul Richard, and Kumudha Raimond

1 Introduction

On a global scale, visually impaired people face various difficulties. They need human assistance even to complete day-to-day activities such as walking from one place to another. In a busy world, providing human assistance for each and every task performed by visually impaired people is nearly impossible. Several new technologies have been developed in recent times to provide automated assistance so that the visually impaired population can lead a normal life. The most useful application developed so far is the smart stick, which senses obstructions in the path and provides safe navigation. Despite the hype, even now some individuals struggle to move about because they cannot correctly identify the pebbles, stones, or other particles present in their path. One of the primary challenges faced by the visually impaired population is obstacle identification. This necessitates adopting new technologies to help visually impaired people identify the things in front of them and to ease their movements. This research work intends to assist visually impaired people by letting them know what the obstacle in front of them is. The proposed model will be particularly useful for them as they generally walk or grab items in front of them. It will also facilitate crossing roads.

T. P. Divina, R. P. Richard, and K. Raimond: Department of Computer Science and Engineering, Karunya Institute of Technology and Sciences, Coimbatore, India.


Fig. 1 Proposed methodology: Input Images/Videos → Object Detector (YOLOv3) → Voice Assistance → Output

In this work, the proposed methodology (Fig. 1) uses the YOLOv3 model to detect objects, and the detected objects are converted to voice with the help of text-to-speech assistance, which provides great help for navigation.
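As a rough illustration of this pipeline (not code from the paper), the sketch below assumes a detector function detect_labels(frame) that returns class-name strings for a camera frame (such as the YOLOv3 sketch at the end of Sect. 4) and uses the pyttsx3 text-to-speech library to announce them; the function name and camera index are assumptions.

# Minimal sketch: announce detected object labels via text-to-speech.
# detect_labels(frame) is assumed to be provided elsewhere (e.g. the YOLOv3
# sketch later in this paper) and to return a list of class-name strings.
import cv2          # pip install opencv-python
import pyttsx3      # pip install pyttsx3

def announce(labels, engine):
    """Speak a short sentence listing the detected objects."""
    if labels:
        engine.say("I can see " + ", ".join(sorted(set(labels))))
        engine.runAndWait()

def run(detect_labels, camera_index=0):
    engine = pyttsx3.init()                 # local text-to-speech engine
    cap = cv2.VideoCapture(camera_index)    # webcam / phone camera stream
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        announce(detect_labels(frame), engine)
    cap.release()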

2 Related Works

The following are the inferences gained from the research literature to build the proposed model.

To recognize various objects, Ref. [1] recommended employing FCN, SSD, CNN, K-means clustering, DBSCAN, and silhouette analysis and obtained 100% accuracy. The authors in [2] used a few CNN training models (VGG-16, AlexNet, and Inception V3) and achieved scene detection with accuracies of 79.8%, 87.3%, and 89.3%, respectively. Sarfaraz et al. first started with image processing: the average RGB value was computed over all pixels in the image and subtracted from each pixel value, which serves to normalize the data. This process was carried out to check which CNN training model works better, and the study concluded that the Inception V3 architecture gives a good accuracy of 89.3%.

To recognize objects using DL, Ref. [3] makes use of techniques including histogram of oriented gradients (HOG), deformable part-based models (DPM), deep neural networks (DNNs), and R-CNN. Zhong-Qiu Zhao et al. also used the dataset from the PASCAL Visual Object Classes (VOC). With varying degrees of R-CNN modification, the authors have also succeeded in developing object detection frameworks that address a variety of subproblems, including occlusion, clutter, and low resolution.

The authors in [4] have proposed object recognition using algorithms such as CNN-Softmax and CNN-SVM. Fuchs et al. suggested employing transfer learning to recognize objects in forward-looking sonar (FLS) imagery using these methods, and stated that using transfer learning as opposed to specially created features when building object recognition systems for FLS can be advantageous. The proposed approach yielded accuracies of 90% and 97% using CNN-Softmax and CNN-SVM, respectively.

The work by the researchers in [5] was proposed to protect people from risky scenarios like murder and robbery. In this project, CCTV videos show whether a thief is carrying a knife, a gun, or other potentially harmful weapons. Sakib et al. took 1000 images with and without a knife and explored a few strategies to improve performance, such as BoW, CNN, HOG + SVM, and AlexNet + SVM, and discovered that AlexNet with SVM performs best among all the approaches.

The researchers in [6] mainly focused on detecting indoor objects, that is, detecting objects inside the house. Yanxiang et al. made use of RGB-D data to understand the scene. In order to handle lost depth data, the researchers introduced a trustworthy inpainting method for the raw data. Afterwards, the authors examined critical features and structural data from depth images, summarized their internal data related to interior scenes, and used them in segmentation and to support inference.

The researchers in [7] use a few approaches, namely deep learning, principal component analysis, and the proposed PCP-DL hybrid framework. These methods first segmented the backdrop, after which they segmented the objects and identified any on-screen items. They obtained average F-measure values of around 88%, 80%, and 70% for the PCP-DL hybrid framework, deep learning, and principal component analysis, respectively, for the colour-scale frame type, showing better performance for the proposed approach.

In [8], Hong-hui Xu et al. implemented object detection in congested areas by fine-tuning the YOLOv4 algorithm, called YOLO-CS. The approach exhibits excellent detection capability in congested settings. With the help of a motion model utility framework attention model, semantic analysis approaches, expectation maximization singular value decomposition (SVD), and other techniques, the work in [9] was able to effectively encapsulate information for both video structure and highlights in a temporal graph by modeling the evolution of a video through a temporal graph.

The authors in [10] suggested a method that provides semantically significant scenes and is resilient to local mismatches. As Zeeshan et al. also suggested, one image from a scene key-frame could potentially represent the entire scenario. They provided a thorough experimental evaluation on several Hollywood movies and a TV sitcom, which supported the suggested strategy.

CNN models and StyleBankNet have been suggested by [11] as an end-to-end solution for preparing the training dataset for underwater object detection and validating with actual underwater sonar images. For class-independent object detection, [12] suggests learning objectness from sonar images; EdgeBoxes, selective search, and cross-correlation template matching were utilized as baselines, and Matias Valdenegro-Toro proposed the use of a fully convolutional neural network (FCNN) for this purpose.

For those who have visual impairment, Alhichri et al. [13] have suggested a multi-object scene description utilizing an ordered weighted averaging (OWA) approach by fusing two pre-trained CNN models, VGG16 and SqueezeNet. They obtained good results compared to the other methods in the literature. A summary of the survey is shown in Table 1.

3 Methodology

A deep learning model, You Only Look Once Version 3 (YOLOv3), is used in this project. YOLOv3 (Fig. 2) is a state-of-the-art object detection algorithm that uses a deep neural network to detect objects in real time. The YOLOv3 algorithm is designed to detect and classify objects in real-time video streams or images. It works by dividing an image into a grid of cells and then predicting bounding boxes, object probabilities, and class probabilities for each cell. This process is carried out by a deep neural network that has been trained on a large dataset of annotated images. There are several reasons why YOLOv3 is used for real-time object detection. Firstly, it is very fast and can process images and videos in real time even on low-end hardware, which makes it suitable for applications such as surveillance, autonomous vehicles, and robotics. Secondly, YOLOv3 is very accurate and has state-of-the-art performance on object detection benchmarks; it achieves high precision and recall rates while maintaining a low false-positive rate. Finally, YOLOv3 is very flexible and can be adapted to different types of objects and environments. It can detect and classify objects in a wide range of scenarios, including indoor and outdoor environments, day and night conditions, and different weather conditions.

4 Process

Input: The YOLOv3 algorithm takes an input image or video frame in RGB format.

Preprocessing: The input image is pre-processed to bring it to a common size, usually 416 × 416 pixels, to ensure consistency across all images.

CNN backbone: The pre-processed image is passed through a CNN backbone, which consists of multiple convolutional layers that extract features from the image at different scales. YOLOv3 uses a variant of Darknet, a deep neural network architecture, as its backbone.


Table 1 Summary of literature survey

Reference [1]
Algorithms used: FCNN, spherical signature descriptor (SSD), CNN, K-means clustering, DBSCAN, silhouette analysis
Dataset: KITTI dataset, KNU
Merits: Applicable for detecting multiple objects in both underwater and terrestrial environments
Demerits: This can be done only underwater and in clouds, but not in any other areas

Reference [2]
Algorithms used: CNN (VGG-16, Inception V3, AlexNet)
Dataset: PLACES2 dataset, SUN397
Merits: They found that Inception V3 and VGG-16 give a good accuracy
Demerits: They just checked the accuracy of the three approaches that they have taken; they did not check with any other real-time dataset

Reference [5]
Algorithms used: BoW, CNN, HOG + SVM, AlexNet + SVM
Dataset: CCTV recordings
Merits: Their model identified the knife and gun, which was their project objective; overall, the utilization of pre-trained AlexNet with SVM gives the best performance
Demerits: It just recognizes the knife and gun, but it cannot recognize anything other than that

Reference [8]
Algorithms used: YOLO-CS
Dataset: COCO, PASCAL VOC
Merits: A joint prediction scheme based on YOLOv4 is proposed to address the dilemma in crowded scenes detection, that is, to detect a set of objects in one cell with multiple anchor groups
Demerits: They have checked it only in crowded areas, not in all areas

Reference [9]
Algorithms used: Expectation maximization, SVD, motion model utility framework attention model, semantic analysis
Merits: Presents a novel approach for video summarization; the structure of videos is exploited in order to maintain the content coverage of summaries
Demerits: Information for both video structure and highlight is encapsulated in a temporal graph by modeling the evolution of a video through a temporal graph

Reference [10]
Algorithms used: Shot similarity graph (SSG)
Dataset: A few Hollywood movies are considered as the dataset
Merits: Represents the scene content using one key-frame
Demerits: They have done this only with some movies, not with real-time videos

Reference [13]
Algorithms used: CNN (VGG16, SqueezeNet, GoogLeNet, MobileNet, ResNet), OWA
Dataset: KSU1, KSU2, UTrento1, UTrento2
Merits: Presents a novel computer vision method for the detection of the presence of multiple objects in a scene
Demerits: Datasets are not big enough in size

Reference [14]
Algorithms used: CNN, RNN, single-shot multibox detection
Dataset: Pascal VOC
Merits: Outlined the ongoing initiatives taken by researchers to test self-driving cars and emphasized the role of DL in real-time object detection
Demerits: Need to collect data in hazardous weather conditions such as rain, hail, and snow, and study self-driving cars' navigation without human intervention

Fig. 2 Architecture of YOLOv3

Feature extraction: The CNN backbone extracts features such as edges, corners, and textures from the image, which are used for object detection.

Anchor boxes: YOLOv3 uses anchor boxes, which are predefined bounding boxes of various shapes and sizes, to detect objects of different scales and aspect ratios. These anchor boxes are learned during the training process.

Fig. 3 Overview of object recognition

Object detection: The features extracted from the CNN backbone are used to generate predictions for object classes and bounding box coordinates. YOLOv3 predicts object classes using Softmax activation and predicts bounding box coordinates in terms of the x, y, width, and height of the bounding box. An overview of object recognition is shown in Fig. 3.

Non-maximum suppression (NMS): After obtaining the predicted bounding box coordinates, YOLOv3 uses non-maximum suppression (NMS) to remove redundant detections. NMS compares the confidence scores of overlapping bounding boxes and keeps only the box with the highest confidence score for each object.

Output: The final output of YOLOv3 is a list of bounding boxes along with their associated class labels and confidence scores, representing the detected objects in the input image or video frame.

Post-processing: Once the objects are detected, post-processing steps can be performed: bounding boxes are drawn around the objects, and labelling them with their class names helps to visually display the results.
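To make these steps concrete, the following is a rough sketch (not code from the paper) of how such a pipeline can be run with OpenCV's DNN module; the configuration, weight, and class-name file paths, as well as the thresholds, are assumptions rather than the authors' settings.

# Sketch of the YOLOv3 steps above using OpenCV's DNN module.
# Assumes yolov3.cfg, yolov3.weights and coco.names are available locally.
import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
classes = open("coco.names").read().strip().split("\n")

def detect_labels(frame, conf_thr=0.5, nms_thr=0.4):
    h, w = frame.shape[:2]
    # Preprocessing: scale pixel values to [0, 1] and resize to 416 x 416.
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416),
                                 swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(net.getUnconnectedOutLayersNames())

    boxes, confidences, class_ids = [], [], []
    for output in outputs:                 # predictions from the three scales
        for det in output:                 # det = [cx, cy, bw, bh, objectness, class scores...]
            scores = det[5:]
            class_id = int(np.argmax(scores))
            confidence = float(scores[class_id])
            if confidence > conf_thr:
                cx, cy = det[0] * w, det[1] * h
                bw, bh = det[2] * w, det[3] * h
                boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
                confidences.append(confidence)
                class_ids.append(class_id)

    # Non-maximum suppression keeps the highest-confidence box per object.
    keep = cv2.dnn.NMSBoxes(boxes, confidences, conf_thr, nms_thr)
    return [classes[class_ids[i]] for i in np.array(keep).flatten()]

The returned list of class names is what the voice-assistance step in Fig. 1 would announce to the user.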

5 Result and Discussion

The developed model has been evaluated using the real-time images shown below to check the robustness of the model. The model identifies the objects in the given images as shown in Figs. 4, 5, 6 and 7.


Fig. 4 Snapshot of the output using YOLOv3 (cell phone and person)

Fig. 5 Snapshot of the output using YOLOv3 (person and chair)

Fig. 6 Snapshot of the output using YOLOv3 (person and clock)


Fig. 7 Snapshot of the output using YOLOv3 (person and cup)

The results show that the YOLOv3 model correctly detects the objects, even when a few of the objects are removed from the images. Furthermore, voice assistance helps in the conversion of detected items to speech, allowing visually impaired users to interpret the objects in front of them.

6 Conclusion

The combination of speed, accuracy, and ease of use makes YOLOv3 a suitable solution for enabling automated assistance for visually impaired people in identifying the objects present in multiple scenes using mobile phones. In this work, YOLOv3 is used to detect objects, and a text-to-speech method is further used to convert the detected objects from text to speech, allowing visually impaired people to easily identify the objects in front of them. Overall, real-time object detection using YOLOv3 is a promising approach with further practical applications, including surveillance, autonomous vehicles, and augmented reality. The proposed model is advantageous when compared to other existing models in terms of object detection accuracy. The future scope of this research work is to develop an application using this mobile deployment to detect and identify the distance of the objects present in front of the user.

Acknowledgements We would like to extend our heartfelt appreciation to all the individuals who have graciously allowed us to include their pictures in this research paper. Their valuable contributions have enriched the content and visual appeal of our work. Special thanks go to Jeremiah Mishael and Rohan Richard, who generously provided us with their photographs and granted permission for their inclusion in this publication. Their willingness to share their visual materials has enhanced the overall quality of our research.


Lastly, we express our gratitude to our friends and family for their unwavering support throughout the research process. Their encouragement and understanding have been instrumental in our journey. Thank you all for your contributions, without which this paper would not have been possible.

References

1. Nguyen HT, Lee E-H, Bae CH, Lee S (2020) Multiple object detection based on clustering and deep learning methods. Sensors 20:4424. https://doi.org/10.3390/s20164424
2. Masood S, Ahsan U, Munawwar F, Raza Rizvi D, Ahmed M (2020) Scene recognition from image using convolutional neural network. Proc Comput Sci 67:1005–1012. ISSN 1877-0509. https://doi.org/10.1016/j.procs.2020.03.400
3. Zhao Z-Q, Zheng P, Xu S-T, Wu X (2019) Object detection with deep learning: a review. IEEE Trans Neural Netw Learn Syst 30(11)
4. Fuchs LR, Gällström A, Folkesson J (2018) Object recognition in forward looking sonar images using transfer learning. In: 2018 IEEE/OES autonomous underwater vehicle workshop (AUV), Porto, Portugal, pp 1–6. https://doi.org/10.1109/AUV.2018.8729686
5. Kibria SB, Hasan MS (2017) An analysis of feature extraction and classification algorithms for dangerous object detection. In: 2017 2nd international conference on electrical & electronic engineering (ICEEE), Rajshahi, Bangladesh, pp 1–4
6. Chen Y, Pan D, Pan Y, Liu S, Gu A, Wang M (2015) Indoor scene understanding via monocular RGB-D images. Inf Sci 320:361–371. ISSN 0020-0255
7. Abdulghafoor NH, Abdullah HN (2022) Novel real-time multiple objects detection and tracking framework for different challenges. Alex Eng J 61:9637–9647
8. Xu H, Wang X, Wang D, Duan B, Rui T (2023) Object detection in crowded scenes via joint prediction. Defence Tech 21:103–115. ISSN 2214-9147. https://doi.org/10.1016/j.dt.2021.10.007
9. Ngo C-W, Ma Y-F, Zhang H-J (2005) Video summarization and scene detection by graph modeling. IEEE Trans Circuits Syst Video Technol 15(2):296–305. https://doi.org/10.1109/TCSVT.2004.841694
10. Rasheed Z, Shah M (2005) Detection and representation of scenes in videos. IEEE Trans Multimedia 7(6):1097–1105. https://doi.org/10.1109/TMM.2005.858392
11. Lee S, Park B, Kim A (2018) Deep learning from shallow dives: sonar image generation and training for underwater object. arXiv preprint arXiv:1810.07990
12. Valdenegro-Toro M (2019) Learning objectness from sonar images for class-independent object detection. In: 2019 European conference on mobile robots (ECMR), Prague, Czech Republic, pp 1–6. https://doi.org/10.1109/ECMR.2019.8870959
13. Alhichri H, Bazi Y, Alajlan N (2020) Assisting the visually impaired in multi-object scene description using OWA-based fusion of CNN models. Arab J Sci Eng 45:10511–10527. https://doi.org/10.1007/s13369-020-04799-7
14. Gupta A, Anpalagan A, Guan L, Khwaja AS (2021) Deep learning for object detection and scene perception in self-driving cars: survey, challenges, and open issues. Array

Identification of the Best Combination of Oversampling Technique and Machine Learning Algorithm for Credit Card Fraud Detection

S. Srinivasan, A. L. Vallikannu, L. Manoharan, K. Deepthi, and B. Aravind Yadav

S. Srinivasan, A. L. Vallikannu, L. Manoharan, K. Deepthi, and B. Aravind Yadav: Department of ECE, Hindustan Institute of Technology and Science, Chennai, Tamil Nadu, India.

1 Introduction

Credit card fraud poses a major threat to both financial institutions and cardholders, as it can lead to significant financial losses. As fraudsters become increasingly sophisticated with technological advancements, it has become more challenging to detect fake transactions through manual methods. Hence, there is a growing need to develop automated systems to effectively identify and prevent fraudulent activities. Machine learning algorithms have demonstrated their potential in detecting credit card fraud, particularly in scenarios where fake transactions are much less frequent than legitimate transactions. However, due to the imbalanced and infrequent occurrence of fraudulent activities, traditional machine learning algorithms may struggle to achieve high accuracy in identifying fake transactions. To overcome this issue, oversampling techniques like K-Means SMOTE, SMOTE, SMOTE-NC, and ADASYN can be utilized to address the dataset imbalance. These techniques generate synthetic data points to augment the minority class and increase the model's capacity to identify unauthorized transactions. Additionally, undersampling techniques like NearMiss can also be applied to the dataset. This research study intends to compare the effectiveness of various machine learning algorithms such as Naive Bayes, ANN, decision tree, random forest, and logistic regression when applied with different oversampling techniques on a dataset for the purpose of credit card fraud detection. Each algorithm's performance will be measured using a variety of metrics, including precision, accuracy, F1 score, and recall, to decide which method works best on the dataset file provided for credit card fraud detection. Relevant studies and sources such as [1–3] will be utilized in this research study.

The dataset file used in this research work includes credit card transactions conducted by European cardholders in September 2013. The data covers transactions over a two-day period, including 492 fake transactions out of a total of 284,807 transactions. Because just 0.172% of transactions are fraudulent, the dataset is highly unbalanced. Only numerical input variables that have undergone PCA transformation for privacy considerations are included in the dataset; the transformed features are identified as V1, V2, ..., V28. The feature "Amount" is the transaction's value, whilst the feature "Time" is the number of seconds elapsed since the dataset's first transaction. If a transaction is fraudulent, the response variable "Class" takes the value "1"; otherwise it takes the value "0".

The original dataset is split into two distinct files in order to train and test the machine learning algorithms in this work. The first file, referred to as the training dataset, is used to train the machine learning algorithms. This dataset includes the "Class" variable in addition to all of the features obtained from the original dataset. The second file, known as the testing dataset, is used to evaluate the effectiveness of the trained machine learning algorithms; it contains the same features as the training dataset, with the "Class" variable withheld for evaluation. Predictions are generated on the testing dataset, and the algorithm's precision is evaluated by comparing the predicted values with the actual values of the "Class" variable. Metrics such as precision, accuracy, F1 score, and recall are frequently employed to assess how well the trained models perform on the testing dataset. Missing values are replaced with the mean or median of the available values for the same features; as the missing values are assumed to be missing at random, this does not introduce any bias in the results.

Machine learning is used to identify credit card fraud. Imbalanced datasets are frequently encountered, as there are fewer fake transactions than genuine transactions. Oversampling and undersampling are used to balance the dataset and overcome this issue. Oversampling increases the representation of the minority class (fake transactions): the Synthetic Minority Oversampling Technique (SMOTE) interpolates between existing minority class instances to generate new ones, whilst Adaptive Synthetic (ADASYN) produces synthetic samples based on the density distribution. SMOTE-NC and K-Means SMOTE are two variants of SMOTE that can handle datasets with both numerical and categorical variables. On the other hand, undersampling lowers the proportion of instances in the dominant class (legal transactions); the undersampling method known as NearMiss selects a portion of the majority class that is similar to the minority class [4, 5]. Unbalanced datasets can result in machine learning algorithms that are biased and perform poorly at detecting fake transactions. Techniques for over- and undersampling data can be applied to solve this problem and enhance model performance.
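As a rough, non-authoritative sketch of the split described above (the file name creditcard.csv and the random seed are assumptions, not details from the paper), the dataset can be loaded and partitioned as follows:

# Sketch: loading the European cardholders dataset and making an 80/20 split.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("creditcard.csv")      # 284,807 rows, 492 frauds (Class = 1)
X = df.drop(columns=["Class"])          # Time, V1..V28, Amount
y = df["Class"]

# Stratify so the 0.172% fraud rate is preserved in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
print(y_train.value_counts(normalize=True))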
The minority class (fake transactions) can be enlarged through oversampling by creating more instances of the minority class using techniques like SMOTE (Synthetic Minority Oversampling Technique) or ADASYN (Adaptive Synthetic). The K-Means SMOTE and SMOTE-NC variants are capable of handling datasets with both categorical and numerical variables. These oversampling techniques also help to get rid of unnecessary data. Whilst NearMiss selects a subset of the majority class that is most similar to the minority class, undersampling in general reduces the number of instances in the majority class (legal transactions) to balance the dataset. The ability of machine learning algorithms to learn from both classes in a balanced dataset improves the identification of fake transactions and assessment metrics like precision, accuracy, F1 score, and recall.
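A minimal sketch of these resampling techniques using the imbalanced-learn package is shown below; SMOTE-NC and K-Means SMOTE are available from the same package, and the file name and parameters are assumptions rather than the authors' settings.

# Sketch of balancing the class distribution with imbalanced-learn.
import pandas as pd
from imblearn.over_sampling import SMOTE, ADASYN
from imblearn.under_sampling import NearMiss

df = pd.read_csv("creditcard.csv")
X, y = df.drop(columns=["Class"]), df["Class"]

X_sm, y_sm = SMOTE(random_state=42).fit_resample(X, y)    # interpolates new minority samples
X_ad, y_ad = ADASYN(random_state=42).fit_resample(X, y)   # density-adaptive oversampling
X_nm, y_nm = NearMiss(version=1).fit_resample(X, y)       # keeps majority samples near the minority

print(y.value_counts(), y_sm.value_counts(), y_nm.value_counts(), sep="\n")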

2 Literature Survey

A lot of research has been done on applying machine learning algorithms to detect credit card fraud. This literature review highlights some of the most important works in this field.

A novel approach to the issue of imbalanced classification, in which the number of examples in the classes differs noticeably, is proposed in [6]. To enhance classification performance on the minority class, the suggested method, dubbed Calibrated Probability with Undersampling (CPU), combines undersampling of the majority class with calibrated probability estimates. On a number of real-world datasets, the authors show how effective their approach is and contrast it with several other cutting-edge approaches. The findings demonstrate that, in terms of a variety of evaluation measures, including precision, accuracy, F1 score, and recall, the suggested CPU technique performs better than alternative methods.

Credit card fraud detection was the subject of a thorough study from a practitioner's perspective in [7]. In this article, the authors discuss their experiences deploying a fraud detection system in a large-scale European bank. To determine which feature selection techniques and machine learning algorithms are most useful for fraud detection, the existing techniques are compared. The comparative study discusses how crucial it is to assess the system's effectiveness using appropriate performance metrics, such as the F1 score. A few challenges, such as the problem of imbalanced datasets and model interpretability, were encountered when deploying a fraud detection system in the real world. Finally, workable solutions are offered to these problems, such as the use of under- and oversampling methods to deal with imbalanced datasets and the use of interpretable models to learn more about the fraud detection procedure.

The study in [8] suggests using the SMOTE and ADASYN oversampling techniques with a number of machine learning algorithms, including KNN, decision trees, random forest, and logistic regression, to identify credit card fraud. On a Kaggle dataset, the authors assessed the effectiveness of the proposed approach and discovered that the oversampling increased the classifiers' F1 score and accuracy. In [9], the authors tested the effectiveness of a hybrid oversampling strategy using the decision tree, logistic regression, XGBoost, and random forest algorithms and discovered that the hybrid technique outperformed them all in terms of F1 score.

In [10], a novel technique has been proposed for identifying credit card fraud by combining the AdaBoost algorithm with majority voting. In order to collect pertinent features that can differentiate fraudulent and legitimate transactions, the researchers first preprocessed credit card transaction data by performing feature selection and extraction. The next step was to utilize the AdaBoost algorithm to train a number of weak classifiers, which were then combined into a strong classifier to categorize the transactions as fraudulent or not. The findings demonstrated that the proposed approach outperformed other algorithms in terms of detection rate.

In [11], a novel learning strategy and realistic modelling are combined as a new technique for detecting credit card fraud. To identify the underlying trends in fraudulent behaviour, the authors consider a number of domain-specific variables, including the country of the merchant, the kind of transaction, and the cardholder's age and gender. They put forward the Multi-Context Attributed Network (MCAN) as a novel learning approach. The results indicate that the proposed approach works better than earlier methods in terms of a number of evaluation measures, including recall, accuracy, precision, and F1 score.

The study in [12] describes SCARFF (Scalable framework for streaming credit card fraud detection), a framework for detecting credit card fraud in a streaming environment using Spark. The authors discuss the challenges of processing high-volume and high-speed credit card transactions in real time, where typical batch processing techniques may not be appropriate. They assess SCARFF's performance on several real-time datasets and contrast it with various cutting-edge techniques. The findings demonstrate that the proposed framework performs better than alternative approaches in terms of a variety of performance metrics, including recall, precision, accuracy, and F1 score.

In [13], a combination of active learning and streaming processing approaches was discussed to enhance the performance of fraud detection algorithms when dealing with huge and complex datasets. On a real-time dataset of credit card transactions, an analysis is carried out on how various active learning strategies, such as uncertainty sampling, perform. The authors suggest a unique visualization tool called FraudVis to enable interactive exploration of the outcomes and the underlying data and to visualize the performance of the various methodologies. The application of active learning and streaming algorithms for credit card fraud detection in a practical context is presented in detail.

The article [14] explains an innovative method for detecting credit card fraud that combines supervised and unsupervised learning approaches. The authors propose a semi-supervised learning framework that makes use of both labelled and unlabelled data to address the problem of identifying fraud in situations where labelled data is difficult or expensive to obtain. The authors utilize clustering to find probable fraud trends and autoencoders to learn high-level representations of the input data. In order to improve the accuracy of the clustering results and the fraud detection system, a classifier trained on labelled data is additionally used. The effectiveness of the proposed strategy is evaluated by comparing it to a number of cutting-edge techniques, such as unsupervised clustering, supervised classification, and hybrid strategies. The findings reveal that the proposed strategy outperforms the other existing approaches in terms of various evaluation measures, including precision, accuracy, F1 score, and recall.

In [15], the study focuses on analysing adaptive machine learning methods for detecting credit card fraud. The difficulty of creating fraud detection systems that are more precise, effective, and adaptable to shifting fraud patterns and developing threats is addressed in this research study. It makes a number of contributions, including the design and development of unique feature engineering methods, the development of resilient and adaptive classifiers, and the assessment of several performance indicators for fraud detection. The findings demonstrate that, particularly in situations where fraud patterns are dynamic and evolving, the accuracy and effectiveness of fraud detection systems can be greatly increased by applying the proposed adaptive machine learning technique.

3 Existing Model

In the existing model, the dataset file is provided to machine learning algorithms like Naive Bayes, Random Forest, Logistic Regression, and Decision Tree. These algorithms use the entire dataset to train and test without using any sampling techniques. When the proportion of bogus transactions is much smaller than that of authorized transactions, the situation is referred to as an "unbalanced" or "imbalanced" dataset scenario. The model may experience a class imbalance problem in this case, where it becomes biased towards the majority class (legitimate transactions) and performs poorly in identifying the minority class (fake transactions). Metrics including precision, accuracy, F1 score, and recall can be used to assess the algorithm's performance. The dataset file containing credit card transactions made by European cardholders in September 2013 was taken into account in this model. The dataset contains 284,807 transactions in total, of which 492 are fraudulent.

The algorithm is put into practice using Google Colab. First, the dataset file downloaded from Kaggle is loaded. The dataset contains 284,807 transactions, of which 284,315 are legitimate transactions and 492 are fraudulent transactions. The dataset file is divided into two groups: the majority group (which includes authentic transactions) and the minority group (which includes fake transactions). The dataset file under consideration is unbalanced since there are far more legitimate transactions than fraudulent ones. The dataset file is pre-processed and divided into training and test datasets at a ratio of 80% to 20%. Each machine learning algorithm, including Logistic Regression, Decision Tree, Naive Bayes, and Random Forest, is trained on the training data, and a model is built for each algorithm. The trained model is then provided with the test dataset, on which it makes predictions. The predictions are then compared and verified, and the machine learning technique that produces the best outcome for determining the fraud rate in the provided dataset file is identified. Evaluation metrics such as precision, accuracy, F1 score, and recall can be used to determine the best model. Since the dataset file under consideration is unbalanced, accuracy alone will not indicate how well the algorithm is working. Instead, the F1 score, which is the harmonic mean of recall and precision, is primarily considered to gain a better understanding of how well the algorithm performs in identifying fake transactions.
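As a hedged illustration of this baseline (not the authors' code), the sketch below trains a Random Forest directly on the unbalanced training split from the earlier split sketch and reports accuracy and F1 score; the variable names X_train, X_test, y_train, y_test are assumed from that sketch.

# Sketch of the "existing model" baseline: train on the unbalanced split
# and report accuracy and F1, the metric the text focuses on.
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score

baseline = RandomForestClassifier(n_estimators=100, random_state=42)
baseline.fit(X_train, y_train)
pred = baseline.predict(X_test)
print("accuracy:", accuracy_score(y_test, pred))   # misleadingly high on imbalanced data
print("F1 score:", f1_score(y_test, pred))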

4 Proposed Model

In the proposed approach, machine learning algorithms including Logistic Regression, Naive Bayes, Decision Tree, Random Forest, and ANN (Artificial Neural Network) are used to implement the credit card fraud detection model, which involves training each algorithm on a dataset containing a combination of legitimate and fraudulent transactions. The algorithm gains the ability to detect data patterns that differentiate between the two types of transactions. It is necessary to use oversampling techniques like SMOTE (Synthetic Minority Oversampling Technique), ADASYN (Adaptive Synthetic Sampling), SMOTE-NC (SMOTE for Nominal and Continuous features), and K-Means SMOTE (K-Means Synthetic Minority Oversampling Technique) to balance the dataset before testing. Following the oversampling of the dataset, the Machine Learning (ML) algorithms are trained on the newly balanced dataset and assessed using a variety of metrics, including precision, accuracy, F1 score, and recall. The effectiveness of the ML algorithms in identifying credit card fraud is evaluated using these metrics. Whilst accuracy measures the proportion of correctly classified transactions out of all transactions, precision is the proportion of correctly recognized fraudulent transactions out of all transactions predicted to be fraudulent. Recall is the percentage of all actual fraudulent transactions that were successfully identified as fraudulent. The F1 score, the harmonic mean of recall and precision, is a weighted average of these two measurements.

By interpolating between the minority class samples, the popular oversampling technique SMOTE creates synthetic minority class samples. A SMOTE extension called ADASYN adapts the creation of synthetic samples based on the density distribution of the minority class. SMOTE-NC, which manages nominal and continuous features in the dataset, is an enhancement over SMOTE. K-Means SMOTE is a hybrid technique that creates artificial samples by combining SMOTE and K-Means clustering.

• Logistic regression is a binary classification approach that uses a linear function of the features to represent the probability of the target class.
• Naive Bayes is a probabilistic method that predicts the target class based on the Bayes theorem and assumes independence between the features.
• Decision trees are non-parametric algorithms that divide the feature space into hierarchical tree-like structures to categorize the data.
• Random Forest is an ensemble technique that combines different decision trees in order to increase accuracy and decrease overfitting.
• Artificial Neural Networks are deep learning algorithms that mimic the neural network architecture of the human brain to learn complicated representations of the data.

To perform the feature selection process, a few methods such as feature importance ranking and L1 regularization are used, depending on the algorithm. In summary, to create a credit card fraud detection algorithm using ML algorithms, one needs to follow these steps (a minimal sketch of these steps follows the list):

1. Pre-process the dataset to remove any irrelevant data and deal with missing values and outliers.
2. Balance the dataset using oversampling techniques such as SMOTE, ADASYN, SMOTE-NC, or K-Means SMOTE.
3. Train different ML algorithms on the balanced dataset, such as LR, NB, DT, RF, and ANN.
4. Use several metrics, such as precision, accuracy, F1 score, and recall, to assess the models' performance.
5. Decide which model performs best, then employ it to predict fraud on fresh credit card transactions.
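The sketch below illustrates the steps above under stated assumptions: it uses scikit-learn and imbalanced-learn, takes X_train, X_test, y_train, y_test from the earlier split sketch, and uses MLPClassifier as a stand-in for the ANN, which is an assumption rather than the authors' exact setup.

# Sketch of steps 1-5: oversample the training split with SMOTE, train several
# classifiers, and compare them on the untouched test split.
from imblearn.over_sampling import SMOTE
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import precision_score, recall_score, f1_score, accuracy_score

# Only the training split is resampled; the test split stays imbalanced.
X_bal, y_bal = SMOTE(random_state=42).fit_resample(X_train, y_train)

models = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": GaussianNB(),
    "Decision tree": DecisionTreeClassifier(random_state=42),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "ANN": MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=300, random_state=42),
}
for name, model in models.items():
    pred = model.fit(X_bal, y_bal).predict(X_test)
    print(f"{name}: precision={precision_score(y_test, pred):.3f} "
          f"accuracy={accuracy_score(y_test, pred):.3f} "
          f"F1={f1_score(y_test, pred):.3f} recall={recall_score(y_test, pred):.3f}")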

5 Working Methodology

Credit card fraud detection is a critical task in the finance sector. The process of credit card fraud detection extensively utilizes machine learning methods like Naive Bayes, Artificial Neural Networks (ANN), Random Forests, Decision Trees, and Logistic Regression. The dataset file considered here includes 284,807 transactions, of which 492 are fraud transactions and the other 284,315 are real transactions, so the dataset file is severely unbalanced. When working with an unbalanced dataset, sampling techniques like oversampling and undersampling can be applied to the dataset file in order to assess whether the machine learning algorithms perform better. In this paper, the model is constructed by using an undersampling method (NearMiss) as well as oversampling methods (K-Means SMOTE, ADASYN, SMOTE, and SMOTE-NC) on the dataset file to assess the effectiveness of the machine learning algorithms. The performance of the machine learning algorithms can fluctuate and be improved depending on the sampling technique that is used on the dataset. The following steps summarize the methodology for detecting credit card fraud using machine learning algorithms and sampling techniques:

1. Data preprocessing: The first phase, which entails eliminating any superfluous features, addressing missing values, and normalizing the data.
2. Feature selection: A crucial phase in machine learning that helps reduce the dimensionality of the data and enhance model performance.
3. Data splitting: Training and testing datasets are created from the dataset. The training dataset is used to train the models, whereas the testing dataset is used to evaluate the performance of the machine learning algorithms.
4. Oversampling: To balance the dataset by raising the number of fake transactions, oversampling techniques like K-Means SMOTE, ADASYN, SMOTE, and SMOTE-NC are used.
5. Undersampling: To overcome the class imbalance issue in credit card fraud detection datasets, an undersampling technique like NearMiss can be applied. NearMiss chooses the subset of majority class samples that is closest to the minority class samples.
6. The NearMiss approach reduces the number of real transactions in the majority class in order to balance the dataset.
7. Model training: The balanced dataset is used to train machine learning models including LR, NB, DT, RF, and ANN.
8. Model evaluation: Various performance indicators, including precision, accuracy, F1 score, and recall, are used to assess the performance of the models.
9. Model selection: The best-performing model is chosen for deployment based on the evaluation results.

All the unrelated data that are not consistent with the normal spending pattern are removed by an amount-based filtering technique. In summary, the working methodology for detecting credit card fraud using machine learning algorithms like NB, DT, RF, and ANN involves data preprocessing, feature selection, data splitting, sampling (oversampling and undersampling), model training, model evaluation, and model selection.

Evaluation Metrics:

• Accuracy: the proportion of correctly classified transactions out of the total number of transactions. High accuracy suggests that a significant portion of transactions are being correctly identified by the model.

  Accuracy = (TP + TN)/(TP + TN + FP + FN)

  where TP = True Positive, TN = True Negative, FP = False Positive, and FN = False Negative.

• Precision: the proportion of true positive fraud cases out of all predicted positive cases (predicted as fraudulent). A high precision means that the majority of the transactions flagged as fraudulent really are fraudulent.

  Precision = TP/(TP + FP)

• Recall: the proportion of true positive fraud cases out of all actual positive fraud cases. A high recall shows that the algorithm successfully detects the majority of bogus transactions while minimizing false negatives (fake transactions that are mistakenly labelled as legitimate).

  Recall = TP/(TP + FN)

• F1 score: the harmonic mean of precision and recall, providing a balance between the two. It shows how well the model performs overall in detecting fraudulent transactions while reducing false positives and false negatives.

  F1 Score = 2 × (Precision × Recall)/(Precision + Recall) = TP/[TP + ½(FP + FN)]

A short code sketch relating these formulas to a confusion matrix is given at the end of this section. In order to reduce the number of false positives in credit card fraud detection, which can have serious financial repercussions for both individuals and businesses, it is crucial to have high precision. High recall is additionally necessary to prevent the oversight of fraudulent activities. The F1 score is a helpful indicator to assess the performance of the model. The workflow model for detecting credit card fraud using a machine learning algorithm, when sampling techniques like K-Means SMOTE, ADASYN, SMOTE, and SMOTE-NC (oversampling) and NearMiss (undersampling) are used on the dataset file, is shown in Fig. 1 below. The workflow diagram clearly outlines the process through which the model is used and how fraud can be found.
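As a short sketch relating the formulas above to code (assuming y_test and pred come from the earlier model sketches), scikit-learn's confusion matrix yields the four counts directly:

# Compute accuracy, precision, recall, and F1 from the confusion matrix.
from sklearn.metrics import confusion_matrix

tn, fp, fn, tp = confusion_matrix(y_test, pred).ravel()
accuracy  = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f1        = 2 * precision * recall / (precision + recall)
print(accuracy, precision, recall, f1)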

6 Results and Discussions This study has considered five machine learning algorithms such as Naive Bayes, Random Forest, ANN, Decision Tree, and Logistic Regression. Evaluation Metrics such as Precision, Accuracy, F1 score, and Recall are measured at the initial stage i.e., not applying any sampling technique to the dataset file. The same are measured after applying oversampling techniques such as SMOTE, ADASYN, SMOTE-NC, K-Means SMOTE and undersampling technique such as NearMiss. After applying sampling techniques to the dataset file, the variations can be observed in accuracy and F1 score rate. If we see the F1 Score rate of Random Forest algorithm, there is an increase from 0.792 to 0.816 when ADASYN technique is applied. As we have considered imbalanced dataset, the accuracy scores of machine learning models are neglected because it is not suitable in case of imbalanced dataset. F1 score rate is mainly focused here. The results obtained from the analysis are tabulated below (Tables 1, 2, 3, 4, 5 and 6): Evaluation results without applying any sampling technique to the dataset. After applying SMOTE to the given dataset.


Fig. 1 Workflow diagram for credit card fraud detection

Table 1 Evaluation results without sampling technique

Model name            Precision   Accuracy   F1 score   Recall
Naive Bayes           0.131       0.993      0.216      0.609
Random forest         0.910       0.999      0.792      0.701
ANN                   0.531       0.998      0.450      0.390
Decision tree         0.727       0.999      0.731      0.735
Logistic regression   0.637       0.998      0.610      0.586

Table 2 Evaluation results by applying SMOTE to the given dataset

Model name            Precision   Accuracy   F1 score   Recall
Naive Bayes           0.129       0.992      0.218      0.689
Random forest         0.831       0.999      0.811      0.793
ANN                   0.218       0.995      0.336      0.735
Decision tree         0.430       0.998      0.546      0.747
Logistic regression   0.069       0.982      0.127      0.827

Table 3 Evaluation results by applying SMOTE-NC to the given dataset

Model name            Precision   Accuracy   F1 score   Recall
Naive Bayes           0.067       0.978      0.124      0.851
Random forest         0.910       0.999      0.792      0.701
ANN                   0.833       0.998      0.107      0.057
Decision tree         0.738       0.999      0.759      0.782
Logistic regression   0.876       0.999      0.735      0.633

Table 4 Evaluation results by applying K-Means SMOTE to the given dataset

Model name            Precision   Accuracy   F1 score   Recall
Naive Bayes           0.131       0.993      0.216      0.609
Random forest         0.910       0.999      0.792      0.701
ANN                   0.0         0.998      0.0        0.0
Decision tree         0.727       0.999      0.731      0.735
Logistic regression   0.637       0.998      0.610      0.586

Table 5 Evaluation results by applying ADASYN to the given dataset

Model name            Precision   Accuracy   F1 score   Recall
Naive Bayes           0.129       0.992      0.218      0.712
Random forest         0.841       0.999      0.816      0.793
ANN                   0.041       0.969      0.078      0.850
Decision tree         0.503       0.998      0.605      0.758
Logistic regression   0.066       0.981      0.122      0.827

Table 6 Evaluation results by applying NearMiss technique (undersampling technique) to the given dataset

Model name            Precision   Accuracy   F1 score   Recall
Naive Bayes           0.007       0.834      0.014      0.781
Random forest         0.023       0.941      0.044      0.896
ANN                   0.001       0.002      0.003      1.0
Decision tree         0.003       0.650      0.007      0.908
Logistic regression   0.010       0.865      0.020      0.908


Confusion Matrices of the Random Forest Algorithm. Figures 2, 3, 4, 5 and 6 show the confusion matrices of the Random Forest algorithm when no sampling technique is applied to the dataset and when oversampling techniques such as SMOTE, SMOTE-NC, K-Means SMOTE, and ADASYN are applied. Figure 7 shows the confusion matrix when the undersampling technique NearMiss is applied to the dataset file. The Random Forest algorithm is mainly focused on because it gives the best outcome for detecting credit card fraud.

Fig. 2 Matrix of random forest algorithm without applying any sampling technique

Fig. 3 Matrix of random forest algorithm after applying SMOTE

Fig. 4 Matrix of random forest algorithm after applying ADASYN

Fig. 5 Matrix of random forest algorithm after applying SMOTE-NC

Fig. 6 Matrix of random forest algorithm after applying K-Means SMOTE


Fig. 7 Matrix of random forest algorithm after applying NearMiss

7 Conclusion

The dataset file selected for this study has a significant imbalance. As a result, a combination of oversampling and undersampling strategies was used to balance the dataset, with the goal of checking whether these strategies improve the performance of the machine learning models. The accuracy ratings of the machine learning models are disregarded since they are inappropriate in the context of the unbalanced dataset considered here; the main emphasis is on the F1 score rate. The machine learning algorithm that gives the better F1 score rate is considered the best machine learning algorithm for determining the fraud rate in the given dataset file. From the analysis, it is observed that there is an increase in the F1 score rate after applying ADASYN (Adaptive Synthetic Sampling). Compared to the other algorithms, Random Forest is the best algorithm for determining the fraud count in the dataset file. Therefore, this study concludes that ADASYN is the best oversampling technique, and it helps in selecting the best machine learning algorithm for determining the fraudulent transaction count in the dataset file.


Image-To-Image Translation Using Pix2Pix GAN and Cycle GAN Ranjan K. Senapati, Renukunta Satvika, Aishwarya Anmandla, Gopidi Ashesh Reddy, and Ch Anil Kumar

1 Introduction The way images are translated from one domain to another has evolved greatly over the past few decades. The technique of transferring an image across domains, such as converting a photo of a cat into a painting of a cat, is known as image-to-image (I2I) translation. Figure 1 illustrates the process: the images on the left belong to one domain and are translated to the other domain shown on the right. I2I translation, or image-based machine translation, is a primary research area in computer vision and machine learning. It is frequently used for applications such as style transfer and image synthesis, where an input image is transformed into an output image with a different representation, style, or content, and it has numerous applications, including creating art, improving computer vision models, and enabling more accurate machine translation. I2I translation is achieved through techniques such as deep learning, which uses neural networks to learn how two domains map to one another. A GAN is a deep learning model composed of two networks, a generator and a discriminator, that work together in a game-like fashion, and GANs play a very important role in image-to-image translation. The generator takes the source image as input and tries to generate an image similar to the target image, whereas the discriminator takes as input both the generated image and the target image from the dataset and outputs the probability that the generated image is real. In this way, the generator and discriminator learn together and improve each other's results. For this task of I2I translation, we employ the Pix2Pix GAN, one of the special types of GANs.

R. K. Senapati (B) · R. Satvika · A. Anmandla · G. Ashesh Reddy · C. Anil Kumar, Department of ECE, VNRVJIET, Hyderabad, Telangana, India; e-mail: [email protected]; C. Anil Kumar e-mail: [email protected]


Fig. 1 Few examples of image-to-image translation

Further, to overcome the drawbacks of the Pix2Pix GAN, we also employ CycleGAN; here it is used to convert images of horses to zebras. In this way, both types of GANs, the Pix2Pix GAN and CycleGAN models, are used for I2I translation tasks.

2 Related Work Phillip Isola et al. [1] proposed the conditional generative adversarial network model; we employ their U-Net generator and PatchGAN discriminator in our project. Vaishali et al. [2] proposed a feasible GAN-based model to convert satellite images to Google map images; the model produces images with limited accuracy, which can be improved by increasing training time. Regmi and Borji [3] presented a conditional GAN model for cross-view image synthesis. Park et al. [4] introduced a methodology of dual learning and conditional translations; it builds heavily on conditional GANs but is a time-consuming process. Zhu et al. [5] proposed a DualGAN mechanism that improves GAN output and learns on its own. Lee et al. [6] implemented SPAGAN, in which knowledge computed by the discriminator is transferred to the generator; the cited model is known for its


superior performance in both quantitative and qualitative terms. Tang et al. [7] employed Consistent Embedded Generative Adversarial Networks (CEGAN) to generate realistic and diversified images. Jun-Yan Zhu et al. [8] presented CycleGAN, which translates one image domain to another using unpaired datasets; our CycleGAN technique is adapted from [8–10]. In the proposed research work, the MAPS dataset [11] is used to perform image-to-image translation with Pix2Pix GAN. This dataset consists of New York satellite photos and the corresponding Google Maps pages at a resolution of 1200 × 600 pixels. The Pix2Pix GAN model is also trained on the facades dataset [12], which contains building facades from different cities around the world in diverse architectural styles; each image is 1280 × 960 pixels. For the CycleGAN model, we used the horses2zebra dataset [13], which is divided into testA, testB, trainA, and trainB, where category A designates horses and category B designates zebras; the square photos measure 256 × 256 pixels. GANs of the Pix2Pix type are trained to map input images to output images in a supervised way.

3 Methodology This study first employs a Pix2Pix GAN for image translation and then addresses the limitation encountered with the Pix2Pix GAN by adopting the Cycle GAN.

3.1 Pix2Pix GAN Pix2Pix is an I2I translation model based on the conditional GAN (cGAN) framework: an input picture is required in order to generate an output image. Conditional GANs train a conditional generative model in much the same way as standard GANs. Convolution-Batch Normalization-ReLU layers are a standard building block of deep convolutional neural networks and are used by both the generator and the discriminator. Figure 2 shows a block diagram of the Pix2Pix GAN; it consists of a U-Net generator that produces the target image and a PatchGAN discriminator that classifies each patch as real or fake. U-Net Generator. In Pix2Pix GAN, the generator follows the U-Net architecture. It takes an image as input and does not require a point from the latent space, as a regular GAN generator does. As discussed earlier, the dataset contains two domains:
• Source domain
• Target domain


Fig. 2 Block diagram representing the working of Pix2pix GAN


Fig. 3 Block diagram representation of U-Net architecture

An image from the source domain is given as input to the generator, and the output must resemble the corresponding image in the target domain. The U-Net architecture is very close to the encoder-decoder generator framework shown in Fig. 3, in that it downsamples the input to a bottleneck and then upsamples it to an output image. In U-Net, however, skip-connections are also created between encoder and decoder layers of the same size, allowing information to bypass the bottleneck. PatchGAN Discriminator. The discriminator receives two images, the output image produced by the generator and an image from the target domain, and estimates the likelihood that the generated image is an accurate translation of the source image rather than an artificially produced one. Rather than the single deep convolutional classifier of a typical GAN, the Pix2Pix model employs a PatchGAN, as shown in Fig. 4. Instead of classifying the full input image as real or fake, this network identifies and differentiates patches of the input image, so the structure is penalized at the level of patches. This discriminator attempts


Fig. 4 Patch GAN discriminator

to determine if each of the image's N × N patches is authentic or fake. To obtain the final output of the discriminator, this classifier is applied convolutionally across the entire picture and all of its outputs are averaged. A patch size of 70 × 70 has been observed to work well for a variety of image-to-image translation applications.
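Before moving on to CycleGAN, the following is a compact Keras sketch of the U-Net generator idea described in this section. It is a simplified illustration, not the authors' implementation: it assumes 256 × 256 RGB images and uses fewer down/up-sampling stages than the full Pix2Pix generator; the tanh output keeps pixel values in the [−1, 1] range mentioned later in Sect. 4.

```python
# Simplified U-Net-style generator sketch (Keras): encoder -> bottleneck -> decoder
# with skip-connections between same-size encoder and decoder layers.
import tensorflow as tf
from tensorflow.keras import layers

def downsample(filters):
    # Conv -> BatchNorm -> LeakyReLU, halving the spatial size.
    return tf.keras.Sequential([
        layers.Conv2D(filters, 4, strides=2, padding="same", use_bias=False),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
    ])

def upsample(filters):
    # Transposed conv -> BatchNorm -> ReLU, doubling the spatial size.
    return tf.keras.Sequential([
        layers.Conv2DTranspose(filters, 4, strides=2, padding="same", use_bias=False),
        layers.BatchNormalization(),
        layers.ReLU(),
    ])

def build_unet_generator():
    inputs = layers.Input(shape=(256, 256, 3))
    d1 = downsample(64)(inputs)            # 128 x 128
    d2 = downsample(128)(d1)               # 64 x 64
    d3 = downsample(256)(d2)               # 32 x 32 (acts as the bottleneck here)
    u1 = upsample(128)(d3)                 # 64 x 64
    u1 = layers.Concatenate()([u1, d2])    # skip-connection
    u2 = upsample(64)(u1)                  # 128 x 128
    u2 = layers.Concatenate()([u2, d1])    # skip-connection
    outputs = layers.Conv2DTranspose(3, 4, strides=2, padding="same",
                                     activation="tanh")(u2)  # 256 x 256, values in [-1, 1]
    return tf.keras.Model(inputs, outputs)

generator = build_unet_generator()
```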

3.2 Cycle GAN A Generative Adversarial Network (GAN) is generally made up of a generator and a discriminator network; CycleGAN consists of two GANs, giving it a total of two generators and two discriminators. The fundamental goal of CycleGAN is to learn the mapping between two distinct image domains without requiring paired samples from the two domains during the training phase, which means the model can learn to translate images from one domain to another without a matched image in the target domain. CycleGAN achieves this by training the two GANs in a cycle: the first GAN learns to translate images from domain A to domain B, while the second learns to translate images from domain B to domain A. In addition to the adversarial loss present in conventional GANs, CycleGAN incorporates a cycle-consistency loss, which encourages the image produced by translating from domain A to domain B and back to domain A to match the original image in domain A. Figure 5 shows the block diagram of a Cycle GAN. Generator Architecture. The Cycle GAN generator produces the translated image as output from the input image; the architecture is shown in Fig. 6a. Each conv block contains a 2D conv layer, an Instance Normalization layer, and a Leaky ReLU layer. The first layer has 64 filters and does not change the image size. The next layers


Fig. 5 Block diagram representing the working of cycle GAN

reduce the input image size by half and double the number of filters (i.e., 128, and so on for the subsequent layers). The ResNet blocks add desired features to the input; each ResNet block contains a 2D conv layer with stride = 1, an Instance Normalization layer, and Leaky ReLU, followed by a second 2D conv and Instance Normalization layer. The decoder (the output of the ResNet blocks) enlarges the input back to the original size and generates the final output image; each decoder block contains a transposed conv layer (a combination of 2D upsampling and a 2D conv layer with stride = 1). Discriminator Architecture. Here, we employ the same PatchGAN discriminator discussed above for Pix2Pix GAN (shown in Fig. 6b). The difference between a PatchGAN and a regular GAN discriminator is that the regular GAN maps a 256 × 256 image to a single scalar output, which denotes "real" or "fake," whereas the PatchGAN maps the 256 × 256 image to an N × N (in this case, 70 × 70)

Fig. 6 Proposed Structure of the a Generator and b Discriminator


array of outputs X, where each element X_ij denotes whether patch ij of the image is real or fake. The discriminator architecture employed in this project is: C64 − C128 − C256 − C512. Here, Ck denotes a 4 × 4 convolution-Instance Norm-Leaky ReLU layer with k filters and stride 2; Instance Norm is not applied to the first layer (C64). After the final layer, a convolution is applied to map to a 1-dimensional output.
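A minimal PyTorch sketch of the C64-C128-C256-C512 convention just described is given below. It is an illustrative reimplementation under the stated assumptions (4 × 4 convolutions with stride 2, Instance Norm skipped on the first layer, LeakyReLU activations, and a final convolution producing a one-channel patch map), not the authors' exact code; the quoted 70 × 70 figure refers to the receptive field of each output unit rather than to the size of the output map.

```python
# PatchGAN discriminator sketch (PyTorch) following the C64-C128-C256-C512 convention.
import torch
import torch.nn as nn

def ck(in_ch, out_ch, use_norm=True):
    # 4x4 convolution with stride 2, optional InstanceNorm, LeakyReLU.
    block = [nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1)]
    if use_norm:
        block.append(nn.InstanceNorm2d(out_ch))
    block.append(nn.LeakyReLU(0.2, inplace=True))
    return block

class PatchDiscriminator(nn.Module):
    def __init__(self, in_channels=3):
        super().__init__()
        self.model = nn.Sequential(
            *ck(in_channels, 64, use_norm=False),  # C64: no InstanceNorm on the first layer
            *ck(64, 128),                          # C128
            *ck(128, 256),                         # C256
            *ck(256, 512),                         # C512
            nn.Conv2d(512, 1, kernel_size=4, stride=1, padding=1),  # 1-channel patch map
        )

    def forward(self, x):
        # Each element of the output map scores one patch of the input as real or fake.
        return self.model(x)

d = PatchDiscriminator()
scores = d(torch.randn(1, 3, 256, 256))   # 15 x 15 patch map for a 256 x 256 input with this stride pattern
```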

4 Implementation This section discusses the implementations of Pix2Pix GAN and Cycle GAN, respectively.

4.1 Steps Involved in Implementing Pix2pix GAN Here is a list of the steps to be followed while implementing Pix2Pix GAN:
• Segregation of dataset.
• Define the discriminator model.
• Define the generator model.
• Define the combined generator and discriminator model.
• Train and save the model.
• Load model and run.

Segregation of Dataset. For Pix2Pix GAN we work on two different datasets, "MAPS" and "FACADES." Each dataset is divided into two folders: train and test. In the MAPS dataset, each 1200 × 600 pixel image contains the satellite picture on the left and the Google Maps picture on the right. These images are loaded, rescaled, and split into their satellite and Google Maps components; after rescaling, each component is 256 × 256 pixels. The same process of loading, rescaling, and splitting is applied to the FACADES dataset. Define the Discriminator Model. The discriminator design relies on the model's effective receptive field, which relates one model output to the number of pixels in the input picture. Here, we put the PatchGAN discriminator


model of 70 × 70 into practice. The model generates a patch output of predictions from two concatenated input pictures. Binary cross-entropy is used to optimize the model, with a weighting such that updates to the model have half (0.5) of the usual effect. This approach has the advantage that a single model can process input photos of various sizes, for example larger or smaller than 256 × 256 pixels. Define the Generator Model. The generator is an encoder-decoder model that employs a U-Net design. It creates a target picture from a source picture by downsampling (encoding) the input picture to a bottleneck layer and then upsampling (decoding) the bottleneck representation to the size of the output picture. Following the U-Net design, skip-connections are added between the encoding levels and the corresponding decoding layers, forming a U-shaped pattern. The generator's encoder and decoder are made up of standardized blocks of convolutional, batch normalization, dropout, and activation layers. Because of this standardization, helper functions can be defined to produce each block and reused to construct the encoder and decoder sections of the model. The output layer employs the Tanh activation function, so the resulting image's pixel values fall inside the [−1, 1] range. Define the Combined Generator and Discriminator Model. The discriminator model receives direct training on both produced and actual pictures, whereas the generator model does not; instead, the discriminator model is used to train the generator model. The adversarial loss and the L1 loss are weighted together to update the generator. The composite model is optimized with two targets: the adversarial (cross-entropy) loss, which forces significant weight updates in the generator toward producing more photo-realistic images, and the L1 loss, which compares the generator output against the actual translation of the picture. Train and Save the Model. The discriminator is trained on batches of real and fake photos. We construct a batch of randomly selected picture pairs from the training dataset and assign them the discriminator label class = 1 to denote that they are real. To create an equal batch of target photos for the discriminator, we generate fake samples using the generator model and a set of genuine source images; these are given the label class = 0 to tell the discriminator that they are fake. GAN models often do not converge; instead, an equilibrium between the generator and discriminator models is sought, so we cannot simply tell when training should stop. As a result, during training we may save the model and use it to produce sample image-to-image translations at predetermined intervals, such as every 10 training epochs. At the end of training, we can analyze the produced photos and select a final model based on image quality. The model is stored in H5 format, which makes it simpler to import afterwards. Load Model and Run. Then, we load the model and use it to generate ad hoc translations of the training dataset's source pictures. We may first load the training dataset, then provide the model with a source satellite image as input to predict a Google Maps picture. In the case of the FACADES dataset, we can provide the


source architectural labels as input to the model and use it to predict realistic building images. The source picture, the produced image, and the expected target image can all be plotted.
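Since the trained generator is saved in H5 format, loading it and producing an ad hoc translation can look like the following sketch. It is illustrative only: the file names are placeholders, and the [−1, 1] scaling mirrors the tanh output range described above.

```python
# Sketch: load a saved Pix2Pix generator (H5) and translate one source image.
# "pix2pix_generator.h5" and "satellite_example.jpg" are placeholder names.
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.image import load_img, img_to_array

model = load_model("pix2pix_generator.h5")

# Load a source image, resize to 256 x 256 and scale pixels to [-1, 1] (tanh range).
img = img_to_array(load_img("satellite_example.jpg", target_size=(256, 256)))
src = (img - 127.5) / 127.5
gen = model.predict(src[np.newaxis, ...])[0]

# Rescale from [-1, 1] back to [0, 1] for display.
plt.subplot(1, 2, 1); plt.imshow((src + 1) / 2); plt.title("Source"); plt.axis("off")
plt.subplot(1, 2, 2); plt.imshow((gen + 1) / 2); plt.title("Generated"); plt.axis("off")
plt.show()
```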

4.2 Steps Involved in Implementing Cycle GAN The steps involved in implementing the Cycle GAN are:
• Load the dataset.
• Define the discriminators.
• Define the generators.
• Define the loss functions.
• Train the model.
• Test the model.

Load the Dataset. Before loading the dataset for a CycleGAN, images must be collected for both domains A and B; the model should be trained with a significant number of photos, usually thousands. After gathering the photos, they must be preprocessed before being fed to the model: the images may be resized to a fixed size, the pixel values may be normalized, and the photographs may be converted to a suitable format (e.g., JPEG, PNG). Define the Generators. In a CycleGAN, the generator networks translate images from one domain to another. There are two generator networks: one that translates photos from domain A to domain B and another that translates images from domain B to domain A. Convolutional neural network architectures such as U-Net or ResNet are frequently used to build these generator networks. Define the Discriminators. There are two discriminators in CycleGAN: one for separating real from fake images in domain A and another for separating real from fake images in domain B. Both discriminators use convolutional neural networks to determine whether an input image is authentic. Define the Loss Functions. CycleGAN uses two categories of loss functions: adversarial loss and cycle-consistency loss. • Adversarial Loss. The adversarial loss measures how successfully the generator can deceive the discriminator into believing that the generated images are authentic. In CycleGAN there are two adversarial losses: one for the generator that maps from domain A to domain B and another for the generator that maps from domain B to domain A. The adversarial loss for a generator is the binary cross-entropy between the discriminator's output and the target labels (1 for real images, 0 for fake images).


• Cycle-consistency loss. The cycle-consistency loss measures how closely the reconstructed images resemble the original images; it helps the generators produce visually comparable images across the two domains. Cycle-consistency losses are defined for both the generator that maps from domain A to domain B and the generator that maps from domain B to domain A, and each is computed as the L1 loss between the original image and the reconstructed image. Train the Model. Before a CycleGAN model can be trained, the images must be collected and the generators, discriminators, and loss functions defined; training is then carried out with an optimizer (a condensed training-step sketch follows at the end of this subsection). • Train the discriminators. Before updating the discriminator parameters, we compute the adversarial loss for both real and fake images and backpropagate the gradients. The detach() method keeps these gradients from flowing back into the generators. • Train the generators. The adversarial loss and cycle-consistency loss for the produced images are computed and the gradients are backpropagated to update the parameters of the generators. The relative weight given to the cycle-consistency loss is controlled by the lambda-cycle hyperparameter. Test the Model. To evaluate the performance of the CycleGAN model, we generate some fake images in both directions and visualize them. By passing test photos from the input domain through the generator G_A→B, we first create fake images in the output domain; we then create fake images by feeding test photos from the output domain through the generator G_B→A. Finally, the generated images can be displayed with a library such as Matplotlib or OpenCV to assess how well the model is working. Even though the generated images were created from the source domain, a well-performing model should make them appear to be from the target domain.
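The following condensed PyTorch sketch puts the loss bookkeeping above in one place, including the detach() call that stops discriminator gradients from flowing back into the generators. It assumes that generators G_AB, G_BA, discriminators D_A, D_B, their optimizers opt_G and opt_D (covering both generators and both discriminators respectively), and input batches real_A, real_B are defined elsewhere; it is a sketch, not the authors' training code.

```python
# One CycleGAN training step (sketch): adversarial + cycle-consistency losses.
import torch
import torch.nn.functional as F

def train_step(real_A, real_B, G_AB, G_BA, D_A, D_B, opt_G, opt_D, lambda_cycle=10.0):
    bce = F.binary_cross_entropy_with_logits

    # ---- Generators: adversarial loss + cycle-consistency loss ----
    fake_B = G_AB(real_A)                                   # A -> B
    fake_A = G_BA(real_B)                                   # B -> A
    pred_fake_B, pred_fake_A = D_B(fake_B), D_A(fake_A)
    adv_loss = (bce(pred_fake_B, torch.ones_like(pred_fake_B)) +
                bce(pred_fake_A, torch.ones_like(pred_fake_A)))
    cycle_loss = (F.l1_loss(G_BA(fake_B), real_A) +          # A -> B -> A should recover A
                  F.l1_loss(G_AB(fake_A), real_B))           # B -> A -> B should recover B
    g_loss = adv_loss + lambda_cycle * cycle_loss
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()

    # ---- Discriminators: real vs. fake; detach() cuts gradients to the generators ----
    pred_real_A, pred_det_A = D_A(real_A), D_A(fake_A.detach())
    pred_real_B, pred_det_B = D_B(real_B), D_B(fake_B.detach())
    d_loss = (bce(pred_real_A, torch.ones_like(pred_real_A)) +
              bce(pred_det_A, torch.zeros_like(pred_det_A)) +
              bce(pred_real_B, torch.ones_like(pred_real_B)) +
              bce(pred_det_B, torch.zeros_like(pred_det_B)))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()
    return g_loss.item(), d_loss.item()
```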

5 Results The simulation is performed on an Intel i7 processor (3 GHz, 16 GB RAM) with an integrated NVIDIA GeForce 8 GB GPU and an external NVIDIA GeForce RTX 2080 8 GB GPU. The Pix2Pix GAN model was trained on the MAPS dataset [11] for 40 epochs for satellite-to-Google-map translation and vice versa. The training was performed on 1097 JPEG images with numeric filenames; both the satellite image on the left and the Google Maps image on the right are present in each 1200 × 600 pixel image. A batch size of 1 is used,


which means each epoch consists of 1097 training steps. The trained model is tested with Google map and satellite images as input, translating them to satellite and Google map images, respectively; Fig. 7 shows all the results. Figure 8 shows the translation of architectural labels to building photos using Pix2Pix GAN on the facades dataset [12]. Using the horses2zebra dataset [13], the Cycle GAN model was trained for 10 epochs; for each epoch, 1187 horse images are trained with a batch size of 1. The test results are displayed in Fig. 10. Increasing the number of epochs (beyond 70) is expected to further increase accuracy. Losses. The discriminator loss is the sum of the real loss and the generated loss. The generator GAN loss is a sigmoid cross-entropy loss between an array of ones and the resulting pictures. The generator L1 loss (Eq. 1) is the difference between the pixel values of the target image and the output generated by the generator. All the loss plots are shown in Fig. 9.

Fig. 7 a Translation of satellite image to Google map image for 40 epochs of training b Translation of Google map image to satellite image for 10 epochs of training c Translation of Satellite image to Google map image for arbitrary input and d Translation of Google map image to Satellite image for arbitrary input

Fig. 8 Translation of architecture labels to building photos using Pix2pix GAN


Fig. 9 Loss plots for facades dataset a Discriminator loss b Generator GAN loss c Generator L1 loss d Generator total loss

$$\text{L1 loss} = \sum_{i=1}^{n} \left| y_{\text{true}} - y_{\text{predicted}} \right| \tag{1}$$

Generator total loss is the additive result of the generator GAN loss and the λ-weighted L1 loss:

$$\text{Total generator loss} = \text{gen\_gan\_loss} + \lambda \cdot \text{L1\_loss} \tag{2}$$
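Expressed in code, a TensorFlow sketch consistent with Eqs. (1) and (2) and with the sigmoid cross-entropy description above could look as follows. The value λ = 100 is a common choice in Pix2Pix implementations, not a value reported in this chapter, and the L1 term is averaged rather than summed.

```python
# Generator loss sketch: adversarial loss (sigmoid cross-entropy against an array of ones)
# plus lambda-weighted L1 distance between target and generated images, as in Eqs. (1)-(2).
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
LAMBDA = 100  # assumed weighting; the chapter does not report the exact value

def generator_loss(disc_generated_output, gen_output, target):
    gan_loss = bce(tf.ones_like(disc_generated_output), disc_generated_output)
    l1_loss = tf.reduce_mean(tf.abs(target - gen_output))   # Eq. (1), averaged
    total_gen_loss = gan_loss + LAMBDA * l1_loss             # Eq. (2)
    return total_gen_loss, gan_loss, l1_loss
```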

6 Conclusion and Future Scope The results obtained in this research work show that conditional Generative Adversarial Networks (cGAN) are the best approach for Pix2Pix image translation tasks on paired datasets, whereas the CycleGAN approach stands out as the best solution for image translation tasks with unpaired datasets. Both methodologies produce reliable results. The major drawback of the Pix2Pix GAN model is the requirement of paired training datasets, which are time-consuming and costly to create. So, this study has used Cycle GAN by including unpaired


Fig. 10 a Horse to Zebra b Zebra to Horse by Cycle GAN

image datasets to translate the input image from one domain representation to another; specifically, Cycle GAN was implemented on the horses2zebra dataset. The Pix2Pix GAN outputs were produced by a model trained for 40 epochs, and the Cycle GAN outputs by a model trained for only 10 epochs due to system limitations. As the number of epochs increases, the performance of the models improves and more prominent results can be produced. Many other applications can be developed in the future from the basic models built here.

References 1. Isola P, Zhu J-Y, Zhou T, Efros AA (2017) Image-to-image translation with conditional adversarial networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, IEEE, Honolulu, pp 1125–1134 2. Ingale JV, Singh R, Patwal P (2021) Image to image translation: generating maps from satellite images. arXiv:2105.09253 3. Regmi K, Borji A (2018) Cross-view image synthesis using conditional GAN. In: Proceedings of the IEEE conference on CVPR, IEEE, Utah, pp 3501–3510 4. Park T, Liu M-Y, Wang T-C, Zhu J-Y (2019) Semantic image synthesis with spatially-adaptive normalization. In: Proceedings of the IEEE conference on CVPR, CA, USA, pp 2337–2346 5. Zhu P, Abdal R, Qin Y, Wonka P (2020) Sean: image synthesis with semantic region-adaptive normalization. In: Proceedings of the IEEE conference on computer vision and pattern recognition, IEEE, USA, pp 5104–5113


6. Lee C-H, Liu Z, Wu L, Luo P (2020) Maskgan: towards diverse and interactive facial image manipulation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, IEEE, USA, pp 5549–5558 7. Tang H, Xu D, Yan Y, Torr P-H, Sebe N (2020) Local class-specific and global image-level generative adversarial networks for semantic guided scene generation. In: Proceedings of the IEEE conference on CVPR, IEEE, USA, pp 7870–7879 8. Zhu J-Y, Isola P, Efros AA (2017) Unpaired image to image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, IEEE, Italy, pp 2223–2232 9. Li M, Huang H, Ma L, Liu W (2018) Unsupervised image to image translation with stacked cycle consistent adversarial networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, IEEE, pp 184–199 10. Tang H, Liu H, Xu D, Torr P-H-S, Sebe N (2021) Attention GAN: unpaired image to image translation using attention-guided GAN. In: Proceedings of the IEEE conference on computer vision and pattern recognition, IEEE, pp 1–16 11. Kaggle homepage, https://www.kaggle.com/datasets/alincijov/pix2pix-maps 12. https://www.kaggle.com/datasets/balraj98/facades-dataset 13. https://www.kaggle.com/datasets/balraj98/horse2zebra-dataset