Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences: PCCDS 2022 9811987416, 9789811987410

This book gathers selected high-quality research papers presented at the International Conference on Paradigms of Communication, Computing and Data Sciences (PCCDS 2022).

492 92 22MB

English Pages 763 [764] Year 2023


Table of contents:
Preface
Contents
About the Editors
1 Optimized Watermarking Scheme for Copyright Protection of Medical Images
1 Introduction
2 Proposed Scheme
2.1 Embedding Process
2.2 Extraction Process
2.3 Generation of Optimal Embedding Factor
3 Experimental Results
4 Conclusion
References
2 MobileNet + SSD: Lightweight Network for Real-Time Detection of Basketball Player
1 Introduction
2 Literature Survey
3 Methodology
3.1 SSD Parameters
3.2 Implementation of Proposed Methodology
4 Dataset and Experimental Results
5 Conclusion and Future Research Scope
References
3 Modified Hungarian Algorithm-Based User Pairing with Optimal Power Allocation in NOMA Systems
1 Introduction
2 Network Model
3 MHA-Based User Pairing
4 Proposed Power Allocation Methods
4.1 Sub-band Power Allocation
4.2 Power Assignment for Sub-band Users
5 Simulation Results
6 Conclusion
References
4 Design and Implementation of Advanced Re-Configurable Quantum-Dot Cellular Automata-Based (Q-DCA) n-Bit Barrel-Shifter Using Multilayer 8:1 MUX with Reversibility
1 Introduction
2 Theoretical Surroundings
2.1 QCA-Based Fundamental Gates
2.2 Clock Scheme that Used for QCA Technology
2.3 Reversible Logical Expression
2.4 Q-DCA-Based Multilayer Structure
2.5 Occupied Area, Delay, Dissipated-Power, and Tunneling Resistivity Calculation
3 Proposed Multilayer 8:1 MUX
4 Proposed 8-Bit Barrel Shifter
5 Conclusion
References
5 Recognition of Facial Expressions Using Convolutional Neural Networks
1 Introduction
2 Materials and Methods
2.1 Materials
2.2 Methods
2.3 Artificial Data Generation
2.4 Key Performance Indicators
3 Results
3.1 Le-Net 5
3.2 Basic CNN
3.3 AlexNet
3.4 ResNet-50
4 Discussion
5 Conclusions and Future Work
References
6 Identification of Customer Preferences by Using the Multichannel Personalization for Product Recommendations
1 Introduction
2 Realistic Choice-Based Conjoint Analysis
2.1 Area of Research
2.2 Experimental Strategy of Conjoint Analysis
2.3 Various Samples Plus Data Assembly
3 Research Results and Discussion
3.1 Evaluation of Quality of Fit and Predictive Analysis
3.2 Choice Associated Combined and Assumption-Based Testing
3.3 Product Recommendation Rejecting Reasons
4 Conclusion
References
7 A Post-disaster Relocation Model for Infectious Population Considering Minimizing Cost and Time Under a Pentagonal Fuzzy Environment
1 Introduction
2 Basic Concepts and Defuzzification Method for PFN
2.1 Basic Concepts
2.2 Removal Area Method to Convert PFN to Crisp Number
3 Problem Statement and Model Formation
3.1 Modelling
4 Solving Multi-objective Optimization Problems
4.1 Defuzzification of MOST Model
4.2 Compromise Programming Approach to Solve MOST Model
4.3 LINGO Optimization Software
5 Numerical Experiments and Discussions
5.1 Input Data for the Real Life Model
5.2 Result Analysis
6 Conclusion and Future Prospects
References
8 The Hidden Enemy: A Botnet Taxonomy
1 Introduction
2 Botnet Life Cycle
3 Classification
3.1 Botnet Architecture
3.2 Botnet Detection Techniques
4 Conclusion and Future Scope
References
9 Intelligent Call Prioritization Using Speech Emotion Recognition
1 Introduction
2 Literature Survey
3 Methodology
3.1 Audio Preprocessor Module
3.2 Emotion Detection Module
3.3 Call Prioritizer Module
4 Results and Observations
5 Conclusion
References
10 The AdaBoost Approach Tuned by SNS Metaheuristics for Fraud Detection
1 Introduction
2 Literature Review and Background
2.1 Adaptive Boosting Algorithm
2.2 Metaphor Based Metaheuristics
3 Social Network Search Algorithm
4 Experimental Setup
5 Conclusion
References
11 Prediction of Pneumonia Using Deep Convolutional Neural Network (CNN)
1 Introduction
2 Literature Survey
3 Methodology
3.1 Deep Learning
3.2 Convolutional Neural Networks (CNNs)
3.3 Pre-trained Convolutional Neural Networks
3.4 ResNet
3.5 InceptionV3
3.6 VGG19
4 Proposed Work
4.1 Preprocessing and Augmentation
4.2 Performance Metrics
5 Results and Discussion
5.1 Dataset
5.2 Simulation Results
6 Conclusion and Future Work
References
12 State Diagnostics of Egg Development Based on the Neuro-fuzzy Expert System
1 Introduction
2 The Neuro-fuzzy Expert System of Egg Development States Diagnostics Creation
2.1 Linguistic Variables Formation
2.2 Formation of Base of Fuzzy Rules
2.3 Fuzzification
2.4 Aggregation of Substates
2.5 Activation of the Conclusions
2.6 Conclusions Aggregation
2.7 Defuzzification
3 Creation of Mathematical Models of a Neuro-fuzzy Expert System of Egg Development States Diagnostics
4 Choice of Criteria for Efficiency Evaluation of Mathematical Models of a Neuro-fuzzy Expert System of Egg Development States Diagnostics
5 Numerical Research
6 Conclusions
References
13 Analysis of Delay in 16 × 16 Signed Binary Multiplier
1 Introduction
2 Urdhva Tiryagbhyam Sutra
3 Proposed Design
4 Results and Discussion
5 Conclusion
References
14 Review of Machine Learning for Antenna Selection and CSI Feedback in Multi-antenna Systems
1 Introduction
2 TAS for Single-User MIMO and Untrusted Relay Networks
2.1 System Model
2.2 Dataset Generation and CSI Classifier Building
2.3 Performance Analysis of ML-Based CSI Classifiers
3 FDD-Based Massive MIMO Systems
3.1 System Model
3.2 CNN for Compression
3.3 Analysis of the ML-Based CSI Feedback Techniques
4 Conclusion
References
15 Cassava Leaf Disease Detection Using Ensembling of EfficientNet, SEResNeXt, ViT, DeIT and MobileNetV3 Models
1 Introduction
2 Related Work
2.1 EfficientNet
2.2 Squeeze-and-Excitation Networks
2.3 ResNeXt
2.4 ViT
2.5 DeIT
2.6 MobileNetV3
2.7 Stacked Ensemble Learning Generalization
3 Proposed Method
3.1 Architecture
4 Experimental Results and Analysis
5 Conclusion
References
16 Scene Segmentation and Boundary Estimation in Primary Visual Cortex
1 Introduction
2 Method
2.1 The RGC Morphology
2.2 Connectome Specificity
3 Simulation Results and Discussion
4 Conclusion
References
17 Dynamic Thresholding with Short-Time Signal Features in Continuous Bangla Speech Segmentation
1 Introduction
2 Short-Time Speech Signal Features
3 Dynamic Thresholding Approach
4 Blocking Method
5 Conclusion
References
18 Fast Adaptive Image Dehazing and Details Enhancement of Hazy Images
1 Introduction
2 Background
3 Methodology
3.1 Hazy Image Classification
3.2 Fast Hazy Image Decomposition
3.3 Fast Dehazing
3.4 Fast Multilayer Laplacian Enhancement
4 Experimental Results
5 Conclusion
References
19 Review on Recent Advances in Hearing Aids: A Signal Processing Perspective
1 Introduction
1.1 Hearing Aid Devices
1.2 Analog Hearing Aid Devices
1.3 Digital Hearing Aid Devices
2 Smart Hearing Aid Using Deep Learning
3 Smart Phone-Based Hearing Aids
4 Occlusion Effect in Hearing Aids
5 Feedback Cancellation in Hearing Aids
6 Conclusion
References
20 Hierarchical Earthquake Prediction Framework
1 Introduction
1.1 Types of Earthquakes
1.2 Impacts of Seismic Tremor on Earth
1.3 Challenges for the Earthquake Prediction
1.4 Application
1.5 Organization of Paper
2 Literature Review
2.1 Seismic Parameters
3 Proposed Framework
3.1 Dataset
3.2 Data Preprocessing for Location Prediction
3.3 Location Prediction
3.4 Preprocessing for Magnitude Prediction
3.5 Magnitude Prediction
3.6 Magnitude Range Classification
3.7 Magnitude Estimation
4 Experimental Analysis
4.1 Dataset
4.2 Data Preprocessing for Location Prediction
4.3 Location Prediction
4.4 Data Preprocessing for Magnitude Prediction
4.5 Magnitude Range Classification
4.6 Magnitude Estimation
5 Conclusions
References
21 Classification Accuracy Analysis of Machine Learning Algorithms for Gearbox Fault Diagnosis
1 Introduction
2 Experimental Vibration Data Collection
2.1 Experimental Gearbox Test Setup
2.2 Methodology and Procedure
3 Classification Accuracy Selection Process
4 Machine Learning Algorithm
4.1 Naive Bayes Machine Learning Classifier
4.2 KNN Machine Learning Classifier
4.3 Decision Tree Machine Learning Classifier
4.4 Random Forest Machine Learning Classifier
4.5 Support Vector Machine (SVM)
5 Fault Detection Results and Discussions
6 Conclusions
References
22 Stock Price Forecasting Using Hybrid Prophet—LSTM Model Optimized by BPNN
1 Introduction
2 Prophet Model
3 LSTM Model
4 MAE, RMSE, MAPE
4.1 Mean Absolute Error (MAE)
4.2 Root Mean Square Error (RMSE)
4.3 Mean Absolute Percentage Error (MAPE)
5 Proposed Method
6 Results and Discussion
7 Conclusion
References
23 Identification of Genetically Closely Related Peanut Varieties Using Deep Learning: The Case of Flower-Related Varieties 11
1 Introduction
2 Related Work
2.1 Situation of Peanut Seed Production and Certification in Senegal
2.2 Existing Methods of Identification of Peanut Seeds in Senegal
3 Methodology
3.1 Existing Methods of Identification of Peanut Seeds in Senegal
3.2 Image Acquisition
4 Model Construction
5 Model Training
6 Model Evaluation and Testing
7 Conclusion
References
24 Efficient Color Image Segmentation of Low Light and Night Time Image Enhancement Using Novel 2DTU-Net and FM2CM Segmentation Algorithm
1 Introduction
2 Literature Survey
3 Proposed Methodology
3.1 Input Source
3.2 Image Resizing
3.3 Image Enhancement
3.4 Segmentation
4 Results and Discussion
4.1 Performance Analysis of Proposed 2DTU-Net
4.2 Performance Analysis of Proposed FM2CM
5 Conclusion
References
25 An Architecture to Develop an Automated Expert Finding System for Academic Events
1 Introduction
2 Related Work
3 Findings About Expert Finding Systems
3.1 Classification Based on Used Domains
3.2 Classification Based on Used Techniques
4 Proposed Architecture
5 Result and Discussion
6 Conclusion
References
26 A Seismicity Declustering Model Based on Weighted Kernel FCM Along with DPC Algorithm
1 Introduction
2 Seismic Catalog Used in the Analysis
3 Proposed Model
4 Result and Discussion
4.1 Epicenter Plot
4.2 Cumulative and Lambda Plot
4.3 Temporal Seismicity Analysis
4.4 Comparative Analysis with State-of-the-Art Declustering Techniques
5 Conclusion
References
27 Wearable Small, Narrow Band, Conformal, Low-Profile Antenna with Defected Ground for Medical Devices
1 Introduction
2 Literature Review
3 Antenna Design and Analysis
4 Fabrication Results and Analysis
5 Conclusion
References
28 A CPW Fed Grounded Annular Ring Embedded Dual-Band Dual-Sense Circular Polarization Antenna for 5G/Wi-MAX and C-Band Satellite Applications
1 Introduction
2 Antenna Design and Analysis
2.1 Prototype Analysis
2.2 CP Mechanism and Analysis
3 Results and Discussion
4 Conclusion
References
29 An Analytical Appraisal on Recent Trends and Challenges in Secret Sharing Schemes
1 Introduction
2 Secret Sharing Schemes
3 Recent Research Advances in Secret Sharing Schemes
4 Comparative Analysis of Various Secret Sharing Schemes
5 Conclusion
References
30 A Comparative Study on Sign Language Translation Using Artificial Intelligence Techniques
1 Introduction
1.1 Artificial Intelligence (AI)
1.2 Machine Learning (ML)
1.3 Deep Learning (DL)
2 Literature Review
3 Comparison of Artificial Intelligence Techniques
4 Conclusion
5 Future Work
References
31 WSN-IoT Integration with Artificial Intelligence: Research Opportunities and Challenges
1 Introduction
2 Related Works
2.1 Algorithms Related to WSN as Well as IoT
3 WSN-IoTs and AI Collaboration
4 Conclusion and Future Scope
References
32 Time Window Based Recommender System for Movies
1 Introduction
2 Related Work
3 Our Contribution
4 Clustering Scheme
5 Recommendation Scheme
6 Experimental Settings
6.1 Data Description
6.2 Evaluation Metric
7 Results and Discussions
8 Conclusion and Future Work
References
33 Approximate Multiplier for Power Efficient Multimedia Applications
1 Introduction
2 Related Work
3 4:2 Compressors
4 16 × 16 Bit Approximate Dadda Multiplier Architecture
4.1 Comparison of High-Speed 4:2 Compressor with Dual-Stage Compressor
5 Implementation Results and Analysis
6 Conclusion
References
34 A Study on the Implications of NLARP to Optimize Double Q-Learning for Energy Enhancement in Cognitive Radio Networks with IoT Scenario
1 Introduction
2 Methods
3 Background
3.1 Cognitive Radio Network
3.2 The History and Technology of Internet of Things (IoT’s)
4 Implications of NLARP
4.1 Network Layer for IoT Devices
4.2 The GSMA’s Visualization of IoT Services—The Linked Life
4.3 Matrix of Energy and Security Visualization by Using Payoff in Coalition Game
4.4 Spectrum Utilization Using Q-Learning
5 Conclusion
References
35 Automatic Generation Control Simulation Study for Restructured Reheat Thermal Power System
1 Introduction
2 Restructured Reheat Thermal Power System Model
3 Application of PSOA for Tuning PID Controller
4 Simulation Results and Discussion
4.1 Poolco-Based Transaction (PBT)
4.2 Bilateral-Based Transaction (BBT)
4.3 Contract Violation-Based Transaction (CVBT)
5 Conclusions
References
36 Processing and Analysis of Electrocardiogram Signal Using Machine Learning Techniques
1 Introduction
1.1 Conditions that Are Diagnosed with the ECG
1.2 Limitations of the ECG
2 ECG Signal Processing
3 Significance of Denoising
4 Related Works
5 Various Techniques Used for Analysis and Prediction
5.1 Wavelet Transform Technique
5.2 Machine Learning and Predictive Analysis
6 Results and Discussion
6.1 Step-Wise Results for a Normal ECG Signal
6.2 Results for Abnormal Signals
7 Conclusion
References
37 Design of High Voltage Gain DC-DC Converter with Fuzzy Logic Controller for Solar PV System Under Dynamic Irradiation Conditions
1 Introduction
2 Design of Solar PV Array
3 Analysis of Single Switched Power Converter
3.1 Sliding Controller for Single Switch Converter
3.2 Design and Analysis of Adaptive MPPT Controller
4 Design and Analysis of Two-Leg Inverter
5 Discussion of Simulation Results
6 Conclusion
References
38 SmartFog: A Profit-Aware Real-Time Resource Allocation Strategy for Fog/Edge Computing
1 Introduction
2 System Model and Problem Formulation
2.1 System Model
2.2 Problem Formulation
3 SmartFog: The Proposed Framework
3.1 Task–Resource Allocation Based on Proposed Best_Fit Algorithm
3.2 Inclusion of Communication Cost
3.3 Applying the Strategy of Pre-Emption and Migration
3.4 Profit Awareness
4 Simulation Framework: SmartFog
5 Conclusion
References
39 A Comparative Approach: Machine Learning and Adversarial Learning for Intrusion Detection
1 Introduction
2 Related Work
3 Background
3.1 Types of Attacks
3.2 Adversarial Machine Learning Fundamentals
3.3 Adversarial Attack Strategies
3.4 Defense Strategies
4 Proposed Model
4.1 Dataset and Its Preprocessing
4.2 Machine Learning Models
4.3 Experimental Setup
4.4 Performance Metrics
5 Results and Discussion
6 Conclusion
References
40 Blockchain-Based Agri-Food Supply Chain Management
1 Introduction
2 Domain Explanation
2.1 Blockchain
2.2 Keccak-256
2.3 Ethash
2.4 Smart Contract
3 Related Works
4 Proposed System
4.1 Seller Phase
4.2 Buyer Phase
4.3 Payment Phase
4.4 Delivery Phase
4.5 Resell Phase
4.6 Refund Phase
5 Conclusion and Contribution to Research
6 Future Work
References
41 Data Balancing for a More Accurate Model of Bacterial Vaginosis Diagnosis
1 Introduction
2 Materials and Methods
2.1 Dataset
2.2 Machine Learning Methods and Metrics
2.3 Performance Measures
3 Experimental Design
3.1 Scenario One: “Performance on Imbalanced Dataset”
3.2 Feature Ranking Based on Four Criteria on Imbalanced Dataset
3.3 Scenario Two: “Performance on Balanced Dataset”
3.4 Feature Ranking Based on Four Criteria on Balanced Dataset
3.5 Scenario Three: “Performance on Four Sub-Datasets”
4 Results
4.1 Feature Rankings Based on Imbalanced BV Dataset
4.2 Feature Rankings Based on Balanced BV Dataset
4.3 Bacterial Vaginosis Predictors
4.4 Bacterial Vaginosis Predictive Models in Three Scenarios
5 Conclusion
References
42 Approximate Adder Circuits: A Comparative Analysis and Evaluation
1 Introduction
2 Approximate Adders
2.1 Preliminaries
2.2 Classification
3 Comparative Analysis
4 Conclusion
References
43 Effect of Traffic Stream Speed on Stream Equivalency Values in Mixed Traffic Conditions on Urban Roads
1 Introduction
2 Literature Review
3 Methodology
3.1 Selection of Study Area
3.2 Collection of Data
3.3 Analysis of Data
3.4 Development of Neural Network Model
3.5 Prediction of Missing Values
4 Results
5 Conclusion
References
44 Intelligent System for Cattle Monitoring: A Smart Housing for Dairy Animal Using IoT
1 Introduction
2 A Smart Village Design Cycle
3 Literature Review
4 Proposed Model
4.1 Cattle Shed Atmosphere Monitoring and Controlling
4.2 Cattle Health Monitoring
4.3 Cattle Food and Drinking Water Monitoring and Management
4.4 Cattle Waste Management
4.5 Milk Process Equipment
5 Conclusion
References
45 Energy-Efficient Approximate Arithmetic Circuit Design for Error Resilient Applications
1 Introduction
2 Literature Review
2.1 Review of Approximate Adders
3 Scope for Approximate Adder Design
4 Proposed Design of Energy-Efficient Adder
5 Evaluation Metrics
5.1 Error Metrics
5.2 Circuit Metrics
6 Applications
7 Case Study
8 Conclusion
References
46 Continuous Real Time Sensing and Estimation of In-Situ Soil Macronutrients
1 Introduction
2 Literature Survey
2.1 NPK
2.2 pH
2.3 Temperature
2.4 EC
3 Soil Testing Methods
3.1 Colorimetry Testing Method
3.2 Spectroscopy Method
3.3 Sensor Based Method
4 Comparison of Soil Testing Methods
5 Comparison of Classification Algorithms
6 Comparative Analysis of Classification Algorithms
6.1 J48
6.2 JRip
6.3 RF
6.4 ANN
7 Conclusion
8 Future Enhancement
References
47 Design and Development of Automated Groundstation System for Beliefsat-1
1 Introduction
2 Literature Review
3 Methodology
3.1 Antenna
3.2 Antenna Directional Rotator
3.3 Transceiver System
3.4 Data Management System
4 Results and Discussions
4.1 Antenna
5 Conclusion
References
48 Towards Developing a Deep Learning-Based Liver Segmentation Method
1 Introduction
2 Materials and Methods
2.1 Dataset
2.2 Methodology
3 Image Segmentation
4 Results and Discussion
5 Conclusion
References
49 Review on Vision-Based Control Using Artificial Intelligence in Autonomous Ground Vehicle
1 Introduction
2 Literature Review
3 Conclusion
References
50 Ensemble Learning Based Feature Selection for Detection of Spam in the Twitter Network
1 Introduction
2 Literature Survey
2.1 Features for Twitter Spam Detection
2.2 Feature Selection Methods for Spam Detection
2.3 Machine Learning Algorithms for Spam Detection
3 Dataset Description
4 Proposed System
4.1 Feature Selection Methods Used in the Proposed Work
4.2 Classifier Training
5 Experimental Evaluation and Results
6 Conclusion
References
51 Small-Scale Islanded Microgrid for Remotely Located Load Centers with PV-Wind-Battery-Diesel Generator
1 Introduction
2 Modeling of the Test Power System
3 Control Strategies
3.1 Control Strategies for Battery Charging/Discharging Controller
3.2 Control Strategies for PV System
3.3 Control Scheme for Doubly Fed Induction Generator (DFIG)
3.4 Energy Management Control Strategies for Diesel Generator to Turn on/off
4 Result & Discussion
4.1 Case 1: Varying Irradiance Condition
4.2 Case 2: Constant Irradiance and Variable Load Condition
5 Conclusion
References
52 A Review on Early Diagnosis of Lung Cancer from CT Images Using Deep Learning
1 Introduction
2 Related Works
3 Datasets
4 Deep Learning in Lung Cancer Detection
5 Conclusion
References
53 A Context-Based Approach to Teaching Dynamic Programming
1 Introduction
2 Algorithm Visualization in Programming Education
3 The Two Selected Tasks
4 Presentation of Illustrative Tools
4.1 Coin Problem
4.2 Mirror Word Problem
5 Conclusions
References
54 On the Applicability of Possibility Theory-Based Approaches for Ranking Fuzzy Numbers
1 Introduction
2 Existing Ranking Approaches
3 Invalidity of the Existing Ranking Approaches
4 Validity of the Existing Ranking Approaches
4.1 Validity of the Existing Approach for the Ranking of Generalized L-R Fuzzy Numbers
4.2 Validity of the Existing Approach for the Ranking of Generalized Trapezoidal Fuzzy Numbers
4.3 Validity of the Existing Approach for the Ranking of Generalized Triangular Fuzzy Numbers
5 Conclusion
References
55 Change Detection of Mangroves at Subpixel Level of Synthesized Hyperspectral Data Using Multifractal Analysis Method
1 Introduction
2 Study Area
3 Data Used
4 Methodology
5 Result and Discussion
6 Accuracy Assessment
7 Conclusion and Future Work
References
56 Analysis of the Behavior of Metamaterial Unit Cell with Respect to Change in Its Structural Parameters
1 Introduction
2 Design of MM Unit Cell
2.1 Configuration of the MM Unit Cell to Extract S-Parameters
2.2 Simulation Results for S-Parameter Extraction
3 Effect of Change in Split Ring Dimensions on Resonant Frequency
3.1 Simulation Results for Variable Length and Width of SRR
3.2 Simulation Results for Variable Gap Size of SRR
3.3 Simulation Results for Variable Width of Thin Wire
4 Result and Discussion
5 Conclusion
References
57 Mid-Term Load Forecasting by LSTM Model of Deep Learning with Hyper-Parameter Tuning
1 Introduction
2 LSTM Model
3 Load Forecasting Model
3.1 Hyper-Parameter Tuning
3.2 Hyper-Parameter Tuning Summary
4 Case Study
4.1 Error Calculation Without Hyper-Parameter Tuning
4.2 Error Calculation with Hyper-Parameter Tuning
5 Conclusion
References
58 A Comprehensive Survey: Benefits, Recent Works, Challenges of Optimal UAV Placement for Maximum Target Coverage
1 Introduction
2 Literature Review on Optimal UAV Placement for Maximum Target Coverage
2.1 Chronological Review
2.2 Literature Review
3 Algorithmic Analysis and Performance Measures Concentrated in Optimal UAV Placement
3.1 Algorithmic Analysis
3.2 Performance Metrics
3.3 Implemented Platforms
4 Research Gaps and Challenges
5 Conclusion
References
59 Comparative Study Between Different Algorithms of Data Compression and Decompression Techniques
1 Introduction
2 State of the Art
3 Presentation of Algorithms
3.1 Huffman Coding
3.2 Code LZ77
3.3 Code BID
4 Comparison
5 Discussion
6 Conclusion
References
60 A Unique Multident Wideband Antenna for TV White Space Communication
1 Introduction
2 TVWS Antennas at Glance
2.1 Designing Standard Rectangular Patch by Use of Defected Ground Approach
3 Proposed Antenna
3.1 Return Loss Plot
3.2 VSWR
3.3 Gain Plot
3.4 Gain Versus Frequency Graph
4 Fabricated Multident Antenna
4.1 Result After Fabrication of the Antenna Design
5 Conclusion
References
61 Development of a Deep Neural Network Model for Predicting Reference Crop Evapotranspiration from Climate Variables
1 Introduction
2 Methods and Materials
2.1 Study Zone and Climate Dataset Availability
2.2 Deep Neural Network (DNN)
2.3 Performance Assessment of the Models
3 Results and Discussions
3.1 Performance of the Model Under Single Input Combination
3.2 Performance of the Model Under Five Input Combinations
4 Conclusion
References
62 A Novel Efficient AI-Based EEG Workload Assessment System Using ANN-DL Algorithm
1 Introduction
1.1 Electroencephalogram
1.2 EEG Signals
2 Proposed Methodology
2.1 Data Collection
2.2 Data Filtering
2.3 Line Noise Filtering
2.4 Artifact Subspace Reconstruction (ASR)
2.5 Feature Extraction
3 Classification
3.1 Data Labeling
3.2 Training Model
3.3 Workload Detection/Recognition
3.4 System Performance
4 Result and Discussion
5 Conclusion
References
Author Index

Citation preview

Algorithms for Intelligent Systems Series Editors: Jagdish Chand Bansal · Kusum Deep · Atulya K. Nagar

Rajendra Prasad Yadav · Satyasai Jagannath Nanda · Prashant Singh Rana · Meng-Hiot Lim, Editors

Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences PCCDS 2022

Algorithms for Intelligent Systems Series Editors Jagdish Chand Bansal, Department of Mathematics, South Asian University, New Delhi, Delhi, India Kusum Deep, Department of Mathematics, Indian Institute of Technology Roorkee, Roorkee, Uttarakhand, India Atulya K. Nagar, School of Mathematics, Computer Science and Engineering, Liverpool Hope University, Liverpool, UK

This book series publishes research on the analysis and development of algorithms for intelligent systems with their applications to various real world problems. It covers research related to autonomous agents, multi-agent systems, behavioral modeling, reinforcement learning, game theory, mechanism design, machine learning, meta-heuristic search, optimization, planning and scheduling, artificial neural networks, evolutionary computation, swarm intelligence and other algorithms for intelligent systems. The book series includes recent advancements, modification and applications of the artificial neural networks, evolutionary computation, swarm intelligence, artificial immune systems, fuzzy system, autonomous and multi agent systems, machine learning and other intelligent systems related areas. The material will be beneficial for the graduate students, post-graduate students as well as the researchers who want a broader view of advances in algorithms for intelligent systems. The contents will also be useful to the researchers from other fields who have no knowledge of the power of intelligent systems, e.g. the researchers in the field of bioinformatics, biochemists, mechanical and chemical engineers, economists, musicians and medical practitioners. The series publishes monographs, edited volumes, advanced textbooks and selected proceedings. Indexed by zbMATH. All books published in the series are submitted for consideration in Web of Science.

Rajendra Prasad Yadav · Satyasai Jagannath Nanda · Prashant Singh Rana · Meng-Hiot Lim Editors

Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences PCCDS 2022

Editors

Rajendra Prasad Yadav, Department of Electronics and Communication Engineering, Malaviya National Institute of Technology, Jaipur, Rajasthan, India

Satyasai Jagannath Nanda, Department of Electronics and Communication Engineering, Malaviya National Institute of Technology, Jaipur, Rajasthan, India

Prashant Singh Rana, Department of Computer Science and Engineering, Thapar Institute of Engineering and Technology, Patiala, Punjab, India

Meng-Hiot Lim, School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, Singapore

ISSN 2524-7565 ISSN 2524-7573 (electronic) Algorithms for Intelligent Systems ISBN 978-981-19-8741-0 ISBN 978-981-19-8742-7 (eBook) https://doi.org/10.1007/978-981-19-8742-7 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Preface

This book contains outstanding research papers as the proceedings of the International Conference on Paradigms of Communication, Computing and Data Sciences (PCCDS 2022). PCCDS 2022 was organized by Malaviya National Institute of Technology Jaipur, India, and technically sponsored by the Soft Computing Research Society, India. The conference was conceived as a platform for disseminating and exchanging ideas, concepts, and results of researchers from academia and industry, with the aim of developing a comprehensive understanding of the challenges of advancing intelligence from computational viewpoints. This book will help strengthen congenial networking between academia and industry. We have tried our best to enrich the quality of PCCDS 2022 through a stringent and careful peer-review process. This book presents novel contributions to Communication, Computing and Data Sciences and serves as reference material for advanced research.

PCCDS 2022 received many technical contributions from distinguished participants at home and abroad: 349 research submissions from 20 different countries, viz. Bangladesh, China, Germany, Greece, Iceland, India, Indonesia, Malaysia, Mexico, Morocco, Philippines, Poland, Qatar, Romania, Russia, Senegal, Serbia, Spain, Ukraine, and the USA. After a very stringent peer-reviewing process, only 62 high-quality papers were finally accepted for presentation and the final proceedings.

Jaipur, India: Rajendra Prasad Yadav
Jaipur, India: Satyasai Jagannath Nanda
Patiala, India: Prashant Singh Rana
Singapore: Meng-Hiot Lim


Contents

1 Optimized Watermarking Scheme for Copyright Protection of Medical Images (Rohit Thanki and Purva Joshi) 1
2 MobileNet + SSD: Lightweight Network for Real-Time Detection of Basketball Player (Banoth Thulasya Naik and Mohammad Farukh Hashmi) 11
3 Modified Hungarian Algorithm-Based User Pairing with Optimal Power Allocation in NOMA Systems (Sunkaraboina Sreenu and Kalpana Naidu) 21
4 Design and Implementation of Advanced Re-Configurable Quantum-Dot Cellular Automata-Based (Q-DCA) n-Bit Barrel-Shifter Using Multilayer 8:1 MUX with Reversibility (Swarup Sarkar and Rupsa Roy) 35
5 Recognition of Facial Expressions Using Convolutional Neural Networks (Antonio Sarasa-Cabezuelo) 53
6 Identification of Customer Preferences by Using the Multichannel Personalization for Product Recommendations (B. Ramakantha Reddy and R. Lokesh Kumar) 69
7 A Post-disaster Relocation Model for Infectious Population Considering Minimizing Cost and Time Under a Pentagonal Fuzzy Environment (Mayank Singh Bhakuni, Pooja Bhakuni, and Amrit Das) 79
8 The Hidden Enemy: A Botnet Taxonomy (Sneha Padhiar, Aayushyamaan Shah, and Ritesh Patel) 93
9 Intelligent Call Prioritization Using Speech Emotion Recognition (Sanjana Addagarla, Ravi Agrawal, Deep Dodhiwala, Nikahat Mulla, and Kaisar Katchi) 101
10 The AdaBoost Approach Tuned by SNS Metaheuristics for Fraud Detection (Marko Djuric, Luka Jovanovic, Miodrag Zivkovic, Nebojsa Bacanin, Milos Antonijevic, and Marko Sarac) 115
11 Prediction of Pneumonia Using Deep Convolutional Neural Network (CNN) (Jashasmita Pal and Subhalaxmi Das) 129
12 State Diagnostics of Egg Development Based on the Neuro-fuzzy Expert System (Eugene Fedorov, Tetyana Utkina, Tetiana Neskorodieva, and Anastasiia Neskorodieva) 143
13 Analysis of Delay in 16 × 16 Signed Binary Multiplier (Niharika Behera, Manoranjan Pradhan, and Pranaba K. Mishro) 155
14 Review of Machine Learning for Antenna Selection and CSI Feedback in Multi-antenna Systems (Garrouani Yassine, Alami Hassani Aicha, Mrabti Fatiha, and Dhassi Younes) 165
15 Cassava Leaf Disease Detection Using Ensembling of EfficientNet, SEResNeXt, ViT, DeIT and MobileNetV3 Models (Hrishikesh Kumar, Sanjay Velu, Are Lokesh, Kuruguntla Suman, and Srilatha Chebrolu) 183
16 Scene Segmentation and Boundary Estimation in Primary Visual Cortex (Satyabrat Malla Bujar Baruah, Adil Zafar Laskar, and Soumik Roy) 195
17 Dynamic Thresholding with Short-Time Signal Features in Continuous Bangla Speech Segmentation (Md Mijanur Rahman and Mahnuma Rahman Rinty) 205
18 Fast Adaptive Image Dehazing and Details Enhancement of Hazy Images (Balla Pavan Kumar, Arvind Kumar, and Rajoo Pandey) 215
19 Review on Recent Advances in Hearing Aids: A Signal Processing Perspective (R. Vanitha Devi and Vasundhara) 225
20 Hierarchical Earthquake Prediction Framework (Dipti Rana, Charmi Shah, Yamini Kabra, Ummulkiram Daginawala, and Pranjal Tibrewal) 241
21 Classification Accuracy Analysis of Machine Learning Algorithms for Gearbox Fault Diagnosis (Sunil Choudhary, Naresh K. Raghuwanshi, and Vikas Sharma) 255
22 Stock Price Forecasting Using Hybrid Prophet—LSTM Model Optimized by BPNN (Deepti Patnaik, N. V. Jagannadha Rao, and Brajabandhu Padhiari) 265
23 Identification of Genetically Closely Related Peanut Varieties Using Deep Learning: The Case of Flower-Related Varieties 11 (Atoumane Sene, Amadou Dahirou Gueye, and Issa Faye) 273
24 Efficient Color Image Segmentation of Low Light and Night Time Image Enhancement Using Novel 2DTU-Net and FM2CM Segmentation Algorithm (Chandana Kumari and Abhijit Mustafi) 285
25 An Architecture to Develop an Automated Expert Finding System for Academic Events (Harshada V. Talnikar and Snehalata B. Shirude) 297
26 A Seismicity Declustering Model Based on Weighted Kernel FCM Along with DPC Algorithm (Ashish Sharma and Satyasai Jagannath Nanda) 307
27 Wearable Small, Narrow Band, Conformal, Low-Profile Antenna with Defected Ground for Medical Devices (Archana Tiwari and A. A. Khurshid) 325
28 A CPW Fed Grounded Annular Ring Embedded Dual-Band Dual-Sense Circular Polarization Antenna for 5G/Wi-MAX and C-Band Satellite Applications (Krishna Chennakesava Rao Madaka and Pachiyannan Muthusamy) 335
29 An Analytical Appraisal on Recent Trends and Challenges in Secret Sharing Schemes (Neetha Francis and Thomas Monoth) 345
30 A Comparative Study on Sign Language Translation Using Artificial Intelligence Techniques (Damini Ponnappa and Bhat Geetalaxmi Jairam) 359
31 WSN-IoT Integration with Artificial Intelligence: Research Opportunities and Challenges (Khyati Shrivastav and Ramesh B. Battula) 369
32 Time Window Based Recommender System for Movies (Madhurima Banerjee, Joydeep Das, and Subhashis Majumder) 381
33 Approximate Multiplier for Power Efficient Multimedia Applications (K. B. Sowmya and Rajat Raj) 395
34 A Study on the Implications of NLARP to Optimize Double Q-Learning for Energy Enhancement in Cognitive Radio Networks with IoT Scenario (Jyoti Sharma, Surendra Kumar Patel, and V. K. Patle) 407
35 Automatic Generation Control Simulation Study for Restructured Reheat Thermal Power System (Ram Naresh Mishra) 419
36 Processing and Analysis of Electrocardiogram Signal Using Machine Learning Techniques (Gursirat Singh Saini and Kiranbir Kaur) 433
37 Design of High Voltage Gain DC-DC Converter with Fuzzy Logic Controller for Solar PV System Under Dynamic Irradiation Conditions (CH Hussaian Basha, G. Devadasu, Nikita Patil, Abhishek Kumbhar, M. Narule, and B. Srinivasa Varma) 445
38 SmartFog: A Profit-Aware Real-Time Resource Allocation Strategy for Fog/Edge Computing (Ipsita Dalui, Arnab Sarkar, and Amlan Chakrabarti) 459
39 A Comparative Approach: Machine Learning and Adversarial Learning for Intrusion Detection (Madhura Mulimani, Rashmi Rachh, and Sanjana Kavatagi) 477
40 Blockchain-Based Agri-Food Supply Chain Management (N. Anithadevi, M. Ajay, V. Akalya, N. Dharun Krishna, and S. Vishnu Adityaa) 489
41 Data Balancing for a More Accurate Model of Bacterial Vaginosis Diagnosis (Jesús Francisco Perez-Gomez, Juana Canul-Reich, Rafael Rivera-Lopez, Betania Hernández Ocaña, and Cristina López-Ramírez) 503
42 Approximate Adder Circuits: A Comparative Analysis and Evaluation (Pooja Choudhary, Lava Bhargava, and Virendra Singh) 519
43 Effect of Traffic Stream Speed on Stream Equivalency Values in Mixed Traffic Conditions on Urban Roads (K. C. Varmora, P. J. Gundaliya, and T. L. Popat) 535
44 Intelligent System for Cattle Monitoring: A Smart Housing for Dairy Animal Using IoT (Sanjay Mate, Vikas Somani, and Prashant Dahiwale) 545
45 Energy-Efficient Approximate Arithmetic Circuit Design for Error Resilient Applications (V. Joshi and P. Mane) 559
46 Continuous Real Time Sensing and Estimation of In-Situ Soil Macronutrients (G. N. Shwetha and Bhat GeetaLaxmi Jairam) 573
47 Design and Development of Automated Groundstation System for Beliefsat-1 (Rinkesh Sante, Jatin Bhosale, Shrutika Bhosle, Pavan Jangam, Umesh Shinde, Kavita Bathe, Devanand Bathe, and Tilottama Dhake) 591
48 Towards Developing a Deep Learning-Based Liver Segmentation Method (Snigdha Mohanty, Subhashree Mishra, Sudhansu Shekhar Singh, and Sarada Prasad Dakua) 607
49 Review on Vision-Based Control Using Artificial Intelligence in Autonomous Ground Vehicle (Abhishek Thakur and Sudhanshu Kumar Mishra) 617
50 Ensemble Learning Based Feature Selection for Detection of Spam in the Twitter Network (K. Kiruthika Devi, G. A. Sathish Kumar, and B. T. Shobana) 627
51 Small-Scale Islanded Microgrid for Remotely Located Load Centers with PV-Wind-Battery-Diesel Generator (Deepak Gauttam, Amit Arora, Mahendra Bhadu, and Shikha) 637
52 A Review on Early Diagnosis of Lung Cancer from CT Images Using Deep Learning (Maya M. Warrier and Lizy Abraham) 653
53 A Context-Based Approach to Teaching Dynamic Programming (András Kakucs, Zoltán Kátai, and Katalin Harangus) 671
54 On the Applicability of Possibility Theory-Based Approaches for Ranking Fuzzy Numbers (Monika Gupta and R. K. Bathla) 685
55 Change Detection of Mangroves at Subpixel Level of Synthesized Hyperspectral Data Using Multifractal Analysis Method (Dipanwita Ghosh, Somdatta Chakravortty, and Tanumi Kumar) 695
56 Analysis of the Behavior of Metamaterial Unit Cell with Respect to Change in Its Structural Parameters (Shipra Tiwari, Pramod Sharma, and Shoyab Ali) 703
57 Mid-Term Load Forecasting by LSTM Model of Deep Learning with Hyper-Parameter Tuning (Ashish Prajesh, Prerna Jain, and Satish Sharma) 713
58 A Comprehensive Survey: Benefits, Recent Works, Challenges of Optimal UAV Placement for Maximum Target Coverage (Spandana Bandari and L. Nirmala Devi) 723
59 Comparative Study Between Different Algorithms of Data Compression and Decompression Techniques (Babacar Isaac Diop, Amadou Dahirou Gueye, and Alassane Diop) 737
60 A Unique Multident Wideband Antenna for TV White Space Communication (Ankit Meghwal, Garima Saini, and Balwinder Singh Dhaliwal) 745
61 Development of a Deep Neural Network Model for Predicting Reference Crop Evapotranspiration from Climate Variables (T. R. Jayashree, N. V. Subba Reddy, and U. Dinesh Acharya) 757
62 A Novel Efficient AI-Based EEG Workload Assessment System Using ANN-DL Algorithm (R. Ramasamy, M. Anto Bennet, M. Vasim Babu, T. Jayachandran, V. Rajmohan, and S. Janarthanan) 771
Author Index 783

About the Editors

Prof. Rajendra Prasad Yadav is currently working as a Professor-HAG in the Department of Electronics and Communication Engineering, Malaviya National Institute of Technology, Jaipur, Rajasthan, India. He has more than four decades of teaching and research experience. He was instrumental in starting new B.Tech and M.Tech courses and in formulating Ph.D ordinances for starting research work at Rajasthan Technical University (RTU) Kota and other affiliated engineering colleges as Vice Chancellor of the University. He has served as HOD of Electronics and Communication Engineering, President of Sports and Library, Hostel Warden, and Dean of Student Affairs at MNIT Jaipur. He has also been the Chief Vigilance Officer of MNIT Jaipur since 2015. Prof. Yadav received his Ph.D degree from MREC Jaipur and his M.Tech degree from IIT Delhi. Under his supervision, 15 Ph.D students have received the Ph.D degree and 7 students are working toward it; 40 M.Tech students have carried out their dissertation work under his guidance. He has published more than 200 peer-reviewed research papers, which have received 1150 citations. His research interests are Error Control Codes and MIMO-OFDM, RF and Antenna Systems, Mobile and Wireless Communication Systems, Optical Switching and Materials, Mobile Adhoc and Sensor Networks, Device Characterization and MEMS, and Cognitive Radio Networks.

Dr. Satyasai Jagannath Nanda has been an assistant professor in the Department of Electronics and Communication Engineering, Malaviya National Institute of Technology Jaipur, since June 2013. Prior to joining MNIT Jaipur, he received his PhD degree from the School of Electrical Sciences, IIT Bhubaneswar, and his M.Tech degree from the Department of Electronics and Communication Engineering, NIT Rourkela. He was the recipient of the Canadian Research Fellowship GSEP from the Department of Foreign Affairs and International Trade (DFAIT), Govt. of Canada, for the year 2009-10. He was awarded the Best PhD Thesis Award at SocProS 2015 by IIT Roorkee. He received best research paper awards at SocProS-2020 at IIT Indore, IC3-2018 at SMIT Sikkim, SocProS-2017 at IIT Bhubaneswar, IEEE UPCON-2016 at IIT BHU, and Springer OWT-2017 at MNIT. He is the recipient of the prestigious IEI Young Engineers Award by the Institution of Engineers, India, in the field of Electronics and Telecommunication Engineering for the year 2018-19. Dr. Nanda is a Senior Member of IEEE and the IEEE Computational Intelligence Society. He has received travel and research grants from SERB, UGC, CCSTDS (INSA), and INAE. To date he has published 40 SCI/SCOPUS journal articles and 50 international conference proceedings, which have received almost twelve hundred citations. He is in charge of the Digital Signal and Image Processing (DSIP) Lab at MNIT Jaipur. Under his supervision at MNIT Jaipur, six researchers have been awarded the PhD and five researchers are continuing their research work; he has also supervised 22 M.Tech theses. Dr. Nanda is co-coordinator of the Electronics and ICT Academy at MNIT Jaipur, set up by the Ministry of Electronics and IT, Govt. of India, with a grant of 10 Crore.

Dr. Prashant Singh Rana is presently working as Associate Professor in the Computer Science and Engineering Department, Thapar Institute of Engineering & Technology, Patiala, Punjab. He received both his PhD and M.Tech from ABV-IIITM, Gwalior. His areas of research are Machine Learning, Deep Learning, Bioinformatics, and Optimization. He has published more than 70 research papers in different journals and conferences. He has completed five projects sponsored by DST, ICMR, and NVIDIA, and one project is ongoing. He has published 10 patents and has guided seven PhD students and 18 Masters students.

Dr. Meng-Hiot Lim is a faculty member in the School of Electrical and Electronic Engineering. He holds a concurrent appointment as Deputy Director for the M.Sc in Financial Engineering and the Centre for Financial Engineering, anchored at the Nanyang Business School. He is a versatile researcher with diverse interests, with research focus in the areas of computational intelligence, evolvable hardware, finance, algorithms for UAVs, and memetic computing. He is currently the Editor-in-Chief of the journal Memetic Computing, published by Springer. He is also the Series Editor of the Springer book series "Studies in Evolutionary Learning and Optimization".

Chapter 1

Optimized Watermarking Scheme for Copyright Protection of Medical Images

Rohit Thanki and Purva Joshi

1 Introduction

Image sharing is easy with today's open-access media. However, attackers or imposters can manipulate images shared on open-access media, which leads to copyright issues. Because of this, if an image is shared on an open-access medium, it must be protected by copyright. Watermarking can be used to solve this problem [1-11]. Watermarked content is generated from the cover image using an embedding factor. There are many types of watermarking, classified by the cover content, the processing domain, the resistance to attack, and the extraction method [1, 6]. Depending on the cover content, watermarks can be applied as text, image, or signal watermarks. Based on the processing domain, watermarking can be classified into four types: spatial domain, transform domain, hybrid domain, and sparse domain. According to their resistance to attacks, watermarks can be classified as robust or fragile. Additionally, based on how the watermark is extracted, watermarking can be classified as blind, non-blind, or semi-blind. According to the literature [1-11], watermark embedding is performed using an embedding factor: the watermark information is inserted into the cover medium's content, or the content is modified, according to the embedding factor value. In the literature, researchers have created watermarked information using their own embedding factors; unfortunately, there is no standard for embedding factors. Therefore, an optimization process is required for embedding factor standardization. The optimization process identifies the optimal embedding factor that produces the best results for the watermarking algorithm.

R. Thanki (B), Senior Member, IEEE Gujarat Section, Rajkot, India
e-mail: [email protected]

P. Joshi, University of Pisa, Pisa, Italy

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_1


The watermarking algorithm uses various optimization algorithms to find an optimal embedding factor. The embedding factor is determined by watermarking evaluation parameters (combined into a fitness function, f). Watermark transparency is measured by the peak signal-to-noise ratio (PSNR) or weighted PSNR. Robustness is measured by the structural similarity index measure (SSIM), bit error rate (BER), or bit correction rate (BCR). Payload capacity is calculated from the watermark size and the cover data size. With transparency, robustness, and payload capacity measured by PSNR, NC, and PC, respectively, the fitness function may look like this:

$f_m = \mathrm{PSNR}_m + w_1 \cdot \mathrm{NC}_m + w_2 \cdot \mathrm{PC}_m$  (1)

where $m$ is the iteration index and $w_1$ and $w_2$ are weighting factors.
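To make the weighted sum in Eq. (1) concrete, here is a minimal Python/NumPy sketch of how such a fitness value could be evaluated for one candidate embedding factor. The paper gives no code, so the function names, the 8-bit PSNR convention, and the placeholder payload term and weights are illustrative assumptions.

```python
import numpy as np

def psnr(cover: np.ndarray, marked: np.ndarray) -> float:
    """Peak signal-to-noise ratio (dB) between cover and watermarked image (8-bit)."""
    mse = np.mean((cover.astype(np.float64) - marked.astype(np.float64)) ** 2)
    return np.inf if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)

def nc(w: np.ndarray, w_rec: np.ndarray) -> float:
    """Normalized correlation between embedded and recovered watermark bits."""
    w, w_rec = w.astype(np.float64).ravel(), w_rec.astype(np.float64).ravel()
    # small epsilon guards against an all-zero recovered watermark
    return float(np.dot(w, w_rec) / (np.linalg.norm(w) * np.linalg.norm(w_rec) + 1e-12))

def fitness_eq1(cover, marked, w, w_rec, pc=1.0, w1=1.0, w2=1.0):
    """Eq. (1): f_m = PSNR_m + w1*NC_m + w2*PC_m (weights and PC are placeholders)."""
    return psnr(cover, marked) + w1 * nc(w, w_rec) + w2 * pc
```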

The embedding factors are used to generate the watermarked image. In the literature, embedding factors were chosen according to the researchers' own specifications, and selecting them manually is time-consuming. The best embedding factor for a watermarking technique is therefore determined algorithmically. Embedding factors are affected by the cover data, the watermark, and the embedding process. An array of optimization algorithms can currently be used for this purpose, including genetic algorithms (GAs) [12], particle swarm optimization (PSO) [13], genetic programming (GP) [14, 15], differential evolution (DE) [16], simulated annealing (SA) [17], tabu search [18], ant colony optimization [19], and harmony search [20]. Due to its ease of understanding and implementation, the PSO algorithm is widely used for optimizing embedding factors in watermarking [21]. The literature contains a few watermarking schemes using PSO to protect medical images from copyright infringement [22-25]. Findik et al. [22] proposed a PSO watermarking scheme for cover color images. A pseudo-random noise generator was used to determine where the watermark image should be embedded, and the watermark bit was inserted into a block of the cover image using PSO after finding the best pixel locations in this block. The payload capacity of this scheme is smaller than that of a blind scheme. Watermarking schemes based on PSO and the discrete wavelet transform (DWT) have been proposed by Fakhari et al. [23] and Wang et al. [24]. The Fakhari scheme used a grayscale medical image as the cover, while the Wang scheme used a standard grayscale image; both were non-blind. Both schemes, however, have limited payload capacity. Wang's scheme is robust against standard watermarking attacks, whereas Fakhari's is robust against geometric attacks. A DWT watermarking scheme was presented by Chakraborty et al. [25], in which watermark bits are embedded by adding PN sequences to the detailed wavelet sub-bands of the host image and the embedding factors are found using PSO. The authors do not discuss the scheme's robustness against attacks in their paper.

A new blind watermarking scheme for medical images is developed and proposed in this paper to overcome some limitations of the existing techniques [22, 23]. Watermarking schemes using the redundant discrete wavelet transform (RDWT) offer better transparency and payload capacity than those using the wavelet transform alone. This scheme has the following main features: (1) it uses RDWT and PSO properties to overcome some of the limitations of standard watermarking procedures, such as selecting the embedding factor and limited payload capacity; (2) watermark images can be blindly extracted, which cannot be done with the existing schemes [23, 25]. The results show that the proposed scheme is robust. Moreover, it provides greater transparency and payload capacity than existing schemes such as Fakhari's [23] and Chakraborty's [25]. The proposed scheme selects embedding factors through an optimization mechanism, which improves on the traditional trade-offs of watermarking schemes. The paper proceeds as follows: Sect. 2 describes the proposed blind and optimized watermarking scheme. A comparison of the proposed scheme and the experimental results is presented in Sect. 3. The conclusion is presented in Sect. 4.

2 Proposed Scheme

The PSO algorithm implements a robust and blind watermarking scheme in [25]. That scheme uses the discrete wavelet transform (DWT). However, its payload capacity is very low, its watermarked images are less transparent, and it applies only to certain types of images or signals. Using block RDWT and PSO, we propose an approach that embeds the monochrome watermark directly into the LH, HL, and HH wavelet sub-bands of the cover medical image. The detailed wavelet sub-bands HH, HL, and LH carry less visual information and are therefore suitable for embedding watermarks, so the proposed scheme embeds the watermark there. The sub-bands are divided into non-overlapping blocks, and watermark bits 0 and 1 are embedded using two uncorrelated noise sequences: for each watermark bit, the coefficients of the corresponding sub-band block are modified by the matching noise sequence, scaled by the optimal embedding factor determined by the PSO algorithm. The proposed scheme consists of two processes, embedding and extraction.

2.1 Embedding Process

The steps for the embedding process are given below.

Step 1. A single-level RDWT is used to separate the cover medical image into wavelet sub-bands such as LL (approximation sub-band) and LH, HL, and HH (detail sub-bands).
Step 2. The monochrome watermark image is converted into a binary sequence.
Step 3. Create non-overlapping blocks from the detailed wavelet sub-bands LH, HL, and HH.


Step 4. Generate two uncorrelated noise sequences, each equal in size to a block. S0 and S1 are the noise sequences for watermark bits 0 and 1, respectively.
Step 5. The watermark sequence modifies the coefficients of the detailed wavelet sub-bands based on the optimal embedding factor (k) obtained from PSO. This procedure is carried out for all coefficients of each block of the cover medical image.
Step 6. The output of Step 5 is a set of modified wavelet sub-bands. Apply a single-level inverse RDWT to these modified sub-bands and the unmodified approximation sub-band to generate the watermarked medical image.
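The paper describes the embedding in prose only; the following is a minimal Python sketch of Steps 1-6, assuming the single-level RDWT can be realized with PyWavelets' stationary (undecimated) wavelet transform and, for brevity, embedding in only one detail sub-band (HL) rather than all three. The block size, wavelet, and PN generator seed are illustrative choices, not values fixed by the paper.

```python
import numpy as np
import pywt  # PyWavelets; swt2 is used here as a stand-in for the single-level RDWT

def embed(cover, wm_bits, k, block=8, seed=42):
    """Embed watermark bits by adding k*S0 or k*S1 (block-sized PN sequences)
    to non-overlapping blocks of one detail sub-band (Steps 1-6, simplified)."""
    rng = np.random.default_rng(seed)
    s0 = rng.choice([-1.0, 1.0], size=(block, block))  # PN sequence for bit 0 (Step 4)
    s1 = rng.choice([-1.0, 1.0], size=(block, block))  # PN sequence for bit 1 (Step 4)
    (ca, (ch, cv, cd)), = pywt.swt2(np.asarray(cover, dtype=np.float64),
                                    'haar', level=1)   # Step 1
    n = ch.shape[1] // block                           # blocks per row (Step 3)
    for idx, bit in enumerate(wm_bits):                # Step 5
        r, c = divmod(idx, n)
        pn = s1 if bit else s0
        ch[r*block:(r+1)*block, c*block:(c+1)*block] += k * pn
    marked = pywt.iswt2([(ca, (ch, cv, cd))], 'haar')  # Step 6 (inverse transform)
    return np.clip(marked, 0.0, 255.0), (s0, s1)
```

With a 512 × 512 cover and 8 × 8 blocks, the undecimated sub-band holds 64 × 64 = 4096 blocks, which matches the capacity needed for the 64 × 64 watermark used later in Sect. 3.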

2.2 Extraction Process

The steps for the extraction process are given below.

Step 1. The watermarked medical image is decomposed into wavelet sub-bands using a single-level RDWT, giving an approximation sub-band and detailed sub-bands WLH, WHL, and WHH. Convert WLH, WHL, and WHH into non-overlapping blocks.
Step 2. Take the uncorrelated noise sequences that were generated during embedding.
Step 3. Recover each watermark bit from the detailed wavelet coefficients of the watermarked medical image using the correlation between the noise sequences (S0, S1) and the coefficients of WLH, WHL, and WHH. C1 and C0 denote the correlation of the coefficients with noise sequences S1 and S0, respectively.
Step 4. Whenever C0 > C1, bit 0 is selected as the watermark bit value; otherwise, bit 1 is chosen.
Step 5. The recovered watermark image is created by reshaping the extracted bit sequence into a matrix.
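A matching sketch of the blind extraction (Steps 1-4), under the same assumptions as the embedding sketch above: each block of the detail sub-band is correlated with the two PN sequences, and the stronger correlation decides the bit. It reuses the pywt and numpy imports from the previous block.

```python
def extract(marked, n_bits, s0, s1, block=8):
    """Recover watermark bits by correlating each sub-band block with S0 and S1;
    reshaping the bit sequence into an image (Step 5) is left to the caller."""
    (_, (ch, _, _)), = pywt.swt2(np.asarray(marked, dtype=np.float64),
                                 'haar', level=1)      # Step 1
    n = ch.shape[1] // block
    bits = []
    for idx in range(n_bits):
        r, c = divmod(idx, n)
        blk = ch[r*block:(r+1)*block, c*block:(c+1)*block].ravel()
        c0 = np.corrcoef(blk, s0.ravel())[0, 1]        # correlation with S0
        c1 = np.corrcoef(blk, s1.ravel())[0, 1]        # correlation with S1
        bits.append(0 if c0 > c1 else 1)               # Step 4: C0 > C1 -> bit 0
    return np.array(bits)
```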

2.3 Generation of Optimal Embedding Factor Any watermarking scheme depends on the embedding factor to meet its basic requirements. A large embedding factor lowers the transparency of the watermarked image but improves the quality of the recovered watermark. Many existing schemes in the literature keep the embedding factor constant, which does not work well across different multimedia data. As a result, adaptive schemes are needed that calculate an appropriate embedding factor for each type of multimedia data. This paper combines a block RDWT-based watermarking scheme with a popular optimization algorithm, PSO, to provide an optimal embedding factor value. In the proposed scheme, PSNR and NC are used to calculate the fitness function for each particle of the population. The optimal solution (gbest) is selected as the particle with the maximum fitness value. Equation (2) gives the fitness function:

fitness = PSNR(C, WA) + 100 × NC(w, w′)    (2)

In the above equation, PSNR denotes the peak signal-to-noise ratio and NC the normalized correlation. Variable C indicates the cover medical image, WA the watermarked medical image, and w and w′ the original and extracted watermark images, respectively. According to the experimental results, the proposed fitness function works well. The PSO parameters are selected to facilitate comparison between schemes: the acceleration coefficients C1 and C2 are constants, the number of particles is 5, the number of iterations is 8, and the initial inertia weight α is 0.9. Trials on experimental medical images found the best embedding factor ranges to lie between 0.0 and 250.0. The optimal embedding factors provided by the PSO algorithm are given in Table 1, where k denotes the embedding factor used for the generation of the watermarked medical images.

Table 1 Obtained optimal embedding factor for proposed scheme

Embedding factor range   k1         k2         k3
0.0–1.0                  0.5483     0.6346     0.8362
0.0–2.0                  1.5718     1.8092     1.6667
0.0–3.0                  2.0803     2.7933     2.6394
0.0–4.0                  3.8015     3.8167     3.8451
0.0–8.0                  7.6857     6.2683     5.6797
0.0–10.0                 8.0542     9.8810     7.6335
0.0–50.0                 42.8845    49.0747    44.0159
0.0–150.0                143.4951   148.4743   142.2608
0.0–250.0                233.1310   225.7728   218.1750
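The following sketch shows how a generic PSO loop could search for the embedding factor k that maximizes the fitness of Eq. (2). It is a simplified illustration under stated assumptions: the `fitness` callback (which would embed with factor k and compute PSNR and NC) is left abstract, and the acceleration coefficients are placeholders, since their values are not reported in the text.

```python
import numpy as np

def pso_embedding_factor(fitness, k_max, n_particles=5, n_iter=8,
                         alpha=0.9, c1=2.0, c2=2.0, seed=0):
    """Particle swarm search for the embedding factor in [0, k_max].
    `fitness(k)` must return PSNR(C, WA) + 100 * NC(w, w')."""
    rng = np.random.default_rng(seed)
    k = rng.uniform(0.0, k_max, n_particles)     # particle positions
    v = np.zeros(n_particles)                    # particle velocities
    pbest = k.copy()
    pbest_fit = np.array([fitness(x) for x in k])
    gbest = pbest[np.argmax(pbest_fit)]
    for _ in range(n_iter):
        r1, r2 = rng.random(n_particles), rng.random(n_particles)
        v = alpha * v + c1 * r1 * (pbest - k) + c2 * r2 * (gbest - k)
        k = np.clip(k + v, 0.0, k_max)
        fit = np.array([fitness(x) for x in k])
        better = fit > pbest_fit
        pbest[better], pbest_fit[better] = k[better], fit[better]
        gbest = pbest[np.argmax(pbest_fit)]       # best factor so far
    return gbest
```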

3 Experimental Results A grayscale MRI image (512 × 512 pixels) [26] is used as the cover medical image and a hospital logo (64 × 64) as the watermark image for testing the proposed scheme (Fig. 1). The watermark image is inserted directly into the cover medical image. The resulting images obtained with the proposed scheme are shown in Fig. 2. Table 2 summarizes the PSNR and NC values of the proposed scheme. The PSNR measures the imperceptibility (transparency) of the embedded watermark in the cover image, while NC measures the robustness of extracting the watermark from the watermarked medical image. T_EMB (s) and T_EXT (s) indicate how long it takes to embed the watermark into a cover medical image and how long it takes to extract the watermark from that image.


Fig. 1 a Cover (MRI) image, b watermark image

Fig. 2 Generated watermarked medical images and recovered watermark images for the embedding factor ranges 0.0–1.0 through 0.0–250.0


In total, this algorithm generates watermarked medical images in around 3 s. Table 3 summarizes the NC values of the recovered watermark images under different watermarking attacks. As these results show, the scheme also provides robustness for medical images, which indicates that telemedicine applications can use it to secure medical images. Table 4 compares the proposed scheme with the Fakhari and Chakraborty schemes, which provide similar copyright protection for medical images. For embedding the watermark image, the Fakhari scheme [23] and the Chakraborty scheme [25] use DWT, whereas the proposed scheme uses RDWT. Fakhari's scheme [23] has a payload capacity of 10 bits and Chakraborty's scheme [25] of 1024 bits, both less than the proposed scheme's payload capacity of 4096 bits. The proposed scheme thus performs better than both [23] and [25] in transparency and payload capacity. Higher PSNR values correspond to greater transparency of the watermarked medical image; as Table 4 shows, the proposed scheme has a higher PSNR than the existing schemes and therefore a higher degree of transparency. Table 5 summarizes the performance comparison of the proposed scheme with recently published schemes (2021, 2022) [27–29], based on PSNR and NC values and the optimization algorithm used. The proposed scheme outperformed these recently published schemes as well.

Table 2 Quality measures values of proposed scheme

Range of embedding factor   PSNR (dB)   NC       T_EMB (s)   T_EXT (s)
0.0–1.0                     63.02       0.6805   1.9943      1.5326
0.0–2.0                     55.05       0.7872   1.4002      1.5577
0.0–3.0                     51.58       0.8493   1.4411      1.5326
0.0–4.0                     47.95       0.9035   1.4463      1.5061
0.0–8.0                     43.14       0.9475   1.3609      1.6488
0.0–10.0                    40.88       0.9667   1.3595      1.5823
0.0–50.0                    26.44       0.9862   1.5518      1.7159
0.0–150.0                   16.37       0.9975   1.5061      1.6962
0.0–250.0                   12.50       0.9996   1.5345      1.6310

Table 3 NC values of proposed scheme under different watermarking attacks

Attacks                                    NC       Attacks                  NC
JPEG (Q = 80)                              0.9996   Motion blurring          0.8067
JPEG (Q = 25)                              0.9993   Gaussian blurring        0.9996
Median filtering (3 × 3)                   0.9979   Sharpening               1.0000
Gaussian noise (variance = 0.005)          0.9996   Histogram equalization   1.0000
Salt and pepper noise (variance = 0.005)   0.9996   Rotation (20°)           0.5649
Speckle noise (variance = 0.005)           0.9996   Cropping (20%)           0.9996
Intensity adjustment                       1.0000   Scaling (512–256–512)    0.9755

Table 4 Performance comparison of proposed scheme with the existing schemes [23, 25]

Features                  Fakhari scheme [23]   Chakraborty scheme [25]   Proposed scheme
Used transform            DWT                   DWT                       RDWT
PSNR_Max (dB)             51.55                 25.7253                   63.02
Payload capacity (bits)   10                    1024                      4096

Table 5 Performance comparison of proposed scheme with recently published schemes (2022, 2021) [27–29]

Features                      Rezaee scheme [27]   Sharma scheme [28]   Golda scheme [29]   Proposed scheme
Used optimization algorithm   Whale                Firefly              Social group        Particle swarm optimization
PSNR_Max (dB)                 39.87                57.58                23.78               63.02
NC_Max                        0.9807               Not reported         Not reported        0.9996

4 Conclusion For copyright protection of medical images, we presented a watermarking scheme based on RDWT and PSO. RDWT is used to increase the payload capacity, while PSO generates the optimal embedding factor. The proposed scheme embeds a secret logo into medical images for copyright protection securely and accurately, and it performed better than the existing schemes. A limitation of the proposed scheme is that it can embed only binary watermark images.

References
1. Langelaar GC, Setyawan I, Lagendijk RL (2000) Watermarking digital image and video data. A state-of-the-art overview. IEEE Signal Process Mag 17(5):20–46
2. Thanki R, Borra S, Dwivedi V, Borisagar K (2017) An efficient medical image watermarking scheme based on FDCuT–DCT. Eng Sci Technol Int J 20(4):1366–1379
3. Lakshmi HR, Surekha B, Raju SV (2017) Real-time implementation of reversible watermarking. In: Intelligent techniques in signal processing for multimedia security. Springer, Cham, pp 113–132


4. Thanki R, Borra S (2018) A color image steganography in hybrid FRT–DWT domain. J Inf Secur Appl 40:92–102
5. Thanki R, Dwivedi V, Borisagar K, Borra S (2017) A watermarking algorithm for multiple watermarks protection using SVD and compressive sensing. Informatica 41(4):479–493
6. Borra S, Lakshmi H, Dey N, Ashour A, Shi F (2017) Digital image watermarking tools: state-of-the-art. In: Information technology and intelligent transportation systems: proceedings of the 2nd international conference on information technology and intelligent transportation systems, vol 296, Xi'an, China, p 450
7. Surekha B, Swamy GN (2013) Sensitive digital image watermarking for copyright protection. IJ Netw Secur 15(2):113–121
8. Surekha B, Swamy G, Reddy KRL (2012, July) A novel copyright protection scheme based on visual secret sharing. In: 2012 third international conference on computing communication & networking technologies (ICCCNT). IEEE, pp 1–5
9. Dey N, Roy AB, Das A, Chaudhuri SS (2012, October) Stationary wavelet transformation based self-recovery of blind-watermark from electrocardiogram signal in wireless telecardiology. In: International conference on security in computer networks and distributed systems. Springer, Berlin, Heidelberg, pp 347–357
10. Dey N, Dey G, Chakraborty S, Chaudhuri SS (2014) Feature analysis of blind watermarked electromyogram signal in wireless telemonitoring. In: Concepts and trends in healthcare information systems. Springer, Cham, pp 205–229
11. Dey N, Ashour AS, Chakraborty S, Banerjee S, Gospodinova E, Gospodinov M, Hassanien AE (2017) Watermarking in biomedical signal processing. In: Intelligent techniques in signal processing for multimedia security. Springer, Cham, pp 345–369
12. Holland JH (1975) Adaptation in natural and artificial systems. University of Michigan Press
13. Kennedy J, Eberhart RC (1995) Particle swarm optimization. In: Proceedings of the 1995 IEEE international conference on neural networks, Perth, Australia, pp 1942–1948
14. Koza JR (1992) Genetic programming: on the programming of computers by means of natural selection. MIT Press
15. Koza JR (1994) Genetic programming as a means for programming computers by natural selection. Stat Comput 4(2):87–112
16. Storn R, Price K (1997) Differential evolution—a simple and efficient heuristic for global optimization over continuous spaces. J Global Optim 11(4):341–359
17. Kirkpatrick S, Gelatt CD, Vecchi MP (1983) Optimization by simulated annealing. Science 220(4598):671–680
18. Glover F (1977) Heuristics for integer programming using surrogate constraints. Dec Sci 8(1):156–166
19. Dorigo M, Maniezzo V, Colorni A (1996) Ant system: optimization by a colony of cooperating agents. IEEE Trans Syst Man Cybern Part B 26(1):29–41
20. Geem ZW, Kim JH, Loganathan GV (2001) A new heuristic optimization algorithm: harmony search. Simulation 76(2):60–68
21. Li X, Wang J (2007) A steganographic method based upon JPEG and particle swarm optimization algorithm. Inf Sci 177(15):3099–3109
22. Findik O, Babaoğlu İ, Ülker E (2010) A color image watermarking scheme based on hybrid classification method: particle swarm optimization and k-nearest neighbor algorithm. Opt Commun 283(24):4916–4922
23. Fakhari P, Vahedi E, Lucas C (2011) Protecting patient privacy from unauthorized release of medical images using a bio-inspired wavelet-based watermarking approach. Digital Signal Process 21(3):433–446
24. Wang YR, Lin WH, Yang L (2011) An intelligent watermarking method based on particle swarm optimization. Expert Syst Appl 38(7):8024–8029
25. Chakraborty S, Samanta S, Biswas D, Dey N, Chaudhuri SS (2013, December) Particle swarm optimization-based parameter optimization technique in medical information hiding. In: 2013 IEEE international conference on computational intelligence and computing research (ICCIC). IEEE, pp 1–6


26. MedPix™ Medical Image Database. http://rad.usuhs.mil/medpix/medpix.html, https://medpix.nlm.nih.gov/home. Last access year: 2021
27. Rezaee K, SaberiAnari M, Khosravi MR (2022) A wavelet-based robust medical image watermarking technique using whale optimization algorithm for data exchange through internet of medical things. In: Intelligent healthcare. Springer, Singapore, pp 373–394
28. Sharma S, Choudhary S, Sharma VK, Goyal A, Balihar MM (2022) Image watermarking in frequency domain using Hu's invariant moments and firefly algorithm. Int J Image Graph Signal Process 2:1–15
29. Golda D, Prabha B, Murali K, Prasuna K, Vatsav SS, Adepu S (2021) Robust image watermarking using the social group optimization algorithm. Mater Today Proc

Chapter 2

MobileNet + SSD: Lightweight Network for Real-Time Detection of Basketball Player

Banoth Thulasya Naik and Mohammad Farukh Hashmi

1 Introduction In the field of computer vision, sports video analysis is one of the important topics. Much research focuses in particular on field sports such as basketball, soccer, and field hockey, which are extremely popular worldwide. Analysis of field sports videos can serve a variety of purposes, including event detection and player/team activity analysis. High-level applications require low-level structural processes such as player detection, classification, and tracking. Although player detection is generally the first step in sports video analysis, it is a challenging problem, because basketball is an extremely dynamic sport in which players continuously change their positions and postures. This paper offers a robust and efficient system for real-time player detection in basketball that determines the position of each player in sports videos. Only one camera view is displayed at any given moment, and the apparent size of a player changes as the camera moves from one side of the court to the other and zooms in at various points. Furthermore, some body parts may appear foreshortened as players change their angle relative to the camera, and it is extremely common for a player to be partially occluded by other players. This paper focuses on basketball player detection using a dynamic camera infrastructure. In field sports, player detection algorithms must deal with a wide range of challenges, including changing lighting and weather conditions, as well as positional


changes of players in images, such as size and rotation, depending on the camera viewpoint. Depending on the distance from the camera and the direction in which they move, players may appear at varying sizes, resolutions, and orientations. Because there is such a broad range of player uniform colors and textures, the team uniform and the lighting have a significant impact on player appearance. An approach is proposed to address these problems for real-time player detection in basketball. The rest of the paper is organized as follows. Section 2 reviews the existing literature on player detection. The proposed methodology is discussed in Sect. 3. The experimental results and performance metrics are presented in Sect. 4. Finally, the conclusion and the scope of future work are discussed in Sect. 5.

2 Literature Survey One of the most basic tasks in computer vision is extracting information from images. Researchers have developed systems that use structure-from-motion [1] to obtain geometric information and detection and classification to find semantic information. Detecting players in images and videos is significant for a wide range of applications [2]. Intelligent broadcast systems, for example, employ player positions to control broadcast camera viewpoints [3, 4]. The authors of [5] proposed a mechanism to classify group activities by detecting players. Player detection and team categorization also provide metadata for player tracking [6], player pose estimation, and team strategy analysis [7]. As a subset of object detection in sports, player detection has attracted a lot of attention. Background subtraction-based approaches were presented in [8–10] to achieve real-time response in a basketball game; however, to detect foreground objects reliably, all of these techniques assume the camera is stationary. Several learning-based approaches, such as Faster R-CNN [11] and YOLO [12], can be adapted to identify players with high detection accuracy, but due to poor pixel resolution they may miss distant players.

3 Methodology The detection algorithm comprises two components: a backbone and a head. In essence, the backbone is a network pre-trained for image classification. Here, MobileNet [13], trained on over a million images, is used as the backbone, while SSD [14] is used as the head, as shown in Fig. 1. The SSD head consists of several fixed convolutional layers, and the results are defined in terms of the classes of the predicted and ground-truth bounding boxes at the final one-dimensional fully connected layer.


Fig. 1 Architecture of MobileNetv1 + SSD detector

3.1 SSD Parameters

3.1.1 Grid Cell and Anchor Box

Player detection in an image entails determining the class and position of each object. For example, an image with a 4 × 4 grid is shown in Fig. 2; the grid provides a regular spatial partition of the image. Anchor boxes are reference boxes assigned to each grid cell so that the parts of the image falling in different cells can be handled separately. SSD assigns multiple anchor boxes to each grid cell, and these anchor boxes fully determine the shape and size of the regions considered for each cell. Figure 2 shows two players, one matched by a tall anchor box and the other by a wide one, indicating that the anchor boxes come in various shapes and sizes. The class and location of an object are determined by the anchor boxes that have the greatest overlap with it. This information is used to train the network and, once the network has been trained, to predict the position of each detected object.

Fig. 2 Example of 4 × 4 grid and different size anchor boxes
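The overlap-based matching described above is conventionally measured with intersection-over-union (IoU). The following sketch, a generic illustration rather than the paper's code, assigns a ground-truth box to its best-overlapping anchor:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def match_anchor(gt_box, anchors):
    """Return the index of the anchor with the greatest overlap."""
    scores = [iou(gt_box, a) for a in anchors]
    return max(range(len(anchors)), key=scores.__getitem__)

# Usage: a tall player box matches the tall anchor, not the wide one
anchors = [(0, 0, 60, 120), (0, 0, 120, 60)]   # tall vs. wide anchor
print(match_anchor((5, 0, 55, 110), anchors))  # -> 0
```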

3.1.2 Zoom Level and Aspect Ratio

The size of the anchor boxes does not have to match the grid cells. A zoom level determines how much each anchor box is scaled up or down relative to its grid cell. As illustrated in Fig. 2, some objects are wider while others are taller, so the SSD architecture also allows anchor boxes with higher aspect ratios; a range of proportions describes the various aspect ratios of the anchor boxes.

3.1.3 Receptive Field

The receptive field is the region of the input that is visible to a particular feature in a convolutional neural network. Zeiler and Fergus [15] characterized it by projecting a feature's activation back to the corresponding location in the input. Because of the repeated convolutions, features in different layers correspond to regions of different sizes in the input image. Applying a convolution to the bottom layer (5 × 5) produces a middle layer (3 × 3), in which a single green pixel represents a 3 × 3 region of the input (bottom) layer, as seen in Fig. 3. The convolution is then applied to the middle (green) layer, producing the upper red layer (2 × 2), in which each feature corresponds to a 7 × 7 region of the input image. The green and red 2D arrays are feature maps: collections of features produced by applying the same feature extractor, in a sliding window, at different locations of the input map. Features of the same map share a receptive field size and look for the same pattern at different locations. As a result, the convolutional network is local in nature.

Fig. 3 Feature map visualization and the receptive field

3.2 Implementation of Proposed Methodology The TensorFlow, Keras, and OpenCV libraries were used to build the deep learning models for player detection, which is akin to real-time object detection. First, the system was trained with known data, i.e., labeled basketball data, so that players appearing in unseen frames or videos can be detected. This work starts from a pre-trained lightweight detection model trained on a third-party dataset whose training classes include most common objects but not basketball players. Therefore, some layers of the proposed model were modified so that it could be trained on the labeled basketball data. Finally, combining the pre-trained network and the SSD approach, the system was ready to detect basketball players. Here, the pre-trained model (backbone) is MobileNetv1, which was combined with SSD to complete the player detection architecture. The architecture was then trained to detect the labels of the training dataset based on the bounding boxes. Frames are resized to a fixed resolution (512 × 512) for training the player detection model. MobileNetv1 was utilized as the backbone of the SSD architecture to improve detection accuracy and frame rate. SSD detects multiple players in a single shot, which makes it a well-suited neural network architecture for detecting objects in a frame or video. Other techniques, such as R-CNN, Fast R-CNN, and Faster R-CNN, require two stages; this increases the computation time as the number of parameters grows, which reduces the detection speed. The SSD method discretizes the bounding-box output space into a set of default boxes with different aspect ratios and sizes; the network scores each default box for the presence of particular object classes and combines the boxes to detect each object exactly. The network also combines predictions from feature maps of different resolutions to handle objects of various sizes. If no object is present in a region of the frame, it is classified as background and ignored.
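A minimal Keras sketch of this assembly is shown below. It is an illustrative skeleton under stated assumptions, not the authors' exact network: the number of anchors per location, the two-class setup, and the single-scale head are simplifications (a full SSD attaches heads to several feature maps of different resolutions).

```python
import tensorflow as tf

NUM_CLASSES = 2    # player vs. background (assumption)
NUM_ANCHORS = 4    # anchor boxes per feature-map location (assumption)

# Pre-trained MobileNetv1 backbone without its classification top.
backbone = tf.keras.applications.MobileNet(
    input_shape=(512, 512, 3), include_top=False, weights="imagenet")

features = backbone.output   # coarse feature map from the backbone

# SSD-style convolutional heads: class scores and box offsets are
# predicted densely for every anchor at every feature-map location.
cls_head = tf.keras.layers.Conv2D(
    NUM_ANCHORS * NUM_CLASSES, 3, padding="same", name="cls")(features)
box_head = tf.keras.layers.Conv2D(
    NUM_ANCHORS * 4, 3, padding="same", name="box")(features)

detector = tf.keras.Model(backbone.input, [cls_head, box_head])
detector.summary()
```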

4 Dataset and Experimental Results The proposed model was trained on the basketball dataset [16], which was filmed during a basketball match (resolution 1020 × 1080) and contains a variable number of players: some frames contain 10 or 11 players while others contain 12 or 13. Of the 50,127 frames in total, 40,300 are used for training and 9827 for testing. For training the proposed model, the frames were resized to 512 × 512, and various data augmentation techniques such as blur, flip, crop, affine transformation, and contrast adjustment were applied to enhance the robustness of the model. The detection model was trained and tested on a workstation with the configuration listed in Table 1.

Table 1 Configurations of experimental setup (model training/testing setup)

Names                      Experimental configuration
OS                         Windows 10 Pro
CPU/GHz                    Intel Xeon 64 bit CPU @3.60
RAM/GB                     64
GPU                        NVIDIA Quadro P4000, 8 GB, 1792 CUDA cores
GPU acceleration library   CUDA 10.0, CUDNN 7.4
TensorFlow                 2.x
Keras                      2.2.x

Model training was stopped at 100 epochs, when it attained minimum training and testing losses of 0.12 and 0.45 and, at the same time, training and testing accuracies of 98.3% and 96.8%, respectively, as shown in Figs. 4 and 5. The model was set to save checkpoints (i.e., weight files of the detection model) every 10 epochs. The final weight file is 12.4 MB, making this a lightweight player detection network that can be embedded on low-edge devices for real-time implementation while maintaining good accuracy in detecting players in real time. Although the basketball match was captured with a dynamic camera, players were detected accurately; the performance and robustness of the proposed method were measured using four metrics, and the results are compared and tabulated in Table 2. Figure 6 shows that almost all players were detected even as the camera view changed while capturing the match, as can be observed between frame-8 and frame-199.

Fig. 4 Training precision and training loss with respect to number of epochs


Fig. 5 Testing precision and testing loss with respect to number of epochs

Table 2 Comparative analysis of proposed detection algorithm with state-of-the-art methods on basketball dataset

Architecture                Precision (%)   Recall (%)   F1-score (%)   FPS
Multiplayer detection [9]   88.65           92.19        90.39          —
Proposed method             92.1            73.8         81.3           57.2

The bold values indicate that the proposed methodology is superior in precision and FPS

Fig. 6 Detecting basketball players in a frame which is captured with a dynamic camera (the view of the camera changes and it can be observed from frame-8 to frame-199 in the figure)


5 Conclusion and Future Research Scope In this paper, a lightweight network for real-time basketball player detection is proposed. The proposed mechanism achieves a player detection speed of 57.2 fps. In the experiments, the proposed MobileNetv1 + SSD methodology achieved 92.1% precision and an 81.3% F1-score. The weight file obtained after training the model is 12.4 MB, making this a lightweight player detection network that is simple to deploy on low-edge embedded devices. In future work, in addition to addressing the above limitations, the proposed methodology will be deployed on embedded devices such as the PYNQ board and Jetson Nano, and further optimization methods will be considered to enhance its scalability.

References
1. Hartley R, Zisserman A (2003) Multiple view geometry in computer vision. Cambridge University Press, Cambridge
2. Thomas G, Gade R, Moeslund TB, Carr P, Hilton A (2017) Computer vision for sports: current applications and research topics. Comput Vis Image Underst 159:3–18
3. Chen J, Le HM, Carr P, Yue Y, Little JJ (2016) Learning online smooth predictors for realtime camera planning using recurrent decision trees. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4688–4696
4. Chen J, Little JJ (2017) Where should cameras look at soccer games: improving smoothness using the overlapped hidden Markov model. Comput Vis Image Underst 159:59–73
5. Ibrahim MS, Muralidharan S, Deng Z, Vahdat A, Mori G (2016) A hierarchical deep temporal model for group activity recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1971–1980
6. Lu W-L, Ting J-A, Little JJ, Murphy KP (2013) Learning to track and identify players from broadcast sports videos. IEEE Trans Pattern Anal Mach Intell 35(7):1704–1716
7. Lucey P, Oliver D, Carr P, Roth J, Matthews I (2013) Assessing team strategy using spatiotemporal data. In: Proceedings of the 19th ACM SIGKDD international conference on knowledge discovery and data mining, pp 1366–1374
8. Carr P, Sheikh Y, Matthews I (2012) Monocular object detection using 3D geometric primitives. In: European conference on computer vision. Springer, Berlin, Heidelberg, pp 864–878
9. Liu J, Tong X, Li W, Wang T, Zhang Y, Wang H (2009) Automatic player detection, labeling and tracking in broadcast soccer video. Pattern Recogn Lett 30(2):103–113
10. Parisot P, De Vleeschouwer C (2017) Scene-specific classifier for effective and efficient team sport players detection from a single calibrated camera. Comput Vis Image Underst 159:74–88
11. Ren S, He K, Girshick R, Sun J (2016) Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 39(6):1137–1149
12. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779–788
13. Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017) MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv:1704.04861
14. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, Berg AC (2016) SSD: single shot multibox detector. In: European conference on computer vision. Springer, Cham, pp 21–37


15. Zeiler MD, Fergus R (2014) Visualizing and understanding convolutional networks. In: European conference on computer vision. Springer, Cham, pp 818–833
16. Citraro L, Márquez-Neila P, Savare S, Jayaram V, Dubout C, Renaut F, Hasfura A, Shitrit HB, Fua P (2020) Real-time camera pose estimation for sports fields. Mach Vis Appl 31(3):1–13

Chapter 3

Modified Hungarian Algorithm-Based User Pairing with Optimal Power Allocation in NOMA Systems

Sunkaraboina Sreenu and Kalpana Naidu

1 Introduction With the rapid expansion of smart devices and multimedia applications, Non-Orthogonal Multiple Access (NOMA) has evolved as a cutting-edge technology for 5G networks, since it boosts user connectivity, total capacity, and cell-edge user data rate [1–3]. Contrary to the prevailing Orthogonal Multiple Access (OMA) [4, 5], NOMA simultaneously serves a massive number of users in a single resource block through "power domain multiplexing" [6]. Moreover, the NOMA system employs "Superposition Coding (SC)" at the transmitter to superimpose several user symbols, and "Successive Interference Cancellation (SIC)" is conducted at the receiver to separate the respective user symbols [7, 8]. In addition, NOMA assigns large power to users in bad channel states (far users) and small power to users with good channel conditions (near users) to preserve fairness among users [9, 10]. However, superimposing more users on the same resource unit leads to severe error propagation and higher latency [11, 12]. Therefore, efficient resource allocation plays a substantial role in enriching the sum capacity, energy efficiency, and user fairness of NOMA systems. Recently, many studies have focused on "User Pairing (UP) and Power Allocation (PA)" problems to improve overall system performance. In [13], the authors proposed an exhaustive-search UP scheme that picks the user grouping with the best throughput among all user combinations. Nevertheless, its computational complexity rises exponentially with the number of users, so this strategy is impractical for a massive number of users. Conversely, the Random Pairing (RP) algorithm


has been presented in [14], in which users are paired randomly; it offers low complexity, but its performance is poor since the users' channel information is not considered for pairing. Consequently, Near-Far Pairing (NFP) was investigated in [15], where user pairs are formed according to the channel states of the users. Although channel conditions are considered in [15], users near the cell center have very small channel gain differences, which leads to substantial interference among those users. Further, the authors in [16] studied Uniform Channel Gain Difference (UCGD) user grouping to circumvent the problem of low-channel-gain-gap user pairs in near-far pairing. Additionally, a virtual user pairing technique was propounded in [17], in which user groups are paired optimally when the number of weak-signal-strength users exceeds the number of strong users. To further improve system performance, various PA algorithms have been investigated in the literature in addition to UP algorithms [18]. The authors in [19] presented a low-complexity user grouping and power allotment technique employing the Fixed Power Allocation (FPA) approach for the multiplexed users of each pair; however, FPA provides poor performance. Analogously, Fractional Transmission Power Allocation (FTPA) was proposed in [6] to allocate fractional power among the sub-band users. Further, in [20], the Difference-of-Convex (DC) functions technique was investigated to improve spectral and energy efficiency; this approach allocates power across and within sub-bands. Even though the PA scheme in [20] outperforms FTPA, it provides a sub-optimal solution. Furthermore, an optimum power distribution approach was explored in [21] to optimize the system's weighted sum rate while maintaining Quality of Service (QoS) standards. In an analogous manner, a new resource allocation strategy was studied in [22], in which Karush-Kuhn-Tucker (KKT) conditions were used to obtain the optimal power factors among sub-band users subject to BS transmit power and per-user QoS constraints. Moreover, the authors in [22] exploited the Hungarian algorithm for user grouping; however, this approach has high complexity. Most existing works have not investigated sub-band power allocation in the NOMA system, even though optimal PA among orthogonal sub-bands attains further improvement in system performance [23]. Motivated by this fact, this article proposes an optimal resource allocation (i.e., UP and PA) algorithm aiming to further enhance the system's sum capacity while minimizing the algorithm's complexity. In this line, we develop a sum-rate maximization problem with the constraints of (a) the total power available at the BS and (b) the minimum required data throughput of each user. However, finding the globally optimal solution of the combined UP and PA problem is complex due to its non-convexity [24]. As a result, we divide this problem into two parts and solve them in the following order:
• All the users are paired (two users on every sub-band) optimally based on the Modified Hungarian Algorithm (MHA). Moreover, MHA has lower complexity than the Hungarian algorithm [25, 26].


• We propose Water Filling based Power Allocation (WFPA) to distribute the power across sub-bands according to the channel conditions of each sub-band.
• This allocated sub-band power is then optimally shared between the sub-band users based on the KKT optimality conditions.
Here onwards, the suggested hybrid Resource Allocation (RA) method for NOMA systems is labeled "MHA-WFPA-KKT". Ultimately, the proposed hybrid RA method shows significant performance gains and lower complexity compared to the existing RA techniques. The remainder of the article is organized as follows. Section 2 elucidates the downlink NOMA network model. Section 3 describes the proposed MHA-based user pairing algorithm. Section 4 demonstrates a method for power distribution across sub-bands and a strategy for power allotment among the users on each sub-band. Section 5 validates the effectiveness of the MHA-WFPA-KKT scheme through simulation results. The article is wrapped up in Sect. 6.

2 Network Model This work considers the downlink transmission model of a NOMA system. The BS is established at the origin of a circular cell region of diameter D meters, and M users are uniformly and randomly scattered in the cell. The system bandwidth B_T is identically partitioned into N Sub-Bands (SB), where the bandwidth of each SB is B_n = B_T/N. The power assigned to the nth SB is P_n, such that \sum_{n=1}^{N} P_n = P_T, where P_T is the total power available at the BS. Figure 1 depicts the fundamental concept of the multi-user downlink NOMA transmission model. The BS multiplexes the symbols of the W_n users on each SB by using SC. Consequently, the superposed signal broadcast by the BS is given by

s_n = \sqrt{p_{1,n}} s_{1,n} + \sqrt{p_{2,n}} s_{2,n} + \cdots + \sqrt{p_{W_n,n}} s_{W_n,n}    (1)

\Rightarrow s_n = \sum_{w=1}^{W_n} \sqrt{p_{w,n}} s_{w,n}    (2)

where s_n represents the BS transmitted signal and p_{w,n} is the power allocated to the wth user on SB_n, with \sum_{w=1}^{W_n} p_{w,n} = P_n. Thus, the signal received at the wth user is written as

r_{w,n} = g_{w,n} s_n + v_{w,n}    (3)

\Rightarrow r_{w,n} = \sqrt{p_{w,n}} g_{w,n} s_{w,n} + \sum_{j=1, j \neq w}^{W_n} \sqrt{p_{j,n}} g_{w,n} s_{j,n} + v_{w,n}    (4)


Fig. 1 Multi-user downlink system model of NOMA

In Eq. (4), g_{w,n} = h_{w,n}/\sqrt{d_w^{\eta}} denotes the channel coefficient from the BS to user w on the nth SB, where h_{w,n} is the Rayleigh fading channel gain, d_w is the distance from the BS to user w, and \eta is the path loss exponent. In addition, v_{w,n} is Gaussian noise with power \sigma^2 = N_0 B_n, where N_0 is the noise spectral density of v_{w,n}. Let G_{w,n} = |g_{w,n}|^2 / \sigma^2 be the "Channel gain to Noise Ratio (CNR)" of the wth user on sub-band n. Without loss of generality, the users are ordered according to their CNRs as follows:

G_{1,n} \le G_{2,n} \le G_{3,n} \le \cdots \le G_{W_n,n}    (5)

The users in the NOMA network utilize the SIC technique to remove the interference from farther users on the SB [27]. The ascending order of CNRs is the optimal decoding sequence for SIC: following this sequence, a user can perfectly decode the signals of the users whose decoding order precedes its own, so a user with substantial channel gain can eliminate the interference from users in poor channel states. For instance, consider two users on SB_n with G_{1,n} \le G_{2,n}. The BS then allots the transmission powers as p_{1,n} \ge p_{2,n}. In this case, user 2 performs SIC to remove the interference caused by user 1 (the far user) and then extracts its own information, whereas user 1 decodes its signal treating the user-2 signal as inter-user noise. Therefore, the "Signal to Interference plus Noise Ratio (SINR)" of the wth user is represented as

\Gamma_{w,n} = \frac{G_{w,n} p_{w,n}}{G_{w,n} \sum_{j=w+1}^{W_n} p_{j,n} + 1}    (6)


Then, the capacity of the wth user is obtained as

R_{w,n} = B_n \log_2 (1 + \Gamma_{w,n})    (7)

Therefore, the overall throughput of the NOMA system can be expressed as

R_{total} = \sum_{n=1}^{N} \sum_{w=1}^{W_n} R_{w,n}    (8)
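As a numerical illustration of Eqs. (6)-(8), the following sketch computes the per-user rates of one two-user NOMA sub-band; the CNR and power values are arbitrary example inputs, not taken from the paper:

```python
import numpy as np

def noma_rates(G, p, B=1e6):
    """Per-user rates on one sub-band, Eqs. (6)-(7).
    G: CNRs sorted ascending; p: powers (more power to weaker users)."""
    rates = []
    for w in range(len(G)):
        interference = G[w] * sum(p[w + 1:])   # residual after SIC
        sinr = G[w] * p[w] / (interference + 1.0)
        rates.append(B * np.log2(1.0 + sinr))
    return rates

# Example: weak user gets 0.8 of the sub-band power, strong user 0.2
print(noma_rates(G=[2.0, 50.0], p=[0.8, 0.2]))
```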

3 MHA-Based User Pairing The Hungarian Algorithm (HA) is one of the best combinatorial schemes for solving assignment problems, since it provides globally optimal solutions. HA is therefore well suited to pairing the users on the SBs of a NOMA system to improve the sum rate [13, 22, 28]. Nevertheless, its computational complexity is high, so we propose a modified Hungarian method for the user pairing problem that provides the same performance as HA with lower complexity [29]. To pair the users, the randomly deployed users are divided into two groups: (a) the strong users' group g_s = (U_1, U_2, ..., U_i) and (b) the weak users' group g_w = (U_1, U_2, ..., U_j). The groups g_s and g_w represent the rows and columns of the cost function, respectively. The cost function is therefore constructed as R = C_{ij}, i, j = {1, 2, 3, ..., M/2}, where C_{ij} is the sum of the achievable data rates of the ith strong user and jth weak user. For instance, if 10 users are deployed in the cell with g_s = (U1, U2, U5, U8, U10) and g_w = (U3, U4, U6, U7, U9), the corresponding cost matrix is given in Table 1. The MHA then pairs each strong user with a weak user on a specific sub-band so as to maximize the sum rate. The steps of MHA-based user pairing are detailed in Algorithm 1.

Table 1 MHA-based user pairing cost matrix description for sum rate optimization

Strong users   Weak users
               U3      U4      U6      U7      U9
U1             C1,3    C1,4    C1,6    C1,7    C1,9
U2             C2,3    C2,4    C2,6    C2,7    C2,9
U5             C5,3    C5,4    C5,6    C5,7    C5,9
U8             C8,3    C8,4    C8,6    C8,7    C8,9
U10            C10,3   C10,4   C10,6   C10,7   C10,9


Algorithm 1. MHA-based user grouping scheme
1: Construct the cost function C_{ij}.
2: Obtain the maximum element of the whole cost function. Then subtract each element of the cost matrix from this largest element.
3: Identify the smallest element in each row and subtract it from every element of that row.
4: Similarly, identify the minimum element in each column and subtract it from every element of that column.
5: Draw the least number of lines on rows and columns required to cover all the zeros in the cost function resulting from steps 1 to 4.
6: If the minimal number of lines (K) and the order of the cost function (k) differ by one (i.e., k − K = 1), then do the partial pairing in the following way:
(i): Mark all the zeros of single-zero rows and columns with a circle and cross out the remaining zeros in the corresponding columns and rows.
(ii): If there are no unmarked zeros, proceed to step 7. If there is more than one unmarked zero, randomly mark one of them with a circle and cross out the rest. Continue this procedure until there are no unmarked zeros in the cost function.
7: At this point, the marked zeros form an assignment over (k − 1) rows and (k − 1) columns of the cost matrix, so one row and one column remain for which no selection has been made.
8: Finally, select the smallest of all uncovered elements, namely the element at the cross-section of the row and column for which no earlier selection was made. The best solution is the combination of the partial assignment and this extra assignment (all assigned elements are treated as matched user pairs).

In HA, the cost matrix must be modified repeatedly until K = k, which takes more arithmetic operations. In MHA, the assignment (pairing) is already performed when k − K = 1, which reduces the algorithm's complexity while still yielding optimal pairs.
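For reference, the same maximum-sum-rate pairing objective can be solved with an off-the-shelf assignment solver. The sketch below uses SciPy's Hungarian-method implementation rather than the authors' MHA steps, so it illustrates the optimal pairing objective, not the complexity reduction of MHA; the rate matrix is a random placeholder:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(1)
# C[i, j]: sum of achievable rates when strong user i pairs weak user j
C = rng.uniform(1.0, 10.0, size=(5, 5))

rows, cols = linear_sum_assignment(C, maximize=True)
for i, j in zip(rows, cols):
    print(f"strong user {i} <-> weak user {j}, pair rate {C[i, j]:.2f}")
print("total sum rate:", C[rows, cols].sum())
```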

4 Proposed Power Allocation Methods Once the user pairs on the sub-bands have been determined, this part investigates the power allocation issue, which blends the WFPA with the KKT optimal conditions in order to enrich the aggregate data rate even more.


4.1 Sub-band Power Allocation The proposed MHA-based user pairing method groups exactly two users on each sub-band, so every SB user can decode its signal perfectly. Let the sum of the paired users' channel gains on the nth sub-band be \psi_n. As previously mentioned, existing RA methods split the total power P_T equally across all sub-bands. Although Equal Power Allocation (EPA) is simple, it produces a sub-optimal solution, since the power allotted to an SB determines its achievable data rate. Hence, to find the optimal power for every SB, we exploit the water-filling algorithm [31]. The optimization problem for sub-band power allocation is formulated as

\max_{P_n} \sum_{n=1}^{N} B_n \log_2 (1 + P_n \psi_n)    (9)

subject to: \sum_{n=1}^{N} P_n \le P_T    (10)

Here, Eq. (9) is a convex optimization problem, which we solve with the Lagrange multiplier method. The closed-form solution for the nth sub-band power is

P_n^* = \left( \frac{1}{\delta} - \frac{1}{\psi_n} \right)^+    (11)

and \sum_{n=1}^{N} \left( \frac{1}{\delta} - \frac{1}{\psi_n} \right)^+ = P_T    (12)

where \delta is the Lagrange multiplier.

Wn N   n=1 w=1

Rw,n

(13)

28

S. Sreenu and K. Naidu

subject to :

Wn N  

pw,n ≤ PT

(14)

n=1 w=1

Rw,n  win w,n , w = 1, 2, ...., Wn

(15)

where win w,n is the minimal required data rate of wth user of S Bn . Equation (14) indicates the total power constraint and Equation (15) guarantees the each user’s minimal required rate. According to the optimization problem in Equation (13), the total data rate can be find by summing all the sub-bands aggregate throughput. Therefore, to maximize the systems’ sum rate, the data throughput of each sub-band must be optimized, so that optimisation problem can be modified as, max Rn = pw

Wn 

Rw

(16)

pw = Pn

(17)

w=1

subject to :

Wn  w=1

Rw  win w

(18)

We can achieve the optimal powers for the superimposed users on each SB in a closed-form, that satisfies the KKT [30] conditions. The Lagrange function of the formulated problem (16) is obtained as, L ( pw , λ, μw ) = Bn

Wn 

log2 (1 + pw γw ) − λ

w=1



Wn 

μw

W n 

pw − Pn

w=1

 win w

− Bn

w=1

Wn 



log2 (1 + pw γw )

(19)

w=1

where λ, μw denotes the Lagrange multipliers and γw =

Wn 

G w,n

,

pi G w,n +1

i=w+1

= Bn

Wn 

w=1

(1 + μw ) log2 (1 + pw γw ) − λ

W n  w=1

pw − Pn −

Wn 

μw win w .

(20)

w=1

Karush-Kuhn-Tucker conditions are attained in the following way:   Bn 1 + μ∗w γw ∂L  − λ∗ = 0, ∀w ∈ Wn  = ∂ pw∗ ln 2 1 + pw∗ γw

(21)

3 Modified Hungarian Algorithm-Based User Pairing with Optimal …

λ



W n 

29

pw∗

− Pn

= 0, ∀w ∈ Wn

(22)

  ∗ log2 1 + pw γm = 0, ∀w ∈ Wn

(23)

w=1

 μ∗w

win w

− Bn

Wn  w=1

W n 

pm∗ − Pn

= 0, ∀w ∈ Wn

(24)

  log2 1 + pw∗ γw = 0, ∀w ∈ Wn

(25)

w=1

 − Bn win w

Wn  w=1

λ∗  0 and μ∗w  0, ∀w ∈ Wn

(26)

If λ∗ , μ∗w are greater than zero, the optimal solution is obtained [22, 32]. Therefore, Wn 

pw∗ = Pn , ∀w ∈ Wn

(27)

w=1

win w

= Bn

Wn 

  log2 1 + pw∗ γw , ∀w ∈ Wn

(28)

w=1

The closed-form result of the optimal power allocations within the Wn -user SB can be given as,

win w 1 2 Bn − 1 , ∀w ∈ {2, 3, ...., Wn } (29) pw∗ = γw and p1 = Pn −

Wn 

pw∗ .

(30)

w=2
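A compact sketch of the closed-form split of Eqs. (29)-(30) for one two-user sub-band, assuming user 2 is the strong user whose minimum rate constraint is met with equality; the numeric inputs are placeholders:

```python
def kkt_power_split(P_n, gamma2, r_min2, B_n):
    """Eq. (29): give the strong user just enough power for its
    minimum rate; Eq. (30): the weak user receives the remainder."""
    p2 = (2.0 ** (r_min2 / B_n) - 1.0) / gamma2
    p1 = P_n - p2
    if p1 < 0:
        raise ValueError("sub-band power cannot satisfy the rate target")
    return p1, p2

# Example: 1 MHz sub-band (5 MHz split over five sub-bands),
# 1 Mbps minimum rate for the strong user
print(kkt_power_split(P_n=0.2, gamma2=40.0, r_min2=1e6, B_n=1e6))
```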

5 Simulation Results This section evaluates the effectiveness of the proposed MHA-WFPA-KKT resource allocation approach for the downlink NOMA system through MATLAB simulations. We presume that the BS is positioned at the central point of the cell with perfect CSI. The BS transmission power is set to 30 dBm, and the system bandwidth is 5 MHz. We also assume that the noise spectral density N_0 is equal over all SBs. Table 2 presents the detailed simulation settings for the proposed system.

Table 2 Simulation specifications

Parameter's name                       Value
No. of users (M)                       10
No. of sub bands (N)                   5
Cell diameter (D)                      1000 m
System bandwidth (B_T)                 5 MHz
Noise spectral density (N_0)           −174 dBm/Hz
Channel model                          Rayleigh fading channel
Path loss (υ)                          3.7
FTPA decay factor (α)                  0.4
Min. rate for strong user (R_s^min)    1 Mbps
Min. rate for weak user (R_w^min)      0.5 Mbps

Fig. 2 Proposed MHA-based user pairing algorithm performance against various state-of-the-art pairing schemes and OMA

Figure 2 portrays the performance of the MHA-based UP scheme compared with existing user pairing strategies. Here, equal power is distributed to the orthogonal sub-bands (i.e., P_n = P_T/N), and FTPA is employed for intra-sub-band power allocation. Figure 2 corroborates that the MHA-based UP scheme outperforms UCGD pairing, NFP, RP, and the OMA system in terms of sum throughput. Moreover, the overall data rate improves as the transmission power increases from 0 to 1 W. For instance, with P_T fixed at 0.5 W, the proposed pairing performs 4.5%, 8.5%, 20%, and 56.85% better than UCGD pairing, NFP, RP, and OMA, respectively. Further, MHA provides the same performance as HA with lower complexity. Figure 3 exhibits the relationship between total cell throughput and transmit power for three diverse power assignment algorithms. We can notice from Fig. 3 that as P_T grows, the overall cell throughput also increases. Furthermore, when compared to


Fig. 3 Comparison of power allocation techniques

Fig. 4 Sum rate comparison among proposed resource allocation method, existing resource allocation methods and OMA system

the EPA-FTPA algorithm, the suggested technique performed better at low transmit power and approached EPA-FTPA at high transmit power. The EPA-FTPA algorithm performs poorly because channel conditions are not considered in the sub-band PA and FTPA is a sub-optimal PA scheme. Although the EPA-KKT scheme obtained higher throughput than the EPA-FTPA approach, it still leaves the inter-sub-band powers unoptimized. In this paper, the WFPA-KKT algorithm furnishes optimal solutions for both inter- and intra-sub-band power allocation and provides better sum-rate performance. Figure 4 displays the impact of P_T on the sum throughput with 10 users in the cell. From Fig. 4, it can be inferred that as P_T varies from 0 to 1 W, the sum rate increases for all resource allocation methods. In addition, the proposed


hybrid MHA-WFPA-KKT system throughput outperformed all the existing HNG-EPA-KKT, UCGD-EPA-FTPA, NFP-EPA-FTPA, RP-EPA-FTPA, and OMA systems.

6 Conclusion In this paper, the sum-rate maximization problem is resolved under the BS transmission power budget while fulfilling each user's minimum required throughput constraint. To untangle this optimization problem, the modified Hungarian algorithm is proposed for user grouping, together with optimal power allocation for the users of every sub-band. The proposed WFPA provides better performance than equal power allocation, as WFPA yields the optimal power distribution among sub-bands. Furthermore, compared to the Hungarian method, the proposed MHA for user pairing is less complex. Ultimately, the simulations exhibited that the sum data rate of the proposed MHA-WFPA-KKT is superior to the prevailing resource allocation methods for NOMA and OMA systems.

References
1. Dai L, Wang B, Ding Z, Wang Z, Chen S, Hanzo L (2018) A survey of non-orthogonal multiple access for 5G. IEEE Commun Surv Tutorials 20(3):2294–2323
2. Islam SMR, Zeng M, Dobre OA, Kwak K (April 2018) Resource allocation for downlink NOMA systems: key techniques and open issues. IEEE Wirel Commun 25(2):40–47
3. Liu Y, Qin Z, Elkashlan M, Ding Z, Nallanathan A, Hanzo L (Dec 2017) Nonorthogonal multiple access for 5G and beyond. Proc IEEE 105(12):2347–2381
4. Naidu K, Ali Khan MZ, Hanzo L (July 2016) An efficient direct solution of cave-filling problems. IEEE Trans Commun 64(7):3064–3077
5. Kalpana, Sunkaraboina S (24 Dec 2021) Remote health monitoring system using heterogeneous networks. Healthc Technol Lett 9(1–2):16–24
6. Saito Y, Benjebbour A, Kishiyama Y, Nakamura T (2013) System-level performance evaluation of downlink non-orthogonal multiple access (NOMA). In: 2013 IEEE 24th annual international symposium on personal, indoor, and mobile radio communications (PIMRC), pp 611–615
7. Saito Y, Kishiyama Y, Benjebbour A, Nakamura T, Li A, Higuchi K (2013) Non-orthogonal multiple access (NOMA) for cellular future radio access. In: 2013 IEEE 77th vehicular technology conference (VTC Spring), pp 1–5
8. Sunkaraboina S, Naidu K (Dec 2021) Novel user association scheme deployed for the downlink NOMA systems. In: International conference on communication and intelligent systems (ICCIS), Delhi, India, pp 1–5
9. Aldababsa M, Toka M, Gökçeli S, Kurt GK, Kucur O (2018) A tutorial on nonorthogonal multiple access for 5G and beyond. Wirel Commun Mob Comput 2018:9713450, 24
10. Choi J (2017) NOMA: principles and recent results. In: 2017 international symposium on wireless communication systems (ISWCS), pp 349–354
11. Al-Obiedollah H, Cumanan K, Salameh HB, Chen G, Ding Z, Dobre OA (Nov 2021) Downlink multi-carrier NOMA with opportunistic bandwidth allocations. IEEE Wirel Commun Lett 10(11):2426–2429
12. Ding Z, Fan P, Poor HV (August 2016) Impact of user pairing on 5G non-orthogonal multiple-access downlink transmissions. IEEE Trans Veh Technol 65(8):6010–6023


13. Marcano AS, Christiansen HL (2018) Impact of NOMA on network capacity dimensioning for 5G HetNets. IEEE Access 6:13587–13603
14. Aghdam MRG, Abdolee R, Azhiri FA, Tazehkand BM (2018) Random user pairing in massive-MIMO-NOMA transmission systems based on mmWave. In: 2018 IEEE 88th vehicular technology conference (VTC-Fall), pp 1–6
15. Dogra T, Bharti MR (2022) User pairing and power allocation strategies for downlink NOMA-based VLC systems: an overview. AEU-Int J Electron Commun 154184
16. Gao Y, Yu F, Zhang H, Shi Y, Xia Y (2022) Optimal downlink power allocation schemes for OFDM-NOMA-based internet of things. Int J Distrib Sens Netw 18(1)
17. Shahab MB, Shin SY (2017) On the performance of a virtual user pairing scheme to efficiently utilize the spectrum of unpaired users in NOMA. Phys Commun 25:492–501
18. Di B, Song L, Li Y (2016) Sub-channel assignment, power allocation, and user scheduling for non-orthogonal multiple access networks. IEEE Trans Wirel Commun 15(11):7686–7698
19. He J, Tang Z (2017) Low-complexity user pairing and power allocation algorithm for 5G cellular network non-orthogonal multiple access. Electron Lett 53(9):626–627
20. Parida P, Das SS (2014) Power allocation in OFDM based NOMA systems: a DC programming approach. In: 2014 IEEE globecom workshops (GC Wkshps), pp 1026–1031
21. He J, Tang Z (2017) Low-complexity user pairing and power allocation algorithm for 5G cellular network non-orthogonal multiple access. Electron Lett 53:626–627
22. Ali ZJ, Noordin NK, Sali A, Hashim F, Balfaqih M (2020) Novel resource allocation techniques for downlink non-orthogonal multiple access systems. Appl Sci 10(17):5892
23. Goswami D, Das SS (Dec 2020) Iterative sub-band and power allocation in downlink multiband NOMA. IEEE Syst J 14(4):5199–5209
24. Saraereh OA, Alsaraira A, Khan I, Uthansakul J (2019) An efficient resource allocation algorithm for OFDM-based NOMA in 5G systems. Electronics 8(12):1399
25. Dutta J, Pal SC (2015) A note on Hungarian method for solving assignment problem. J Inf Optim Sci 36(5):451–459
26. Akpan NP, Abraham UP (2016) A critique of the Hungarian method of solving assignment problem to the alternate method of assignment problem by Mansi. Int J Sci: Basic Appl Res 29(1):43–56
27. Dai L, Wang B, Yuan Y, Han S, Chih-Lin I, Wang Z (2015) Non-orthogonal multiple access for 5G: solutions, challenges, opportunities, and future research trends. IEEE Commun Mag 53(9):74–81
28. Kuhn HW (1955) The Hungarian method for the assignment problem. Naval Res Logistics Q 2(1–2):83–97
29. Dutta J, Pal PC (2015) A note on Hungarian method for solving assignment problem. J Inf Optim Sci 36(5):451–459
30. Boyd S, Boyd SP, Vandenberghe L (2004) Convex optimization. Cambridge University Press
31. Khan MZA, Naidu K (May 2015) Weighted water-filling algorithm with reduced computational complexity. In: IEEE ICCIT 2015, 20–21 May, Abu Dhabi, UAE
32. Zuo H, Tao X (2017) Power allocation optimization for uplink non-orthogonal multiple access systems. In: 2017 9th international conference on wireless communications and signal processing (WCSP), pp 1–5

Chapter 4

Design and Implementation of Advanced Re-Configurable Quantum-Dot Cellular Automata-Based (Q-DCA) n-Bit Barrel-Shifter Using Multilayer 8:1 MUX with Reversibility

Swarup Sarkar and Rupsa Roy

1 Introduction The proposed shifter, named the barrel-shifter, has so far been built mostly with CMOS technology, which follows Moore's Law [1]. However, fundamental CMOS-based circuitry faces various small-scaling issues as device density increases: device complexity, leakage current, power dissipation, and delay all grow as device size shrinks in CMOS technology. The use of more advanced technologies to design digital components is therefore becoming a pressing problem in the current nano-technical digital era. In this research work, a novel low-power, high-speed, beyond-transistor-level technology is applied to design purely combinational shift registers. This technology, named Q-DCA, was introduced by Lent et al. in 1993 [2, 3]. In this technique, quantum cells (Q-cells) with four dots [4] are used to form the proposed digital circuitry. Two dots are occupied by spintronic electrons (positioned crosswise at all times to maintain the electrostatic repulsive force between two similarly charged carriers), and the electrons move from one dot to another through a tunnel. These spintronic-electron-based Q-cells are placed one after another to form a Q-wire, and these quantum wires carry information from one Q-cell to the next with little power, very low leakage current, and a THz frequency range [5]. In this nano-technical field, a multilayer 3D design with reversible gates can also be used to obtain efficient device size, delay, and dissipated power without increasing device complexity, as this work demonstrates.


A combinational barrel-shifter is designed in this paper using Q-DCA technology because, as a data-shifting block, it is more suitable as an arithmetic logic unit (ALU) component than sequential shift registers, and the ALU is a major element of any modern processor. A multi-bit barrel-shifter using IG-FINFET is a recent advance from 2020 [6] (by Nehru Kandasamy et al.), where the proposed technology was shown to optimize the power-delay product relative to transmission-gate and pass-transistor logic; in the same year, a transistor-logic-based 8-bit barrel-shifter was presented by Ch. Padmavani et al. [7], reporting average power for technologies from 65 to 250 nm using the Tanner 15.0 EDA tool with self-controllable voltage level (SVL) logic. In this work, that line of barrel-shifter development is continued with a low-power, high-speed, beyond-transistor technology: Q-DCA with multilayer 3D structures, 16-nm cells, and reversibility checking.

An 8:1 multiplexer (MUX) is the basic component of the proposed shifter structure, so recent quantum-cell-based 8:1 MUX designs are also relevant. In 2019, 2:1 to 8:1 MUX models using single-layer quantum-cell technology were presented by Li Xingjian et al. [8], including a cost calculation; only 260 cells were used to obtain an 8:1 MUX occupying 0.4 µm² with 9 ps propagation delay. The occupied area and cell count were further optimized, without changing the propagation delay, in 2022 by Seyed-Sajad Ahmadpour et al. [9], where 2:1-8:1 MUX designs using single-layer Q-DCA were used to configure a RAM cell; in the present work, the 8:1 MUX instead serves as the basic component of a barrel-shifter. A novel multilayer 8:1 multiplexer structure with 75% reversibility is used here to design an 8-bit multilayer barrel-shifter. The effect of reducing the layer-separation gap is also examined, together with temperature tolerance, to reduce device volume, which is a trade-off in multilayer circuitry. The major contributions of this work are as follows:

• A novel optimized multilayer 3D 8:1 MUX is designed on the Q-DCA platform and its reversibility is checked.
• A multilayer barrel-shifter using the proposed 8:1 MUX is built on the Q-DCA platform, and the parametric improvements of the Q-DCA design over recent widely used transistor-based designs are presented.
• The high-temperature and layer-separation-gap-reduction effects of the proposed multilayer shifter are discussed.

The contribution is organized in 5 sections: Sect. 2 reviews the theoretical background of the proposed technical and logical field; Sects. 3 and 4 present the configuration and results of the proposed 8:1 multiplexer and 8-bit barrel-shifter, respectively, with the corresponding parametric investigations; and Sect. 5 concludes and outlines future possibilities of the presented circuits.


2 Theoretical Surroundings

2.1 QCA-Based Fundamental Gates

The 3-input majority gate (3-input M-G), the 5-input majority gate (5-input M-G), and the inverter gate are the most essential and widely used gates in the proposed low-power 4-dot Q-DCA design technology, based on the binary '0' and binary '1' cell polarizations of the Q-DCA platform [10, 11]. In a 3-input M-G, the output polarity follows the majority of the input polarities. If A, B, and C are the three inputs, the output is given in Eq. (1). A 3-input M-G can also be configured as an AND gate or an OR gate by fixing one of the three inputs at polarization −1 or +1, respectively, as given in Eqs. (2) and (3) (Fig. 1a shows a 3-input M-G) [12, 13]. Another important multi-input gate is the 5-input M-G; Eq. (4) gives its output for inputs A, B, C, D, and E. If three of the five inputs are merged and the clock zone near the output section is changed from clock 0 to clock 1 (see the clock scheme discussed below), the gate produces a 3-input XOR output; without the clock-zone change, it behaves as a normal 3-input M-G with a 4% increase in output strength, at the cost of raising the cell count from 5 to 11. The 5-input M-G, shown in Fig. 1b, is thus representative of the 3-input XOR operation. Since an inverter (NOT) gate is required in almost any digital circuit, the basic single-layer inverter gate is also included here, in Fig. 1c.

M-G(A, B, C) = AB + BC + AC   (1)

M-G(A, B, 0) = A·B   (2)

M-G(A, B, 1) = A + B   (3)

M-G(A, B, C, D, E) = ABC + ABD + ABE + ADE + ACE + ACD + BCE + BCD + BDE + CDE   (4)
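To make Eqs. (1)-(4) concrete, the following short Python sketch (ours, not part of the original design flow; the function names are illustrative) evaluates the majority functions over all input combinations and checks the AND/OR reductions of Eqs. (2) and (3):

```python
from itertools import product

def maj3(a, b, c):
    # Eq. (1): M-G(A, B, C) = AB + BC + AC
    return (a & b) | (b & c) | (a & c)

def maj5(a, b, c, d, e):
    # Eq. (4): the sum-of-products expansion equals 1 exactly when
    # at least 3 of the 5 inputs are 1
    return int(a + b + c + d + e >= 3)

# Eq. (2): fixing one input at logic 0 (polarization -1) yields AND
assert all(maj3(a, b, 0) == (a & b) for a, b in product((0, 1), repeat=2))
# Eq. (3): fixing one input at logic 1 (polarization +1) yields OR
assert all(maj3(a, b, 1) == (a | b) for a, b in product((0, 1), repeat=2))

for bits in product((0, 1), repeat=3):
    print(bits, "->", maj3(*bits))
```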


Fig. 1 Q-DCA-based single-layer structure of a "3-input M-G," b "5-input M-G," and c "Inverter-Gate"

Fig. 2 “Clock-phases” used in Q-DCA technology

2.2 Clock Scheme Used in Q-DCA Technology

In Q-DCA, a dedicated timing scheme controls the flow of information from one part of the circuit to another, maintains performance gain by restoring signal energy lost to the environment, and determines design delay. It is a concatenated structure of 4 clock zones, each with 4 clock phases. The four clock zones are "Clock Zone 1" through "Clock Zone 4," and the four clock phases, separated by 90° phase differences, are switch, hold, release, and relax; all are shown in Fig. 2 [14]. When the clock is high, the potential barrier between the dots drops and the total circuit polarization is 0; when the clock is low, the potential barrier between the dots is high, and electrons settle into the dots by tunneling according to the cell polarization, which is determined by the specified neighboring cells.

2.3 Reversible Logical Expression

In the conventional logic gates discussed above, information cannot be copied before deletion, so energy is dissipated per erased bit. This loss can be controlled through design adiabaticity, and such adiabatic logic can be realized with reversible gates, where "erase with copy" can be maintained by the Bennett clocking scheme [15-17]. Hence, the energy loss per bit can be limited by adopting reversible gates. In a conventional gate, the outputs are determined by the inputs only, but in a reversible gate the inputs can also be recovered from the outputs: the output distribution represents the input distribution and vice versa. To achieve this arrangement, a reversible gate must have the same number of inputs and outputs. A proper exploitation of the advantages of Q-DCA-based circuits is therefore possible using reversible gates and, owing to their energy-controlled nature, they become even more efficient on a multilayer platform. Accordingly, the proposed design is formed in a hybrid way by combining a reversible gate with the widely used 3-input M-G. Figure 3 presents the block diagram of a basic reversible gate.
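As an illustration of the input-output bijectivity just described, the sketch below models the well-known Fredkin (controlled-swap) gate, a standard reversible gate with equal numbers of inputs and outputs; this is an example in the spirit of Fig. 3, not the specific gate used in the proposed design:

```python
from itertools import product

def fredkin(a, b, c):
    # Fredkin gate: input 'a' controls a swap of 'b' and 'c';
    # three inputs map to three outputs
    return (a, c, b) if a == 1 else (a, b, c)

outputs = [fredkin(*bits) for bits in product((0, 1), repeat=3)]
# Reversibility: the mapping from input triples to output triples is a
# bijection, so every output pattern occurs exactly once and the inputs
# can be recovered from the outputs.
assert len(set(outputs)) == 8
```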

2.4 Q-DCA-Based Multilayer Structure

Crossing two wires is a common and important requirement in circuit design, and it becomes more troublesome as circuits grow. Delay, area, output power, and power dissipation all depend on how crossings are made, so choosing a crossover style is one of the trickiest parts of building a circuit. In the proposed Q-DCA technology, coplanar crossings, multilayer crossings, and crossings realized by assigning different clock zones to the two crossing wires are all available [18]. In this work, a multilayer crossover is used, in which cells placed in two consecutive layers act as an inverter between those layers. This type of design can increase the output strength compared with the coplanar form, and also compared with the single-layer inverter gate, with a 25% reduction in delay. In such a multilayer structure, the vertically separated quantum cells are tuned to match their kink energy in the horizontal plane, in contrast to a transistor-based structure [19]. Figure 4 presents a bridged multilayer structure based on Q-DCA.


Fig. 4 Multilayer Q-DCA-based formation

2.5 Occupied Area, Delay, Dissipated-Power, and Tunneling Resistivity Calculation

The performance model for Q-DCA-based structures was primarily established by Timler and Lent in 2002; it is based on the Hamiltonian matrix, with the cell-to-cell response treated in the Hartree-Fock approximation [20]. The total energy flowing through a Q-dot cell is the sum of the energy exchanged between cells and the dissipated energy. Here, the total power flow through a quantum wire (P1) comprises the clock power (Pclk), the input-signal power (Pin), the output-signal power (Pout), and the power dissipated (Pdiss) in the Q-cell-based conductors. The relationship between these powers is given in Eq. (5), the per-cell gain in Eq. (6), and the power dissipation in Eq. (7). In Eq. (7), τ is the energy relaxation time, γnew and γold are the clock energies after and before the switching activity, Po and Pn are the output and input polarizations, and Pold and Pnew are the polarizations before and after Q-cell switching [21, 22].

P1 = Pin + Pout + Pclk + Pdiss   (5)

Gain = Pout / Pin   (6)

Pdiss = (1/τ) [ ((γold − γnew)/Ek) · (Pold + Pnew)/2 + (2γnew/Ek) · Po·Pn − (Pn − Po) ]   (7)

In the proposed Q-DCA-based nanoscale spin technology, the area-based power dissipation of a circuit is easily estimated by assuming a dissipation of 100 W per cm² of area [20]. In this work, however, the power dissipated in the quantum wire is calculated with a few simple equations that depend on the switching time of the quantum cells, the cell complexity, and the distance between two quantum cells. The basic power dissipation equation used in this work is given in Eq. (8).

Pdiss = Ediss / (Switching Time)   (8)

In Eq. (8), the energy dissipation Ediss depends on the distance between two cells (r), the quantum cell length (l), the number of cells (C), the orientation of two consecutive cells, and the cell kink energy; it is given in Eq. (9), while Eq. (10) gives the kink energy. The switching time depends on the cell complexity and the tunneling rate (Tr = 1/tunneling time) and is given in Eq. (11) [23].

Energy Dissipation = {r(C − 1)/l} × (Kink Energy)   (9)

Kink Energy = (23.04 × 10⁻²⁹ / r) J   (10)

Switching Time = (C − 1) / Tr   (11)

Not only the power but also the delay and area estimates are expressed here. The area A required by a Q-DCA structure is given in Eq. (12), where n is the number of cells in the vertical direction, m the number of cells in the horizontal direction, l the length of each cell, w the width of each cell, q the vertical distance between two consecutive cells, and r the horizontal distance between two consecutive cells. Besides the occupied area, the utilized area is also determined, through the area utilization factor (AUF) [24] given in Eq. (13), where C is the total number of cells used. The propagation latency of a Q-DCA formation depends on the clocking zones used along the critical path of the configuration. The tunneling resistivity ρT is calculated using Eq. (14), where d is the effective tunneling distance, τ the tunneling rate, ε the permittivity, e the electronic charge, and c the speed of light.

A = {(l × n) + q} × {(w × m) + r}   (12)

AUF = [{(l × n) + q} × {(w × m) + r}] / (C × l × w)   (13)

ρT = (d³ × τ) / (ε × e × c²)   (14)
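A minimal sketch collecting Eqs. (8)-(14) as plain Python functions follows (our illustration only; the constant in kink_energy is taken exactly as printed in Eq. (10), and all argument values would be design-specific inputs not given here):

```python
def kink_energy(r):
    # Eq. (10): kink energy (J) for cell-to-cell distance r
    return 23.04e-29 / r

def energy_dissipation(r, l, C):
    # Eq. (9): E_diss = {r(C - 1)/l} x (kink energy)
    return (r * (C - 1) / l) * kink_energy(r)

def switching_time(C, Tr):
    # Eq. (11): switching time = (C - 1)/Tr, with Tr = 1/tunneling time
    return (C - 1) / Tr

def power_dissipation(r, l, C, Tr):
    # Eq. (8): P_diss = E_diss / switching time
    return energy_dissipation(r, l, C) / switching_time(C, Tr)

def occupied_area(n, m, l, w, q, r):
    # Eq. (12): A = {(l x n) + q} x {(w x m) + r}
    return ((l * n) + q) * ((w * m) + r)

def area_utilization_factor(n, m, l, w, q, r, C):
    # Eq. (13): AUF = A / (C x l x w)
    return occupied_area(n, m, l, w, q, r) / (C * l * w)

def tunneling_resistivity(d, tau, eps, e=1.602e-19, c=3.0e8):
    # Eq. (14): rho_T = (d^3 x tau) / (eps x e x c^2), with e the
    # electronic charge and c the speed of light in SI units
    return (d ** 3 * tau) / (eps * e * c ** 2)
```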


3 Proposed Multilayer 8:1 MUX

Since the multiplexer (MUX) is the basic component of a barrel-shifter, a novel 8:1 MUX using multilayer Q-DCA technology is proposed in this paper for the design of an 8-bit barrel-shifter. The proposed structure is based on a 2:1 MUX design [25] and is 75% reversible when one input of the 2:1 MUX acts as a direct output, because 6 of the 8 output bit combinations match the input bit combinations; this reversibility allows information to be preserved before removal. The Q-DCA-based multilayer structure of the proposed 8:1 MUX is presented in two ways: Fig. 5 shows each layer separately, and Fig. 6 shows the combined-layer structure without any unwanted signals, using 4 clock zones shown in 4 different colors. Here, i0-i7 are the 8 inputs, S0-S2 the three select lines, and OUT the final output. The simulated outputs of the 2:1 MUX and of the final 8:1 MUX are shown in Figs. 7 and 8, respectively; the 2:1 MUX waveforms demonstrate the 75% reversibility of the proposed design (when output q1 is the direct outcome of select line S).

The proposed multilayer Q-DCA technology yields an 8:1 MUX that is more area-efficient, less complex, faster, and lower in power dissipation than single-layer designs, as shown by the comparison with recent related work in Table 1. The utilized-area factor of the presented design is higher than that of paper [9] and lower than that of paper [8]. Still, the proposed design is more effective than that of paper [8]: it achieves a 75% reduction in occupied area, 54% in cell complexity, 64% in areal power dissipation (based on cell count), 84% in cost, and an 82% speed improvement, at the price of only a 30% reduction in the utilized-area factor relative to the design of paper [8].

Fig. 5 Layers of proposed Q-DCA-based 8:1 MUX

Fig. 6 Combined-layer structure of proposed multilayer Q-DCA-based 8:1 MUX

Fig. 7 Simulated outcomes of the 2:1 MUX used, demonstrating its reversibility


Fig. 8 Simulated outcome of proposed multilayer Q-DCA-based 8:1 MUX

Beyond these parameters, the multilayer design can also operate with a smaller layer-separation gap, which reduces the design volume relative to the normal value (11.5 nm) for multilayer Q-DCA designs. A high-temperature environment, however, is a trade-off in multilayer circuitry: it degrades the output strength because the likelihood of electro-dispersion rises. In the presented 3D 3-layer 8:1 MUX design, the temperature tolerance is 2 K above room temperature with the layer-separation gap reduced down to 7 nm at unchanged output strength.
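A behavioral model may help clarify the logical structure: the sketch below (our assumption about the logic tree; only the layouts in Figs. 5 and 6 define the actual circuit) builds an 8:1 MUX from seven 2:1 MUXes and verifies its truth table:

```python
def mux2(d0, d1, s):
    # 2:1 multiplexer: the building block of the proposed design
    return d1 if s else d0

def mux8(inputs, s0, s1, s2):
    # 8:1 MUX as a tree of seven 2:1 MUXes (s0 = least significant select)
    a = [mux2(inputs[i], inputs[i + 1], s0) for i in range(0, 8, 2)]
    b = [mux2(a[i], a[i + 1], s1) for i in range(0, 4, 2)]
    return mux2(b[0], b[1], s2)

data = [0, 1, 1, 0, 1, 0, 0, 1]
for sel in range(8):
    s0, s1, s2 = sel & 1, (sel >> 1) & 1, (sel >> 2) & 1
    assert mux8(data, s0, s1, s2) == data[sel]
```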


Table 1 Comparison table: Q-DCA-based 8:1 MUX designs

Designs of 8:1 MUX, year         | Occupied area (µm²) | AUF  | Speed [1/propagation delay (ps)] (THz) | Cell complexity | Areal power dissipation (nW) | Cost (area × delay [26])
Single-layer design of paper [8] | 0.40                | 4.76 | 0.11                                   | 260             | 80                           | 3.6
Single-layer design of paper [9] | 0.12                | 2.78 | 0.11                                   | 135             | 44                           | 1.08
Proposed 3-layered design        | 0.10                | 3.33 | 0.2                                    | 118             | 30                           | 0.5

4 Proposed 8-Bit Barrel Shifter

The advanced 8:1 MUX configuration presented above is used for the design of the 8-bit Q-DCA-based barrel-shifter. Since the multiplexer is the main component of a barrel-shifter, the 8:1 MUX is used directly to build the proposed 8-bit shifter, in the same way that a 4:1 MUX is the basic component of a 4-bit barrel-shifter; the block diagram of such a 4:1 MUX-based 4-bit barrel-shifter is given in paper [27]. The individual layers of the proposed shifter are presented in Fig. 9a-g, and the combined-layer structure is shown separately in Fig. 10. The simulated outcomes are shown in Fig. 11, where i0-i7 are the inputs, S0-S2 the three select lines, and O0-O7 the outputs. The power dissipation (both cell-area-based and switching-based) and the delay (propagation and switching) of the proposed shifter are calculated in this work and compared with the most optimized IG-FINFET-based 8-bit barrel-shifter in Table 2; additionally, Table 3 lists further parameters of the proposed multilayer 8-bit shifter.
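Functionally, each output bit of such a shifter is produced by one 8:1 MUX selecting among the rotated data lines, so any shift amount completes in a single combinational pass. The sketch below is a behavioral model of that idea only (the rotation direction is a modeling choice, not taken from the layouts):

```python
def mux8(inputs, sel):
    # behavioral 8:1 MUX (see the sketch in Sect. 3)
    return inputs[sel]

def barrel_shift(word, amount):
    # 8-bit barrel-shifter: output O_k is driven by one 8:1 MUX whose
    # select lines encode the shift amount
    return [mux8(word, (k + amount) % 8) for k in range(8)]

assert barrel_shift([1, 0, 0, 0, 0, 0, 0, 0], 3) == [0, 0, 0, 0, 0, 1, 0, 0]
```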

5 Conclusion

A novel multilayer 8:1 multiplexer-based 8-bit barrel-shifter has been designed using Q-DCA technology, where a 75% reversible 2:1 MUX is used to build the proposed 8:1 MUX without any extra area. The proposed multilayer 8:1 MUX reduces the occupied area by 16.6%, cell complexity by 12.5%, power dissipation by 31.8%, and cost by 33.7%, with a 19.7% improvement in the utilized-area factor and an 81.8% speed improvement, compared with the most optimized recent single-layer Q-DCA-based design.


(a) Layer: 1

(b) Layer: 2

(c) Layer: 3 Fig. 9 Layers of proposed 8-bit barrel-shifter


(d) Layer: 4

(e) Layer: 5

(f) Layer: 6

Fig. 9 (continued)


(g) Layer: 7

Fig. 9 (continued)

Fig. 10 Combined-layers structure of proposed 8-bit barrel-shifter

This optimization is used to demonstrate the advancement of quantum cell-based technology over transistor-level IG-FINFET technology by forming the barrel-shifter from smaller cells, and all calculated parametric values are presented with the required comparisons in terms of occupied area, power dissipation, propagation delay, switching delay, AUF, temperature tolerance, layer-separation gap, number of cells used, and total tunneling resistivity. In the proposed shifter design, the layer-separation gap can be reduced down to 8.5 nm while still obtaining a stable result; however, a 10.5-nm layer-separation gap offers the highest temperature tolerance, with slightly lower output strength. In other words, the proposed multilayer barrel-shifter can tolerate an 81% greater high-temperature effect, relative to normal temperature, for a 19.5% increase in the layer-separation gap and a 1% increase in maximum output strength.


Fig. 11 Simulated outcomes of proposed 8-bit barrel-shifter


Table 2 Comparison table: 8-bit barrel-shifter

Technology                        | Power dissipation (W)                                      | Delay (s)
20-nm IG-FINFET [6]               | 36.45 × 10⁻⁶                                               | 9.013 × 10⁻⁹
16-nm cell-based multilayer Q-DCA | 8 × 10⁻⁶ (switching effect); 6.7 × 10⁻⁶ (cell area)        | 12 × 10⁻¹² (propagation delay); 1.05 × 10⁻¹⁵ (switching delay)

Table 3 Parametric outcomes of proposed 8-bit barrel shifter

Occupied area                                                        | 1.18 µm²
Used cells                                                           | 2644
Maximum temperature tolerance for most suitable layer separation gap | 11 K above room temperature, for a 10.5-nm layer separation gap
AUF                                                                  | 1.76
Total tunneling resistivity                                          | 92.5 × 10⁵ Ω·m

References

1. Moore GE (1965) Cramming more components onto integrated circuits. Electronics 38:114–117
2. Lent CS, Tougaw PD, Porod W, Bernstein GH (1993) Quantum cellular automata. Nanotechnology 4:49–57
3. Tougaw PD, Lent CS (1994) Logical devices implemented using quantum cellular automata. J Appl Phys 75:1818–1825
4. Babaie S et al (2019) Design of an efficient multilayer arithmetic logic unit in quantum-dot cellular automata (QCA). IEEE Trans Circ Syst 66:963–967
5. MirzajaniOskouei S, Ghaffari A (2019) Designing a new reversible ALU by QCA for reducing occupation area. J Supercomput 75:5118–5144
6. Kandasamy N, Telagam N, Kumar P, Gopal V (2020) Analysis of IG FINFET based N-bit barrel shifter. Int J Integr Eng 12(8):141–148
7. Padmavani C, Pavani T, Sandhya Kumari T, Anitha Bhavani C (2020) Design of 8-bit low power barrel shifter using self controllable voltage level technique. Int J Adv Sci Technol 29(8):3787–3795
8. Xingjun L, Zhiwei S, Hongping C, Reza M, Haghighi J (2019) A new design of QCA-based nanoscale multiplexer and its usage in communications. Int J Commun Syst 33(4):1–12
9. Ahmadpour SS, Mosleh M, Heikalabad SR (2022) Efficient designs of quantum-dot cellular automata multiplexer and RAM with physical proof along with power analysis. J Supercomput 78:1672–1695
10. Singh R, Pandey MK (2018) Analysis and implementation of reversible dual edge triggered D flip flop using quantum dot cellular automata. Int J Innov Comput Inf Control 14:147–159
11. Safoev N, Jeon J-C (2017) Area efficient QCA barrel shifter. Adv Sci Technol Lett 144:51–57
12. Walus K, Dysart TJ et al (2004) QCADesigner: a rapid design and simulation tool for quantum-dot cellular automata. IEEE Trans Nanotechnol 3:26–31
13. Roy SS (2016) Simplification of master power expression and effective power detection of QCA device. In: IEEE students' technology symposium, pp 272–277
14. Askari M, Taghizadeh M (2011) Logic circuit design in nano-scale using quantum-dot cellular automata. Eur J Sci Res 48:516–526
15. Narimani R, Safaei B, Ejlali A (2020) A comprehensive analysis on the resilience of adiabatic logic families against transient faults. Integr VLSI J 72:183–193


16. Pidaparthi SS, Lent CS (2018) Exponentially adiabatic switching in quantum-dot cellular automata. J Low Power Electron Appl 8:1–15
17. D'Souza N, Atulasimha J, Bandyopadhyay S (2012) An energy-efficient Bennett clocking scheme for 4-state multiferroic logic. IEEE Trans Nanotechnol 11:418–425
18. Abedi D, Jaberipur G, Sangsefidi M (2015) Coplanar full adder in quantum-dot cellular automata via clock-zone based crossover. In: 18th CSI international symposium on computer architecture and digital systems (CADS); IEEE Trans Nanotechnol 14(3):497–504
19. Waje MG, Dakhole P (2013) Design and implementation of the 4-bit arithmetic logic unit using quantum-dot cellular automata. In: IEEE IACC, pp 1022–1029
20. Timler J, Lent CS (2002) Power gain and dissipation in quantum-dot cellular automata. J Appl Phys 91:823–831
21. Barughi YZ et al (2017) A three-layer full adder/subtractor structure in quantum-dot cellular automata. Int J Theor Phys 56:2848–2858
22. Ganesh EN (2015) Power analysis of quantum cellular automata circuit. Proc Mater Sci 10:381–394
23. Roy SS (2017) Generalized quantum tunneling effect and ultimate equations for switching time and cell to cell power dissipation approximation in QCA devices. Phys Tomorrow, 1–12
24. Zahmatkesh M, Tabrizchi S, Mohammadyan S, Navi K, Bagherzadeh N (2019) Robust coplanar full adder based on novel inverter in quantum cellular automata. Int J Theor Phys 58:639–655
25. Majeed AH, Alkaldy E, Zainal MS, Navi K, Nor D (2019) Optimal design of RAM cell using novel 2:1 multiplexer in QCA technology. Circ World 46(2):147–158
26. Maharaj J, Muthurathinam S (2020) Effective RCA design using quantum-dot cellular automata. Microprocess Microsyst 73:1–8
27. Elamaran V, Upadhyay HN (2015) Low power digital barrel shifter datapath circuits using microwind layout editor with high reliability. Asian J Sci Res 8:478–489

Chapter 5

Recognition of Facial Expressions Using Convolutional Neural Networks

Antonio Sarasa-Cabezuelo

1 Introduction

Image recognition is a classic problem in the field of artificial intelligence [1] that has been treated with different techniques. The problem consists [2] of detecting the similarity or equivalence between images in an automated way, as a result of processing the information they contain. In most solutions, similarity is obtained by defining a distance [3] that measures the closeness between the identifying characteristics of the images. In recent years, the problem has been treated with machine learning algorithms [4] with varying degrees of success, but solutions based on neural networks [5] stand out for their efficiency. This is consistent with the nonlinear nature of the problem and with the strength of networks at solving problems of this nature. Despite this efficiency, however, the classical neural network does not adapt well [6] to image recognition because of the spatial structure of the problem: images are generally represented by one matrix (in grayscale) or three matrices (in RGB representation), and many parameters must be computed.

The rise of big data [7] has favored the evolution of neural network models aimed at improving their predictive and classifying capacity, increasing their complexity in terms of intermediate processing layers and different connection architectures. A more specialized model of artificial neural network is the so-called convolutional neural network (CNN) [8], used mostly in the recognition and classification of images. A convolutional neural network is generally made up of three types of layers: the convolutional layers, responsible for feature extraction; the subsampling (or pooling) layers, used to reduce the size of the image; and the fully connected layers, essentially a traditional multilayer neural network responsible for classifying the data processed by the previous layers.

A. Sarasa-Cabezuelo (B), Universidad Complutense de Madrid, Dpto. Sistemas Informáticos y Computación, Calle Profesor José García Santesmases, 9, 28040 Madrid, Spain. e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_5


A particular problem related to image recognition is the recognition of facial expressions associated with human emotions [9]. Facial expressions are part of non-verbal communication and are based on small variations of the surface of a person's face, such as the curvature of the mouth, that give rise to different expressions intended to convey an emotion. One difficulty [10] is that certain facial expressions are easily confused: frowning can express anger but also disgust, and an open mouth can indicate joy, surprise, or fury. For this reason, there are different solution proposals [11] depending on whether the problem is considered discrete (each facial expression corresponds to a single emotion) or continuous (a facial expression may contain elements of several types of emotions).

The recognition of facial expressions has been approached in several ways with convolutional neural networks. In the first works [12], combinations of trained CNNs were used with the aim of minimizing the squared hinge loss. This approach was improved [13] by enlarging the training and test data and using an ensemble voting mechanism via uniform averaging, so that the ensemble predictions are integrated via weighted averaging with learned weights. Another proposal [14] hierarchically integrates the predictions of the ensemble with network weights assigned according to validation-set performance. In another work, aiming at a trained model that generalizes its classifying capacity, a single CNN based on the Inception architecture was used to process data from multiple posed and naturalistic datasets. Improvements have also been implemented based on [15] increasing the amount of training data by using multiple datasets with heterogeneous labels and applying patch-based feature extraction and registration together with feature integration through early fusion. Related to this idea, the use [16] of registered and unregistered versions of a given face image during training and testing has been explored. Other approaches [17] are based on detecting facial landmarks in images; these, however, struggle with challenging facial expressions or partial occlusions [18] that make detection difficult. Several remedies have been proposed, such as detecting reference points in several versions of the same image [19], illumination correction [20], or normalizing the images through histogram equalization and linear plane fitting [21]. Another line of work varies the depth of the networks (the number of layers with weights), where it has been shown [22] that high efficiency can be obtained with a depth of 5. The dataset used in training also has an influence: some works use the union of seven [23] or three [24] datasets, while others apply preprocessing functions to obtain additional information, such as computing a vector of HoG features from facial patches [25] or encoding face pose information [26].
Likewise, other works augment the data using horizontal flipping [27], random cropping [28], or related transformations. Other variants differ in which dataset is augmented: the training set [29] or the test set [30].


The objective of this work is to compare the performance of different convolutional neural networks at identifying the facial emotion represented in an image. Convolutional networks behave well here because they preprocess the images in such a way that each element of the image depends directly on the elements that surround it. For the study, several architectures are considered, and variants are generated in which some elements are improved, such as increased complexity, the introduction of regularization methods, or a change of activation function. The size and type of the dataset are also varied: initially, the FER2013 [31] facial expression image dataset is used, to which artificially generated images are added through various mechanisms, with the aim of evaluating any performance improvement. The structure of the paper is as follows. Section 2 briefly describes the dataset used as well as the different convolutional neural networks used in the experiment. Section 3 shows the results obtained. Section 4 discusses these results. Finally, Sect. 5 presents the conclusions and a set of lines of future work.

2 Materials and Methods

2.1 Materials

For the training and validation of the networks, the FER2013 database [32] has been used, which has 35,887 grayscale images divided into 28,709 training images and 7,178 for testing. Each of them is labeled according to the emotion it represents: anger, disgust, fear, happiness, sadness, surprise, and neutral. The distribution of the images according to their labels can be seen in Table 1. The images are represented by a square matrix of dimension 48 of integers between 0 and 255, where 0 represents black and 255 white.

Table 1 Number of images per label

Label     | Training | Test  | Total
Anger     | 3995     | 958   | 4953
Disgust   | 436      | 111   | 547
Fear      | 4097     | 1024  | 5121
Happiness | 7215     | 1774  | 8989
Sadness   | 4830     | 1247  | 6077
Surprise  | 3171     | 831   | 4002
Neutral   | 4965     | 1233  | 6198
Total     | 28,709   | 7178  | 35,887


2.2 Methods

A convolutional neural network is made up of three types of layers: the convolutional layers themselves, which are responsible for feature extraction; the subsampling (or pooling) layers, used to reduce the size of the image; and the fully connected layers (multilayer neural networks), which are responsible for classifying the data processed by the previous layers. Convolutional layers are based on [33] a matrix operation called convolution, defined as

Y[i, j] = Σ_{k1=−∞}^{∞} Σ_{k2=−∞}^{∞} X[i + k1, j + k2] · W[k1, k2]

where X is the data matrix and W is the filter matrix used to extract features. Although the matrices X and W are finite, the sums are infinite, since the undefined indices are filled with zeros (padding). The operation thus represents a movement [34] of the filter matrix over the zero-padded input matrix (adding as many rows and columns of zeros as necessary around the matrix) as a sliding window, computing the sum of element-wise products at each position. The number of indices the filter matrix moves over the input matrix is called the offset (stride). The size of the result matrix therefore depends on the offset applied to the filter in each iteration of the convolution and on the number of zeros with which the input matrix is padded. Depending on the desired size of the result matrix, three types of padding are distinguished [35]. The first, same padding, adds as many rows and columns as necessary to the input matrix so that, given the filter size and offset, the result matrix has the same size as the input. The second, valid padding, adds no padding, so the result matrix is smaller than the input. The third, full padding, ensures that every element of the matrix undergoes the same number of convolutions, yielding a result matrix larger than the input. Note that the data matrix of a grayscale image is provided [36] as a single matrix, while a color image in RGB format is provided as three, one each for the red, green, and blue channels. In either case, the convolution operation organizes the input data to extract salient features and, from them, builds the so-called feature hierarchy, in which low-level features (such as image edges) are combined into high-level ones (for example, the shape of an object).

The subsampling layers are based on [37] grouping adjacent data: a new, smaller matrix is generated from a given one by dividing it into submatrices and reducing each submatrix to a single element. The most common operations [38] are taking the maximum of the submatrix (max-pooling) or the average of its elements (average pooling). The objective of these layers is to reduce the size of the features, which improves performance, decreases overlearning (where features specific to the training sample are taken as general when classifying), and produces features more robust to noise, so that out-of-the-ordinary data do not spoil the sample.

The fully connected layers are formed [40] by a multilayer neural network that classifies the data preprocessed by the previous layers. Before that, a flattening layer transforms the data into a one-dimensional array. There are other layers as well [41]: activation layers, where activation functions are applied to the data (these can be separate or integrated into the convolutional and fully connected layers); dropout layers, which reduce overlearning by randomly deactivating a percentage of the network's connections; and batch-normalization layers, which normalize the output of another layer so that extreme values are avoided.

The structure of a convolutional network is configured by [42] setting how many convolutional layers are used and where they are placed, the filter size, the type of subsampling, and the activation functions. The way the layers and their parameters are combined gives rise to results of lower or higher quality depending on the problem at hand. The CNN architectures used for the facial expression recognition problem are briefly described next; a small code sketch of the basic convolution and pooling operations follows this list.

(a) Le-Net 5 is a convolutional neural network [43] consisting of a convolutional layer that subjects the image to 6 square filters of order 5, with a stride of one unit and valid padding (i.e., no padding). This yields 6 new images, one per filter, each a square matrix of order 28. The hyperbolic tangent is applied as activation function to the images resulting from the convolution. This is followed by a subsampling layer averaging square submatrices of order 2, which reduces the image size to 14 × 14. The same pattern is then repeated: a convolutional layer with 16 filters of size 5 × 5, unit stride, and valid padding; tanh activation; and another subsampling. At the end of this process, 16 images of size 5 × 5 are available, which are flattened into a one-dimensional array. This array feeds a neural network with 120 artificial neurons in the input layer, a hidden layer of 84, and an output layer of 10. Of these layers, the output layer uses a SoftMax activation function [44] and the other two the hyperbolic tangent.

(b) Basic CNN is a convolutional neural network [45] made up of 2 convolutional layers (which in this case use same padding, ReLU as activation function, and unit stride), each followed by a subsampling layer; the result is passed as input to the fully connected layers that form the final neural network. These layers are 3: input, hidden, and output. Its main difference from the Le-Net 5 architecture is the increased number of filters and artificial neurons, which increases the number of parameters over which learning is performed.

(c) AlexNet is a convolutional neural network architecture [46] that applies to the input data a convolutional layer with 48 filters of size 11 × 11, an offset of 4, and no padding, reducing the image from 224 × 224 to 55 × 55. A maximum subsampling layer is then applied. The data next pass through another convolutional layer with a 5 × 5 filter, same padding, and offset of 1, and a subsampling layer is applied again. To finish the preprocessing, 3 identical convolutional layers with 3 × 3 filters and a final subsampling layer are applied. These data feed a 3-layer neural network in which the first two layers (input and hidden) have 4096 neurons and the output has 1000, corresponding to the number of classes to differentiate. This network was the first to use the ReLU function as activation function and generally performs well in image classification.

(d) ResNet-50 is based on a network design technique [47] characterized by shortcuts in the network: a layer is not connected only to the immediately following one but also skips ahead to a later layer. In this way, through an additional operation (or layer), data that have passed through a series of layers are combined with data arriving from further back through a shortcut. Such architectures, called ResNets, are a very useful alternative when a network becomes so complex, through more parameters or layers, that adequate performance is not obtained and training times are very high. A particular case is the ResNet-50 network, whose structure consists mainly of interleaved convolution and identity blocks that use shortcuts; in the first, the shortcut path applies block normalization to the data matrix, while in the second the data are passed through directly.
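The following minimal NumPy sketch (ours, for illustration; it is not the implementation used in the experiments) shows the convolution defined above with configurable zero padding and stride, plus 2 × 2 max-pooling; with a 3 × 3 filter, padding of 1 and unit stride it reproduces "same" padding:

```python
import numpy as np

def conv2d(x, w, padding=0, stride=1):
    # Slide the filter w over the zero-padded input x, taking the sum of
    # element-wise products at every offset (the equation in Sect. 2.2).
    x = np.pad(x, padding)
    kh, kw = w.shape
    oh = (x.shape[0] - kh) // stride + 1
    ow = (x.shape[1] - kw) // stride + 1
    y = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            y[i, j] = np.sum(x[i*stride:i*stride+kh, j*stride:j*stride+kw] * w)
    return y

def max_pool(x, size=2):
    # Keep the maximum of each non-overlapping size x size submatrix
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h*size, :w*size].reshape(h, size, w, size).max(axis=(1, 3))

img = np.random.rand(48, 48)              # one FER2013-sized grayscale image
edge = np.array([[1, 0, -1]] * 3, float)  # a simple vertical-edge filter
feat = conv2d(img, edge, padding=1)       # "same" padding keeps 48 x 48
print(feat.shape, max_pool(feat).shape)   # (48, 48) (24, 24)
```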

2.3 Artificial Data Generation

The quality of the classification carried out by neural networks depends on the number of samples [48] in the training dataset. A sufficiently high number of samples is needed to extract the general characteristics of each class that allow the classes to be differentiated (with too few samples, incidental aspects, such as which way the face is turned or the relative position of the eyes and mouth in the image, may be learned even though they provide no information about the expression). To solve this problem, artificial training data are generated from the existing data through 3 types of transformations (a code sketch follows this list):

• Image flipping [49]: the transformation that modifies the image the least. For 48 × 48-pixel images, the column in position i is exchanged with the one in position 48 − i + 1, producing a new image that mirrors the first. This simple method has been successfully tested on databases such as ImageNet and CIFAR-10.

5 Recognition of Facial Expressions Using Convolutional Neural Networks

59

ones have to be filled in on the opposite side. This padding can be any value. In this work, the first row or column has been repeated as many times as necessary. In this sense, the number of positions in which the image is moved in the vertical case is less than in the horizontal case since in the images, generally, the mouth tends to be very low and could be eliminated. • Random elimination [51]: it consists of defining a number of submatrices of the variable dimension image (image patches) and transforming the elements that form it into a fixed value (white has been used in this work). This method produces good results as it forces the network to ignore certain features. Note that this transformation can decrease overlearning.
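A possible rendering of these three transformations for 48 × 48 grayscale images is sketched below (our illustration; the helper names and the positive-shift-only translation are simplifying assumptions):

```python
import numpy as np

def flip(img):
    # mirror effect: column i is exchanged with column W - i + 1
    return img[:, ::-1]

def translate(img, shift, axis):
    # shift rows (axis=0) or columns (axis=1) by a positive amount;
    # vacated positions are filled by repeating the first row/column
    out = np.roll(img, shift, axis=axis)
    if axis == 0:
        out[:shift, :] = img[0, :]
    else:
        out[:, :shift] = img[:, [0]]
    return out

def random_erase(img, n_patches=2, max_size=15, fill=255):
    # blank out random patches (white), forcing the network to ignore
    # certain local features
    out = img.copy()
    rng = np.random.default_rng()
    for _ in range(n_patches):
        s = int(rng.integers(5, max_size + 1))
        i = int(rng.integers(0, img.shape[0] - s))
        j = int(rng.integers(0, img.shape[1] - s))
        out[i:i + s, j:j + s] = fill
    return out
```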

2.4 Key Performance Indicators

This section defines the metrics (KPIs, Key Performance Indicators) used to evaluate the results of the algorithms [52]. In a classification problem with n classes, let c_i v_j denote the number of elements classified as class i that truly belong to class j, and let N be the total number of elements to classify.

• Precision: the proportion of elements that truly belong to a class among all those classified as such. The precision of class k is calculated as

precision_k = c_k v_k / Σ_{i=1}^{n} c_k v_i

• Recall: the proportion of all elements truly belonging to a class that have been classified as such. The recall of class k is calculated as

recall_k = c_k v_k / Σ_{i=1}^{n} c_i v_k

• Accuracy: the proportion of total successes of the model; it coincides with the weighted average of the recall. It is calculated as

accuracy = (Σ_{i=1}^{n} c_i v_i) / N
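These three metrics can be computed directly from a confusion matrix; in the sketch below (illustrative), rows index the predicted class and columns the true class, matching the c_i v_j convention above:

```python
import numpy as np

def metrics(cm):
    # cm[i, j] = number of elements classified as class i that truly
    # belong to class j
    diag = np.diag(cm).astype(float)
    precision = diag / cm.sum(axis=1)  # c_k v_k / sum_i c_k v_i
    recall = diag / cm.sum(axis=0)     # c_k v_k / sum_i c_i v_k
    accuracy = diag.sum() / cm.sum()   # sum_i c_i v_i / N
    return precision, recall, accuracy

cm = np.array([[50, 10], [5, 35]])     # a toy 2-class confusion matrix
p, r, a = metrics(cm)
print(p, r, a)                         # per-class precision/recall, accuracy
```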

3 Results

The neural architectures described have been trained on the same subset of data in order to keep conditions identical across architectures. The only exception is the experiments that use artificial data generation, since those data are created randomly at runtime. First, the implementation providing the best result is chosen and retrained on the dataset after applying the artificial data generation techniques. These techniques are always the same, although they produce different images due to the randomness on which they are based. For each image, each technique is applied independently, so one image may undergo several. This is decided as follows (see the sketch after this list):

• Each image in the base training dataset has a 50% chance of generating a flipped copy.
• Each image in the base training dataset has a 50% chance of generating a translated copy. This copy is equally likely to be translated on the horizontal axis (by between 5 and 15 columns, decided at random) or on the vertical axis (by 5-10 rows).
• Each image in the base training dataset has a 25% chance of generating a copy of itself with patches removed. The number of patches and their sizes are decided randomly: between 1 and 3 patches, each between 5 and 15 pixels in width and height.

When evaluating each of the convolutional neural networks, the same test set is always used, and the precision, recall, and accuracy of each trained model are studied.
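Assuming the transformation helpers sketched in Sect. 2.3, the generation scheme above could be written as follows (an illustration of the stated probabilities, not the authors' code):

```python
import random

def augment(images):
    # flip, translate, and random_erase are the illustrative helpers
    # sketched in Sect. 2.3
    extra = []
    for img in images:
        if random.random() < 0.5:   # 50%: flipped copy
            extra.append(flip(img))
        if random.random() < 0.5:   # 50%: translated copy
            if random.random() < 0.5:
                extra.append(translate(img, random.randint(5, 15), axis=1))
            else:
                extra.append(translate(img, random.randint(5, 10), axis=0))
        if random.random() < 0.25:  # 25%: copy with patches removed
            extra.append(random_erase(img, n_patches=random.randint(1, 3)))
    return images + extra
```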

3.1 Le-Net 5

Using this architecture, 3 models have been implemented, changing some of its elements:

• Implementation 1: The last connected layer has been changed so that it contains 7 neurons, since this is the number of classes to classify.
• Implementation 2: In variant I1, the tanh activation function has been replaced by a ReLU function.
• Implementation 3: In variant I2, the average subsampling has been changed to max subsampling, and dropout and block normalization have been introduced as regularization methods to prevent overlearning of the dataset.

Note that the best results are achieved with implementation 3. It is for this reason that this variant is chosen for training with the artificially generated data. Table 2 shows the results obtained.

Table 2 Results of Le-Net 5

Label               | Precision | Recall | Accuracy
Implementation 1    | 0.56      | 0.33   | 0.33
Implementation 2    | 0.59      | 0.41   | 0.41
Implementation 3    | 0.63      | 0.45   | 0.45
Best implementation | 0.64      | 0.50   | 0.50
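For reference, a Keras rendering of what implementation 3 might look like is sketched below; the exact hyperparameters (dropout rates, optimizer, placement of batch normalization) are not specified in the paper, so these are assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Le-Net-5-style model adapted to 48x48x1 FER2013 inputs and 7 classes,
# with ReLU, max-pooling, dropout, and batch normalization as in I3
model = tf.keras.Sequential([
    layers.Conv2D(6, 5, activation="relu", input_shape=(48, 48, 1)),
    layers.BatchNormalization(),
    layers.MaxPooling2D(2),
    layers.Dropout(0.25),
    layers.Conv2D(16, 5, activation="relu"),
    layers.BatchNormalization(),
    layers.MaxPooling2D(2),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(120, activation="relu"),
    layers.Dense(84, activation="relu"),
    layers.Dense(7, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```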

3.2 Basic CNN

Using this architecture, 5 models have been implemented, changing some of its elements:

• Implementation 1: The base implementation is used. After the second convolution-plus-subsampling stage, the image still has a considerable size (12 × 12), so the number of parameters to process is high and the network takes a long time to train.
• Implementation 2: A third convolution layer with its corresponding subsampling layer is introduced into the previous implementation. This new layer has 248 filters, and the filter size is reduced from 5 × 5 to 3 × 3, because the image is not considered large enough for a 5 × 5 filter to provide relevant information. In addition, a second hidden layer, identical to the first, is introduced in the fully connected layers to see whether more information can be extracted from the available data.
• Implementation 3: Using implementation 2, dropout and block normalization layers are introduced after each convolutional layer and fully connected layer as a technique to reduce overlearning.
• Implementation 4: Using implementation 2, the convolution layers between each subsampling layer are doubled, introducing a new layer after each existing one; all the new layers use filters of size 3 × 3, to see whether new information can be obtained.
• Implementation 5: Dropout and block normalization are introduced into the previous version.

Table 3 Results of basic CNN

Label               | Precision | Recall | Accuracy
Implementation 1    | 0.60      | 0.42   | 0.42
Implementation 2    | 0.61      | 0.41   | 0.41
Implementation 3    | 0.63      | 0.41   | 0.41
Implementation 4    | 0.62      | 0.44   | 0.44
Implementation 5    | 0.60      | 0.52   | 0.52
Best implementation | 0.65      | 0.48   | 0.48

Note that the best results are achieved with implementation 5. It is for this reason that this variant is chosen for training with the artificially generated data. Table 3 shows the results obtained.

3.3 AlexNet

Using this architecture, 3 models have been implemented, changing some of its elements:

• Implementation 1: The first convolutional layer was adapted so that it does not reduce the size of the image. For this, the offset was reduced from 4 to 1 and, since the FER2013 images are much smaller than the AlexNet input, the filter dimension was changed from 11 × 11 to 5 × 5, as such a large filter could not give quality information about the image. Also, the last connected layer has been changed to have seven outputs and, given the difference in the number of classes, the size of the hidden layer has been reduced to 2048.
• Implementation 2: In the previous version, the fully connected layers are replaced by an input layer of 1024 units, two hidden layers of 2048, and an output of 7.
• Implementation 3: Dropout and block normalization are introduced into the previous version.

Note that the best results are achieved with implementation 3. It is for this reason that this variant is chosen for training with the artificially generated data. Table 4 shows the results obtained.

Table 4 Results of AlexNet

Label               | Precision | Recall | Accuracy
Implementation 1    | 0.63      | 0.39   | 0.39
Implementation 2    | 0.62      | 0.46   | 0.46
Implementation 3    | 0.56      | 0.48   | 0.48
Best implementation | 0.63      | 0.52   | 0.52


3.4 ResNet-50

Using this architecture, 2 models have been implemented, changing some of its elements:

• Implementation 1: The final fully connected layer is replaced by one with 7 learning units (the number of classes to classify).
• Implementation 2: In the previous implementation, three fully connected layers are added before the output layer: the first has 1024 learning units, the other two 2048, with an output of 7. Dropout and block normalization techniques are also used in these layers.

Note that the best results are achieved with implementation 1. It is for this reason that this variant is chosen for training with the artificially generated data. Table 5 shows the results obtained.

Table 5 Results of ResNet-50

Label               | Precision | Recall | Accuracy
Implementation 1    | 0.60      | 0.51   | 0.51
Implementation 2    | 0.60      | 0.43   | 0.43
Best implementation | 0.64      | 0.51   | 0.51

4 Discussion

From the metrics obtained in the experiments, the following conclusions can be drawn. First, the type of activation function used has a direct influence: the experiments show that when the ReLU activation function is used in all layers of the networks (except the output layer, where SoftMax is used in all cases), better results are obtained than with other activation functions. Second, there is a direct relationship between network complexity (number of filters in the convolutional layers and number of learning units) and the quality of the results: in general, the more complex the network, the better the results. This does not hold if the network was already complex enough (for example, in the ResNet-50 architecture, when several layers are added to the first implementation, the resulting second implementation does not improve, probably because the images had already been sufficiently processed before reaching the fully connected layers). Third, the results show that the dropout regularization and block normalization techniques, used in all the experiments, generally improve the accuracy of the networks. Fourth, it has been verified for all network types that extending the training dataset improves the accuracy of the model in every case and, in three of the implementations, increases or maintains the global accuracy.


Finally, it can be seen that results as good as those of more complex networks can be obtained with a simple one. This is observed in the Le-Net 5 network, where changing the activation function, using regularization techniques, and expanding the training dataset yield results as good as those of more complex networks, with lower training time due to the simplicity of the network (compare, for example, with the best implementation of the AlexNet network, which introduces more filters and more preprocessing).

If the confusion matrices (Appendix A) are analyzed, the following phenomena can be observed. The anger class presents the most classification errors and is where the networks work worst, since in all of them samples that are not anger are classified as anger. In particular, the results show that this confusion occurs mainly with the fear, sadness, and neutral classes, and to a lesser extent with the expressions of joy and surprise. This behavior could be explained by several reasons. Regarding joy and surprise: joy is the opposite emotion to anger and presents an equally opposite expression, while in the gesture of surprise the mouth forms an O that is easier to recognize and therefore easier to distinguish. Regarding the confusion with fear, sadness, and neutral, the explanation could lie in the implementation of the models and the internal workings of the framework used: when there are not enough samples to recognize an image as belonging to a class, images may be assigned to the most similar expression class that has the next-largest number of samples (in this case, anger). In the particular case of misclassification involving sadness, the explanation could be the difficulty of distinguishing the two expressions, since some facial features associated with anger can also be associated with sadness (a straight or slightly downward-curved mouth line). The expression of disgust also presents classification problems; in this case, the explanation is that the number of disgust images (547) is far lower than that of other classes such as happiness (8989 images). This problem could be reduced by using techniques that lower the class imbalance, introducing new images classifiable in the disgust category. Finally, setting aside the previous cases, the models classify the remaining expression classes quite accurately.


5 Conclusions and Future Work

In this work, the ability of several types of convolutional neural networks to classify images showing facial expressions corresponding to emotional states has been studied. Several experiments were carried out. First, 4 types of networks were chosen, and for each of them improved variants were implemented by varying the architecture or its components. Each variant was trained and tested with the same dataset, obtained from the FER2013 facial expression image database, and the results were compared in order to select the best-classifying variant for each type of architecture. Second, artificial data were generated from the FER2013 images by means of three artificial data generation techniques (flipping, translation, and random elimination), with the aim of increasing the number of samples for training and testing and thereby measuring the impact of sample count on the results. The dataset augmented with the artificially generated data was tested with the best variants obtained on the original data. The main conclusions drawn from the experiments show that:

(1) Using the ReLU activation function in the non-output layers performs better than the hyperbolic tangent activation function.
(2) The use of regularization techniques improves the accuracy of the network.
(3) Extending the dataset using artificial data generation techniques improves network performance.
(4) The greater the complexity of a network, the better the results. However, results similar to those of more complex implementations can be obtained with simple networks if their parameters and components are adjusted appropriately.
(5) Poor behavior is obtained when classifying the expression of anger, probably because this emotion shares features with other expressions, such as the curvature of the mouth shared between anger and sadness.
(6) Poor behavior is also obtained with the expression of disgust, owing to the low number of such images in the dataset compared with the number available for the other facial expressions.

There are several lines of future work to improve on these results. First, the training dataset could be extended with other datasets or with more artificial data generated by other techniques, such as image blending; in this sense, it would also be interesting to compare artificial data generation techniques to determine which are most appropriate for this problem. Another line of future work is to analyze the impact on the classification results of a greater or lesser use of preprocessing techniques (for example, image lighting correction or facial feature extraction) or of the type of images used (in particular, the difference between 2D and 3D images). Lastly, another line consists of analyzing how treating the facial expressions of emotions as discrete or continuous influences the classification process, that is, considering that a facial expression can represent only a single type of emotion, or considering that traits of several emotions can be found intermingled in the same facial expression.

66

A. Sarasa-Cabezuelo

of emotions are discrete or continuous influences the classification process, that is, considering that an easy expression can only represent a single type of emotion, or considering that in the same facial expression traits of various emotions can be found intermingled. Acknowledgements I would like to thank Mateo García Pérez for developing the analyses.


Chapter 6

Identification of Customer Preferences by Using the Multichannel Personalization for Product Recommendations

B. Ramakantha Reddy and R. Lokesh Kumar

Vellore Institute of Technology, Katpadi, Vellore, Tamil Nadu 632014, India

1 Introduction

Past publications show that the advertising business faces very high variation because of the fast and widespread diffusion of the Internet. Both conventional and online advertising media now use search engines and social media to enhance their promotional reach [1]. Customers therefore receive messages through various online and conventional channels, yet the reaction of customers to online advertising is often neglected [2]; about one-third of people already use ad-blocking software to remove unwanted advertising. The literature shows that many companies use personalization concepts to improve the efficiency of their publicity [3]. Personalization is one of the most common bases of the product recommendation systems used in e-commerce. It is incorporated directly into retailer websites and into the email communication between customers and organizations [4]. Initially, personalized recommendations were mostly related to intelligent technology; later they transferred to mainstream advertising methodology for attracting particular audiences. The authors of [5] show experimentally that personalization improves publicity response rates. Research data from the last few years clearly indicate that product recommendation systems are frequently applied across industrial applications [6]. In [7], a German retailer reports that its purchase order rate improved by 25% with the help of a product recommendation system.

The excessive growth of advertisement channels, together with deteriorating advertisement budgets within companies, forces companies to decide how to invest their publicity funds effectively. It is therefore very difficult to assess the contribution of individual advertisement channels and their advertising messages. Personalized advertisements are a present concern in various media channels, and recommendations must be evaluated so that industries can attract consumers [8]. Hence, in this article, personalized product endorsements are evaluated across different advertising media channels and essential recommendation techniques; further design parameters that depend on the customer are also considered. In [9], a single communication channel, the retailer's website, is used for the design of a product personalization system. Past research offers, first, an examination of how recommendation features affect customers' motivation to follow personalized recommendations and, second, a comparison of banner advertising, package inserts, and email advertising [10]. This article updates and extends the available information on how personalization features affect advertising media, and relates the design of product recommendations to personalization research. Gender is also considered here: previously published studies identify that women are more involved in clothing purchases than men [11], so gender variation is a major factor in advertising for clothing manufacturers, and extensive data are collected from customers before they buy clothes. Across generations, the nature of online shopping and of male purchasing behavior is changing rapidly [12]. Since male shoppers' enjoyment is increasing, a deeper understanding per gender is required, which is highlighted as a research call. A separate analysis has therefore been made for each gender.

2 Realistic Choice-Based Conjoint Analysis

The experimental investigation in this section is organized around three considerations: the research area, the experimental design of the personalized product study, and the samples and data collection.

2.1 Area of Research

Recent papers explain personalization effects on products using factorial designs, which require many samples to cover all conditions. Determining each preference for the various experimental alternatives is therefore not suitable for all conditions, only for particular ones. Because of this disadvantage, the factorial experiment is replaced here by a choice-based investigation, and a Bayes-based, user-centric estimation technique was utilized for evaluating the recommendations of personalized products [13]. Conventional conjoint analysis derives consumers' preference behavior from ranking-related or rating-related data [14]. In choice-based conjoint analysis, respondents are asked to select an appropriate option from choice sets of product alternatives [15]. Respondents may also select a none option instead of the different product choices, indicating that their selection corresponds to some other current stimulus. This kind of selection between substitutes closely resembles real-world outcomes in the various advertising places, so customers' choices are captured in a realistic way [16]. In this article, all stimuli are presented visually, which follows from the particular characteristics of product recommendations: the effect of a recommendation is determined by an underlying combination of stimulus characteristics and personal characteristics. A number of published articles in the marketing field present alternatives visually [17]. Primary marketing research of this kind deals with landing page optimization and website interfaces; determining advertisement effectiveness across several contexts remains a very difficult task. Similar to landing page optimization, complex ad designs have been evaluated for product recommendation features [18]. The demerits of the above methods are compensated by applying the conjoint concept: conjoint analysis applied to recommendation systems can handle complex advertisements and helps determine the reasons for refusing ads in all media. The conjoint-based design used here features, first, a personalization system and, second, choice tasks with a none option [19]. A single-product conjoint analysis is a special case of choice-based analysis.

2.2 Experimental Strategy of Conjoint Analysis

In the conjoint investigation, respondents are presented with a pullover scenario: supposing a pullover was recently bought from the retailer, the choice-based analysis asks the subjects about the corresponding instantaneous product recommendations. Two major conjoint analyses are carried out, one per gender. Based on the conjoint features, the best-selling products are evaluated on the Amazon website [20]. The advertisement channel integrating personalized recommendations with package image inserts is analyzed visually. Collaborative recommendation systems face several challenges: recommendation speed at initiation, sparsity of the recorded data, effectiveness of the recommendation, and scalability [21]. Numerous attempts have been made to limit these collaborative-filtering disadvantages, and with them the resulting recommendations can be highly accurate. The present work reviews the various recommendation systems and the determinations they require. The literature classifies recommender systems as content-related systems, Knowledge Correlated Systems (KCS), memory-dependent systems, and Collaborative Related Filtering (CRF) [22]. Content-related recommender systems work through comfort filtering and the analysis of memory data. Item-to-item collaborative filtering is widely used because of its simplicity, recommendation quality, high scalability, and fast data understanding and updating. Collaborative recommendation has been deployed on the various pages of Amazon's personalized recommendations [23], where the product-style description is given as a label representing a unique proposal for you depending on the products you bought recently. Recommendation systems are now developed to present products to potential customers. Collaborative filtering is the basic technique among recommendation systems, advising on parallel customers based on incoming and past transactions; big data analysis is important in collaborative filtering recommendation networks [24]. Suggestions made using collaborative filtering alone can have low accuracy, so several researchers have employed association rules in recommendation methods to enhance the accuracy of suggestions. However, the major drawbacks of these recommendation techniques are their long running times, which can make them unsuitable for real-world applications [25]. At present, a large number of researchers are working on personalized product recommendations. The determination of recommendation systems while neglecting the advertising media is summarized in Table 1.
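To make the item-to-item collaborative filtering idea above concrete, here is a minimal NumPy sketch of cosine-similarity, item-based recommendation. It is an illustration under simplified assumptions (a small, dense user-item matrix), not the system evaluated in this study; all names are hypothetical.

```python
import numpy as np

def item_similarity(R):
    """Cosine similarity between item columns of a user-item matrix R
    (rows = users, columns = items, entries = ratings/purchases, 0 = unknown)."""
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0          # avoid division by zero for unrated items
    return (R.T @ R) / np.outer(norms, norms)

def recommend(R, user, k=4):
    """Score unseen items for one user by similarity to the items the user
    already interacted with; return the top-k item indices."""
    S = item_similarity(R)
    scores = S @ R[user]
    scores[R[user] > 0] = -np.inf    # do not re-recommend known items
    return np.argsort(scores)[::-1][:k]

# toy example: 4 users x 5 products
R = np.array([[5, 0, 3, 0, 1],
              [4, 0, 0, 1, 0],
              [0, 2, 0, 5, 0],
              [0, 3, 0, 4, 1]], dtype=float)
print(recommend(R, user=1, k=2))
```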

2.3 Various Samples Plus Data Assembly

Following the literature, a group of students from a German institute was selected for understanding and analyzing the choice-based conjoint method. The attributes and their levels in the conjoint design are given in Table 2. The population is a major consideration for the current research topic: students born from the late twentieth century onwards grew up with digital technologies and are defined as digital natives [26]. Digital technology plays a major role in present-day life, and the involvement of older people in digital systems also makes them current trendsetters for style. Given the strong influence of digital media and the personalization concept, it is very important to combine traditional recommendations and digital recommendations across the various channels. In [27], the literature differentiating male and female candidates in terms of clothes shopping is explained. Across the different results, the gender variation is controlled accurately, and it reveals people's preferences for various products and their recommendations across multiple media. A digital survey of the economics and law faculties was considered. Based on the law faculty survey, a random set of students present at the survey time was taken into account, every student having an equal probability of being sampled. In [28], the authors placed four computers at the faculty block entrance and put up posters to make the survey effective, and participation was updated for every survey round. In [29], 334 students were selected for the analytical survey; two respondents who are not digital natives were then removed after a short screening. Overall, 48.8% of the respondents are female and the remaining 51.2% are male [30]; 76.2% of the respondents are aged between 18 and 23, and 74.1% are undergraduates. The selected samples may nevertheless be slightly inaccurate due to small selection biases.

Table 1 Probable causes for declining the utilization of product endorsements

| Product or cause | Hypothesis |
|---|---|
| In principle, I ignore every personalized advertisement | Avoidance of publicity |
| In principle, I ignore every email-related advertisement | Avoidance of publicity |
| In principle, I ignore every banner advertisement | Avoidance of publicity |
| In principle, I ignore every advertisement in package inserts | Avoidance of publicity |
| If the firm may access my private documents | Confidentiality anxiety |
| If I have any advertising-related documents, I think it can be irritating | Irritation |
| I do not like the personalized items | Quality of recommendation |
| Most of the personalized products are the same as each other | Perceived variation of recommendation |
| I do not know why I keep receiving recommendation mails | Clarity |

Table 2 Attributes and their levels in the choice-based conjoint analysis

| Investigation theory or research hypothesis | Attribute | Levels |
|---|---|---|
| Customers mostly prefer printed publicity channels to electronic information exchange channels (banners and emails) | Publicity channel | Banner and package inserts |
| Consumers prefer email ads to banner publicity | Publicity channel | Email and banner publicity |
| Customers prefer product recommendations obtained from collaborative filtering | Process | Collaborative filtering, best-selling items |
| Consumers prefer a product-style description to a general description | Clarification | Recommendation related to recent product purchasing |
| Customers prefer moderate recommendation set sizes to larger sets | Number of recommendations | 4, 8, 12 |
| Customers prefer retailer advertisement sources with high credibility | Supplier | Amazon, Baur, and Vestes deis |

3 Research Results and Discussion

The factors considered in the analysis of results are goodness of fit, predictive validity, and the choice-based conjoint results with their detailed hypotheses.

3.1 Evaluation of Quality of Fit and Predictive Analysis

The quality of fit of the various models is evaluated using the Average Root Probability (ARP, or root likelihood, RLH). The fit values obtained exceed the 0.5 chance-level ARP of the null model for both the cumulative and the individual samples. Likewise, a high average Initial Selected Hit Rate (ISHR, or first-choice hit rate) indicates good predictability: the choices were correctly forecast for 80.12% of the male candidates and 79.88% of the female samples. Accordingly, compared with a random assignment, prediction accuracy improves by 59% for the male sample and by 54% for the female sample. Choice-based conjoint analysis with highly recommended stimuli is therefore a suitable method for identifying advertising preferences, as shown in Table 3.

Table 3 Internal and predictive validity of the utility estimations

| | Male (x = 195) | Female (x = 158) |
|---|---|---|
| ARP or RLH | | |
| Cumulative | 0.783 | 0.721 |
| Specific | 0.795 | 0.775 |
| ISHR or FCHR | | |
| Holdout assessment-1 (%) | 75.31 | 75.33 |
| Holdout assessment-2 (%) | 79.98 | 72.77 |
| Holdout assessment-3 (%) | 78.22 | 80.88 |
| Holdout assessment-4 (%) | 84.55 | 85.11 |
| Average | 78.02 | 79.85 |
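The two validity measures used above are easy to reproduce. The following is a small NumPy sketch, on assumed toy data, of the root likelihood (ARP/RLH) and the first-choice hit rate (ISHR/FCHR); the function names are illustrative and not taken from the study.

```python
import numpy as np

def root_likelihood(probs_chosen):
    """RLH: geometric mean of the predicted probabilities of the alternatives
    the respondent actually chose (chance level for a choice set with c
    alternatives is 1/c)."""
    probs_chosen = np.asarray(probs_chosen)
    return float(np.exp(np.mean(np.log(probs_chosen))))

def first_choice_hit_rate(pred_probs, chosen):
    """FCHR: share of holdout tasks in which the model's most likely
    alternative equals the observed choice."""
    hits = np.argmax(pred_probs, axis=1) == np.asarray(chosen)
    return float(np.mean(hits))

# toy holdout data: 4 tasks, 3 alternatives each
pred = np.array([[0.6, 0.3, 0.1],
                 [0.2, 0.5, 0.3],
                 [0.4, 0.4, 0.2],
                 [0.1, 0.2, 0.7]])
chosen = [0, 1, 2, 2]
print(root_likelihood(pred[np.arange(4), chosen]))  # RLH of the chosen options
print(first_choice_hit_rate(pred, chosen))          # 0.75
```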

3.2 Choice Associated Combined and Assumption-Based Testing

The conjoint results for the cumulative and individual samples, namely the mean importances and the zero-centred part-worth utilities of the attribute levels, indicate the inclination to use particular personalized product recommendations across the different media channels. The male-female differentiation is driven by the advertising medium, which is the most important attribute for male candidates: its relative importance is 47% for men and 43% for women. From Table 4, male candidates prefer package inserts over the other advertising media for product recommendations, while female candidates prefer email advertising for personalized recommendations.

3.3 Product Recommendation Rejecting Reasons

On average, respondents answered the sixteen choice tasks combined with the 4 holdout tasks assessing the existing product recommendation systems, and in roughly one-third of the choice tasks respondents refused the use of personalized product recommendations, selecting none in 3 of the 16 choice tasks. The main reasons for rejecting the product recommendation systems differ, and the significant variation between the two samples is clearly shown in Fig. 1.


Table 4 Segment part-worth utilities and attribute importances (with average deviations) for the two genders

| Quality or level | Male candidates (x = 195) | Female candidates (x = 158) |
|---|---|---|
| Publicity channel | 46.66%, 17.5528 | 44.22%, 15.9334 |
| Package supplements | 41.92934, 85.27832 | 26.85, 90.3287 |
| Advertising based on email | 17.88834, 89.81167 | 36.98763, 64.734441 |
| Advertising based on banner | −68.004521221 | −69.2999876001 |
| Algorithm | 12.222%, 8.8999 | 18.66%, 14.044887 |
| Collaborative filtering | −21.00044, 29.4655 | 6.224, 56.81123 |
| Good selling item | 21.22982, 29.455430 | −4.89321, 57.228991 |
| Clarification | 6.59012%, 5.55021 | 6.8213%, 5.00012 |
| Style of item | −2.00031, 23.10321 | −10.274528, 19.9928 |
| Undefined | 2.254377, 23.332299 | 10.732854, 19.99934 |
| Recommendations | 19.665443%, 11.7756 | 19.4406%, 9.34288 |
| 4 | 52.9965, 55.88826 | −7.7756, 45.99878 |
| 8 | −23.8778, 30.6667 | −23.88865, 28.999654 |
| 12 | −17.9987, 44.4565021 | 29.99989, 40.99978765 |
| Supplier | 14.88765%, 7.564330 | 14.9876554%, 7.003421 |
| Amazon | 24.987654, 32.34667 | 26.365350, 30.6753635 |
| Option none | 173.9678674, 225.87656745 | 99.9944563, 124.87654 |
| Baur | −14.8754333, 28.54322 | −18.54330, 20.5433 |
| Vestes deis | −12.32211, 23.9876 | −7.985446, 31.997654 |

4 Conclusion

In this work, personalized advertising has been described through a choice-based conjoint design with visual presentation of the stimuli, aspects not clearly covered in previously published marketing work. The conjoint analysis effectively identifies customer preferences across advertising channels, and the choice-based design captures gender differences and the corresponding preferences for product recommendations in the various advertising channels. Promoters and local sellers can therefore make informed decisions about how to advertise their products. Further constraints have also been taken into account, such as the total number of personalized recommendations based on particular customer preferences.


Fig. 1 Major reasons for opposing the product recommendation system

References

1. Xie C, Teo P (2020) Institutional self-promotion: a comparative study of appraisal resources used by top- and second-tier universities in China and America. High Educ 80(2):353–371
2. Li D, Atkinson L (2020) Effect of emotional victim images in prosocial advertising: the moderating role of helping mode. Int J Nonprofit Voluntary Sector Market 25(4):e1676
3. Wongwatkit C, Panjaburee P, Srisawasdi N, Seprum P (2020) Moderating effects of gender differences on the relationships between perceived learning support, intention to use, and learning performance in a personalized e-learning. J Comput Educ 7(2):229–255
4. Kwayu S, Abubakre M, Lal B (2021) The influence of informal social media practices on knowledge sharing and work processes within organizations. Int J Inf Manage 58:102280
5. Huey RB, Carroll C, Salisbury R, Wang JL (2020) Mountaineers on Mount Everest: effects of age, sex, experience, and crowding on rates of success and death. PLoS ONE 15(8):e0236919
6. Selvaraj V, Karthika TS, Mansiya C, Alagar M (2021) An over review on recently developed techniques, mechanisms and intermediate involved in the advanced azo dye degradation for industrial applications. J Mol Struct 1224:129195
7. Schreiner T, Rese A, Baier D (2019) Multichannel personalization: identifying consumer preferences for product recommendations in advertisements across different media channels. J Retail Consum Serv 48:87–99
8. Hong T, Choi JA, Lim K, Kim P (2020) Enhancing personalized ads using interest category classification of SNS users based on deep neural networks. Sensors 21(1):199
9. Wang Y, Ma HS, Yang JH, Wang KS (2017) Industry 4.0: a way from mass customization to mass personalization production. Adv Manuf 5(4):311–320
10. Guitart IA, Hervet G, Gelper S (2020) Competitive advertising strategies for programmatic television. J Acad Mark Sci 48(4):753–775
11. Sen S, Antara N, Sen S (2021) Factors influencing consumers' to take ready-made frozen food. Curr Psychol 40(6):2634–2643
12. Matuschek E, Åhman J, Webster C, Kahlmeter G (2018) Antimicrobial susceptibility testing of colistin–evaluation of seven commercial MIC products against standard broth microdilution for Escherichia coli, Klebsiella pneumoniae, Pseudomonas aeruginosa, and Acinetobacter spp. Clin Microbiol Infect 24(8):865–870
13. Haruna K, Akmar Ismail M, Suhendroyono S, Damiasih D, Pierewan AC, Chiroma H, Herawan T (2017) Context-aware recommender system: a review of recent developmental process and future research direction. Appl Sci 7(12):1211
14. Carlo AD, Hosseini Ghomi R, Renn BN, Areán PA (2019) By the numbers: ratings and utilization of behavioral health mobile applications. NPJ Digital Med 2(1):1–8
15. Gottschall T, Skokov KP, Fries M, Taubel A, Radulov I, Scheibel F, Gutfleisch O (2019) Making a cool choice: the materials library of magnetic refrigeration. Adv Energy Mater 9(34):1901322
16. Illgen S, Höck M (2019) Literature review of the vehicle relocation problem in one-way car sharing networks. Transp Res Part B Methodol 120:193–204
17. Sample KL, Hagtvedt H, Brasel SA (2020) Components of visual perception in marketing contexts: a conceptual framework and review. J Acad Mark Sci 48(3):405–421
18. He R, Kang WC, McAuley J (2017) Translation-based recommendation. In: Proceedings of the eleventh ACM conference on recommender systems, pp 161–169
19. Micu A, Capatina A, Cristea DS, Munteanu D, Micu AE, Sarpe DA (2022) Assessing an onsite customer profiling and hyper-personalization system prototype based on a deep learning approach. Technol Forecast Soc Chang 174:121289
20. Kaushik K, Mishra R, Rana NP, Dwivedi YK (2018) Exploring reviews and review sequences on e-commerce platform: a study of helpful reviews on Amazon. J Retail Consumer Serv 45:21–32
21. Wu Z, Li C, Cao J, Ge Y (2020) On scalability of association-rule-based recommendation: a unified distributed-computing framework. ACM Trans Web (TWEB) 14(3):1–21
22. Tan Z, He L (2017) An efficient similarity measure for user-based collaborative filtering recommender systems inspired by the physical resonance principle. IEEE Access 5:27211–27228
23. Yoneda T, Kozawa S, Osone K, Koide Y, Abe Y, Seki Y (2019) Algorithms and system architecture for immediate personalized news recommendations. In: IEEE/WIC/ACM international conference on web intelligence, Oct 2019, pp 124–131
24. Kamilaris A, Kartakoullis A, Prenafeta-Boldú FX (2017) A review on the practice of big data analysis in agriculture. Comput Electron Agric 143:23–37
25. Tarus JK, Niu Z, Mustafa G (2018) Knowledge-based recommendation: a review of ontology-based recommender systems for e-learning. Artif Intell Rev 50(1):21–48
26. Kirschner PA, De Bruyckere P (2017) The myths of the digital native and the multitasker. Teach Teach Educ 67:135–142
27. Bourabain D, Verhaeghe PP (2019) Could you help me, please? Intersectional field experiments on everyday discrimination in clothing stores. J Ethn Migr Stud 45(11):2026–2044
28. Schwab-McCoy A, Baker CM, Gasper RE (2021) Data science in 2020: computing, curricula, and challenges for the next 10 years. J Stat Data Sci Educ 29(sup1):S40–S50
29. Oswalt SB, Lederer AM, Chestnut-Steich K, Day C, Halbritter A, Ortiz D (2020) Trends in college students' mental health diagnoses and utilization of services, 2009–2015. J Am Coll Health 68(1):41–51
30. Kao K, Benstead LJ (2021) Female electability in the Arab world: the advantages of intersectionality. Comp Polit 53(3):427–464

Chapter 7

A Post-disaster Relocation Model for Infectious Population Considering Minimizing Cost and Time Under a Pentagonal Fuzzy Environment

Mayank Singh Bhakuni, Pooja Bhakuni, and Amrit Das

Department of Mathematics, School of Advanced Sciences, Vellore Institute of Technology, Vellore, India

1 Introduction

Disasters are unanticipated catastrophic events that cause significant harm to the population. In recent years there has been a considerable increase in the frequency of natural and man-made disasters. These catastrophes cause human casualties and damage to public and private infrastructure. In the past, natural disasters such as the Kedarnath flood in 2013, the Nepal earthquake in 2015, the Assam flood in 2017, and the Chennai flood in 2021 have devastated the lives of millions of people. The damage to infrastructure delays the delivery of basic amenities to the affected population. Considering their miserable situation and their need for humanitarian goods, the population must be quickly transported to relief centres. The affected population comprises both infectious and non-infectious people. The transportation of the infectious population poses the greater challenge because of deteriorating health conditions and the possibility of spreading the disease to the non-infectious population. Kouadio et al. [1] describe the prevention measures needed to tackle infectious disease among the population during a disaster. Loebach et al. [2] discuss the displacement of infectious populations as a consequence of natural disasters. To counter the challenges of the relocation process in the post-disaster phase, a multi-objective solid transportation (MOST) model is developed for the transportation of the infectious population from affected areas to relief centres. The model is based on the core notion of the solid transportation problem (STP) [3], an extension of the transportation problem [4] developed by F. L. Hitchcock in 1941. The STP has been studied in various environments with single and multiple objectives; Das et al. [5], Zhang et al. [6], and Kundu et al. [7] consider multiple objectives in their STPs.


The post-disaster phase presents various challenges for the relocation process, including a limited budget [8], unavailability of conveyances, and limited bedding in relief centres. Cao et al. [9] propose an optimization model for relief distribution in the post-disaster phase, and Zhang et al. [10] develop an optimization model for a humanitarian relief network. The constraints of the developed model are designed to handle these challenges during the relocation process. The model consists of two crucial objective functions: total travel time, i.e., the transportation time of the affected population from the sources to the relief centres using different types of conveyance, and service time, i.e., the travel time of the affected population plus the loading and unloading time and the time taken to provide accommodation in the relief centres. The objective functions are designed so that minimum resources are used and the infectious population is allocated in time. The consequences of a disaster are unpredictable. The mathematical formulation of the model considers source points, relief centres, and conveyances. The exact cost of transportation, the time taken to reach the relief centres, the loading and unloading time, the accommodation time, etc., are difficult to predict and vary depending on the condition of the source, the destination, and the conveyance, and on the intensity of the damage caused to roads. To counter this impreciseness in the real-life scenario, we consider the inputs as pentagonal fuzzy numbers (PFN). Fuzzy set theory was introduced by Zadeh [11]; later, in 1999, K. Atanassov proposed the intuitionistic fuzzy set [12]. The reason for choosing PFNs as inputs is that a PFN uses five components to represent a fuzzy number, allowing it to capture vagueness to a great extent. The developed model is implemented in a case study of the Chennai flood, in which the COVID-affected population of Chennai is transported from different sources to relief centres. Since the model involves multiple conflicting objectives, compromise solution techniques are used to obtain the result: the global criterion method (GCM) and fuzzy goal programming (FGP). The model is solved in the LINGO optimization software, and a thorough analysis of the obtained result is carried out. To the best of our knowledge, our suggested model combines the following novel contributions:

– A mathematical model for the transportation of an infectious population.
– Consideration of accommodation time and loading and unloading time.
– Inputs of the MOST model in the form of PFNs.

2 Basic Concepts and Defuzzification Method for PFN

In this section, we define the PFN and the various criteria satisfied by its membership function, and we discuss the fuzzy equivalence relation. The advantage of utilizing a PFN over a standard fuzzy set is that it expresses a fuzzy number with five components, which helps capture uncertainty and ambiguity more effectively.


2.1 Basic Concepts

In this section we discuss the basic concepts aligned to the PFN and some properties of its membership function.

Definition 1. Linear pentagonal fuzzy number [13]: A PFN is denoted as $\tilde{F} = (f_1, f_2, f_3, f_4, f_5; p)$ and its membership function is represented as follows:

$$
\zeta_{\tilde{F}}(x) =
\begin{cases}
p\,\dfrac{x - f_1}{f_2 - f_1}, & f_1 \le x \le f_2\\[4pt]
1 - (1 - p)\,\dfrac{f_3 - x}{f_3 - f_2}, & f_2 \le x \le f_3\\[4pt]
1, & x = f_3\\[4pt]
1 - (1 - p)\,\dfrac{x - f_3}{f_4 - f_3}, & f_3 \le x \le f_4\\[4pt]
p\,\dfrac{f_5 - x}{f_5 - f_4}, & f_4 \le x \le f_5\\[4pt]
0, & \text{otherwise}
\end{cases}
$$

Definition 2. Properties of PFN [13]: A PFN $\tilde{F} = (f_1, f_2, f_3, f_4, f_5; p)$ and its membership function $\zeta_{\tilde{F}}(x)$ must satisfy the following conditions: (i) $\zeta_{\tilde{F}}(x)$ is a continuous function with a range of values in $[0, 1]$; (ii) $\zeta_{\tilde{F}}(x)$ is strictly increasing on $[f_1, f_2]$ and $[f_2, f_3]$; (iii) $\zeta_{\tilde{F}}(x)$ is strictly decreasing on $[f_3, f_4]$ and $[f_4, f_5]$.
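As a sanity check of the definition above, the following is a minimal Python sketch of the PFN membership function. The function name is illustrative; the example values reuse a PFN that appears in Table 1 of this chapter.

```python
def pfn_membership(x, f1, f2, f3, f4, f5, p):
    """Membership grade of x under a linear pentagonal fuzzy number
    (f1, f2, f3, f4, f5; p): rises from 0 at f1 through p at f2 to 1 at f3,
    then falls back through p at f4 to 0 at f5."""
    if x < f1 or x > f5:
        return 0.0
    if x <= f2:
        return p * (x - f1) / (f2 - f1)
    if x <= f3:
        return 1.0 - (1.0 - p) * (f3 - x) / (f3 - f2)
    if x <= f4:
        return 1.0 - (1.0 - p) * (x - f3) / (f4 - f3)
    return p * (f5 - x) / (f5 - f4)

# e.g. the PFN (435, 491, 556, 641, 715; 0.7) from Table 1
assert pfn_membership(556, 435, 491, 556, 641, 715, 0.7) == 1.0
assert pfn_membership(491, 435, 491, 556, 641, 715, 0.7) == 0.7
```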

2.2 Removal Area Method to Convert PFN to Crisp Number

Let us consider a PFN $\tilde{F} = (f_1, f_2, f_3, f_4, f_5; p)$. The defuzzification methodology converting a PFN to its equivalent crisp number using the removal area method was proposed by Chakraborty et al. [14] in 2019, where the authors consider five areas obtained from the different regions of the PFN. The average value is then calculated by summing the five areas and dividing by 5, which yields the following defuzzification formula:

$$
D(\tilde{F}) = \frac{\dfrac{(f_1 + f_2)}{2}\,p + \dfrac{(f_2 + f_3)}{2}\,(1 - p) + f_3 + f_4 - \dfrac{(f_4 - f_3)}{2}\,(1 - p) + \dfrac{(f_4 + f_5)}{2}\,p}{5} \tag{1}
$$

Simplifying Eq. (1) with algebraic operations gives

$$
D(\tilde{F}) = \frac{f_2 + 4 f_3 + f_4 + p\,(f_1 - 2 f_3 + 2 f_4 + f_5)}{10} \tag{2}
$$
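Equation (2) is a one-line computation. Here is a minimal Python sketch of it, with the arithmetic worked out for one PFN from Table 1; the function name is illustrative.

```python
def defuzzify_pfn(f1, f2, f3, f4, f5, p):
    """Removal-area defuzzification of a PFN, Eq. (2):
    D = (f2 + 4*f3 + f4 + p*(f1 - 2*f3 + 2*f4 + f5)) / 10."""
    return (f2 + 4 * f3 + f4 + p * (f1 - 2 * f3 + 2 * f4 + f5)) / 10

# e.g. the PFN (8, 13, 19, 25, 40; 0.6) from Table 1 defuzzifies to
# (13 + 76 + 25 + 0.6 * (8 - 38 + 50 + 40)) / 10 = 15.0
print(defuzzify_pfn(8, 13, 19, 25, 40, 0.6))  # 15.0
```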


3 Problem Statement and Model Formation

This section comprises the mathematical modeling for the transportation of infectious populations during the post-disaster phase. Transportation must be done as early as possible to avoid further deterioration of the health of the infectious population. At the same time, transportation must be cost-efficient, so that every individual is transported using minimum resources. Therefore, a mathematical model is needed that considers cost and time as objective functions. Since the availability of conveyances plays a crucial role during transportation, various types of conveyance are considered, restricted by the service time.

3.1 Modelling

In this section, we introduce the MOST model along with its assumptions, followed by a brief interpretation of the model.

Assumptions for the model:
(i) A limited budget is allotted for the relief operation.
(ii) Each relief centre has a limited capacity for the infectious population.
(iii) The working time of each conveyance is limited.
(iv) Each conveyance is confined to carrying a restricted number of persons.

Indices:
(i) $I$: set of sources, indexed by $i$
(ii) $J$: set of relief centres, indexed by $j$
(iii) $K$: set of conveyances, indexed by $k$

Parameters:
$\tilde{C}_{ijk}$: fuzzy transportation cost per person from source $i$ to relief centre $j$ using conveyance $k$
$\tilde{F}_j$: fuzzy facility cost per person at relief centre $j$
$\tilde{TT}_{ijk}$: fuzzy transportation time per person from source $i$ to relief centre $j$ using conveyance $k$
$\tilde{TLU}_{ijk}$: fuzzy loading and unloading time of conveyance $k$ while travelling from source $i$ to relief centre $j$
$\tilde{ACT}_j$: fuzzy time taken to accommodate at relief centre $j$
$\tilde{TP}$: fuzzy total population that needs to be transported
$\tilde{P}_i$: fuzzy population present at source $i$ that needs to be transported to the relief centres
$\tilde{CC}_k$: fuzzy capacity of conveyance $k$
$\tilde{CR}_j$: fuzzy capacity of relief centre $j$
$\tilde{B}$: fuzzy budget allocated for the relief work
$\tilde{T}_k$: fuzzy time limit for conveyance $k$

Decision variables:
$x_{ijk}$: unknown number of persons to be shifted from source $i$ to relief centre $j$ using conveyance $k$
$y_{ijk} = \begin{cases} 1 & \text{if } x_{ijk} > 0\\ 0 & \text{otherwise} \end{cases}$

Mathematical Model

$$\text{Min } TC = \sum_{k=1}^{K}\sum_{i=1}^{I}\sum_{j=1}^{J} \left(\tilde{C}_{ijk} + \tilde{F}_j\right) x_{ijk} \tag{3}$$

$$\text{Min } ST = \sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K} \left(\tilde{TT}_{ijk} + \tilde{TLU}_{ijk} + \tilde{ACT}_j\right) y_{ijk} \tag{4}$$

subject to

$$\sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K} x_{ijk} = \tilde{TP} \tag{5}$$

$$\sum_{j=1}^{J}\sum_{k=1}^{K} x_{ijk} = \tilde{P}_i, \quad i = 1, 2, \dots, I \tag{6}$$

$$\sum_{i=1}^{I}\sum_{j=1}^{J} x_{ijk} \le \tilde{CC}_k, \quad k = 1, 2, \dots, K \tag{7}$$

$$\sum_{i=1}^{I}\sum_{k=1}^{K} x_{ijk} \le \tilde{CR}_j, \quad j = 1, 2, \dots, J \tag{8}$$

$$\sum_{k=1}^{K}\sum_{i=1}^{I}\sum_{j=1}^{J} \left(\tilde{C}_{ijk} + \tilde{F}_j\right) x_{ijk} \le \tilde{B} \tag{9}$$

$$\sum_{i=1}^{I}\sum_{j=1}^{J} \left(\tilde{TT}_{ijk} + \tilde{TLU}_{ijk}\right) y_{ijk} \le \tilde{T}_k, \quad k = 1, 2, \dots, K \tag{10}$$

$$x_{ijk} \ge 0, \quad i = 1, 2, \dots, I,\; j = 1, 2, \dots, J,\; k = 1, 2, \dots, K \tag{11}$$

Model Interpretation: The objective function (3) minimizes the total cost of relocating the disaster-affected population and has two key terms: the first delineates the transportation cost from source $i$ to relief centre $j$ using conveyance $k$, and the second represents the facility cost of relief centre $j$. The objective function (4) minimizes the total service time taken to settle the affected population and contains three essential terms: the transportation time between source $i$ and relief centre $j$ using conveyance $k$, the loading and unloading time of the overall population, and the accommodation time of each relief centre $j$. Constraint (5) indicates the overall population that needs to be transported to the relief centres. Constraint (6) depicts the number of people present at source $i$. Constraint (7) delineates the capacity of conveyance $k$. Constraint (8) represents the capacity of relief centre $j$. Constraint (9) depicts the available budget. Constraint (10) illustrates the limited working time of each conveyance $k$. Constraint (11) imposes the non-negativity restrictions.

4 Solving Multi-objective Optimization Problems

The model proposed in Sect. 3 is complex because of the presence of different fuzzy parameters and multiple objectives. To solve it, we first need a defuzzification technique and then a compromise programming approach to obtain an optimal solution.

4.1 Defuzzification of MOST Model

The MOST model described in Sect. 3 exists in a fuzzy environment; in order to solve it, it must be converted to the crisp environment. Using the defuzzification technique discussed in Sect. 2, we convert the fuzzy parameters $\tilde{C}_{ijk}$, $\tilde{F}_j$, $\tilde{TT}_{ijk}$, $\tilde{TLU}_{ijk}$, $\tilde{ACT}_j$, $\tilde{TP}$, $\tilde{P}_i$, $\tilde{CC}_k$, $\tilde{CR}_j$, $\tilde{B}$, $\tilde{T}_k$ to their crisp counterparts $C_{ijk}$, $F_j$, $TT_{ijk}$, $TLU_{ijk}$, $ACT_j$, $TP$, $P_i$, $CC_k$, $CR_j$, $B$, $T_k$, respectively. The obtained crisp model is as follows:

$$\text{Min } TC = \sum_{k=1}^{K}\sum_{i=1}^{I}\sum_{j=1}^{J} \left(C_{ijk} + F_j\right) x_{ijk} \tag{12}$$

$$\text{Min } ST = \sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K} \left(TT_{ijk} + TLU_{ijk} + ACT_j\right) y_{ijk} \tag{13}$$

subject to

$$\sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K} x_{ijk} = TP \tag{14}$$

$$\sum_{j=1}^{J}\sum_{k=1}^{K} x_{ijk} = P_i, \quad i = 1, 2, \dots, I \tag{15}$$

$$\sum_{i=1}^{I}\sum_{j=1}^{J} x_{ijk} \le CC_k, \quad k = 1, 2, \dots, K \tag{16}$$

$$\sum_{i=1}^{I}\sum_{k=1}^{K} x_{ijk} \le CR_j, \quad j = 1, 2, \dots, J \tag{17}$$

$$\sum_{k=1}^{K}\sum_{i=1}^{I}\sum_{j=1}^{J} \left(C_{ijk} + F_j\right) x_{ijk} \le B \tag{18}$$

$$\sum_{i=1}^{I}\sum_{j=1}^{J} \left(TT_{ijk} + TLU_{ijk}\right) y_{ijk} \le T_k, \quad k = 1, 2, \dots, K \tag{19}$$

$$\text{constraint (11)} \tag{20}$$
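The chapter solves this crisp model in LINGO (Sect. 4.3). Purely as an open-source illustration, the sketch below restates the cost objective (12) with constraints (14)–(18) in PuLP, a Python modelling library. The numbers are placeholders chosen only so the sketch runs, not the defuzzified Table 1/2 values; the service-time objective (13) and the time constraint (19) would be added analogously with the binary $y_{ijk}$ variables.

```python
import pulp

I, J, K = 3, 2, 3                        # sources, relief centres, conveyances
C  = {(i, j, k): 100 for i in range(I) for j in range(J) for k in range(K)}  # transport cost
F  = [60, 50]                            # facility cost per centre (placeholder)
P  = [230, 345, 460]                     # population at each source (placeholder)
CC = [330, 450, 490]                     # conveyance capacities
CR = [560, 690]                          # relief-centre capacities
B  = 197000                              # budget

prob = pulp.LpProblem("MOST_cost", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", (range(I), range(J), range(K)), lowBound=0, cat="Integer")

# objective (12): transportation plus facility cost
prob += pulp.lpSum((C[i, j, k] + F[j]) * x[i][j][k]
                   for i in range(I) for j in range(J) for k in range(K))
# (15): everyone leaves, source by source (this also enforces the total (14))
for i in range(I):
    prob += pulp.lpSum(x[i][j][k] for j in range(J) for k in range(K)) == P[i]
# (16): conveyance capacity
for k in range(K):
    prob += pulp.lpSum(x[i][j][k] for i in range(I) for j in range(J)) <= CC[k]
# (17): relief-centre capacity
for j in range(J):
    prob += pulp.lpSum(x[i][j][k] for i in range(I) for k in range(K)) <= CR[j]
# (18): budget
prob += pulp.lpSum((C[i, j, k] + F[j]) * x[i][j][k]
                   for i in range(I) for j in range(J) for k in range(K)) <= B

prob.solve()
print(pulp.LpStatus[prob.status], pulp.value(prob.objective))
```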

4.2 Compromise Programming Approach to Solve MOST Model

The above model consists of two conflicting objective functions: the first minimizes the total cost and the second minimizes the overall service time. The conflict between the objective functions means that no single solution dominates the overall problem, so techniques for solving the MOST model are employed to obtain a compromise solution. We use two solution methods, FGP and GCM.

Global Criterion Method: The GCM is used to obtain a compromise solution for the MOST model. Not requiring a Pareto ranking mechanism [15] gives this technique a significant advantage over other multi-objective optimization methods in terms of simplicity and efficiency. It minimizes a metric function representing the sum of the deviations of the objective functions from their respective ideal solutions. The procedure for utilizing GCM to solve the MOST model is as follows:

Step 1: Each objective function $(\varphi_1, \varphi_2, \dots, \varphi_M)$ of the MOST model is solved independently.
Step 2: The values obtained in Step 1 for each objective function, called the ideal objective vector, are $(\varphi_1^{\min}, \varphi_2^{\min}, \dots, \varphi_M^{\min})$.
Step 3: Using GCM, the MOST model is simplified to the single objective

$$\text{Min}\;\left[\sum_{m=1}^{M}\left(\frac{\varphi_m - \varphi_m^{\min}}{\varphi_m^{\min}}\right)^{\eta}\right]^{1/\eta}$$

subject to constraints (14)–(20) and $1 \le \eta \le \infty$. The value of the integer-valued exponent $\eta = 1$ means that equal significance is given to each objective function [16], while $\eta > 1$ indicates that higher importance is given to the objective with the maximum deviation. For $\eta = 1$ the objective function is linear, while for $\eta > 1$ it is non-linear [17].

Fuzzy Goal Programming: In 1961, Charnes and Cooper [18] developed goal programming. The basic idea underlying goal programming is to minimize the distance between the objective functions $\varphi_1, \varphi_2, \dots, \varphi_M$ and their aspiration levels $\bar{\varphi}_1, \bar{\varphi}_2, \dots, \bar{\varphi}_M$, respectively. Further, Mohamed [19] proposed minimizing this distance by defining positive ($\delta_m^{+}$) and negative ($\delta_m^{-}$) deviation variables as shown below:

$$\delta_m^{+} = \max(0, \varphi_m - \bar{\varphi}_m) = \tfrac{1}{2}\left\{(\varphi_m - \bar{\varphi}_m) + |\varphi_m - \bar{\varphi}_m|\right\}, \quad m = 1, 2, \dots, M,$$

$$\delta_m^{-} = \max(0, \bar{\varphi}_m - \varphi_m) = \tfrac{1}{2}\left\{(\bar{\varphi}_m - \varphi_m) + |\bar{\varphi}_m - \varphi_m|\right\}, \quad m = 1, 2, \dots, M.$$

In 1972, Zimmermann [20] proposed fuzzy linear programming, where the objective function and the respective constraints are defined by fuzzy parameters. In 1997, Mohamed [19] drew attention to the resemblance between goal programming and fuzzy linear programming, and to how one may lead to the other. Zangiabadi and Maleki [21] introduced an FGP technique that uses a special nonlinear (hyperbolic) membership function for each fuzzy objective and constraint. The steps to solve the MOST model described in Sect. 3 using FGP are as follows:

Step 1: Solve each objective function of the MOST model independently, i.e., take just one objective function at a time and ignore the rest. Assume that $q_1, q_2, \dots, q_L$ are the values of the unknown variables acquired after solving each objective function.
Step 2: Using the unknown variables acquired in Step 1, obtain $\varphi_m(q_l)$ for $m = 1, 2, \dots, M$ and $l = 1, 2, \dots, L$.
Step 3: Calculate the best ($b_m$) and the worst ($w_m$) value of each objective function:

$$b_m = \min_{l \in L} \varphi_m(q_l) \quad \text{and} \quad w_m = \max_{l \in L} \varphi_m(q_l), \qquad l = 1, 2, \dots, L.$$

Step 4: The model described in Sect. 3 is then represented as follows:

$$
\begin{aligned}
\min\; & \xi\\
\text{subject to }\; & \frac{1}{2}\,\frac{e^{\left(\frac{b_m + w_m}{2} - \varphi_m\right)\nu_p} - e^{-\left(\frac{b_m + w_m}{2} - \varphi_m\right)\nu_p}}{e^{\left(\frac{b_m + w_m}{2} - \varphi_m\right)\nu_p} + e^{-\left(\frac{b_m + w_m}{2} - \varphi_m\right)\nu_p}} + \frac{1}{2} - \delta_m^{+} + \delta_m^{-} = 1, \quad p = 1, 2, \dots, P,\\
& \xi \ge \delta_m^{-}, \qquad \delta_m^{+}\,\delta_m^{-} = 0,\\
& \text{constraints (14)–(20)},\\
& 0 \le \xi \le 1, \qquad \nu_p = \frac{6}{w_m - b_m}
\end{aligned}
$$
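The first constraint of the Step 4 model is simply the hyperbolic membership function $\tfrac{1}{2}\tanh\!\big((\tfrac{b_m+w_m}{2}-\varphi_m)\,\nu_p\big)+\tfrac{1}{2}$ tied to the deviation variables. The following is a minimal Python sketch of that membership function; the best/worst values reused below are the TC figures reported in Sect. 5.2, taken purely as an illustration.

```python
import math

def hyperbolic_membership(phi, b, w):
    """Zangiabadi-Maleki hyperbolic membership of an objective value phi with
    best value b and worst value w (minimisation objective):
    mu = 1/2 * tanh(((b + w)/2 - phi) * nu) + 1/2, with nu = 6/(w - b)."""
    nu = 6.0 / (w - b)
    return 0.5 * math.tanh(((b + w) / 2.0 - phi) * nu) + 0.5

# membership is ~1 near the best value, 0.5 at the midpoint, ~0 near the worst
b, w = 144338, 158835            # e.g. best/worst total cost from Sect. 5.2
for phi in (b, (b + w) / 2, w):
    print(round(hyperbolic_membership(phi, b, w), 3))   # 0.998, 0.5, 0.002
```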

4.3 LINGO Optimization Software

The MOST model proposed in Sect. 3 is solved using the LINGO software, which comes with a collection of built-in solvers for various problem classes. The modeling environment is tightly integrated with the LINGO solver; because of this interconnection, the problem is transmitted directly in memory, minimizing compatibility issues between the solver and the modeling components. LINGO uses multiple CPU cores for model simulation, giving faster results.

5 Numerical Experiments and Discussions

Floods cause loss of human life, damage to infrastructure, non-operation of infrastructural facilities, and worsening health conditions due to waterborne infections. For those already suffering from infectious diseases, the situation can wreak much more significant harm, so timely relocation to relief centres is required. To counter this problem, we proposed the MOST model in Sect. 3; its practical implementation is demonstrated in the following subsections.

5.1 Input Data for the Real Life Model

We consider a case study based on Chennai, the capital of Tamil Nadu, India. In 2021, the city received more than 1000 mm of rainfall in four weeks, which resulted in floods in various regions. The peak of COVID made the situation more chaotic and further delayed various rescue operations. Considering this problem, the designed model targets the relocation of the infectious population during the post-disaster phase. We consider three source points located at Velachery, West Mambalam, and Pullianthope. From these source points, the affected population is transported to two relief centres situated at Thiruvallur and Kanchipuram. Depending upon the transportation time, capacity, and ease of travelling, we consider three types of conveyance. The PFN inputs for the facility cost ($\tilde{F}_j$), the time taken to accommodate at relief centre $j$ ($\tilde{ACT}_j$), the capacity of the relief centres ($\tilde{CR}_j$), the population present at each source ($\tilde{P}_i$), the capacity of the conveyances ($\tilde{CC}_k$), the limited time of each conveyance ($\tilde{T}_k$), the budget for the relief operation ($\tilde{B}$), and the total population that needs to be transported ($\tilde{TP}$) are shown in Table 1, while Table 2 gives the cost of transportation ($\tilde{C}_{ijk}$), the transportation time ($\tilde{TT}_{ijk}$), and the loading and unloading time ($\tilde{TLU}_{ijk}$).

Table 1 PFN inputs for the MOST model

| j | $\tilde{F}_j$ | $\tilde{ACT}_j$ | $\tilde{CR}_j$ |
|---|---|---|---|
| 1 | (78,95,125,142,154;0.5) | (8,13,19,25,40;0.6) | (435,491,556,641,715;0.7) |
| 2 | (75,84,105,119,127;0.9) | (11,14,17,31,51;0.3) | (564,625,687,827,856;0.8) |

| i | $\tilde{P}_i$ | k | $\tilde{CC}_k$ | $\tilde{T}_k$ |
|---|---|---|---|---|
| 1 | (163,189,225,275,307;0.8) | 1 | (197,275,343,383,423;0.4) | (1359,1486,1635,1856,1959;0.8) |
| 2 | (254,295,345,384,453;0.6) | 2 | (345,416,445,494,557;0.9) | (1238,1416,1698,1783,1902;0.9) |
| 3 | (345,395,465,515,555;0.7) | 3 | (368,434,475,548,671;0.8) | (1292,1391,1535,1686,1761;0.6) |

$\tilde{B}$ = (151901,184389,197653,213678,230675;0.8); $\tilde{TP}$ = (768,925,1078,1194,1315;0.6)

5.2 Result Analysis

The model is solved in the LINGO optimization solver using the FGP and GCM approaches. Using GCM with η = 1, a total of six allocations of the unknown variables are made, i.e., x113 = 182, x213 = 10, x211 = 231, x212 = 5, x222 = 7, x322 = 347, with total cost TC = 144338 and service time ST = 994. For η = 2 there are five allocations, i.e., x321 = 171, x122 = 182, x222 = 1, x312 = 176, x213 = 252, with total cost TC = 158835 and service time ST = 802. Solving with FGP, five allocations are made, i.e., x211 = 40, x222 = 12, x322 = 347, x113 = 182, x213 = 201, with TC = 148514 and service time ST = 817. Further, the numbers of people transported from the sources to the relief centres using GCM and FGP are shown in Figs. 1 and 2, respectively. After analyzing the results obtained using FGP and GCM, we infer the following points:


Table 2 PFN inputs for transportation cost and time

$\tilde{C}_{ijk}$, $j = 1$:

| i | k = 1 | k = 2 | k = 3 |
|---|---|---|---|
| 1 | (136,148,159,184,198;0.5) | (130,132,155,176,193;0.8) | (125,145,180,202,236;0.6) |
| 2 | (58,86,125,138,156;0.9) | (95,136,156,185,197;0.3) | (96,124,153,176,198;0.7) |
| 3 | (86,105,135,177,210;0.6) | (78,112,146,188,218;0.7) | (91,132,168,204,217;0.9) |

$\tilde{C}_{ijk}$, $j = 2$:

| i | k = 1 | k = 2 | k = 3 |
|---|---|---|---|
| 1 | (136,164,181,243,310;0.7) | (168,186,235,286,325;0.4) | (185,248,259,264,269;0.5) |
| 2 | (132,175,205,225,253;0.4) | (86,115,193,221,248;0.8) | (145,178,215,245,289;0.5) |
| 3 | (56,76,104,142,158;0.4) | (74,95,102,127,151;0.8) | (76,112,154,183,196;0.3) |

$\tilde{TT}_{ijk}$, $j = 1$:

| i | k = 1 | k = 2 | k = 3 |
|---|---|---|---|
| 1 | (105,134,156,192,198;0.4) | (130,132,155,176,193;0.8) | (125,145,180,202,236;0.6) |
| 2 | (123,156,178,204,235;0.8) | (114,178,198,245,317;0.2) | (95,132,163,176,229;0.4) |
| 3 | (158,186,221,284,331;0.4) | (128,156,191,237,270;0.7) | (125,149,167,205,219;0.9) |

$\tilde{TT}_{ijk}$, $j = 2$:

| i | k = 1 | k = 2 | k = 3 |
|---|---|---|---|
| 1 | (105,124,168,189,203;0.9) | (94,136,157,210,220;0.3) | (86,114,138,169,182;0.5) |
| 2 | (165,178,214,267,299;0.7) | (143,168,195,246,265;0.6) | (125,158,198,231,247;0.5) |
| 3 | (143,176,189,234,247;0.8) | (114,167,198,245,284;0.5) | (131,156,174,204,239;0.8) |

$\tilde{TLU}_{ijk}$, $j = 1$:

| i | k = 1 | k = 2 | k = 3 |
|---|---|---|---|
| 1 | (18,27,34,43,52;0.5) | (14,21,24,34,36;0.7) | (8,14,21,25,38;0.5) |
| 2 | (23,31,45,61,75;0.6) | (18,28,41,53,68;0.5) | (21,24,35,50,69;0.3) |
| 3 | (18,26,35,44,64;0.6) | (18,21,24,39,57;0.8) | (16,21,28,37,41;0.4) |

$\tilde{TLU}_{ijk}$, $j = 2$:

| i | k = 1 | k = 2 | k = 3 |
|---|---|---|---|
| 1 | (12,24,31,38,42;0.2) | (9,16,19,22,25;0.9) | (7,9,15,17,24;0.4) |
| 2 | (15,28,34,48,77;0.9) | (18,28,31,46,57;0.4) | (9,17,28,35,57;0.2) |
| 3 | (14,28,32,36,54;0.5) | (10,18,25,33,44;0.7) | (8,17,20,29,34;0.9) |

Fig. 1 Solution of MOST model using GCM for η = 1 and η = 2

– The minimum value of the objective function TC is obtained using GCM with η = 1, and the maximum value with η = 2.
– For the objective function ST, the minimum value is obtained using GCM with η = 2, and the maximum value with η = 1.
– FGP provides an intermediate result for both objective functions TC and ST compared with the GCM solutions for η = 1 and η = 2.

Decision-makers can choose any solution technique depending upon their priority. If they want minimum expenditure in the relief procedure (and have ample time), they can opt for GCM with η = 1; if they need to complete the relief procedure as fast as possible (without being restricted by the budget), GCM with η = 2 can be used; and if they want intermediate values for the cost and time objectives, they can opt for the FGP technique.


Fig. 2 Solution of MOST model obtained using FGP

6 Conclusion and Future Prospects

A MOST model comprising two objective functions is developed for the transportation of infectious populations during the post-disaster phase. The model is developed considering time, budget, and capacity constraints. The aftermath of a disaster is unpredictable; therefore, in order to capture the uncertainty and vagueness, we have considered PFNs as the input parameters. The model is successfully implemented in the case study conducted on the Chennai flood. The inputs for the model are considered in the PF environment and are converted to equivalent crisp values using a defuzzification method based on the removal area method. The obtained equivalent crisp model is solved in the LINGO optimization solver using compromise solution techniques (GCM and FGP). Further, a detailed analysis of the optimal solution is carried out, and suggestions are made for the various choices of the decision-maker. In the future, the model can be extended to other uncertain environments like type-2 fuzzy, stochastic, etc. The model can further be implemented in various case studies considering additional objectives and constraints.



7. Kundu P, Kar S, Maiti M (2014) Multi-objective solid transportation problems with budget constraint in uncertain environment. Int J Sys Sci 45(8):1668–1682 8. Vahdani B, Veysmoradi D, Noori F, Mansour F (2018) Two-stage multi-objective locationrouting-inventory model for humanitarian logistics network design under uncertainty. Int J Disaster Risk Reduction 27:290–306 9. Cao C, Liu Y, Tang O, Gao X (2021) A fuzzy bi-level optimization model for multi-period post-disaster relief distribution in sustainable humanitarian supply chains. Int J Prod Econ 235:108081 10. Zhang P, Liu Y, Yang G, Zhang G (2020) A distributionally robust optimization model for designing humanitarian relief network with resource reallocation. Soft Comput 24(4):2749– 2767 11. Zadeh LA, Klir GJ, Yuan B (1996) Fuzzy sets, fuzzy logic, and fuzzy systems: selected papers, vol 6. World Scientific 12. Atanassov KT (1999) Intuitionistic fuzzy sets. Intuitionistic fuzzy sets. Physica, Heidelberg, pp 1–137 13. Mondal SP, Mandal M (2017) Pentagonal fuzzy number, its properties and application in fuzzy equation. Future Comput Inform J 2(2):110–117 14. Chakraborty A, Mondal SP, Alam S, Ahmadian A, Senu N, De D, Salahshour S (2019) The pentagonal fuzzy number: its different representations, properties, ranking, defuzzification and application in game problems. Symmetry 11(2):248 15. Chiandussi G, Marco C, Ferrero S, Varesio FE (2012) Comparison of multi-objective optimization methodologies for engineering applications. Comput Math Appl 63(5):912–942 16. Salukvadze M (1974) On the existence of solutions in problems of optimization under vectorvalued criteria. J Optim Theory Appl 13(2):203–217 17. Tabucanon MT (1988) Multiple criteria decision making in industry, vol 8. Elsevier Science Limited 18. Kade G, Charnes A, Cooper WW (1964) Management models and industrial applications of linear programming. Vol. I und II. New York-London, (1961) Book Review. J Econ/Zeitschrift für Nazionalökonomie 23:432 19. Mohamed RH (1997) The relationship between goal programming and fuzzy programming. Fuzzy Sets Syst 89(2):215–222 20. Zimmermann H-J (1975) Description and optimization of fuzzy systems. Int J Gen Syst 2(1):209–215 21. Zangiabadi M, Maleki HR (2007) Fuzzy goal programming for multiobjective transportation problems. J Appl Math Comput 24(1):449–460

Chapter 8

The Hidden Enemy: A Botnet Taxonomy

Sneha Padhiar, Aayushyamaan Shah, and Ritesh Patel

S. Padhiar (B) · A. Shah · R. Patel
U & P.U Patel Department of Computer Engineering, CSPIT, Charusat University of Science and Technology, Anand, GJ 388421, India
e-mail: [email protected]
A. Shah e-mail: [email protected]
R. Patel e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_8

1 Introduction

Modern society is heavily dependent on the Internet. From simple tasks like opening an application or creating a presentation, to complex ones like streaming music from an online application or playing MMO games, everyone uses the Internet. This use sometimes requires downloading proprietary software, which might not come cheap. As a result, piracy has hit an all-time high, which in turn has made it easy for hackers to spread malware [1–3]. This malware is often made to utilize the resources of the infected machine as a ready-to-use compute node in a large, distributed yet interconnected network known as a botnet. In other words, a botnet is a network of interconnected, malware-infected, zombie-like computers that are managed by a central authoritative server known as the command and control (C&C) server; the one who controls the C&C server is known as the botmaster [4]. The botmaster is the brain of the network: an infected machine will do whatever the botmaster asks it to do. One of the most well-known uses of a botnet is to carry out a DDoS attack. A DDoS attack, in layman's terms, is one where multiple computers ping or consume a server's resources in order to stop the server's actual clients from accessing the services provided [5, 6]. A DDoS attack is hard to prevent and identify if executed correctly, and a botnet makes this worse: with the help of botnets, it becomes impossible for simple firewalls and security systems to distinguish a DDoS attack from heavy traffic [6].

The rest of the paper is organized as follows. Section 2 introduces a detailed life cycle of a botnet. In Sect. 3, we analyze the botnet phenomenon, its classification based on architecture, and the most relevant detection techniques. Finally, Sect. 4 presents conclusions.

2 Botnet Life Cycle

Understanding the concept of a botnet is not a difficult task; designing one is completely different, as there are many intricacies involved. The designer has to keep in mind some of the following points before actually deploying it [7–9]:

• It should be very efficient in utilizing the resources of the infected computers.
• The code should be camouflaged to prevent it from being detected by anti-malware software.
• The infected machine should follow the command and control server's commands.

Keeping that in mind, a simple botnet life cycle consisting of five phases was created. The phases of the botnet life cycle (Fig. 1) are as follows:

A. Initial injection

In this step, the attacker identifies potential hosts by different methods. Once the attacker finds a suitable target, he/she uses a series of attacks to infect that host, including phishing, sending spam emails, creating backdoors, etc. When any one of these attacks is successful and the host is infected, the main goal of the attacker is to add the infected machine to the bot network [7–9]. This is done by periodically refreshing and/or updating this entry. At the end of this stage, we can call the infected computer a newborn or dormant bot.

Fig. 1 Botnet life cycle


B. Secondary injection

In this stage, the infected machine downloads the actual binary scripts of the malware. The download is done over HTTP, IRC, or peer-to-peer (P2P) protocols. Once the download is complete, the bot compiles the scripts if needed and runs them in the background. By doing this, the bot becomes ready to take commands from the command and control (C&C) servers. These ready bots are called active bots [7–9].

C. Connection

After infecting the computers, the botnet is still useless until the C&C server actually communicates with the bots. Hence, after the two phases of infection, the C&C server makes contact with the bots to deliver command and control information on what the bots are to do. On receiving this information from the C&C server, a bot replies with an acknowledgment message proving that it is still active. The person controlling the C&C servers to send the command and control messages is known as the botmaster. To ensure proper working of the bots, the botmaster might restart the bots to check their status (newer methods include simple pinging to learn the status of a bot) [7–9].

D. Command and control server

The heart of a botnet is its command and control (C&C) servers. These servers are what the botmaster uses to communicate with and control the army of bots developed in the first two phases. The C&C server, or servers in some cases, are set up in different ways depending on the use and requirements of the attacker. For example, a botnet whose bots are used to carry out an elaborate DDoS attack on a high-profile company might consist of multiple C&C servers in a tree-like (hierarchical) fashion, where one controls the other and the servers near the leaf nodes control the bots [7–9], whereas a botnet designed simply to gather information from infected machines might consist of a single C&C server (centralized).

E. Upgrade and maintenance

The most crucial step in software development is maintenance and upgrading. Simply put, maintenance of a botnet is required to adapt it to new technologies, making it more efficient and easier to handle, and to prevent the C&C servers from being detected by anti-malware software and network analysts. To do so, the C&C servers are shifted to new locations, preventing them from being tracked by network traffic analysts [7–9].


3 Classification

Since there are no standards for creating malware, classifying malware is important for identifying the similarities among samples. These classifications help researchers group malware into categories during research and practical implementations [10–12]. In this paper, we classify botnets based on their architectures.

3.1 Botnet Architecture

Botnet architecture refers to the way the botnet is structured. As seen in Fig. 2, a botnet can fall into three different models based on its architecture [13].

3.1.1 Centralized Architecture

The simplest architecture in this classification is the centralized architecture. Here (Fig. 3), all the bots are managed by one single C&C server, or by a single chain of C&C servers acting as a single entity. The botmaster uses this single C&C server for all malicious activities. To connect to this type of server, bots use the HTTP and IRC protocols, which makes the implementation of the botnet comparatively easy [13, 14].

Fig. 2 Types of botnet based on its architecture

Fig. 3 Centralized botnet architecture

3.1.2 Decentralized Architecture

Fig. 4 Decentralized botnet architecture

The next tier in botnet architecture is the decentralized architecture (Fig. 4). Here the botmaster uses a C&C server to control one or a few bots, which in turn act as C&C servers for other bots. In other words, every bot is capable of acting both as a bot and as a C&C server for other bots [13, 14]. This type of network is called a P2P or peer-to-peer network and uses the aptly named P2P protocol for communication.

3.1.3 Hybrid Architecture

The final type is the hybrid architecture, a combination of the centralized and decentralized architectures. In essence, it is a modified peer-to-peer architecture in which some aspects of the centralized architecture are implemented. The general idea is to create a mesh-like topology of bot networks whose nodes are sub-networks of bots (mini botnets). This can also be viewed as a central C&C server that controls multiple other C&C servers, which in turn are responsible for controlling the local botnets [13, 14].


Fig. 5 Botnet detection techniques

3.2 Botnet Detection Techniques

See Fig. 5.

3.2.1 Honeynets and Honeypots-Based Detection Technique

The honeynet detection technique works with a honeypot and a honey wall. This technique is used to detect and observe the overall behavior of a botnet. A honeypot is a deliberately vulnerable machine that can be easily compromised [15, 16]; it is left vulnerable because it is intended to become part of a botnet and thereby attract botmasters to infect it. There are different ways to set up honeypots.

Anomaly-Based Detection Technique

Anomaly-based techniques work by monitoring network traffic. They indicate spamming bot activity by detecting unexpected network interactions, traffic on unused and unusual ports, and unusual system behavior [17]. In order to spot botnets, it is necessary to distinguish malicious traffic from normal traffic. A botnet can be detected and shut down with a number of existing techniques, but none of them is 100% accurate [13–17]. Each technique uses different methods and produces results in different ways. Since all techniques have advantages and limitations, Table 1 compares the different techniques used for detecting botnets.


Table 1 Comparison of different botnet detection techniques

Detection technique | Known bot detection | Unknown bot detection | Encrypted traffic detection | Structure and protocol independent | Real-time detection
Anomaly | Yes | Yes | Yes | Yes | No
Signature | Yes | No | No | No | No
DNS | Yes | Yes | Yes | No | No
Data mining | Yes | No | No | Yes | No
Honeypot | Yes | Yes | Yes | Yes | Yes
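As a minimal illustration of the anomaly-based approach compared in Table 1, the sketch below flags hosts whose flow records hit unusual ports or deviate strongly from a traffic baseline. The flow records, port list, and threshold are hypothetical stand-ins for a real traffic monitor, not a production IDS.

```python
# Toy anomaly-based detector: flag unusual ports or traffic volume spikes.
import statistics

# (host, destination_port, bytes_sent) flow records, e.g. from NetFlow logs
flows = [("10.0.0.5", 443, 1200), ("10.0.0.5", 6667, 90000),  # IRC-like port
         ("10.0.0.7", 443, 1500), ("10.0.0.7", 443, 1100)]

UNUSUAL_PORTS = {6667, 31337}        # ports rarely used by benign clients

byte_counts = [b for _, _, b in flows]
mean_bytes = statistics.mean(byte_counts)
std_bytes = statistics.pstdev(byte_counts) or 1.0

for host, port, nbytes in flows:
    z = (nbytes - mean_bytes) / std_bytes
    if port in UNUSUAL_PORTS or abs(z) > 2.0:  # unusual port or volume spike
        print(f"possible bot activity: host={host} port={port} z={z:.1f}")
```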

4 Conclusion and Future Scope

Botnets are considered the most dangerous form of cyber-security attack among malware. There has been substantial research on botnets in previous years, yet the topic remains difficult for researchers due to its complexity and dynamism. Furthermore, botnets can perform legal functions when they are in charge of checking the activities of organizations and employers. This paper presents a brief summary of the botnet problem, what botnet attacks look like at present, the types of botnets, and their existing detection methods. Botnets can be created using different protocols and architectural designs. Several new types of botnets are emerging, such as bot clouds and mobile botnets; social bots attack Facebook, Twitter, and other social networking websites. The existing botnet detection techniques can be classified as setting up a honeynet or using an intrusion detection system (IDS). The use of machine learning, automatic models, and other methods is one way to identify bots; yet, tracing botnets is not 100% accurate with any existing model or technique. In the future, security researchers should investigate the various botnet detection techniques with respect to botnet architecture and explore their potential extensions. In addition, it is necessary to monitor the full functioning of different botnets and compile a complete list of bots with their signatures, as this will prove beneficial in developing new botnet detection models and techniques. In the field of cyber-security, providers should pay more attention to the latest botnet attacks such as bot clouds, mobile botnets, and social bots.

References

1. Kurt A, Erdin E, Cebe M, Akkaya K, Uluagac AS (2020) LNBot: a covert hybrid botnet on bitcoin lightning network for fun and profit. In: Computer security–ESORICS. Springer, Berlin, Germany
2. Shin S, Xu L, Hong S, Gu G (2016) Enhancing network security through software defined network (SDN). IEEE
3. Ali I, Ahmed AIA, Almogren A et al (2020) Systematic literature review on IoT-based botnet attack. IEEE Access 8:212220–212232
4. Almomani A (2018) Fast-flux hunter: a system for filtering online fast-flux botnet. Neural Comput Appl 29(7):483–493
5. Al-Nawasrah A, Al-Momani A, Meziane F, Alauthman M (2018) Fast flux botnet detection framework using adaptive dynamic evolving spiking neural network algorithm. In: 2018 9th international conference on information and communication systems (ICICS). IEEE, pp 7–11
6. Sandip Sonawane M (2018) A survey of botnet and botnet detection methods. Int J Eng Res Technol (IJERT) 7(12)
7. Karim A, Salleh RB, Shiraz M, Shah SAA, Awan I, Anuar NB (2014) Botnet detection techniques: review, future trends and issues. J Zhejiang Univ Sci C 15(11):943–983
8. Liu CY, Peng CH, Lin IC (2014) A survey of botnet architecture and botnet detection techniques. Int J Netw Secur 16(2):81–89
9. Kaur N, Singh M (2016) Botnet and botnet detection techniques in cyber realm. In: 2016 international conference on inventive computation technologies (ICICT)
10. Wu D, Fang B, Cui X, Liu Q (2018) Bot catcher: botnet detection system based on deep learning. J Commun 39(8):18–28
11. Grill M, Rehák M (2014) Malware detection using http user-agent discrepancy identification. In: 2014 IEEE international workshop on information forensics and security (WIFS). IEEE, pp 221–226
12. Zha Z, Wang A, Guo Y, Montgomery D, Chen S (2019) BotSifter: an SDN-based online bot detection framework in data centers. In: Proceedings of the 2019 IEEE conference on communications and network security (CNS), Washington DC, DC, USA, Nov 2019, pp 142–150
13. Xiong Z (2019) Research on botnet traffic detection methods for fast-flux and domain-flux. University of Electronic Science and Technology, Chengdu, China
14. Kirubavathi G, Anitha R (2018) Structural analysis and detection of Android botnets using machine learning techniques. Int J Inf Secure 17(2):153–167
15. Ghosh T, El-Sheikh E, Jammal W (2019) A multi-stage detection technique for DNS-tunneled botnets. Can Art Ther Assoc 58:137–143
16. Khan RU, Kumar R, Alazab M, Zhang X (2019) A hybrid technique to detect botnets, based on P2P traffic similarity. In: Proceedings of the 2019 cybersecurity and cyberforensics conference (CCC), Melbourne, Australia, May 2019, pp 136–142
17. Kempanna M, Jagadeesh Kannan R (2015) A novel traffic reduction technique and ANFIS based botnet detection. In: International conference on circuits, systems, signal and telecommunications, 31 Jan 2015

Chapter 9

Intelligent Call Prioritization Using Speech Emotion Recognition

Sanjana Addagarla, Ravi Agrawal, Deep Dodhiwala, Nikahat Mulla, and Kaisar Katchi

S. Addagarla (B) · R. Agrawal · D. Dodhiwala · N. Mulla
Department of Information Technology, Sardar Patel Institute of Technology, Mumbai, Maharashtra 400053, India
N. Mulla e-mail: [email protected]
K. Katchi
Department of Applied Sciences and Humanities, Sardar Patel Institute of Technology, Mumbai, Maharashtra 400053, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_9

1 Introduction

Humans are intelligent beings, and one of the most important traits that differentiates our species from the rest is the ability to communicate with each other using a well-defined language. This language consists of verbal communication and non-verbal cues that make us receptive to the speaker's emotional state of affairs. The past decade has invited much research into analyzing and improving the emotional capability of human–machine interaction (HMI). This research has branched into the different modalities of human communication, including physical analysis (face tracking, facial expression recognition, eyesight tracking), speech analysis (acoustic interpretation, voice tone quality), and physiological analysis using sensors to monitor heart rate, EEG signals, and more.

Customer Relationship Management (CRM) is a critical component of any business's growth and sustainability. Inbound call centers receive calls from customers to resolve complaints, answer queries, or accept feedback. Customer Service Representatives (CSRs) play a significant role in creating a fluid, seamless, and positive customer experience, and their performance is crucial for maintaining customer satisfaction and retention [1]. This process also relies on the compatibility of the customer with the assigned agent and on the agent's capability to address the call subject. When all

agents are busy, the callers are placed in a waiting queue and are serviced on a first-come-first-serve basis. So, in theory, a caller whose call needs immediate servicing will be placed in a wait queue and would have to wait until all the customers in front of them in the queue are serviced; only then is the caller matched with the next available agent. This process does not guarantee that the agent is a good match for the customer. Additionally, the accumulated wait time sours the caller's experience. Call centers also review calls for performance purposes after the call has taken place in order to improve customer experience, but no analysis is done before the call, when the caller is added to the waiting queue.

This paper focuses on leveraging speech emotion recognition and textual analysis to identify the caller's emotional state and match them with the right agent, while reducing and redistributing their waiting time so the call is serviced as quickly as possible. This helps in the smart utilization of the workforce and the available data. Ekman [2] states that human emotions can be broken down into six primary emotions: anger, fear, surprise, sadness, happiness, and disgust. Using these principal emotions, the caller in the waiting queue is requested to record a short clip detailing their reason for the call, on which speech and textual analysis are performed. Priority is assigned to the callers based on the predicted emotion, and the calls in the wait queue are reordered accordingly. The identified emotion is used to match the caller with an appropriately emotionally skilled agent using our Emotion-Based Routing (EBR) to address the call.

The underlying assumptions made in the paper are that some callers in the waiting queue are of a higher priority than others according to the parameters determined by the business use case. For a customer service call center, anger has been recognized as the most critical emotion for businesses [3]; it is in the business's interest to service angry callers as soon as possible in order to achieve a higher customer retention rate and a lower call abandonment rate. It is also assumed that the emotion of the caller is proportionately reflected in the caller's voice and lexicon.

The paper is divided into the following sections: Sect. 2 refers to the related works on speech emotion recognition and textual analysis, and Sect. 3 covers the proposed methodology for the prioritization of calls in the waiting queue of the call center. The experimental results and observations are detailed in Sect. 4. Section 5 summarizes the conclusion and the future work possible for the paper.

2 Literature Survey

Petrushin [3] proposed emotion recognition software for call centers to process telephone-quality voice messages, which can be used for prioritizing voice messages. He concluded that speech emotion recognition employing neural networks proved beneficial for fostering and enhancing CRM systems. The features most often considered for speech emotion recognition are Mel-frequency cepstral coefficients (MFCC), prosody, and voice quality features such as pitch, intensity, duration, harmonics, Perceptual Linear Predictive Cepstrum (PLPC), and more. Using the


combination of the different perceptual features of MFCC, PLP, MFPLPC, and LPC together with statistics gives the best result for a simple DNN model trained on the Berlin Database [4], as found in [5]. Vidrascu et al. [6] identified the 25 best features to be selected from 129, i.e., four from microprosody, four F0-related, four formants, five from energy, four from phonetic-alignment duration, and six other cues from transcripts; their research suggests that sadness is the hardest emotion to recognize without mixing in cues from the transcripts. Research on call redistribution based on emotion [7] indicates that a multilayer perceptron used for speech emotion recognition does better when the neural network has more neurons in the hidden layer, but it is more computationally expensive; machine learning models such as the Linear Bayes Classifier are comparatively faster and also have higher accuracy. There, the calls are reordered in descending order by placing angry and fearful callers first, which increased the waiting time of joyful and neutral callers significantly.

Banerjee and Poorna [8] considered the four parameters of pitch, SPL, timbre, and time gaps to perform their analysis using MATLAB and Wavepad, considering only three emotional states (normal, angry, and panicked), and observed that the emotion of the speaker also affects the pitch and the number of pauses the speaker takes. Khalil et al. [9] reviewed the background behind emotion detection and recognition in speech using different techniques. They classified the datasets into three kinds: simulated databases, created by experienced performers and actors; induced databases, recorded without the knowledge of the speaker; and natural databases, recorded from call center conversations, conversations in the public domain, etc. They observed the best accuracy for the three emotion classes of anger (99%), happy (99%), and sad (96%) using a deep convolutional neural network. They also noted that deep learning models have the advantage of learning quickly and efficiently providing representations, but a disadvantage is that RvNNs are usually used for natural language processing rather than speech emotion recognition, even though they can handle the different modalities.

In [10], Heracleous et al. focused on speech emotion recognition in multiple languages by drawing on lifelike emotional speech from films in three languages. Language-specific features are combined with features specific to certain emotions and are classified using an extremely randomized trees (ERT) classifier, which performed well, giving an unweighted average recall of 73.3%. Cho et al. [11] proposed combining acoustic information and text transcriptions from the audio for improved speech emotion recognition. With an LSTM network for acoustic analysis and a CNN for textual analysis trained on the IEMOCAP [12] dataset, they observed that the combination of both text and speech models outperformed the text-trained and speech-trained models, proving that the textual information extracted from the audio is complementary. Jia et al. [13] experimented with a Long Short-Term Memory (LSTM) network and Latent Dirichlet Allocation (LDA) to extract the acoustic and textual features, respectively. They worked with the Sogou Voice Assistant to get a dataset of audio and text and used three indicators, namely descriptive, temporal, and geo-social. They also observed that time plays a crucial role in


affecting one’s emotion, i.e., people tend to be more joyous early at night and dull before dawn, correlating this with one’s working hours.

3 Methodology

This section details the proposed methodology (see Fig. 1) to predict the emotion and the ordering of the calls in a call center's wait queue. The input to the system is the audio recording collected when the caller is placed in the wait queue; the recording is 15–30 s long. This audio clip goes through the Audio Preprocessing Module, where the features are extracted and the audio sample is transcribed to text, which is passed on to the next module. The Emotion Detection Module consists of two submodules: Speech Emotion Recognition and Textual Analysis. The speech emotion recognition submodule analyses the prosodic and waveform features to predict the emotion. The textual analysis submodule analyses the transcribed text from the audio recording to determine the underlying emotion. The outputs from both submodules are then combined to compute the overall emotion from the four basic emotions of anger, sadness, happiness, and neutral. This emotion and other extracted features are fed into the Call Prioritizer Module, where the calls are reordered according to the prioritized emotion of the application. Section 3.1 describes the Audio Preprocessor Module, Sect. 3.2 illustrates the Emotion Detection Module's working, and Sect. 3.3 explains the Call Prioritizer Module algorithm.

3.1 Audio Preprocessor Module

Dataset. The datasets used are the Interactive Emotional Dyadic Motion Capture database developed by USC (USC-IEMOCAP) [12] and the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) [14]. IEMOCAP consists of multimodal information, i.e., it has video, audio, and text transcriptions of the same. We have

Fig. 1 System block diagram


used the audio and text from the dataset, which contains 10,038 corpus samples. The RAVDESS dataset contains 7356 files. The chosen datasets have audio files of actors enacting emotions, as real-world call datasets are not publicly available; the presumption made is that real-world conversation will not differ vastly from the acted-out audio samples. For the textual analysis module, we have used a combination of three datasets: DailyDialog [15], ISEAR [16], and Emotion-Stimulus [17].

Preprocessing and Feature Extraction. The audio samples are augmented to prevent overfitting and to obtain better accuracy; they are subjected to noise injection, waveform shifting, and speed modification. The waveform and spectrogram for a sample audio clip from the dataset with the emotion "happy" are visualized in Figs. 2 and 3, respectively. The happy and sad samples are oversampled since they are underrepresented in the dataset. The following features are extracted from each audio sample for prediction: signal mean, signal standard deviation, pitch, root mean square error (RMSE), root mean square deviation (RMSD), duration of silence, harmonics, autocorrelation factors, and MFCCs, using the Librosa [18] library. The text of the audio clip is transcribed using the SpeechRecognition Python library, which runs on the Google Cloud Speech API. For textual analysis, the text is first tokenized, followed by counting of unique tokens; each token is padded and encoded for categorization. The text is also checked for buzzwords, which are important phrases and words determined by the business application.

Fig. 2 The waveform of an audio sample with “happy” emotion

Fig. 3 Spectrogram of the waveform in Fig. 2
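A minimal sketch (with assumed parameter values and a hypothetical input filename) of the augmentation and of a subset of the feature extraction steps described above, using the Librosa library [18]:

```python
import numpy as np
import librosa

y, sr = librosa.load("caller_clip.wav", sr=16000)   # hypothetical input clip

# Augmentation: noise injection, waveform shifting, speed modification
y_noisy   = y + 0.005 * np.random.randn(len(y))
y_shifted = np.roll(y, int(0.1 * sr))
y_fast    = librosa.effects.time_stretch(y, rate=1.1)

def extract_features(signal, sr):
    """Signal statistics, energy, silence duration and MFCCs (a subset of
    the full feature list given in the text)."""
    rms = librosa.feature.rms(y=signal)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)
    voiced = librosa.effects.split(signal, top_db=30)        # non-silent spans
    silence_dur = (len(signal) - sum(e - s for s, e in voiced)) / sr
    return np.hstack([signal.mean(), signal.std(),
                      rms.mean(), silence_dur, mfcc.mean(axis=1)])

features = extract_features(y, sr)
```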


3.2 Emotion Detection Module

Speech Emotion Recognition Submodule. The speech emotion recognition submodule predicts the emotion based on the extracted features. The following deep learning models and machine learning algorithms were trained on the preprocessed dataset (Table 1).

Table 1 Accuracy results of the models for the speech emotion recognition submodule

Classifier | Overall | Happiness | Anger | Sadness | Neutral
SVM | 0.76 | 0.92 | 0.86 | 0.89 | 0.86
Multi Naïve Bayes | 0.88 | 0.98 | 0.94 | 0.88 | 0.96
Random forest | 0.90 | 0.95 | 0.95 | 0.95 | 0.96
XGBoost | 0.89 | 0.97 | 0.93 | 0.95 | 0.95
Multilayer perceptron | 0.89 | 0.96 | 0.95 | 0.92 | 0.95
Logistic regression | 0.89 | 0.98 | 0.94 | 0.89 | 0.96
CNN | 0.64 | 0.75 | 0.84 | 0.85 | 0.83
LSTM | 0.81 | 0.89 | 0.94 | 0.91 | 0.89

From Table 1, it is observed that although models such as Multi Naïve Bayes, XGBoost, and Logistic Regression had comparable performance, the Random Forest algorithm performs better than the deep learning models and the other machine learning algorithms. It is also faster than the deep learning algorithms, and its nimble, lightweight nature makes it useful in a real-time application; deep learning algorithms also take significantly more time to train than the machine learning models. The speech emotion recognition module has the highest accuracy for the happiness and anger emotions, as these emotions have much more distinct auditory characteristics than sadness and neutral.
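A short sketch of how the best-performing classifier from Table 1 could be trained with scikit-learn is given below; the synthetic feature matrix is a placeholder for the features of Sect. 3.1, and the hyperparameter values are assumptions, as the paper does not list its exact settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the extracted audio features and labels
X, y = make_classification(n_samples=2000, n_features=22, n_informative=10,
                           n_classes=4, random_state=42)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y,
                                          random_state=42)

clf = RandomForestClassifier(n_estimators=200, random_state=42)
clf.fit(X_tr, y_tr)

print(classification_report(
    y_te, clf.predict(X_te),
    target_names=["happiness", "anger", "sadness", "neutral"]))
```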


Textual Emotion Analysis Submodule. BERT is one of the most versatile methods for performing NLP, as the model can fine-tune on input data specific to the context of the language and the purpose of use. Another advantage of BERT is that it is pre-trained on a large dataset, which improves accuracy even without fine-tuning. For predicting the emotion from the text transcripts of the call, we have used transfer learning with BERT via the KTrain library. The model gives an unweighted average accuracy (UAA) of 82%. The accuracies of the BERT model and of the alternative Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) models are given in Table 2.

Table 2 Accuracy results of the models for the textual emotion analysis submodule

Classifier | Overall | Happiness | Anger | Sadness | Neutral
CNN | 0.77 | 0.90 | 0.89 | 0.91 | 0.91
LSTM | 0.73 | 0.87 | 0.89 | 0.89 | 0.91
BERT | 0.82 | 0.94 | 0.91 | 0.93 | 0.93

From Table 2, it is observed that BERT outperforms the CNN and LSTM models by a considerable margin, which can also be noticed by interpreting the loss graphs of the CNN and LSTM models depicted in Figs. 4 and 5, respectively. Even though the training loss decreases significantly, the same is not reflected in the validation loss; these models would need more time and epochs to attain accuracy similar to the BERT model. The CNN and LSTM models are most accurate in identifying "Neutral," while the BERT model has its highest accuracy for "Happiness" and lowest for "Anger." Hence, the BERT model is chosen for the textual analysis of the text transcribed from the input audio sample.

Fig. 4 Textual emotion analysis—CNN loss graph

Fig. 5 Textual emotion analysis—LSTM loss graph
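A rough sketch of the BERT fine-tuning through the KTrain library described above follows; the placeholder texts, class names, sequence length, and training schedule are assumptions rather than the authors' exact configuration.

```python
import ktrain
from ktrain import text

# Tiny placeholder corpus; real training uses the combined text datasets above
train_texts  = ["I am extremely upset with this service",
                "thank you so much, that fixed everything"]
train_labels = ["anger", "happiness"]

classes = ["happiness", "anger", "sadness", "neutral"]
trn, val, preproc = text.texts_from_array(
    x_train=train_texts, y_train=train_labels,
    x_test=train_texts, y_test=train_labels,     # placeholder validation set
    class_names=classes, preprocess_mode="bert", maxlen=64)

model = text.text_classifier("bert", train_data=trn, preproc=preproc)
learner = ktrain.get_learner(model, train_data=trn, val_data=val, batch_size=2)
learner.fit_onecycle(2e-5, 1)                    # (learning rate, epochs)

predictor = ktrain.get_predictor(learner.model, preproc)
print(predictor.predict("I have been waiting for hours and nothing works!"))
```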


3.3 Call Prioritizer Module

The emotions predicted by the speech emotion recognition and textual analysis submodules are combined in a weighted average to output the overall predicted emotion. We propose an algorithm to prioritize the callers in the waiting queue considering the following factors:

• Emotion Score: the combined prediction score generated by the Emotion Detection Module by analyzing both audio and textual features. This score is multiplied by the emotion multiplying factor, which depends on the detected emotion and the business logic.
• Buzzword Score: set depending on the presence of a "buzzword" in the short recording of the caller.
• Loyalty Score: gives priority to different client tiers for a multi-tier business.
• Call Back Score: accounts for a caller who got disconnected by an error or is calling back multiple times because the issue is unresolved. In such a case, the caller is given an additional boost in the score to reduce the frustration of being at the bottom of the waiting queue to resolve the same issue again [19].
• Wait Time Score: added to prevent starvation of the callers; it accounts for the time the caller has spent in the waiting queue.

Using the above factors, a Priority Score is calculated for each caller, according to which they are placed in the waiting queue. After this, we use the Agent Emotion Matrix to calculate the Agent Suitability of each caller in the wait queue. The agents are ranked according to their ability to deal with callers having certain emotions. A separate Agent Waiting List is generated for each agent, considering the agent's suitability rank for each caller. As soon as an agent becomes free, the caller on top of that agent's waiting list is dequeued and assigned to them.

Call Priority Score notation:

WT: wait time, i.e., the time spent in the waiting queue in seconds
CT: current time in seconds
AT: arrival time in seconds
S: priority score, based on which the call is prioritized
ES: emotion score
EM(e): emotion multiplier for emotion e
L: loyalty score
CB: call back score
CBF: call back factor
BS: buzzword score
S(a): suitability score for agent a, based on which the caller is placed in the waiting queue of agent a
R(a): suitability rank of agent a for the caller
C: number of calls in the waiting queue
n: number of agents
A ≡ [a_ij]: agent emotion matrix of size C × n

WT = CT − AT                                        (1)
S = ES × EM(e) + L + CB × CBF + BS + WT/60          (2)
S(a) = S × (1 − R(a)/n)                             (3)
a_ij = e, when agent i supports emotion e at priority level j; 0, otherwise   (4)

where i ranges from 1 to C and 1 ≤ j ≤ n.

When a new caller joins the waiting queue, the Agent Suitability is calculated using the Agent Emotion Matrix A (4). The caller is placed in the agent waiting queues based on the Suitability Score for each agent (3), which in turn depends on (1) and (2). As soon as an agent becomes free, the caller on top of that agent's waiting queue is assigned to that agent, removed from all waiting queues, and Agent Suitability is recalculated for the remaining callers in the waiting queue (see Fig. 6).
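A minimal sketch transcribing Eqs. (1)–(3) into code is given below; the emotion multipliers, call back factor, and sample values are hypothetical, since the paper leaves these to the business logic.

```python
import time

# Hypothetical business-logic constants (EM(e) and CBF in Eq. (2))
EMOTION_MULTIPLIER = {"anger": 3.0, "sadness": 2.0,
                      "neutral": 1.0, "happiness": 1.0}
CALL_BACK_FACTOR = 1.5

def priority_score(es, emotion, loyalty, call_back, buzzword, arrival_time):
    wt = time.time() - arrival_time                     # Eq. (1): WT = CT - AT
    return (es * EMOTION_MULTIPLIER[emotion] + loyalty  # Eq. (2)
            + call_back * CALL_BACK_FACTOR + buzzword + wt / 60)

def suitability(score, agent_rank, n_agents):
    return score * (1 - agent_rank / n_agents)          # Eq. (3)

# Example: an angry caller waiting 5 minutes, agent ranked 1 of 4 for anger
s = priority_score(0.9, "anger", loyalty=2, call_back=1,
                   buzzword=1, arrival_time=time.time() - 300)
print(suitability(s, agent_rank=1, n_agents=4))
```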

4 Results and Observations

Table 3 shows the comparison between the proposed combined text and speech model, the speech-only model, and the text-only model. The proposed model surpasses the alternative models and has the highest accuracy for the "anger" emotion. Adding the semantic aspect of communication to the prosodic features enhances the recognition rate. Humans also analyze telephonic conversations on these two parameters, and enabling the model to integrate both perspectives gives it the necessary edge to succeed. The proposed model not only has better overall accuracy but on average recognizes each emotion better than the other two models. This validates that speech emotion recognition and textual emotion analysis are highly complementary.

To simulate a call center using the proposed algorithm versus the first-come-first-serve algorithm traditionally used in call centers, a list of callers with the attributes required by the call prioritization module is passed through the algorithm. The application in Fig. 7 is a simulation of the calls arriving at the call center and being


Fig. 6 Flowchart for call prioritization

Table 3 Accuracy comparison of three models

Model | Overall | Happiness | Anger | Sadness | Neutral
Only speech | 0.90 | 0.95 | 0.95 | 0.95 | 0.96
Only text | 0.82 | 0.94 | 0.91 | 0.93 | 0.93
Speech + text | 0.92 | 0.95 | 0.97 | 0.96 | 0.96

processed by the proposed methodology. The upper panel indicates whether each agent is free or busy and, if busy, which caller the agent is currently dealing with. The panel on the left shows the overall waiting queue and the individual waiting queue of each agent. The panel on the right simulates ten calls; there are three buttons for each call: one to start the call, one to end it, and one to listen to the audio file. Using the call and end-call buttons for each caller, we can simulate the running of the call center, as seen in the figure. When a call arrives, agent availability is checked; if all agents are busy, the recorded audio clip is pushed through the Emotion Detection Module to extract the emotion. After detecting the emotion, the calls are dynamically reordered in the queues according to the calculated scores. The findings shown in Table 4 depict the average waiting time before and after using the application. The average patience time is also listed in the table, which


Fig. 7 Simulation of call center using proposed methodology

is the average amount of time in seconds a caller is willing to stay in the waiting queue. It is observed that the waiting times for the "Anger" and "Sadness" emotions have drastically reduced; although the waiting times of the other emotions have increased, they are still less than the callers' average patience time. During the simulation of fifteen callers calling within a brief time period, two angry callers hung up while waiting to be assigned to an agent under first-come-first-serve, while none hung up using the proposed algorithm.


Table 4 Comparing the waiting time difference before and after applying the algorithm

Emotion | Average waiting time using the first-come-first-serve algorithm (in s) | Average waiting time using emotion-based routing (in s) | Average patience time (in s)
Sad | 45 | 30 | 390
Neutral | 90 | 210 | 435
Angry | 300 | 140 | 220
Happy | 60 | 90 | 555

5 Conclusion

This research has identified the gap and addressed an issue that call centers face daily. The proposed solution analyses the emotional state of the caller before the call takes place, while the caller sits in a waiting queue because all agents are busy. This helps to prioritize the callers according to the use case of the application: in customer service centers, anger could be the prioritized emotion, whereas in national emergency helplines, fear could be. This gives the research the flexibility to adapt to any application domain, as it considers both the emotional aspects of speech and the lexical and linguistic aspects of the call. Call centers usually emphasize reviewing a call after it has occurred, but by performing pre-call analysis, the waiting time for callers with the prioritized emotion is cut down significantly and smartly redistributed among the other callers. The callers are not placed in a single waiting queue but in each agent's waiting queue according to their compatibility and the agent's emotional capability to address the call. This ensures that the caller receives the best possible service, creating the best outcome for both the caller and the call center. By accounting for additional parameters such as the loyalty, buzzword, and callback scores, the model increases its capability to make a well-informed decision. The experimental results, though on a small scale, look promising and viable for work on a larger scale.

For future research, the model could be trained on real-world data from call centers to assess its efficacy on realistic data. The model would also benefit from post-call analysis and agent skill-based routing, in addition to the proposed research, to heighten its emotion recognition rate and reduce waiting time even further. More parameters could be added to the priority score to make it more adaptable and realistic.

References

1. Kumar M, Misra M (2020) Evaluating the effects of CRM practices on organizational learning, its antecedents and level of customer satisfaction. J Bus Ind Market
2. Ekman P (1999) Basic emotions. Handbook Cogn Emotion 98(45–60):16
3. Petrushin V (2000) Emotion in speech: recognition and application to call centers. In: Proceedings of artificial neural networks in engineering
4. Burkhardt F, Paeschke A, Rolfes M, Sendlmeier WF, Weiss B (2005, Sept) A database of German emotional speech. Interspeech 5:1517–1520
5. Lalitha S, Tripathi S, Gupta D (2019) Enhanced speech emotion detection using deep neural networks. Int J Speech Technol 22(3):497–510
6. Vidrascu L, Devillers L (2007, Aug) Five emotion classes detection in real-world call center data: the use of various types of paralinguistic features. In: Proceedings of international workshop on paralinguistic speech between models and data, ParaLing
7. Bojanić M, Delić V, Karpov A (2020) Call redistribution for a call center based on speech emotion recognition. Appl Sci 10(13):4653
8. Dasgupta PB (2017) Detection and analysis of human emotions through voice and speech pattern processing. arXiv:1710.10198
9. Khalil RA, Jones E, Babar MI, Jan T, Zafar MH, Alhussain T (2019) Speech emotion recognition using deep learning techniques: a review. IEEE Access 7:117327–117345
10. Heracleous P, Mohammad Y, Yoneyama A (2020, July) Integrating language and emotion features for multilingual speech emotion recognition. In: International conference on human-computer interaction. Springer, Cham, pp 187–196
11. Cho J, Pappagari R, Kulkarni P, Villalba J, Carmiel Y, Dehak N (2018) Deep neural networks for emotion recognition combining audio and transcripts. INTERSPEECH
12. Busso C, Bulut M, Lee CC, Kazemzadeh A, Mower E, Kim S, Chang JN, Lee S, Narayanan SS (2008) IEMOCAP: interactive emotional dyadic motion capture database. Lang Resour Eval 42(4):335–359
13. Jia J, Zhou S, Yin Y, Wu B, Chen W, Meng F, Wang Y (2018) Inferring emotions from large-scale internet voice data. IEEE Trans Multimedia 21(7):1853–1866
14. Livingstone SR, Russo FA (2018) The Ryerson audio-visual database of emotional speech and song (RAVDESS): a dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 13(5):e0196391
15. Li Y, Su H, Shen X, Li W, Cao Z, Niu S (2017) DailyDialog: a manually labelled multi-turn dialogue dataset. IJCNLP
16. Geneva U, Wallbott HG (1994) Evidence for universality and cultural variation of differential emotion response patterning: correction. J Pers Soc Psychol 67(1):55–55
17. Ghazi D, Inkpen D, Szpakowicz S (2015, April) Detecting emotion stimuli in emotion-bearing sentences. In: International conference on intelligent text processing and computational linguistics. Springer, Cham, pp 152–165
18. McFee B, Raffel C, Liang D, Ellis DP, McVicar M, Battenberg E, Nieto O (2015, July) Librosa: audio and music signal analysis in Python. In: Proceedings of the 14th Python in science conference, vol 8, pp 18–25
19. Hu K, Allon G, Bassamboo A (2022) Understanding customer retrials in call centers: preferences for service quality and service speed. Manuf Serv Oper Manage 24(2):1002–1020

Chapter 10

The AdaBoost Approach Tuned by SNS Metaheuristics for Fraud Detection

Marko Djuric, Luka Jovanovic, Miodrag Zivkovic, Nebojsa Bacanin, Milos Antonijevic, and Marko Sarac

M. Djuric · L. Jovanovic · M. Zivkovic · N. Bacanin (B) · M. Antonijevic · M. Sarac
Singidunum University, Danijelova 32, 11000 Belgrade, Serbia
e-mail: [email protected]
M. Djuric e-mail: [email protected]
L. Jovanovic e-mail: [email protected]
M. Zivkovic e-mail: [email protected]
M. Antonijevic e-mail: [email protected]
M. Sarac e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_10

1 Introduction

During the global COVID-19 pandemic, the world witnessed a big spike in e-commerce and online transactions, as most economies relied on e-commerce to alleviate the pressure brought by quarantine [2]. According to UNCTAD, global e-commerce transactions increased to more than $26 trillion in 2019, equal to almost one-third of worldwide gross domestic product (GDP). A Nilson company report states that by the year 2030 the global volume of credit card transactions is expected to reach $74.14 trillion, and the industry is projected to lose close to $49.32 billion to fraud; moreover, card fraud over the next ten years will amount to around $408.50 billion in cumulative losses. Credit card fraud is a type of identity theft involving obtaining a victim's private credit card data without authorization, with the malicious goal of charging payments to the card and/or withdrawing money from it. Looking at the growth of

credit card usage in the coming years, it is necessary to implement a precise system to detect fraudulent activity and protect everyone using them.

In this research, the authors utilized machine learning (ML) methods to detect fraudulent activity in credit card transactions, assessed on a real-world dataset formed during September 2013 from transactions by credit card users across Europe. Unfortunately, the dataset is exceedingly imbalanced. To mitigate this issue, the authors used the social network search algorithm (SNS) [38]. The ML algorithms utilized in this research include decision tree (DT), random forest (RF), extra tree (ET), support vector machine (SVM), extreme gradient boosting (XGBoost), and logistic regression (LR). The aforementioned ML algorithms were compared separately in order to measure the quality of convincingness and classification. Also, the Adaptive Boosting (AdaBoost) technique was used with every model in order to make it more robust.

The focal point of this research is a parallel comparison of multiple machine learning algorithms on a publicly accessible dataset that consists of real-world card payments. Furthermore, this paper examines AdaBoost as a means to boost the observed classifiers on an exceedingly disproportional credit card dataset. Finally, the paper introduces the possibility of applying the novel SNS algorithm to handle the highly imbalanced dataset: SNS metaheuristics were used to optimize the hyperparameters of AdaBoost. The authors compared their suggested solution to the already implemented solution and all other ML algorithms in [28]. The results were also compared using the Matthews correlation coefficient (MCC), the area under the curve (AUC), precision, accuracy, and recall.

Some of the contributions of the research given in this manuscript include the following:

– A scalable framework designed to detect fraudulent credit card transactions.
– The social network search algorithm is implemented for the purpose of addressing the problem of imbalanced classes within the employed dataset; it was used to optimize the hyperparameters of the AdaBoost model.
– AdaBoost was blended with the SNS metaheuristics with the purpose of raising the performance of the proposed framework. Furthermore, a comparative analysis was conducted with the following metrics in mind: accuracy, precision, recall, AUC, and MCC.
– The framework was implemented on an extremely imbalanced dataset in order to confirm its effectiveness.

The structure of the rest of this manuscript is as follows: Section 2 brings a brief literature survey of recent publications that employed machine learning algorithms for credit card fraud identification. Section 3 describes the employed SNS metaheuristics, later utilized in the experiments to enhance the performance of AdaBoost even further. Section 4 presents the experimental configuration and the implementation of the suggested framework on a synthetic credit card fraud dataset, and finally presents the obtained experimental findings together with a comparative analysis. Section 5 concludes the research.


2 Literature Review and Background

In their work, Tanouz et al. [40] provided a structure for utilizing machine learning methods to detect credit card fraud. The authors utilized the European cardholders dataset for determining the performance of the chosen methods and employed an under-sampling approach in order to deal with the imbalance that occurs in the dataset. Random forest and logistic regression were examined, with accuracy considered the main performance measurement. Results showed that random forest managed to detect fraud with 91.24% accuracy, while logistic regression achieved 95.16% accuracy. Additionally, the authors measured the confusion matrix in order to affirm whether their chosen methods performed adequately with regard to the negative and positive classes.

Randhawa et al. [35] suggested a method that pairs AdaBoost with majority voting in order to detect credit card fraud. The authors utilized the dataset produced by cardholders in Europe, and the adaptive boosting method (AdaBoost) was paired with different machine learning methods such as support vector machines. Matthews correlation coefficient and accuracy were chosen as the main measures of performance: AdaBoost-SVM had an MCC of 0.044 and 99.959% accuracy.

Rajora et al. [34] organized a parallel investigation of machine learning algorithms to determine fraudulent activity in credit card transactions utilizing the dataset generated from European cardholders. Notable methods considered include random forest (RF) and K-nearest neighbors (KNN). The area under the curve and accuracy were considered the main performance measures. The obtained results show that the random forest algorithm accomplished an accuracy of 94.9% and an AUC of 0.94, while K-nearest neighbors achieved an accuracy of 93.2% and an AUC of 0.93. The authors did not explore the imbalance that occurs in the used dataset.

Dataset: The dataset utilized for the purpose of this paper was produced by collecting European cardholders' transactions in September 2013. It is openly available on Kaggle, although it is greatly skewed. Furthermore, it is not synthetic, as the transactions contained in it happened over a longer time period. It has 284,807 transactions, out of which 99.828% are valid and 0.172% are fraudulent, and includes 30 attributes, alongside time and amount.

2.1 Adaptive Boosting Algorithm

AdaBoost is employed to enhance the performance of the observed machine learning methods. It produces the best results when used in combination with weak learners, i.e., models that attain accuracy just above random chance. Boosting techniques that combine various elementary or less precise models generally help machine learning solutions achieve higher accuracy. The adaptive boosting algorithm [26] has been utilized in this research to improve classification performance. AdaBoost generates weighted sums from a mix of individually boosted outputs. The mathematical formulation of the algorithm is given below:

G_N(x) = Σ_{t=1}^{N} g_t(x)    (1)

where g_t denotes the prediction of the weak learner for a given input vector x, and t is the iteration index. The prediction of a weak learner on training sample n is h(x_n). In every iteration, a chosen weak learner is multiplied by a coefficient β_t so as to minimize the training error L, expressed in the next equation:

L_t = Σ_n L[G_{t−1}(x_n) + β_t h(x_n)]    (2)

where G_{t−1} is the boosted classifier from the previous iteration t − 1, and β_t h(x_n) is the weak classifier considered for inclusion in the final model.
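For illustration, a minimal scikit-learn sketch of such an AdaBoost setup is given below (scikit-learn ≥ 1.2 parameter names). The decision-stump weak learner, the placeholder data, and the values of n_estimators and learning_rate are assumptions; these are exactly the kind of hyperparameters the SNS metaheuristic tunes later in the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Placeholder data standing in for the highly imbalanced card dataset
X, y = make_classification(n_samples=5000, n_features=30,
                           weights=[0.998, 0.002], random_state=42)

model = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # weak learner g_t
    n_estimators=100,                               # N in Eq. (1)
    learning_rate=0.5)                              # scales each beta_t
model.fit(X, y)
print(model.score(X, y))
```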

2.2 Metaphor-Based Metaheuristics

The bulk of metaheuristics is established on the basis of biological evolution. Special attention is given to simulating different biological metaphors, which diverge in character when it comes to the schemes used for depiction. Three major paradigms have established themselves: immune systems, swarm, and evolutionary [1].

Artificial immune systems (AIS) have found their inspiration in theoretical immunology and in observed immune functions, with all their rules and methods; a case can be made that they are an algorithmic alternative to evolutionary algorithms. When used for optimization, antibodies serve as candidate solutions, which iteratively evolve through operators such as cloning, mutation, and selection. Antigens serve as the objective function, and memory cells are used to store satisfactory solutions. Nearly every AIS-based metaheuristic relies on the basis of clonal selection, such as the B-Cell algorithm, the clonal selection algorithm [23], the artificial immune network [22] used for optimization (opt-AINET) [19], and negative selection algorithms.

Swarm intelligence (SI) finds its inspiration in the collective behavior of animal communities, such as bees or ants. Swarm intelligence mostly relies on the decentralization principle, where individuals are transformed by interacting with other solutions and the environment. Some of the more established algorithms include

particle swarm optimization (PSO) [29], the Harris hawks optimizer (HHO) [27], the bat algorithm (BA) [43], ant colony optimization (ACO) [24], moth-flame optimization (MFO) [31], the firefly algorithm (FA) [42], the grey wolf optimizer (GWO) [32], and many others.
Evolutionary algorithms (EA) mimic biological evolution at the cellular level, utilizing operators such as mutation, crossover, selection, and reproduction to obtain the best possible candidate solutions (chromosomes). Four paradigms of evolutionary computation are notable: genetic programming, evolutionary strategies, genetic algorithms, and evolutionary programming. The three concepts are interdependent due to the connection of the operators used, specifically the selection and mutation operators; thus, a case could be made that AIS and SI are subcategories of EA. That being said, what separates these approaches is the manner in which they balance exploration and exploitation.
In the domain of informatics, computing, and information technologies, nature-inspired metaheuristic techniques have been extensively used for addressing NP-hard tasks, including MRI classifier optimization [15, 17], forecasting the rate of COVID-19 infections [44, 46], artificial neural network feature selection and hyperparameter tuning [4, 7–10, 12, 14, 16, 21, 25, 37, 49], cloud-based task scheduling optimization [6, 13, 18, 48], network lifetime optimization in the domain of wireless sensor networks [5, 11, 45, 47, 50], and so on.

3 Social Network Search Algorithm
The Social Network Search (SNS) algorithm [39] draws its inspiration from social networks. Human beings are a very social species, and social networks are mechanisms created for the purpose of connecting people. The SNS algorithm replicates the interactions through which social network users try to achieve more popularity. Users can alter and influence each other's opinions, all for the purpose of increasing their popularity. There are different kinds of social networks, but their users exhibit similar behavior: they familiarize themselves with other users' views and accept them if they are better.
Decision moods and mathematical model: The viewpoint of a user can be modified by alternative views in various moods, namely Imitation, Conversation, Disputation, and Innovation. Virtually every metaheuristic algorithm applies a set of operators to develop new solutions; the SNS algorithm produces new solutions by applying one of the four moods that mimic real-world social behavior.
Imitation: The predominant quality of social media is that people have the option to follow one another, and when someone shares something, the friends and followers of that person are informed about it. If a new happening presents a challenge, users will aim to post a topic about it. The formula for imitation is presented in Eq. (3)

X_i^new = X_j + rand(−1, 1) × R
R = rand(0, 1) × r      (3)
r = X_j − X_i

where X_j is the opinion vector of the j-th user, picked at random with i ≠ j, and X_i is the view vector of the i-th user. rand(−1, 1) and rand(0, 1) are random vectors in the intervals [−1, 1] and [0, 1]. The shock radius R represents the amount of influence of the j-th user, and its magnitude is taken as a multiple of r.

Conversation: Represents a category where people find out more information while communicating amongst themselves and grow their knowledge about different occurrences via private chat. Users tend to find a different perspective through conversation and can draw new conclusions. The mathematical formulation is presented in Eq. (4)

X_i^new = X_k + R
R = rand(0, 1) × D      (4)
D = sign(f_i − f_j) × (X_j − X_i)

where X_k is the vector of the subject of conversation, chosen at random. R is the effect the conversation has, established on the difference in opinions, and represents the change in perspective around X_k. D is the difference between the users' views, rand(0, 1) is a random vector in the interval [0, 1], X_j is the view vector of a random chat partner, and X_i is the view vector of the i-th user, where i ≠ j ≠ k. sign(f_i − f_j) determines the direction in which X_k moves by comparing the objective values f_i and f_j.

Disputation: Describes a state in which users explain and defend certain views on a subject to other users. In addition, users can create groups in order to discuss certain subjects, so users are influenced by seeing different opinions. Here an arbitrary number of social network users are considered to be commenters or members of a group, and new views are calculated according to Eq. (5)

X_i^new = X_i + rand(0, 1) × (M − AF × X_i)
M = (Σ_{t=1}^{N_r} X_t) / N_r      (5)
AF = 1 + round(rand)

where X_i is the view vector of the i-th user, rand(0, 1) is a random vector in the interval [0, 1], and M is the mean of the commenters' views. AF is the admission factor, an indication of how insistently users defend their opinion when discussing it with other people, represented by an integer between 1

and 2. round() is a function that rounds its input to the nearest integer, and rand is a random number in [0, 1]. N_r is the number of commenters (the group size), an integer between 1 and N_user, where N_user is the number of users of the network.

Innovation: Occasionally, shared content is the result of users' opinions and experiences, for example when a person contemplates a particular issue, views it in a different light, and becomes capable of understanding the nature of the problem more accurately or of discovering an entirely new perspective. A distinct subject may have particular features, and by changing the perception of some of those features, the broad perception of the subject changes. The mathematical formula for innovation is described in Eq. (6)

x_i^{d,new} = t × x_j^d + (1 − t) × n_new^d
n_new^d = lb_d + rand_1 × (ub_d − lb_d)      (6)
t = rand_2

where the d-th variable of the i-th user's view is modified, lb_d and ub_d are the lower and upper bounds of the d-th variable, and rand_1 and rand_2 are random numbers in [0, 1]. The resulting view is

X_i^new = [x_1, x_2, x_3, ..., x_i^{d,new}, ..., x_D]      (7)

and the new view is kept only if it improves the objective:

X_i = X_i,        if f(X_i) < f(X_i^new)
X_i = X_i^new,    if f(X_i^new) ≤ f(X_i)      (8)

The pseudocode for the SNS algorithm can be seen in Algorithm 1.
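As an illustration only, the following NumPy sketch implements the four mood updates of Eqs. (3)–(6) for a minimization problem; the function name, the population layout, and the bound vectors lb and ub (NumPy arrays) are assumptions, not part of the original formulation.

import numpy as np

def sns_update(i, X, fitness, lb, ub, rng):
    """Generate a new view for user i with one of the four SNS moods
    (a sketch of Eqs. (3)-(6)); X is the population, fitness the objective values."""
    n, d = X.shape
    others = [u for u in range(n) if u != i]
    j, k = rng.choice(others, size=2, replace=False)
    mood = rng.integers(1, 5)                       # pick one of the four moods
    if mood == 1:                                   # imitation, Eq. (3)
        r = X[j] - X[i]
        R = rng.uniform(0, 1, d) * r                # shock radius
        new = X[j] + rng.uniform(-1, 1, d) * R
    elif mood == 2:                                 # conversation, Eq. (4)
        D = np.sign(fitness[i] - fitness[j]) * (X[j] - X[i])
        new = X[k] + rng.uniform(0, 1, d) * D
    elif mood == 3:                                 # disputation, Eq. (5)
        group = rng.choice(n, size=rng.integers(1, n + 1), replace=False)
        M = X[group].mean(axis=0)                   # mean view of the commenters
        AF = 1 + rng.integers(0, 2)                 # admission factor in {1, 2}
        new = X[i] + rng.uniform(0, 1, d) * (M - AF * X[i])
    else:                                           # innovation, Eq. (6): one variable
        new = X[i].copy()
        dim = rng.integers(d)                       # randomly selected variable d
        n_new = lb[dim] + rng.uniform() * (ub[dim] - lb[dim])
        t = rng.uniform()
        new[dim] = t * X[j, dim] + (1 - t) * n_new
    return np.clip(new, lb, ub)                     # keep the view inside the bounds

Per the greedy selection in Eq. (8), the returned view would then replace X[i] only if it achieves a better objective value.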

4 Experimental Setup
In the suggested setup, the SNS algorithm was utilized to optimize the AdaBoost hyperparameters. The following hyperparameters underwent the optimization procedure:
– n_estimators, within the empirically determined boundaries [60, 200].
– base_estimator, integer-coded with three possible values: 0 denoting DT, 1 representing LR, and 2 denoting SVM.
– learning_rate, within the range [0.5, 2].
Each solution within the population is encoded as a vector of 3 parameters, where the first parameter is an integer value in the range [60, 200], the second parameter is also an integer in the range [0, 2], while the third parameter is continuous in the

Algorithm 1 Pseudocode for the SNS algorithm
Set N_user, MaxIter, LB, UB, Iter = 0
Initialize population according to X_0 = LB + rand(0, 1) × (UB − LB)
Evaluate each user according to the objective function
i = 0
Do
    If (i ≥ N_user)
        Iter = Iter + 1
        i = 0
    i = i + 1
    Mood = rand(1, 4)
    If (Mood == 1)
        Create new view based on Eq. (3)
    ElseIf (Mood == 2)
        Create new view based on Eq. (4)
    ElseIf (Mood == 3)
        Create new view based on Eq. (5)
    ElseIf (Mood == 4)
        Create new view based on Eq. (6)
    Limit the new view to the search bounds
    Evaluate the new view with the objective function
    If (new view better than current view)
        Replace old view with the new view and share it, per Eq. (8)
    Else
        Keep the old view; do not share the new view
While termination criterion not met
Return optimal solution

range [0.5, 2]. Therefore, this particular optimization problem is a mixed integer-continuous NP-hard task. The algorithm was tested with 20 units in the population over 100 iterations, and the best-obtained outcomes are reported in the result tables. To provide good insight into the results, the FA algorithm has also been applied to tune the same AdaBoost hyperparameters, under the same conditions as the SNS algorithm.
The simulations were conducted in two stages, as suggested by the experiments published in [28]. It must be noted that the authors independently implemented all machine learning methods for the purpose of this research and used their own results in the later comparative analysis. The results of the ML methods that do not employ AdaBoost are presented in Table 1, while the scores of these methods with AdaBoost are given in Table 2. For the experiments with AdaBoost, another version of AdaBoost whose hyperparameters were optimized by the FA algorithm was implemented and tested under the same conditions, to enable a more detailed comparison; the results of the basic AdaBoost are also provided.
Since the dataset is highly skewed, the synthetic minority over-sampling technique (SMOTE) has been utilized to address the heavily imbalanced data. The KNN model is used within SMOTE to create new minority-class entries by interpolating between data points and their k nearest neighbors. This approach generates artificial synthetic data that is not a direct copy of the minority-class samples, and it is used to control the overfitting problem.
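The sketch below illustrates how such a solution vector could be decoded into an AdaBoost configuration and scored (here with MCC) on SMOTE-balanced training data; the fitness function name and the exact pipeline details are illustrative assumptions, not the authors' implementation.

# Sketch: decode an SNS candidate solution into an AdaBoost configuration and
# score it on SMOTE-balanced data (names and pipeline details are illustrative).
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import matthews_corrcoef

BASES = {0: DecisionTreeClassifier(max_depth=1),   # 0 = DT
         1: LogisticRegression(max_iter=1000),     # 1 = LR
         2: SVC()}                                 # 2 = SVM

def fitness(solution, X_train, y_train, X_test, y_test):
    n_estimators = int(round(solution[0]))   # integer in [60, 200]
    base = BASES[int(round(solution[1]))]    # integer in [0, 2]
    learning_rate = float(solution[2])       # continuous in [0.5, 2]

    # Balance only the training data with SMOTE, as described above.
    X_bal, y_bal = SMOTE(random_state=42).fit_resample(X_train, y_train)

    model = AdaBoostClassifier(estimator=base,
                               n_estimators=n_estimators,
                               learning_rate=learning_rate,
                               algorithm="SAMME",   # works with the SVC base too
                               random_state=42)
    model.fit(X_bal, y_bal)
    return matthews_corrcoef(y_test, model.predict(X_test))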

Table 1 Simulation outcomes without AdaBoost employed

Method   ACC (%)   RC (%)   PR (%)   MCC
DT       99.91     75.60    79.85    0.80
RF       99.95     79.34    97.22    0.86
ET       99.93     78.22    96.28    0.88
XGB      99.90     59.43    83.98    0.73
LR       99.92     56.61    85.23    0.61

Table 2 Simulation outcomes with AdaBoost employed

Method         ACC (%)   RC (%)   PR (%)   MCC
DT-AdaBoost    99.69     99.02    98.81    0.98
RF-AdaBoost    99.94     99.80    99.92    0.99
ET-AdaBoost    99.97     99.95    99.91    0.99
XGB-AdaBoost   99.98     99.96    99.94    0.99
LR-AdaBoost    98.76     93.84    97.58    0.95
AdaBoost       99.94     99.93    99.91    0.99
FA-AdaBoost    99.97     99.97    99.95    0.99
SNS-AdaBoost   99.99     99.98    99.96    0.99

The actual implementation of this method is used exactly as proposed in [28], where the pseudocode can also be found. Finally, the identical library (Imblearn) was used for the credit card dataset as suggested in [28], with the goal of establishing proper grounds for comparisons between the approaches.
Entries in the utilized dataset are labeled 0 or 1 depending on whether the transaction is legitimate or fraudulent; therefore, this task belongs to the binary classification problems. In these circumstances, the main metrics utilized to validate the performance of the observed models are precision (PR), accuracy (AC), and recall (RC), calculated according to the following mathematical expressions:

AC = (TP + TN) / (TP + TN + FN + FP)      (9)

PR = TP / (TP + FP)      (10)

RC = TP / (TP + FN)      (11)

where TN and TP mark the true negatives and positives, while FN and FP denote the false negatives and positives, in that order. True values indicate the situations where the model correctly predicted the negative/positive outcome; false values describe the situations where the model made a mistake in predicting a negative/positive outcome.
Since the utilized European cardholders dataset is in extreme disproportion, the performances of the observed models must be evaluated with additional metrics: the confusion matrix (CM), the area under the curve (AUC) [33], and the MCC [20]. This paper measures the quality of the classification with the MCC metric, whose allowed value range is [−1, 1]; since the classification quality stands in proportion to the MCC metric, higher values of the MCC indicate better classification performance. A confusion matrix is utilized to highlight the errors made by the observed classifier [30], while the AUC measure represents both the quality and reliability of a model, determining the effectiveness of the observed classifier. The values of the AUC fall in the range [0, 1], where a higher value indicates a better classifier [33].

MCC = (TP × TN − FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN))      (12)
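As a sketch, all of the listed metrics are available in scikit-learn; the snippet below assumes a fitted binary classifier named model (for example, the AdaBoost sketch above) and a held-out test split.

# Sketch: Eqs. (9)-(12) plus the AUC and confusion matrix via scikit-learn,
# assuming a fitted binary classifier `model` and a held-out test set.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             matthews_corrcoef, confusion_matrix, roc_auc_score)

y_pred = model.predict(X_test)
y_score = model.decision_function(X_test)   # or predict_proba(X_test)[:, 1]

print("AC :", accuracy_score(y_test, y_pred))
print("PR :", precision_score(y_test, y_pred))
print("RC :", recall_score(y_test, y_pred))
print("MCC:", matthews_corrcoef(y_test, y_pred))
print("CM :\n", confusion_matrix(y_test, y_pred))
print("AUC:", roc_auc_score(y_test, y_score))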

The results depicted in Table 3 present the comparison of the suggested SNS technique against other well-known machine learning models; additional details on the competitor techniques observed in the comparative analysis are available in [28]. The proposed model is significantly superior to the traditional models, with an accuracy that is in some cases more than 5% higher than the standard RF and KNN approaches.
The models have also been evaluated on the synthetic credit card dataset that can be obtained from [3]. The fundamental characteristics of this dataset are shown in Table 4, and the experimental findings of the models on this dataset are presented in Table 5. As the results clearly show, the SNS-enhanced model achieved the best performance on this dataset as well.
Both experiments have shown the challenges that can arise from highly disproportional datasets. High accuracy can be misleading: the model will correctly classify the valid transactions (which are dominant), yet the minority class can be misclassified, and consequently the model will fail to detect some of the malicious transactions. The proposed SNS model has shown very promising results in this domain; however, further extensive testing with additional real-life datasets is required before putting it into practice.

Table 3 Existing standard ML models comparison

Author                    Model          AC (%)
Rajora and others [34]    RF             94.90
Rajora and others [34]    KNN            93.20
Trivedi and others [41]   RF             94.00
Tanouz and others [40]    RF             91.24
Tanouz and others [40]    LR             95.16
Riffi and others [36]     MLP            97.84
Riffi and others [36]     ELM            95.46
Suggested model           RF-AdaBoost    99.94
Suggested model           DT-AdaBoost    99.69
Suggested model           ET-AdaBoost    99.97
Suggested model           XGB-AdaBoost   99.98
Suggested model           AdaBoost       99.94
Suggested model           FA-AdaBoost    99.97
Suggested model           SNS-AdaBoost   99.99

Table 4 Description of the dataset key features

Properties: User, Card, Year, Month, Day, Time, Amount, Use Chip, Merchant Name, Merchant City, Merchant State, Zip, MCC, Errors
Class: Is fraud / Not fraud

Table 5 Synthetic dataset AdaBoost experimental findings

Method         ACC (%)   RC (%)   PR (%)   MCC
DT-AdaBoost    99.68     98.98    98.80    0.98
RF-AdaBoost    99.94     99.85    99.94    0.99
ET-AdaBoost    99.98     99.99    99.92    0.99
XGB-AdaBoost   99.98     99.99    99.92    0.99
LR-AdaBoost    100.0     98.90    78.84    0.17
AdaBoost       99.95     99.91    99.88    0.98
FA-AdaBoost    99.99     99.99    99.93    0.98
SNS-AdaBoost   100.0     100      99.95    0.99

5 Conclusion
This research focused on metaphor-based metaheuristic algorithms applied to fraud detection in credit card transactions. The algorithms covered in this paper include decision tree (DT), random forest (RF), logistic regression (LR), extra tree (ET), support vector machine (SVM), and extreme gradient boosting (XGBoost). Additionally, every algorithm was tested with the AdaBoost method in order to improve classification accuracy. This paper proposes maximizing these performances by optimizing the values of the AdaBoost hyperparameters with the SNS algorithm. The results of the conducted experiments clearly support the SNS algorithm as superior, as the SNS-AdaBoost method obtained the best performance on the observed datasets. This establishes SNS-AdaBoost as a strong candidate for solving credit card fraud detection; however, further extensive testing of the model on more datasets is necessary. Possible future work in this field includes modifying the original implementation of the SNS algorithm to further improve its performance, and applying it in other application domains where it can address other NP-hard practical problems.

References 1. Abdel-Basset M, Abdel-Fatah L, Sangaiah AK (2018) Chapter 10—metaheuristic algorithms: a comprehensive review. In: Sangaiah AK, Sheng M, Zhang Z (eds) Computational intelligence for multimedia big data on the cloud with engineering applications, pp 185–231. Intelligent Data-Centric Systems, Academic Press. https://www.sciencedirect.com/science/article/ pii/B9780128133149000104 2. Alcedo J, Cavallo A, Dwyer B, Mishra P, Spilimbergo A (2022) E-commerce during covid: stylized facts from 47 economies. Working Paper 29729, National Bureau of Economic Research. http://www.nber.org/papers/w29729 3. Altman ER (2019) Synthesizing credit card transactions. arXiv preprint arXiv:1910.03033 4. Bacanin N, Alhazmi K, Zivkovic M, Venkatachalam K, Bezdan T, Nebhen J (2022) Training multi-layer perceptron with enhanced brain storm optimization metaheuristics. Comp Mater Cont 70(2):4199–4215. http://www.techscience.com/cmc/v70n2/44706 5. Bacanin N, Arnaut U, Zivkovic M, Bezdan T, Rashid TA (2022) Energy efficient clustering in wireless sensor networks by opposition-based initialization bat algorithm. In Computer networks and inventive communication technologies. Springer, pp 1–16 6. Bacanin N, Bezdan T, Tuba E, Strumberger I, Tuba M, Zivkovic M (2019) Task scheduling in cloud computing environment by grey wolf optimizer. In 2019 27th Telecommunications forum (TELFOR). IEEE, pp 1–4 7. Bacanin N, Bezdan T, Venkatachalam K, Zivkovic M, Strumberger I, Abouhawwash M, Ahmed A (2021) Artificial neural networks hidden unit and weight connection optimization by quasirefection-based learning artificial bee colony algorithm. IEEE Access 8. Bacanin N, Bezdan T, Zivkovic M, Chhabra A (2022) Weight optimization in artificial neural network training by improved monarch butterfly algorithm. In Mobile computing and sustainable informatics. Springer, pp 397–409

9. Bacanin N, Petrovic A, Zivkovic M, Bezdan T, Antonijevic M (2021) Feature selection in machine learning by hybrid sine cosine metaheuristics. In International conference on advances in computing and data sciences. Springer, pp 604–616 10. Bacanin N, Stoean R, Zivkovic M, Petrovic A, Rashid TA, Bezdan T (2021) Performance of a novel chaotic firefly algorithm with enhanced exploration for tackling global optimization problems: application for dropout regularization. Mathematics 9(21). https://www.mdpi.com/ 2227-7390/9/21/2705 11. Bacanin N, Tuba E, Zivkovic M, Strumberger I, Tuba M (2019) Whale optimization algorithm with exploratory move for wireless sensor networks localization. In International conference on hybrid intelligent systems. Springer, pp 328–338 12. Bacanin N, Zivkovic M, Bezdan T, Cvetnic D, Gajic L (2022) Dimensionality reduction using hybrid brainstorm optimization algorithm. In Proceedings of international conference on data science and applications. Springer, pp 679–692 13. Bacanin N, Zivkovic M, Bezdan T, Venkatachalam K, Abouhawwash M (2022) Modified firefly algorithm for workflow scheduling in cloud-edge environment. Neur Comput Appl 1–26 14. Bacanin N, Zivkovic M, Salb M, Strumberger I, Chhabra A (2022) Convolutional neural networks hyperparameters optimization using sine cosine algorithm. In Sentimental analysis and deep learning. Springer, pp 863–878 15. Bezdan T, Milosevic S, Venkatachalam K, Zivkovic M, Bacanin N, Strumberger I (2021) Optimizing convolutional neural network by hybridized elephant herding optimization algorithm for magnetic resonance image classification of glioma brain tumor grade. In 2021 Zooming innovation in consumer technologies conference (ZINC). IEEE, pp 171–176 16. Bezdan T, Stoean C, Naamany AA, Bacanin N, Rashid TA, Zivkovic M, Venkatachalam K (2021) Hybrid fruit-fly optimization algorithm with k-means for text document clustering. Mathematics 9(16):1929 17. Bezdan T, Zivkovic M, Tuba E, Strumberger I, Bacanin N, Tuba M (2020) Glioma brain tumor grade classification from mri using convolutional neural networks designed by modified FA. In International conference on intelligent and fuzzy systems. Springer, pp 955–963 18. Bezdan T, Zivkovic M, Tuba E, Strumberger I, Bacanin N, Tuba M (2020) Multi-objective task scheduling in cloud computing environment by hybridized bat algorithm. In International conference on intelligent and fuzzy systems. Springer, pp 718–725 19. de Castro LN, Von Zuben FJ (2002) ainet: an artificial immune network for data analysis. In Data mining: a heuristic approach. IGI Global, pp 231–260 20. Chicco D, Jurman G (2020) The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC Genom 21(1):1–13 21. Cuk A, Bezdan T, Bacanin N, Zivkovic M, Venkatachalam K, Rashid TA, Devi VK (2021) Feedforward multi-layer perceptron training by hybridized method between genetic algorithm and artificial bee colony. Data Sci Data Anal Oppor Chall 279 22. De Castro LN, Timmis J (2002) An artificial immune network for multimodal function optimization. In Proceedings of the 2002 congress on evolutionary computation. CEC’02 (Cat. No. 02TH8600), Vol 1. IEEE, pp 699–704 23. De Castro LN, Von Zuben FJ (2000) The clonal selection algorithm with engineering applications. In Proceedings of GECCO, Vol 2000, pp 36–39 24. Dorigo M, Birattari M, Stutzle T (2006) Ant colony optimization. IEEE Comput Intell Magaz 1(4):28–39 25. 
Gajic L, Cvetnic D, Zivkovic M, Bezdan T, Bacanin N, Milosevic S (2021) Multi-layer perceptron training using hybridized bat algorithm. In Computational vision and bio-inspired computing. Springer, pp 689–705 26. Hastie TJ, Rosset S, Zhu J, Zou H (2006) Multi-class adaboost. Statistics and its. Interface 2:349–360 27. Heidari AA, Mirjalili S, Faris H, Aljarah I, Mafarja M, Chen H (2019) Harris hawks optimization: algorithm and applications. Fut Gener Comp Syst 97:849–872 28. Ileberi E, Sun Y, Wang Z (2021) Performance evaluation of machine learning methods for credit card fraud detection using smote and adaboost. IEEE Access 9:165286–165294


29. Kennedy J, Eberhart R (1995) Particle swarm optimization. In Proceedings of ICNN’95international conference on neural networks, Vol 4. IEEE, pp 1942–1948 30. Luque A, Carrasco A, Martín A, de Las Heras A (2019) The impact of class imbalance in classification performance metrics based on the binary confusion matrix. Patt Recogn 91:216– 231 31. Mirjalili S (2015) Moth-flame optimization algorithm: a novel nature-inspired heuristic paradigm. Knowl Based Syst 89:228–249 32. Mirjalili S, Mirjalili SM, Lewis A (2014) Grey wolf optimizer. Adv Eng Softw 69:46–61 33. Norton M, Uryasev S (2019) Maximization of auc and buffered auc in binary classification. Math Program 174(1):575–612 34. Rajora S, Li DL, Jha C, Bharill N, Patel OP, Joshi S, Puthal D, Prasad M (2018) A comparative study of machine learning techniques for credit card fraud detection based on time variance. In 2018 IEEE symposium series on computational intelligence (SSCI), pp 1958–1963 35. Randhawa K, Chu Kiong L, Seera M, Lim C, Nandi A (2018) Credit card fraud detection using adaboost and majority voting. IEEE Access, pp 14277–14284 36. Riffi J, Mahraz MA, El Yahyaouy A, Tairi H, et al (2020) Credit card fraud detection based on multilayer perceptron and extreme learning machine architectures. In 2020 International conference on intelligent systems and computer vision (ISCV). IEEE, pp 1–5 37. Strumberger I, Tuba E, Bacanin N, Zivkovic M, Beko M, Tuba M (2019) Designing convolutional neural network architecture by the firefly algorithm. In 2019 International young engineers forum (YEF-ECE). IEEE, pp 59–65 38. Talatahari S, Bayzidi H, Saraee M (2021) Social network search for global optimization. IEEE Access 9:92815–92863 39. Talatahari S, Bayzidi H, Saraee M (2021) Social network search for global optimization 40. Tanouz D, Subramanian RR, Eswar D, Reddy GP, Kumar AR, Praneeth CV (2021) Credit card fraud detection using machine learning. In 2021 5th International conference on intelligent computing and control systems (ICICCS). IEEE, pp 967–972 41. Trivedi NK, Simaiya S, Lilhore UK, Sharma SK (2020) An efficient credit card fraud detection model based on machine learning methods. Int J Adv Sci Technol 29(5):3414–3424 42. Yang XS (2009) Firefly algorithms for multimodal optimization. In International symposium on stochastic algorithms. Springer, pp 169–178 43. Yang XS, Gandomi AH (2012) Bat algorithm: a novel approach for global engineering optimization. Eng Comput 29(5):464–483 44. Zivkovic M, Bacanin N, Djordjevic A, Antonijevic M, Strumberger I, Rashid TA, et al (2021) Hybrid genetic algorithm and machine learning method for covid-19 cases prediction. In Proceedings of international conference on sustainable expert systems. Springer, pp 169–184 45. Zivkovic M, Bacanin N, Tuba E, Strumberger I, Bezdan T, Tuba M (2020) Wireless sensor networks life time optimization based on the improved firefly algorithm. In 2020 International wireless communications and mobile computing (IWCMC). IEEE, pp 1176–1181 46. Zivkovic M, Bacanin N, Venkatachalam K, Nayyar A, Djordjevic A, Strumberger I, Al-Turjman F (2021) COVID-19 cases prediction by using hybrid machine learning and beetle antennae search approach. Sustain Cities Soc 66:102669 47. Zivkovic M, Bacanin N, Zivkovic T, Strumberger I, Tuba E, Tuba M (2020) Enhanced grey wolf algorithm for energy efficient wireless sensor networks. In 2020 Zooming innovation in consumer technologies conference (ZINC). IEEE, pp 87–92 48. 
Zivkovic M, Bezdan T, Strumberger I, Bacanin N, Venkatachalam K (2021) Improved Harris Hawks optimization algorithm for workflow scheduling challenge in cloud—edge environment. In Computer networks, big data and IoT. Springer, pp 87–102 49. Zivkovic M, Stoean C, Chhabra A, Budimirovic N, Petrovic A, Bacanin N (2022) Novel improved salp swarm algorithm: an application for feature selection. Sensors 22(5):1711 50. Zivkovic M, Zivkovic T, Venkatachalam K, Bacanin N (2021) Enhanced dragonfly algorithm adapted for wireless sensor network lifetime optimization. In Data intelligence and cognitive informatics. Springer, pp 803–817

Chapter 11

Prediction of Pneumonia Using Deep Convolutional Neural Network (CNN) Jashasmita Pal and Subhalaxmi Das

1 Introduction
Pneumonia is an example of a communicable disease caused by bacteria, viruses, or other microorganisms. It is a respiratory infection that can cause lasting lung damage. When a healthy individual inhales, he or she breathes slowly and deeply [1], and the alveoli, the little air sacs of the lungs, fill with air. Patients with pneumonia experience difficulty breathing and limited oxygen intake because the alveoli in the lungs are clogged. This disease is extremely dangerous for children under the age of five, as well as for those in their declining years [1]. Fortunately, antibiotics and antiviral medicines are often effective in treating pneumonia. Early identification and treatment of pneumonia, on the other hand, are critical in preventing death.
Pneumonia can be diagnosed using a variety of techniques, including chest X-rays, CT scans, chest ultrasounds, and chest MRIs [2, 3]. Chest X-rays are the most popular and well-known clinical approach for identifying pneumonia nowadays. In rare circumstances, pneumonia appears indistinct on chest X-ray pictures and is mistaken for another disease. These discrepancies result in many subjective judgments and variations amongst radiologists when it comes to diagnosing pneumonia. As a result, a computerized system is required to assist radiologists in detecting pneumonia from X-rays. Convolutional neural networks (CNNs) have recently achieved tremendous results in picture categorization and segmentation using deep learning approaches (see Fig. 1). The chest cavity is visible, or dark in color, in the left image of Fig. 1 because the lung chambers are filled with air; as fluid fills the air sacs [4], the radiological
J. Pal (B) · S. Das
Odisha University of Technology and Research, Bhubaneswar, Odisha, India
e-mail: [email protected]
S. Das
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_11

Fig. 1 a Normal chest X-ray. b Pneumonia chest X-ray

image of the chest cavity brightens, indicating pneumonia, as seen on the right of Fig. 1.
Deep learning is currently making significant inroads in the medical industry and can aid a physician in making the best possible decision regarding early detection. In this study, we present an approach for determining whether or not a person has pneumonia. Deep learning methods and convolutional neural networks (CNNs) were employed to extract and classify the features. Finally, the performance of the models was tested using several performance measures. The remainder of the chapter is organized as follows: Sect. 2 describes the literature review. Section 3 describes the methodology, which contains information on deep learning techniques. Section 4 presents the suggested work. Section 5 compares the results of the classification algorithms, and Sect. 6 presents the conclusion.

2 Literature Survey
Recently, many researchers have made efforts to predict pneumonia. Rahman et al. [1] described a deep learning system to diagnose pneumonia, as well as bacterial and viral illness, from X-ray images. The proposed study investigates the performance of four different pre-trained networks (AlexNet, ResNet18, DenseNet, and SqueezeNet) and provides methodological information for future studies. DenseNet201 surpasses the other networks: after training, it can accurately categorize pneumonia from a small number of complicated images, yielding less bias and better generalization.
Sibbaluca et al. [2] employed deep learning with five convolutional neural network models to identify pneumonia through computer vision. The picture datasets were taken from the database of the Radiological Society of North America. The only models that validated the researchers' observations with

an accuracy rate of 95–97% were AlexNet, LeNet, GoogleNet, and VGGNet. Across all of the models, pneumonia was successfully recognized 74% of the time, while normal chest X-rays were correctly detected 76% of the time.
Using a densely connected convolutional neural network (DenseNet-169), Varshni et al. [3] proposed a pneumonia detection method. Upon analyzing many pre-trained CNN models and other classifiers, the authors chose DenseNet-169 to extract features and SVM to classify them, based on the statistical results.
Ayan et al. [4] used both the Xception and VGG16 convolutional neural network models, applying transfer learning and fine-tuning during the training phase. In terms of accuracy, the VGG16 network outperformed the Xception network, with accuracies of 0.87 and 0.82, respectively.
Of the two well-known models used by Pant et al. [5], the EfficientNet-B4-based U-Net provides high precision, whereas the ResNet-34-based U-Net provides high recall; combining these two models yields excellent results.
To improve pneumonia detection accuracy, Tilve et al. [6] explored image preprocessing techniques together with models such as CNN, ResNet, CheXNet, DenseNet, ANN, and KNN, which are critical to converting raw X-ray images into standard formats for analysis and detection.
Rudraraju et al. [7] demonstrated a system that constructs a convolutional neural network without any prior preparation to extract features from a variety of chest X-ray imaging modalities and classify whether or not someone is infected with pneumonia.
Mubarok et al. [8] examined how well residual networks and Mask-RCNNs identify and diagnose pneumonia using two well-known deep convolutional architectures, and contrasted and scrutinized the outcomes. In terms of detecting pneumonia, the residual network outperforms the Mask-RCNN.
The chest X-rays used by Chakraborty et al. [9] enabled them to uncover some cutting-edge pneumonia identification results. They employed a convolutional neural network for detection, which allowed them to analyze the spatial relevance of the information contained within the images in order to correctly identify whether the chest X-rays were indicative of pneumonia.
Li et al. [10] investigated illness features in CXR images and described an attention-guided CNN-based pneumonia diagnosis technique. Using SE-ResNet as a backbone, they built a fully convolutional neural network model for end-to-end object detection. According to the findings, the proposed method outperforms the state-of-the-art object detection model in terms of accuracy and false-positive rate.
Li et al. [11] developed an improved convolutional neural network approach for identifying pneumonia. The model in this work was created by adding a convolutional layer, a pooling layer, and a feature integration layer to the initial conventional

LeNet-5 model, and then extensively abstracting the acquired features. Remarkable results were obtained on both the training and test sets of two public datasets, confirming the robustness of the proposed model.
To classify pneumonia from X-ray pictures, Islam et al. [12] employed pre-trained deep neural networks as feature extractors in conjunction with classical classification approaches. They also retrained the pre-trained networks on chest X-ray images, and chose the two networks with the highest accuracy and sensitivity to use as feature extractors.
Radiologists could use the model presented by Yue et al. [13] to make decisions regarding pneumonia diagnosis from digital chest X-rays. This work optimizes the weighted predictions of models such as ResNet18, Xception, InceptionV3, DenseNet121, and MobileNetV3 by using a weighted classifier. Similarly, Mohammad Farukh Hashmi et al. developed a method for detecting pneumonia using digital chest X-ray images, which they believe would aid radiologists in making better decisions; it also uses a weighted classifier that combines the weighted predictions of cutting-edge deep learning models such as ResNet18, Xception, InceptionV3, DenseNet121, and MobileNetV3 to produce the best possible outcome.
Shah et al. [14] diagnosed pneumonia by properly recognizing chest X-rays. The model's loss is minimized during training, and as a result, accuracy improves with each epoch, yielding distinct outcomes for distinguishing between pneumonia-affected and non-affected people. As a result of the data augmentation and preprocessing steps, the convolutional and deep neural networks are not overfitted, guaranteeing that the outputs are consistent.
Overall, from these papers we observed that in most cases CNNs give better results and better accuracy, and different authors used this technique to implement different algorithms. Therefore, we use the CNN technique in this chapter for the prediction of pneumonia disease.

3 Methodology
3.1 Deep Learning
Deep learning, in general, refers to a machine learning approach that takes an input X and predicts an output Y. Given a big dataset of input and output pairs, a deep learning algorithm will attempt to narrow the gap between its prediction and the true output. Deep learning algorithms search for links between inputs and outputs using a neural network. In a neural network, "nodes" make up the input, hidden, and output layers. Input layers encode knowledge numerically (for example, images with pixel specifications), hidden layers perform the majority of the computation, and output layers provide the predictions; neural networks as a whole are used to execute deep learning [3, 5].

Fig. 2 Diagram of neurons

The biological neuron is the source of motivation for neural networks. A neuron is nothing but a brain cell. In the diagram of the neuron in Fig. 2, dendrites are used to provide inputs to the neuron; with multiple dendrites, many inputs are provided. Inside the cell body, the nucleus performs some function [6]. The output then travels along the axon toward the axon terminals, from which the neuron fires its output toward the next neuron. Note that two neurons are never directly connected to each other: the gap between two neurons is called a synapse. These are the basics of neurons, and the lower part of the diagram mirrors the upper part.
Depending on the category of data, different neural network architectures can be used: feed-forward neural networks, convolutional neural networks (CNN), recurrent neural networks (RNN), modular neural networks, artificial neural networks (ANN), and multilayer perceptrons. In this paper, we go over the convolutional neural network (CNN).

3.2 Convolutional Neural Networks (CNNs)
A CNN finds relevant traits without the need for human intervention. Its three main components are the input images, an in-depth feature extractor, and a classifier. Images that have not been processed (or have been pre-processed) are used as input, and the feature extractor automatically learns the important features [1]. The learned features are fed into a classifier, such as softmax, which sorts them into categories based on the learned qualities. CNNs are particularly popular due to their superior picture classification capabilities. A CNN is a feed-forward neural network widely used in the field of image recognition. A CNN represents the input as an array of multidimensional data, and it works well with a large amount of labeled data [2] (see Fig. 3).

Fig. 3 CNN architecture

array of multidimensional data is used by CNN to represent the input file. It works well with a large amount of labeled data [2]. (See Fig. 3) Figure 3 depicts CNN’s architecture, which is made up of three main sorts of layers: (1) The convolution layer is the convolutional network’s first layer and it is utilized to find features. (2) The max-pooling (subsampling) layer reduces dimensionality and down-samples the image, lowering computational costs. The most common polling approach is max-pooling, which uses the most significant element from the feature map as an input. And (3) a fully linked layer to give the network with categorization capabilities [3].

3.3 Pre-trained Convolutional Neural Networks
Besides a basic CNN, three well-known pre-trained deep learning CNNs—ResNet50, VGG19, and InceptionV3—are employed for pneumonia diagnosis in this paper. The following is a quick description of these pre-trained networks.

3.4 ResNet
ResNet stands for Residual Network, a specific kind of neural network introduced by Kaiming He and colleagues in 2015. Residual neural networks use skip connections, or shortcuts, to jump over some layers. There are two main reasons to add skip connections: countering vanishing gradients and mitigating the degradation problem [4, 7]. ResNet comes in several flavors: ResNet18, ResNet50, and ResNet101. ResNet has been effectively used for transfer learning in biomedical picture classification. We utilized ResNet50 to detect pneumonia in this article.

3.5 InceptionV3
Inception-v3 is a deep convolutional neural network model with 48 layers of pre-trained weights. It is a variant of the network trained on millions of pictures from the ImageNet collection. InceptionV3 requires a 299 × 299-pixel input picture. Convolutions, average pooling, maximum pooling, concatenation, dropouts, and fully connected layers are among the symmetric and asymmetric building blocks of the model [6]. Batch normalization is applied to the activation inputs throughout the model, and softmax is used to compute the loss.

3.6 VGG19
VGG is an acronym for Visual Geometry Group. It is a multi-layered convolutional neural network and the foundation for cutting-edge deep neural network-based object identification models [7, 9, 10]. VGG comes in a variety of forms, including VGG11, VGG16, VGG19, and others. In this study, we use the VGG19 pre-trained model. It is based on the VGG architecture and has 19 weight layers (16 convolution layers and 3 fully connected layers), together with 5 max-pool layers and 1 softmax layer; the "19" denotes the number of weight layers.
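The following hedged sketch shows a common way such a pre-trained network can be reused for the binary pneumonia task with Keras: ImageNet weights are loaded, the convolutional base is frozen, and a new classification head is trained on top. The head size and input shape are assumptions; ResNet50 or InceptionV3 can be substituted for VGG19 in the same pattern.

# Sketch of a transfer-learning setup: a frozen, ImageNet-pre-trained VGG19
# base with a new classification head (hyperparameters are illustrative).
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG19

base = VGG19(weights="imagenet", include_top=False, input_shape=(150, 150, 3))
base.trainable = False                       # freeze the pre-trained features

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),    # new fully connected head
    layers.Dense(1, activation="sigmoid"),   # normal vs. pneumonia
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])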

4 Proposed Work
In this part, we discuss the proposed work of this topic. We design a four-model framework, i.e., CNN, ResNet50, InceptionV3, and VGG19, which were explained in detail in the previous section. In this section we discuss the proposed method, the preprocessing and augmentation techniques, and the performance measures (see Fig. 4).

Fig. 4 Overview of model framework: input phase → data preprocessing and data augmentation → deep CNN pre-trained models (CNN, ResNet50, InceptionV3, VGG19) → performance classification and evaluation → final prediction (Normal / Pneumonia)

Figure 4 depicts an overview of the model structure. The input part of the setup is connected to the second part, which is the preprocessing stage. In this stage, we preprocess the data and resize the pixels, converting the data into a clean dataset in an understandable format. We then use different data augmentation techniques to expand the size of the training dataset by creating modified images from it. The various deep convolutional transfer learning models are then trained; in this work, four different types of deep learning algorithms—CNN, ResNet50, InceptionV3, and VGG19—are used to classify the output. Hence, each output is identified as either normal or pneumonia. In this study, four evaluation metrics—accuracy, recall, precision, and F1-score—were applied to the base CNN models.

4.1 Preprocessing and Augmentation
One of the important steps in this methodology is data preprocessing: the image input is resized for the various algorithms [11], and all images are normalized in line with the pre-trained model. As is well known, CNNs work best with large datasets. As a result, data augmentation techniques are frequently used to construct alternative versions of an actual dataset in order to increase its size, or to build a whole new training dataset [1]. Deep learning algorithms can benefit from data augmentation to improve their accuracy. Grey-scaling, horizontal and vertical flips, random crops, color jitters, translations, rotations, resizing, scaling, and a variety of other augmentation techniques are all available. Using these tactics on our data, we can easily double or quadruple the amount of training data, resulting in a very strong model. We used different augmentation techniques for each algorithm in this paper to obtain new datasets: for CNN, we used (rescale = 1./255, zoom range = 0.3, vertical flip = true, width shift range = 0.1, height shift range = 0.1); for ResNet50, we used horizontal_flip = True, width_shift_range = 0.2, height_shift_range = 0.2, shear_range = 0.2,

Table 1 Preprocessing and augmentation techniques for the different algorithms

Algorithm     Augmentation technique
CNN           vertical_flip = True, rescale = 1/255, zoom_range = 0.3, width_shift_range = 0.1, height_shift_range = 0.1
ResNet50      horizontal_flip = True, width_shift_range = 0.2, height_shift_range = 0.2, shear_range = 0.2, zoom_range = 0.2
InceptionV3   resized to (150, 150, 3)
VGG19         rotation_range = 40, rescale = 1./255, shear_range = 0.2, zoom_range = 0.2, horizontal_flip = True, validation_split = 0.1

zoom_range = 0.2; for InceptionV3 we resize the input to (150, 150, 3); and for VGG19 we used (rotation_range = 40, rescale = 1./255, shear_range = 0.2, zoom_range = 0.2, horizontal_flip = True, validation_split = 0.1); see Table 1.
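As a sketch, the CNN settings from Table 1 map directly onto Keras' ImageDataGenerator; the directory path and batch size below are placeholders, not values taken from the chapter.

# Sketch: the CNN augmentation settings of Table 1 with Keras'
# ImageDataGenerator; the directory path is a placeholder.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_gen = ImageDataGenerator(
    rescale=1. / 255,          # normalize pixel values to [0, 1]
    zoom_range=0.3,
    vertical_flip=True,
    width_shift_range=0.1,
    height_shift_range=0.1)

train_data = train_gen.flow_from_directory(
    "chest_xray/train",        # placeholder path to the training images
    target_size=(150, 150),
    batch_size=32,
    class_mode="binary")       # Normal vs. Pneumonia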

4.2 Performance Metrics
K-fold cross-validation is used to train and test the four CNNs. After evaluating and comparing the networks' performances on the testing datasets, performance metrics such as accuracy, sensitivity, recall, precision, AUC, and F1-score are used. In these metrics, a True Positive (TP) is a model's correct prediction of the positive class [12], a False Positive (FP) is a negative-class sample incorrectly classified as positive [13], a True Negative (TN) is a correct prediction of the negative class, and a False Negative (FN) is a positive-class sample incorrectly classified as negative. The performance metrics for the deep CNNs are computed as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN)      (1)

Recall (Sensitivity) = TP / (TP + FN)      (2)

Specificity = TN / (TN + FP)      (3)

Precision = TP / (TP + FP)      (4)

F1-score = (2 × TP) / (2 × TP + FN + FP)      (5)

In the preceding equations, true positive (TP) refers to the number of pneumonia images correctly classified as pneumonia, and true negative (TN) to the number of normal images correctly classified as normal [7]. "Accuracy" relates to a model's overall correctness, i.e., how many correct predictions it makes. The "precision" and "recall" values show how well the model performs on the positive label: precision is the ratio of the model's correct positive predictions to all of its positive predictions, while recall measures the percentage of ground-truth positives that the model correctly identified. The "F1-score" strikes a balance between precision and recall. Thus, instead of focusing only on the accuracy rate, these assessment metrics are used in medical picture classification to obtain an explicit identification of patients who would otherwise be classified as non-diseased but are nonetheless suffering [8].

5 Results and Discussion
The evaluation findings of the suggested model, as well as the dataset used, are discussed in this section. Table 2 shows the results of the different deep learning algorithms under the different measures, covering the training and testing performance of the different CNNs. It can be seen that InceptionV3 produces the highest accuracy: for normal-versus-pneumonia classification, its accuracy is 96%. In this study, we also calculate the recall, precision, and F1-score; Table 2 presents these performance metrics for the four CNNs.
This suggested work is also compared with the results of recently published work. Rahman et al. [1] described a deep learning system to diagnose pneumonia, as well as bacterial and viral illness, from X-ray images; their study investigates the performance of four different pre-trained networks (AlexNet, ResNet18, DenseNet, and SqueezeNet), with accuracies of 0.94, 0.96, 0.98, and 0.96, respectively. Sibbaluca et al. [4] classified a person as normal or pneumonia and reported an accuracy of 0.84, with sensitivity, precision, and AUC of 0.89, 0.91, and 0.87, respectively.

Table 2 Different performance metrics for the different deep CNNs

Algorithm     Accuracy   Recall   Precision   F1-score
CNN           0.90       0.94     0.90        0.92
ResNet50      0.94       0.91     0.88        0.90
InceptionV3   0.96       0.94     0.88        0.91
VGG19         0.74       0.80     0.77        0.82

Table 3 Overview of dataset

Types       Training set   Test set   Val
Normal      1341           234        8
Pneumonia   3875           390        8
Total       5216           624        16

5.1 Dataset
The dataset consists of chest X-ray images (pneumonia): https://www.kaggle.com/datasets/paultimothymooney/chest-xray-pneumonia. The 1 GB Kaggle chest X-ray pneumonia database, which contains 5856 chest X-ray images, was used in this study. The images are in jpg or png format to fit the model. There are three sections in the data folder—train, test, and validate—and each directory has subdirectories for each image category (Normal, Pneumonia). 1341 chest X-rays were judged to be normal, whereas 3875 were found to show pneumonia. Dataset specifics are covered in Table 3.
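A short sketch for verifying the per-split counts of Table 3 is given below; it assumes the dataset is unpacked to a chest_xray folder with the NORMAL and PNEUMONIA subfolders used in the Kaggle distribution.

# Sketch: count images per split of the Kaggle chest X-ray dataset (Table 3),
# assuming the folder layout chest_xray/{train,test,val}/{NORMAL,PNEUMONIA}.
from pathlib import Path

root = Path("chest_xray")
for split in ("train", "test", "val"):
    for label in ("NORMAL", "PNEUMONIA"):
        n = len(list((root / split / label).glob("*.jpeg")))
        print(f"{split}/{label}: {n} images")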

5.2 Simulation Results
This section discusses the training curves of the different algorithms: CNN, ResNet50, VGG19, and InceptionV3. Figure 5 presents graphs of model accuracy and model loss for the CNN, showing the number of epochs, loss, accuracy, val_loss, and val_accuracy. Here the number of epochs is 20, the loss is 0.21, the accuracy is 0.90, and the val_loss and val_accuracy are 0.43 and 0.68. For ResNet50 (see Fig. 6), the number of epochs is 6; we get 94% accuracy after 6 epochs, with a val_accuracy of 0.93 and a loss of 0.23. The training accuracy seems to increase steadily overall, which might be a sign of overfitting. Figure 7 shows the graph for InceptionV3: the number of epochs is 6, the loss is 0.09, the accuracy is 0.96, the val_loss is 0.70, and the val_accuracy is 0.78; InceptionV3's accuracy is higher than that of the other algorithms. Figure 8 shows the graph for VGG19: the number of epochs is 20, the loss is 0.58, the accuracy is 0.74, the val_loss is 0.58, and the val_accuracy is 0.74. In the model-accuracy graphs, the X-axis represents the number of epochs and the Y-axis reflects the algorithm's accuracy; the model-loss graphs plot loss against epochs.
The Python programming language is used to train, test, and evaluate the algorithms in this research. The machine used to train the models runs a 64-bit Windows 8 operating system with an Intel i3 processor at 2.40 GHz and 4 GB of RAM.

Fig. 5 Model_CNN

Fig. 6 Model_ResNet50

Fig. 7 Model_InceptionV3

Fig. 8 Model_VGG19


6 Conclusion and Future Work
This work demonstrates how deep CNN-based transfer learning may be used to identify pneumonia automatically. Four different CNN algorithms were trained and assessed on chest X-rays to distinguish between normal and pneumonia patients [15]. In order to determine the most appropriate course of treatment and ensure that pneumonia does not pose a life-threatening threat to the patient, early identification of pneumonia is critical [14]. The most common method for diagnosing pneumonia is a chest radiograph. It was found that InceptionV3 achieves a better level of accuracy than the others: its classification accuracy and recall for normal versus pneumonia are high, at 96% and 94%, respectively. Every year, around 1,000,000 children die as a result of this critical disease; many lives can be saved by a speedy recovery combined with efficient therapy based on an accurate identification of the ailment. A timely diagnosis of pneumonia is essential to determining the best course of treatment and preventing life-threatening complications in patients. Deep convolutional neural networks have a distinctive design that produces high accuracy and superior results, and this research can also be used to help in the diagnosis of other health problems. We intend to build a larger database as part of our future work, so that other deep learning methodologies can be used to train and evaluate the system in order to predict outcomes more precisely.

References 1. Rahman T, Muhammad EHC, Khandakar A, Islam KR, Islam KF, Mahbub ZB, Kadir MA, Kashem S (2020) Transfer learning with deep convolutional neural network (CNN) for pneumonia detection using chest X-ray. Appl Sci 10(9) 2. Varshni D, Thakral K, Agarwal L, Nijhawan R, Mittal A (2019) Pneumonia detection using CNN based feature extraction. In 2019 IEEE international conference on electrical, computer and communication technologies (ICECCT), pp 1–7. IEEE 3. Militante SV, Dionisio NV, Sibbaluca BG (2020) Pneumonia detection through adaptive deep learning models of convolutional neural networks. In: 2020 11th IEEE control and system graduate research colloquium (ICSGRC), pp 88–93. IEEE 4. Sibbaluca BG (2020) Pneumonia detection using convolutional neural network. Int J Sci Technol Res 5. Al Mubarok AF, Dominique Jeffrey AM (2019) Pneumonia detection and classification from chest x-ray image using deep learning approach. IEEE 6. Pant A, Jain A, Nayak KC, Gandhi D, Prasad BG (2020) Pneumonia detection: an efficient approach using deep learning. In: 2020 11th international conference on computing, communication and networking technologies (ICCCNT). IEEE, pp 1–6 7. Ayan E, Ünver HM (2019) Diagnosis of pneumonia from chest X-ray images using deep learning. In: 2019 scientific meeting on electrical-electronics & biomedical engineering and computer science (EBBT). IEEE, pp 1–5 8. Sirish Kaushik V (2020) Pneumonia detection using convolutional neural networks (cnns). In: Proceedings of first international conference on computing, communications, and cybersecurity (IC4S 2019). Springer

9. Al Mubarok AF, Faqih A, Dominique JAM, Thias AH (2019) Pneumonia detection with deep convolutional architecture. In: 2019 international conference of artificial intelligence and information technology (ICAIIT). IEEE 10. Li X, Chen F, Hao H, Li M (2020) A pneumonia detection method based on improved convolutional neural network. In: 2020 IEEE 4th information technology, networking, electronic and automation control conference (ITNEC), vol 1. IEEE, pp 488–493 11. Mehta SSH (2020) Pneumonia detection using convolutional neural networks. In: Third international conference on smart systems and inventive technology (ICSSIT 2020) IEEE Xplore Part Number: CFP20P17-ART, IEEE 12. Krishnan S (2018) Prevention of pneumonia using deep learning approach 13. Imran A, Sinha V (2019) Training a CNN to detect pneumonia 14. Chakraborty S, Aich S, Sim JS, Kim H-C (2019) Detection of pneumonia from chest xrays using a convolutional neural network architecture. In: International conference on future information & communication engineering, vol 11, no 1, pp 98–102 15. Bozickovic J, Lazic I, Turukalo TL (2020) Pneumonia detection and classification from X-ray images—a deep learning approach

Chapter 12

State Diagnostics of Egg Development Based on the Neuro-fuzzy Expert System Eugene Fedorov , Tetyana Utkina , Tetiana Neskorodieva , and Anastasiia Neskorodieva

1 Introduction
Artificial incubation of poultry eggs, both industrially and in small households, has significant advantages over the classical method using a mother hen. But sometimes the results of egg incubation can be unsatisfactory due to a significant percentage of losses. Losses, as a rule, include unfertilized eggs identified during candling, eggs with blood rings, embryos frozen in development, physiological abnormalities, etc. If many such losses are detected, it is necessary to diagnose errors in the incubation modes [1].
The most common incubation errors are related to storage, high humidity, too high or too low hatch temperature, incorrect turning, or insufficient ventilation. Therefore, it is first necessary to revise the microclimate parameters: temperature, humidity, the composition of ventilated air, etc. One should also not forget that the main condition for breeding healthy poultry chicks during the incubation process is the quality of the eggs themselves. It is advisable to choose poultry eggs according to their average weight and size. As a rule, the weight of an egg should lie in the following ranges: chicken 55–60 g, duck 80–92 g, turkey 82–85 g, goose 160–180 g, quail 9–11 g; however, deviations of a few grammes are allowed. For incubation, eggs are chosen without growths or thickenings on the shell, of the correct oval shape, and without cracks [1]. On

E. Fedorov (B) · T. Utkina
Cherkasy State Technological University, Shevchenko Blvd., Cherkasy 460, 18006, Ukraine
e-mail: [email protected]
T. Utkina
e-mail: [email protected]
T. Neskorodieva · A. Neskorodieva
Vasyl' Stus Donetsk National University, 600-Richcha Str., 21, Vinnytsia 21021, Ukraine
e-mail: [email protected]; [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_12


On initial candling, there should be no noticeable bloody rings or spots, yolk displacement, air-cell problems, or a uniform glow. Candling of eggs should be done carefully, without unnecessary shaking, shocks, or prolonged cooling, and, as a rule, no more than three times [2]. In addition, no one is immune to the case when, due to external factors (power outages, a programme error, or incorrect operator actions), the hatchery stops working in its normal mode.

Diagnosing and eliminating errors in the egg incubation mode is of paramount importance for breeding healthy poultry, as it allows creating and controlling optimal external conditions in the incubator that correspond to normal embryo development in the egg. The insufficient number of works related to the creation of systems for intelligent diagnostics of the state of development of poultry eggs is a gap in the existing literature and the motivation to correct this shortcoming. Therefore, the development of a neuro-fuzzy expert system for intelligent diagnostics of the state of development of poultry eggs during incubation is an urgent task. It will make it possible to easily and timely diagnose and eliminate errors in the incubation mode of poultry eggs, based on biological signs indicating violations of embryo development caused by adverse external conditions, which will ensure a greater yield of high-quality and healthy poultry offspring.

At present, artificial intelligence methods are used for intelligent diagnostics of the egg development state; the most popular are:
1. Machine learning:
(1) the metric approach (for example, k nearest neighbours) [1];
(2) the probabilistic approach: logistic and multinomial regression [3]; linear discriminant analysis [4, 5]; the naive Bayesian classifier [6];
(3) the logical approach (for example, decision trees) [1];
(4) the connectionist approach: support vector machines [7, 8]; multilayer perceptrons [9, 10]; convolutional neural networks [11, 12];
(5) the taxonomical approach (for example, k-means [13]);
(6) the metaheuristic approach (for example, clonal selection [14, 15] and genetic algorithms [16]).
2. Expert systems: expert systems [17]; fuzzy expert systems [18].

Amongst machine learning methods for intelligent diagnostics of the egg development state, artificial neural networks are the most popular [19, 20]; amongst expert systems, fuzzy expert systems are the most popular [21]. Recently, neural networks have been combined with fuzzy expert systems, and metaheuristics can be used to train the parameters of the membership functions [22, 23]. Thus, it is relevant to develop an intelligent system for diagnosing the state of egg development that eliminates the indicated shortcomings, which is the new contribution of this study.


The purpose of this work is to increase the efficiency of diagnosing egg development states by means of a neuro-fuzzy expert system whose parameters are trained with metaheuristics. To achieve this goal, it is necessary to solve the following problems:
1. Creation of a neuro-fuzzy expert system for egg development states diagnostics.
2. Creation of mathematical models of the neuro-fuzzy expert system for egg development states diagnostics.
3. Choice of criteria for evaluating the efficiency of the mathematical models of the neuro-fuzzy expert system for egg development states diagnostics.
4. Parameter identification of the mathematical model of the neuro-fuzzy expert system for egg development states diagnostics based on the back propagation algorithm in batch mode.
5. Parameter identification of the mathematical model of the neuro-fuzzy expert system for egg development states diagnostics based on the grey wolf optimizer.

2 The Neuro-fuzzy Expert System of Egg Development States Diagnostics Creation

For diagnostics of egg development states, this work further improves the fuzzy expert system, which represents knowledge about egg development states in the form of fuzzy rules understandable to a person, and performs the following stages [24, 25]: formation of linguistic variables; formation of the base of fuzzy rules; fuzzification; aggregation of subconditions; activation of conclusions; aggregation of conclusions; defuzzification. Beforehand, features are extracted from the image by means of conversion to greyscale, segmentation, threshold processing (detection of cracks and of sites of egg without shell), and morphological operations (recovering the whole egg surface by filling the shell defects) [18].

2.1 Linguistic Variables Formation

As crisp input variables were chosen:
• availability of blood on a shell x1;
• existence of a site of egg without shell x2;
• share of superficial cracks x3;
• egg size x4.

As linguistic input variables were chosen:

• blood on a shell $\tilde{x}_1$ with the values $\tilde{a}_{11}$ = present, $\tilde{a}_{12}$ = absent, whose value areas are the fuzzy sets $\tilde{A}_{11}=\{x_1\mid\mu_{\tilde{A}_{11}}(x_1)\}$, $\tilde{A}_{12}=\{x_1\mid\mu_{\tilde{A}_{12}}(x_1)\}$;
• site of egg without shell $\tilde{x}_2$ with the values $\tilde{a}_{21}$ = present, $\tilde{a}_{22}$ = absent, whose value areas are the fuzzy sets $\tilde{A}_{21}=\{x_2\mid\mu_{\tilde{A}_{21}}(x_2)\}$, $\tilde{A}_{22}=\{x_2\mid\mu_{\tilde{A}_{22}}(x_2)\}$;
• share of superficial cracks $\tilde{x}_3$ with the values $\tilde{a}_{31}$ = large, $\tilde{a}_{32}$ = medium, $\tilde{a}_{33}$ = small, $\tilde{a}_{34}$ = zero, whose value areas are the fuzzy sets $\tilde{A}_{31}=\{x_3\mid\mu_{\tilde{A}_{31}}(x_3)\}$, ..., $\tilde{A}_{34}=\{x_3\mid\mu_{\tilde{A}_{34}}(x_3)\}$;
• egg size $\tilde{x}_4$ with the values $\tilde{a}_{41}$ = small, $\tilde{a}_{42}$ = medium, $\tilde{a}_{43}$ = large, whose value areas are the fuzzy sets $\tilde{A}_{41}=\{x_4\mid\mu_{\tilde{A}_{41}}(x_4)\}$, $\tilde{A}_{42}=\{x_4\mid\mu_{\tilde{A}_{42}}(x_4)\}$, $\tilde{A}_{43}=\{x_4\mid\mu_{\tilde{A}_{43}}(x_4)\}$.

As a crisp output variable, the number of the egg state $y$ was chosen. As a linguistic output variable, the egg state $\tilde{y}$ was chosen, with the values $\tilde{\beta}_1$ = invalid, $\tilde{\beta}_2$ = poor, $\tilde{\beta}_3$ = average, $\tilde{\beta}_4$ = good, $\tilde{\beta}_5$ = excellent, whose value areas are the fuzzy sets $\tilde{B}_1=\{y\mid\mu_{\tilde{B}_1}(y)\}$, ..., $\tilde{B}_5=\{y\mid\mu_{\tilde{B}_5}(y)\}$.

2.2 Formation of the Base of Fuzzy Rules

The offered fuzzy rules cover all possible combinations of the input linguistic variable values and the output linguistic variable values corresponding to them:

$R^1$: If $\tilde{x}_1$ is $\tilde{a}_{11}$ and $\tilde{x}_2$ is $\tilde{a}_{21}$ and $\tilde{x}_3$ is $\tilde{a}_{31}$ and $\tilde{x}_4$ is $\tilde{a}_{41}$, then $\tilde{y}$ is $\tilde{\beta}_1$ ($F^1$);
...
$R^{48}$: If $\tilde{x}_1$ is $\tilde{a}_{12}$ and $\tilde{x}_2$ is $\tilde{a}_{22}$ and $\tilde{x}_3$ is $\tilde{a}_{34}$ and $\tilde{x}_4$ is $\tilde{a}_{43}$, then $\tilde{y}$ is $\tilde{\beta}_5$ ($F^{48}$),

where $F^r$ are the coefficients of the fuzzy rules $R^r$. For example, fuzzy rule $R^1$ corresponds to the following knowledge: if blood is present on the shell, and a site of egg without shell is present, and the share of superficial cracks is large, and the egg size is small, then the egg development state is invalid.

2.3 Fuzzification

Let us define the degree of truth of each subcondition of each fuzzy rule by means of the membership function $\mu_{\tilde{A}_{ij}}(x_i)$. The membership function of a subcondition is defined in the form

$$\mu_{\tilde{A}_{ij}}(x_i)=\left(1+\left|\frac{x_i-\gamma_{ij}}{\alpha_{ij}}\right|^{2\beta_{ij}}\right)^{-1},\quad i\in\overline{1,4},\ j\in\overline{1,n_i}, \qquad (1)$$

or in the form

$$\mu_{\tilde{A}_{ij}}(x_i)=\begin{cases}0, & x_i\le\alpha_{ij}\\ \dfrac{x_i-\alpha_{ij}}{\beta_{ij}-\alpha_{ij}}, & \alpha_{ij}\le x_i\le\beta_{ij}\\ 1, & \beta_{ij}\le x_i\le\gamma_{ij}\\ \dfrac{\delta_{ij}-x_i}{\delta_{ij}-\gamma_{ij}}, & \gamma_{ij}\le x_i\le\delta_{ij}\\ 0, & x_i\ge\delta_{ij}\end{cases},\quad i\in\overline{1,4},\ j\in\overline{1,n_i}, \qquad (2)$$

where $\alpha_{ij}$, $\beta_{ij}$, $\gamma_{ij}$, $\delta_{ij}$ are parameters and $n_1=2$, $n_2=2$, $n_3=4$, $n_4=3$.

2.4 Aggregation of Subconditions

Let us define the degree of truth of the condition of each fuzzy rule by means of the membership function $\mu_{\tilde{A}^r}(x)$. The membership function of the conditions is defined in the form

$$\mu_{\tilde{A}^r}(x)=\mu_{\tilde{A}_{1i}}(x_1)\,\mu_{\tilde{A}_{2j}}(x_2)\,\mu_{\tilde{A}_{3k}}(x_3)\,\mu_{\tilde{A}_{4l}}(x_4),\quad r\in\overline{1,48},\ i\in\overline{1,n_1},\ j\in\overline{1,n_2},\ k\in\overline{1,n_3},\ l\in\overline{1,n_4}, \qquad (3)$$

or in the form

$$\mu_{\tilde{A}^r}(x)=\min\{\mu_{\tilde{A}_{1i}}(x_1),\mu_{\tilde{A}_{2j}}(x_2),\mu_{\tilde{A}_{3k}}(x_3),\mu_{\tilde{A}_{4l}}(x_4)\},\quad r\in\overline{1,48},\ i\in\overline{1,n_1},\ j\in\overline{1,n_2},\ k\in\overline{1,n_3},\ l\in\overline{1,n_4}. \qquad (4)$$

2.5 Activation of the Conclusions

Let us define the degree of truth of each conclusion of each fuzzy rule by means of the membership function $\mu_{\tilde{C}^r}(x,z)$. The membership function of the conclusions is defined in the form

$$\mu_{\tilde{C}^r}(x,z)=\mu_{\tilde{A}^r}(x)\,\mu_{\tilde{B}_m}(z)\,F^r,\quad r\in\overline{1,48}, \qquad (5)$$

or in the form

$$\mu_{\tilde{C}^r}(x,z)=\min\{\mu_{\tilde{A}^r}(x),\mu_{\tilde{B}_m}(z)\}\,F^r,\quad r\in\overline{1,48}. \qquad (6)$$

In this work, the membership function $\mu_{\tilde{B}_m}(z)$ and the weight coefficients $F^r$ of the fuzzy rules are defined as

$$\mu_{\tilde{B}_m}(z)=[z=m]=\begin{cases}1, & z=m\\ 0, & z\ne m\end{cases},\quad m\in\overline{1,5}, \qquad (7)$$

$$F^r=1.$$

2.6 Conclusions Aggregation

Let us define the degree of truth of the conclusion by means of the membership function $\mu_{\tilde{C}}(x,z)$. The membership function of the conclusion is defined in the form

$$\mu_{\tilde{C}}(x,z)=1-(1-\mu_{\tilde{C}^1}(x,z))\cdot\ldots\cdot(1-\mu_{\tilde{C}^{48}}(x,z)),\quad z\in\overline{1,5}, \qquad (8)$$

or in the form

$$\mu_{\tilde{C}}(x,z)=\max\{\mu_{\tilde{C}^1}(x,z),\ldots,\mu_{\tilde{C}^{48}}(x,z)\},\quad z\in\overline{1,5}. \qquad (9)$$

2.7 Defuzzification

For obtaining the number of the egg development state, the method of the maximum of the membership function is used:

$$z^*=\arg\max_z \mu_{\tilde{C}}(x,z),\quad z\in\overline{1,5}. \qquad (10)$$

3 Creation of Mathematical Models of a Neuro-fuzzy Expert System of Egg Development States Diagnostics

For the neuro-fuzzy expert system of egg development states diagnostics, this work further improves the mathematical models of artificial neural networks through the use of pi-sigma, inverted-pi, and min–max neurons, which makes it possible to model the stages of the fuzzy inference and defines the model structure. The model structure of the neuro-fuzzy expert system is presented as a graph in Fig. 1.

Fig. 1 Model structure of a neuro-fuzzy expert system in the graph form (layer 0: inputs x1–x4; layer 1: fuzzification; layer 2: aggregation; layer 3: activation; layer 4: outputs y1–y5)

The input (zero) layer contains four neurons (the number of input variables). The first hidden layer realises fuzzification and contains eleven neurons (the number of values of the linguistic input variables). The second hidden layer realises aggregation of the subconditions and contains 48 neurons (the number of fuzzy rules). The third hidden layer realises activation of the conclusions and also contains 48 neurons (the number of fuzzy rules). The output (fourth) layer realises aggregation of the conclusions and contains five neurons (the number of values of the linguistic output variable).

The functioning of the neuro-fuzzy expert system is as follows (Fig. 1). In the first layer, the membership function of the subconditions is calculated on the basis of:

• the trapezoidal function

$$\mu_{\tilde{A}_{ij}}(x_i)=\begin{cases}0, & x_i\le\alpha_{ij}\\ \dfrac{x_i-\alpha_{ij}}{\beta_{ij}-\alpha_{ij}}, & \alpha_{ij}\le x_i\le\beta_{ij}\\ 1, & \beta_{ij}\le x_i\le\gamma_{ij}\\ \dfrac{\delta_{ij}-x_i}{\delta_{ij}-\gamma_{ij}}, & \gamma_{ij}\le x_i\le\delta_{ij}\\ 0, & x_i\ge\delta_{ij}\end{cases},\quad i\in\overline{1,4},\ j\in\overline{1,n_i}; \qquad (11)$$

• the bell-shaped function

$$\mu_{\tilde{A}_{ij}}(x_i)=\left(1+\left|\frac{x_i-\gamma_{ij}}{\alpha_{ij}}\right|^{2\beta_{ij}}\right)^{-1},\quad i\in\overline{1,4},\ j\in\overline{1,n_i}, \qquad (12)$$

where $n_1=2$, $n_2=2$, $n_3=4$, $n_4=3$.

In the second layer, the membership function of the conditions is calculated on the basis of:

• the product of the sums

$$\mu_{\tilde{A}^r}(x)=\prod_{i=1}^{4}\sum_{j=1}^{n_i}w_{ij}^r\,\mu_{\tilde{A}_{ij}}(x_i),\quad r\in\overline{1,48},\ w_{ij}^r\in\{0,1\}; \qquad (13)$$

• the minimum of maxima

$$\mu_{\tilde{A}^r}(x)=\min_i\max_j\left\{w_{ij}^r\,\mu_{\tilde{A}_{ij}}(x_i)\right\},\quad i\in\overline{1,4},\ j\in\overline{1,n_i},\ r\in\overline{1,48},\ w_{ij}^r\in\{0,1\}. \qquad (14)$$

In the third layer, the membership function of the conclusions is calculated on the basis of:

• the product

$$\mu_{\tilde{C}^r}(x,z)=w^r\,\mu_{\tilde{A}^r}(x)\,\mu_{\tilde{B}^r}(z),\quad z\in\overline{1,5},\ r\in\overline{1,48}; \qquad (15)$$

• the minimum

$$\mu_{\tilde{C}^r}(x,z)=w^r\min\{\mu_{\tilde{A}^r}(x),\mu_{\tilde{B}^r}(z)\},\quad z\in\overline{1,5},\ r\in\overline{1,48},\ w^r=F^r. \qquad (16)$$

In the fourth layer, the membership function of the conclusion is calculated on the basis of:

• the inverted product

$$y_z=\mu_{\tilde{C}}(x,z)=1-\prod_{r=1}^{48}w_{rz}\,(1-\mu_{\tilde{C}^r}(x,z)),\quad z\in\overline{1,5},\ w_{rz}\in\{0,1\}; \qquad (17)$$

• the maximum

$$y_z=\mu_{\tilde{C}}(x,z)=\max_r\left\{w_{rz}\,\mu_{\tilde{C}^r}(x,z)\right\},\quad z\in\overline{1,5},\ r\in\overline{1,48},\ w_{rz}\in\{0,1\}. \qquad (18)$$

Thus, the mathematical model of the neuro-fuzzy expert system based on the bell-shaped function, the product of the sums, the product, and the inverted product is presented in the form

$$y_z=\mu_{\tilde{C}}(x,z)=1-\prod_{r=1}^{48}w_{rz}\left(1-w^r\left(\prod_{i=1}^{4}\sum_{j=1}^{n_i}w_{ij}^r\,\mu_{\tilde{A}_{ij}}(x_i)\right)\mu_{\tilde{B}^r}(z)\right),\quad z\in\overline{1,5}. \qquad (19)$$

Likewise, the mathematical model of the neuro-fuzzy expert system based on the trapezoidal function, the minimum of maxima, the minimum, and the maximum is presented in the form

$$y_z=\mu_{\tilde{C}}(x,z)=\max_{r\in\overline{1,48}}\left\{w_{rz}\,w^r\min\left\{\min_{i\in\overline{1,4}}\max_{j\in\overline{1,n_i}}\left\{w_{ij}^r\,\mu_{\tilde{A}_{ij}}(x_i)\right\},\,\mu_{\tilde{B}^r}(z)\right\}\right\},\quad z\in\overline{1,5}. \qquad (20)$$

For making a decision on the egg development state, for models (19)–(20) the method of the maximum of the membership function is used:

$$z^*=\arg\max_z y_z=\arg\max_z \mu_{\tilde{C}}(x,z),\quad z\in\overline{1,5}. \qquad (21)$$
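To make model (19) and decision rule (21) concrete, the following Python sketch (Python being the environment used for the numerical research in Sect. 5) implements one forward pass. The containers mf_params, rule_index, and rule_label are hypothetical encodings of the membership parameters and the 48-rule base, and the binary routing weights w_rz of Eq. (17) are realised by letting each rule contribute only to its consequent class; this is a minimal illustration, not the authors' exact implementation.

```python
import itertools
import numpy as np

def bell(x, a, b, c):
    # Bell-shaped membership function, Eq. (12): (1 + |(x - c)/a|^(2b))^(-1)
    return 1.0 / (1.0 + np.abs((x - c) / a) ** (2 * b))

def forward(x, mf_params, rule_index, rule_label, n_classes=5):
    # Layer 1 (fuzzification): membership degrees mu[i][j] for every value j of input i
    mu = [bell(x[i], p[:, 0], p[:, 1], p[:, 2]) for i, p in enumerate(mf_params)]
    y = np.zeros(n_classes)
    for z in range(n_classes):
        prod = 1.0
        for r in range(len(rule_label)):
            if rule_label[r] != z:        # routing weight w_rz = 0: rule feeds another class
                continue
            # Layer 2 (aggregation): product of the selected memberships, cf. Eq. (13)
            cond = np.prod([mu[i][rule_index[r, i]] for i in range(4)])
            # Layers 3-4 (activation with F^r = 1 and inverted product), cf. Eqs. (15), (17)
            prod *= 1.0 - cond
        y[z] = 1.0 - prod
    return y, int(np.argmax(y))           # decision by the maximum rule, Eq. (21)

# Toy usage: random membership parameters and the full 2*2*4*3 = 48-rule base
rng = np.random.default_rng(0)
n = [2, 2, 4, 3]
mf_params = [np.abs(rng.standard_normal((ni, 3))) + 0.1 for ni in n]
rule_index = np.array(list(itertools.product(*(range(ni) for ni in n))))
rule_label = rng.integers(0, 5, len(rule_index))   # stand-in consequent classes
print(forward(np.array([0.1, 0.0, 0.3, 55.0]), mf_params, rule_index, rule_label))
```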

4 Choice of Criteria for Efficiency Evaluation of Mathematical Models of a Neuro-fuzzy Expert System of Egg Development States Diagnostics

In this work, for the assessment of the parametric identification of the mathematical models (19), (20) of the neuro-fuzzy expert system of egg development states diagnostics, the following criteria are chosen:

• the accuracy criterion, which means the choice of such parameter values $\theta=(\alpha_{11},\beta_{11},\gamma_{11},\ldots,\alpha_{43},\beta_{43},\gamma_{43})$ or $\theta=(\alpha_{11},\beta_{11},\gamma_{11},\delta_{11},\ldots,\alpha_{43},\beta_{43},\gamma_{43},\delta_{43})$ that deliver a minimum of the mean square error (the difference between the model output and the test output)

$$F=\frac{1}{3P}\sum_{p=1}^{P}\sum_{z=1}^{5}(y_{pz}-d_{pz})^2\to\min_{\theta}, \qquad (22)$$

where $d_{pz}$ is the test output, $d_{pz}\in\{0,1\}$, $y_{pz}$ is the model output, and $P$ is the number of test realisations;

• the reliability criterion, which means the choice of such parameter values $\theta$ (as above) that deliver a minimum of the probability of a wrong decision (a mismatch between the model output and the test output)

$$F=\frac{1}{P}\sum_{p=1}^{P}\left[\arg\max_{z\in\overline{1,5}}y_{pz}\ne\arg\max_{z\in\overline{1,5}}d_{pz}\right]\to\min_{\theta}, \qquad (23)$$

where the Iverson bracket $[\cdot]$ equals 1 if its condition holds and 0 otherwise;

• the speed criterion, which means the choice of such parameter values $\theta$ that deliver a minimum of the computational complexity

$$F=T\to\min_{\theta}. \qquad (24)$$

A code sketch of the first two criteria is given after this list.
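As a minimal sketch, assuming y and d are P × 5 arrays of model outputs and one-hot test outputs, criteria (22) and (23) can be computed as follows; the 1/(3P) normalisation is kept exactly as printed in Eq. (22):

```python
import numpy as np

def accuracy_criterion(y, d):
    # Eq. (22): mean square error between model outputs and test outputs,
    # normalised by 3P as in the paper; y and d have shape (P, 5)
    return np.sum((y - d) ** 2) / (3 * y.shape[0])

def reliability_criterion(y, d):
    # Eq. (23): empirical probability of a wrong decision, i.e. the fraction
    # of realisations whose argmax class differs from the test argmax
    return np.mean(np.argmax(y, axis=1) != np.argmax(d, axis=1))
```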

5 Numerical Research

The numerical research of the offered mathematical models of the neuro-fuzzy expert system and of a usual multilayer perceptron was conducted in a Python package. The computational complexities, mean square errors (MSE), and probabilities of wrong decisions for the diagnostics of the egg development state were obtained on the Chicken benchmark [26], which contains 380 RGB images of size 1080 × 800; 80% of the images were randomly selected for the training sample and 20% for the test sample. The compared models are an artificial neural network of the multilayer perceptron (MLP) type with back propagation (BP) and with the grey wolf optimizer (GWO), and the offered models (19) and (20) with BP and GWO, respectively. The MLP had two hidden layers (each consisting of 11 neurons, as does the input layer). It was experimentally established that the parameter ε = 0.05. The results are presented in Table 1.

Table 1 Quality characteristics of the diagnostics of the state of development of the egg

Model and method of parameters identification | MSE | Probability of the wrong decision | Computing complexity
Usual MLP with BP in the consecutive mode | 0.49 | 0.19 | T = PN
Usual MLP with GWO without parallelism | 0.38 | 0.14 | T = PNI
Author's model (19) with BP in batch mode with bell-shaped membership function | 0.10 | 0.04 | T = N
Author's model (20) with GWO with parallelism with trapezoidal membership function | 0.05 | 0.02 | T = N


According to Table 1, the best results are yielded by model (20) with parameter identification based on GWO and with the trapezoidal membership function. Based on the experiments, the following conclusion can be drawn: the parameter identification procedure based on the grey wolf optimizer is more effective than training based on back propagation, owing to the automatic choice of the model structure, the reduced probability of getting stuck in a local extremum, and the use of parallel information processing.
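For illustration, a minimal grey wolf optimizer in its canonical form is sketched below; the hyperparameters and the toy fitness are illustrative, not the exact configuration used in the experiments. In practice, `fitness` would wrap criterion (22) as a function of the membership parameters θ.

```python
import numpy as np

def gwo(fitness, dim, n_wolves=20, iters=100, lb=-1.0, ub=1.0, seed=0):
    # Canonical grey wolf optimizer: the three best wolves (alpha, beta, delta)
    # pull the pack toward promising regions while `a` decays linearly from 2 to 0.
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, (n_wolves, dim))      # candidate parameter vectors theta
    for t in range(iters):
        f = np.array([fitness(x) for x in X])
        leaders = X[np.argsort(f)[:3]]            # alpha, beta, delta
        a = 2.0 - 2.0 * t / iters
        X_new = np.zeros_like(X)
        for leader in leaders:
            r1 = rng.random((n_wolves, dim))
            r2 = rng.random((n_wolves, dim))
            A = 2.0 * a * r1 - a
            C = 2.0 * r2
            D = np.abs(C * leader - X)            # distance to this leader
            X_new += leader - A * D               # position suggested by this leader
        X = np.clip(X_new / 3.0, lb, ub)          # average of the three suggestions
    f = np.array([fitness(x) for x in X])
    return X[np.argmin(f)], float(f.min())

theta, best = gwo(lambda th: float(np.sum(th ** 2)), dim=5)   # toy quadratic fitness
```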

6 Conclusions

To solve the problem of increasing the efficiency of diagnostics of egg development states, the corresponding artificial intelligence methods were investigated. This research showed that, today, the most effective approach is a neuro-fuzzy expert system in combination with metaheuristics. The novelty of the research is that the offered neuro-fuzzy expert system represents knowledge about egg development states in the form of fuzzy rules understandable to a person, and reduces the computational complexity, the probability of a wrong decision, and the mean square error owing to the automatic choice of the model structure, the reduced probability of getting stuck in a local extremum, and the use of parallel information processing for the grey wolf optimizer and back propagation in batch mode. As a result of the numerical research, it was established that the offered neuro-fuzzy expert system provides a probability of wrong decisions on egg development states of 0.02 and a mean square error of 0.05. A further prospect of the research is the use of the offered neuro-fuzzy expert system in various intelligent decision support systems.

References
1. Kertész I, Zsom-Muha V, András R, Horváth F, Németh C, Felföldi J (2021) Development of a novel acoustic spectroscopy method for detection of eggshell cracks. Molecules 26:1–10. https://doi.org/10.3390/molecules26154693
2. Zhihui Z, Ting L, Dejun X, Qiao-hua W, Meihu M (2015) Nondestructive detection of infertile hatching eggs based on spectral and imaging information. Int J Agric Biol Eng 8(4):69–76. https://doi.org/10.25165/ijabe.v8i4.1672
3. Lai C-C, Li C-H, Huang K-J, Cheng C-W (2021) Duck eggshell crack detection by nondestructive sonic measurement and analysis. Sensors 21:1–11
4. Sun L, Feng S, Chen C, Liu X, Cai J (2020) Identification of eggshell crack for hen egg and duck egg using correlation analysis based on acoustic resonance method. J Food Process Eng 43(8):1–9. https://doi.org/10.1111/jfpe.13430
5. Teimouri N, Omid M, Mollazade K, Mousazadeh H, Alimardani R, Karstoft H (2018) On-line separation and sorting of chicken portions using a robust vision-based intelligent modeling approach. Biosys Eng 167:8–20


6. Nikolova M, Zlatev ZD (2019) Analysis of color and spectral characteristics of hen egg yolks from different manufacturers. Appl Res Technics Technol Educ 7(2):103–122. https://doi.org/10.15547/artte.2019.02.005
7. Zhu Z, Ma M (2015) The identification of white fertile eggs prior to incubation based on machine vision and least square support vector machine. Int J Animal Breed Genetics 4(4):1–6
8. Bhuvaneshwari M, Palanivelu LM (2015) Improvement in detection of chicken egg fertility using image processing techniques. Int J Eng Technol Sci 2(4):65–67
9. Hamdany AHS, Al-Nima RRO, Albak LH (2021) Translating cuneiform symbols using artificial neural network. TELKOMNIKA Telecommun Comput Electron Control 19(2):438–443. https://doi.org/10.12928/telkomnika.v19i2.16134
10. Saifullah S, Suryotomo AP (2021) Chicken egg fertility identification using FOS and BP-neural networks on image processing. J Rekayasa Sistem dan Teknologi Informasi 5(5):919–926. https://doi.org/10.29207/resti.v5i5.3431
11. Geng L, Hu Y, Xiao Z, Xi J (2019) Fertility detection of hatching eggs based on a convolutional neural network. Appl Sci 9(7):1–16. https://doi.org/10.3390/app9071408
12. Fedorov E, Lukashenko V, Patrushev V, Lukashenko A, Rudakov K, Mitsenko S (2018) The method of intelligent image processing based on a three-channel purely convolutional neural network. CEUR Workshop Proc 2255:336–351
13. Saifullah S (2021) K-means segmentation based on Lab color space for embryo egg detection, pp 1–11. arXiv:2103.02288. https://doi.org/10.48550/arXiv.2103.02288
14. Grygor OO, Fedorov EE, Utkina TY, Lukashenko AG, Rudakov KS, Harder DA, Lukashenko VM (2019) Optimization method based on the synthesis of clonal selection and annealing simulation algorithms. Radio Electron Comput Sci Control 2:90–99. https://doi.org/10.15588/1607-3274-2019-2-10
15. Fedorov E, Lukashenko V, Utkina T, Lukashenko A, Rudakov K (2019) Method for parametric identification of Gaussian mixture model based on clonal selection algorithm. CEUR Workshop Proc 2353:41–55
16. Loshchilov I (2013) CMA-ES with restarts for solving CEC-2013 benchmark problems. In: 2013 IEEE congress on evolutionary computation proceedings, pp 369–376
17. Patel VC, McClendon RW, Goodrum JW (1998) Development and evaluation of an expert system for egg sorting. Comput Electron Agric 20(2):97–116
18. Omid M, Soltani M, Dehrouyeh MH, Mohtasebi SS, Ahmadi H (2013) An expert egg grading system based on machine vision and artificial intelligence techniques. J Food Eng 118(1):70–77. https://doi.org/10.1016/j.jfoodeng.2013.03.019
19. Haykin S (2009) Neural networks and learning machines, 3rd edn. Pearson Education, Upper Saddle River, New Jersey
20. Du K-L, Swamy MNS (2014) Neural networks and statistical learning. Springer, London. https://doi.org/10.1007/978-1-4471-5571-3
21. Rancapan JGC, Arboleda ER, Dioses JL, Dellosa RM (2019) Egg fertility detection using image processing and fuzzy logic. Int J Sci Technol Res 8(10):3228–3230
22. Yang X-S (2018) Nature-inspired algorithms and applied optimization. Springer, Cham
23. Nakib A, Talbi El-G (2017) Metaheuristics for medicine and biology. Springer, Berlin
24. Fedorov E, Nechyporenko O (2021) Dynamic stock buffer management method based on linguistic constructions. CEUR Workshop Proc 2870:1742–1753
25. Tsiutsiura M, Tsiutsiura S, Yerukaiev A, Terentiev O, Kyivska K, Kuleba M (2020) Protection of information in assessing the factors of influence. In: 2020 IEEE 2nd international conference on advanced trends in information theory proc, pp 285–289
26. Fedorov E Chicken eggs image models. https://github.com/fedorovee75/ArticleChicken/raw/main/chicken.zip

Chapter 13

Analysis of Delay in 16 × 16 Signed Binary Multiplier
Niharika Behera, Manoranjan Pradhan, and Pranaba K. Mishro

1 Introduction

Multiplication is a basic and vital operation which helps in implementing algebraic arithmetic computations. It is equally important in both unsigned and signed operations, such as multiply-and-accumulate, fast Fourier transform, digital signal processors, microprocessors, and filtering applications. The processing speed of these processors depends on the type of multipliers used; with conventional approaches, the operational speed falls short of what is needed.

Vedic mathematics was developed from the primeval Indian Vedas in the early twentieth century. It is a type of ancient mathematics rediscovered by the Indian mathematician Jagadguru Sri Bharati Krishna Tirthaji. According to him, Vedic mathematics consists of 16 sutras and 13 sub-sutras, which help in computing calculus, geometry, conics, algebra, and arithmetic [1]. Normal mathematical problems can be easily solved and equally improved with the utilization of Vedic mathematics. It is remarkable not only mathematically but also analytically, and its analytical merit cannot be discarded. Due to these exceptional features, it has become a flagship in the exploration of mathematical models. It is a very riveting area and offers a number of effective methods [2], which can be employed in several fields of engineering, such as computational and analytical modelling and digital signal processing.

The sixteen Vedic algorithms (sutras) can be employed for solving the difficulties in the design of binary multipliers. Two sutras of Vedic mathematics, the Nikhilam and Urdhva Tiryagbhyam (UT) sutras, are commonly used in multiplication operations. Similarly, the Yavadunam sutra is used in square and cube operations. With the use of these sutras, the mathematical operation can be analyzed in a short period of time. This can further reduce the chip area on the FPGA board, and the processing delay can also be improved.

The details about signed and unsigned numbers are discussed in [3], where the author established that the performance of multipliers is efficient when using these signed and unsigned numbers. Performance evaluation of the squaring operation by a Vedic mathematics multiplier is discussed in [4]. For the squaring operation, the authors suggested a design architecture which can be implemented in a short time using the Yavadunam sutra in VHDL. The performance is compared with the conventional Booth's algorithm, considering parameters such as time delay and area occupied on the Xilinx Virtex; however, the delay performance is not satisfactory. In [5], the authors extended the decimal algorithm to the binary number system. Most of the operations have been done using only unsigned numbers and decimal numbers [6]; however, there is scope for improving speed and delay and reducing area. Booth multipliers produce fewer partial products by encoding groups of multiplier bits, compared with array multipliers [7]. Time delay and area performance are compared with conventional multipliers in [8]; however, the long carry propagation that occurs for larger operand sizes in array multipliers is the major issue. A generalized architecture for the cube operation based on the Yavadunam sutra of Vedic mathematics is discussed in [9]. The sutra converts the cube of a higher-magnitude number into a lower-magnitude number and an insertion operation. Using Xilinx ISE 14.5 software, the cubic architecture is synthesized and simulated on various FPGA devices for comparison purposes. Many research works have been reported using these sixteen sutras, covering addition, multiplication, division, squares, and cubes [9–11]. In [12], the authors reported a fast 16 × 16-bit Vedic multiplier utilizing the UT sutra, obtaining less delay than the Wallace tree, array, and Booth multipliers. Similarly, a multiplier realised in an application-specific integrated circuit design utilizing a Vedic sutra is discussed in [13]. FPGA implementation of a complex multiplier based on a signed Vedic multiplier is reported in [14, 15]; it multiplies signed numbers in 2's complement form and produces the result in 2's complement form [16–18].

All the early designs using Vedic sutras focused on unsigned operations. This motivated us to design a signed multiplier using the Vedic UT sutra. The suggested multiplier processes more than one bit of the multiplier and multiplicand in every cycle. The technique is demonstrated on the multiplication of decimal as well as binary numbers. Multiplication using the UT sutra is easier than with conventional approaches: the product of large signed or unsigned numbers can be found in one step using the suggested design. The architecture may also be useful for future exploration of signed-bit multiplications.

The rest of the paper is organized as follows: Sect. 2 explains the standard UT sutra. In Sect. 3, the proposed design is elaborated with a block diagram. The results and related discussion are presented in Sect. 4. Finally, the paper is concluded in Sect. 5 with the future scope.

N. Behera (B) · M. Pradhan · P. K. Mishro
Veer Surendra Sai University of Technology, Burla, Odisha 768018, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_13


2 Urdhva Tiryagbhyam Sutra

In this section, a generalized UT sutra is presented. The sutra is found to be suitable for all cases of multiplication. The explicit structure of this sutra is the "vertically and crosswise" operation. It is applied for multiplication in both signed and unsigned number systems (Fig. 1). For two 4-digit operands m3 m2 m1 m0 and n3 n2 n1 n0, the product digits are formed as follows (a runnable sketch of this digit-wise scheme is given after the steps):

Step 1: y0 = m0 × n0
Step 2: y1 = m1 × n0 + m0 × n1
Step 3: y2 = m2 × n0 + m0 × n2 + m1 × n1
Step 4: y3 = m3 × n0 + m0 × n3 + m2 × n1 + m1 × n2
Step 5: y4 = m3 × n1 + m1 × n3 + m2 × n2
Step 6: y5 = m3 × n2 + m2 × n3
Step 7: y6 = m3 × n3
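The sketch below is a hypothetical Python helper, not part of the reported hardware design; it computes the column sums y0, y1, ... for operands of any length (digits given least significant first) and then propagates the carries:

```python
def urdhva_multiply(m, n, base=10):
    """Urdhva Tiryagbhyam ("vertically and crosswise") multiplication of two
    digit lists, least significant digit first; generalises Steps 1-7 above."""
    cols = [0] * (len(m) + len(n) - 1)
    for i, mi in enumerate(m):
        for j, nj in enumerate(n):
            cols[i + j] += mi * nj            # vertical/crosswise partial products
    digits, carry = [], 0
    for c in cols:                            # propagate carries column by column
        carry, digit = divmod(c + carry, base)
        digits.append(digit)
    while carry:
        carry, digit = divmod(carry, base)
        digits.append(digit)
    return digits                             # least significant digit first

# 12 x 13 with digits LSD-first: [2, 1] x [3, 1] -> [6, 5, 1], i.e. 156
assert urdhva_multiply([2, 1], [3, 1]) == [6, 5, 1]
```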

Example 1: Finding the signed multiplication of the two decimal numbers 12 and (−13) using the UT sutra, as shown in Fig. 2.
Step 1: The leftmost digit 1 of 12 is multiplied vertically by the leftmost digit (−1) of the multiplier −13, giving the product (−1), which is set down as the leftmost part of the answer.
Step 2: The digit 1 of 12 is multiplied crosswise with (−3) of −13, and (−1) of −13 with 2 of 12; adding the two gives (−5), which is set down as the middle part of the answer.
Step 3: The digit 2 of 12 and (−3) of −13 are multiplied vertically, giving (−6) as their product, which is put down as the last (rightmost) part of the answer.
Step 4: Finally, the left, middle, and right parts are concatenated with their place values (−1 × 100 − 5 × 10 − 6) to get the final signed product. Thus, 12 × (−13) = −156.

Fig. 1 Generalized procedure of the UT sutra


Fig. 2 Multiplication of decimal numbers using UT sutra

Fig. 3 Multiplication of binary numbers using UT sutra

Example 2: Finding the signed binary multiplication of (−1) (multiplicand) and (−3) (multiplier) using the UT sutra, as shown in Fig. 3.
Step 1: Find the 2's complement of the multiplicand, that is, −1 = 1111.
Step 2: Find the 2's complement of the multiplier, that is, −3 = 1101.
Step 3: Vertical multiplication of the least significant bits (LSB) of the multiplicand and multiplier (1 × 1 = 1) gives the right part result (RPR).
Step 4: Cross-multiplication is performed (1 × 1 + 1 × 0 = 1); 1 is stored as the 1st bit of the middle part of the result (MPR).
Step 5: Cross-multiplication is performed (1 × 1 + 1 × 1 + 1 × 0 = 10); the LSB 0 is stored as the 2nd bit of the MPR, and the MSB 1 as the 1st bit of the carry c1.
Step 6: Cross-multiplication is performed (1 × 1 + 1 × 1 + 1 × 1 + 1 × 0 = 11); the MSB 1 becomes the 2nd bit of carry c1, the LSB 1 is added with the 1st bit of carry c1, and 10 is formed; 0 is stored as the 3rd bit of the MPR, and 1 is taken as the 1st bit of the carry c2.


Step 7: Cross-multiplication is performed (1 × 0 + 1 × 1 + 1 × 1 = 10); the MSB 1 becomes the 3rd bit of the carry c1, the LSB 0 is added with the 2nd bit of c1 and the 1st bit of c2, and 11 is obtained; the LSB 1 is stored as the 4th bit of the MPR, and the MSB 1 is taken as the 2nd bit of c2.
Step 8: Cross-multiplication is performed (1 × 1 + 1 × 1 = 10); the MSB 1 becomes the 4th bit of the carry c1, the LSB 0 is added with the 3rd bit of c1 and the 2nd bit of c2, and 11 is obtained; the LSB 1 is stored as the 5th bit of the MPR, and the MSB 1 is taken as the 3rd bit of c2.
Step 9: Lastly, vertical multiplication is performed on the MSBs of the multiplicand and multiplier (1 × 1 = 1), which is then added with the 4th bit of c1 and the 3rd bit of c2, and 11 is obtained; the LSB 1 is stored as the 6th bit of the MPR, and the MSB 1 is stored as the left part result (LPR). Concatenation of the LPR, MPR, and RPR gives the final result (a small Python check of this flow is sketched below).
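The bit-level flow of Example 2 can be cross-checked with the following Python sketch; sign extension to the product width stands in for the explicit carry bookkeeping of Steps 3–9, and the function names are illustrative:

```python
def to_twos(value, bits):
    # Two's complement word of a signed integer on `bits` bits
    return value & ((1 << bits) - 1)

def from_twos(word, bits):
    # Interpret a `bits`-wide word as a signed two's complement integer
    return word - (1 << bits) if word & (1 << (bits - 1)) else word

def signed_multiply(a, b, bits=16):
    # Sign-extend both operands to the 2*bits product width, multiply the
    # resulting bit patterns as unsigned integers (this stands in for the UT
    # bit-level multiply), and keep the low 2*bits bits as the signed product.
    wa, wb = to_twos(a, 2 * bits), to_twos(b, 2 * bits)
    return from_twos((wa * wb) & ((1 << 2 * bits) - 1), 2 * bits)

assert signed_multiply(-1, -3, bits=4) == 3     # Example 2: 1111 x 1101
assert signed_multiply(15, -2) == -30           # matches the waveforms in Figs. 7 and 8
```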

3 Proposed Design

The block diagram of the signed multiplier using the conventional method is shown in Fig. 4. The two inputs m and n correspond to the multiplier and the multiplicand, each 16 bits wide.

Fig. 4 Proposed 16 × 16 signed multiplier architecture using conventional method


The complement of the multiplicand or multiplier is obtained by taking the 2's complement of the 16-bit binary number. The 2's complement part thus generates the complemented 16-bit multiplier operand (−m) and the complemented 16-bit multiplicand operand (−n), which form the two inputs of the signed multiplier module. After signed multiplication, the 32-bit product is stored as the final product.

The proposed 16 × 16 signed Vedic multiplier (VM) using the Urdhva Tiryagbhyam sutra is shown in Fig. 5. Four 8 × 8 signed Vedic multiplier modules, one 16-bit ripple carry adder (RCA), and two 17-bit binary adder stages are required for the proposed 16 × 16 signed Vedic multiplier. The 16-bit RCA and the 17-bit binary adder modules of the proposed architecture are used to assemble the final 32-bit signed product (s31–s16, s15–s8, and s7–s0). The bits s7–s0 are the eight least significant bits of the 16-bit result of the rightmost 8 × 8 signed multiplier module. The 16-bit RCA adds three 16-bit input operands: the concatenation of "11111111" with the eight most significant output bits of the rightmost 8 × 8 signed VM module, and the 16-bit outputs of the second and third 8 × 8 signed VM modules. The 16-bit RCA produces two 16-bit output operands, a sum part and a carry part, which are fed into the first 17-bit binary adder to produce a 17-bit sum. The middle part (s15–s8) corresponds to the eight least significant bits of this 17-bit sum. The 16-bit output of the leftmost 8 × 8 signed VM module and the concatenation of "1111111" with the nine most significant bits of the 17-bit sum are fed into the second 17-bit binary adder; the bits s31–s16 correspond to its 16-bit sum. The raw result is 33 bits wide, from which the 32-bit product is taken as the significant part.

4 Results and Discussion

The conventional design shown in Fig. 4 has been coded and synthesized in the Verilog hardware description language using the Xilinx ISE 14.5 software. Figure 6 shows the simulation waveform of the proposed conventional signed architecture: for the 16 × 16-bit multiplier, the applied inputs a = 1111111111011111 (−33) and b = 1111111111111111 (−1) yield the 32-bit product y = 00000000000000000000000000100001 (33).

The proposed 16 × 16 binary signed VM design shown in Fig. 5 is synthesized in Verilog using the same software and is further implemented on a Virtex 4 FPGA device. Figure 7 (in signed decimal radix) and Fig. 8 (in binary radix) show the simulation waveforms of the proposed 16 × 16 signed architecture using the UT sutra: the applied inputs a = 0000000000001111 (15) and b = 1111111111111110 (−2) yield the 32-bit product y = 11111111111111111111111111100010 (−30). Here, an unsigned number is multiplied with a signed number using the UT sutra, and the appropriate result is obtained.


Fig. 5 Proposed 16 × 16 signed VM architectures

Fig. 6 Simulation result of 16 × 16 conventional signed multiplier

Fig. 7 Simulation result of 16 × 16 signed VM multiplier

Fig. 8 Simulation result of 16 × 16 binary signed VM multiplier


Table 1 Synthesis report of proposed multipliers

Device: Virtex 4 vlx15sf363-12 | 16 × 16 conventional multiplication | 16 × 16 signed VM
Path delay (ns) | 10.948 | 9.956
Area (4-input LUTs) | 64 out of 10,944 | 25 out of 10,944
Power (W) | 166.38 | 166.38

FPGA implementation is an important step in the Verilog design flow. The performance of the proposed multiplier design is examined on different families of FPGA devices. In Table 1, the path delay is the total time required by the multiplier design, expressed in nanoseconds (ns); the area is the total number of lookup tables (LUTs) used by the proposed architecture; and power consumption is also an important parameter, since higher power consumption is a demerit for a multiplier. Table 1 shows the synthesis report of the conventional multiplier and the proposed signed VM multiplier on a Virtex 4 FPGA device. From the table, it is observed that the proposed signed VM architecture shows a significant improvement in area and delay over the conventional multiplication method; the area of the designs is noted in terms of the number of lookup tables. Hence, the Vedic multiplier plays a significant role in the proposed work.

After the multiplication operation using the UT sutra, the obtained results were compared with existing standard models. The proposed structure was also implemented on a Spartan 3 XC3S50 FPGA device. Table 2 compares the proposed 16 × 16 signed VM design with the Wallace tree multiplier [12], the Booth multiplier [12], and the signed Vedic multipliers of [15] and [8]. The suggested design has about 72.62, 65.97, 58.21, and 36.45% less combinational delay than these designs, respectively.

Table 2 Delay comparison for the proposed signed VM in Spartan-3 device

16-bit multiplier | Combinational path delay (ns) | Percentage of improvement
Wallace tree multiplier [12] | 46.046 | 72.62
Booth multiplier [12] | 37.041 | 65.97
Signed Vedic multiplier [15] | 30.16 | 58.21
Signed Vedic multiplier [8] | 19.832 | 36.45
Proposed signed VM | 12.603 |


5 Conclusion

In this work, we proposed the design of a conventional signed multiplier and an efficient signed Vedic multiplier using the UT sutra. A simplified signed-multiplier design is presented for use in digital verification systems, covering the multiplication process from unsigned decimal numbers to signed binary numbers. The suggested design is synthesized using Xilinx ISE 14.5 and implemented on different FPGA devices. From the results, we observe that the use of the UT sutra in signed binary multipliers helps in reducing the combinational path delay and area, which also improves system performance in terms of execution speed. The proposed design is compared with previously reported multiplier architectures [8, 12, 15], and the results support the superiority of the suggested design. The work can be extended in the future to multiplier designs of higher bit sizes.

References
1. Tirtha S, Agrawala V, Agrawala S (1992) Vedic mathematics. Motilal Banarsi Dass Publ, India
2. Patali P, Kassim S (2020) An efficient architecture for signed carry save multiplication. IEEE Lett Comput Soc 3(1):9–12
3. Parhami B (2010) Computer arithmetic: algorithms and hardware designs. Oxford University Press, New York
4. Poornima M, Shivukumar S, Shridhar KP, Sanjay H (2013) Implementation of multiplier using vedic algorithm. Int J Innov Technol Explor Eng 2(6):219–223
5. Gorgin S, Jaberipur G (2009) A fully redundant decimal adder and its application in parallel decimal multipliers. Microelectron J 40(10):1471–1481
6. Thapliyal H, Arbania HR (2004) A time-area-power efficient multiplier and square architecture based on ancient Indian vedic mathematics. In: Proceedings of the international conference on VLSI (VLSI'04), Las Vegas, Nevada, pp 434–439
7. Thapliyal H, Srinivas MB (2004) High speed efficient N × N parallel hierarchical overlay multiplier architecture based on ancient Indian vedic mathematics. Trans Eng Comput Technol
8. Sahoo S, Bhoi B, Pradhan M (2020) Fast signed multiplier using Vedic Nikhilam algorithm. IET Circuits Dev Syst 14(8):1160–1166
9. Barik R, Pradhan M (2017) Efficient ASIC and FPGA implementation of cube architecture. IET Comput Digital Tech 11(1):43–49
10. Kasliwal PS, Patil BP, Gautam DK (2011) Performance evaluation of squaring operation by vedic mathematics. IETE J Res 57(1):39–41
11. Sethi K, Panda R (2015) Multiplier less high-speed squaring circuit for binary numbers. Int J Electron 102(3):433–443
12. Bansal Y, Madhu C (2016) A novel high-speed approach for 16 × 16 vedic multiplication with compressor adders. Comput Electr Eng 49:39–49
13. He Y, Yang J, Chang H (2017) Design and evaluation of booth-encoded multipliers in redundant binary representation. In: Proceedings of embedded systems design with special arithmetic and number systems, pp 113–147
14. Palnitkar S (2003) Verilog HDL: a guide to digital design and synthesis. Prentice Hall Professional, India
15. Barik RK, Pradhan M, Panda R (2017) Time efficient signed vedic multiplier using redundant binary representation. J Eng 2017(3):60–68


16. Madenda S, Harmanto S (2021) New approaches of signed binary number multiplication and its implementation in FPGA. Jurnal Ilmiah Teknologi dan Rekayasa 26(1):56–68
17. Imaña JL (2021) Low-delay FPGA-based implementation of finite field multipliers. IEEE Trans Circuits Syst II Express Briefs 68(8):2952–2956
18. Ullah S, Schmidl H, Sahoo SS, Rehman S, Kumar A (2020) Area-optimized accurate and approximate softcore signed multiplier architectures. IEEE Trans Comput 70(3):384–392
19. Paldurai K, Hariharan K (2015) Implementation of signed vedic multiplier targeted at FPGA architectures. ARPN J Eng Appl Sci 10(5):2193–2197
20. Pichhode K, Patil M, Shah D, Chaurasiya B (2015) FPGA implementation of efficient vedic multiplier. In: Proceedings of international conference of IEEE, pp 565–570

Chapter 14

Review of Machine Learning for Antenna Selection and CSI Feedback in Multi-antenna Systems
Garrouani Yassine, Alami Hassani Aicha, Mrabti Fatiha, and Dhassi Younes

1 Introduction

In the last three decades, multi-antenna systems have been suggested as the best solution to improve a wireless communication system's reliability. Ranging from single-user MIMO systems to massive MIMO ones, they succeeded in combating the deep fading events that single-antenna systems used to suffer from; this was mainly achieved using transmit-diversity and receive-diversity setups. In addition, thanks to spatial multiplexing, many users have become able to be served simultaneously over the same time–frequency resource, which contributed substantially to improving the overall system's spectral efficiency. However, for these breakthroughs to come to fruition, some limiting factors need consideration during the design of such systems. On the one hand, Gao et al. [1] showed that within the antenna array there might be some impaired antennas not performing as expected, causing a degradation in system performance. Moreover, hardware cost and energy consumption have also been two subjects of concern, especially since, in an ideal setup, every antenna should have its own radio-frequency (RF) module. On the other hand, as wireless communication is carried over an ever-changing environment, phenomena such as fast fading caused by users with high mobility, pilot contamination arising from pilot reuse in neighboring cells, and so on are to be expected; this makes the task of acquiring channel state information (CSI) more challenging, especially in systems operating in the frequency division duplexing (FDD) mode. In such systems, the wireless channel is estimated in the downlink by the user equipment (UE) and then fed back to the base station (BS) on the uplink, which introduces an overhead that scales with the number of antennas at the base station side.

G. Yassine (B) · A. H. Aicha · M. Fatiha · D. Younes
Sidi Mohamed Ben Abdellah University Fez, Fes, Morocco
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_14


To overcome the limitations mentioned above, and as a workaround for the impaired-antenna issue as well as the cost and energy concerns, Arash et al. [2] and Ouyang and Yang [3] suggested deploying fewer RF modules than antennas and activating only the antennas that optimize the system performance. This implies selecting a subset of antennas among the available ones with respect to a given criterion, such as bit error rate (BER) or signal-to-noise ratio (SNR); the unselected antennas are seen as degrees of freedom that can be resorted to later, when their conditions become favorable. Machine learning (ML) has recently emerged as one of the best choices for handling the problems arising in wireless communication systems in a new way, namely following a data-driven approach, as detailed by Wen et al. [4] and Zhang et al. [5]. Besides the diverse optimization techniques used by Joung and Sun [6], Gorokhov et al. [7], and Gharavi-Alkhansari and Gershman [8], mainly conventional schemes relying on exhaustive search and on iterative algorithms, Joung [9] and Yao et al. [10, 11] used machine learning to approach this kind of problem in a new way, more specifically by considering antenna selection as a multi-class classification problem. Regarding the overhead in FDD-based massive MIMO systems, many researchers proposed techniques to mitigate its impact on system performance by means of CSI compression prior to feedback. To fulfill this compression task, Wen et al. [12], Yang et al. [13], Lu et al. [14], and Liu et al. [15] used machine learning techniques, especially deep learning ones. The rest of the paper is organized as follows: Sect. 2 describes the system model of transmit antenna selection (TAS) for single-user MIMO as well as untrusted relay networks, as adopted by Joung [9] and Yao et al. [10, 11], and then details the ML-based schemes suggested to solve the TAS problem. In Sect. 3, we discuss the problem of channel state information acquisition in FDD-based massive MIMO systems, especially the overhead originating from feeding back the measured CSI to the base station, describe the methods that use ML-based schemes to achieve optimal CSI feedback, and explain compression via convolutional neural networks. Finally, Sect. 4 wraps up the paper.

2 TAS for Single-User MIMO and Untrusted Relay Networks

2.1 System Model

As illustrated in Fig. 1, Joung [9] assumed a single-user MIMO system where the transmitter and the receiver are equipped with N_T and N_R antennas, respectively. At the transmitter side, only N_s RF chains are deployed, where N_s < N_T. The wireless channel between the two communicating sides is represented by a matrix of size N_R × N_T, noted H and described as follows:


Fig. 1 MIMO system with antenna selection

$$H=\begin{bmatrix}h_{11} & h_{12} & \cdots & h_{1N_T}\\ h_{21} & h_{22} & \cdots & h_{2N_T}\\ \vdots & \vdots & \ddots & \vdots\\ h_{N_R1} & h_{N_R2} & \cdots & h_{N_RN_T}\end{bmatrix} \qquad (1)$$

where each coefficient h_ij represents the complex fading coefficient between the ith receiving antenna and the jth transmitting antenna. The magnitude |h_ij| of these coefficients is assumed to be Rayleigh distributed. The communication between the two sides is modeled using the equation below:

$$y = Hx + n \qquad (2)$$

where y is the N_R × 1 signal received at the receiver side, H is the CSI matrix representing the wireless channel over which communication is carried, x is the N_T × 1 data signal transmitted by the base station, and n represents the additive white Gaussian noise. The aim is to select the best N_s antennas out of the N_T available ones with respect to a key performance indicator (KPI) and activate them for communication. To perform this, Joung [9] considered the use of k-nearest neighbors (KNN) and support vector machines (SVM) to build a CSI classifier that takes as input a new CSI matrix and outputs the subset for which the communication would be optimal. Regarding the untrusted relay networks, Yao et al. [10, 11] considered a TDD-based wireless communication system comprising a source equipped with N_T transmitting antennas and serving a single-antenna destination. They assumed the communication to be carried along a non-line-of-sight path with the presence of


Fig. 2 Untrusted relay network

shadowing objects; hence, the communication between the source and the destination is achieved via an untrusted relay node that serves primarily to amplify the attenuated signals prior to forwarding them to the end user. The wireless channel between the source and the relay is represented by a vector h of size N_T × 1, while the channel between the relay and the destination is represented by a scalar g. Each entry in h represents the complex fading coefficient between a transmitting antenna and the single-antenna relay node. The communication is performed in two phases, as illustrated in Fig. 2. Within the first phase, the source sends pre-coded data to the relay, and the destination sends a jamming signal that serves primarily to prevent the relay from interpreting the pre-coded data. During the second phase, the relay amplifies the signal and forwards it to the destination. As in Joung [9], the authors considered the source to have only N_s RF modules, where N_s < N_T, and the aim is to find the best antenna subset with respect to a KPI. They suggested the use of the following ML techniques to solve the combinatorial search problem: SVM, Naïve Bayes, KNN, and deep neural networks (DNNs).

2.2 Dataset Generation and CSI Classifier Building

Joung [9] generated a training set comprising M channel realizations of size N_R × N_T. As a KPI, the singular value decomposition (SVD) of submatrices was used to evaluate the C(N_T, N_s) possible antenna combinations of each training CSI matrix, looking for the best antenna combination. The purpose of this operation is to label the training CSI matrices, against which new channel realizations will be evaluated. After that, each of the M CSI matrices is reshaped to a 1 × N feature vector, where N = N_R · N_T, whose coefficients are the squared magnitudes |h_ij|^2 of the complex channel coefficients; each vector is assigned the label found during the evaluation phase. Similarly, Yao et al. [10, 11] generated a training set containing N channel realizations of size N_T × 1 and computed the magnitude of each complex fading coefficient to construct the feature vector {|h_1|, |h_2|, ..., |h_{N_T}|, |g|}. As a KPI, they used the secrecy rate and evaluated the C(N_T, N_s) antenna combinations looking for the one maximizing that rate. The training datasets obtained in the three aforementioned papers are normalized prior to evaluating new channel realizations.
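A sketch of this labeling procedure for the single-user MIMO case is given below. The capacity-style KPI built from the singular values is one plausible instantiation of the SVD-based criterion (the exact formula is not fixed by the paper), and all names and sizes are illustrative:

```python
import numpy as np
from itertools import combinations

def svd_kpi(H_sub, snr=10.0):
    # KPI from the singular values of the antenna-subset submatrix:
    # a capacity-style score (one plausible SVD-based criterion)
    s = np.linalg.svd(H_sub, compute_uv=False)
    return float(np.sum(np.log2(1.0 + snr * s ** 2)))

def build_dataset(M=2000, n_r=2, n_t=4, n_s=2, seed=0):
    """Label M Rayleigh channel realizations with the index of the best
    transmit-antenna subset, mirroring the training-set generation above."""
    rng = np.random.default_rng(seed)
    subsets = list(combinations(range(n_t), n_s))     # all C(n_t, n_s) subsets
    X, y = [], []
    for _ in range(M):
        H = (rng.standard_normal((n_r, n_t))
             + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
        kpis = [svd_kpi(H[:, list(s)]) for s in subsets]
        X.append((np.abs(H) ** 2).ravel())            # feature vector |h_ij|^2
        y.append(int(np.argmax(kpis)))                # label = best subset index
    return np.array(X), np.array(y), subsets
```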


Fig. 3 KNN classification

2.2.1 KNN

It is a non-parametric method that can be used for classification as well as for regression. In the case of multi-antenna systems, KNN is used as a classification method to build a CSI classifier. It is a quite simple algorithm that classifies a sample into the class of the majority of its K closest neighbors, the closest in terms of, for example, the Euclidean distance, as illustrated in Fig. 3. From the example, we can see clearly how the choice of K influences the classification decision. In the case of antenna selection, this bias can be considered a shortcoming that might lead to choosing a less optimal subset of antennas, so K is also subject to optimization. Joung [9] and Yao et al. [10] preceded the evaluation of a new channel realization by extracting its feature vector and normalizing it. After that, an optimal antenna subset is found by computing the Euclidean distance between every feature vector in the training dataset and the newly constructed feature vector. Once done, the results are sorted in ascending order. Finally, among the K smallest values, the majority label is chosen, and the new CSI matrix is assigned that label.
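The decision step just described fits in a few lines; a minimal sketch, assuming F_train and labels are NumPy arrays produced as in Sect. 2.2:

```python
import numpy as np
from collections import Counter

def knn_select(f_new, F_train, labels, k=5):
    # Euclidean distances to every (normalised) training feature vector
    d = np.linalg.norm(F_train - f_new, axis=1)
    nearest = np.argsort(d)[:k]                 # indices of the k smallest distances
    return Counter(labels[nearest]).most_common(1)[0][0]   # majority label

# Toy usage with stand-in data
rng = np.random.default_rng(0)
F_train, labels = rng.random((100, 8)), rng.integers(0, 6, 100)
print(knn_select(rng.random(8), F_train, labels))
```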

2.2.2 SVM

Like KNN, SVM can also be used for both classification and regression. But unlike KNN, SVM aims to find a decision boundary, or hyperplane, that helps classify the data points into different classes, as illustrated in Fig. 4. There are many hyperplanes or boundaries that could separate the two classes of data; however, SVM aims to find


Fig. 4 SVM classification

a boundary that is located at a maximal distance from the data points of both classes; this helps classify future data points accurately and with more confidence. Since antenna selection is approached as a multi-class classification problem, Joung [9] and Yao et al. [10] used the one-vs-all (also known as one-vs-rest) method to find the boundary separating an antenna subset from the remaining subsets. As binary classification was used, the M labeled training feature vectors are split into two sub-training matrices: one containing the feature vectors having the same label, and the second containing the remaining training samples. For every binary classification, a 1 × M label vector is generated, whose elements are set to 1 for the indices having the considered label and 0 otherwise. The boundary between each class and the remaining classes is found by solving a logistic regression problem, where the purpose is to find the optimal parameters that minimize a cost function. If there are L classes, then there will be L optimal parameter vectors, i.e., {θ_1, θ_2, ..., θ_L}. After finding all the θ_i, a new channel realization can be evaluated: its feature vector f is extracted and normalized, and then θ_i^T f is computed for each i ∈ L. The largest value is kept, and the new channel realization is assigned its label.
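A minimal sketch of the one-vs-rest scheme using scikit-learn's logistic regression (a linear SVM such as LinearSVC could be substituted for a closer match to SVM); the stand-in data emulate features and labels from a procedure like the one sketched in Sect. 2.2:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import StandardScaler

# Stand-in training data: 200 feature vectors with 6 possible subset labels;
# in practice X, y would come from the labeling procedure sketched earlier
rng = np.random.default_rng(0)
X, y = rng.random((200, 8)), rng.integers(0, 6, 200)

X_norm = StandardScaler().fit_transform(X)        # normalisation, as in the papers
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X_norm, y)
new_f = X_norm[:1]                                # a new, normalised feature vector
print(ovr.predict(new_f))                         # label of the chosen antenna subset
```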

2.2.3 Naïve Bayes

The Naïve Bayes algorithm is a probabilistic classifier based on the Bayes formula expressed below. It is a quite simple algorithm to implement. To predict the class of an input sample, the algorithm considers each of the features of the input sample to contribute independently to the probability that the sample is part of a given class of samples. It overlooks the correlations between features and the relationships that might exist between them. In other words, it assumes that the existence of a feature in a


class of samples is unrelated to the existence of the remaining features in this class. The formula below gives the posterior probability of a new feature vector f belonging to a given class c:

$$P(c \mid f)=\frac{P(f \mid c)\,P(c)}{P(f)} \qquad (3)$$

To perform transmit antenna selection as suggested by Yao et al. [10], for every new channel realization, the feature vector is constructed according to the operations carried out in the training dataset generation, and the posterior conditional probability that this vector belongs to a given class is computed. This operation is repeated for all classes. Among all the computed probabilities, the maximal value is kept, and the vector is assigned that class label.
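Assuming features and labels of the same form as before, a Gaussian Naive Bayes classifier is one common way to realise Eq. (3) for continuous features; the data below are stand-ins:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# GaussianNB estimates P(f|c) feature-by-feature (the independence assumption)
# and picks the class maximising the posterior of Eq. (3)
rng = np.random.default_rng(1)
X, y = rng.random((200, 8)), rng.integers(0, 6, 200)
nb = GaussianNB().fit(X, y)
print(nb.predict(X[:3]))      # predicted subset labels for three realizations
```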

2.2.4 DNN

Concerning deep neural networks, they are structured into many layers, as illustrated in Fig. 5. Each layer comprises a set of nodes called neurons connected to the nodes of the next layer. The transition from a layer to its successor is achieved through a matrix called the weights matrix, which controls the mapping between layers. At the output layer level, we get a hypothesis function, called the classifier, that can predict the class label of an unseen input. Yao et al. [11] fed the labeled training dataset to the neural network for the purpose of learning the mapping between each feature vector in the training set and its corresponding label. Since transmit antenna selection is approached as a multi-class classification problem, the output layer comprises L nodes.
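A minimal multi-class DNN classifier can be sketched with scikit-learn's MLPClassifier; the layer sizes and data below are illustrative and not the architecture of [11]:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Stand-in data; the output layer has one node per class label, and the
# (64, 64) hidden layers are an assumption for illustration only
rng = np.random.default_rng(2)
X, y = rng.random((500, 8)), rng.integers(0, 6, 500)
dnn = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
dnn.fit(X, y)
print(dnn.predict(X[:3]))     # predicted subset labels for three realizations
```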

Fig. 5 Neural network with multiple nodes at the output layer


Table 1 summarizes the setups used to simulate the ML-based classifiers in the aforementioned papers and presents the pros as well as the cons of the adopted ML algorithms.

2.3 Performance Analysis of ML-Based CSI Classifiers

Besides the advantages and disadvantages summarized in Table 1, the performance of the adopted ML algorithms has to be evaluated with respect to the operations carried out in the training phase and in the decision-making phase, as well as their extensibility to large-scale MIMO systems. Regarding the training carried out by Joung [9], Yao et al. [10, 11], Junior et al. [16] and Vu et al. [17], evaluating all the possible combinations in search of the best one might be affordable in single-user MIMO systems, as the number of antennas does not exceed eight. But for massive MIMO systems, where the number of antennas deployed at the base station side is large, this becomes computationally heavy and time-consuming. One way to accomplish this phase is to offload the training to an edge datacenter, where the evaluation of the training CSI matrices can be performed using parallel computations; this accelerates the process of building the CSI classifiers.

KNN computes M Euclidean distances and bases its decisions only on the K smallest ones. Computing M Euclidean distances is acceptable in the case of single-user MIMO systems, as the size of the flattened feature vector is relatively small and the dataset comprises only 2000 samples; but for massive MIMO systems this becomes heavy, as both the feature vector and the training dataset grow in size. Moreover, the choice of K is crucial, as different values of it can lead to different outcomes, which makes it a parameter that is also subject to optimization and should not be set statically. In addition, a high memory is required to store the M labeled training feature vectors at the base station side.

Regarding SVM, binary classification was used to solve the multi-class classification problem, which means that C(N_t, N_s) binary classifications are required before finding the class label of the best antenna combination; this also implies solving C(N_t, N_s) regression problems in search of the optimal parameters that minimize the cost function. As the number of antennas in a massive MIMO system is large, the number of possible combinations gets larger. Consequently, the training phase will require considerable time before converging and, ultimately, finding the hyperplanes that separate the different classes. Note that for both KNN and SVM, the number of samples in the training set must be far greater than the number of possible antenna combinations (M > C(N_t, N_s)).

The Naïve Bayes approach computes the probability of a new feature vector given that it belongs to a specific class, which implies computing the frequency of occurrence of each of its coefficients within the training feature vectors. In the worst case, given that all possible classes might have occurred in the training phase, C(N_t, N_s) posterior probabilities have to be computed, which might be affordable in single-user MIMO but not in large-scale MIMO systems.


Table 1 Summary of setups for TAS in single-user MIMO and untrusted relay networks

| | Single-user MIMO | Untrusted relay network |
|---|---|---|
| {N_T, N_S, N_R} | {8, 1, 1} and {6, 2, 2} | {6, 2, 1} and {6, 1, 1} |
| Training dataset size | 2000 CSI matrices | 10,000 CSI matrices |
| Number of classes | C(8, 1) = 8 and C(6, 2) = 15 | C(6, 1) = 6 and C(6, 2) = 15 |
| KPI | Max[SVD(H_min)] | Secrecy rate |
| Feature vector | \|h_ij\|^2 | \|h_ij\| |
| Algorithms | KNN and SVM | SVM, KNN, Naïve Bayes and DNN |

| Algorithm | Pros | Cons |
|---|---|---|
| KNN | Easy to implement and fast, as there is no training phase; a variety of distance metrics is available (Euclidean, Manhattan, Minkowski) | Slow for large datasets, as in TAS for massive MIMO systems; requires high memory to store the training data; choosing an optimal k is difficult |
| SVM | Accurate on cleaner and relatively small datasets | Slow for large datasets; performs poorly on noisier datasets |
| Naïve Bayes | Simple, fast on small datasets and suitable for real-time applications; not sensitive to irrelevant features | Overlooks the dependency between features, especially the correlation between adjacent antennas; suffers from the zero-frequency problem |
| DNN | Fast once trained; its performance improves with more data | Difficult to choose a network model; its black-box nature makes interpreting the output difficult; computationally expensive |

In addition, for some new CSI realizations, it is possible to obtain a null conditional probability for all classes and hence become unable to select the optimal antenna subset. This is a problem the Naïve Bayes approach suffers from, and it can be combatted only by increasing the size of the training dataset so that the odds of getting a null conditional probability are substantially minimized.

Regarding the DNN, the number of neurons in the input layer is equal to N = N_t · N_r, which is manageable in single-user MIMO systems but might not be in massive MIMO ones. In addition, unlike KNN, SVM and Naïve Bayes, the black-box nature of neural networks makes the interpretability of the output class label difficult, and this gets more complicated when there are many hidden layers in the network. There is also the choice of a network architecture, which might require adjustments, if not the testing of diverse architectures, before finding the one that performs best on the problem at hand.

In addition to the above points, as the environment over which wireless communication is carried out evolves in time, so should training. If the built models were trained on


a given time period, there may be antennas that outperform the others over that period but not necessarily over upcoming periods. This requires re-training the models on a regular basis so that they keep up with future changes and hence avoid penalizing some antennas. The frequency at which re-training should be carried out depends on the long-term behavior of the wireless channel, which needs to be determined. The re-training rate can be reduced by performing training on channel realizations acquired at different time instances with enough separation in time.
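To make the scalability concern concrete, the snippet below counts the number of antenna subsets C(N_t, N_s), which is also the number of one-vs-rest classifiers or posterior evaluations required; the larger array configurations are hypothetical.

```python
from math import comb

# How the number of antenna subsets C(Nt, Ns) -- and hence the number of
# binary classifiers or posterior evaluations -- grows with array size.
for nt, ns in [(6, 2), (8, 1), (64, 8), (128, 16)]:
    print(f"Nt={nt:3d}, Ns={ns:2d} -> {comb(nt, ns):,} possible subsets")
# Nt=6,  Ns=2 -> 15
# Nt=64, Ns=8 -> 4,426,165,368 (clearly infeasible to enumerate per class)
```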

3 FDD-Based Massive MIMO Systems

For massive MIMO systems, time division duplexing (TDD) is considered the preferred operating mode compared to frequency division duplexing (FDD), as channel estimation is performed only by the base station transceiver on the uplink, and the acquired CSI matrix is used to pre-code users' data on the downlink; this has been examined thoroughly by Flordelis et al. [18]. In addition, this trend is motivated by the fact that channel estimation in FDD mode introduces a high overhead. In this mode, the channel needs to be estimated by the users on the downlink and then fed back to the base station on the uplink. Moreover, the feedback overhead increases as the number of antennas in the array grows, making the system perform less optimally. Many researchers have proposed techniques to enable FDD-based massive MIMO systems; most of them aim mainly to reduce the feedback overhead by means of CSI compression. Their concept can be summarized as follows: the channel estimated by the UEs on the downlink undergoes some processing before being fed back to the BS. To reduce the dimensionality of the estimated CSI matrix, its key features are extracted and then compressed before being sent back to the base station. In the following part, we give a brief description of the compression schemes adopted by Wen et al. [12], Yang et al. [13], Lu et al. [14] and Liu et al. [15], before explaining in detail the core technique of these compression schemes, which is the convolutional neural network.

3.1 System Model

The system model adopted in the aforementioned papers is described as follows: the BS is equipped with N_T antennas and serves single-antenna UEs. The multi-carrier transmission technique known as OFDM is used, and data is transmitted over N_C subcarriers. The communication between the BS and a UE is modeled by the following equation:

y = H · v · x + n    (4)


where y is the signal received at the UE, H is a matrix of size N_C × N_T representing the wireless channel response over the N_C subcarriers between the BS and the served UE, v is the pre-coding vector used to pre-code the signal x on the downlink, and n is the additive white Gaussian noise. The aim is to find the smallest possible representation of H and feed it back to the BS. To fulfill this task, deep learning (DL) techniques were used.

Wen et al. [12] used compressed sensing, a technique that enables the recovery of an estimated channel from fewer samples than required by the Nyquist-Shannon theorem. It assumes the channel to be sparse in some spatial directions, and hence representable using short code-words, which reduces the feedback overhead considerably. The suggested CSI sensing and recovery network is called CsiNet, and it comprises an encoder that serves primarily for dimensionality reduction and a decoder at the BS side to recover the CSI from the received code-words. The architecture of CsiNet is as follows: the estimated CSI matrix at the UE is transformed to the angular-delay domain using the 2D discrete Fourier transform (DFT), and its real and imaginary parts are fed to a convolutional neural network (CNN) for feature extraction. After that, the acquired features are fed to a fully connected layer to compute the appropriate code-word. At the BS side, the reverse operations are carried out: the received code-word is fed to a fully connected layer to reconstruct the real and imaginary parts of H. To refine the reconstructed CSI matrix further, RefineNet units are used. Ultimately, the real and imaginary parts of H are combined to start pre-coding data on the downlink.

Yang et al. [13] proposed a network that comprises a feature extractor module, a quantization module and an entropy encoder module, as illustrated in Fig. 6. The feature extraction is achieved through a CNN that aims to output a low-dimensional representation of the channel matrix; the quantization module quantizes this representation into a discrete-valued vector, while the last module tries to achieve the highest possible compression rate by means of entropy encoding. The same modules are reversed at the base station: its CSI recovery module comprises a feature decoder as well as an entropy decoder.

Lu et al. [14] proposed a recurrent neural network (RNN) whose architecture is illustrated in Fig. 7. It is composed of a UE encoder comprising a feature extractor that extracts the key features of the channel and a feature compressor that tries to represent the key features by the shortest possible bit sequence.

Fig. 6 CSI feedback with entropy encoder/decoder


Fig. 7 CSI feedback with encoder–decoder modules

The encoder also encompasses a long short-term memory (LSTM) to infer the correlation that might exist between different inputs, which enables the system to catch the temporal correlation between channel realizations and hence improve the performance of the hypothesis function. The decoder at the base station side performs the reverse processing; it comprises a feature decoder and a feature decompression module.

To further improve the downlink estimated CSI for FDD systems, Liu et al. [15] proposed a scheme that exploits the correlation, proved in previous works, between the downlink and uplink channels. The encoder separates the magnitude of the CSI matrix from the phase. The latter is quantized and then sent directly to the base station, while the magnitude is fed to a convolutional layer for feature extraction. After reshaping the acquired feature map, it is fed to a fully connected layer to compute the code-word corresponding to the magnitude of the input channel realization. The decoder performs the reverse processing but, in addition, exploits the estimated uplink CSI available at the base station to improve the estimated downlink CSI, and it uses two blocks of residual layers to overcome the vanishing gradient problem, which would otherwise prevent the artificial neural network from continuing to train. At the output level, the estimated magnitude and the estimated phase are combined to recover the channel matrix. Table 2 summarizes the setups used to simulate the DL-based CSI feedback schemes suggested in the papers mentioned above.
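To make the encoder-decoder idea concrete, here is a heavily simplified CsiNet-style autoencoder sketch; the antenna/delay dimensions, compression ratio and layer choices follow the spirit of [12] but are assumptions of ours, not a faithful reproduction of any of the schemes above.

```python
import tensorflow as tf

# Rough CsiNet-style autoencoder skeleton, assuming 32 antennas, 32 delay
# bins and a 1/16 compression ratio (all illustrative).
NT, ND, RATIO = 32, 32, 16
CODEWORD = (2 * NT * ND) // RATIO                   # real+imag, compressed

inputs = tf.keras.Input(shape=(NT, ND, 2))          # real/imag channels
x = tf.keras.layers.Conv2D(2, 3, padding="same", activation="relu")(inputs)
x = tf.keras.layers.Flatten()(x)
code = tf.keras.layers.Dense(CODEWORD)(x)           # UE-side code-word

y = tf.keras.layers.Dense(NT * ND * 2)(code)        # BS-side reconstruction
y = tf.keras.layers.Reshape((NT, ND, 2))(y)
y = tf.keras.layers.Conv2D(2, 3, padding="same")(y) # stand-in for RefineNet
autoencoder = tf.keras.Model(inputs, y)
autoencoder.compile(optimizer="adam", loss="mse")   # NMSE-style objective
```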

Table 2 Summary of setups in DL-based CSI feedback schemes

| Paper | [12] | [13] | [14] | [15] |
|---|---|---|---|---|
| DL scheme | CsiNet | DeepCMC | RecCsiNet, PR-RecCsiNet | DualNet-MAG, DualNet-ABS |
| Channel model | COST 2100 | COST 2100 | COST 2100 | COST 2100 |
| Scenario | Indoor 5.3 GHz; Outdoor 300 MHz | Indoor 5 GHz | Indoor | Indoor 5.3 GHz UL / 5.3 GHz DL; Semi-urban outdoor 260 MHz UL / 300 MHz DL |
| Number of antennas N_T | 16, 32, 48 | 32 | 32 | 32 |
| Number of subcarriers N_C | 1024 | 128, 160, 192, 224, 256 | 1024 | 1024 |
| KPI | NMSE and cosine similarity | NMSE | NMSE | NMSE |
| Compression rate | 1/14, 1/16, 1/32, 1/64 | 1/32, 1/64 | − | − |

3.2 CNN for Compression

The common factor between the feedback schemes suggested by Wen et al. [12], Yang et al. [13], Lu et al. [14], Liu et al. [15] and Madadi et al. [19] is the use of a CNN. In this part, we explain in detail the architecture of a CNN in the field of image analysis; the same concepts hold for massive MIMO systems, since the channel matrix can be seen as a two-dimensional image. A convolutional neural network is a deep learning technique applied to analyze images. It is useful for catching the key features of an image without losing the features that are critical for its accurate recognition. Moreover, it has the ability to catch the spatial as well as the temporal dependencies in an image by means of the application of relevant filters. It comprises three main modules: convolutional layers, pooling layers and a fully connected layer.

Figure 8 illustrates the convolution operation between an image of size 5 × 5 × 1 (represented in green) and a kernel of size 3 × 3 × 1 (represented in yellow, with its coefficients written in red). A convolutional layer aims to extract low-level features through a filter called the kernel. Small in size compared to the input image, the kernel hovers over portions of the image of the same size as itself, repeating the process until the entire image is fully covered. Since one convolutional layer is able to catch only low-level features, adding more convolutional layers enables the network to catch high-level features and acquire a full understanding of the input image. Passing through these convolutional layers can lead to two possible results: a reduction in feature dimensionality by means of the valid padding technique, or its preservation or increase through the so-called SAME padding technique.

Fig. 8 Convolution operation between an input image and a kernel


Fig. 9 Pooling with a 3 × 3 × 1 kernel

The output of the convolutional layers is then fed to the next layer, which is called the pooling layer (Fig. 9 illustrates pooling with a 3 × 3 × 1 kernel). The latter aims to reduce the spatial size of the convolved features; the operation it performs tries to extract the dominant features, which remain invariant under rotation as well as translation. It can be considered a down-sampling operation. There are two types of pooling, as shown in Fig. 10: max pooling, where the kernel keeps the maximal value of the image portion over which it hovers, and average pooling, which computes the average value of the image portion in question. The output of the pooling layers is then flattened and fed to a fully connected layer to output the smallest possible representation of the input. The whole architecture of a CNN is illustrated in Fig. 11.

Fig. 10 Two pooling techniques
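A toy implementation of the convolution and max-pooling operations of Figs. 8-10 is sketched below; the input image and kernel values are invented for illustration.

```python
import numpy as np

# "Valid" padding shrinks a 5x5 input to 3x3 with a 3x3 kernel.
def conv2d_valid(img, kernel):
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(feat, size=2):
    oh, ow = feat.shape[0] // size, feat.shape[1] // size
    return feat[:oh*size, :ow*size].reshape(oh, size, ow, size).max(axis=(1, 3))

img = np.arange(25, dtype=float).reshape(5, 5)   # stand-in 5x5x1 image
kernel = np.ones((3, 3)) / 9.0                   # simple averaging kernel
feat = conv2d_valid(img, kernel)                 # -> 3x3 feature map
print(max_pool(feat))                            # -> pooled dominant value
```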


Fig. 11 Convolutional neural network architecture

3.3 Analysis of the ML-Based CSI Feedback Techniques

To get the smallest possible representation of the CSI prior to feedback, Wen et al. [12], Yang et al. [13], Lu et al. [14], Liu et al. [15] and Madadi et al. [19] used the CNN, which is a lossy compression technique that tries to catch the main features in a matrix. After the convolved CSI matrix undergoes max pooling, a considerable number of complex channel coefficients is lost and cannot be recovered after feedback; i.e., if the size of the pooling filter is N by N, then N² − 1 values are discarded and only the max value is kept, which is a loss of information. Besides the errors caused by the noisy wireless channel, the quantization suggested by Yang et al. [13] adds extra errors which might adversely affect the recovery of the correct CSI at the base station.

As suggested by Wen et al. [12], Yang et al. [13], Lu et al. [14] and Liu et al. [15], after transforming the channel matrix H to the delay domain, the N_C × N_T matrix is truncated to an N_T × N_T square matrix, assuming that the N_C − N_T remaining rows are all equal to zero. This implies that the signals carried over the corresponding N_C − N_T subcarriers undergo deep fading that attenuates them totally; this assumption is not practical in massive MIMO-OFDM systems. In addition, the assumed spatial sparsity requires some hardware-related conditions to be met, such as deploying an antenna array that is large in terms of both the number of antennas and their aperture, and in most cases it does not hold.

Wen et al. [12], Yang et al. [13], Lu et al. [14], Liu et al. [15], Madadi et al. [19] and similar papers aim mainly to reduce the feedback overhead on the uplink by means of CSI compression, but none of them explains how the CSI matrix is first acquired at the UEs, or addresses the mapping of pilot symbols over the OFDM resource grid on the downlink. When pilot-aided channel estimation (PACE) is used in FDD systems, the base station and the UEs share a priori the same cell reference signals, commonly known as pilot signals. These signals enable the users to extract the channel response at their positions and then infer


the channel response of the remaining subcarriers by means of a time-frequency interpolator. In a multi-antenna LTE system, the mapping of pilot symbols over the OFDM resource grid assumes that, within a resource block (RB) spanning 12 subcarriers in frequency and 7 OFDM symbols in time when the cyclic prefix is used, whenever an antenna transmits a pilot symbol over a resource element, the remaining antennas must be silent or transmit so-called spectral nulls over the same resource element; this is required to avoid interference between the pilot symbols and to ease the channel estimation at the UE side. But, owing to the half-and-half rule, which states that if more than half of the available time-frequency resources are used for operations other than data transmission then the digital communication system is no longer optimal, the extension of the LTE approach to massive MIMO systems is not possible, as the whole resource block would be consumed by pilot symbols alone. So, the feedback overhead is really a concern when it comes to FDD-based massive MIMO systems, but we should not overlook how the CSI is acquired in the first place, especially how the pilot symbols are mapped over the resource grid, as this is a very challenging task.

Assuming the availability of CSI at the UE side, the aforementioned papers tried to compress the whole CSI matrix before feeding it back to the base station. An important point we want to emphasize here is the following: given that the base station and the UEs share the pilot signals a priori, why not make the UEs compress only the sub-channels measured at the positions of the pilot symbols, instead of applying the ML-based compression schemes to the whole resource grid? If we proceed this way, the recovery of the whole resource grid at the base station becomes possible, as expanding the fed-back sub-matrix, after decompressing it, to the whole CSI matrix requires only a time-frequency interpolator. By doing so, a considerable amount of time-frequency resources is conserved on the uplink, and an important load is shifted to the base station given its capability compared to a battery-powered UE; this also contributes to improving the energy efficiency at the UE side.
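The following toy example sketches the interpolation argument above: only the pilot-position sub-channels are fed back, and the BS rebuilds the full frequency response with an interpolator; the pilot spacing and channel model are invented for illustration.

```python
import numpy as np
from scipy.interpolate import interp1d

NC = 1024                                  # total subcarriers
pilot_idx = np.arange(0, NC, 16)           # assumed pilot positions
h_full = np.exp(1j * 2 * np.pi * np.arange(NC) / 256)  # toy smooth channel
h_pilots = h_full[pilot_idx]               # what the UE would feed back

# BS side: interpolate real and imaginary parts over all subcarriers
k = np.arange(NC)
rebuild = lambda part: interp1d(pilot_idx, part, kind="linear",
                                fill_value="extrapolate")(k)
h_rec = rebuild(h_pilots.real) + 1j * rebuild(h_pilots.imag)
nmse = np.mean(np.abs(h_full - h_rec) ** 2) / np.mean(np.abs(h_full) ** 2)
print("NMSE:", nmse)                       # small for a smooth channel
```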

4 Conclusion

In this paper, we have put the spotlight on two main topics where machine learning techniques have shown their advantage over classical ones. Antenna selection can be used to turn off some antennas when the traffic demand in a cell goes below a predefined threshold or when there are few active users. Regarding the CSI feedback in FDD-based MIMO systems, we believe that pilot-aided channel estimation should be given some attention, as in the literature there is no mapping scheme for pilot symbols over the resource grid. In addition, as machine learning techniques are computationally demanding and need training before converging, we note that the implementation of ML-based schemes is mainly challenged by the energy constraints at the UE side. Consequently, future works have to put UE energy efficiency at their core before proposing schemes that solve the issues at the base station side but ultimately come at the expense of the UE.

References

1. Gao X, Edfors O, Tufvesson F, Larsson EG (2015) Massive MIMO in real propagation environments: do all antennas contribute equally? IEEE Trans Commun 1–12 (early access articles)
2. Arash M, Yazdian E, Fazel MS, Brante G, Imran G (2017) Employing antenna selection to improve energy-efficiency in massive MIMO systems. arXiv:1701.00767 [cs.IT]
3. Ouyang C, Yang H (2018) Massive MIMO antenna selection: asymptotic upper capacity bound and partial CSI. arXiv:1812.06595 [eess.SP]
4. Wen C-K, Shih W-T, Jin S (2019) Machine learning in the air. arXiv:1904.12385 [cs.IT]
5. Zhang C, Patras P, Haddadi H (2019) Deep learning in mobile and wireless networking: a survey. arXiv:1803.04311 [cs.NI]
6. Joung J, Sun S (2016) Two-step transmit antenna selection algorithms for massive MIMO. In: IEEE international conference on communications
7. Gorokhov A, Gore D, Paulraj A (2003) Receive antenna selection for MIMO flat-fading channels: theory and algorithms. IEEE Trans Inf Theory
8. Gharavi-Alkhansari M, Gershman AB (2004) Fast antenna subset selection in MIMO systems. IEEE Trans Signal Process
9. Joung J (2016) Machine learning-based antenna selection in wireless communications. IEEE Commun Lett
10. Yao R, Zhang Y, Qi N, Tsiftsis TA (2018) Machine learning-based antenna selection in untrusted relay networks. arXiv:1812.10318 [eess.SP]
11. Yao R, Zhang Y, Wang S, Qi N, Tsiftsis TA, Miridakis NI (2019) Deep learning assisted antenna selection in untrusted relay networks. arXiv:1901.02005 [eess.SP]
12. Wen C-K, Shih W-T, Jin S (2018) Deep learning for massive MIMO CSI feedback. IEEE Wirel Commun Lett
13. Yang Q, Mashhadi MB, Gunduz D (2019) Deep convolutional compression for massive MIMO CSI feedback. arXiv:1907.02942 [cs.IT]
14. Lu C, Xu W, Shen H, Zhu J, Wang K (2018) MIMO channel information feedback using deep recurrent network. arXiv:1811.07535 [cs.IT]
15. Liu Z, Zhang L, Zhi D (2019) Exploiting bi-directional channel reciprocity in deep learning for low rate massive MIMO CSI feedback. IEEE Wirel Commun Lett
16. de Souza W Jr, Bruza Alves TA, Abrão T (2021) Antenna selection in non-orthogonal multiple access multiple-input multiple-output systems aided by machine learning. 18 Apr 2021
17. Vu TX, Nguyen V-D, Nguyen DN, Ottersten B (2021) Machine learning-enabled joint antenna selection and precoding design: from offline complexity to online performance
18. Flordelis J, Rusek F, Tufvesson F, Larsson EG, Edfors O (2017) Massive MIMO performance—TDD versus FDD: what do measurements say? 3 Apr 2017
19. Madadi P, Jeon J, Cho J, Lo C, Lee J, Zhang J (2022) PolarDenseNet: a deep learning model for CSI feedback in MIMO systems. 2 Feb 2022

Chapter 15

Cassava Leaf Disease Detection Using Ensembling of EfficientNet, SEResNeXt, ViT, DeIT and MobileNetV3 Models

Hrishikesh Kumar, Sanjay Velu, Are Lokesh, Kuruguntla Suman, and Srilatha Chebrolu

1 Introduction

Image classification [2] is the task of understanding the main content of an image, which is easy for humans but hard for machines. Existing approaches towards the diagnosis of plant leaf diseases need the assistance of an agricultural specialist to visually investigate and diagnose plants. These methods are labour-intensive, low-yield and expensive. As an added challenge, successful solutions for diagnosing the disease must execute well under notable constraints while utilising the least possible resources, since some users may only have access to low-quality mobile cameras. Image-based disease diagnosis has made much more impact than the older traditional practices previously in use, as it is efficient, effective and non-subjective [14, 16]. Image processing is one of the major technologies being used for localizing infected parts in disease-ridden plant leaves. The model's accuracy faces a bottleneck, as trained models struggle to confidently detect the presence of a disease due to its similarity with other diseases in images. The recent advancements in machine learning, especially in deep learning, have guided us towards promising performance in disease detection [11]. The proposed model makes use of such advances and creates an ensemble model that efficiently classifies and localizes Cassava leaf disease. The model takes a Cassava leaf image as input to identify and localize its disease. The dataset used for the training and validation of this model is the Cassava leaf disease dataset [15], introduced by the Makerere University AI Lab. The dataset is divided into two parts, for (i) training and (ii) validation purposes. The leaf images are RGB coloured.

The training dataset contains around 15,000 images and the validation dataset consists of around 6000 images. The total number of images in the dataset is around 21,400. The dataset is categorised into five classes as follows: (i) healthy Cassava leaves, (ii) Cassava Bacterial Blight (CBB), (iii) Cassava Brown Streak Disease (CBSD), (iv) Cassava Green Mottle (CGM) and (v) Cassava Mosaic Disease (CMD). The proposed model will accurately detect the presence of disease, swiftly recognise infected plants, and help preserve crops before diseases impose irreparable damage.

2 Related Work

This section discusses the state-of-the-art models that are used in the proposed ensemble model. The models employed in the ensemble learning are EfficientNet [25], Squeeze-and-Excitation ResNeXt (SEResNeXt) [10, 29], Vision Transformer (ViT) [3], Data-efficient Image Transformer (DeIT) [26] and MobileNetV3 [8].

2.1 EfficientNet

A Convolutional Neural Network (CNN) [1] model's architecture is created with fixed computational resources and then scaled up to improve accuracy. EfficientNet increases the model's performance by carefully balancing the network depth, width and resolution. A baseline network is chosen using neural architecture search, and the baseline architecture is scaled up to build a family of models known as EfficientNet. These models outperform ConvNets [13] in terms of accuracy. The compound scaling method scales the networks over width, depth and resolution and can be generalized to existing CNN architectures such as MobileNet [9] and ResNet [7]. However, choosing an efficient baseline network is critical for acquiring the best results. The compound scaling method enhances the network's predictive capability. This is achieved by replicating the convolutional operations along the network architecture of the baseline network.
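As a hedged illustration of compound scaling, the snippet below grows depth, width and resolution jointly with a single coefficient phi; the base coefficients (alpha = 1.2, beta = 1.1, gamma = 1.15) are the ones reported in the EfficientNet paper [25], while the baseline dimensions are our own placeholders.

```python
# EfficientNet-style compound scaling: depth, width and resolution are
# scaled together by one coefficient phi.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15   # from the EfficientNet paper

def compound_scale(phi, base_depth=18, base_width=32, base_res=224):
    depth = round(base_depth * ALPHA ** phi)   # more layers
    width = round(base_width * BETA ** phi)    # more channels
    res = round(base_res * GAMMA ** phi)       # larger input images
    return depth, width, res

for phi in range(4):
    print(phi, compound_scale(phi))
```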

2.2 Squeeze-and-Excitation Networks

In a CNN, the central building block is the convolution operator, which serves the task of constructing informative features at each layer. The Squeeze-and-Excitation (SE) block [10] is an addition to the convolution operator which is able to adaptively recalibrate the channel-wise feature responses by explicitly modelling the inter-dependencies between channels. These fundamental blocks are stacked together to form a SENet architecture, which is able to perform effective generalization to a


greater extent across different datasets. The SE blocks also bring significant performance improvements to existing state-of-the-art CNNs, but at a slightly higher computational cost.

2.3 ResNeXt

ResNeXt50 [29] is an advanced computer vision model from the ResNet family. The ResNet-50 model of the ResNet family is a CNN with a depth of 50 neural layers. It can be pre-trained and is able to classify images into up to 1000 different categories. The network architecture is simple and highly modularized for the classification of images. This network is created by stacking multiple units to form a building block that aggregates transformations of similar topology. This unique design, with only a few hyper-parameters, forms a homogeneous, multi-branch architecture. The aforesaid technique leads to the discovery of a new dimension called cardinality [29]. The "NeXt" in ResNeXt refers to this newly found dimension, which is very helpful in improving the classification efficiency of this model, even under constrained environments, just by increasing the cardinality of the architecture instead of going into deeper layers.

2.4 ViT

ViT [3] is a pure Transformer [27] which is applied directly to sequences of image patches and shows high performance on image classification tasks, without requiring CNNs. ViT models are generally pre-trained on large datasets and then fine-tuned for the smaller, downstream tasks. This proves most beneficial when the fine-tuning is done at higher resolutions than the pre-training. The result of the above-mentioned technique is a larger effective sequence length, as the patch size is kept the same while high-resolution images are fed to the network. The ViT is thus able to handle arbitrary sequence lengths and, in addition, pre-trained position embeddings are used to maintain the integrity of the ViT model.

2.5 DeIT

DeIT [26], introduced by Touvron et al., is an advanced version of the ViT model that does not rely on many statistical priors of images and does not require any external data for training. This makes DeIT one of the finest models to be used for image classification in constrained resource environments. The training strategy


of DeIT is a plus point, as it can simulate training on a much larger dataset than the computing environment may allow. DeIT contains an additional distillation token, since the performance of ViT decreases when it is trained on insufficient data, and distillation becomes an easier way to deal with this problem. Distillation is of two types: soft distillation and hard distillation. Soft distillation is convertible to hard distillation using label smoothing (InceptionV3) [24].

2.6 MobileNetV3

MobileNets [9] are designed in such a way that they can handle many tasks in computer vision. They are suitable for object detection, face attribute recognition, fine-grained classification and large-scale geo-localization. They were basically designed for usage in mobile phones as well as embedded vision applications. These models use a streamlined architecture that makes use of depthwise separable convolutions to build lightweight deep neural networks. MobileNetV3 requires two hyper-parameters, both global, which trade off efficiently between accuracy and latency. MobileNet is thus able to decide the correct model size for the prescribed application using these hyper-parameters and various other usage constraints.

2.7 Stacked Ensemble Learning Generalization

Stacking [28] involves integrating the predictions obtained from various machine learning models that have been evaluated on the same dataset. In stacking, the models generally differ from one another and are fitted on the same data. Unlike boosting [19], stacking has a solitary model which is made to figure out the weights of the predictions from the different models. The design incorporates two or more base models, referred to as level-0 models, and a meta-model. The meta-model joins the predictions obtained from the base models. The base models are trained on the complete training data. Preparing a training dataset for the meta-model may likewise incorporate contributions from the base models via k-fold cross-validation [17, 18]. The meta-model is trained once its training data is available. The stacking ensemble is suitable when the various models have different skills, i.e., when the outcomes predicted by the models, or the errors made in their predictions, have a small correlation. Other ensemble learning [4] algorithms may also be utilized as base models.
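A generic stacking sketch follows, using scikit-learn's StackingClassifier with two level-0 models and a logistic-regression meta-model built via internal k-fold cross-validation; the estimators and data are illustrative and do not mirror the paper's pipelines.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Two level-0 models feed a logistic-regression meta-model.
X, y = make_classification(n_samples=500, n_features=20, n_classes=2)
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=50)),
                ("svc", SVC(probability=True))],
    final_estimator=LogisticRegression(),
    cv=5,  # k-fold CV used to build the meta-model's training data
)
stack.fit(X, y)
print("training accuracy:", stack.score(X, y))
```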


3 Proposed Method

This section describes the proposed model for the classification and localization of Cassava leaf disease using DeIT, EfficientNet, ViT, SEResNeXt, CropNet [6] and MobileNetV3. The architecture of the model is discussed with an explanation of pipeline-1 and pipeline-2. Figure 1 depicts the architecture of the model.

3.1 Architecture

The visual identification and classification of these four diseases is done by the ensemble model proposed in this paper. Moreover, the transfer learning technique is employed for training the model pipelines.

Fig. 1 The proposed model architecture for Cassava leaf disease detection


The complete end-to-end pipeline design of the proposed model is shown in Fig. 1. The two pipelines defined for the classification tasks are as follows.

Pipeline 1: The Cassava leaf dataset is used for training the models. The models used in this pipeline are EfficientNet, SEResNeXt, ViT and DeIT. All the models are first trained on the dataset, and then their neural weights are recorded and kept for use during inferencing. This pipeline has two modes, which differ in architecture: one mode is suitable for low-quality handheld devices, while the other uses stacking and is more efficient and more powerful. The classification done by these models falls under the category of multi-label classification.

Mode 1: This mode contains only one model, DeIT, for inferencing purposes. The images are trained at dimensions 384 × 384 × 3. These image transformers do not require a massive quantity of records for training and can work on a smaller amount of data. A linear transformation is applied to DeIT to further increase its efficiency. The classification efficiency of DeIT is lower than that of mode 2, but it can perform under low resource availability.

Mode 2: This mode consists of three models stacked in sequence for inferencing purposes. A linear transformation is applied to the models to improve the stacking efficiency. The first of the three models in the stack, EfficientNet, works on images of dimensions 512 × 512 × 3; it uses a scaling approach that uniformly scales all dimensions of depth, width and resolution with a compound coefficient, and it directly starts the inference of each image after scaling. The second model in the stack, ViT, uses images of dimensions 384 × 384 × 3. The ViT model represents an entire image as a chain of image patches, just like the sequence of word embeddings used when applying transformers to text, and directly predicts class labels for the image. The third and last model in this stack, SEResNeXt, requires images of dimensions 512 × 512 × 3. The SEResNeXt structure is an extension of the deep residual network which replaces the usual residual block with one that leverages a split-transform-merge strategy. This model uses segmentation for inferencing the images, thus reducing the chances of any overfitting or misrepresentation of the output.

Pipeline 2: In the second pipeline, a pre-trained model, named the CropNet classifier for Cassava leaf disease, is used. The CropNet classifier is built on the MobileNetV3 architecture and shows an accuracy of 88% on Cassava leaf disease classification. A sequential MobileNetV3 model [23] has been used by the proposed architecture to create a CropNet replica, employing a weight loader to capture and load the classifier's neural weights into the sequential model of dimensions 224 × 224 × 3. The model thus formed is used further to create pipeline-2 in the ensemble model. This model replication is done because the CropNet classifier classifies the images into six classes (CBB, CBSD, CGM, CMD, healthy, or unknown), in contrast to the five classes of pipeline 1. Slicing of the prediction values is performed to remove the unknown-class classification and create a suitable number of classes.
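A hedged sketch of this fusion step is given below: the six-class pipeline-2 output is sliced to drop the unknown class and then averaged with the pipeline-1 output; the weights stand in for the sensitivity-based weights described in Sect. 4 and are not the values used in the experiments.

```python
import numpy as np

# Slice pipeline-2's six-class probabilities down to five classes, then take
# a weighted average with pipeline-1's probabilities.
def fuse(p1_probs, p2_probs_6, w1=0.6, w2=0.4):   # illustrative weights
    p2_probs = p2_probs_6[:, :5]                    # drop the "unknown" class
    p2_probs /= p2_probs.sum(axis=1, keepdims=True) # renormalize
    fused = w1 * p1_probs + w2 * p2_probs
    return fused.argmax(axis=1)

p1 = np.random.dirichlet(np.ones(5), size=4)   # stand-in pipeline-1 output
p2 = np.random.dirichlet(np.ones(6), size=4)   # stand-in pipeline-2 output
print(fuse(p1, p2))                            # predicted class per image
```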


4 Experimental Results and Analysis

The complete experimental environment is the Kaggle Python kernel. Compute Unified Device Architecture (CUDA) [5] is used for the training of image-level detection and classification in both pipelines. The evaluation metric used is classification accuracy. Test Time Augmentation (TTA) [20] is performed in this work, as it helps increase the efficiency of the model in the inference stage. TTA creates multiple versions of each test image during inference.

The data pre-processing involves various techniques based on the data flow. First, the training data and validation data are segregated, and then transformation techniques are applied to both sets. The data is resized as per the specific model requirements. A random 50% of each set is picked each time, and the images are (i) transposed, (ii) flipped vertically or horizontally, (iii) rotated, and then put through (iv) hue-saturation and (v) brightness and contrast changes. After all these procedures, the data is normalised so as not to let the system create redundant data. The batch size for the training data is set to 128. The dataset thus created is provided to mode-1 and mode-2 of pipeline-1, where mode-1 contains the DeIT model and mode-2 consists of EfficientNet, ViT and SEResNeXt, with stacking performed on the output of mode-2. Either of these modes can be chosen to drive pipeline-1. The data is passed through pipeline-1, which provides its result to the ensemble model. Pipeline-2 uses a direct way of image classification and prediction. As this model also predicts images of an unknown class, its output dimensions are modified to attach it to pipeline-1. On obtaining the results from both pipelines, weighted-box-fusion [22] is employed, where the results from the two pipelines are merged using the weighted-average technique. The weights are provisioned to the pipelines on the basis of their sensitivity scores. Thus the final ensemble model is formed, in which a stacked model pipeline and a pre-trained model pipeline are combined to provide better results than the individual model architectures.

Experiments are conducted on the Cassava leaf disease dataset using the proposed architecture to identify and localise the leaf diseases. Table 1 shows the results of the seven experiments conducted. Each experiment uses a unique set of values for hyper-parameters such as the optimiser, TTA and batch size. The last column of Table 1 shows the classification score obtained for each experiment. Among all the experiments conducted, Experiment 1 achieved the highest classification score of 90.75%. These experiments can be found in the Kaggle Cassava leaf disease classification competition [12]. Figure 2 shows sample input images. Figure 3 depicts the localization of leaf disease, represented with bounding boxes, and classification with identification of the type of disease. The sixth image in Fig. 3 does not have any bounding box, as it is identified as a healthy leaf by our proposed model. Moreover, this technique also provides better results in terms of training time. The final ensemble model is efficient and shows on-par results.
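As a small illustration of the TTA step described above, the sketch below averages a model's outputs over several flipped and rotated copies of a test image; `model_predict` is a hypothetical callable returning class probabilities.

```python
import numpy as np

# Average predictions over randomly transformed copies of one test image.
def tta_predict(model_predict, img, n_aug=5, rng=np.random.default_rng(0)):
    probs = []
    for _ in range(n_aug):
        aug = img
        if rng.random() < 0.5:
            aug = aug[:, ::-1]                     # horizontal flip
        if rng.random() < 0.5:
            aug = aug[::-1, :]                     # vertical flip
        aug = np.rot90(aug, k=rng.integers(0, 4))  # random 90-degree rotation
        probs.append(model_predict(aug))
    return np.mean(probs, axis=0).argmax()         # fused class decision
```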


Table 1 Experimental results obtained by fine-tuning the hyper-parameters of the proposed model

| Experiment | Optimizers (DeIT, EfficientNet-B4, SEResNeXt50, ViT, MobileNetV3) | Batch size | TTA value | Classification score |
|---|---|---|---|---|
| 1 | Adam, Adam, Adam, Adam, Adam | 64 | 5 | 0.9075 |
| 2 | SGD, Adam, Adam, Adam, Adam | 128 | 10 | 0.8993 |
| 3 | SGD, SGD, SGD, SGD, Adam | 128 | 10 | 0.9073 |
| 4 | SGD, Adam, Adam, Adam, Adam | 64 | 5 | 0.9002 |
| 5 | SGD, Adam, Adam, Adam, Adam | 128 | 8 | 0.8998 |
| 6 | SGD, Adam, Adam, Adam, Adam | 64 | 10 | 0.8991 |
| 7 | SGD, Adam, Adam, Adam, Adam | 128 | 10 | 0.9068 |

SGD stochastic gradient descent, TTA test time augmentation

The results obtained by the proposed method are compared with the CNN [21] and the CropNet Cassava leaf disease classifier [6]. Table 2 shows the classification scores obtained by each of the methods. The CNN obtained a score of 85.3% and CropNet achieved a score of 88.01%, whereas the proposed model achieved the highest score of 90.75%.


Fig. 2 The sample input images given to the proposed model

Fig. 3 Localization and classification of the sample input images

Table 2 Comparison of the proposed method with other models

| Model | Classification score |
|---|---|
| CNN | 0.853 |
| CropNet Cassava classifier | 0.8801 |
| Proposed method | 0.9075 |


5 Conclusion

In this work, an ensemble deep learning model having two pipelines is proposed for automatic Cassava leaf disease identification and classification using coloured images. The model is able to classify all four primary diseases caused in a Cassava leaf: CBB, CBSD, CGM and CMD. The model is a standalone deep learning network and does not require any external applications for use. The experimental results obtained show that this model can be used to build a classifier that efficiently predicts the presence of Cassava leaf diseases. This model will be helpful for amateur as well as experienced farmers, as it eliminates the need for human assistance and the related complications. The categorization accuracy achieved by the proposed model is 90.75%. The architecture of this model is simple enough that it can be used for software demanding real-time applications. Similarly, this model is able to detect other leaf diseases from coloured images if additional training data is provided. This model can therefore help farmers in Cassava fields to get appropriate assistance in the identification and classification of diseases. These precautionary measures have the potential to improve the management, survival and prospects of Cassava plant fields.

References

1. Albawi S, Mohammed TA, Al-Zawi S (2017) Understanding of a convolutional neural network. In: Proceedings of the international conference on engineering and technology, pp 1–6
2. Chan TH, Jia K, Gao S, Lu J, Zeng Z, Ma Y (2015) PCANet: a simple deep learning baseline for image classification? IEEE Trans Image Process 24(12):5017–5032
3. Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S et al (2020) An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929
4. Ganaie M, Hu M et al (2021) Ensemble deep learning: a review. arXiv preprint arXiv:2104.02395
5. Ghorpade J, Parande J, Kulkarni M, Bawaskar A (2012) GPGPU processing in CUDA architecture. arXiv preprint arXiv:1202.4347
6. Google: TensorFlow CropNet Cassava disease classification model. https://tfhub.dev/google/cropnet/classifier/cassava_disease_V1/2. Accessed 20 Jul 2022
7. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 770–778
8. Howard A, Sandler M, Chen B, Wang W, Chen LC, Tan M, Chu G, Vasudevan V, Zhu Y, Pang R, Adam H, Le Q (2019) Searching for MobileNetV3. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 1314–1324
9. Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017) MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861
10. Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 7132–7141
11. Janiesch C, Zschech P, Heinrich K (2021) Machine learning and deep learning. Electron Mark 31(3):685–695


12. Kumar H (2022) Kaggle Cassava leaf disease detection. https://www.kaggle.com/code/hrishikesh1kumar/cassava-leaf-disease-detection. Accessed 20 Jul 2022
13. Liu Z, Mao H, Wu CY, Feichtenhofer C, Darrell T, Xie S (2022) A ConvNet for the 2020s. arXiv preprint arXiv:2201.03545
14. Mohanty SP, Hughes DP, Salathé M (2016) Using deep learning for image-based plant disease detection. Front Plant Sci 7:1419
15. Mwebaze E, Gebru T, Frome A, Nsumba S, Tusubira J (2019) Cassava 2019 fine-grained visual categorization challenge. arXiv preprint arXiv:1908.02900
16. Ramcharan A, Baranowski K, McCloskey P, Ahmed B, Legg J, Hughes DP (2017) Deep learning for image-based cassava disease detection. Front Plant Sci 8:1852
17. Raschka S (2018) Model evaluation, model selection, and algorithm selection in machine learning. arXiv preprint arXiv:1811.12808
18. Refaeilzadeh P, Tang L, Liu H (2009) Cross-validation. Encycl Database Syst 532–538
19. Schapire RE (2003) The boosting approach to machine learning: an overview. In: Nonlinear estimation and classification, pp 149–171
20. Shanmugam D, Blalock D, Balakrishnan G, Guttag J (2021) Better aggregation in test-time augmentation. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 1214–1223
21. Shkliarevskyi M (2022) Kaggle Cassava leaf disease: Keras CNN prediction. https://www.kaggle.com/code/maksymshkliarevskyi/cassava-leaf-disease-keras-cnn-prediction. Accessed 20 Jul 2022
22. Solovyev R, Wang W, Gabruseva T (2021) Weighted boxes fusion: ensembling boxes from different object detection models. Image Vis Comput 107:104–117
23. Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks. Adv Neural Inf Process Syst 27
24. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2818–2826
25. Tan M, Le Q (2019) EfficientNet: rethinking model scaling for convolutional neural networks. In: Proceedings of the international conference on machine learning, pp 6105–6114
26. Touvron H, Cord M, Douze M, Massa F, Sablayrolles A, Jégou H (2021) Training data-efficient image transformers and distillation through attention. In: Proceedings of the international conference on machine learning, pp 10347–10357
27. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser LU, Polosukhin I (2017) Attention is all you need. In: Advances in neural information processing systems, vol 30. Curran Associates, Inc.
28. Wolpert DH (1992) Stacked generalization. Neural Netw 5(2):241–259
29. Xie S, Girshick R, Dollár P, Tu Z, He K (2017) Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5987–5995

Chapter 16

Scene Segmentation and Boundary Estimation in Primary Visual Cortex

Satyabrat Malla Bujar Baruah, Adil Zafar Laskar, and Soumik Roy

1 Introduction

Previous research has shown plenty of evidence of neurons' strong computational capacities [24]. Neurons have distinct morphologies that are tuned to a specific frequency of inputs in order to retrieve unique information. Distinct retinal ganglion cell (RGC) morphologies in the visual cortex are linked with exact connectome specificity, affecting the primary visual cortex's global response [3, 34]. However, very little is known about the role of dendritic arbors' electrophysiology and topologies in producing such complicated responses. The majority of the investigated neural networks are defined as learning systems because they focus on the mathematical interpretation of global behavior rather than the local dynamics affecting global responses [6, 30]. Basic operations processed in the striate cortex of primate vision, such as edge recognition, scene segmentation, multi-resolution feature extraction, depth perception, and motion estimation, are suspected to be intrinsic behavior [1, 35] rather than an exhaustive learning process [11, 26]. The processing of these basic visual activities is thought to be aided by a broad variety of neuron morphologies [12, 16]. In this work, an attempt is made to link parasol RGC physiology, nonlinear dynamics, and connectome specificity to magnocellular scene segmentation, which aids in boundary prediction and object-tracking type activity. To replicate their local behavior, morphologically detailed parasol RGCs were constructed and modeled, including active and passive membrane dynamics. To simulate the global responses in these layers, a peculiar arrangement of midget and parasol RGCs creating RGC layers has been built, as instructed by the in vivo investigations.

2 Method

The morphology of parasol cells and midget cells that project information to the magnocellular and parvocellular layers of the visual cortex via specialized parallel pathways has long been known [9, 23, 33]. Identifying the anatomical and functional relationships between the component neurons at consecutive points in the pathway is a key problem in understanding the organization and function of parallel visual pathways [15, 35]. Our model employs unique parasol RGCs connected to sympathetic bipolar cells, along with nonlinear neural electrophysiology, to drive scene segmentation functionality. Natural images in 'tiff', 'png', and 'jpg' formats are fed to the model and converted to spatio-temporal square pulses by the bipolar cells. A temporal signal with an offset of 10 ms, a pulse width of 150 ms, and a total temporal length of 250 ms is generated by the bipolar cells, considering the average response time of primate vision. The amplitude of the temporal signal is scaled proportionally to the signal intensity within the RGC sensitivity range of 1024–1016 nA [6]. These spatio-temporal signals are fed to the scene segmentation network, which generates a segmentation map of the visual stimuli that is sent to the magnocellular layer. The orientation-selective RGC layer in the magnocellular region then extracts the edge boundary from the segmented image. Details of the RGC morphology and connectome specificity with the bipolar cells, as well as boundary estimation in the visual cortex, are discussed in Sects. 2.1 and 2.2, which follow.

2.1 The RGC Morphology

The proposed framework emphasizes the computational role of unique neuron morphology, particularly the parasol RGC, in shaping visual scene segmentation and object boundary estimation. A moderate receptive field size has been taken [10, 18, 19] to optimize the computational complexity of the model, and the morphologies are shown in Fig. 1. The RGC morphology in Fig. 1a is used for the scene segmentation model and the RGC morphology in Fig. 1b is used for the boundary estimation model, where the junctions, cell body synapses, and dendritic fibers are color encoded. A similar color at the synapses indicates connection of the RGC solely with ON bipolar cells. Junctions and soma are modeled as summing nodes that perform temporal summation and re-encoding of the incoming cumulative signals. Re-encoding at the localized active ion channels [29, 31] has been modeled using Izhikevich's membrane model and is given as


Fig. 1 Parasol RGC morphologies used in striate cortex and magnocellular layer

C dv/dt = k (v − v_r)(v − v_t) − u + I    (1)

du/dt = a [b (v − v_r) − u];  if v ≥ v_t: v ← c, u ← u + d    (2)

where v is the membrane potential, I is the stimulus to the neuron, u is the recovery current, v_r is the resting membrane potential, and v_t is the threshold potential. Different spiking activities such as regular spiking, chattering, and intrinsic bursting are controlled by the parameters a, b, c, d, k, C. Izhikevich's membrane model [21, 22] is used in our proposed model because of its low computational complexity and its robustness in mimicking mammalian neurodynamics.

The passive dendritic branches in the RGC morphology facilitate decremental conduction of the propagating signal. Decremental conduction in a passive fiber has been modeled using the equations

I_in^Total = I_t + I_out    (3)

I_in^Total = (V_out − V_in) / R_lon    (4)

I_t + C_m dV_out/dt + G_L (V_out − E_L) = 0    (5)

from our previously published modeling work [4, 5], where V_in is the action potential generated by the localized active region, V_out is the membrane potential at the junction (with initial membrane potential equal to the resting membrane potential), C_m is the equivalent capacitance of the fiber, R_lon is the axial resistance, G_L is the membrane leakage conductance, E_L is the equilibrium potential due to the leakage ion channels, I_in^Total is the total current propagating toward the nodes/soma, I_t is the transmembrane current due to membrane dynamics, and I_out is the total delivered current.
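For illustration, Eqs. (1)-(2) can be integrated with a simple forward-Euler loop, as sketched below using the chattering parameters of Table 1; the step size and stimulus amplitude are our own illustrative choices.

```python
# Forward-Euler integration of the Izhikevich model, Eqs. (1)-(2),
# with the chattering parameters from Table 1.
a, b, c, d = 0.03, -2.0, -50.0, 100.0
C, k, vr, vt = 100.0, 0.7, -60.0, -30.0
dt, T, I = 0.1, 250.0, 200.0          # ms, ms, assumed stimulus amplitude

v, u = vr, 0.0
spikes = []
for step in range(int(T / dt)):
    v += dt * (k * (v - vr) * (v - vt) - u + I) / C
    u += dt * a * (b * (v - vr) - u)
    if v >= vt:                        # spike: reset per Eq. (2)
        v, u = c, u + d
        spikes.append(step * dt)
print(f"{len(spikes)} spikes in {T:.0f} ms")
```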


2.2 Connectome Specificity

In the context of the proposed framework, connectome specificity refers to the connectivity of the parasol RGCs with the bipolar cells. The RGC morphology of Fig. 1a has excitatory-type connectivity with ON bipolar cells, connected in oriented patterns [2, 8, 17, 20, 28]. A value of 1 in the connectivity matrix shown in Fig. 2 corresponds to excitatory connectivity with the ON bipolar cell, a value of 2 suggests two distal dendrites connected to the ON bipolar cell from opposite parent dendrites, and a value of 0 suggests no connectivity. Scene-segmentation type RGC morphologies connected in oriented patterns normalize the small gradient changes corresponding to fine features and encode them in terms of their spiking frequency. Four orientation bands optimize the small local gradient change corresponding to a specific orientation; these are then passed through a max-pool operator to generate the segmentation type response. The segmentation type images are then fed to the boundary detection network to extract the boundary information. With the minor gradients corresponding to fine features removed, when the segmented response is passed through the orientation-selective RGC network, the network tracks the major gradients corresponding to the boundaries of objects. The boundary estimation network employs the parasol RGC shown in Fig. 1b, with excitatory as well as inhibitory connectivity in the specific oriented patterns shown in Fig. 3.

Fig. 2 Parasol RGC connectivity with ON bipolar cells for normalizing gradient change along specific orientations

Fig. 3 Connectivity matrix for the boundary detection type RGC shown in Fig. 1b with the segmentation type response, with orientation specificity to 0°, 45°, 90°, and 135°


a value of 0 indicates no connectivity. These connectivity patterns detect gradient variations corresponding to the 1 and −1 entries and start firing at high frequency when the gradient is very high.
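As a rough illustration of how such connectivity matrices act on an image, the sketch below applies four oriented excitatory kernels (stand-ins for the patterns of Fig. 2, not the exact matrices), max-pools across the orientation bands to form the segmentation-type response, and then applies ±1 excitatory/inhibitory kernels (stand-ins for Fig. 3) whose rectified responses grow with the remaining boundary gradients.

```python
import numpy as np
from scipy.ndimage import convolve

# Illustrative 3x3 oriented excitatory patterns (stand-ins for Fig. 2):
# 1 = synapse with an ON bipolar cell, 0 = no connection.
seg_kernels = [
    np.array([[0, 0, 0], [1, 1, 1], [0, 0, 0]]),   # 0 degrees
    np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0]]),   # 45 degrees
    np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0]]),   # 90 degrees
    np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]]),   # 135 degrees
]

# Boundary-type patterns (stand-ins for Fig. 3): 1 excitatory, -1 inhibitory.
boundary_kernels = [
    np.array([[1, 1, 1], [0, 0, 0], [-1, -1, -1]]),    # horizontal edges
    np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]]),    # vertical edges
]

def segmentation_response(image):
    """Normalize local gradients along each orientation band, then
    max-pool across the four bands (segmentation-type response)."""
    bands = [convolve(image, k / k.sum()) for k in seg_kernels]
    return np.max(bands, axis=0)

def boundary_response(seg):
    """Boundary-type RGCs fire strongly where large gradients remain."""
    return np.max([np.abs(convolve(seg, k)) for k in boundary_kernels], axis=0)
```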

3 Simulation Results and Discussion

The proposed model was simulated using the Python 3.6 interpreter, with packages such as 'openCV' and 'scikit' for basic image operations and the 'scipy.integrate' package for solving the differential equations. The 'matplotlib' package was used for plotting the response images and other plots. Natural images in the 'tif', 'png', and 'jpg' formats, collected from the Berkeley segmentation database (BSDS500), were used as input to the proposed model to stimulate the photoreceptor cells. Figure 4 shows some of the input images fed to the proposed model together with their corresponding segmented responses and boundary estimation responses. As can be seen from the segmented responses of Fig. 4a, c, the model successfully maps most of the boundary regions of the objects, whereas the responses corresponding to Fig. 4c, d show some texture extraction, which is due to the receptive field size of the parasol RGCs in the segmentation layer. Increasing the receptive field size of the segmentation layer would remove most of the local gradient changes, making the boundary regions more prominent. Thus, a larger receptive field projecting to the magnocellular region seems necessary for better normalization of fine textures. However, due to the computational complexity of the model, larger receptive fields are not considered here and remain of interest for our future work. Table 1 lists the model parameters for modeling the passive membranes' low-

Fig. 4 Segmentation and boundary responses of the segmentation type RGC layer and boundary estimation RGC layer to input images (a–d)


Table 1 Izhikevich bursting and chattering membrane parameters and passive membrane propagation parameters

Izhikevich parameter   Bursting   Chattering      Propagation parameter   Value
a                      0.01       0.03            Cm                      1 µF
b                      5          −2              Rlon                    2
c                      −56        −50             EL                      −65 mV
d                      130        100             GL                      10⁻⁶ S
C                      150 nF     100 nF
k                      1.2        0.7
vr                     −65 mV     −60 mV
vt                     −35 mV     −30 mV

Table 2 Performance comparison of the proposed framework with existing state-of-the-art models on the BSDS500 database

References   Methods                ODS     OIS
[25]         RCF                    0.806   0.823
[32]         DeepContour            0.757   0.776
[38]         Human                  0.803   0.803
[38]         BDCN                   0.779   0.792
[38]         BDCN-w/o SEM           0.778   0.791
[7]          DeepEdge               0.753   0.772
[36]         HED                    0.788   0.808
[14]         SE                     0.75    0.77
[27]         Multicue               0.72    −
[37]         CEDN                   0.788   0.804
This work    Parvocellular region   0.727   0.835

pass features [4, 5] as well as Izhikevich's membrane dynamics [21, 22]. Izhikevich's membrane model has been included to imitate the behavior of the human visual cortex due to its capacity to emulate Ca²⁺ ion channel dynamics. The proposed method has been tested on the BSDS500 database because of the ground-truth references available in the database. The proposed model is also compared with some existing state-of-the-art models in Table 2; all of these models report results on the same BSDS500 dataset that our proposed model uses. These models employ modern algorithms, mostly based on convolutional neural networks and other specific methods, for feature estimation. The Optimal Dataset Scale (ODS) and Optimal Image Scale (OIS) values reported in the literature for the existing models are compared with our proposed model in Table 2. The proposed method performs very well in edge perception and


reaches a maximum OIS score of 83.5%, which is nearly the same as an average human's perception performance, with an average ODS score of 72.7%. The performance has been validated using 'Piotr's Matlab Toolbox' [13].

4 Conclusion

The proposed model gives insight into how natural scenes fed to the visual cortex are segmented in the primary layer, which later helps in the formation of object boundaries. Even though the exact specificity of connectivity for boundary estimation is not yet well explored due to the unavailability of measuring devices, the proposed methodology has been built with reference to in silico experimentation, which refers to connectivity of neural networks specifically to either ON or OFF bipolar cells. Connectivity of the network solely to ON-type bipolar cells gives rise to segmentation with relatively moderate receptive fields. Moderate to larger receptive fields with connectivity to a single type of bipolar cell give rise to segmentation-type behavior, and an orientation-selective RGC layer connected to the segmentation-type responses gives rise to object boundary detection, which is one of the major features projected onto the magnocellular region of the visual cortex. Thus, the proposed model gives insight into the process of object boundary estimation, which later helps in the formation of complex functions such as object tracking and object motion estimation.

Acknowledgment This publication is an outcome of the R&D work undertaken in a project under the Visvesvaraya Ph.D. Scheme of the Ministry of Electronics and Information Technology, Government of India, being implemented by Digital India Corporation.

References 1. Aleci C, Belcastro E (2016) Parallel convergences: a glimpse to the magno-and parvocellular pathways in visual perception. World J Res Rev 3(3):34–42 2. Antinucci P, Hindges R (2018) Orientation-selective retinal circuits in vertebrates. Front Neural Circ 12:11 3. Barlow HB (1982) David Hubel and Torsten Wiesel: their contributions towards understanding the primary visual cortex. Trends Neurosci 5:145–152 4. Baruah SMB, Gogoi P, Roy S (2019) From cable equation to active and passive nerve membrane model. In: 2019 second international conference on advanced computational and communication paradigms (ICACCP). IEEE, pp 1–5 5. Baruah SMB, Nandi D, Roy S (2019) Modelling signal transmission in passive dendritic fibre using discretized cable equation. In: 2019 2nd international conference on innovations in electronics, signal processing and communication (IESC). IEEE, pp 138–141 6. Baruah SMB, Nandi D, Gogoi P, Roy S (2021) Primate vision: a single layer perception. Neural Comput Appl 33(18):11765–11775 7. Bertasius G, Shi J, Torresani L (2015) Deepedge: a multi-scale bifurcated deep network for top-down contour detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4380–4389


8. Briggman KL, Helmstaedter M, Denk W (2011) Wiring specificity in the direction-selectivity circuit of the retina. Nature 471(7337):183–188 9. Callaway EM (2005) Structure and function of parallel pathways in the primate early visual system. J Physiol 566(1):13–19 10. Cooler S, Schwartz GW (2021) An offset on-off receptive field is created by gap junctions between distinct types of retinal ganglion cells. Nat Neurosci 24(1):105–115 11. Dacey DM, Brace S (1992) A coupled network for parasol but not midget ganglion cells in the primate retina. Visual Neurosci 9(3–4):279–290 12. Dipoppa M, Ranson A, Krumin M, Pachitariu M, Carandini M, Harris KD (2018) Vision and locomotion shape the interactions between neuron types in mouse visual cortex. Neuron 98(3):602–615 13. Dollár P. Piotr’s computer vision Matlab toolbox (PMT). https://github.com/pdollar/toolbox 14. Dollár P, Zitnick CL (2014) Fast edge detection using structured forests. IEEE Trans Pattern Anal Mach Intell 37(8):1558–1570 15. Edwards M, Goodhew SC, Badcock DR (2021) Using perceptual tasks to selectively measure magnocellular and parvocellular performance: Rationale and a user’s guide. Psychonom Bull Rev 28(4):1029–1050 16. Garg AK, Li P, Rashid MS, Callaway EM (2019) Color and orientation are jointly coded and spatially organized in primate primary visual cortex. Science 364(6447):1275–1279 17. Garg AK, Li P, Rashid MS, Callaway EM (2019) Color and orientation are jointly coded and spatially organized in primate primary visual cortex. Science 364(6447):1275–1279 18. Gauthier JL, Field GD, Sher A, Greschner M, Shlens J, Litke AM, Chichilnisky E (2009) Receptive fields in primate retina are coordinated to sample visual space more uniformly. PLoS Biol 7(4):e1000063 19. Gauthier JL, Field GD, Sher A, Shlens J, Greschner M, Litke AM, Chichilnisky E (2009) Uniform signal redundancy of parasol and midget ganglion cells in primate retina. J Neurosci 29(14):4675–4680 20. Guo T, Tsai D, Morley JW, Suaning GJ, Kameneva T, Lovell NH, Dokos S (2016) Electrical activity of on and off retinal ganglion cells: a modelling study. J Neural Eng 13(2):025005 21. Izhikevich EM (2003) Simple model of spiking neurons. IEEE Trans Neural Netw 14(6):1569– 1572 22. Izhikevich EM (2007) Dynamical systems in neuroscience. MIT Press 23. Kling A, Gogliettino AR, Shah NP, Wu EG, Brackbill N, Sher A, Litke AM, Silva RA, Chichilnisky E (2020) Functional organization of midget and parasol ganglion cells in the human retina. BioRxiv 24. Koch C, Segev I (2000) The role of single neurons in information processing. Nat Neurosci 3(11):1171–1177 25. Liu Y, Cheng MM, Hu X, Bian JW, Zhang L, Bai X, Tang J (2019) Richer convolutional features for edge detection. IEEE Trans Pattern Anal Mach Intell 41(8):1939–1946. https://doi.org/10. 1109/TPAMI.2018.2878849 26. Manookin MB, Patterson SS, Linehan CM (2018) Neural mechanisms mediating motion sensitivity in parasol ganglion cells of the primate retina. Neuron 97(6):1327–1340 27. Mély DA, Kim J, McGill M, Guo Y, Serre T (2016) A systematic comparison between visual cues for boundary detection. Vision Res 120:93–107 28. Nelson R, Kolb H (1983) Synaptic patterns and response properties of bipolar and ganglion cells in the cat retina. Vision Res 23(10):1183–1195 29. Nusser Z (2012) Differential subcellular distribution of ion channels and the diversity of neuronal function. Curr Opin Neurobiol 22(3):366–371 30. Riesenhuber M, Poggio T (1999) Hierarchical models of object recognition in cortex. Nat Neurosci 2(11):1019–1025 31. 
Shah MM, Hammond RS, Hoffman DA (2010) Dendritic ion channel trafficking and plasticity. Trends Neurosci 33(7):307–316 32. Shen W, Wang X, Wang Y, Bai X, Zhang Z (2015) Deepcontour: a deep convolutional feature learned by positive-sharing loss for contour detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3982–3991


33. Soto F, Hsiang JC, Rajagopal R, Piggott K, Harocopos GJ, Couch SM, Custer P, Morgan JL, Kerschensteiner D (2020) Efficient coding by midget and parasol ganglion cells in the human retina. Neuron 107(4):656–666 34. Troncoso XG, Macknik SL, Martinez-Conde S (2011) Vision’s first steps: anatomy, physiology, and perception in the retina, lateral geniculate nucleus, and early visual cortical areas. In: Visual Prosthetics, pp 23–57 35. Wang W, Zhou T, Zhuo Y, Chen L, Huang Y (2020) Subcortical magnocellular visual system facilities object recognition by processing topological property. BioRxiv 36. Xie S, Tu Z (2015) Holistically-nested edge detection. In: Proceedings of the IEEE international conference on computer vision, pp 1395–1403 37. Yang J, Price B, Cohen S, Lee H, Yang MH (2016) Object contour detection with a fully convolutional encoder-decoder network. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 193–202 38. Zhao Y, Li J, Zhang Y, Song Y, Tian Y (2021) Ordinal multi-task part segmentation with recurrent prior generation. IEEE Trans Pattern Anal Mach Intell 43(5):1636–1648. https://doi. org/10.1109/TPAMI.2019.2953854

Chapter 17

Dynamic Thresholding with Short-Time Signal Features in Continuous Bangla Speech Segmentation Md Mijanur Rahman

and Mahnuma Rahman Rinty

1 Introduction

Segmentation is essential in any voice-activated system that decomposes the speech signal into smaller units [1]. Words, phonemes, and syllables are the fundamental acoustic units of the speech waveform. Among them, the word is the most acceptable candidate for a natural speech unit with a well-defined acoustic representation. Short-term features [2], dynamic thresholding [3], wavelet transforms [4], fuzzy approaches [5], neural networks [6], and hidden Markov models (HMM) [7] have mostly been used for speech segmentation. To segment an entity into separate, non-overlapping components is the basic goal of segmentation [8]. There are many different ways to categorize continuous speech segmentation techniques [9], but a fundamental division is made between approaches that are assisted and those that are blind [10, 11]. The significant distinction between aided and blind segmentation is whether the computer processes the target speech units utilizing previously gathered data or external features. The most significant prerequisite for any speech recognition system is speech feature extraction, commonly referred to as the signal processing front-end. It is a mathematical representation of the voice file, which turns the speech waveform into a parametric representation for further analysis and processing. The gathering of notable features is carried out using this parametric representation. The speech signal can be parametrically encoded in a variety of ways; examples include short-time energy, zero-crossing rates, level-crossing rates, the spectral centroid, and other related parameters [12]. When it comes to speech segmentation and recognition, a good feature can help.

M. M. Rahman (B) Dept. of Computer Science and Engineering, Jatiya Kabi Kazi Nazrul Islam University, Mymensingh 2224, Bangladesh e-mail: [email protected] M. R. Rinty Department of Computer Science and Engineering, Southeast University, Dhaka, Bangladesh © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_17


2 Short-Time Speech Signal Features

The basic idea behind the method is to use short-term speech features to locate segment borders by tracking frequency or spectrum variations in a spoken signal. Temporal and spectral signal characteristics are used to segment the speech signal. Segmentation techniques that rely on time-domain information, such as signal energy/amplitude and the average zero-crossing rate (ZCR), are straightforward to apply and calculate. The signal energy is computed on a short-term basis to locate voiced sounds in continuous speech, which have higher power than silent/unvoiced segments (see Fig. 1). It is usually calculated by windowing the speech frame over the short term, squaring the sample values, and taking root-mean-square (RMS) values [13].

Fig. 1 Speech sentence’s “Amader Jatiya Kabi Kazi Nazrul Islam”: (a) original speech signal, (b) short-time energy curves, and (c) its short-time average zero-crossing rate curves


The root-mean-square amplitude of a voice signal is a measure of its energy. It gives us a measurement of the difference in amplitude over time when used in subsequent windows of a speech signal. The short-term RMS energy of a speech frame with length N is given by:

$$E_n^{(\mathrm{RMS})} = \sqrt{\frac{1}{N} \sum_{m=1}^{N} \left[ x(m)\, w(n-m) \right]^2} \quad (1)$$

The average ZCR, on the other hand, keeps track of how many times the signal's amplitude crosses zero throughout a specific period [14]. In general, silent/unvoiced segments have greater ZCR values than voiced ones. The speech ZCR curve, as depicted in Fig. 1c, has peaks and troughs from the unvoiced and voiced sections, respectively. The short-time averaged ZCR is defined as

$$Z_n = \frac{1}{2} \sum_{m=1}^{N} \left| \mathrm{sgn}[x(m)] - \mathrm{sgn}[x(m-1)] \right| w(n-m) \quad (2)$$

where

$$\mathrm{sgn}[x(m)] = \begin{cases} 1, & x(m) \ge 0 \\ -1, & x(m) < 0 \end{cases} \quad (3)$$

and the rectangular window w(n) has length N. A basic speech/non-speech decision can be made using a combination of the RMS value and the average ZCR: the average ZCR is higher in unvoiced sounds, while the RMS value is higher in voiced sounds and lower in non-speech sounds. The resulting speech signals after extracting the short-time (time-domain) signal features are shown in Fig. 1. The frequency range of most voice information is 250–6800 Hz [15]. The discrete Fourier transform (DFT) yields the frequency-domain features, which provide information about each frequency contained in a signal [16]. Two frequency-domain characteristics were used in the segmentation methods: the spectral centroid and the spectral flux. The spectral centroid depicts the center of gravity of a voice signal; high values represent vocal sounds, and this feature assesses the spectral position [17]. The spectral centroid SC_i of the i-th frame is described as "the center of gravity of its spectrum" and is given by

$$SC_i = \frac{\sum_{m=0}^{N-1} f(m)\, X_i(m)}{\sum_{m=0}^{N-1} X_i(m)} \quad (4)$$

Here, f(m) stands for the center frequency of bin m, and X_i(m) is the amplitude of bin m in the DFT spectrum of the i-th frame. As illustrated in Fig. 2b, higher values of this characteristic, which measures spectral location, correspond to "brighter" sounds. Finally, the spectral flux is a measurement of how quickly a signal's power spectrum varies. It is determined by comparing the power spectra of two consecutive frames (via the Euclidean distance) [18]. The spectral flux can be used, among other things, to assess the timbre of an audio stream or in onset detection. The spectral flux is given by

$$SF_i = \sum_{k=1}^{N/2} \left[ |X_i(k)| - |X_{i-1}(k)| \right]^2 \quad (5)$$

The DFT coefficient of the i-th short-term frame of length N is represented here by X_i(k), and SF_i shows the rate of spectral change in the speech features. Figure 2 shows the generated speech signals after extracting the short-time (frequency-domain) signal characteristics.

Fig. 2 Graphs of a original signal, b spectral centroid features, and c spectral flux features of the same speech sentence
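As a compact illustration, the numpy sketch below computes the four features of Eqs. (1)–(5) over short-time frames; the sampling rate, frame length, and hop size are assumed values, a rectangular window is used as in Eq. (2), and the ZCR is averaged per frame rather than summed, which is only a normalization choice.

```python
import numpy as np

def short_time_features(x, fs=16000, frame_len=400, hop=160):
    """RMS energy, ZCR, spectral centroid and spectral flux per frame
    (Eqs. (1)-(5)); frame/hop sizes and fs are assumed values."""
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])
    # Eq. (1): short-term RMS energy (rectangular window)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    # Eqs. (2)-(3): zero-crossing rate, averaged per frame
    signs = np.where(frames >= 0, 1, -1)
    zcr = 0.5 * np.mean(np.abs(np.diff(signs, axis=1)), axis=1)
    # Magnitude DFT spectrum of each frame
    X = np.abs(np.fft.rfft(frames, axis=1))
    f = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    # Eq. (4): spectral centroid
    centroid = (X * f).sum(axis=1) / (X.sum(axis=1) + 1e-12)
    # Eq. (5): spectral flux between consecutive frames
    flux = np.concatenate([[0.0], np.sum(np.diff(X, axis=0) ** 2, axis=1)])
    return rms, zcr, centroid, flux
```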


3 Dynamic Thresholding Approach

A dynamic thresholding approach is used to find the uttered words after extracting the speech feature sequences. In segmentation, this approach uses a separate threshold value for each speech feature vector. It automatically finds two thresholds, T_1 and T_2, for the energy and spectral centroid sequences, respectively. Finally, sequential frames whose individual signal qualities are greater than the calculated criteria are used to produce the targeted voiced segments. Figure 3 shows both filtered feature sequence curves (signal energy and spectral centroid) with their threshold settings. Figure 4 shows the overall segmentation results of another Bangla speech sentence, which contains 5 (five) speech words. The other segmentation approach utilizes dynamic thresholding and a blocking method, also known as the "blocking black area" method, on the spectrogram image of a Bangla speech sentence. After generating the spectrogram image of a speech signal, the thresholding method first transforms the image into a grayscale illustration [19]. A threshold analysis algorithm is then used to decide which pixels are mapped to black or white [20], as shown in Fig. 5. Static and dynamic thresholding are the two most commonly used thresholding procedures. In the proposed approach, each pixel in the spectrogram image has its own threshold. Finally, sequential frames with individual signal

Fig. 3 Median filtered feature sequence curves with threshold values


Fig. 4 Overall segmentation results: a the original signal, b the short-time signal energy features curve with a threshold value, c the spectral centroid sequence curve with a threshold value, and d the segmented words in dashed circles

qualities greater than the calculated criteria are used to produce the targeted voiced segments. However, the issue is determining how to choose the desired threshold. Therefore, this study adopts Otsu's thresholding method to calculate the preferred threshold value. Otsu's technique, which was developed in 1979 by Nobuyuki Otsu [21], is typically successful at segmenting images [22]. This method assumes that the image has two main areas, background and foreground, and then determines "an optimal threshold that minimizes the weighted within-class variance and maximizes the between-class variance".
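A minimal numpy implementation of Otsu's threshold selection [22] on a grayscale spectrogram image is sketched below; the 256-bin histogram is an assumption for 8-bit images, and which side of the threshold corresponds to the voiced (black) area depends on the spectrogram's colormap.

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method: choose the threshold that maximizes the
    between-class variance of the gray-level histogram [22]."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist / hist.sum()                 # gray-level probabilities
    omega = np.cumsum(p)                  # class-0 weight for each cut
    mu = np.cumsum(p * np.arange(256))    # cumulative mean
    mu_T = mu[-1]                         # global mean
    # Between-class variance for every candidate threshold.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b2 = (mu_T * omega - mu) ** 2 / (omega * (1 - omega))
    return int(np.nanargmax(sigma_b2))

# Binarize an 8-bit grayscale spectrogram image (sketch):
# binary = (gray_img > otsu_threshold(gray_img)).astype(np.uint8) * 255
```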

4 Blocking Method

A new approach (the blocking method) is introduced in speech segmentation [23]. It groups the voiced parts of the thresholded representation of continuous speech into multiple black boxes, separating them from the silent/unvoiced segments. The black areas represent voiced parts, and the white areas represent unvoiced parts. The edges of each black block indicate speech word borders in the continuous speech. Correctly identifying the speech unit boundaries (i.e., locating the start and end points) constitutes proper


Fig. 5 Bangla speech sentence graph and its thresholded (spectrogram) images

segmentation. As a result, in the voiced parts of the spoken sentence, this method generates rectangular black boxes, each of which marks the corresponding speech segment (e.g., words/syllables). The overall segmentation process with the blocking method is illustrated in Fig. 6. After identifying the start and stop positions of each black box, these two points are utilized to automatically label the word boundaries in the original speech sentence, dividing each speech segment from the


Fig. 6 Thresholded spectrogram image (top), rectangular black boxes after blocking the voiced regions (middle), and speech word segments represent the speech segmentation using the blocking black area approach (bottom)

speech sentence. According to Fig. 6, the sentence yields 6 (six) black boxes, which represent the 6 (six) word fragments of the "Amader Jatiya Kabi Kazi Nazrul Islam" speech sentence.
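A sketch of the blocking step is given below, under the assumption that voiced regions appear as black (value 0) pixels in the binarized spectrogram: consecutive columns containing black pixels are grouped into boxes whose left and right edges give the start and stop frames of each word; the minimum box width is an assumed post-processing choice.

```python
import numpy as np

def block_black_areas(binary_img, min_width=5):
    """Group consecutive 'black' columns of a thresholded spectrogram
    into boxes; each box marks one voiced segment (word/syllable).

    binary_img: 2-D array, 0 = black (voiced), 255 = white (unvoiced).
    min_width: minimum box width in frames, used to discard spurious
               blobs (an assumed post-processing choice).
    """
    has_voice = (binary_img == 0).any(axis=0)      # per-column voiced flag
    boxes, start = [], None
    for t, v in enumerate(has_voice):
        if v and start is None:
            start = t                              # a box opens
        elif not v and start is not None:
            if t - start >= min_width:
                boxes.append((start, t - 1))       # (start, end) frames
            start = None
    if start is not None:
        boxes.append((start, len(has_voice) - 1))  # box open at the end
    return boxes
```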


5 Conclusion

We have offered a straightforward approach based on short-term speech features for efficiently segmenting continuous speech into smaller units. This feature-based approach gives a foundation for differentiating voiced parts from unvoiced components. Moreover, it exposes changing features on a short-term basis that may reveal the tempo and periodicity character of the targeted speech signal. This paper also presents an effective dynamic thresholding algorithm with a blocking method for continuous speech segmentation. However, it faced difficulty in segmenting some words adequately. This is owed to numerous sources of speech variability, such as phonetic features, pitch and amplitude, speaker characteristics, and device and environment properties.

Acknowledgement The authors would like to convey their sincere gratitude to the Jatiya Kabi Kazi Nazrul Islam University's research and extension center (Nazrul University, Bangladesh) for their financial support and cooperation in conducting this research work.

References 1. Rahman MM, Khan MF, Bhuiyan MA-A (2012) Continuous Bangla speech segmentation, classification and feature extraction. Int J Comput Sci Issues 9(2):67 2. Rahman MM, Bhuiyan MAA (2012) Continuous Bangla speech segmentation using short-term speech features extraction approaches. Int J Adv Comput Sci Appl 3(11) 3. Rahman MM, Bhuiyan MA-A (2013) Dynamic thresholding on speech segmentation. Int J Res Eng Technol 2(9):404–411 4. Hioka Y, Hamada N (2003) Voice activity detection with array signal processing in the wavelet domain. IEICE Trans Fundamentals Electron Commun Comput Sci 86(11):2802–2811 5. Beritelli F, Casale S (1997) Robust voiced/unvoiced speech classification using fuzzy rules. In: IEEE workshop on speech coding for telecommunications proceedings. Back to basics: attacking fundamental problems in speech coding. IEEE, pp 5–6 6. Rahman MM, Bhuiyan MA-A (2015) Comparison study and result analysis of improved backpropagation algorithms in Bangla speech recognition. Int J Appl Res Inf Technol Comput 6(2):107–117 7. Kadir MA, Rahman MM (2016) Bangla speech sentence recognition using hidden Markov models. Int J Multidisc Res Dev 3(7):122–127 8. Rahman MM, Bhuiyan MA-A (2011) On segmentation and extraction of features from continuous Bangla speech including windowing. Int J Appl Res Inf Technol Comput 2(2):31–40 9. Rahman MM, Khan MF, Moni MA (2010) Speech recognition front-end for segmenting and clustering continuous Bangla speech. Daffodil Int Univ J Sci Technol 5(1):67–72 10. Sharma M, Mammone R (1996) Blind speech segmentation: automatic segmentation of speech without linguistic knowledge. In: Proceeding of fourth international conference on spoken language processing. ICSLP’96, vol 2. IEEE, pp 1237–1240 11. Schiel F (1999) Automatic phonetic transcription of non-prompted speech. In: Proceedings of the ICPhS, pp 607–610 12. Rahman MM (2022) Continuous Bangla speech processing: segmentation, classification and recognition. B. P. International 13. Zhang T, Kuo C-C (1999) Hierarchical classification of audio data for archiving and retrieving. In: IEEE international conference on acoustics, speech, and signal processing. Proceedings. ICASSP99 (cat. no. 99CH36258), vol 6. IEEE, pp 3001–3004


14. Rabiner LR, Sambur MR (1975) An algorithm for determining the endpoints of isolated utterances. Bell Syst Techn J 54(2):297–315 15. Niederjohn R, Grotelueschen J (1976) The enhancement of speech intelligibility in high noise levels by high-pass filtering followed by rapid amplitude compression. IEEE Trans Acoust Speech Signal Process 24(4):277–282 16. Cooley JW, Tukey JW (1965) An algorithm for the machine calculation of complex Fourier series. Math Comput 19(90):297–301 17. Giannakopoulos T (2009) Study and application of acoustic information for the detection of harmful content, and fusion with visual information. Thesis, Department of Informatics Telecommunications 18. Bello JP, Daudet L, Abdallah S, Duxbury C, Davies M, Sandler MB (2005) A tutorial on onset detection in music signals. IEEE Trans Speech Audio Process 13(5):1035–1047 19. Shapiro LG, Linda G (2002) Comput Vis. Prentice Hall 20. Rahman M, Khatun F, Islam MS, Bhuiyan MA-A (2015) Binary features of speech signal for recognition. Int J Appl Res Inf Technol Comput 6(1):18–25 21. Gonzalez RC, Woods RE (1992) Digital image processing reading. Addison-Wesley, MA 22. Otsu N (1979) A threshold selection method from gray-level histograms. IEEE Trans Syst Man Cybern 9(1):62–66 23. Rahman MM, Khatun F, Bhuiyan MA-A (2015) Blocking black area method for speech segmentation. Int J Adv Res Artif Intell 4(2):1–6

Chapter 18

Fast Adaptive Image Dehazing and Details Enhancement of Hazy Images Balla Pavan Kumar , Arvind Kumar , and Rajoo Pandey

1 Introduction

The quality of outdoor images is affected by bad weather such as fog. In such scenarios, the performance of many real-time video applications, such as automated driver assistance systems [1] and video surveillance, is badly affected. Hence, there is a need to develop a suitable algorithm to reduce the haze effect. There are many dehazing algorithms that can effectively eliminate haze; however, some challenges still need to be addressed. As mentioned in [2], dehazing methods can be categorized on the basis of (i) image enhancement, (ii) image fusion, (iii) image restoration, and (iv) neural networks. The image enhancement-based dehazing methods [3, 4] improve the visibility of the curved edges of a hazy image, but they produce halo artifacts and cannot efficiently remove the haze. The image fusion-based dehazing methods [5, 6] work well for images with homogeneous haze, but they produce the halo effect and exhibit high execution times. The image restoration-based algorithms are currently trending in image dehazing research. Although these methods [7–10] produce natural outputs with effective haze removal, they produce over-dark outcomes and exhibit high computation times. The dehazing method of [11] considers a neural network framework to remove the fog effect. Although neural network techniques are trending these days, they are not well suited for image dehazing since the atmospheric light is empirically chosen, which may result in an unnatural outcome, as mentioned in [12].

B. P. Kumar (B) · A. Kumar · R. Pandey National Institute of Technology, Kurukshetra, India e-mail: [email protected] A. Kumar e-mail: [email protected] R. Pandey e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_18


Although the haze-relevant features can be learned in these methods, they rely on huge training datasets. Most dehazing methods do not consider local variations, i.e., regions 'less-affected by haze' and 'more-affected by haze'. If all areas of a hazy image are treated uniformly, the regions less affected by haze may be over-dehazed, producing dark outputs, while the regions more affected by haze may be under-dehazed, so that the haze in these regions is not effectively eliminated. Hence, a dehazing algorithm should be designed adaptively, treating different areas of a hazy image in an appropriate manner for a better outcome. Inspired by our work [13], a hazy image can be categorized as shown in Fig. 1. Also, many existing dehazing methods are not suitable for real-time image processing systems as they exhibit high computational times. To overcome this problem, a fast dehazing algorithm has to be implemented with low complexity. In this paper, a fast adaptive dehazing algorithm is proposed, inspired by our previous work [13]. At first, a hazy image is categorized into 'less-affected by haze' and 'more-affected by haze' regions. After that, the input hazy image is passed separately to the two blocks, namely 'less-affected by haze' and 'more-affected by haze', for the purpose of adaptive dehazing. For every block, the input hazy image is decomposed into base and detail layers with different scale smoothing factors. In each block, the fast dehazing algorithm of [15] is applied to the base layer for dehazing. Also, the fast Laplacian enhancement [16] is applied to the detail layer in


Fig. 1 Hazy image categorization, inspired by [13], for the hazy image 'KRO_Google_143' of the RESIDE dataset [14]


each block for detail enhancement. In each block, the dehazed image is fused with the detail-enhanced image to obtain the recovered image. Finally, the recovered image outputs of the two blocks are fused based upon the regional categorization. The proposed method exhibits good outcomes and is also suitable for real-time video dehazing at 25 frames per second (FPS). The rest of the paper comprises (i) the background of hazy image representation, (ii) the methodology of the proposed method, (iii) the experimental results of the proposed and existing dehazing methods, and (iv) the conclusion.

2 Background

As given in [17], the physical description of a hazy image can be expressed as

$$I(x) = J(x)\, t(x) + A\,(1 - t(x)) \quad (1)$$

where x denotes the pixel location, J represents the haze-free image, A denotes the atmospheric light, and I represents the hazy image. As given in [17], the transmission map t is expressed as

$$t(x) = e^{-\beta\, d(x)} \quad (2)$$

here, d represents the distance from the camera to the scene, and β is the attenuation constant. From Eqs. (1) and (2), it can be observed that the greater the distance and/or attenuation constant, the greater the effect of the atmospheric light on the image (the haze effect). As different areas of a hazy image are affected differently with respect to distance and/or attenuation constant, adaptive dehazing is essential to treat each area according to its level of haze.

3 Methodology

3.1 Hazy Image Classification

The input hazy image is classified into 'less-affected by haze' and 'more-affected by haze' regions based on the pixel intensity values. The influence of the atmospheric light increases with the amount of haze. In most cases, the pixel intensity value of the atmospheric light lies in the range (0.7, 1), where the pixel intensity values of any image lie in the range (0, 1). Hence, it can be deduced that pixels with low intensity values are less affected by haze due to the smaller influence of the atmospheric light. Empirically, pixels with intensity less than 0.5 would be


considered as the 'less-affected by haze' region, and the rest of the pixels come under the 'more-affected by haze' region. The hazy image categorization is illustrated as

$$\text{Haze level of region} = \begin{cases} \text{Less affected by haze,} & \text{if } I(x) < 0.5 \\ \text{More affected by haze,} & \text{otherwise} \end{cases} \quad (3)$$

here, I represents the hazy image, and x represents the pixel-coordinate of I.
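A one-line numpy realization of Eq. (3) is sketched below, assuming the hazy image is stored with intensities normalized to [0, 1]; the per-pixel mean used for RGB inputs is an assumed reduction, and the resulting mask is what later drives the fusion of the two blocks' outputs.

```python
import numpy as np

def haze_region_mask(I):
    """Eq. (3): True where a pixel is 'less affected by haze' (I < 0.5).

    I: hazy image with intensities in [0, 1]. For RGB inputs the
    per-pixel channel mean is used, which is an assumed reduction.
    """
    gray = I.mean(axis=-1) if I.ndim == 3 else I
    return gray < 0.5

# Fusing the two blocks' recovered outputs with the mask (sketch):
# mask = haze_region_mask(I)
# result = np.where(mask[..., None], recovered_less, recovered_more)
```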

3.2 Fast Hazy Image Decomposition

After the classification of the hazy image, the input hazy image is applied to the 'less-affected by haze' and 'more-affected by haze' blocks as shown in Fig. 2. For each block, a different scale smoothing factor is chosen for the decomposition of the hazy image. The hazy image is decomposed into base and detail layers using the fast guided filter of [15] as

$$q_i = a_k I_{bl,i} + b_k, \quad \forall i \in w_k \quad (4)$$

where I_bl represents the base layer of I, q denotes the filtered image, i is the pixel index, w denotes the square window, k is the index of w, and a and b represent the linear map and constant map, respectively. The input hazy image is scaled by a factor p for faster execution of the guided filter.
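A self-contained numpy sketch of the decomposition of Eq. (4) is given below, with the hazy image acting as its own guide and without the p-fold scaling of the fast variant; the window radius is an assumed value, and the constant map b computed here has the same form as the atmospheric light map of Eq. (5).

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_decompose(I, radius=8, eps=1e-6):
    """Decompose a grayscale image in [0, 1] into base + detail layers
    with a guided filter that uses I as its own guide (Eq. (4)).

    radius is an assumed window size; eps is the per-block scale smooth
    value (1e-6 / 1e-5 in the paper). Returns (base, detail) with
    I = base + detail.
    """
    size = 2 * radius + 1
    mean_I = uniform_filter(I, size)                 # window means
    var_I = uniform_filter(I * I, size) - mean_I ** 2
    a = var_I / (var_I + eps)                        # linear map a_k
    b = mean_I - a * mean_I                          # constant map b_k (cf. Eq. (5))
    base = uniform_filter(a, size) * I + uniform_filter(b, size)  # Eq. (4)
    return base, I - base
```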

Fig. 2 Block diagram of fast region-based adaptive dehazing algorithm


3.3 Fast Dehazing

After image decomposition, the base layer is dehazed using the atmospheric model of Eq. (1). As given in [18], the atmospheric light map A_map can be calculated from the mean of b as

$$A_{map}(x) = \mathrm{mean}(b) = \mu_q(x) - \frac{\sigma_q^2(x)}{\sigma_q^2(x) + \varepsilon}\, \mu_q(x) \quad (5)$$

where μ_q and σ_q² are the mean and variance of the guided filter output q of Eq. (4), respectively, and ε is the scale smooth value. The above process is implemented separately in the two blocks with different scale smooth values, i.e., ε = 10⁻⁶ for the 'less-affected by haze' block and ε = 10⁻⁵ for the 'more-affected by haze' block. The transmission map T can be obtained from [18] as

$$T(x) = 1 - \omega\, D(x)/A \quad (6)$$

where ω represents a constant parameter, empirically chosen as 0.7, D denotes the dark channel, and A is the atmospheric light value obtained from the median of the topmost 1% of pixels of A_map. The dark channel can be evaluated from [7] as

$$D(x) = \min_{c \in \{r,g,b\}} I_{bl}^{c}(x) \quad (7)$$

where I_bl^c denotes an RGB color channel of I_bl. The transmission map is refined using the spatio-temporal Markov random field, as given in [18]. From the obtained refined transmission map T_R and atmospheric light A, the dehazed image J can be obtained from Eq. (1) as

$$J = \frac{I_g - A}{T_R} + A_B \quad (8)$$

where A_B is the balanced atmospheric light; as in [19], A_B can be expressed as

$$A_B = \begin{cases} A\, v_B, & \text{if } \sigma_A > th \\ A, & \text{otherwise} \end{cases} \quad (9)$$

where th represents the threshold value (as given in [19], it is empirically chosen as th = 0.2), v_B denotes the normalizing vector $\left( \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}} \right)$, and σ_A represents the variance of the atmospheric light. The above dehazing process is implemented separately for the two blocks, 'less-affected by haze' and 'more-affected by haze', with different scale smoothing values ε = 10⁻⁶ and ε = 10⁻⁵, respectively.
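The dehazing chain of Eqs. (6)–(8) can be sketched as follows for an RGB base layer in [0, 1]; the spatio-temporal MRF refinement of [18] is omitted (the raw transmission map is used directly), the lower bound on T is an assumed safeguard, and A_B is kept as the plain scalar branch of Eq. (9).

```python
import numpy as np

def dehaze_base_layer(I_bl, A_map, omega=0.7, t_min=0.1):
    """Recover the dehazed base layer via Eqs. (6)-(8).

    I_bl: RGB base layer in [0, 1]; A_map: atmospheric light map from
    Eq. (5). The MRF refinement of the transmission map [18] is
    omitted; t_min is an assumed lower bound that avoids division by
    very small transmission values.
    """
    D = I_bl.min(axis=-1)                       # Eq. (7): per-pixel dark channel
    flat = np.sort(A_map.ravel())
    top = max(1, flat.size // 100)              # topmost 1% of A_map
    A = np.median(flat[-top:])
    T = np.clip(1.0 - omega * D / A, t_min, 1.0)    # Eq. (6)
    A_B = A            # Eq. (9), scalar branch: sigma_A = 0 <= th here
    J = (I_bl - A) / T[..., None] + A_B             # Eq. (8)
    return np.clip(J, 0.0, 1.0)
```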


3.4 Fast Multilayer Laplacian Enhancement

After the image decomposition, the detail layer is enhanced using the fast multilayer Laplacian enhancement of [16]. It is expressed as

$$M(x) = \begin{cases} \dfrac{s_w}{1 + e^{-s\,(R - R_s)}}, & \text{if } -0.5\, s_w < R_i < 0.5\, s_w \\ x, & \text{otherwise} \end{cases} \quad (10)$$

where M denotes the linear mapping function of the fast Laplacian enhancement, s_w represents the scale mapping width, s denotes the scale mapping factor, and R and R_s denote the residual image and the mean of the residual images, respectively. The above enhancement process is implemented separately for the two blocks, 'less-affected by haze' and 'more-affected by haze', with different scale mapping factors s = 20 and s = 40, respectively. After that, in each block, the dehazed image and the detail-enhanced image are fused to attain the recovered image, as shown in Fig. 2. The final adaptive dehazing result is obtained by fusing the recovered images from both blocks according to the image categorization, i.e., for the recovered image of the 'less-affected by haze' block, only the less-affected by haze regions are chosen, and vice-versa for the 'more-affected by haze' block. The proposed dehazing algorithm is fast and adaptively dehazes according to the level of haze. The experimental results of the proposed dehazing technique show significant improvements when compared to the existing methods.
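For completeness, a direct numpy transcription of the detail-layer mapping of Eq. (10) is sketched below; the scale mapping width s_w is an assumed value, and the residual R is taken to be the detail layer itself, which is a simplifying assumption.

```python
import numpy as np

def detail_mapping(R, s=40.0, sw=0.5):
    """Eq. (10): sigmoid boosting of small residuals, identity elsewhere.

    R: residual (detail) layer; s: scale mapping factor (20 for the
    'less-affected', 40 for the 'more-affected' block); sw: scale
    mapping width, an assumed value here. R_s is the residual mean.
    """
    Rs = R.mean()
    inside = np.abs(R) < 0.5 * sw          # |R_i| within the mapping width
    boosted = sw / (1.0 + np.exp(-s * (R - Rs)))
    return np.where(inside, boosted, R)
```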

4 Experimental Results

The proposed fast adaptive dehazing technique is tested with the Foggy Cityscapes [20] and Foggy Zurich [21] datasets. These datasets contain road images in foggy weather conditions and are primarily used in applications for automated driver assistance systems. The quantitative results are given in Table 1 for the existing methods [6–11] as well as for the proposed fast adaptive dehazing method. The proposed technique is implemented with low complexity to suit real-time video dehazing applications, and it exhibits a computation time of around 0.04 s on average, which corresponds to around 25 FPS. Although the existing methods produce good quantitative results (NIQE [22] and BRISQUE [23]), they exhibit large execution times, which are not suitable for real-time video processing. The qualitative results of the proposed algorithm and the existing techniques [6–11] for images from the Foggy Cityscapes [20] and Foggy Zurich [21] datasets are shown in Fig. 3. The existing methods [6–8, 10, 11] produce better dehazing results than the proposed method, but they produce over-dark outcomes after dehazing. The existing method MOF [9] produces an over-color-saturated outcome. The proposed


Table 1 Image quality parameter values for 1508039851_start_01m00s_000351 and frankfurt_003920_beta_0.005 images of Foggy Cityscapes and Foggy Zurich datasets, respectively

Parameters         DEFADE [6]   Fast DCP [7]   BCCR [8]   MOF [9]   D-NET [11]   OTM [10]   Proposed
Computation time   9.07735      1.0552         3.0868     1.298     1.0471       18.4023    0.0288
NIQE               2.1594       4.1498         2.2764     2.2764    2.5279       2.3849     7.2775
BRISQUE            22.2386      12.0204        25.0862    25.0862   37.1121      23.2916    64.2919
Computation time   9.09372      1.0504         3.5979     1.369     1.1115       15.9539    0.0603
NIQE               3.489        5.647          3.6279     3.6279    3.7946       3.7767     7.6359
BRISQUE            30.579       31.655         28.6412    28.6412   42.8589      33.3614    46.0747

method produces natural output and does not produce over-dark output and over-color saturation. The analysis of hazy image proposed dehazing result w.r.t semantic segmentation [24] is shown in Fig. 4. The semantic segmentation for the proposed dehazed output


Fig. 3 Results of existing dehazing methods and the proposed method. a, i Hazy images '1508039851_start_01m00s_000351' and 'frankfurt_003920_beta_0.005' from the Foggy Zurich and Foggy Cityscapes datasets, respectively. b–g, j–o Dehazed outputs of the existing methods. h, p Dehazed results of the proposed technique


Fig. 4 Analysis of hazy image and proposed dehazing result w.r.t semantic segmentation [24]. a Hazy image ‘U080-000041’ from the FRIDA dataset [25]. b Dehazed output of proposed method. c Semantic segmentation of (a). d Semantic segmentation of (b)

shows better results than that of the hazy image, as can be seen in Fig. 4c, d, respectively. Overall, the proposed fast dehazing method produces fast and efficient results that are suitable for real-time video processing systems such as automated driver assistance systems.

5 Conclusion

The challenges faced by the existing dehazing methods are high complexity and high time consumption. For real-time video processing systems, the dehazing algorithm must have low complexity and execute fast. A fast dehazing algorithm is proposed in this paper to overcome these challenges. A hazy image is first classified into 'less-affected by haze' and 'more-affected by haze' regions on the basis of pixel intensity values. The image decomposition, image dehazing, and detail enhancement of the hazy image are performed separately in the blocks named 'less-affected by haze' and 'more-affected by haze', with different scale factors. The results of these two blocks are fused based upon the regional categorization for adaptive dehazing. The proposed adaptive fast dehazing method produces good dehazed results at a rate of 25 FPS, which is suitable for real-time video processing systems.

References 1. Huang SC, Chen BH, Cheng YJ (2014) An efficient visibility enhancement algorithm for road scenes captured by intelligent transportation systems. IEEE Trans Intell Transp Syst 15(5):2321–2332 2. Wang W, Yuan X (2017) Recent advances in image dehazing. IEEE CAA J. Autom. Sinica 4(3):410–436


3. Kim JH, Jang WD, Sim JY, Kim CS (2013) Optimized contrast enhancement for real-time image and video dehazing. J Vis Commun Image Represent 24(3):410–425 4. Li Z, Tan P, Tan RT, Zou D, Zhiying Zhou S, Cheong LF (2015) Simultaneous video defogging and stereo reconstruction. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4988–4997 5. Ancuti CO, Ancuti C (2013) Single image dehazing by multi-scale fusion. IEEE Trans Image Process 22(8):3271–3282 6. Choi LK, You J, Bovik AC (2015) Referenceless prediction of perceptual fog density and perceptual image defogging. IEEE Trans Image Process 24(11):3888–3901 7. He K, Sun J, Tang X (2010) Single image haze removal using dark channel prior. IEEE Trans Pattern Anal Mach Intell 33(12):2341–2353 8. Meng G, Wang Y, Duan J, Xiang S, Pan C. Efficient image dehazing with boundary constraint and contextual regularization. In: Proceedings of the IEEE international conference on computer vision, pp 617–624 9. Zhao D, Xu L, Yan Y, Chen J, Duan LY (2019) Multi-scale optimal fusion model for single image dehazing. Signal Process Image Commun 74:253–265 10. Ngo D, Lee S, Kang B (2020) Robust single-image haze removal using optimal transmission map and adaptive atmospheric light. Remote Sens 12(14):2233 11. Cai B, Xu X, Jia K, Qing C, Tao D (2016) Dehazenet: an end-to-end system for single image haze removal. IEEE Trans Image Process 25(11):5187–5198 12. Haouassi S, Wu D (2020) Image dehazing based on (CMTnet) cascaded multi-scale convolutional neural networks and efficient light estimation algorithm. Appl Sci 10(3):1190 13. Kumar BP, Kumar A, Pandey R (2022) Region-based adaptive single image dehazing, detail enhancement and pre-processing using auto-colour transfer method. Signal Process Image Commun 100:116532 14. Li B, Ren W, Fu D, Tao D, Feng D, Zeng W, Wang Z (2018) Benchmarking single-image dehazing and beyond. IEEE Trans Image Process 28(1):492–505 15. He K, Sun J (2015) Fast guided filter. arXiv:1505.00996 16. Talebi H, Milanfar P (2016) Fast multilayer Laplacian enhancement. IEEE Trans Comput Imaging 2(4):496–509 17. Koschmieder H (1924) Theorie der horizontalen Sichtweite. Beitrage zur Physik der freien Atmosphare, 33–53 18. Cai B, Xu X, Tao D (2016) Real-time video dehazing based on spatio-temporal mrf. In: Pacific Rim conference on multimedia. Springer, Cham, pp 315–325 19. Shin YS, Cho Y, Pandey G, Kim A (2016) Estimation of ambient light and transmission map with common convolutional architecture. In: OCEANS 2016, MTS/IEEE. Monterey. IEEE, pp 1–7 20. Sakaridis C, Dai D, Van Gool L (2018) Semantic foggy scene understanding with synthetic data. Int J Comput Vis 126(9):973–992 21. Sakaridis C, Dai D, Hecker S, Van Gool L (2018) Model adaptation with synthetic and real data for semantic dense foggy scene understanding. In: Proceedings of the European conference on computer vision (ECCV), pp 687–704 22. Mittal A, Soundararajan R, Bovik AC (2012) Making a “completely blind” image quality analyzer. IEEE Signal Process Lett 20(3):209–212 23. Mittal A, Moorthy AK, Bovik AC (2012) No-reference image quality assessment in the spatial domain. IEEE Trans Image Process 21(12):4695–4708 24. Chen LC, Zhu Y, Papandreou G, Schroff F, Adam H (2018) Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of the European conference on computer vision (ECCV), pp 801–818 25. 
Tarel JP, Hautiere N, Cord A, Gruyer D, Halmaoui H (2010) Improved visibility of road scene images under heterogeneous fog. In: 2010 IEEE intelligent vehicles symposium, pp 478–485

Chapter 19

Review on Recent Advances in Hearing Aids: A Signal Processing Perspective R. Vanitha Devi and Vasundhara

1 Introduction

According to the definition provided by the World Health Organization (WHO), hearing loss is the inability to hear as well as a person with normal hearing, i.e., with hearing thresholds of 20 dB or better in both ears. Nowadays, hearing loss is becoming a common problem due to noise, aging, disease, and inheritance. Herein, the consequences of hearing loss are described systematically. Conversations with friends and family may be difficult for people with hearing loss. They may also have difficulty hearing doorbells and alarms, as well as responding to warnings. Overall, hearing loss affects one in three adults between the ages of 65 and 74, and the count is expected to be half or a little more of those over the age of 75 [1]. However, some people may be hesitant to admit that they have hearing problems. Older adults who have trouble hearing may become depressed or withdraw from others because they are disappointed or humiliated by their inability to understand what is being said to them. Because they cannot hear properly, older adults are sometimes misunderstood as being confused, unresponsive, or uncooperative. Based on its origin, hearing loss is classified into the following types:

• Conductive hearing loss: This term refers to hearing loss in which sound cannot pass through the middle and outer ear. As a consequence, hearing soft sounds is difficult for the patient.

R. Vanitha Devi (B) · Vasundhara Department of Electronics and Communication Engineering, National Institute of Technology Warangal, Hanamkonda, Telangana 506003, India e-mail: [email protected] Vasundhara e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_19


• Sensorineural hearing loss: This term refers to hearing loss caused by a problem in the cochlea, the hearing nerve, or both. The cochlea, a "sensing organ", accounts for the "sensori" part, while the hearing nerve accounts for the "neural" part.
• Mixed hearing loss: A person is said to have a mixed hearing loss if they suffer from both conductive and sensorineural hearing loss.

Furthermore, hearing loss can also be categorized based on listening ability as minor, mild, moderate, and severe or profound hearing loss. People with mild hearing loss have a hard time understanding normal speech. Moderate hearing loss makes it difficult to understand loud speech. In the case of severe hearing loss, one can follow only loud, clear speech, while people with a profound level of hearing loss have problems understanding even clear speech. Therefore, hearing aid devices are of paramount importance in the lives of people who suffer from hearing loss. The subsequent sections review the literature on the various types of hearing aid devices and their challenges.

1.1 Hearing Aid Devices

Hearing aids are electronic devices that convert sound waves received through a microphone into electrical signals, which are then processed thoroughly before being amplified and transmitted to a loudspeaker. These devices are classified as (i) analog hearing aid devices and (ii) digital hearing aid devices.

1.2 Analog Hearing Aid Devices

An analog hearing aid amplifies all sound coming from outside, including speech and all ambient noise. According to Halawai et al. [2], these devices are made up of (i) a small battery, (ii) a microphone, (iii) a speaker, and (iv) a simple electronic circuit with a transistor to amplify and control the sound coming from the source. Figure 1 displays a block diagram of an analog hearing aid. Analog hearing aids amplify both speech and background noise since they are unable to distinguish between desired (speech) and undesired (noise) signals. As a result, background noise can interfere with a conversation, and such devices cannot provide any noise-cancellation capability.

1.3 Digital Hearing Aid Devices

In 1996, fully digital and programmable hearing aids were exhibited to the public for the first time. In comparison with analog hearing aids, digital hearing aids have more flexibility and can be fine-tuned to the patient's demands [2].


Fig. 1 Block diagram of basic analog hearing aid device

The block diagram of a digital hearing aid is given in Fig. 2. These devices are used both to amplify speech and to reduce noise in speech signals. Furthermore, it is interesting to note that digital devices can distinguish between speech and background noise. Speech enhancement and noise reduction techniques are used in digital hearing aids. As shown in Fig. 2, the microphone in a digital hearing aid takes incoming signals and transforms them into digital form. The digital hearing aid contains a microcontroller and a small loudspeaker that transmits the processed signal to the ear canal. Hitherto, enormous work has been carried out in the field of hearing aids with the aim of tuning the signals; thus, a systematic review of this work is essential. Therefore, in this article an attempt has been made to consolidate the prominent results reported in this field.

Fig. 2 Digital hearing aid block diagram


2 Smart Hearing Aid Using Deep Learning

In speech processing research, restoring a clean speech signal by completely removing the noise is regarded as a considerably difficult problem [3]. In recent times, deep learning has gained significant momentum for effectively handling numerous problems that were previously difficult to probe and solve with traditional signal processing approaches. Deep learning algorithms are capable of training a function f that maps an input variable X to the appropriate output variable Y, even when the input is complex and the mapping function is nonlinear [4]. As a direct consequence, deep learning can be used to learn the complex nonlinear function that maps noisy speech to desired speech, thereby separating the desired signal from the undesired one. Different deep neural network topologies have been utilized for speech improvement. One of these networks, the deep belief network (DBN), is addressed in [5–10]. For the pre-training phase, it employs layers of Restricted Boltzmann Machines (RBMs), and for the fine-tuning phase, a feedforward network is utilized [11]. The RBMs learn the dataset's features in the pre-training phase utilizing an unsupervised training strategy, where each succeeding RBM uses the features learned by the preceding RBM as input to learn ever-higher-level features [12]. During the fine-tuning step, the weights of a conventional backpropagation-driven feedforward neural network (FNN) are initialized using these highly informative features. This method of weight initialization aids in the discovery of improved parameter values during training. Another architecture utilized in speech improvement is the Convolutional Neural Network (CNN), as shown in [13, 14], in which convolutions are used to learn informative representations of the input data. The dimensionality of the input differs between FNN and CNN: a CNN takes three-dimensional inputs, whereas an FNN only takes one-dimensional vectors. The CNN is made up of three layers, viz. the input layer, the feature extraction layer, and the output layer [15]. Other speech augmentation methods combine two architectures, such as that described in [16], which mixes a CNN with a recurrent neural network (RNN). In this scenario, features are extracted from the input data by the CNN before being sent to the RNN for learning and estimation. RNNs have been used successfully in applications that utilize sequence data because of their short-term memory [15]. When making decisions, RNNs consider both the current input and what has been learned from previously received inputs. However, some types of noise are as useful as desired speech signals in hearing aid applications. Car horns, fire alarms, car sirens, and several other noise types can lead to major problems if the hearing impaired are not able to hear them. People with hearing difficulty have used specific external alarm systems to manage this problem, using a flashlight or an object that vibrates when the relevant noise is detected [17, 18]. The disadvantages of these devices are that (1) they are costly, and (2) the user must employ a separate system for each type of relevant noise. Further, a new approach has been developed where speech enhancement and


alerting are carried out within the device itself, so that hearing aids become smart hearing aids [19].
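To make the noisy-to-clean mapping idea concrete, the toy numpy sketch below trains a one-hidden-layer feedforward mask estimator of the kind discussed above; the architecture, layer sizes, random stand-in data, and training hyperparameters are all illustrative assumptions and do not reproduce any of the cited models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data: noisy magnitude-spectrum frames X and target masks Y.
# 257 bins and 1000 frames are illustrative, not from the cited works.
X = rng.random((1000, 257))
Y = rng.random((1000, 257))

W1 = rng.normal(0.0, 0.01, (257, 512)); b1 = np.zeros(512)
W2 = rng.normal(0.0, 0.01, (512, 257)); b2 = np.zeros(257)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

lr = 0.1
for epoch in range(50):
    h = np.tanh(X @ W1 + b1)          # hidden layer
    M = sigmoid(h @ W2 + b2)          # predicted mask in (0, 1)
    # Backpropagation of the mean-squared mask error.
    dz2 = (M - Y) / len(X) * M * (1 - M)
    dW2, db2 = h.T @ dz2, dz2.sum(0)
    dz1 = dz2 @ W2.T * (1 - h ** 2)
    dW1, db1 = X.T @ dz1, dz1.sum(0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

# Enhancement: scale each noisy spectral frame by its estimated mask.
enhanced = X * sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2)
```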

3 Smart Phone-Based Hearing Aids

People who use hearing aids along with auxiliary devices have an improved user experience compared with people who use hearing aids alone [20–23]. In [20], an onboard smartphone chipset with a Linux operating system is utilized to assist the operations of a hearing aid. It gives high performance with low-noise audio; however, as an additional board is included, its cost is significantly higher. In [21], a smartphone-based app is used, and a commercial Bluetooth headset has also been tested. In [22, 23], a system has been tested with a smartphone and a personal frequency modulation (FM) system. Various speech augmentation approaches based on deep learning have been created and proven useful, but they have failed to reach real-time processing, which is essential in the case of hearing aids. In this context, Wang and co-workers have contributed significantly to the field of smartphone-based binaural hearing aids [24]. A smartphone-based hearing self-assessment (SHSA) system has been developed; the SHSA utilizes a hearing aid as the audio source with a "coarse-to-fine focus" algorithm and a smartphone as the controller and user interface. According to the findings, it has an average test accuracy difference of less than 6 dB HL and is able to save more than 50% of the testing time when compared to a traditional hearing test. People with hearing loss can use the SHSA system to assess their own hearing capability. Further, Bhat et al. have developed a multi-objective learning-based CNN speech enhancement (SE) method [25]. The suggested CNN-based SE method is computationally efficient and can be used to perform real-time speech enhancement with reduced audio latency on smartphones. Their results confirm the suggested method's functionality and its practical applicability in the real domain with varying levels of noise and low SNRs. Moreover, Sun et al. introduced a supervised speech augmentation method based on an RNN structure to handle the real-time challenge [26]; it increases speech intelligibility and quality even at low SNR levels, and the structure is designed to lower the computational complexity of the whole procedure.

4 Occlusion Effect in Hearing Aids

When the cartilaginous portion of the ear canal is totally or considerably occluded by the hearing aid (fitting in the bony section leads to physical discomfort), acoustic energy is confined inside the ear canal. When a hearing aid user speaks or chews, vibrations are carried in the cartilaginous regions of the ear canal, which serve as an elastic membrane, giving the user the impression that his speech is being muffled. People who regularly hear low-frequency sounds but have severe to profound sensorineural losses at high frequencies are more susceptible to the occlusion effect. It arises from an increase in power at low frequencies, mostly in the range of 200–500 Hz.

It is reported that an active noise cancellation (ANC) system with a fixed digital controller was used to reduce the OE [27]. A controller was constructed, and simulations demonstrated a significant reduction in OE in the 60–700 Hz frequency range. The proposed method does not require a precise fit and is robust in a variety of everyday situations. However, the controller must be tuned individually to ensure the best performance. Furthermore, as is known from noise-canceling headphones, a complete analysis of occlusion effect cancellation (OEC) and its relationship to ANC can be presented [28]. Acoustic measurements, design restrictions, system topologies, synergies, filter design, and subjective and objective evaluations are all covered in detail. The suggested OEC structure has the primary advantage of almost decoupling the performance and design of the feedforward and feedback filters. By swapping filter coefficients, the system may switch between ANC and OEC operation modes.

5 Feedback Cancellation in Hearing Aids

According to the block diagram in Fig. 3a, a hearing aid is made up of a microphone that accepts the input signal s(n), a signal processing block G(z) that handles the amplification, and a receiver that works as a loudspeaker. All signal processing for noise reduction, signal amplification, and sub-band processing based on the user's level of hearing loss is contained in the hearing aid's forward path, denoted by G(z). The microphone and receiver of a hearing aid are placed close to one another due to the hearing aid's small size. Furthermore, the user would feel uncomfortable if the hearing aid were fixed too tightly. As seen in Fig. 3a, this creates a feedback path F(z) between the receiver and the microphone. The microphone once more picks up the receiver signal Y(z) that is supposed to be delivered to the user's ear, creating a closed-loop system. This phenomenon in hearing aids is called acoustic feedback. From Fig. 3a, the closed-loop transfer function between the input signal S(z) and the received signal Y(z) is

H(z) = Y(z)/S(z) = G(z) / (1 − G(z)F(z))

Acoustic feedback is the major problem found in hearing aid devices. It causes hearing aids to oscillate at higher gains and limits the maximum gain available to the users, so the sound is disturbed significantly. It also causes howling, screaming, and whistling effects. As a result, acoustic feedback minimization has gained utmost importance in hearing aids.


Fig. 3 a Hearing aid block diagram with acoustic feedback. b Hearing aid block diagram with NLMS-based adaptive feedback cancellation (AFC)

According to the survey, several techniques have been utilized to solve the problem of acoustic feedback. Some methods were proposed in [29–33], and full evaluations of numerous AFC proposals for hearing aids were reported in [34, 35]. It is worth noting that, for example when a hearing-impaired person holds a phone near the ear, the acoustic feedback path between the receiver and the microphone can change rapidly and dramatically [36]. As shown in Fig. 3b, AFC is performed using an adaptive filter W(z) to model the acoustic feedback path F(z). Because of its simplicity, robustness, and ease of implementation, the normalized least mean square (NLMS) method [37] is the most widely used adaptive algorithm for AFC. The received signal y(n) and the microphone signal x(n), which is the sum of s(n) and y_f(n), act as the input signal and desired response for W(z), respectively, as shown in Fig. 3b. These two signals are highly correlated, resulting in a biased convergence of W(z) and, as a result, non-optimal acoustic feedback cancellation. Many scholars have therefore spent the last few decades developing efficient adaptive filtering algorithms, and the goal of this study is to highlight important works in this field.
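The NLMS update described above can be sketched as follows: the adaptive filter W takes the receiver signal y(n) as input and the microphone signal x(n) = s(n) + y_f(n) as its desired response. The feedback path and all signals below are synthetic assumptions. Note that with a white s(n) independent of y(n), as here, the estimate converges well; with real (correlated) speech the bias discussed above appears.

```python
import numpy as np

rng = np.random.default_rng(0)
N, L = 20000, 64                      # samples, adaptive filter length
f_true = rng.standard_normal(L) * np.exp(-0.15 * np.arange(L))  # assumed F(z)

y = rng.standard_normal(N)            # receiver (loudspeaker) signal
s = 0.1 * rng.standard_normal(N)      # incoming speech/noise at the mic
y_f = np.convolve(y, f_true)[:N]      # acoustic feedback component
x = s + y_f                           # microphone signal (desired response)

w = np.zeros(L)                       # adaptive filter W(z)
mu, eps = 0.5, 1e-6                   # step size and regularization
for n in range(L, N):
    u = y[n:n - L:-1]                 # most recent L receiver samples
    e = x[n] - w @ u                  # cancellation error e(n)
    w += mu * e * u / (u @ u + eps)   # NLMS weight update

print("normalized misalignment:",
      np.linalg.norm(w - f_true) / np.linalg.norm(f_true))
```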


Because the hearing aid forms a closed loop, the high correlation between the loudspeaker output and the input signal induces bias in the feedback path estimate. To minimize this bias, various decorrelation methods were introduced, viz. delay insertion [38–42], probe noise insertion [43–46], frequency shifting [47, 48], phase modulation [49], and pre-whitening filters [50, 51]. Adaptive feedback cancellation utilizing the prediction error method (PEM-AFC) is well known for its applications in both the time [51–56] and frequency [39, 57–61] domains. In this method, pre-filters are used to pre-whiten the adaptive filter signals, which results in less correlation and less bias. Further methods were introduced, such as sub-band techniques [62–65], multiple microphones [56, 66–71], fast-converging adaptive filters [51, 52, 54, 69, 72–74], filters with affine combination [59], and variable step size (VSS) [48, 75–77]; combinations of these techniques [68, 78, 79] also improved AFC performance. Reference [80] proposes an AFC strategy based on decomposing an adaptive filter into a Kronecker product of two shorter filters. Although the aforementioned AFC approaches can improve system performance to some extent, the need for a dependable AFC technique persists. When the least mean square (LMS) and normalized LMS (NLMS) algorithms are used, AFC performance suffers due to the sparse nature of the feedback path [81, 82] and correlated input signals. To further enhance the convergence and tracking rates, the hybrid NLMS method (H-NLMS) for AFC has been introduced [83]. Additionally, a hardware implementation of acoustic feedback cancellers (AFCs) in hearing aids has been achieved using the partitioned time-domain block LMS (PTBLMS) method [84]. To further address changing sparsity conditions, the re-weighted zero-attracting proportionate normalized sub-band adaptive filtering (RZA-PNSAF) algorithm was developed; it increased the perceptual evaluation of speech quality (PESQ) values and the maximum stable gain by 3–5 dB [85]. A switching PEM that employs soft-clipping for AFC was also created; it operates on a new rule for updating the adaptive filter coefficients [86]. The convex proportionate NLMS (CPNLMS) and convex improved proportionate NLMS (CIPNLMS) algorithms were proposed to improve the convergence rate and steady-state performance of the adaptive filter, and thereby the acoustic feedback cancellation in hearing aids [87]. Finally, a sparsity-aware affine-projection-like robust set membership M-estimate (SAPL-RSM) filter was considered for hearing aids to decrease the effect of impulsive noise on the adaptive weights of the feedback canceller [88]. Advances in smart hearing aids and the algorithms implemented for feedback cancellation are systematically summarized in Tables 1 and 2, respectively.
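As an illustration of one of the decorrelation methods listed above, the following sketch applies a small frequency shift to the loudspeaker signal via the analytic signal; the shift amount and sample rate are assumptions.

```python
import numpy as np
from scipy.signal import hilbert

def frequency_shift(x, shift_hz, fs):
    """Shift the spectrum of x by shift_hz using the analytic signal."""
    analytic = hilbert(x)                      # complex analytic signal
    t = np.arange(len(x)) / fs
    return np.real(analytic * np.exp(2j * np.pi * shift_hz * t))

fs = 16000                                     # assumed sample rate (Hz)
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t)                # test tone
y = frequency_shift(x, 5.0, fs)                # 5 Hz shift before playback
```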


Table 1 Recent advances in smart hearing aids

Park and Lee (2016): A redundant convolutional encoder-decoder (R-CED) network was used to map noisy speech to clean speech via supervised learning to remove babble noise [13]

Nossier et al. (2019): A smart hearing aid was developed that distinguishes important sounds, such as fire alarms, within the noise signal [19]

Chern et al. (2017): A smartphone-based hearing assistance system (SmartHear) was developed for a variety of target users who could gain from improved listening clarity in the classroom. The SmartHear system includes speech transmitter and receiver devices (smartphone and Bluetooth headset), as well as an Android mobile application that connects and controls the various devices through Bluetooth or Wi-Fi [23]

Chen et al. (2019): A smartphone-based hearing self-assessment (SHSA) system was used to self-check the degree of hearing loss, with a smartphone as the user interface and controller and a hearing aid as the audio source [24]

Bhat et al. (2019): A smartphone application was built that performs real-time speech enhancement to assist the HA device; a multi-objective learning-based CNN was used, which was computationally fast and reduced processing delay [25]

Sun et al. (2020): A supervised speech enhancement method using an RNN was developed for real-time HA application [26]

6 Conclusion

The present review article provides a summary of recent advances in the performance of hearing aids. We have summarized the basics of hearing aids, smartphone-based hearing aids, the occlusion effect, and feedback cancellation in hearing aids, and discussed the adaptive signal processing techniques employed for occlusion and feedback mitigation. In the past decade, various adaptive filtering techniques have been employed for acoustic feedback mitigation in hearing aids. In recent times, the focus has shifted toward smart hearing aids integrated with Android or smartphone-based platforms. Researchers can take up work toward improving the perceptual speech quality delivered by hearing aids and making them more self-adjustable and self-sufficient. Further, hearing aids can be integrated with machine learning and artificial intelligence-based notions in the upcoming days. The paradigm can shift toward making the hearing aid a complete health monitoring device by embedding several health and cognitive monitoring facilities with the advent of the latest technologies. We firmly believe that the present review article will provide the readers significant insight into the chosen topic.

Table 2 Algorithms for feedback cancellation in hearing aids

Van Waterschoot et al. (2011): AFC and its challenges were reported [40]

Guo et al. (2012): Probe noise was inserted and enhanced to reduce bias; the convergence rate was reduced by a factor of 10 [44]

Nakagawa et al. (2014): Injecting probe noise into the loudspeaker reduces signal quality, so the probe signal is shaped and the forward path delay is reduced [46]

Schroeder (1964): Frequency shifting with the prediction error method was used, which responded faster to feedback changes [47]

Guo et al. (2012): Phase modulation was used and compared with frequency shifting (FS); FS was observed to give slightly better AFC performance than phase modulation [49]

Spriet et al. (2005): Two-channel AFC and a decoupled PEM-AFC were used; PEM-AFC is preferred for highly non-stationary signals [51]

Tran et al. (2016): NLMS adaptive filters provide a slow convergence rate for colored incoming signals; the affine projection algorithm (APA) was suggested to increase the convergence rate [52]

Tran et al. (2017): The proportionate NLMS (PNLMS) and improved PNLMS (IPNLMS) algorithms were proposed to speed up convergence and tracking for sparse echo responses [54]

Bernardi et al. (2015): PEM-based pre-whitening with a frequency-domain Kalman filter (PEM-FDKF) for AFC was compared with the standard frequency-domain adaptive filter (FDAF); the proposed algorithm reduces estimation error and improves sound quality [59]

Tran et al. (2016): An improved practical VSS algorithm (IPVSS) was proposed, which uses a variable step size with upper and lower limits to update the weights of the adaptive filter [77]

Vasundhara et al. (2016): A convex proportionate normalized Wilcoxon LMS (PNWLMS) algorithm was proposed, which shows better cancellation performance than the filtered-x LMS algorithm [81]

Nordholm et al. (2018): A hybrid AFC scheme was proposed in which a soft-clipping-based stability detector decides which algorithm (NLMS or PEM) updates the adaptive filter; computational complexity increased slightly [83]

Vasundhara et al. (2018): The partitioned time-domain block LMS (PTBLMS) algorithm was proposed, with which a hardware implementation of AFC was realized [84]

Tran et al. (2021): A switched PEM with soft-clipping (swPEMSC) was proposed, which improved the convergence and tracking rates and the ability to recover from instability, reducing the howling effect and stabilizing the system [86]

Vanamadi and Kar (2021): The convex improved proportionate NLMS (CIPNLMS) algorithm was proposed to improve AFC performance [87]

Vasundhara (2021): Sparsity-aware affine-projection-like robust set membership M-estimate (SAPL-RSM) filtering was used to improve AFC performance through its weight update method when impulsive noise enters the HA processing; misalignment was reduced and sound quality was improved [88]


References

1. National Institute on Aging, https://www.nia.nih.gov/health/hearing-loss-common-problem-older-adults. Accessed 01 Apr 2022
2. Halawani SM, Al-Talhi AR, Khan AW (2013) Speech enhancement techniques for hearing impaired people: digital signal processing based approach. Life Sci J 10(4):3467–3476
3. Loizou PC (2013) Speech enhancement: theory and practice, 2nd edn. CRC, Boca Raton, FL
4. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(1):436–444
5. Xu Y, Du J, Dai L-R, Lee C-H (2015) A regression approach to speech enhancement based on deep neural networks. IEEE/ACM Trans Audio Speech Lang Process 23(1):7–19
6. Lu X, Tsao Y, Matsuda S, Hori C (2013) Speech enhancement based on deep denoising autoencoder. In: Interspeech
7. Wang Y, Wang DL (2012) Boosting classification based speech separation using temporal dynamics. In: 13th proceedings on Interspeech. ISCA Archive, pp 1528–1531
8. Wang Y, Wang DL (2012) Cocktail party processing via structured prediction. In: Proceedings of advances in neural information processing systems. Curran Associates, pp 224–232
9. Wang Y, Wang DL (2013) Towards scaling up classification based speech separation. IEEE Trans Audio Speech Lang Process 21(7):1381–1390
10. Healy EW, Yoho SE, Wang Y, Wang DL (2013) An algorithm to improve speech recognition in noise for hearing impaired listeners. J Acoust Soc Am 134(4):3029–3038
11. Bengio Y (2009) Learning deep architectures for AI. Found Trends Mach Learn 2(1):1–127
12. Erhan D, Courville A, Bengio Y, Vincent P (2010) Why does unsupervised pre-training help deep learning. J Mach Learn Res 11:625–660
13. Park SR, Lee J (2016) A fully convolutional neural network for speech enhancement. https://arxiv.org/abs/1609.07132
14. Fu SW, Tsao Y, Lu X, Kawai H (2017) Raw waveform-based speech enhancement by fully convolutional networks. https://arxiv.org/abs/1703.02205
15. Nivarthi PM, Nadendla SH, Kumar CS (2021) Comparative study of deep learning techniques used for speech enhancement. In: 2021 IEEE 6th international conference on computing, communication and automation (ICCCA), pp 161–165
16. Zhao H, Zarar S, Tashev I, Lee C-H (2018) Convolutional-recurrent neural networks for speech enhancement. https://arxiv.org/abs/1805.00579
17. Khandelwal R, Narayanan S, Li L (2006) Emergency alert service. https://patents.google.com/patent/US7119675B2/en
18. Ketabdar H, Polzehl T (2009) Tactile and visual alerts for deaf people by mobile phones. In: Proceedings 11th international ACM SIGACCESS conference on computer access, pp 253–254
19. Nossier SA, Rizk MRM, Moussa ND, Shehaby S (2019) Enhanced smart hearing aid using deep neural networks. Alex Eng J 58(2):539–550
20. Pisha L, Hamilton S, Sengupta D, Lee C-H, Vastare KC, Zubatiy T, Luna S, Yalcin C, Grant A, Gupta R, Chockalingam G, Rao BD, Garudadri H (2018) A wearable platform for research in augmented hearing. In: Proceedings 52nd Asilomar conference on signals, systems, computing, pp 223–227
21. Panahi IMS, Kehtarnavaz N, Thibodeau L (2018) Smartphone as a research platform for hearing study and hearing aid applications. J Acoust Soc Am 143(3):1738
22. Lin Y-C, Lai Y-H, Chang H-W, Tsao Y, Chang Y-P, Chang RY (2018) SmartHear: a smartphone-based remote microphone hearing assistive system using wireless technologies. IEEE Syst J 12(1):20–29
23. Chern A, Lai Y-H, Chang Y-P, Tsao Y, Chang RY, Chang H-W (2017) A smartphone-based multi-functional hearing assistive system to facilitate speech recognition in the classroom. IEEE Access 5:10339–10351
24. Chen F, Wang S, Li J, Tan H, Jia W, Wang Z (2019) Smartphone-based hearing self-assessment system using hearing aids with fast audiometry method. IEEE Trans Biomed Circuits Syst 13(1):170–179
25. Bhat GS, Shankar N, Reddy CKA, Panahi IMS (2019) A real-time convolutional neural network based speech enhancement for hearing impaired listeners using smartphone. IEEE Access 7:78421–78433
26. Sun Z, Li Y, Jiang H, Chen F, Xie X, Wang Z (2020) A supervised speech enhancement method for smartphone-based binaural hearing aids. IEEE Trans Biomed Circuits Syst 14(5):951–960
27. Liebich S, Jax P, Vary P (2016) Active cancellation of the occlusion effect in hearing aids by time invariant robust feedback. In: Speech communication; 12th ITG symposium. Germany, pp 1–5
28. Liebich S, Vary P (2022) Occlusion effect cancellation in headphones and hearing devices—the sister of active noise cancellation. IEEE/ACM Trans Audio Speech Lang Process 30:35–48
29. Maxwell J, Zurek P (1995) Reducing acoustic feedback in hearing aids. IEEE Trans Speech Audio Process 4:304–313
30. Edwards BW (1998) Signal processing techniques for a DSP hearing aid. Proc IEEE ISCAS 6:586–589
31. Bustamante DK, Worrall TL, Williamson MJ (1989) Measurement and adaptive suppression of acoustic feedback in hearing aids. Proc IEEE ICASSP 3:2017–2020
32. Kaelin A, Lindgren A, Wyrsch S (1998) A digital frequency domain implementation of a very high gain hearing aid with compensation for recruitment of loudness and acoustic echo cancellation. Signal Process 64(1):71–85
33. Kates JM (1999) Constrained adaptation for feedback cancellation in hearing aids. J Acoust Soc Am 106(2):1010–1019
34. Kates JM (2008) Digital hearing aids. Plural Publishing
35. Ma G, Gran F, Jacobsen F, Agerkvist FT (2011) Adaptive feedback cancellation with band-limited LPC vocoder in digital hearing aids. IEEE Trans Audio Speech Lang Process 19(4):677–687
36. Spriet A, Moonen M, Wouters J (2010) Evaluation of feedback reduction techniques in hearing aids based on physical performance measures. J Acoust Soc Am 128(3):1245–1261
37. Douglas SC (1994) A family of normalized LMS algorithms. IEEE Signal Process Lett 1(3):49–51
38. Siqueira MG, Alwan A (2000) Steady-state analysis of continuous adaptation in acoustic feedback reduction systems for hearing-aids. IEEE Trans Speech Audio Process 8(4):443–453
39. Spriet A, Doclo S, Moonen M, Wouters J (2008) Feedback control in hearing aids. In: Springer handbook of speech processing. Springer, Berlin/Heidelberg, pp 979–1000
40. Van Waterschoot T, Moonen M (2011) Fifty years of acoustic feedback control: state of the art and future challenges. Proc IEEE 99(2):288–327
41. Hellgren J, Forssell U (2001) Bias of feedback cancellation algorithms in hearing aids based on direct closed loop identification. IEEE Trans Speech Audio Process 9(8):906–913
42. Laugesen S, Hansen KV, Hellgren J (1999) Acceptable delays in hearing aids and implications for feedback cancellation. J Acoust Soc Am 105(2):1211–1212
43. Kates J (1990) Feedback cancellation in hearing aids. In: Proceedings of the IEEE international conference on acoustics, speech, and signal processing. NM, pp 1125–1128
44. Guo M, Jensen SH, Jensen J (2012) Novel acoustic feedback cancellation approaches in hearing aid applications using probe noise and probe noise enhancement. IEEE Trans Audio Speech Lang Process 20(9):2549–2563
45. Guo M, Elmedyb TB, Jensen SH, Jensen J (2012) On acoustic feedback cancellation using probe noise in multiple-microphone and single-loudspeaker systems. IEEE Signal Process Lett 19(5):283–286
46. Nakagawa CRC, Nordholm S, Yan WY (2014) Feedback cancellation with probe shaping compensation. IEEE Signal Process Lett 21(3):365–369
47. Schroeder MR (1964) Improvement of acoustic-feedback stability by frequency shifting. J Acoust Soc Am 36(9):1718–1724
48. Strasser F, Puder H (2015) Adaptive feedback cancellation for realistic hearing aid applications. IEEE/ACM Trans Audio Speech Lang Process 23(12):2322–2333
49. Guo M, Jensen SH, Jensen J, Grant SL (2012) On the use of a phase modulation method for decorrelation in acoustic feedback cancellation. In: Proceedings of the European signal processing conference (EUSIPCO). Bucharest, pp 2000–2004
50. Hellgren J (2002) Analysis of feedback cancellation in hearing aids with filtered-X LMS and the direct method of closed loop identification. IEEE Trans Speech Audio Process 10(2):119–131
51. Spriet A, Proudler I, Moonen M, Wouters J (2005) Adaptive feedback cancellation in hearing aids with linear prediction of the desired signal. IEEE Trans Signal Process 53(10):3749–3763
52. Tran LTT, Dam HH, Nordholm S (2016) Affine projection algorithm for acoustic feedback cancellation using prediction error method in hearing aids. In: Proceedings of the IEEE international workshop on acoustic signal enhancement (IWAENC), Xi'an
53. Rombouts G, Van Waterschoot T, Moonen M (2007) Robust and efficient implementation of the PEM-AFROW algorithm for acoustic feedback cancellation. J Audio Eng Soc 55(11):955–966
54. Tran LTT, Schepker H, Doclo S, Dam HH, Nordholm S (2017) Proportionate NLMS for adaptive feedback control in hearing aids. In: Proceedings of the IEEE international conference on acoustics, speech and signal processing, New Orleans, LA
55. Gil-Cacho JM, van Waterschoot T, Moonen M, Jensen SH (2012) Transform domain prediction error method for improved acoustic echo and feedback cancellation. In: Proceedings of the European signal processing conference (EUSIPCO). Bucharest, pp 2422–2426
56. Tran LTT, Nordholm SE, Schepker H, Dam HH, Doclo S (2018) Two-microphone hearing aids using prediction error method for adaptive feedback control. IEEE/ACM Trans Audio Speech Lang Process 26(5):909–923
57. Spriet A, Rombouts G, Moonen M, Wouters J (2006) Adaptive feedback cancellation in hearing aids. Elsevier J Frankl Inst 343(6):545–573
58. Bernardi G, Van Waterschoot T, Wouters J, Moonen M (2015) An all-frequency-domain adaptive filter with PEM-based decorrelation for acoustic feedback control. In: Proceedings of the workshop on applications of signal processing to audio and acoustics (WASPAA). New Paltz, NY, pp 1–5
59. Bernardi G, Van Waterschoot T, Wouters J, Hillbratt M, Moonen M (2015) A PEM-based frequency-domain Kalman filter for adaptive feedback cancellation. In: Proceedings of the 23rd European signal processing conference (EUSIPCO). Nice, pp 270–274
60. Schepker H, Tran LTT, Nordholm S, Doclo S (2016) Improving adaptive feedback cancellation in hearing aids using an affine combination of filters. In: Proceedings of the IEEE international conference on acoustics, speech and signal processing, Shanghai
61. Tran LTT, Schepker H, Doclo S, Dam HH, Nordholm S (2018) Frequency domain improved practical variable step-size for adaptive feedback cancellation using pre-filters. In: Proceedings of the 2018 16th international workshop on acoustic signal enhancement (IWAENC). Tokyo, pp 171–175
62. Yang F, Wu M, Ji P, Yang J (2012) An improved multiband-structured subband adaptive filter algorithm. IEEE Signal Process Lett 19(10):647–650
63. Strasser F, Puder H (2014) Sub-band feedback cancellation with variable step sizes for music signals in hearing aids. In: Proceedings of the IEEE international conference on acoustics, speech and signal processing. Florence, pp 8207–8211
64. Khoubrouy SA, Panahi IMS (2016) An efficient delayless sub-band filtering for adaptive feedback compensation in hearing aid. J Signal Process Syst 83:401–409
65. Pradhan S, Patel V, Somani D, George NV (2017) An improved proportionate delayless multiband-structured subband adaptive feedback canceller for digital hearing aids. IEEE/ACM Trans Audio Speech Lang Process 25(8):1633–1643
66. Nakagawa CRC, Nordholm S, Yan WY (2012) Dual microphone solution for acoustic feedback cancellation for assistive listening. In: Proceedings of the IEEE international conference on acoustics, speech and signal processing. Kyoto, pp 149–152
67. Nakagawa CRC, Nordholm S, Yan WY (2015) Analysis of two microphone method for feedback cancellation. IEEE Signal Process Lett 22(1):35–39
68. Tran LTT, Nordholm S, Dam HH, Yan WY, Nakagawa CR (2015) Acoustic feedback cancellation in hearing aids using two microphones employing variable step size affine projection algorithms. In: Proceedings of the IEEE international conference on digital signal processing (DSP). Singapore, pp 1191–1195
69. Albu F, Nakagawa R, Nordholm S (2015) Proportionate algorithms for two-microphone active feedback cancellation. In: Proceedings of the 23rd European signal processing conference (EUSIPCO). Nice, pp 290–294
70. Schepker H, Nordholm SE, Tran LTT, Doclo S (2019) Null-steering beamformer-based feedback cancellation for multi-microphone hearing aids with incoming signal preservation. IEEE/ACM Trans Audio Speech Lang Process 27(4):679–691
71. Schepker H, Nordholm S, Doclo S (2020) Acoustic feedback suppression for multi-microphone hearing devices using a soft-constrained null-steering beamformer. IEEE/ACM Trans Audio Speech Lang Process 28:929–940
72. Lee S, Kim IY, Park YC (2007) Approximated affine projection algorithm for feedback cancellation in hearing aids. Comput Methods Programs Biomed 87(3):254–261
73. Lee K, Baik YH, Park Y, Kim D, Sohn J (2011) Robust adaptive feedback canceller based on modified pseudo affine projection algorithm. In: Proceedings of the annual international conference of the IEEE engineering in medicine and biology society. Boston, MA, pp 3760–3763
74. Pradhan S, Patel V, Patel K, Maheshwari J, George NV (2017) Acoustic feedback cancellation in digital hearing aids: a sparse adaptive filtering approach. Appl Acoust 122:138–145
75. Thipphayathetthana S, Chinrungrueng C (2000) Variable step-size of the least-mean-square algorithm for reducing acoustic feedback in hearing aids. In: Proceedings of the IEEE Asia-Pacific conference on circuits and systems. Tianjin, pp 407–410
76. Rotaru M, Albu F, Coanda H (2012) A variable step size modified decorrelated NLMS algorithm for adaptive feedback cancellation in hearing aids. In: Proceedings of the international symposium on electronics and telecommunications. Timisoara, pp 263–266
77. Tran LTT, Schepker H, Doclo S, Dam HH, Nordholm S (2016) Improved practical variable step-size algorithm for adaptive feedback control in hearing aids. In: Proceedings of the IEEE international conference on signal processing and communication systems, Surfers Paradise, QLD
78. Albu F, Tran LTT, Nordholm S (2017) A combined variable step size strategy for two microphones acoustic feedback cancellation using proportionate algorithms. In: Proceedings of the Asia-Pacific signal and information processing association annual summit and conference (APSIPA ASC). Kuala Lumpur, pp 1373–1377
79. Tran LTT, Schepker H, Doclo S, Dam HH, Nordholm SE (2017) Adaptive feedback control using improved variable step-size affine projection algorithm for hearing aids. In: Proceedings of the 2017 Asia-Pacific signal and information processing association annual summit and conference (APSIPA ASC). Kuala Lumpur, pp 1633–1640
80. Bhattacharjee SS, George NV (2021) Fast and efficient acoustic feedback cancellation based on low rank approximation. Signal Process 182:107984
81. Vasundhara, Panda G, Puhan NB (2016) A robust adaptive hybrid feedback cancellation scheme for hearing aids in the presence of outliers. Appl Acoust 102:146–155
82. Maheshwari J, George NV (2016) Robust modeling of acoustic paths using a sparse adaptive algorithm. Appl Acoust 101:122–126
83. Nordholm S, Schepker H, Tran LTT, Doclo S (2018) Stability-controlled hybrid adaptive feedback cancellation scheme for hearing aids. J Acoust Soc Am 143(1):150–166
84. Vasundhara, Mohanty BK, Panda G, Puhan NB (2018) Hardware design for VLSI implementation of acoustic feedback canceller in hearing aids. Circuits Syst Signal Process 37(4):1383–1406
85. Vasundhara, Puhan NB, Panda G (2019) Zero attracting proportionate normalized sub-band adaptive filtering technique for feedback cancellation in hearing aids. Appl Acoust 149:39–45
86. Tran LTT, Nordholm SE (2021) A switched algorithm for adaptive feedback cancellation using pre-filters in hearing aids. Audiol Res 11(3):389–409
87. Vanamadi R, Kar A (2021) Feedback cancellation in digital hearing aids using convex combination of proportionate adaptive algorithms. Appl Acoust 182:108175
88. Vasundhara (2021) Sparsity aware affine-projection-like filtering integrated with robust set membership and M-estimate approach for acoustic feedback cancellation in hearing aids. Appl Acoust 175:107778

Chapter 20

Hierarchical Earthquake Prediction Framework

Dipti Rana, Charmi Shah, Yamini Kabra, Ummulkiram Daginawala, and Pranjal Tibrewal

1 Introduction

Earthquakes are perhaps the most dangerous catastrophic events, caused by the movement or relocation of the rock layers of the earth's tectonic plates. This sudden movement releases a tremendous amount of energy that creates seismic waves. The resulting vibrations that travel through the earth's surface harm people living in earthquake-prone regions in numerous ways: injuries and loss of life, damage to roads and bridges, property damage, and economic loss [1].

The earth has four significant layers: the inner core, outer core, mantle, and crust. The crust and the top of the mantle make up a thin layer on the outside of our planet. Yet this layer is not one seamless shell. It consists of many pieces, like a puzzle, covering the surface of the earth. Moreover, these interlocking pieces keep gradually moving around, sliding past and bumping into one another. We call these pieces tectonic plates, and the edges of the plates are known as the plate boundaries. The plate boundaries contain many faults, and the majority of the earthquakes throughout the planet happen on these faults. Since the edges of the plates are rough, they get stuck while the remainder of the plate continues to move. Finally, when the plate has moved far enough, the edges come unstuck on one of the faults, and there is an earthquake [2].

Earthquakes generally happen suddenly and do not permit much time for individuals to respond. In this way, earthquakes can cause serious injuries and death tolls, annihilate immense structures and infrastructure, and lead to great economic loss.

D. Rana (B) · C. Shah (B) · Y. Kabra · U. Daginawala · P. Tibrewal Computer Science and Engineering Department, Sardar Vallabhbhai National Institute of Technology, Surat, India e-mail: [email protected] C. Shah e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_20


The prediction of earthquakes is essential for the well-being of the general public, but it has proven to be a complicated problem in seismology. Due to its severity, earthquake prediction remains an open research area for the seismologist community [2]. Some statistics about earthquakes are [3]:

• An earthquake is a common phenomenon: one always happens somewhere, every day. An average of 20,000 earthquakes every year (about 50 a day) are recorded.
• On a yearly basis, about half a million earthquakes are detected worldwide by the USGS and the National Earthquake Information Center (NEIC). However, millions of earthquakes also occur every year that are too weak to be registered. Japan and Indonesia are the two nations with the highest number of seismic tremors.
• The deadliest tremor ever hit China back in 1556, killing around 830,000 individuals.
• The rim of the Pacific Ocean, called the "Ring of Fire," is a region where 80% of the planet's earthquakes occur.
• On May 22, 1960, in Chile, the largest earthquake in the world was recorded, having a magnitude of 9.5.

1.1 Types of Earthquakes

The earth seems like a solid place from the outside, but it is rather dynamic underneath the surface, with four primary layers: a solid crust, a nearly solid mantle, a liquid outer core, and a solid inner core. The relentless movement stresses the earth's crust and causes a variety of earthquakes, such as tectonic earthquakes, fault-zone earthquakes, and volcanic earthquakes.

1.2 Impacts of Seismic Tremor on Earth

While the edges of faults are stuck together and the remainder of the block is moving, the energy that would normally make the blocks slide past each other is accumulated. When the force of the moving blocks finally overcomes the friction of the rough edges and they come unstuck, all that stored energy is released. The energy radiates outward from the fault in seismic waves, just like ripples on a lake. The seismic waves shake the earth as they travel through it, and when the waves arrive at the earth's surface, they shake the ground and anything on it, such as our homes and us [4].


1.3 Challenges for the Earthquake Prediction

These days, earthquake warning systems are installed in numerous remote and volcanic regions; they acquire data about tremor characteristics and their impacts on the surrounding area and may increase the number of survivors. Machine learning has been utilized to make progress in data analysis and forecasting. Nevertheless, earthquake prediction has remained an underachieved objective due to numerous challenges. Some of them are as follows:

• There is an absence of sufficient data volume for a successful prediction or forecast procedure.
• There is a lack of technology for precisely observing the stress changes, pressure, and temperature variations deep beneath the crust through scientific instruments, which eventually results in the unavailability of comprehensive data about seismic features.
• Bridging the gap between seismologists and computer scientists to explore the various avenues of technology is a challenging task.
• Some predictions' results have not given a precise forecast.
• The heuristic "many foreshocks (small-magnitude earthquakes) lead up to the 'main' earthquake" has been analyzed by many researchers, but not implemented.

1.4 Application

It is estimated that around 500,000 earthquakes occur worldwide each year, and the number is increasing with time. As this figure is quite prominent and seismologists have not found an appropriate earthquake prediction method till now, earthquakes remain a serious threat. Therefore, our way of earthquake prediction can help predict the magnitude and regions of upcoming earthquakes, which can help us take necessary and immediate precautions that can save the lives of millions of people and reduce the financial loss caused by collapsing buildings and properties. This research aims to prepare a technique to predict the location and magnitude of expected earthquakes with as high accuracy as possible. Thus, this research explored algorithms and methodologies to predict earthquakes using machine learning and deep learning, along with various preprocessing techniques to convert data into the proper format before feeding it to machine learning models, using spatio-temporal relations in a hierarchical way by considering the relative data.


1.5 Organization of Paper

This paper is organized as follows: The second section lists the previous techniques used to predict earthquakes. The third section presents the framework proposed in this research. The fourth section presents the simulations, graphs, and data analysis. The final section concludes the research and mentions possible future work for earthquake prediction.

2 Literature Review

The study of earthquake prediction is essential for the benefit of society. This section discusses the literature on earthquake prediction techniques and lists the seismic parameters required for the prediction model. Most earthquake prediction works fall into five categories.

In the first place, a few works utilize numerical or statistical analysis to make earthquake predictions. Kannan [5] predicted earthquake epicenters according to the hypothesis of spatial connection, i.e., earthquakes occurring within a fault zone are related to each other. Predictions are made by taking advantage of the Poisson range identifier function (PRI), Poisson distributions, and so forth. Boucouvalas et al. [6] worked on the Fibonacci, Dual, and Lucas (FDL) strategy and proposed a plan to anticipate earthquakes by utilizing a trigger planetary-angle date preceding a strong seismic tremor as a seed for the unfolding of the FDL time spiral. However, these works were tested with a very limited amount of data and did not give excellent outcomes (low success rate).

Second, some works predicted earthquakes based on precursor-signal studies. Unfortunately, it is difficult to draw conclusions from these precursor signals due to very insufficient data. Besides, these precursor signals alone usually cannot lead to accurate prediction results, so they are not elaborated here.

Third, machine learning has been utilized as an effective strategy to make earthquake predictions. Last et al. [7] compared several data mining and time series analysis methods, which include J48, AdaBoost, information network (IN), multi-objective info-fuzzy network (M-IFN), k-nearest neighbors (k-NN), and SVM, for predicting the magnitude of the highest anticipated seismic event based on previously listed seismic events in the same region. Moreover, prediction features based on the Gutenberg-Richter ratio and some new seismic indicators are much more valuable than those traditionally used in the earthquake prediction literature, i.e., the average number of earthquakes in each region. Cortés et al. [8] analyzed the sensitivity of the present seismic indicators reported in the literature by adjusting the input attributes and their parameterization. It is noticed that most machine learning methods make earthquake predictions based on seismic indicators, where only time-domain, not space-domain, correlations are examined.


Fourth, lately, deep learning methods have been employed in earthquake prediction. Narayanakumar et al. [9] evaluated BP neural network techniques in predicting earthquakes. The outcomes show that the BP neural network method can give better prediction accuracy for earthquakes of magnitude 3–5 than former works, but still cannot achieve good results for earthquakes of magnitude 5–8 due to the shortage of adequate data. It is noticed that most of these neural network methods use various features as input to predict the earthquakes' time and/or magnitudes, but very few of them study the spatial relations between earthquakes.

The fifth way examines the spatio-temporal relations of earthquakes and predicts the location of impending earthquakes. This method studies the sequence of earthquakes and recognizes long-term and short-term patterns of earthquakes in a particular area using the LSTM method. Recognizing patterns over both short and long periods can help increase the accuracy of the prediction. This line of work assumes that the seismic properties of one location are connected with the seismic properties of another location. It also considers that seismic activities tend to have specific patterns over long periods.

2.1 Seismic Parameters

From the literature review, a variety of important features was identified for this research work to improve the performance of the model. The following seismic parameters are taken into consideration: time; the mean magnitude of the last n events; the rate of the square root of the seismic energy released, dE; the coefficients of the Gutenberg-Richter equation; the summation of the mean square deviation from the regression line based on the Gutenberg-Richter inverse power law (η value); the difference between the maximum observed and the maximum expected earthquake magnitude; the mean time between characteristic events among the last n events; and the deviation from this mean time.
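As an illustration, a possible computation of two of these indicators from a magnitude catalog is sketched below: the Gutenberg-Richter b-value via a least-squares fit of log10 N = a − bM, and the mean time between events. The catalog values here are placeholders.

```python
import numpy as np

mags = np.array([4.1, 4.3, 5.0, 4.7, 4.2, 5.6, 4.9, 4.4, 5.1, 4.6])
times = np.arange(len(mags))  # event times in, e.g., weeks (assumed)

# Cumulative counts N(M): number of events with magnitude >= M
m_grid = np.sort(np.unique(mags))
N = np.array([(mags >= m).sum() for m in m_grid])
slope, intercept = np.polyfit(m_grid, np.log10(N), 1)
a_value, b_value = intercept, -slope  # slope of the fit equals -b

mean_dt = np.diff(times).mean()       # mean time between events
print(f"a = {a_value:.2f}, b = {b_value:.2f}, "
      f"mean inter-event time = {mean_dt:.2f}")
```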

3 Proposed Framework

The literature review shows many methods to predict earthquakes or their locations. The proposed work considers spatio-temporal relations to indicate the earthquake location and uses the seismic parameters to predict the magnitude range and the exact magnitude value. The proposed framework is shown in Fig. 1. The work is divided into two parts: location prediction and magnitude prediction. Location prediction is done using an LSTM network with two-dimensional input; for magnitude prediction, multiclass classification is used to estimate whether the earthquake will be mild, medium, or fatal, and an ANN is used to estimate the exact magnitude of the earthquake.


Fig. 1 Proposed hierarchical earthquake prediction framework

3.1 Dataset

The dataset is downloaded from the USGS website [10] for the period 1971–2021 and over the range of 24°–40° latitude and 60°–76° longitude. The information of the earthquakes in the earthquake-prone regions of Afghanistan, Pakistan, Tajikistan, and the Northern part of India is recorded in the dataset.

3.2 Data Preprocessing for Location Prediction

The area is split into four sub-regions of equal size. Each instant of our original data records an earthquake that occurred, together with the corresponding date and other information. For the model inputs, we want our data in a weekly format; therefore, we consider weeks, where a one refers to the occurrence of one or more earthquakes in that particular week. One week has been set as the time slot. A one-hot vector is used as the input. Each vector has four indices, one for each region, and there are 2628 vectors in total (the total number of weeks in the mentioned period). The value of an index is 1 if one or more earthquakes happened in the corresponding region during the week. This will be used as input to the dense neural network or LSTM model.
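A possible implementation of this weekly one-hot encoding is sketched below; the column names, the file name, and the latitude/longitude split at 32° and 68° are assumptions about the USGS catalog format and the sub-region boundaries.

```python
import pandas as pd

df = pd.read_csv("usgs_catalog.csv", parse_dates=["time"])  # assumed file
df["region"] = 2 * (df["latitude"] >= 32) + (df["longitude"] >= 68)  # 0..3
df["week"] = df["time"].dt.to_period("W")

# Rows = weeks, columns = regions; entries are 1/0 occurrence flags.
# (Weeks with no events anywhere are omitted here for brevity.)
weekly = (df.groupby(["week", "region"]).size()
            .unstack(fill_value=0)
            .reindex(columns=range(4), fill_value=0)
            .clip(upper=1))
```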

3.3 Location Prediction

For estimating the location, the preprocessed vectors are fed as input to the dense/LSTM network. There are W vectors of size R, where W is the number of weeks considered and R is the number of regions, which in our case is 4. Xt is the input at time t, and Ht is the hidden state of the LSTM layer at that time. The architecture contains an LSTM layer composed of memory cells that take Xt and Ht−1 as input and produce Ht as output. These memory cells retain the information needed for prediction and discard the information not required, on both a short-term and a long-term basis. Ht is then fed to a dropout layer, which prevents the model from overfitting. This is followed by a dense network that learns the features which help us make predictions. Finally, the softmax function may be chosen as the activation function and applied to the output of the dense network. The output is a vector Yt containing four indexes, each corresponding to a location. Elements in this vector are 1 or 0 depending on whether an earthquake is predicted to occur at time t + 1. Similarly, we have predicted using four models: DNN, LSTM, LSTM+CNN, and RNN. The format of our data is given (transposed) in Fig. 2. For the other networks, the processed data is given as input in the same way, but the LSTM layer is replaced by the respective model layers, and the result is given as output accordingly.

Fig. 2 Data preprocessing for location prediction
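A minimal sketch of such an LSTM predictor is given below; the 64 LSTM units and 0.2 dropout rate are assumptions, while the 100-week window, the softmax output over the four regions, the 0.01 learning rate, and the RMSProp optimizer follow the description in this paper.

```python
import tensorflow as tf

W, R = 100, 4  # window of past weeks, number of regions

model = tf.keras.Sequential([
    # memory cells retain short- and long-term occurrence patterns
    tf.keras.layers.LSTM(64, input_shape=(W, R)),
    tf.keras.layers.Dropout(0.2),                    # guards against overfitting
    tf.keras.layers.Dense(R, activation="softmax"),  # per-region scores Yt
])
model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.01),
              loss="categorical_crossentropy")
model.summary()
```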


3.4 Preprocessing for Magnitude Prediction

The collected data cannot be fed directly to the classifier or the ANN model. Hence, we need to preprocess the data before providing it to the classification or ANN model.

1. The data contains a lot of null values, which need to be resolved.
2. The original data consists of rows representing every earthquake. In our case, we use weekly data to train our models, i.e., a 1 in our dataset represents that there have been one or more earthquakes in the corresponding week. For this purpose, our data is aggregated week-wise and then given as input to our models.
3. The seismic parameters that we take into consideration need to be calculated, as explained above, for all the regions.

3.5 Magnitude Prediction

The output of location prediction gives us the regions with the maximum probability of an earthquake taking place. These regions are considered for the further stage of magnitude prediction. Different models are trained for each location, i.e., four different locations in our case. Seismic parameters calculated on the past n earthquakes are given as input for the prediction of the magnitude of the (n + 1)th earthquake. Magnitude prediction is performed in two ways.

3.6 Magnitude Range Classification

Classification algorithms are used to perform multiclass classification and predict whether the earthquake will be mild, medium, or fatal. Different algorithms, such as the random forest classifier and SVM, are trained, tested, and compared on the same inputs; a minimal sketch follows.
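The sketch below illustrates this multiclass classification step with placeholder feature data; the eight-feature input dimension is an assumption standing in for the seismic parameters.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X = np.random.randn(500, 8)                 # 8 seismic parameters (assumed)
y = np.random.choice(["mild", "moderate", "fatal"], size=500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)     # one-vs-one multiclass by default
print("accuracy:", clf.score(X_te, y_te))
```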

3.7 Magnitude Estimation

An ANN is used to estimate the exact magnitude of the earthquake. The model is trained using backpropagation, and the output layer has a linear activation function that estimates the exact magnitude. If we have more than one location to predict for a given week, predictions are made by inputting the values into the corresponding models.
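A minimal sketch of such an ANN regressor with a linear output unit, again on placeholder data, is shown below; the hidden-layer sizes are assumptions.

```python
import numpy as np
import tensorflow as tf

X = np.random.randn(500, 8).astype("float32")          # placeholder features
y = np.random.uniform(3.0, 7.0, 500).astype("float32") # placeholder magnitudes

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(8,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="linear"),  # exact magnitude value
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
model.fit(X, y, epochs=10, verbose=0)  # weights fitted via backpropagation
```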


4 Experimental Analysis

4.1 Dataset

The dataset for earthquake prediction has been downloaded from the USGS website [10] for the earthquake-prone regions of Afghanistan, Pakistan, Tajikistan, and the northern part of India, for the period 1970–2021 and over the range of 24°–40° latitude and 60°–76° longitude. The dataset includes the following columns: time, latitude, longitude, depth, mag, magType, nst, gap, dmin, rms, net, ID, updated, place, type, horizontalError, depthError, magError, magNst, status, locationSource, and magSource.

4.2 Data Preprocessing for Location Prediction

The collected data cannot be fed directly to the dense/LSTM model; hence, preprocessing of the data is required. For this, the region is divided into four equal sub-regions. Each instant of our original data consists of an earthquake occurrence with the corresponding date and other information. We want our data to be in a weekly format for the model inputs, and we achieve that by considering weeks, where a one corresponds to the presence of one or more earthquakes in that particular week. The data is filtered and processed as needed for input to the dense/LSTM model. The time slot is defined to be one week. The input is defined as a one-hot vector. Each vector contains four indexes corresponding to each location, and there are 2628 vectors (the total number of weeks in the mentioned period). The state of each index is 1 if one or more earthquakes occurred in the region during the particular week. This is fed into the dense/LSTM model as input. The data is shown in Fig. 2.

4.3 Location Prediction

For estimating the location of the earthquake, we have built various time series models. This can be done in two ways: one-dimensional input corresponding to every region, or two-dimensional input in which the whole dataset is fed into the model for training. For one-dimensional input, we create different vectors for each region. For two-dimensional input, we include all four regions together and give that as input to our model. Our data consists of weeks with information on whether earthquakes occurred or not. The time frame we have taken is 100 events; on the basis of the past 100 events, the 101st event is predicted. We have used DNN, LSTM, LSTM+CNN, and RNN models for prediction. We have used the softmax activation function for all these models, and the learning rate is set to 0.01, with the Adam optimizer for the dense network and RMSProp for the rest of the models. The results are summarized in Table 1.

Fig. 3 Region wise preprocessed data

4.4 Data Preprocessing for Magnitude Prediction

The collected raw data cannot be fed directly into the classification model. Hence, preprocessing of the raw data into an understandable format is required. First, the data is divided region-wise into four parts. Then, the different seismic parameters are calculated using the respective formulae for each region. The region-wise preprocessed data are shown in Fig. 3.

4.5 Magnitude Range Classification

For estimating the magnitude range of an earthquake occurring in a particular region, the preprocessed data for each of the four regions is fed as input to classification models, which predict whether the earthquake's magnitude will be mild, moderate, or fatal with respect to particular magnitude ranges. If the earthquake's magnitude is less than 4.5, it is considered a mild earthquake; if it lies between 4.5 and 5.9, it is considered a moderate earthquake; and if it is greater than 5.9, it is considered a fatal earthquake. The SVM classifier is used for the prediction of the earthquake's magnitude range. Precision and recall are calculated in the following ways:

• Micro average: calculated using the sum of total true positives, false negatives, true negatives, and false positives.
• Macro average: calculated using the average of precision and recall of each label in the dataset.


Table 1 Loss after the given epochs of respective models

Model used | Loss after 25 epochs | Loss after 50 epochs
Dense | 0.67 | 0.56
RNN | 0.52 | 0.30
LSTM + CNN | 0.51 | 0.42
LSTM | 0.51 | 0.39

Table 2 Result obtained for SVM

Region | Accuracy (%) | Precision (%) Micro/Macro/Weighted | Recall (%) Micro/Macro/Weighted
1 | 89 | 89/44/78 | 89/50/89
2 | 82 | 82/52/81 | 82/51/82
3 | 96 | 96/57/96 | 96/50/96
4 | 78 | 78/50/76 | 78/49/78
All | 83 | 83/83/83 | 83/61/83

Table 3 Result obtained for Naïve Bayes

Region | Accuracy (%) | Precision (%) Micro/Macro/Weighted | Recall (%) Micro/Macro/Weighted
1 | 89 | 89/44/78 | 89/50/89
2 | 59 | 59/36/64 | 59/63/59
3 | 96 | 96/57/95 | 96/53/96
4 | 64 | 64/38/62 | 64/38/64
All | 72 | 72/43/71 | 72/42/72

• Weighted average: calculated by considering the proportion of each label in the dataset when averaging the precision and recall of each label.

The results in Table 2 show that justifiable accuracy is achieved for the magnitude range prediction using SVM. From the analysis of the results, it is found that, compared to the data of region 3, the data for regions 1, 2, and 4 are imbalanced and require balancing.
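These three averaging schemes can be reproduced with scikit-learn as sketched below, on placeholder labels.

```python
from sklearn.metrics import precision_score, recall_score

y_true = ["mild", "mild", "moderate", "fatal", "moderate", "mild"]
y_pred = ["mild", "moderate", "moderate", "fatal", "mild", "mild"]

for avg in ("micro", "macro", "weighted"):
    p = precision_score(y_true, y_pred, average=avg, zero_division=0)
    r = recall_score(y_true, y_pred, average=avg, zero_division=0)
    print(f"{avg:>8}: precision={p:.2f} recall={r:.2f}")
```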

4.6 Magnitude Estimation

To calculate the estimated magnitude of an earthquake occurring in a particular region, the preprocessed data for each of the four regions is fed as input to the artificial neural network regression model.


Table 4 Result obtained for random forest

Region | Accuracy (%) | Precision (%) Micro/Macro/Weighted | Recall (%) Micro/Macro/Weighted
1 | 91 | 91/96/92 | 91/62/91
2 | 83 | 82/54/82 | 83/51/83
3 | 94 | 96/65/94 | 94/40/94
4 | 81 | 78/53/80 | 81/51/81
All | 85 | 85/72/85 | 85/56/85

Table 5 Result obtained for ANN

Region | MSE (%) | MAE (%)
1 | 0.11 | 0.27
2 | 0.25 | 0.38
3 | 0.21 | 0.36
4 | 0.15 | 0.29
All | 0.18 | 0.32

This model will predict the numeric value of the magnitude.

5 Conclusions

This research discussed the various causes of earthquakes and the significant factors affecting them, in order to understand the different parameters that can act as a backbone for estimating earthquakes. We reviewed various research papers that acquaint us with the multiple methodologies undertaken to aid this prediction. The research proposed a hierarchical framework for more accurate prediction, implemented location prediction using a dense neural network as a time series model, and achieved justifiable magnitude range prediction as well as magnitude value prediction by feeding the preprocessed data to the SVM and ANN models, respectively. In the future, work will be carried out to implement the LSTM model for location prediction, improve the framework, and generalize the model to improve accuracy in predicting the exact magnitude of future earthquakes.


References

1. Murwantara IM, Yugopuspito P, Hermawan R (2020) Comparison of machine learning performance for earthquake prediction in Indonesia using 30 years historical data. TELKOMNIKA (Telecommun Comput Electron Control) 18:1331
2. Wang Q, Guo Y, Yu L, Li P (2020) Earthquake prediction based on spatio-temporal data mining: an LSTM network approach. IEEE Trans Emerg Topics Comput 8:148–158
4. Earthquake statistics and facts for 2021 | PolicyAdvice, Feb 2021
5. Kannan S (2014) Improving innovative mathematical model for earthquake prediction. Eng Fail Anal 41:89–95. https://doi.org/10.1016/j.engfailanal.2013.10.016
6. Boucouvalas AC, Gkasios M, Tselikas NT, Drakatos G (2015) Modified-Fibonacci-Dual-Lucas method for earthquake prediction. In: Third international conference on remote sensing and geoinformation of the environment (RSCy2015), vol 9535, 95351. https://doi.org/10.1117/12.2192683
7. Last M, Rabinowitz N, Leonard G (2016) Predicting the maximum earthquake magnitude from seismic data in Israel and its neighboring countries. PLoS ONE 11:e0146101
8. Cortés GA, Martínez-Álvarez F, Morales-Esteban A, Reyes J, Troncoso A (2017) Using principal component analysis to improve earthquake magnitude prediction in Japan. Logic J IGPL jzx049:1–14. https://doi.org/10.1093/jigpal/jzx049
9. Narayanakumar S, Raja K (2016) A BP artificial neural network model for earthquake magnitude prediction in Himalayas, India. Circuits Syst 07(11):3456–3468
10. Search Earthquake Catalog, https://earthquake.usgs.gov/earthquakes/search/

Chapter 21

Classification Accuracy Analysis of Machine Learning Algorithms for Gearbox Fault Diagnosis

Sunil Choudhary, Naresh K. Raghuwanshi, and Vikas Sharma

1 Introduction
Monitoring machine condition through vibration measurement is a very popular method in industry. Gears are among the most critical components of the gearboxes used in aircraft, automobiles, machining tools, etc., and gearbox failure is the most common reason for breakdown of these rotating machines. Therefore, early-stage fault detection in gearboxes is the main task in avoiding sudden failure. A considerable amount of work has been completed on gearbox fault diagnosis using conventional signal processing techniques, and many researchers are now working on artificial intelligence techniques for machine fault diagnosis. The extraction of statistical features from gearbox vibration data has been studied in the time domain, frequency domain, and time–frequency domain [1–4]. Machine learning algorithms such as decision tree (DT) classification [5], fault detection using proximal support vector machine (PSVM) and artificial neural network (ANN) [6], support vector machines (SVM) [7], ANN and SVM with genetic algorithms [8], ANN [9, 10], and wavelet transform (WT) with ANN [11, 12] are the algorithms most commonly used by researchers. Jedlinski and Jonak [7] used the SVM algorithm, which shows good classification accuracy; SVM has been a popular technique for fault diagnosis in the current decade. ANN is widely used for gearbox fault diagnosis with single as well as multiple faults in the gear tooth [10]. Another widely used algorithm for gearbox fault detection is SVM [7, 8, 13, 14]. A random forest classifier has shown very good detection accuracy, demonstrating the feasibility and effectiveness of machine learning algorithms for gearboxes [15]. A semi-supervised graph-based random forest has also been applied to gearbox fault diagnosis [16]. Wei et al. [17] used random forest to diagnose faults in a planetary gearbox. A decision tree classifier (J48 type) is also capable of detecting faults present in a spur gearbox with good accuracy [18]. In the present work, a comparison of some of the above and additional machine learning algorithms, namely Naive Bayes, decision tree, K-nearest neighbor (KNN), random forest, and SVM, is carried out; these algorithms are applied to vibration data for different classes of gear faults (e.g., chipped, eccentric, and broken). This provides a better idea of which machine learning algorithms suit the gear faults (chipped, eccentric, broken, or two faults combined) present in the gearbox, together with the achievable classification accuracy.

2 Experimental Vibration Data Collection
2.1 Experimental Gearbox Test Setup
The data used here is vibration data from a generic gearbox, imported from the 2009 PHM Society Data Challenge [19]. The block diagram of the gearbox is shown in Fig. 1. Two gearbox types are used in this model: a spur gearbox and a helical gearbox. The spur gearbox has an input shaft, an idler shaft, and an output shaft. Spur Gear 1 (G1) with 32 teeth is mounted on the input shaft. Spur Gear 2 (G2) with 96 teeth and Spur Gear 3 (G3) with 48 teeth are mounted on the idler shaft. Spur Gear 4 (G4) with 80 teeth is mounted on the output shaft. The helical gearbox likewise has input, idler, and output shafts. Helical Gear 1 (G1) with 16 teeth is mounted on the input shaft. Helical Gear 2 (G2) with 48 teeth and Helical Gear 3 (G3) with 24 teeth are mounted on the idler shaft.

Fig. 1 Generic gearbox block diagram

Table 1 Spur gearbox fault conditions

| Condition | Description |
|---|---|
| Spur Gear 1 (SG1) | Good or healthy condition |
| Spur Gear 2 (SG2) | G1 chipped and G3 eccentric |
| Spur Gear 3 (SG3) | G3 eccentric |
| Spur Gear 4 (SG4) | G3 eccentric and G4 broken |
| Spur Gear 5 (SG5) | G1 chipped, G3 eccentric, and G4 broken |

Table 2 Helical gearbox fault conditions

| Condition | Description |
|---|---|
| Helical Gear 1 (HG1) | Good or healthy condition |
| Helical Gear 2 (HG2) | G3 chipped |
| Helical Gear 3 (HG3) | G3 broken |
| Helical Gear 4 (HG4) | Imbalance at input shaft |
| Helical Gear 5 (HG5) | G3 broken |

Helical Gear 4 (G4) with 40 teeth is mounted on the output shaft. B1, B2, B3, B4, B5, and B6 represent the bearings of the gearbox. In Fig. 1, A1 and A2 are the vibration signals collected by accelerometers at the input and output sides, respectively, and Tm is the tachometer signal (10 pulses per revolution) collected at the input shaft. The gear reduction ratio of the setup is 5:1. Two Endevco accelerometers with a sensitivity of 10 mV/g (where g = 9.81 m/s2), an error of ±1%, and a resonance above 45 kHz are mounted on the gearbox casing at the input and output sides. The sampling frequency rate is 66,666.67 Hz or 200 kHz. To evaluate the effect of different faults on the gearbox vibration signal, measurements were completed for the fault conditions of the gearbox listed in Tables 1 and 2 for both the spur and helical types. The shaft operating speed is varied in continuous increments of 5 Hz from 30 to 50 Hz under high- and low-load conditions; two samples are collected at high load and two at low load for each shaft speed. The first vibration signals recorded depict the healthy condition of the spur and helical gearboxes. After that, different faults are created in different gears at different locations of the gearbox, and data is collected for each condition, as depicted in Tables 1 and 2.

2.2 Methodology and Procedure
The experimental gearbox setup contains multiple or individual faults in different gears of the spur and helical gearboxes. The gearbox consists of several components: the housing, bearings, input and output shafts, and gears (four gears in each type of gearbox). The vibration generated by the interaction of gear faults is transferred from the gears to the shaft axes, from the shaft axes to the bearings, and from the bearings to the gearbox housing on which they are mounted. This chain generates the vibration signal measured by the two accelerometers (input and output sides). The vibration signals arise from the rotation of the input, idler, and output shafts, gear tooth meshing, impacts, speed variation, and the bearings. The time-domain signal is measured by the two accelerometers at regular intervals. To recognize spur and helical gearbox faults in the gearbox model, raw vibration data is collected from the double-stage spur and helical gearboxes under the different fault conditions shown in Tables 1 and 2. In the faulty conditions, between one and five gear faults are present in the gearbox. After collection, each sample is stored as a .csv file, which is analyzed with five machine learning algorithms in a Python script. Four features are used for the classification of faults, and one feature is generated as the target class, where 0 indicates healthy vibration data and 1 indicates faulty data. The machine learning algorithms are then applied, and the final classification accuracy is reported for each class of fault present in the gearbox.

3 Classification Accuracy Selection Process
The data is labeled in three columns: the first and second columns contain the vibration data collected by the accelerometers mounted at the input and output sides, respectively, and the third column contains the data measured by the tachometer mounted at the input side. A fourth column for frequency is generated in the Python script to help in the classification of faults. The flow diagram for the analysis is given in Fig. 2.
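The following sketch illustrates this data layout under our own naming assumptions (the paper fixes only the order and meaning of the columns):

```python
# Sketch of the data layout: three measured columns plus a generated
# frequency column and a binary target class.
import pandas as pd

# In practice each sample would be read from its .csv file, e.g.:
# df = pd.read_csv("sample.csv", names=["acc_input", "acc_output", "tachometer"])
df = pd.DataFrame({
    "acc_input":  [0.012, -0.034, 0.021],   # accelerometer, input side
    "acc_output": [0.008, -0.019, 0.015],   # accelerometer, output side
    "tachometer": [0.0, 1.0, 0.0],          # 10 pulses/rev at the input shaft
})
df["frequency"] = 30   # shaft speed (Hz) of this record, generated in the script
df["target"] = 1       # 0 = healthy vibration data, 1 = faulty
```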

4 Machine Learning Algorithms
In this work, five machine learning algorithms are used for gearbox fault diagnosis: Naive Bayes, KNN, decision tree, random forest, and SVM. Brief explanations of these algorithms are given in the next subsections.

4.1 Naive Bayes Machine Learning Classifier
The healthy and faulty gearbox data is classified using the Naive Bayes classifier, which is based on Bayes' theorem of probability. It is simple and fast to compute, yet effective for classification, and it works well where the vibration signals are noisy. However, if the target dataset contains a large number of numeric features, the reliability of a Naive Bayes model is limited.


Fig. 2 Flow diagram for classification accuracy selection process

4.2 KNN Machine Learning Classifier
KNN is simple to implement, versatile, and one of the most widely used machine learning algorithms. It performs fault diagnosis based on feature similarity. KNN is a lazy learner: the model structure is defined by the data points themselves, so no training phase is required to build a model before predicting the class of new data, and all training data points are used during the testing phase for fault classification. In the KNN algorithm, K is the number of nearest neighbors considered for classification. The algorithm does little work during training and a large amount of work at the testing stage to classify accurately; it stores the training points and acts on incoming data accordingly. KNN shows better results with a small number of classification features than with a large number; increasing the dimension of the dataset by adding classification features creates overfitting issues for the recorded datasets.

4.3 Decision Tree Machine Learning Classifier
The decision tree is one of the most widely used algorithms for classification. As the name indicates, it builds a model from the database in the form of a tree-like structure, and its classification accuracy can be tuned through different splitting strategies. The algorithm can be used for multi-dimensional analysis with multiple classes of fault. The objective is to create a model that predicts the target class of the outcome based on the input variables in the classification feature vectors. Each internal node of the decision tree tests a feature, and the output is evaluated by following a path that starts from the root and is directed by the values of the input variables. A decision tree is usually represented in the format shown in Fig. 3: each internal node (indicated by boxes) tests an attribute (A and B within the boxes), each branch corresponds to an attribute value (P and Q in the figure), and each leaf node defines a fault class. The first node is the root node; here, A is the root node, B is a branch node, and P and Q are leaf nodes. For small numbers of data points, little mathematical and computational effort is required to understand the model, and it works well in most cases. It can handle both small and large training datasets and gives a clear indication of which features are most helpful for classification. However, it is often biased toward classification features with more levels, large trees are complex to interpret properly, and the classifier is prone to overfitting. The random forest classifier was developed to resolve most of these overfitting issues in dataset evaluation.

Fig. 3 Block diagram of decision tree classifier

4.4 Random Forest Machine Learning Classifier
Random forest is also a supervised machine learning algorithm, capable of both classification and regression. It offers great flexibility and is simple to use. It is an ensemble classifier that combines many decision trees to overcome the overfitting issues of a single tree: a large number of trees are trained on random subsets of the data, and a majority vote over the tree outputs gives the combined prediction. This classifier works efficiently on larger numbers of data points for fault classification.

4.5 Support Vector Machine (SVM)
SVM can perform linear classification as well as regression. It is based on the construction of a separating surface called a hyperplane, which creates a boundary between the datasets plotted in the multi-dimensional feature space. The trained model predicts the class of new data according to the classes defined at training time: SVM can create an N-dimensional hyperplane that assigns new data points to one of the two output classes (healthy or faulty). SVM works for two-class as well as multi-class classification problems. Here, 70% of the dataset, labeled into the healthy and faulty classes, is used to train the model, after which the model structure is created. The key task of the SVM is then to predict the class to which a new data point belongs. An SVM maps the classified data so that the margin between the two classes is maximized, as shown in Fig. 4.
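Pulling the five classifiers above together, the following minimal sketch (ours, not the authors' script) trains each one on a 70/30 train/test split; the synthetic X and y stand in for the four vibration features and the healthy/faulty labels described in Sect. 2.2.

```python
# Minimal comparison loop over the five classifiers on a 70/30 split.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((1000, 4))          # placeholder for the four vibration features
y = rng.integers(0, 2, 1000)       # placeholder healthy/faulty labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=0)

classifiers = {
    "Naive Bayes": GaussianNB(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(kernel="rbf"),
}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    print(f"{name}: {100 * clf.score(X_test, y_test):.2f}%")
```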

5 Fault Detection Results and Discussions
The database is created by conducting different faulty and healthy tests on the gearbox. The final classification accuracy results of the different classifiers are analyzed in a Python script using a Jupyter notebook. The machine learning classifiers yield different classification accuracy levels for the different gear faults listed in Tables 1 and 2. When the number of features used for fault detection increases, the accuracy of the Naive Bayes classifier also increases, and vice versa. The classification accuracy of the Naive Bayes, KNN, decision tree, random forest, and SVM classifiers is evaluated with the same number of data samples for training and testing; the training and testing accuracies differ across classifiers and across gear fault classes.

Fig. 4 SVM algorithm classification

In the case of the spur gearbox, the Naive Bayes classifier shows the best fault classification accuracy for all gear faults (chipped, eccentric, and broken) except gear fault type 1 (GFT-1). The results are given in Table 3. For the SG3, SG4, and SG5 cases, the Naive Bayes classifier achieves 54.80%, 61.55%, and 66.34%, respectively, while the decision tree classifier shows the best accuracy for the SG2 case. The SVM classifier remains almost constant across gear fault types. Similarly, for the helical gearbox, the Naive Bayes classifier shows the best classification accuracy for most gear faults, as given in Table 4, where the best classifier is highlighted for each fault class; again the SVM classifier remains almost constant across gear fault types. For HG2, HG3, and HG5, the accuracy of the Naive Bayes classifier is 52.96%, 52.71%, and 64.15%, respectively. For the HG4 case, the highest accuracies are obtained by the decision tree (53.71%) and random forest (52.82%) classifiers.

Table 3 Classification accuracy of model with different classifiers of spur-type gearbox

| S. No. | Model class of spur gear 1 (SG1) versus | Gear fault type (GFT) | Naive Bayes classifier (%) | KNN classifier (%) | Decision tree classifier (%) | Random forest classifier (%) | SVM classifier (%) |
|---|---|---|---|---|---|---|---|
| 1 | SG2 | GFT-1 | 55.60 | 53.12 | 55.86 | 54.05 | 49.76 |
| 2 | SG3 | GFT-2 | 54.80 | 52.28 | 54.26 | 52.93 | 49.86 |
| 3 | SG4 | GFT-3 | 61.55 | 58.23 | 59.50 | 60.06 | 50.02 |
| 4 | SG5 | GFT-4 | 66.34 | 63.80 | 65.41 | 66.21 | 49.14 |

Table 4 Classification accuracy of model with different classifiers of helical-type gearbox

| S. No. | Model class of helical gear 1 (HG1) versus | Gear fault type (GFT) | Naive Bayes classifier (%) | KNN classifier (%) | Decision tree classifier (%) | Random forest classifier (%) | SVM classifier (%) |
|---|---|---|---|---|---|---|---|
| 1 | HG2 | GFT-1 | 52.96 | 51.59 | 52.58 | 52.75 | 49.18 |
| 2 | HG3 | GFT-2 | 52.71 | 51.83 | 51.77 | 51.93 | 50.75 |
| 3 | HG4 | GFT-3 | 51.51 | 51.68 | 53.71 | 52.82 | 49.88 |
| 4 | HG5 | GFT-4 | 64.15 | 60.04 | 63.45 | 62.74 | 50.49 |

6 Conclusions
The machine learning algorithms used here for gearbox fault detection were applied to time-domain vibration signals. Gearbox fault diagnosis was performed using the Naive Bayes, KNN, decision tree, random forest, and SVM algorithms, and their accuracy results were compared. The Naive Bayes classifier gives the best classification accuracy for most of the gearbox fault cases, while SVM is observed to be a reliable algorithm due to the consistency of its results. The present work, which employed Naive Bayes, KNN, and random forest for gearbox fault diagnosis, shows that these algorithms can also be used effectively for this task.

References
1. Laxmikant S, Mangesh D, Chaudhari B (2018) Compound gear-bearing fault feature extraction using statistical features based on time-frequency method. Measurement 125:63–77
2. Loutas TH, Sotiriades G, Kalaitzoglou I, Kostopoulos V (2009) Condition monitoring of a single-stage gearbox with artificially induced gear cracks utilizing on-line vibration and acoustic emission measurements. Appl Acoust 70:1148–1159
3. Assaad B, Eltabach M, Antoni J (2014) Vibration based condition monitoring of a multistage epicyclic gearbox in lifting cranes. Mech Syst Signal Process 42:351–367
4. Li Y, Ding K, He G, Lin H (2016) Vibration mechanisms of spur gear pair in healthy and fault states. Mech Syst Signal Process 81:183–201
5. Saravanan N, Ramachandran KI (2009) Fault diagnosis of spur bevel gear box using discrete wavelet features and decision tree classification. Expert Syst Appl 36:9564–9573
6. Saravanan N, Kumar VNS, Siddabattuni, Ramachandran KI (2010) Fault diagnosis of spur bevel gear box using artificial neural network (ANN) and proximal support vector machine (PSVM). Appl Soft Comput 10:344–360
7. Jedlinski L, Jonak J (2015) Early fault detection in gearboxes based on support vector machines and multilayer perceptron with a continuous wavelet transform. Appl Soft Comput 30:636–641
8. Samanta B (2004) Gear fault detection using artificial neural networks and support vector machines with genetic algorithms. Mech Syst Signal Process 18:625–644
9. Rafiee J, Arvani F, Harifi A, Sadeghi MH (2007) Intelligent condition monitoring of a gear box using artificial neural network. Mech Syst Signal Process 21:1746–1754
10. Dhamande LS, Chaudhari MB (2016) Detection of combined gear-bearing fault in single stage spur gear box using artificial neural network. Proc Eng 144:759–766
11. Saravanan N, Ramachandran KI (2010) Incipient gear box fault diagnosis using discrete wavelet transform (DWT) for feature extraction and classification using artificial neural network (ANN). Expert Syst Appl 37:4168–4181
12. Wu JD, Chan JJ (2009) Faulted gear identification of a rotating machinery based on wavelet transform and artificial neural network. Expert Syst Appl 36:8862–8875
13. Bordoloi DJ, Tiwari R (2014) Support vector machine based optimization of multi-fault classification of gears with evolutionary algorithms from time-frequency vibration data. Meas J Int Meas Confed 55:1–14
14. Bordoloi DJ, Tiwari R (2014) Optimum multi-fault classification of gears with integration of evolutionary and SVM algorithms. Mech Mach Theory 73:49–60
15. Zarnaq MH, Omid M, Aghdamb EB (2022) Fault diagnosis of tractor auxiliary gearbox using vibration analysis and random forest classifier. Inform Process Agric 9:60–67
16. Chen S, Yang R, Zhong M (2021) Graph-based semi-supervised random forest for rotating machinery gearbox fault diagnosis. Control Eng Pract 117:104952
17. Wei Y, Yang Y, Xu M, Huang W (2021) Intelligent fault diagnosis of planetary gearbox based on refined composite hierarchical fuzzy entropy and random forest. ISA Trans 109:340–351
18. Gunasegaran V, Muralidharan V (2020) Fault diagnosis of spur gear system through decision tree algorithm using vibration signal. Mater Today Proc 22:3232–3239
19. PHM Data Challenge Homepage, https://c3.ndc.nasa.gov/dashlink/resources/997/. Last accessed 1 Feb 2022

Chapter 22

Stock Price Forecasting Using Hybrid Prophet—LSTM Model Optimized by BPNN

Deepti Patnaik, N. V. Jagannadha Rao, and Brajabandhu Padhiari

1 Introduction
The stock market is the group of markets involving the regular selling, purchasing, and issuance of shares of different publicly held companies through institutionalized formal exchange; such activities are operated in a marketplace under a defined set of regulations [1]. Shares are traded in the stock market through a stock exchange, which, as the designated market for trading, ensures complete transparency in transactions [2]. The stock price depends on various parameters such as financial conditions, administrative decisions, market fluctuations, and pandemic effects [3]. Stock prices are also dynamic, nonlinear, and noisy in nature [4]; thus, the prediction of stock prices is a complex issue because of irregularities, volatility, changing trends, noise, complex features, etc. [5]. A time series is a collection of equally spaced data in sequence, in which successive observations are usually not independent; any variable that changes over time can be included in time series analysis [6]. Stock prices vary with time and can be tracked over the short term (every business day) or the long term, such as every month over the course of the last 18–20 years. A highlight of the stock market is its seasonal trend [7]. From the literature, Ariyo et al. applied the autoregressive integrated moving average (ARIMA) model to New York Stock Exchange (NYSE) and Nigeria Stock Exchange (NSE) data and showed that ARIMA compares reasonably well with emerging forecasting techniques in short-term prediction [8]. Chen and Chen proposed a fuzzy time series model for stock prediction using linguistic values of data; for nonlinear and dynamic datasets, these methods are widely accepted [9]. Elsir and Faris employed regression, artificial neural network, and support vector machine (SVM) models for forecasting stock data [10]. Bharambe and Dharmadhikari showed that stock prediction depends on the declaration of dividends, estimations of future income, management changes, and so on [11]. Shrivas and Sharma applied various machine learning techniques, including SVM, regression, and artificial neural networks (ANN), to analyze and predict stock trends for Bombay Stock Exchange (BSE) stock prices, and showed SVM to be the best technique among them [12]. Xu et al. used historical price data to forecast stock price values with a deep neural network [13]. Li et al. proposed deep learning algorithms to predict market impacts [14]. For sound investments with high profit, More et al. proposed a neuro-linguistic programming (NLP) approach for reading stock charts [15]. From these literature studies, it is found that the stock price is a time series stochastic process whose stochastic nature introduces variation in volatility and thus risk [16]; although the causes are known, their quantification is very difficult [17]. Thus, in this work, a hybrid model combining Prophet and long short-term memory (LSTM) is proposed to overcome these issues, and the forecast is further optimized by a backpropagation neural network (BPNN). The performance parameters used here are the root mean square error (RMSE), mean absolute percentage error (MAPE), and mean absolute error (MAE).

2 Prophet Model
Prophet, or 'Facebook Prophet', was developed by Facebook for forecasting with an additive time series model, in which nonlinear trends are fitted with yearly, weekly, and daily seasonality as well as holiday effects. It handles missing data, shifts in the trend, and outliers. Its dynamics are given by [18]

$$Y(t) = g(t) + s(t) + h(t) + e(t) \qquad (1)$$

where g(t) represents the nonlinear trend function, s(t) the seasonal changes, h(t) the holidays or irregular schedules, and e(t) the error value. The modeling flowchart of the prophet model is given in Fig. 1.

Fig. 1 Flowchart of prophet model
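A minimal sketch, under our own assumptions, of fitting Prophet to a daily series and recovering the residual e(t) of Eq. (1); the Fourier orders match those reported later in Sect. 5, while the synthetic series is a placeholder for the actual price data.

```python
# Hedged sketch of the Prophet stage of the hybrid model.
import numpy as np
import pandas as pd
from prophet import Prophet

dates = pd.date_range("2015-04-15", "2022-04-14", freq="D")
price = 100 + np.cumsum(np.random.default_rng(0).normal(0, 1, len(dates)))
df = pd.DataFrame({"ds": dates, "y": price})       # Prophet's required columns

m = Prophet(yearly_seasonality=10, weekly_seasonality=3)  # Fourier orders from Sect. 5
m.fit(df)
fitted = m.predict(df[["ds"]])["yhat"].to_numpy()  # g(t) + s(t) + h(t) of Eq. (1)
residual = df["y"].to_numpy() - fitted             # e(t), passed on to the LSTM stage
```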

3 LSTM Model
LSTM is a special kind of recurrent neural network (RNN) that can learn long-term dependencies in the data. This is achieved by a combination of four interacting layers, which overcomes the vanishing gradient problem. An LSTM unit contains three types of gates: (i) the forget gate, which decides what information to discard from the network state; (ii) the input gate, which decides which values in the memory to update; and (iii) the output gate, which decides the output based on the input and the memory block. The unit works like a mini state machine in which the weights of the gates are learned during training. The internal short-term memory is given by [19]

$$h_t = \sigma(U x_t + V h_{t-1}) \qquad (2)$$

where h_t is the model's internal short-term memory, σ is the sigmoid function, U is the weight on the input, x_t is the sequential input, and V is the weight on the short-term memory. The output is given by [20]

$$Y_t = W h_t \qquad (3)$$

where W is the weight on the output.

4 MAE, RMSE, MAPE

4.1 Mean Absolute Error (MAE)
For a given dataset, the mean absolute error is given by [21]

$$\text{MAE} = \frac{1}{N}\sum_{i=1}^{N} |e_i| \qquad (4)$$

where e_i is the error (the difference between the original and predicted values) and N is the total number of samples. It is also called the mean absolute deviation and reflects the overall error in the forecasting of stocks.

4.2 Root Mean Square Error (RMSE)
The root mean square error is a balanced error measure and a very effective parameter for assessing the accuracy of stock forecasting. It is given by [21]

$$\text{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} e_i^2} \qquad (5)$$

4.3 Mean Absolute Percentage Error (MAPE)
The mean absolute percentage error, also used as a loss function in forecasting, is given by [21]

$$\text{MAPE} = \frac{1}{N}\sum_{i=1}^{N} \left|\frac{e_i}{o_i}\right| \times 100\% \qquad (6)$$

where e_i is the error (the difference between the original and predicted values), N is the total number of samples, and o_i is the original sample value.
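The three measures translate directly into NumPy, as in this short sketch of Eqs. (4)–(6):

```python
# Direct NumPy transcriptions of Eqs. (4)-(6); o = original, p = predicted.
import numpy as np

def mae(o, p):
    return np.mean(np.abs(o - p))            # Eq. (4)

def rmse(o, p):
    return np.sqrt(np.mean((o - p) ** 2))    # Eq. (5)

def mape(o, p):
    return np.mean(np.abs((o - p) / o)) * 100.0   # Eq. (6), in percent
```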

5 Proposed Method
First, historical data for the last 7 years, from 15.04.2015 to 14.04.2022, were collected from the Yahoo Finance Web site. The data include the closing prices of the NIFTY 50, S&P 500, and Nikkei 225. To visualize the trend, the price indices of these stock exchanges are shown in Figs. 2, 3, and 4; the flowchart of the proposed model is given in Fig. 5. Generally, the stock price is a composition of linear and nonlinear components. It is given as input to the Prophet model, which is designed to tune its parameters automatically without any prior knowledge. The Prophet model is robust against missing data, holiday data, etc., so data interpolation is not required. The prophet library of Python 3.10 is used here for simulation purposes. Seasonality is addressed using a Fourier series; the default numbers of Fourier components used here are 10 for yearly and 3 for weekly seasonality. Out of 7 years of data, 6 years are used for training, and the last year (15.04.2021 to 14.04.2022) is used for testing. The accuracy of the model is assessed by comparing the original and forecasted values in terms of RMSE, MAPE, and MAE. The residual terms, which carry the nonlinearity of the data, are forwarded to the next model, the deep learning LSTM model. Before being applied to the LSTM model, the data are first normalized between 0 and 1. The number of epochs set for the LSTM model is 250, and the Adam optimizer is used. Finally, the LSTM output stock price data are applied to a BPNN for fine tuning and optimization; the BPNN consists of an input layer, a hidden layer, and an output layer.

Fig. 2 Stock price index value for S&P 500

Fig. 3 Stock price index value for Nikkei 225
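A hedged sketch of this final refinement stage follows; the BPNN hidden-layer size and the placeholder data are our assumptions, not the paper's exact configuration.

```python
# Sketch of the BPNN stage: a small feed-forward network trained by
# backpropagation to map the hybrid forecast toward the observed price.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

rng = np.random.default_rng(0)
hybrid_forecast = rng.random((200, 1))                    # placeholder Prophet+LSTM outputs
observed = hybrid_forecast + 0.05 * rng.random((200, 1))  # placeholder true prices

bpnn = Sequential([Dense(16, activation="relu", input_shape=(1,)), Dense(1)])
bpnn.compile(optimizer="adam", loss="mse")
bpnn.fit(hybrid_forecast, observed, epochs=50, verbose=0)
final_forecast = bpnn.predict(hybrid_forecast, verbose=0)
```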


Fig. 4 Stock price index value for NIFTY 50

Fig. 5 Flowchart of the proposed hybrid model



Table 1 Statistical analysis of the price indices of S&P 500

| Name | RMSE | MAPE (%) | MAE |
|---|---|---|---|
| Prophet | 18.66 | 0.711 | 15.56 |
| Hybrid model (prophet and LSTM) | 16.23 | 0.674 | 14.24 |
| Proposed (prophet—LSTM and BPNN) | 13.52 | 0.588 | 11.57 |

Table 2 Statistical analysis of the price indices of Nikkei 225

| Name | RMSE | MAPE (%) | MAE |
|---|---|---|---|
| Prophet | 208.76 | 0.998 | 151.22 |
| Hybrid model (prophet and LSTM) | 201.55 | 0.945 | 149.61 |
| Proposed (prophet—LSTM and BPNN) | 192.16 | 0.8412 | 145.34 |

Table 3 Statistical analysis of the price indices of NIFTY 50

| Name | RMSE | MAPE (%) | MAE |
|---|---|---|---|
| Prophet | 152.56 | 0.967 | 111.62 |
| Hybrid model (prophet and LSTM) | 150.66 | 0.954 | 109.34 |
| Proposed (prophet—LSTM and BPNN) | 145.87 | 0.901 | 105.23 |

6 Results and Discussion
The proposed model is simulated in a Python environment for the standard stock indices S&P 500, Nikkei 225, and NIFTY 50. Out of 7 years of data, 6 years are used for training and the last year for testing. The RMSE, MAPE, and MAE obtained for the testing dataset are shown in Tables 1, 2, and 3 for the S&P 500, Nikkei 225, and NIFTY 50 index values, respectively. The hybrid model performs better in all three cases because it handles both the linear and nonlinear variations and is further optimized by the BPNN network.

7 Conclusion
In this work, a novel hybrid forecasting model (Prophet—LSTM) optimized by a BPNN network is proposed. The proposed optimized model performs better in all respects for the various standard stock index values collected from markets around the globe, and it will be very useful for forecasting future stock prices. It may be further analyzed with other optimization techniques and for various other stocks.


References
1. Granger CWJ, Newbold P (2014) Forecasting economic time series. Academic Press
2. Idrees SM, Alam MA, Agarwal P (2019) A prediction approach for stock market volatility based on time series data. IEEE Access 7:17287–17298. https://doi.org/10.1109/ACCESS.2019.2895252
3. Wen M, Li P, Zhang L, Chen Y (2019) Stock market trend prediction using high-order information of time series. IEEE Access 7:28299–28308. https://doi.org/10.1109/ACCESS.2019.2901842
4. Zavadzki S, Kleina M, Drozda F, Marques M (2020) Computational intelligence techniques used for stock market prediction: a systematic review. IEEE Lat Am Trans 18(04):744–755. https://doi.org/10.1109/TLA.2020.9082218
5. Devadoss A, Antony L (2013) Forecasting of stock prices using multi-layer perceptron. Int J Web Technol 2:52–58. https://doi.org/10.20894/IJWT.104.002.002.006
6. Ding X, Zhang Y, Liu T, Duan J (2015) Deep learning for event-driven stock prediction. In: IJCAI'15: Proceedings of the 24th international conference on artificial intelligence
7. Li W, Liao J (2018) A comparative study on trend forecasting approach for stock price time series. In: Proceedings of the international conference on anti-counterfeiting, security and identification
8. Ariyo AA, Adewumi AO, Ayo CK (2014) Stock price prediction using the ARIMA model. In: 2014 UKSim-AMSS 16th international conference on computer modelling and simulation, pp 106–112
9. Chen MY, Chen BT (2015) A hybrid fuzzy time series model based on granular computing for stock price forecasting. Inform Sci 294:227–241
10. Elsir AFS, Faris H (2015) A comparison between regression, artificial neural networks and support vector machines for predicting stock market index. Int J Adv Res Artif Intell 4(7)
11. Bharambe MMP, Dharmadhikari SC (2017) Stock market analysis based on artificial neural network with big data. In: Proceedings of 8th post graduate conference for information technology
12. Shrivas AK, Sharma SK (2018) A robust predictive model for stock market index prediction using data mining technique
13. Xu B, Zhang D, Zhang S, Li H, Lin H (2018) Stock market trend prediction using recurrent convolutional neural networks. In: Proceedings of CCF international conference on natural language processing and Chinese computing, pp 166–177
14. Li X, Cao J, Pan Z (2019) Market impact analysis via deep learned architectures. Neural Comput Appl 31:5989–6000
15. More AM, Rathod PU, Patil RH, Sarode DR (2018) Stock market prediction system using Hadoop. Int J Eng 16138
16. Sezer OB, Gudelek MU, Ozbayoglu AM (2020) Financial time series forecasting with deep learning: a systematic literature review 2005–2019. Appl Soft Comput J 90
17. Chatzis SP, Siakoulis V, Petropoulos A, Stavroulakis E, Vlachogiannakis N (2018) Forecasting stock market crisis events using deep and statistical machine learning techniques. Expert Syst Appl 112:353–371
18. Bashir T, Haoyong C, Tahir MF, Liqiang Z (2022) Short term electricity load forecasting using hybrid prophet—LSTM model optimized by BPNN. Energy Reports 8, Elsevier
19. Zhang X, Tan Y (2018) Deep stock ranker: a LSTM neural network model for stock selection. In: International conference on data mining and big data. Springer, pp 614–623
20. Charan VS, Rasool A, Dubey A (2022) Stock closing price forecasting using machine learning models. In: International conference for advancement in technology (ICONAT), pp 1–7. https://doi.org/10.1109/ICONAT53423.2022.9725964
21. Rezaei H, Faaljou H, Mansourfar G (2020) Stock price prediction using deep learning and frequency decomposition. Expert Systems with Applications, Elsevier

Chapter 23

Identification of Genetically Closely Related Peanut Varieties Using Deep Learning: The Case of Flower 11-Related Varieties

Atoumane Sene, Amadou Dahirou Gueye, and Issa Faye

1 Introduction
In Senegal, groundnut (Arachis hypogaea L.) is the most important cash crop, with a sown area of about 1 million hectares, as quoted in FAO [1]. Annual production is around one million tons per year, making Senegal one of the largest groundnut producers in Africa. This production is supported by the development of improved varieties by ISRA and an increasing demand from producers for certified seed of these new improved varieties. In the seed certification process, variety identification is needed. However, variety identification has traditionally been based on morphological traits, particularly pod and seed characteristics, and these characteristics are highly influenced by environmental factors. Identification can therefore be difficult when genetically related varieties are involved; isogenic lines resulting from a backcross-based selection scheme are particularly difficult to differentiate from each other and from their parent on the basis of morphological characters. The development of more advanced identification methods based on artificial intelligence approaches could thus contribute to the improvement of variety identification methods. In the ECOWAS region, Distinctness, Uniformity and Stability (DUS) tests are carried out according to the International Union for the Protection of New Varieties of Plants (UPOV) guidelines for groundnuts (TG/93/4) in UPOV [2]. Similarly, the

official seed control services use the same descriptor (www.upov.int). With the need for new varieties suited to the various uses of stakeholders, there is a real need to strengthen the varietal portfolio. In Senegal, ten new groundnut varieties were recently released (WAAPP [3]). These varieties are interspecific introgression lines resulting from a cross between the Fleur 11 variety and the amphidiploid (Arachis duranensis × Arachis ipaënsis), with four backcrosses made on the Fleur 11 parent variety (WAAPP [3]). The varieties are morphologically quite similar and all resemble the parent variety Fleur 11, so identification based solely on the UPOV descriptor characteristics may not be very effective. Traditional methods of variety identification are relatively limited, which may affect seed quality, and so far little research has been done to develop innovative methods to improve variety identification (WAAPP [3]). Therefore, the objective of this study is to propose an automated identification model based on deep learning that identifies the variety from its pod and seed characteristics. The model is based on VGG16, a deep convolutional network, and has the advantage of being fast and reliable for identifying peanut varieties, especially those that are genetically very close. The rest of the paper is organized as follows: Sect. 2 presents related work in peanut variety identification; Sect. 3 proposes the methodology adopted; Sect. 4 presents the VGG16 model on which our proposal is based.

2 Related Work
In this section, we first present the situation of groundnut seed production and certification in Senegal; we then detail the existing identification methods for groundnut seeds.

2.1 Situation of Peanut Seed Production and Certification in Senegal
Peanut (Arachis hypogaea L.) is a leguminous plant native to Latin America (Kouadio [4]). It is cultivated throughout the inter-tropical zone and is of great nutritional and economic importance. It is the sixth most important oilseed crop in the world (FAO [5]) and is cultivated in more than 100 countries on more than 26.4 million hectares with an average productivity of 1.4 tons per hectare (FAO [5]; Ntare et al. [6]). In Senegal, groundnuts generate income for about a third of the population (Dia et al. [7]) and are the second most important export of the entire agricultural sector after fisheries products (ANSD [8]). Moreover, Senegal is the leading exporter of groundnuts in Africa in terms of value and volume, and these exports are primarily directed to China (Baborska [9]). The sector has been going through a deep crisis for several decades, resulting from a combination of structural factors (disorganization of the chain, dilapidated processing infrastructure, degradation of seed capital), climatic factors (irregular and insufficient rainfall, soil degradation), and cyclical factors (emergence of new, cheaper oils on the international market). Between 2006 and 2014 (see Fig. 1), production rarely exceeded 700 thousand tons due to low yields.

Fig. 1 Groundnut production and yield in Senegal. Source DAPSA 2019

2.2 Existing Methods of Identification of Peanut Seeds in Senegal
In the seed certification process, varietal purity is an important parameter, so varieties must be identified on the basis of distinguishing characteristics. At present, identification relies on a set of morphological characteristics presented in document TG/93/4 (UPOV [2]) on the Guidelines for the Examination of Distinctness, Uniformity and Stability of Peanut. These distinguishing characteristics are plant habit, plant density, anthocyanin coloration of branches, branching type, leaflet shape, and pod and seed shape. Such observations may have limitations when dealing with genetically closely related varieties.

3 Methodology
The methodological approach is based, first, on image acquisition from the seeds and pods of a panel of six varieties related to Fleur 11 and, second, on the construction of a dataset of images classified according to variety.

3.1 Plant Material
The material used is composed of seven varieties, all derived from the Fleur 11 variety. The Taaru variety is a cross between Fleur 11 and the 73-30 variety. The other six varieties (Raw Gadu, Rafet Kaar, Yakaar, Jaambar, Tosset, and Kom Kom) are all sister lines resulting from a cross between Fleur 11 and the amphidiploid (Arachis duranensis × Arachis ipaënsis) backcrossed four times. For each variety, images of the seed and shell were recorded, giving a dataset of 1102 images divided into three folders: train, test, and validation.

3.2 Image Acquisition
The samples of the peanut varieties used in this paper all come from the National Center for Agricultural Research (CNRA) in Bambey, Senegal. They are all approved and rigorously selected to ensure accurate results. An iPhone 11 Pro Max camera with a resolution of 12 megapixels was used to record the images. The phone was mounted on a stand that allowed easy vertical movement and provided stable support for the camera. For each peanut variety, clear images of the shell and seed were obtained. All seeds and hulls in the sample were certified varieties, manually selected from bags. Each hull or seed could be placed in any random orientation and at any position in the field of view. The background was a white tablecloth.


Fig. 2 Seeds and pods of the six peanut varieties from flower 11

The field of view was 12 mm × 9 mm, and the spatial resolution was approximately 0.019 mm/pixel. For each variety, images of the seed and shell were recorded, yielding the dataset of 1102 images divided into train, test, and validation folders (Fig. 2).

4 Model Construction
Deep learning is a specific subfield of machine learning: an approach to learning representations from data that focuses on learning successive layers of increasingly meaningful representations. The word 'deep' in 'deep learning' does not refer to any deeper understanding achieved by this approach; the number of layers contributing to a data model is called the model depth (Datascientest [10]). We used a CNN model implemented in the Python language. In deep learning, a convolutional neural network (CNN) is a class of deep neural networks most often applied to visual imagery analysis. It uses a special technique called convolution. In mathematics, convolution is an operation on two functions that produces a third function expressing how the shape of one is modified by the other. In the convolutional layer, a matrix named the kernel is passed over the input matrix to create a feature map for the next layer. Convolution is thus a specialized type of linear operation that is widely used in many fields, including image processing, statistics, and physics. If we have a 2-dimensional image input I and a 2-dimensional kernel filter K, the convolved image is calculated as follows:

$$F(i, j) = (I * K)(i, j) = \sum_{m}\sum_{n} I(i - m, j - n)\, K(m, n) \qquad (1)$$
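A tiny numerical illustration of Eq. (1), using SciPy's 2-D convolution (which flips the kernel, matching the I(i − m, j − n) indexing); the toy arrays are placeholders:

```python
# Numerical demonstration of Eq. (1) on a toy image and kernel.
import numpy as np
from scipy.signal import convolve2d

I = np.arange(16, dtype=float).reshape(4, 4)   # toy "image"
K = np.array([[1.0, 0.0], [0.0, -1.0]])        # toy kernel
F = convolve2d(I, K, mode="valid")             # feature map F(i, j)
print(F)
```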


Fig. 3 VGG16 architecture. Source Datascientest

We used the open-source machine learning library TensorFlow together with Keras, a high-level neural network library built on top of TensorFlow, and Sublime Text as the development environment. We then used the VGG16 model, a well-known computer vision architecture often reused through transfer learning to avoid retraining from scratch on problems similar to those on which VGG has already been trained (Datascientest [10]). Because the model has been trained on a large dataset, it has learned good representations of low-level features such as spatial structure, edges, rotation, illumination, and shapes, and these features can be shared to transfer knowledge and act as a feature extractor for new images in different categories (Aung et al. [11]). These new images can belong to categories completely different from those of the source dataset on which the model was pre-trained. In this paper, we harness the power of transfer learning using a pre-trained VGG16 model, employed as an efficient feature extractor to classify the six (06) varieties. With a dataset of 1102 images consisting of seeds and pods for each variety listed above, our model filters each image layer by layer, keeping only discriminative information such as atypical geometric shapes (Fig. 3).
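A minimal sketch of this transfer-learning setup, assuming a frozen VGG16 base and a small classification head for the six varieties; the head sizes are our assumptions, not the paper's exact configuration.

```python
# Sketch: frozen VGG16 base as feature extractor + small classification head.
from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False                      # keep the pre-trained filters fixed

model = Sequential([
    base,
    Flatten(),
    Dense(256, activation="relu"),
    Dense(6, activation="softmax"),         # one output per peanut variety
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy", metrics=["accuracy"])
```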

5 Model Training
We used a reduced number of images to constrain the inputs: 1102 seed and pod images of the peanut varieties listed above for training and 90 images for validation. The pre-trained VGG16 model serves as a feature extractor, reusing low-level features such as edges, corners, and rotation for the target problem, which is to classify these images according to the corresponding variety, as proposed by Naveen and Diwan [12].


The default input image size of VGG16 is 224 × 224, so we started by resizing the images in our dataset. We then converted the images into pixel arrays with the NumPy-based img_to_array() function (Simonyan and Zisserman [13]). Finally, the number of epochs was set to 100 and training was launched; with each iteration over the dataset, the model is reinforced and therefore improves. Following training, the model was assessed through the generation of accuracy and loss curves, as shown in Figs. 4 and 5. These curves show that the model has not finished learning: the curve for the validation dataset stagnates. This led us to use data augmentation to avoid overfitting. To improve the model, large amounts of data are needed; the quantity, and especially the quality, of the dataset plays a major role in building a good model, and the data should be comparable, that is, of the same format, size, length, etc. This is why we decided to use data augmentation, which is based on the principle of artificially increasing the data by applying transformations to it, increasing its diversity and therefore the learning scope of the model, which can then adapt better to predict new data. After retraining, we obtain the following curves, with denser precision and a loss tending toward 0 as training proceeds, as shown in Figs. 6 and 7. Data augmentation is applied to the training data only.

Fig. 4 Evolution of accuracy during training

Fig. 5 Evolution of accuracy during training

Fig. 6 Evolution of loss during training

Fig. 7 Evolution of the loss of validation during training
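A sketch of the augmentation stage with Keras' ImageDataGenerator; the transformation parameters and directory names are our assumptions, and, as noted above, augmentation is applied to the training generator only.

```python
# Sketch: random transformations on the training images only.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_gen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=30,          # assumed parameter values
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.2,
    horizontal_flip=True,
).flow_from_directory("dataset/train",          # hypothetical path
                      target_size=(224, 224), batch_size=32)

# Validation data is only rescaled, never augmented.
val_gen = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
    "dataset/validation", target_size=(224, 224), batch_size=32)
```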


Fig. 8 Evaluation of model performance and accuracy

6 Model Evaluation and Testing
Figure 8 shows the loss results of the convolutional neural network on the training and test sets over 100 epochs, indicating that the network learned the data efficiently and can serve as a good model for variety recognition and efficient variety identification. After data augmentation, we evaluated the model and obtained a loss of 0.54 and an accuracy of 0.72, which shows that we succeeded in improving it. We then took an image from the test dataset, which constitutes 30% of the data, randomly selecting an image of the Kom Kom variety (any other image could be tested). We called the already trained and evaluated model to test this image: for a given image, the model draws on the distinctive characteristics of the seed or pod to give the likelihood of resemblance to each of the registered varieties. In the test results, we tested a 'kom kom' image, and Fig. 9 shows the scores obtained for each variety through categorical classification.

7 Conclusion
In this paper, we sought to address the problem of identifying peanut varieties derived from Fleur 11, given that they are genetically very similar and therefore difficult to identify with the naked eye. The choice of Fleur 11 is justified by several considerations of nutrition, yield, and adaptation to climate change conditions. To achieve this, we approached the agricultural experts at the CNRA in Bambey to obtain images of these approved varieties and built a dataset of 1102 images of peanut shells and seeds classified according to the six (06) selected varieties. We first trained our deep learning model on this dataset and found that the model was not optimal; we then applied data augmentation and retrained the model, which gave satisfactory performance with a loss of 0.54 and an accuracy of 0.72. After testing, we were able to classify a Kom Kom seed with a probability of 60%. In future work, we intend to increase the size of our dataset, improve the quality of the images taken, and increase the training time of our model for more precision. We also intend to go beyond the characteristics of the pod and seed by adding other characteristics related to the morphology of the plant, i.e., its color and leaves, to obtain a model with better varietal discrimination and thus better output.

Fig. 9 Identification results of seed image of the kom kom variety

References
1. FAO (2021) Bases de données FAOSTAT: Données de l'alimentation et de l'agriculture
2. UPOV: TG/93/4 (proj.2), https://www.upov.int/meetings/fr/doc_details.jspmeeting_id=25646&doc_id=202761. Last accessed 23 June 2022
3. WAAPP, http://www.waapp-ppaao.org/senegal/index.php/actualit%C3%A9s/240-seuls30-des-besoins-en-semences-d%E2%80%99arachide-assur%C3%A9s-expert.html. Last accessed 10 Jan 2022
4. Kouadio AL (2007) Des Interuniversitaires en gestion des risques naturels: Prévision de la production nationale d'arachide au Sénégal à partir du modèle agrométéorologique AMS et du NDVI. ULG-Gembloux, p 54
5. FAO (2003) L'évaluation de la dégradation des terres au Sénégal. Projet FAO land degradation assessment. Rapport préliminaire, Avril, p 59
6. Ntare BR, Diallo AT, Ndjeunga J, Waliyar F (2008) Groundnut seed production manual. International Crops Research Institute for the Semi-Arid Tropics
7. Dia D, Diop AM, Fall CS, Seck T (2015) Sur les sentiers de la collecte et de la commercialisation de l'arachide au Sénégal. Les notes politiques de l'ISRA-BAME n°1. ISRA Bureau d'Analyses Macro Économiques (BAME), Dakar
8. ANSD (2007) Note d'Analyse du Commerce Extérieur. Edition 2018, Gouvernement du Sénégal, Dakar
9. Baborska R (2021) Suivi des politiques agricoles et alimentaires au Sénégal 2021. Rapport d'analyse politique. Suivi et analyse des politiques agricoles et alimentaires (SAPAA). FAO, Rome
10. Datascientest, https://datascientest.com/computer-vision. Last accessed 25 Feb 2022
11. Aung H, Bobkov AV, Tun NL (2021) Face detection in real time live video using Yolo algorithm based on Vgg16 convolutional neural network. In: 2021 International conference on industrial engineering, applications and manufacturing (ICIEAM)
12. Naveen P, Diwan B (2021) Pre-trained VGG-16 with CNN architecture to classify X-rays images into normal or pneumonia. In: International conference on emerging smart computing and informatics (ESCI)
13. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large scale image recognition. In: International conference on learning representations
14. TALKAG, https://www.talkag.com/blogafrique/?p=3199. Last accessed 15 Jan 2022
15. ANSD: Situation Économique et Social du Sénégal 2017–2018, https://satisfac-tion.ansd.sn/. Last accessed 01 Mar 2022

Chapter 24

Efficient Color Image Segmentation of Low Light and Night Time Image Enhancement Using Novel 2DTU-Net and FM2CM Segmentation Algorithm

Chandana Kumari and Abhijit Mustafi

1 Introduction
The significance of semantic segmentation (SS) is highlighted by the drastically rising demand for autonomous driving with higher-level scene understanding abilities [1]. SS is the process of categorizing every pixel of an input image into a predetermined class [2]. Several SS models centered on convolutional neural networks (CNN) have been heavily researched alongside advances in deep learning technologies and computer hardware [3]. Many prevailing SS models show high performance in the daytime, but their performance at night is low [4]. Owing to the inadequate amount of external light, image brightness is extremely low at night, and the noise produced by the camera sensor rises [5]. Moreover, because of the camera's longer exposure time, motion and optical blur appear in the images [6]. SS is therefore very complex in a low light (LL) environment, and performance enhancement is a difficult problem because of these issues [7]. Numerous LL image segmentation methodologies have been researched to tackle this issue. The prevailing techniques achieve high accuracy, but the neural network training procedure is tedious and the training time is long [8, 9]; moreover, their accuracy rates are not adequate [10]. Thus, an effectual color image segmentation of LL and night time image enhancement is proposed, employing a new 2DTU-Net and FM2CM that effectively segment LL images with a better accuracy rate. The balance of the paper is arranged as follows: related work concerning the proposed system is elucidated in Sect. 2; the proposed technique is expounded in Sect. 3; the outcomes and discussions based on performance metrics are illustrated in Sect. 4; and the conclusion with future work is given in Sect. 5.


2 Literature Survey

Cho et al. [11] introduced a modified Cycle Generative Adversarial Network (CycleGAN)-centric multi-class segmentation methodology, which enhanced multi-class segmentation performance for LL images. LL databases engendered from two road scene open databases that offer segmentation labels were wielded. When weighed against state-of-the-art techniques, this system depicted higher performance, but it had the limitation of deprived reliability and high resource needs.

Cho et al. [12] presented a LL image segmentation technique grounded on a modified CycleGAN. LL databases engendered from two well-known road scene open databases, the Cambridge-driving Labeled Video Database (CamVid) and KITTI, were wielded. When analogized to the prevailing state-of-the-art methodology, this system depicted enhanced segmentation performance in hugely LL surroundings. Nevertheless, the disadvantage is that the cost constraint frequently limits larger-scale applications.

Bao et al. [13] determined an effectual segmentation technique, which merged LL enhancement of dark images, bias correction by level set segmentation, and local entropy segmentation. Initially, the dark images were improved through histogram equalization on the V-channel of the Hue, Saturation, and Value (HSV) color space. Next, the level set technique corrected the enhanced image's bias field. Afterward, the image local entropy was obtained, and ultimately the segmentation outcomes were achieved. The system's efficacy and feasibility were exhibited, but it is tedious to segment several objects at the same time.

Li et al. [14] constructed a network architecture termed Edge-Conditioned CNN (EC-CNN) for thermal image semantic segmentation. In EC-CNN, a gated feature-wise transform layer was implemented, which adaptively merged edge prior knowledge. With edge guidance, the entire EC-CNN was trained end-to-end and engendered high-quality segmentation outcomes. For comprehensive appraisals of thermal image semantic segmentation, the "Segmenting Objects in Day And Night" (SODA) dataset was wielded, and the EC-CNN's effectiveness against state-of-the-art methodologies was expounded by extensive experiments on SODA. However, the handling of noisy data was not effective.

Kim et al. [15] developed a multi-spectral unsupervised domain adaptation meant for thermal image semantic segmentation. With pixel-level domain adaptation bridging the day and night thermal image domains, this scheme developed the thermal segmentation network's generalization ability. Hence, devoid of any ground-truth labels, the thermal image segmentation network acquired superior performance. The system's efficacy and robustness were exhibited quantitatively and qualitatively. Nevertheless, the accuracy rate was extremely low.


3 Proposed Methodology

Images attained in real-world LL circumstances are not just low in brightness, but also suffer from color bias, unknown noise, detail loss, and halo artifacts. Thus, LL image segmentation is tedious. By utilizing a new 2DTU-Net and the FM2CM segmentation algorithm, an effective color image segmentation of LL and night time image enhancement is proposed. The technique follows several steps to enhance the efficiency of the segmentation procedure. First, the input LL or night images are resized into a standard format. Next, by deploying the 2DTU-Net, the contrast enhancement function is applied to the resized images. After that, the contrast-enhanced images are segmented by FM2CM. The proposed technique's architecture is elucidated in Fig. 1.

3.1 Input Source

From the KITTI dataset, the LL or night time images are taken as the input source. The dataset includes 7481 training images annotated with 3D bounding boxes and is openly accessible on the Internet.

3.2 Image Resizing

Here, the input LL or night time images are resized. Image resizing is the procedure that modifies the image size by increasing or decreasing the total number of pixels. The image is resized to a length*width of 320*240 so that the zoom image quality can be improved. This is equated as,

Fig. 1 Architecture of the proposed framework


$$Rz_{img} = \lambda_R(In_{img}) \tag{1}$$

Here, the resized images are signified by $Rz_{img}$, and $\lambda_R(In_{img})$ denotes the resizing function applied to the input images $In_{img}$.
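As a small illustration of this resizing stage, the following Python sketch (assuming OpenCV is available; the function and variable names are ours, not the paper's) resizes a frame to the 320*240 target:

```python
# A minimal sketch of the resizing stage (Eq. 1), assuming OpenCV; the
# 320x240 target follows the paper's length*width choice.
import cv2
import numpy as np

def resize_image(in_img: np.ndarray, size=(320, 240)) -> np.ndarray:
    # cv2.resize expects (width, height); bilinear interpolation is a
    # common default for photographic content.
    return cv2.resize(in_img, size, interpolation=cv2.INTER_LINEAR)

# Stand-in for a low-light input frame (hypothetical data).
low_light = np.random.randint(0, 60, (480, 640, 3), dtype=np.uint8)
resized = resize_image(low_light)
print(resized.shape)  # (240, 320, 3)
```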

3.3 Image Enhancement

Once the images are resized, their contrast is improved. Contrast enhancement is the manipulation or redistribution of image pixels, in a linear or non-linear fashion, to enhance the images' contrast. Thus, image details in the low dynamic range can be evaluated effectively and the images' quality improves, making contrast enhancement a significant process in image processing. The images' contrast is improved by utilizing the 2DTU-Net.

3.3.1 Image Enhancement Using 2DTU-Net

U-net is grounded on CNN. A particular encoder-decoder design is included in the U-net architecture. In each layer, the encoder decreases the spatial dimensions and boosts the channels. Conversely, the decoder increases the spatial dimensions while decreasing the channels. In the end, the spatial dimensions are restored to make a prediction for every pixel. The general structure of the U-net architecture is depicted in Fig. 2.

Encoder (Contraction path): The repeated application of two 3×3 convolutions is included. Every convolution is followed by ReLU and batch normalization. Next, a 2×2 max pooling operation is implemented to decrease the spatial dimensions. The feature channels get doubled at every down-sampling step.

Decoder (Expansion path): Every step in the decoder encompasses up-sampling of the feature map followed by a 2×2 transpose convolution that halves the feature channels, concatenation with the equivalent feature map from the contracting path, and 3×3 convolutions, each followed by a ReLU. To engender contrast-enhanced images, a 1×1 convolution is wielded in the last layer.

The U-Net ordinarily employs the Rectified Linear Unit (ReLU) activation function; however, when a huge gradient flows through a ReLU in the CNN layers, it can render the neuron useless and unable to respond to other data points for the remainder of the process. Hence, the ReLU activation is replaced with the Tanish function. Tanish, which combines the Tanh and swish activation functions, constantly updates the weight and bias values even if the gradient is large. The Tanish function is depicted by,

$$\text{Tanish} = T(z) * Sw \tag{2}$$

Here, the Tanh activation function is indicated by T(z), and the swish activation function is denoted by Sw.


Fig. 2 General architecture of the U-Net

$$T(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}} \tag{3}$$

$$Sw = X * \mathrm{Sigmoid}(X) \tag{4}$$

Hence, the images' contrast is considerably raised by the 2DTU-Net. Here, x signifies the contrast-enhanced images.
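A minimal NumPy sketch of the Tanish activation defined by Eqs. (2)-(4) could look as follows; the function names are illustrative:

```python
# A sketch of the Tanish activation from Eqs. (2)-(4): the product of
# the Tanh and Swish activations, assuming NumPy.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanish(x):
    tanh_part = np.tanh(x)          # T(z), Eq. (3)
    swish_part = x * sigmoid(x)     # Sw,   Eq. (4)
    return tanh_part * swish_part   # Eq. (2)

z = np.linspace(-4, 4, 9)
print(tanish(z))
```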

3.4 Segmentation

Next, image segmentation is conducted. Image segmentation is the process of partitioning an image into several divisions, which helps decrease the image complexity and makes subsequent processing simpler. The images are segmented by utilizing FM2CM.

3.4.1 Segmentation Using FM2CM

Fuzzy C-Means (FCM) is a soft clustering methodology in which every pixel may belong to two or more clusters with differing degrees of membership. The objective function is the sum of distances between the cluster centers and the patterns. Usually, the FCM wields the Euclidean Distance (ED) to estimate the distance between the cluster centers and the initial pixels. However, the ED estimation is intricate and degrades the clustering accuracy, and for a huge amount of data the ED isn't apt. Thus, the Chebyshev distance, also called the Maximum Metric (M2), which effectively surpasses the above-mentioned issue, is employed rather than the ED. Owing to that alteration, the usual FCM is termed FM2CM. The FM2CM steps are given below.

Step 1: Input the pixels of the contrast-enhanced image x, equated as:

$$x_j = \{x_1, x_2, x_3, \ldots, x_N\} \tag{5}$$

Here, the number of pixels in the image x is defined by N.

Step 2: Input the number of cluster centroids (Cn_k) arbitrarily, given by,

$$Cn_k = \{Cn_1, Cn_2, Cn_3, \ldots, Cn_M\} \tag{6}$$

Step 3: Next, the M2 distance D(x_j, Cn_k) betwixt the initial pixels x_j and the cluster centroids Cn_k is estimated by,

$$D(x_j, Cn_k) = \max\left(|x_2 - x_1|,\ |Cn_2 - Cn_1|\right) \tag{7}$$

Step 4: The FM2CM's objective function (Obj_FN) is calculated by,

$$Obj_{FN}^{a} = \sum_{j=1}^{N} \sum_{k=1}^{M} \left(Z_{jk}\right)^{a} \cdot D\left(x_j, Cn_k\right)^{2} \tag{8}$$

where a is a real number greater than 1 that maintains the degree of fuzziness. The membership function is depicted by Z_jk, where Z_jk ∈ [0, 1].

Step 5: The fuzzy membership function Z_jk is equated as,

$$Z_{jk} = \sum_{k=1}^{M} \left(\frac{\left\|x_j - Cn_k\right\|}{\left\|x_j - Cn_i\right\|}\right)^{\frac{-2}{a-1}} \tag{9}$$

Here, Cn_i expounds the cluster centroid selected from Cn_k.

Step 6: By reassigning the cluster centroids Cn_k and Cn_i, the process continues optimizing the objective function Obj_FN until the following condition is met.

$$\left|Z_{jk}(N) - Z_{jk}(N+1)\right| \leq Q \tag{10}$$

Here, Q signifies a constant that ranges from 0 to 1.

Step 7: Hence, the images are effectively segmented by the proposed FM2CM. The segmented images are equated as (Fig. 3),


Fig. 3 Pseudo-code for the FM2CM

$$S_{img} = \{S_1, S_2, S_3, \ldots, S_n\} \tag{11}$$
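For illustration, the following Python sketch implements the FM2CM loop of Steps 1-6 under our own simplifying assumptions (random membership initialization, toy two-dimensional "pixel" features); it is a sketch, not the authors' MATLAB implementation:

```python
# A compact sketch of the FM2CM loop (Steps 1-6), assuming NumPy: standard
# fuzzy C-means updates with the Chebyshev (maximum-metric) distance in
# place of the Euclidean distance. Names and the toy data are illustrative.
import numpy as np

def fm2cm(x, n_clusters=3, a=2.0, q_tol=1e-4, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    # Random initial memberships, normalized so each row sums to 1.
    z = rng.random((n, n_clusters))
    z /= z.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        w = z ** a
        # Cluster centroids as fuzzy-weighted means of the pixels.
        cn = (w.T @ x) / w.sum(axis=0)[:, None]
        # Chebyshev distance: maximum absolute coordinate difference.
        d = np.abs(x[:, None, :] - cn[None, :, :]).max(axis=2)
        d = np.fmax(d, 1e-12)                       # avoid division by zero
        # Standard FCM membership update with exponent 2/(a-1).
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (a - 1.0))
        z_new = 1.0 / ratio.sum(axis=2)
        if np.abs(z_new - z).max() <= q_tol:        # stop rule of Eq. (10)
            z = z_new
            break
        z = z_new
    return z.argmax(axis=1), cn

# Toy "contrast-enhanced pixels" as 2-D feature vectors (hypothetical).
pixels = np.vstack([np.random.default_rng(1).normal(m, 0.3, (50, 2))
                    for m in (0.0, 2.0, 4.0)])
labels, centroids = fm2cm(pixels)
print(np.bincount(labels), centroids.round(2))
```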

4 Results and Discussion

Here, the evaluation of the proposed methodology's final results is elucidated. Performance and comparative evaluations are done to state the efficacy. The proposed method is deployed in the MATLAB working platform. The input data is gathered from the KITTI dataset, which is openly accessible on the Internet.

4.1 Performance Analysis of Proposed 2DTU-Net

The proposed 2DTU-Net's performance is verified against several prevailing U-net, CNN, and Deep Neural Network (DNN) methodologies regarding Peak Signal-to-Noise Ratio (PSNR), Mean Square Error (MSE), and Structural Similarity Index (SSIM).


The proposed 2DTU-Net is analogized with several prevailing U-net, CNN, and DNN methodologies regarding PSNR, MSE, and SSIM in Fig. 4. A high PSNR value indicates enhanced image quality, as seen in Fig. 4a: the 2DTU-Net attains a PSNR of 20.14345, while the prevailing methodologies achieve values as low as 11.54058. The MSE metric quantifies the image degradation caused by image compression and other processing techniques, and a low MSE indicates the model's efficacy: the 2DTU-Net obtains 0.01142, while the current methodologies reach 0.11071. Moreover, the 2DTU-Net was also evaluated on the SSIM metric, obtaining 0.63568, while the prevailing methodologies attain a lower value of 0.33896. The 2DTU-Net is thus a low-error scheme that conveys a quality image without any distortion.
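These three metrics can be computed, for instance, with scikit-image; the frames below are random stand-ins, not the paper's test images:

```python
# A sketch of the three reported metrics, assuming scikit-image; the two
# arrays merely stand in for a reference and an enhanced image.
import numpy as np
from skimage.metrics import (mean_squared_error,
                             peak_signal_noise_ratio,
                             structural_similarity)

rng = np.random.default_rng(0)
reference = rng.random((240, 320))
enhanced = np.clip(reference + rng.normal(0, 0.05, reference.shape), 0, 1)

print("MSE :", mean_squared_error(reference, enhanced))
print("PSNR:", peak_signal_noise_ratio(reference, enhanced, data_range=1.0))
print("SSIM:", structural_similarity(reference, enhanced, data_range=1.0))
```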

4.2 Performance Analysis of Proposed FM2CM

The proposed FM2CM's performance is examined regarding clustering time and accuracy, and the results are analogized with the prevailing FCM, K-means, K-medoid, and Mean Shift (MS) techniques.

The clustering time analysis for the proposed FM2CM is depicted in Table 1. Clustering time is the time consumed by the whole system to create an effectual cluster; the outcomes become complicated if more time is consumed. The FM2CM takes a lower clustering time of 16,352 ms, while the current methodologies consume 23,421 ms for FCM, 26,617 ms for K-means, 30,337 ms for K-medoid, and 34,294 ms for MS. Hence, with limited time and cost, the FM2CM achieves effectual clusters and decreases complications; thus, the time complexity is lightened.

The clustering accuracy of the proposed FM2CM and the prevailing FCM, K-means, K-medoid, and MS methodologies are analogized in Fig. 5. The FM2CM forms clusters with an accuracy of 97%, while the clustering accuracies of the current FCM, K-means, K-medoid, and MS methodologies are 89, 82, 73, and 70%, respectively, which are comparatively low. Thus, in cluster formation, the FM2CM exhibits superior performance, which provides an enhanced effect on the segmented images.

5 Conclusion

By deploying the new 2DTU-Net and FM2CM segmentation algorithm, an effective color image segmentation of LL and night time image enhancement is proposed. The proposed system focused on three key steps: resizing, image enhancement, and image segmentation. Next, to examine the proposed methodology's efficacy, an experimental evaluation is done, where performance and comparative evaluations are conducted regarding a few performance metrics.

Fig. 4 Comparative analysis of proposed 2DTU-Net based on a PSNR, b MSE, and c SSIM

Table 1 Performance analysis of proposed FM2CM algorithm in terms of clustering time

Techniques       Clustering time (ms)
Proposed FM2CM   16,352
FCM              23,421
K-means          26,617
K-medoid         30,337
Mean shift       34,294

Fig. 5 Comparative analysis of the proposed FM2CM in terms of clustering accuracy

Several uncertainties can be tackled by the presented technique, and it can give propitious outcomes. Within a limited time of 16,352 ms, the clustering algorithm forms effective clusters and segments the images with an accuracy rate of 97%. The proposed framework surpasses the prevailing methodologies and remains reliable and robust. In the future, the system will be elaborated with a few enhanced neural networks, and the image segmentation procedure will be performed on complicated datasets.

References

1. Valada A, Vertens J, Dhall A, Burgard W (2017) AdapNet: adaptive semantic segmentation in adverse environmental conditions. In: IEEE international conference on robotics and automation (ICRA), May 29–June 3, 2017, Singapore
2. Dai D, Van Gool L (2018) Dark model adaptation: semantic image segmentation from daytime to nighttime. In: 21st International conference on intelligent transportation systems (ITSC), November 4–7, 2018, Maui, Hawaii, USA
3. Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. arXiv:1411.4038v1
4. Lore KG, Akintayo A, Sarkar S (2016) LLNet: a deep autoencoder approach to natural low-light image enhancement. Pattern Recogn 61:650–662
5. Wang Y, Ren J (2018) Low-light forest flame image segmentation based on color features. J Phys Conf Ser 1069(1):1–9
6. Shen L, Yue Z, Feng F, Chen Q, Liu S, Ma J (2017) MSR-net: low-light image enhancement using deep convolutional network. arXiv:1711.02488v1


7. Dev S, Savoy FM, Lee YH, Winkler S (2017) Nighttime sky/cloud image segmentation. In: IEEE international conference on image processing (ICIP), 17–20 Sept 2017, Beijing, China
8. Badrinarayanan V, Kendall A, Cipolla R (2016) SegNet: a deep convolutional encoder-decoder architecture for scene segmentation. IEEE Trans Pattern Anal Mach Intell 39(12):2481–2495
9. Haltakov V, Mayr J, Unger C, Ilic S (2015) Semantic segmentation based traffic light detection at day and at night. Springer, Cham, ISBN: 978-3-319-24946-9
10. Sun L, Wang K, Yang K, Xiang K (2019) See clearer at night: towards robust nighttime semantic segmentation through day-night image conversion. In: Proceedings artificial intelligence and machine learning in defense applications, 19 Sept 2019, Strasbourg, France
11. Cho SW, Baek NR, Koo JH, Park KR (2020a) Modified perceptual cycle generative adversarial network-based image enhancement for improving accuracy of low light image segmentation. IEEE Access 9:6296–6324
12. Cho SW, Baek NR, Koo JH, Arsalan M, Park KR (2020b) Semantic segmentation with low light images by modified CycleGAN-based image enhancement. IEEE Access 8:93561–93585
13. Bao XY, Sun ZL, Wang N, Chen YQ (2019) Solar panel segmentation under low contrast condition. In: Chinese control and decision conference (CCDC), 3–5 June 2019, Nanchang, China
14. Li C, Xia W, Yan Y, Luo B, Tang J (2020) Segmenting objects in day and night: edge-conditioned CNN for thermal image semantic segmentation. IEEE Trans Neural Netw Learn Syst 32(7):3069–3082
15. Kim YH, Shin U, Park J, Kweon IS (2021) MS-UDA: multi-spectral unsupervised domain adaptation for thermal image semantic segmentation. IEEE Robot Autom Lett 6(4):6497–6504

Chapter 25

An Architecture to Develop an Automated Expert Finding System for Academic Events

Harshada V. Talnikar and Snehalata B. Shirude

1 Introduction

An expert in a specific area is a person who is knowledgeable and has in-depth skill in that area. Expert finding problems have various applications in the field of business operations and in everyday life. For instance, people may seek experts' advice in domains like academics, medical problems, law, and finance [1]. The emergence of innovative technologies and recent swift advancements lead to a huge data flow all over the world. Most search engines concentrate on words rather than concepts and allow only a certain number of keywords to narrow the search. While using such search engines, search outcomes may be either relevant or irrelevant; when relevant, the number of results often ranges from tens to hundreds. To meet this problem, the proposed work presents the use of natural language model-based information retrieval, which recovers meaningful insights from the enormous amount of data available on the Internet [2].

2 Related Work

• Wu et al. [3] used ResearchGate and explored its features using questionnaires and interviews. The proposed approach considered pages and navigations. It showed the complete process of finding an expert on different academic social


networking sites. A pathway defined by the authors is a series of navigations between pages, linked in chronological order. The study summarized specific indications for academic social networking sites to (a) improve search pages, (b) focus on individual users' needs, and (c) use relational networks.
• Fu and Li [4] proposed a novel recurrent memory reasoning network (RMRN), exploring the implicit relevance between a requester's question and a candidate expert's historical records through perception and reasoning. The approach introduced a Gumbel-Softmax-based mechanism to select relevant historical records from candidate experts' answering histories. The authors judged that the proposed method could achieve better performance than the existing state-of-the-art methods.
• Javadi et al. [5] suggested a recommendation system for finding experts in online scientific communities. A dataset including bibliographic information, venues, and various related published papers was used. In an evaluation using the IEEE database, the proposed method reached an accuracy of 71.50%, which seems to be an acceptable result.
• Joby [2] emphasized using natural language model-based information retrieval to recover meaningful insights from enormous amounts of data. The method used latent semantic analysis to retrieve significant information from the questions raised by the user or from bulk documents, utilizing the fundamentals of semantic factors occurring in the dataset to identify useful insights. The experimental analysis of the proposed method was carried out with a few state-of-the-art datasets such as TIME, LISA, CACM, and NPL, and the results obtained demonstrated the superiority of the method in terms of precision, recall, and F-score.
• Hussain et al. [6] reported on expert finding systems for the span 2010–2019. The authors indicated a specific scope by formulating five different research questions. This study focused on useful sources, which fall into three categories, viz., text, social networks, and hybrid. The literature depicts models for building expert finding systems such as generative and discriminative probabilistic, network-based, voting, and some hybrid models. Datasets used to evaluate expert finding systems were broadly drawn from three environments: enterprises, academics, and social networks. The differences between expert retrieval and expert seeking were discussed. Finally, the review concluded that nearly 65% of expert finding systems are designed for the academic purpose and domain.
• de Campos et al. [7] proposed a machine learning perspective that clusters experts' textual sources to build profiles and capture the different hidden topics in which the experts are interested. The approach represented experts by means of multi-faceted problems. The authors judged it a valid technique to improve the performance of expert finding and document filtering.


• Shirude and Kolhe [8] investigated a framework for finding experts required for academic programs and committees. The framework used online research groups such as ResearchGate and Google Scholar as resources. The authors assumed that performance depends on the retrieval of keywords from many online research groups and suggested improvement by weighting the keywords in the vectors.
• Rostami and Neshati [9] used the idea of an agile team, which software companies require. The idea suggested a T-shaped model for expert finding and used the XEBM and RDM models.
• Yuan et al. [10] reviewed and categorized the current progress of expert finding in community question answering (CQA). The existing solutions were categorized into matrix factorization-based models, gradient boosting tree-based models, deep learning-based models, and ranking models. According to the authors, matrix factorization-based models outperform the rest.
• Lin et al. [1] reviewed and summarized expert finding methods and categorized them according to their underlying algorithms and models. The conducted review concluded with a categorization of models as (a) generative probabilistic models (candidate generation models and topic generation models, including document models), (b) voting models, and (c) network-based models (HITS and PageRank algorithms, propagation models). The authors pointed out many unsolved challenges, such as the notion of an expert, finding relevant people, model evaluation, and knowledge area classification.

Automated expert finding tasks are challenging because large-scale expertise-related information is available across various data sources. For the expert finding purpose, the information retrieval process is popularly used to examine information from large datasets, and a multitude of possibilities is available in information retrieval. The enormous quantity of information flowing through Web pages heightens the difficulty of useful as well as reliable information retrieval. Firstly, to decide who is an expert in a specific domain, it is necessary to acquire all relevant data about the person [11]. The following three channels are used as data resources: (a) Meta databases, (b) Document collections, and (c) Referral networks.

3 Findings About Expert Finding Systems

The literature survey resulted in a classification of the studied expert finding systems according to the domains used and the applied techniques.


Fig. 1 Classification based on domain

3.1 Classification Based on Used Domains

The task of searching for experts falls into two major categories according to the domains used, as shown in Fig. 1.

i. Enterprise: This first category uses three sources to find an expert's area and level: (a) self-disclosed documents, (b) documents, and (c) social networks. The study summarized that self-disclosed information is difficult to update in a timely manner, while the two other sources, documents and social networking, are important.
ii. Community question answering (CQA) platforms such as Quora, StackOverflow, and Yahoo Answers: This second category, i.e., online communities, uses two sources: (a) social networks and (b) documents. According to Wang et al. [12], knowledge in online communities has a heterogeneous structure, may be of low quality, and highly affects the performance of expert finding systems.

3.2 Classification Based on Used Techniques

Whether the domain is an enterprise or an online community does not matter when classifying expert finding techniques by the methods they use. Such a classification divides the techniques into two groups, graph-based techniques and machine learning-based techniques, as demonstrated in Fig. 2 [12].

3.2.1 Graph-Based Techniques

These techniques make use of a graphical representation of the retrieved data. A graph G = (V, E) is prepared, where V is a set of experts and E is the set of links that connect them by means of question asker-answerer relations, co-authorship, email communication, etc.


Fig. 2 Classification based on techniques

Several methods are applied to these graphs to rank the experts, such as computing measures (e.g., PageRank, HITS) or graph properties (e.g., centrality, connections) [13].
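As a sketch of this idea (not a method from any of the surveyed papers), the following Python snippet builds a toy expert graph with networkx and ranks the nodes by PageRank:

```python
# A sketch of a network-based ranking, assuming networkx: experts are
# nodes, co-authorship/answering links are edges, and PageRank scores
# stand in for expertise evidence. The toy graph is hypothetical.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("alice", "bob"), ("alice", "carol"),
    ("bob", "carol"), ("carol", "dave"),
])

scores = nx.pagerank(G)  # PageRank over the expert graph
for expert, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{expert}: {score:.3f}")
```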

3.2.2 Machine Learning-Based Techniques

Machine learning (ML) is useful for identifying patterns in a given training dataset. ML methods make use of feature extraction from various sources, whether enterprises or online communities. Features are considered either content-based or non-content-based. Some examples of ML-based techniques are logistic regression, support vector machines, reinforcement learning, clustering, and group recommendation.

4 Proposed Architecture

The proposed methodology includes the following important tasks:

(a) Identify the need for experts
(b) Ascertain various online research groups
(c) Remove noise from the collected data
(d) Apply artificial intelligence-based machine learning techniques
(e) Use natural language processing (NLP) tools to retrieve relevant information
(f) Extract names, addresses, expertise areas, etc., of experts from the relevant information
(g) Compare the result with existing tools.

Let us summarize the process (Fig. 3). The proposed architecture consists of the following steps:

I. Resource selection


Fig. 3 Conceptual expert finding procedure

It is the pinpoint task of correctly identifying an expertise area. To initiate the process of an expert finding system, the first step is to identify the exact purpose for finding the expert [14]. To meet the requirement, it is necessary to select the appropriate resources and to collect the relevant data. Useful data resources may be any of the following:

• Meta databases: Some organizations use databases to store the expertise of their employees.
• Document collections: One approach is to construct a database manually, but it is better to extract it automatically from documents like reports, publications, emails, question and answer forums, comments in such forums, Web pages, etc.
• Referral networks: There are Web groups of people who share the same interests and are in the same profession. Such groups may create referral networks consisting of colleagues, students and teachers, authors, etc. In these networks, an expert is recommended by another person who knows about the expert's knowledge and specific skills.

II. Data cleaning

The selected resultant documents containing Web pages can have noise content such as extra spaces, tabs, delimiters, advertisements, unwanted images, etc. Once the noise is removed, the clean data allows the required contents to be extracted easily and in less time, improving the accuracy of the system. If one has wrong or bad-quality data, it can be detrimental to processing and analysis. On the other hand,


good-quality data can produce outstanding results even with a simple algorithm. There are many kinds of constraints the data must conform to in order to be valid, like range, data type constraints, cross-field examination, uniqueness requirements, set membership restrictions, regular patterns, accuracy, completeness, consistency, uniformity, and many more. Considering the importance of data cleaning, the following data cleaning techniques may be implemented; a small pandas sketch follows the list.

• Remove duplicate or irrelevant observations
• Filter outliers
• Avoid typo errors
• Data type conversions
• Deal with missing values
• Get rid of extra spaces
• Delete auto-formatting.
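As an illustration only, the following pandas sketch applies several of the listed techniques to a hypothetical frame; the column names and the range rule are assumptions:

```python
# A minimal illustration of the listed cleaning steps, assuming pandas;
# the column names and the tiny frame are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "name": [" Ada ", "Ada", "Grace", None],
    "papers": ["12", "12", "300", "7"],
})

df["name"] = df["name"].str.strip()          # get rid of extra spaces
df = df.drop_duplicates()                    # remove duplicate observations
df = df.dropna(subset=["name"])              # deal with missing values
df["papers"] = df["papers"].astype(int)      # data type conversion
df = df[df["papers"].between(0, 100)]        # filter outliers (simple range rule)
print(df)
```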

III. Build a strategy to retrieve experts from cleaned data

To implement the strategy building step, innovative and modified strategies may be planned to improve outcome efficiency. The earlier studied techniques helped in designing new combinatorial strategies, which enabled us to propose well-defined concrete structures. An expert retrieval problem may consider the following two search criteria:

(1) "Who is the expert person on topic Y?": This query helps to find an expert in a particular knowledge domain. It is termed expertise identification [18].
(2) "What does expert X know?": This query helps to find an expert's information and knowledge. It is termed expert identification.

Most of the currently used algorithms focus on the first search criterion stated above. Expert finding strategy concerns:

• Representation of experts,
• Finding expertise evidence, and
• Association of query topics to candidates [13].

Any expert retrieval model has three components: candidate (a person considered as an expert), document (the data resources), and topic (the specific domain). The following approaches are used in the expert finding task:

(a) Generative probabilistic models

These models are used in many expertise retrieval methods. The idea is to rank a candidate by the probability p(Ca|q), the probability of a candidate Ca being an expert on topic q. Balog et al. [13] stated two different generative probabilistic models: (i) candidate generation and (ii) topic generation models. The candidate generation model computes the probability of a candidate Ca being an expert on topic q as p(Ca|q). The topic generation model finds this probability using Bayes' theorem as


$$p(Ca|q) = \frac{p(q|Ca)\, p(Ca)}{p(q)} \tag{1}$$

where p(Ca) is candidate’s probability and p(q) is a query probability. Generative probabilistic models are based on the foundation of language modeling. (b) Voting models Expert finding can be done with the voting process. Macdonald and Ounis, 2006 used retrieved documents ranking [15, 16]. Data fusion techniques were used for the ranking. They aggregated the votes for every candidate and determined a final ranking for the candidate. Zhang et al. stated a reciprocal rank (RR) fusion technique. In that, a candidate’s expertise is calculated as ScoreRR (Ca, q) =



1 rank(d, q) d:Ca∈d,d∈R(q)

(2)
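A plain-Python sketch of this reciprocal-rank aggregation, with a hypothetical document-candidate mapping, could look like this:

```python
# A sketch of the reciprocal-rank voting score of Eq. (2): each retrieved
# document votes 1/rank for the candidates associated with it. The
# document-candidate map below is hypothetical.
from collections import defaultdict

ranked_docs = ["d1", "d2", "d3", "d4"]          # R(q), best first
doc_candidates = {                               # who appears in which doc
    "d1": ["alice"], "d2": ["bob", "alice"],
    "d3": ["bob"], "d4": ["carol"],
}

score = defaultdict(float)
for rank, doc in enumerate(ranked_docs, start=1):
    for candidate in doc_candidates.get(doc, []):
        score[candidate] += 1.0 / rank           # vote of document `doc`

print(sorted(score.items(), key=lambda kv: -kv[1]))
# alice: 1/1 + 1/2 = 1.5, bob: 1/2 + 1/3 ~ 0.83, carol: 1/4 = 0.25
```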

where R(q) is the set of documents retrieved as a result of query q and rank(d, q) is the rank of document d. This reciprocal rank (RR) fusion technique is the simplest method to determine the ranking in the voting model.

(c) Network-based models

For finding an expert's information, referral Webs and social networks are common channels for data retrieval. In such network-based models, expert retrieval graphs can be constructed in either of the following two ways:

i. A graph in which nodes represent documents and candidates' expertise, and edges represent their associations.
ii. A graph in which nodes represent only candidates and edges represent their relationships.

In the various network-based models, HITS and PageRank algorithms are used, and random walk propagation is often employed as well.

IV. Forming expert groups

The identified experts can be put into groups according to their domain of expertise. Such collaborative groups can provide a good platform for communication as well as knowledge exchange [8], and may prove advantageous for knowledge upgradation.


5 Result and Discussion

The discussed framework aims to develop an artificial intelligence-based system with the highest possible accuracy, considering a broad domain for expert finding tasks. An effective implementation can produce improved and more efficient systems over the existing ways of finding domain-specific experts, resulting in an intelligent application with enriched domain-specific expert suggestions. Collaborative groups of experts from similar domains can be found as an additional outcome of the implemented process. The field where research needs to focus is the design of an efficient expert retrieval algorithm [17]. The discussion related to building a strategy for retrieving experts from cleaned data points toward appropriate model selection. Identifying an expert by analyzing related documents and collecting expertise evidence is an effective way, but such evidence may lie outside the scope of the collected documents, which raises the need for implicit or explicit relation mapping among people and documents [18]. Network-based models are found to suit such mapping well. Generative probabilistic models have good empirical performance and the potential to incorporate extensions in a transparent manner. The voting models for domain-specific expert search aggregate scores from a single strategy across the members of a document aggregate rather than aggregating multiple systems' scores on only one document. The study has observed that identifying the domain-relevant experts and ranking them over the non-relevant experts is a challenging task [19, 20].

6 Conclusion

This article investigates the primary key issues in the field of expert finding tasks, such as resource selection, expertise data retrieval, and retrieval model extension. Pertaining to each issue, specific tasks and algorithms need to be implemented. In the future, the proposed architecture may be implemented in several applicable domains. Further, the use of natural language model-based information retrieval may play an important role in the development of expert finding systems.

References

1. Lin S, Hong W, Wang D, Li T (2017) A survey on expert finding techniques. J Intell Inf Syst. Springer Science+Business Media, New York
2. Joby P (2020) Expedient information retrieval system for web pages using the natural language modeling. J Artif Intell Caps Netw 02(02):100–110
3. Wu D, Fan S, Yuan F (2021) Research on pathways of expert finding on academic social networking sites. Inform Process Manage 58(2)


4. Fu J, Li Y, Zhang Q, Wu Q, Ma R, Huang X, Jiang YG (2020) Recurrent memory reasoning network for expert finding in community question answering. Assoc Comput Mach
5. Javadi S, Safa R, Azizi M, Mirroshandel SA (2020) A recommendation system for finding experts in online scientific communities. J AI Data Mining 8(4):573–584
6. Husain O, Salim N, Alinda RA, Abdelsalam S, Hassan A (2019) Expert finding systems: a systematic review. Appl Sci
7. de Campos LM, Fernandez-Luna JM, Huete JF, Redondo-Exposito L (2019) Automatic construction of multi-faceted user profiles using text clustering and its application to expert recommendation and filtering problems, vol 190. Knowledge-Based Systems, Elsevier
8. Shirude S, Kolhe S (2019) A conceptual framework of expert finding system for academic events and committees. Int J Comp Sci Eng 7(2)
9. Rostami P, Neshati M (2019) T-shaped grouping: expert finding models to agile software teams retrieval. Expert Syst Appl 118:231–245
10. Yuan S, Zhang Y, Tang J, Hall W, Cabotà JB (2018) Expert finding in community question answering: a review. https://biendata.com/competition/bytecup2016/, Accessed 10 Apr 2021
11. Huna A, Srba I, Bielikova M (2016) Exploiting content quality and question difficulty in CQA reputation systems
12. Wang GA, Jiao J, Abrahams AS, Fan W, Zhang Z (2013) ExpertRank: a topic-aware expert finding algorithm for online knowledge communities. Decis Support Syst 54:1442–1451
13. Balog K, Yi F, De Rijke M, Serdyukov P, Si L (2012) Expertise retrieval. Found Trends Inf Retr 6(2–3):127–256
14. Zhang J, Tang J, Liu L, Li J (2008) A mixture model for expert finding. In: PAKDD 2008, LNAI 5012. Springer-Verlag, Berlin, Heidelberg, pp 466–478
15. Macdonald C, Ounis I (2007) Using relevance feedback in expert search. In: European conference on information retrieval. Springer, Heidelberg, pp 431–443
16. Macdonald C, Ounis I (2006) Voting for candidates: adapting data fusion techniques for an expert search task. In: Proceedings of the 15th ACM international conference on information and knowledge management. ACM, pp 387–396
17. Yimam-Seid D, Kobsa A (2003) Expert finding systems for organizations: problem and domain analysis and the DEMOIR approach. J Org Comput Electron Commer. https://doi.org/10.1207/S15327744JOCE1301_1
18. Petkova D, Croft WB (2008) Hierarchical language models for expert finding in enterprise corpora. Int J Artif Intell Tools 17(01):5–18
19. Zhang M, Song R, Lin C, Ma S, Jiang Z, Jin Y, Liu Y, Zhao L, Ma S (2003) Expansion-based technologies in finding relevant and new information: THU TREC 2002 novelty track experiments. NIST Spec Publ SP 251:586–590
20. McDonald DW, Ackerman MS (2000) Expertise recommender: a flexible recommendation system and architecture. In: Proceedings of the 2000 ACM conference on computer supported cooperative work, pp 231–240

Chapter 26

A Seismicity Declustering Model Based on Weighted Kernel FCM Along with DPC Algorithm

Ashish Sharma and Satyasai Jagannath Nanda

1 Introduction

Earthquakes are linked with different types of clusters in the space-time domain that generate complex patterns. These seismic clusters are more predictable around major faults and tectonic boundary regions (spatial clustering) and are linked with aftershocks, foreshocks, and earthquake swarms [1, 2]. They represent various triggering processes like static and dynamic stress transfer, fluid flow, and seismic mass flow along the faults [3, 4]. The process of categorizing the events into aftershocks-mainshocks-foreshocks (clusters) and regular events (background) is known as seismicity declustering. A declustered catalog is used in many applications like understanding the interaction between active fault line structures [5], time-dependent probability estimation [6], and in many robust estimations like climatic, tidal, and seasonal triggering of seismicity [7], development of seismic hazard maps [8], focal inversion for background stress fields [9], and localization of seismicity before the mainshocks [10].

Segregation of seismic catalogs into clustered and regular events is a complex task due to the high correlation in the spatial-temporal domain, as there is no unique solution. Eventually, the final declustered catalogs deviate significantly according to the employed method. Seismic declustering is necessary to remove the temporal and spatial bias due to aftershocks that overestimate seismic rates in regions. Researchers have also investigated that it is essential to correct seismic rates to compensate for the reduction in rates due to declustering [11]. Many researchers have studied various declustering algorithms to find the observed seismicity [12–15]. These approaches are based on


constraints derived from the characteristics of the space-time patterns of seismicity [16]. K-means clustering is a well-known clustering algorithm widely used in the seismicity analysis of earthquake regions. Rehman et al. analyzed the seismic events of the Pakistan region and categorized them using K-means clustering [17]. The problem with K-means algorithms is that they do not detect accurate cluster centroids, and finding centroids is even more challenging due to the heterogeneous features in a seismic catalog (latitude, longitude, time, and magnitude). Recently, Zhuang et al. developed an epidemic-type aftershock sequence (ETAS) model for seismicity analysis based on the event's intensity. Hanzi et al. analyzed the background (BG) events based on the inter-event time distribution [18]. Nanda et al. developed a tri-stage clustering model using an iterative K-means algorithm with a single distance formula; they used spatial-temporal clustering and magnitude thresholding to segregate the catalogs of earthquake-prone regions. Later, Vijay et al. proposed a tetra-stage model and included a depth parameter to analyze the seismicity [19]. Both the tri- and tetra-stage models are based on the K-means algorithm and fail to provide good results in the case of non-spherical datasets. Recently, Vijay et al. proposed a shared nearest neighborhood-based model and categorized the events of the Iran and Philippines regions based on magnitude, event location, and occurrence time [20]. Florent et al. designed a model based on a random forest trained on data generated by an epidemic-type aftershock sequence model and compared it with classical machine learning models in terms of AF events [21].

Density-based clustering approaches (DBSCAN) are more effective and able to identify clusters of arbitrary shapes even in the presence of outliers [22]. The main advantage of DBSCAN is that it does not require any prior information about the number of clusters and identifies them based on the density of each data point and its surroundings. It requires two parameters, the radius "Eps" and the minimum number of points in the neighborhood "MinPts." Recently, Vijay et al. proposed a variable ε-DBSCAN algorithm in which ε is made dependent on the magnitude of the event [23]. Evolutionary algorithm-based models like the quantum gray wolf optimization model [24], the binary NSGA-II model [25], and the most recent multi-objective chimp optimization algorithms [26] have been developed to solve the seismicity declustering problem. These analyses and results motivate researchers to build a more efficient and robust model to reduce the complexity associated with seismicity declustering. Recently, the authors developed a model based on fuzzy C-means with density peak clustering for seismicity analysis of the Philippines and New Zealand [27]. The major drawback of the fuzzy C-means algorithm is that it is susceptible to noise and outliers and is only suitable for spherical or ellipsoidal clusters.

This manuscript proposes a multi-stage systematic approach using event coordinates, time, and magnitude information to accurately estimate aftershock and background events in the catalog. In the first phase, an improved fuzzy kernel clustering algorithm, known as the weighted kernel fuzzy C-means (WKFCM) algorithm, is introduced to find the potential seismic zones in the spatial domain with sufficient events in each zone. The major earthquake mainshocks are the cluster centroids, and the dataset is classified based on the number of mainshocks. Later, each spatial zone is analyzed using weighted density peak temporal clustering based on the famous clustering by fast
Later, each spatial zone is analyzed using weighted density peak temporal clustering using the famous clustering by fast


search and find of density peaks algorithm [28]. A decision graph is plotted to detect the potential cluster centroids, and every event is allocated to a respective cluster to bifurcate the catalog into AF and BG events. The decision graph shows the distribution of the local density of events and the distances on the X and Y axes, respectively. Events having higher local density and distances are chosen as cluster centroids in the temporal domain. The decision graph ensures the correct assignment of each event to its corresponding cluster. The proposed model has a low computational cost due to fewer mathematical calculations of Euclidean distances. The performance of the proposed model is tested on historical seismic catalogs of Japan and Indonesia with the help of the Epicenter Plot, Cumulative Plot, Lambda Plot, and Coefficient of Variance.

The rest of the paper is organized as follows: Sect. 2 provides brief details about the earthquake catalogs used in the analysis. Section 3 gives the detailed step-wise procedure of the proposed spatio-temporal fuzzy clustering with density peaks to classify the earthquake catalogs. The obtained results are discussed in Sect. 4. The key points of the proposed declustering model are concluded in Sect. 5.

2 Seismic Catalog Used in the Analysis

In this paper, the historical earthquake catalogs of Japan and Indonesia are used in the analysis to measure the performance of the proposed model. The catalogs were downloaded from the official website of the United States Geological Survey (USGS) [29] by setting the input parameters mentioned in Table 1. Brief details of each catalog used in the analysis are as follows.

• Japan Catalog: Japan is one of the most seismically active regions globally due to its location on the "Pacific Ring of Fire," lying across three tectonic plates, including the Pacific plate under the Pacific Ocean and the Philippine Sea Plate. Here, a total of 19510 events from the year 1992 to the year 2022 are used in the analysis. Japan has seen various devastating earthquakes, like the Great Hanshin earthquake in 1995 and the Tohoku earthquake in March 2011. The Epicenter Plot of the catalog is shown in Fig. 1a.
• Indonesia Catalog: Indonesia is among the most seismically active areas on the planet, with a long history of powerful eruptions and earthquakes due to its position between the Indian plate and the Eurasian plate. A total of 18106 seismic events are used in the analysis, comprising significant earthquakes like the Sulawesi earthquake in 2018, the Sumatra earthquake in 2009, and the Java earthquake in 2006. Figure 1b represents the Epicenter Plot of the catalog.

Table 1 Input parameters to download the earthquake catalog of Japan and Indonesia from USGS [29]

Region     Start time            End time              Latitude (Min–Max)   Longitude (Min–Max)   Magnitude (Min–Max)   Depth (Min–Max)   Number of events
Japan      01/01/1990 00:00:00   01/01/2022 23:59:59   30.9°–41.4°          129.8°E–143.3°E       2.5–9.3               0–570             19510
Indonesia  01/01/1990 00:00:00   01/01/2022 23:59:59   6°S–12°S             90°E–120°E            2.5–9.5               0–700             18106


Fig. 1 Epicenter distribution plot of seismic events for a Japan and b Indonesia

3 Proposed Model

The main goal of seismicity declustering is to estimate and precisely discriminate between highly dense clusters (AFs) and uniformly distributed BGs. In this analysis, non-spatio-temporal parameters like magnitude and depth are also considered because seismic clusters strongly depend on them. This manuscript proposes a two-phase clustering model that detects effective seismic clusters in the space and time domains; clustered AF events and uniformly distributed BGs are determined based on density. The complete flowchart of the


proposed model is given in Fig. 2. The step-wise procedure of the proposed model is as follows.

Step 1: Seismic Catalog. The input dataset to the proposed model is the earthquake catalog given as

$$E_{N \times D} = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{bmatrix} = \begin{bmatrix} t_1 & \theta_1 & \phi_1 & m_1 & d_1 \\ t_2 & \theta_2 & \phi_2 & m_2 & d_2 \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ t_N & \theta_N & \phi_N & m_N & d_N \end{bmatrix} \tag{1}$$

Any ith event, where i = 1, 2, 3, …, N, in the earthquake catalog has information about origin time (t), coordinate location in terms of latitude (θ) and longitude (φ), earthquake magnitude (m), and depth (d). N represents the total number of events in the catalog.

Step 2: Identification of Shallow and Deep Focus Events. The seismic events with epicenters near the earth's surface are more hazardous and generate more AF events than deep focus earthquakes. Here, a depth threshold (d_th) of 70 km is applied to segregate the shallow (S_c) and deep focus (D_c) catalogs, and the analysis is carried out separately.

Fig. 2 Proposed declustering model

$$E_{N \times D} = \begin{cases} e_i \in \text{Deep Catalog}\ (D_c) & \text{if } d \geq d_{th} \\ e_i \in \text{Shallow Catalog}\ (S_c) & \text{otherwise} \end{cases} \tag{2}$$
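A minimal pandas sketch of this split (catalog columns and values are illustrative) is:

```python
# A one-line view of the depth split in Eq. (2), assuming pandas and a
# catalog frame with a `depth` column in kilometres (names illustrative).
import pandas as pd

catalog = pd.DataFrame({
    "time": [1.2, 3.4, 5.6], "lat": [35.1, 38.0, 33.2],
    "lon": [139.7, 141.2, 131.5], "mag": [4.1, 6.3, 3.2],
    "depth": [12.0, 95.0, 40.0],
})

d_th = 70.0                                  # depth threshold (km)
deep = catalog[catalog["depth"] >= d_th]     # deep-focus catalog Dc
shallow = catalog[catalog["depth"] < d_th]   # shallow catalog Sc
print(len(shallow), len(deep))               # 2 1
```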

Step 3: Identification of Mainshocks. The mainshocks are those events with higher magnitude whose epicenters are near the earth's surface. The spatial seismic zones are identified based on an optimal number of mainshocks; these pre-determined mainshocks are the centroids of each spatial zone. They are represented as

$$M_{K \times D} = \begin{bmatrix} e_{11} & e_{12} & e_{13} & \cdots & e_{1D} \\ e_{21} & e_{22} & e_{23} & \cdots & e_{2D} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ e_{K1} & e_{K2} & e_{K3} & \cdots & e_{KD} \end{bmatrix} \tag{3}$$

where M_{K×D} ∈ S_f and K is the pre-determined number of mainshock events.

Step 4: Spatial Analysis using the Weighted Kernel Fuzzy C-means Algorithm. Fuzzy C-means clustering (FCM) is one of the most famous classical fuzzy clustering methods. The initial centroids of FCM are selected randomly. The input data to FCM comprises P features, and the output is the matrix U having c rows and n columns, where c represents the number of clusters and n represents the number of data points in each cluster. The events nearby in space and time are correlated with the primary mainshock, and the Euclidean distance function measures the similarity between the events. Let the earthquake catalog E = (e_i, i = 1, 2, 3, ..., N) be the input to FCM, and let c be the pre-determined number of categories according to the number of mainshocks (m) in the given interval. With u_{i,j} the membership function, where i = 1, 2, 3, ..., N and j = 1, 2, 3, ..., N, the distance function is calculated as

$$D(e_i, m_k) = \sum_{i=1}^{N_s} \sum_{j=1}^{K} \eta_{i,j}^{z} \left[(e_i(\theta) - m_j(\theta))^2 + (e_i(\phi) - m_j(\phi))^2\right]^{1/2} \tag{4}$$

$$D(e_i, m_k) = \sum_{i=1}^{N_s} \sum_{j=1}^{K} \eta_{i,j}^{z} \left[d_{i,j,\theta} + d_{i,j,\phi}\right]^{1/2} \tag{5}$$

$$D_\tau(i, j) = |t_i - t_j| \tag{6}$$

where z represents the constant used to control the degree of fuzzy overlapping. Then, the degree of membership η_{i,j}^z is calculated between the jth mainshock and any event e_i as

$$\eta_{i,j}^{z} = \left[\sum_{k=1}^{K} \left(\frac{\left\|e_i(\theta, \phi) - m_j(\theta, \phi)\right\|}{\left\|e_i(\theta, \phi) - m_k(\theta, \phi)\right\|}\right)^{\frac{2}{z-1}}\right]^{-1} \tag{7}$$


Here, j = 1, 2, …, K indexes the mainshocks and i = 1, 2, …, N_c indexes the events in the shallow catalog S_c. One of the problems with classical FCM is that it is only suitable for spherical and ellipsoidal clustering and is highly sensitive to outliers in the dataset. This problem can be effectively solved by employing two new parameters. The first parameter is a kernel function in the clustering; the basic idea is to map the input space R_s into a high-dimensional feature space (g) using a nonlinear transformation, the frequently used nonlinear transformation being the radial basis function kernel. The second parameter is the weight function (a_i), which allows the algorithm to assign weights to different classes, improving the clustering effect. Let the events E_{N×D} ⊂ R_s in the feature data-space R_q be mapped to the sample dataset in the feature space R_s. The mathematical formulation of weighted kernel fuzzy C-means clustering is given as

$$D(e_i, m_k) = \sum_{i=1}^{N_s} \sum_{j=1}^{K} \eta_{i,j}^{z}\, a_i^{m} \left[(d_{K,i,j,\theta}) + (d_{K,i,j,\phi})\right]^{1/2} \tag{8}$$

If the kernel function K is selected, then the Euclidean distance is calculated between the seismic event vector e_i and the mainshock event vector m_j as

$$d_{K,i,j,\theta} = \left[K(d_{i,j,\theta}) + K(d_{i,k,\theta})\right]^{1/2} \tag{9}$$

The a_i represent dynamic weights. The significance of the dynamic weights is that classes having more elements in the iterative process have a denser concentration and therefore higher importance, so the membership degrees of their elements become larger; at the same time, a class with fewer, sparsely distributed elements has less importance. The a_i satisfy the following condition

$$\sum_{i=1}^{C} a_i = 1 \tag{10}$$

Step 5: Spatial Seismic Zone Identification. Each seismic event is allocated to a seismic zone (S_z) based on the distance between the event and the mainshocks using Eq. 7:

$$\text{Label}_i = \min_j\left(d_{K,i,j,\theta}\right) \tag{11}$$

where the events are indexed by i = 1, 2, …, N and the mainshocks by j = 1, 2, …, m. Then, the identified seismic zones are given as

$$E_{N \times D} = \sum_{i=1}^{N_1} E_1(i,:) + \sum_{i=1}^{N_2} E_2(i,:) + \sum_{i=1}^{N_3} E_3(i,:) + \cdots + \sum_{i=1}^{N_T} E_m(i,:) \tag{12}$$

$$E_{N \times D} = S_{z1} + S_{z2} + \cdots + S_{zm} \tag{13}$$
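A NumPy sketch of this nearest-mainshock labeling of Eq. (11) is given below; it uses toy coordinates and a plain Euclidean distance in place of the kernel distance d_{K,i,j,θ}:

```python
# A sketch of the zone-label assignment of Eq. (11), assuming NumPy:
# each event goes to the spatial zone of its nearest mainshock in the
# (latitude, longitude) plane. The coordinates are illustrative.
import numpy as np

events = np.array([[35.0, 139.5], [38.2, 141.0], [33.1, 131.4]])
mainshocks = np.array([[38.3, 142.4], [34.6, 135.0]])

# Pairwise distances between every event and every mainshock centroid.
dists = np.linalg.norm(events[:, None, :] - mainshocks[None, :, :], axis=2)
labels = dists.argmin(axis=1)   # Label_i = argmin_j d(e_i, m_j)
print(labels)                   # zone index per event
```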


where S_z represents the seismic zones according to the predefined m number of mainshocks.

Step 6: Temporal Analysis using Weighted Density Peak Clustering. In this phase, the seismic zones identified in step 5 are further classified based on their density in the temporal domain. The objective is to identify the clustered, highly correlated events in the time domain, i.e., events that occurred nearby in time on the same fault line with high intensity. This is carried out with the help of temporal density peak clustering, as proposed by Rodriguez and Laio [28], with a better weight adjustment mechanism. The method finds the clusters by assuming that centroids are surrounded by points of comparatively lower local density and lie at large distances from other points of high local density. The key advantage of the algorithm is that it determines the centroids based on density and assigns the rest of the points to the corresponding cluster using the highest-density nearest-neighbor approach; it also identifies clusters irrespective of their shape and dimensions. The procedure determines two parameters for every spatial zone (S_z): the local density ρ_i, and the distance δ_i between the ith event of the corresponding spatial zone and any higher-density event. The local density of each event in a specific spatial zone is determined with the help of the Gaussian kernel function using the time information of each event in S_z, given as

$$\rho_i = \sum_{j} \exp\left(-\frac{d_s(t_i, t_j)^2}{d_c^2}\right) \tag{14}$$

Here, d_c is a critical parameter for performing robust clustering; its value is around 1–2% of the total events present in the seismic zone S_z. Then, the weighted local density ρ_i^w for any ith event is determined using the magnitude information of each event in the seismic zone S_z, given as

$$\rho_i^{w} = \rho_i \times \bar{M}_i \tag{15}$$

where

$$\bar{M}_i = \frac{M_i}{\max\left(M_{S_z}\right)} \tag{16}$$

with M_i the magnitude of the ith seismic event and M_{S_z} the maximum magnitude of an event in the seismic zone S_z. The distance δ_i is calculated by finding the minimum distance between the event and any other event having a higher local weighted density, given as

$$\delta_i = \begin{cases} \min\, d_\tau(t_i, t_j), & \text{if } \exists j: \rho_j^{w} \geq \rho_i^{w} \\ \max\, d_\tau(t_i, t_j), & \text{otherwise} \end{cases} \tag{17}$$

On the basis of ρ w and δi , a decision graph is plotted to find high density and larger distance points. These points are considered cluster centroids in the temporal domain.
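A minimal sketch of the ρ–δ computation behind the decision graph, assuming one-dimensional occurrence times and magnitudes of a single spatial zone Sz; the function name and the exclusion of the self term are illustrative choices, not the authors' code.

import numpy as np

def weighted_density_peaks(times, mags, dc):
    # Gaussian-kernel local density (Eq. 14) with magnitude weighting (Eqs. 15-16)
    dt = np.abs(times[:, None] - times[None, :])
    rho = np.exp(-(dt / dc) ** 2).sum(axis=1) - 1.0      # subtract the self term
    rho_w = rho * (mags / mags.max())
    # delta (Eq. 17): distance to the nearest event of higher weighted density
    delta = np.empty_like(rho_w)
    for i in range(len(times)):
        higher = rho_w > rho_w[i]
        delta[i] = dt[i, higher].min() if higher.any() else dt[i].max()
    return rho_w, delta          # plot delta versus rho_w as the decision graph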


Step 6: Identification of Background Events
The AF events are identified based on their density. The density of each point in the seismic zone is compared with the density of the cluster centroids ρ_c. Points having a density higher than ρ_c are assigned to a cluster, and the rest are considered BG events. This procedure effectively segregates events located close together in the space-time domain, which are considered AF events; the remaining events, not part of any cluster, are BG events.

Step 7: Magnitude Thresholding of the Deep Catalog
In this stage, magnitude thresholding is applied to the deep seismic catalog Dc_{X×D}, where X is the number of events. Events with magnitude intensity higher than a specific threshold value are considered AF events:

e_i ∈ AF if M_{e_i} ≥ M̄; e_i ∈ BG otherwise    (18)

where M̄ is the mean magnitude of the deep-focus catalog Dc_{X×D}. The events identified as aftershocks are thus segregated in the spatial-temporal domain.
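Step 7 reduces to a one-line mask over the deep catalog; a minimal sketch, with placeholder magnitudes (illustrative values only, not catalog data):

import numpy as np

mags_deep = np.array([4.1, 5.6, 4.8, 6.2, 5.0])  # placeholder magnitudes for Dc
M_bar = mags_deep.mean()                         # threshold of Eq. 18
is_AF = mags_deep >= M_bar                       # aftershock events
is_BG = ~is_AF                                   # background events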

4 Result and Discussion
In this section, the performance of the proposed model, tested on the earthquake catalogs described in Sect. 2, is explained. The proposed model is applied to both catalogs and the events are classified into AF and BG events. The higher-magnitude earthquake events are identified based on their magnitude intensity; these events are considered the cluster centroids of each cluster and are represented as black stars in Fig. 3a, b. A total of 8 and 10 cluster centroids are identified using the WKFCM algorithm for Japan and Indonesia, respectively. Based on the spatial information of the centroids, events are classified into different spatial zones by applying the procedure mentioned in Step 3; the identified spatial zones for Japan and Indonesia are represented with different colors in Fig. 3a, b. After categorizing the events into their respective spatial zones, the weighted density peak clustering algorithm is applied to both catalogs. Initially, the unweighted temporal density is determined using Eq. 14; the value dc = 0.1 is used in the analysis. In spatio-temporal analysis, the density of events depends directly on magnitude, so the weighted local density ρ^w is obtained using Eq. 15. A decision graph drawn between the distance δ and the weighted local density ρ^w reveals the higher-magnitude earthquake events in the temporal domain, as shown in Fig. 3c, d. It is observed from Fig. 3c, d that events having high distance δ and high weighted local density ρ^w are clearly separable from the rest of the events; these events represent the cluster centroids in the temporal domain. In spatio-temporal analysis of seismic events, clusters may overlap due to occurrence at the same spatial location but at distant times; overlapping clusters have a low δ value in both space and time. In the proposed model, spatial separation between the events is performed using the WKFCM algorithm; then, temporal density peak clustering finds the events nearby in


Fig. 3 Results obtained from the proposed model; spatial seismic zones identified for a Japan and b Indonesia; decision graph to identify potential seismic events in the temporal domain for c Japan and d Indonesia

time. In the spatial domain, clusters are already defined according to the mainshocks to avoid overlapping. Then, density peak clustering finds the non-overlapping centroids according to the decision graph and avoids merging clusters in the time domain even if they occupy the same spatial location. After this, the events are classified as AF and BG events based on the density of the cluster centroid, as mentioned in Step 6 of Sect. 3. The results obtained are explained in the following subsections.

4.1 Epicenter Plot
The events classified into AF and BG for the Japan and Indonesia catalogs are depicted in Fig. 4a, b, respectively. It is observed from Fig. 4 that the aftershock events (black dots) are highly dense and compact near the locations of the mainshocks. Aftershocks mainly exist at the fault boundaries where several mainshock events occur.


Fig. 4 Epicenter distribution plot of seismic events with depth for a Japan and b Indonesia

The events not associated with mainshocks are considered BG events represented by gray dots in Fig. 4 for Japan and Indonesia. It has been observed that BGs are uniformly distributed across the entire region for both the catalogs. The BG events are not associated with any significant event and show the absence of a dense region.

4.2 Cumulative and Lambda Plot
Cumulative and Lambda Plots are essential measures to test the effectiveness of the proposed model in terms of total events, AF events, and BG events. A Cumulative Plot represents the cumulative sum of events with respect to time; a Lambda Plot shows the number of events occurring in a given period. Cumulative and Lambda Plots in terms of total events, clustered aftershocks, and non-clustered background events are shown in Fig. 5. It is observed from Fig. 5a, b that the cumulative rate of BG events (gray curve) follows a linear trend with time, revealing that the background events occurring over the period across the entire region follow a uniform distribution. The characteristics of the aftershock events (black curve) follow the same pattern as the total events (pink curve). The non-uniform characteristics of AF events for both catalogs, and the similar patterns between AF events and total events shown in Fig. 5a, b, indicate that the events are efficiently segregated by the proposed model. Figure 5c, d shows the occurrence rate of events per year, known as the Lambda Plot. It is evident that the seismicity rate of BG events (gray line) is uniformly distributed and does not deviate even in the presence of a significant seismic event, revealing that the BG seismicity rate is independent of the mainshock events and remains stationary throughout the interval. It is also observed that AF events show a non-uniform seismicity rate and the presence of significant

Fig. 5 Cumulative plots of total events, clustered AFs and non-clustered BGs for a Japan and b Indonesia; Lambda plots for c Japan and d Indonesia

peaks at the times of the mainshocks, at years 11 and 22 in Fig. 5c and years 5, 11, and 16 in Fig. 5d. The characteristics of the AF events and the total events follow a similar trend, indicating the proposed model's potential for segregation of events.
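Both diagnostics can be computed in a few lines of NumPy; a sketch assuming event times given in decimal years (the function name and one-year binning are illustrative choices, not the authors' code):

import numpy as np

def cumulative_and_lambda(event_years):
    # cumulative plot: running count of events versus time
    t = np.sort(np.asarray(event_years))
    cumulative = np.arange(1, len(t) + 1)
    # lambda plot: number of events per one-year bin
    bins = np.arange(np.floor(t[0]), np.ceil(t[-1]) + 1)
    rate, _ = np.histogram(t, bins=bins)
    return t, cumulative, bins[:-1], rate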

4.3 Temporal Seismicity Analysis
Time-domain analysis of all the events is performed using the Coefficient of Variation (COV_T) [30]. It is the standard deviation to mean ratio of the inter-event times (τ). The inter-event time for consecutive earthquake events is given as

τ_i = T_{i+1} − T_i,  ∀ i = 1, 2, …, N    (19)


Then, COV_T is determined as

COV_T = √(E[τ²] − (E[τ])²) / E[τ]    (20)

where E[·] represents the average of the inter-event times within the given interval. The value of COV_T segregates the events into three categories:
• For a periodic time series, τ is constant and COV_T = 0.
• If COV_T ≈ 1, the time series follows a Poisson distribution and τ varies exponentially.
• If COV_T > 1, the time series follows a power-law distribution, with τ growing with time.
The values of COV_T for AF, BG, and total events are given in Table 2. The obtained results in terms of COV are described in the following subsection.
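Equations 19–20 reduce to the standard-deviation-to-mean ratio of the inter-event times; a minimal sketch:

import numpy as np

def cov_t(event_times):
    tau = np.diff(np.sort(event_times))   # inter-event times (Eq. 19)
    return tau.std() / tau.mean()         # Eq. 20: sqrt(E[tau^2] - E[tau]^2) / E[tau]

# interpretation: ~0 periodic, ~1 Poissonian, >1 clustered (power-law) seismicity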

4.4 Comparative Analysis with State-of-the-Art Declustering Techniques
For many years, researchers and seismologists have made various attempts to classify earthquake catalogs. Gardner and Knopoff [12] developed a spatio-temporal window technique and analyzed different magnitude ranges to identify AF and BG events: events that fall within the window are considered AFs, and the rest are treated as BG events. Reasenberg [13] classified earthquake sequences based on interaction zones in the space-time domain; the spatial extent is determined near mainshocks using the stress distribution, and the temporal extent is identified using Omori's law. The results of these algorithms are highly dependent on their default parameter settings. Here, the performance is also compared with the Uhrhammer window [14] technique and the recently developed tetra-stage model [19] in terms of the number of clusters, the number of classified events, and COV_T. The results obtained from each algorithm are given in Table 2. Gardner's method detects more AF events and fewer BG events, and its number of clusters is also higher compared to the other methods. The Uhrhammer and Reasenberg methods behave oppositely: Uhrhammer identifies a higher number of AF events (though fewer than the GK method), while Reasenberg detects fewer AF events. This shows that these methods give inconsistent results. The results obtained from the tetra-stage model are more promising, but its high value of COV_BG is contradictory. For the proposed model, the value of COV_BG is near unity, and the high value of COV_AF shows its superiority.

Table 2 A comparative analysis between the proposed model and benchmark declustering algorithms

Catalog (Total events) | Method | AF | BG | Clusters | COV_T | COV_AF | COV_BG
Japan (19510) | Gardner-Knopoff | 12336 | 7174 | 1210 | 4.63 | 5.69 | 2.69
Japan | Gruenthal window | 13209 | 6301 | 1080 | 3.78 | 5.87 | 2.87
Japan | Uhrhammer method | 9527 | 9989 | 1170 | 4.82 | 6.31 | 3.25
Japan | Reasenberg method | 7916 | 11594 | 889 | 4.51 | 5.87 | 3.62
Japan | Tetra-stage model | 8510 | 10984 | 760 | 3.87 | 4.89 | 1.96
Japan | Proposed model | 8782 | 10728 | 802 | 3.34 | 5.21 | 1.12
Indonesia (18106) | Gardner-Knopoff | 10522 | 8988 | 806 | 3.26 | 4.31 | 2.01
Indonesia | Gruenthal window | 12127 | 7383 | 995 | 4.23 | 6.52 | 3.2
Indonesia | Uhrhammer method | 11356 | 6780 | 910 | 4.56 | 6.83 | 2.99
Indonesia | Reasenberg method | 13289 | 4817 | 827 | 3.91 | 5.49 | 2.76
Indonesia | Tetra-stage model | 8726 | 9380 | 886 | 2.89 | 4.28 | 1.89
Indonesia | Proposed model | 8951 | 9155 | 851 | 3.29 | 4.51 | 1.25


5 Conclusion
In this manuscript, a two-phase space-time clustering model is reported for the segregation of aftershock and background events in a seismic catalog. WKFCM clustering is applied in the spatial domain with a predefined number of mainshocks to determine the potential seismic zones. Then, a Gaussian kernel-based density is estimated in the temporal domain, and a magnitude-based weighting strategy is applied in the decision graph to identify the mainshocks in the time domain. This multi-stage clustering approach is used to decluster the seismicity of the Japan and Indonesia regions. The results obtained from the proposed model are evaluated in terms of the Cumulative Plot, Lambda Plot, Epicenter Plot, number of clusters, and Coefficient of Variation. The results reveal that the proposed model efficiently declusters the seismicity and outperforms the other conventional methods.

References
1. Yehuda B-Z (2008) Collective behavior of earthquakes and faults: continuum-discrete transitions, progressive evolutionary changes, and different dynamic regimes. Rev Geophys 46(4)
2. Utsu T (2002) Statistical features of seismicity. Int Geophys Ser 81(A):719–732
3. Lengliné O, Enescu B, Peng Z, Shiomi K (2012) Decay and expansion of the early aftershock activity following the 2011, Mw 9.0 Tohoku earthquake. Geophys Res Lett 39(18)
4. Ross et al (2017) Aftershocks driven by afterslip and fluid pressure sweeping through a fault-fracture mesh. Geophys Res Lett 44(16):8260–8267
5. Ruhl CJ, Abercrombie RE, Smith KD, Zaliapin I (2016) Complex spatiotemporal evolution of the 2008 Mw 4.9 Mogul earthquake swarm (Reno, Nevada): interplay of fluid and faulting. J Geophys Res Solid Earth 121(11):8196–8216
6. Edward et al (2017) A spatiotemporal clustering model for the third Uniform California Earthquake Rupture Forecast (UCERF3-ETAS): toward an operational earthquake forecast. Bull Seismol Soc Am 107(3):1049–1081
7. Johnson CW, Fu Y, Bürgmann R (2017) Stress models of the annual hydrospheric, atmospheric, thermal, and tidal loading cycles on California faults: perturbation of background stress and changes in seismicity. J Geophys Res Solid Earth 122(12):10–605
8. Irsyam et al (2020) Development of the 2017 national seismic hazard maps of Indonesia. Earthquake Spectra 36(1_suppl):112–136
9. Petersen et al (2017) 2017 one-year seismic-hazard forecast for the central and eastern United States from induced and natural earthquakes. Seismol Res Lett 88(3):772–783
10. Ben-Zion Y, Zaliapin I (2020) Localization and coalescence of seismicity before large earthquakes. Geophys J Int 223(1):561–583
11. Eroglu Azak T, Kalafat D, Şeşetyan K, Demircioğlu MB. Effects of seismic declustering on seismic hazard assessment: a sensitivity study using the Turkish earthquake catalogue. Bull Earthquake Eng 16(8):3339–3366
12. Gardner JK, Knopoff L (1974) Is the sequence of earthquakes in southern California, with aftershocks removed, Poissonian? Bull Seismol Soc Am 64(5):1363–1367
13. Reasenberg P (1985) Second-order moment of central California seismicity, 1969–1982. J Geophys Res Solid Earth 90(B7):5479–5495
14. Uhrhammer RA (1986) Characteristics of northern and central California seismicity. Earthquake Notes 57(1):21


15. Knopoff L (2000) The magnitude distribution of declustered earthquakes in southern California. Proc Nat Acad Sci 97(22):11880–11884
16. Utsu T (1969) Aftershocks and earthquake statistics (1): some parameters which characterize an aftershock sequence and their interrelations. J Fac Hokkaido Univ Ser 7, 3:125–195
17. Rehman K, Burton PW, Weatherill GA (2014) K-means cluster analysis and seismicity partitioning for Pakistan. J Seismol 18(3):401–419
18. Hainzl S, Scherbaum F, Beauval C (2006) Estimating background activity based on interevent-time distribution. Bull Seismol Soc Am 96(1):313–320
19. Vijay RK, Nanda SJ (2017) Tetra-stage cluster identification model to analyse the seismic activities of Japan, Himalaya and Taiwan. IET Signal Process 12(1):95–103
20. Vijay RK, Nanda SJ (2019) Shared nearest neighborhood intensity based declustering model for analysis of spatio-temporal seismicity. IEEE J Sel Top Appl Earth Observ Remote Sens 12(5):1619–1627
21. Aden-Antoniow F, Frank WB, Seydoux L (2021) Transfer learning to build a scalable model for the declustering of earthquake catalogs
22. Ester M, Kriegel H-P, Sander J, Xiaowei X et al (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD-96, pp 226–231
23. Vijay RK, Nanda SJ (2019) A variable epsilon-DBSCAN algorithm for declustering earthquake catalogs. In: Soft computing for problem solving. Springer, pp 639–651
24. Vijay RK, Nanda SJ (2019) A quantum grey wolf optimizer based declustering model for analysis of earthquake catalogs in an ergodic framework. J Comput Sci 36:101019
25. Sharma A, Nanda SJ, Vijay RK (2021) A binary NSGA-II model for de-clustering seismicity of Turkey and Chile. In: 2021 IEEE congress on evolutionary computation (CEC). IEEE, pp 981–988
26. Sharma A, Nanda SJ (2022) A multi-objective chimp optimization algorithm for seismicity de-clustering. Appl Soft Comput, 108742
27. Sharma A, Nanda SJ, Vijay RK (2021) A model based on fuzzy c-means with density peak clustering for seismicity analysis of earthquake prone regions. In: Soft computing for problem solving. Springer, pp 173–185
28. Rodriguez A, Laio A (2014) Clustering by fast search and find of density peaks. Science 344(6191):1492–1496
29. United States Geological Survey (2022) https://earthquake.usgs.gov/earthquakes/search/
30. Bottiglieri M, Lippiello E, Godano C, De Arcangelis L (2009) Identification and spatiotemporal organization of aftershocks. J Geophys Res Solid Earth 114(B3)

Chapter 27

Wearable Small, Narrow Band, Conformal, Low-Profile Antenna with Defected Ground for Medical Devices
Archana Tiwari and A. A. Khurshid

1 Introduction
The emergence of antennas has led to many revolutionary discoveries in the fields of defense, health care, communication, etc. In health care, the need for microstrip antennas is rising enormously and progressing at a very rapid pace, and the need to upgrade features and reduce the size of biomedical devices is increasing day by day. This technological enhancement necessitates a miniaturized antenna suitable for medical applications. The microstrip monopole antenna used in microwave imaging is a less expensive solution for detecting diseases at early stages [1, 2]. These microstrip antennas have huge potential for further development and play an important role in wireless applications due to their comparatively greater bandwidth, higher directivity, low profile, easy fabrication, and integration [3, 4]. The rapid progress in body area networks (BAN) [5, 6] has resulted in major research success in the past years due to their encouraging applications. Since these antennas are positioned in the near vicinity of the human body, the lossy tissues create a loading effect, and therefore efficient antenna design is challenging [7]. This work introduces a compact antenna which can be used for on-body/biomedical telemetry applications. Through parametric optimization of the antenna and other performance measures, it has been verified that the proposed miniaturized antenna meets the requirements of radiation pattern, bandwidth, and frequency. The paper presents the literature review in Sect. 2, describes the design and analysis


in Sect. 3, and the fabrication results are tabulated in Sect. 4. Section 5 presents a comparison of the proposed design with the work of other researchers cited in the literature, along with the concluding remarks.

2 Literature Review
In the process of designing, several research works related to miniaturization techniques, and to structures suitable for medical devices, were reviewed; the relevant literature is described below. Compact monopole patch antennas can be used for a variety of applications in the ISM band (2.4–5.8 GHz) and have therefore attracted the interest of researchers; their miniaturized shape makes them suitable for embedding directly into biomedical and communication devices [8–10]. Al-Zoubi et al. [11] proposed a circular microstrip patch antenna with a ring-shaped patch, with a return loss of −35 dB, bandwidth of 12.8%, and simulated gain of 5.7 dBi at 5.8 GHz. Peng et al. [12] proposed a monopole patch antenna with three stubs, excited by a microstrip line; peak antenna gains of 1.90–2.16 dBi in the 2.4 GHz band and 3.30–3.85 dBi in the 5 GHz band were obtained. Liu et al. [13] presented a microstrip monopole antenna with a circular patch, achieving a bandwidth of 18% with a gain of 6 dBi. Rahaman et al. [14] designed a compact microstrip wideband antenna targeting a resonant frequency of 2.45 GHz, with a return loss of −48.99 dB and a bandwidth of 900 MHz; the dimensions of the design were 30 × 40 × 1.76 mm³, and the simulated gain achieved was 4.59 dBi. Yang and Xiao [15] designed a single-feed, wide-bandwidth implantable antenna operating at 2.4 GHz with dimensions of 11 × 7.6 × 0.635 mm³; the bandwidth ranged from 2.24 to 2.59 GHz with a peak gain of 20.8 dBi. Rahaman and Hossain [16] proposed a compact open-end slot-feed microstrip patch antenna targeting a resonant frequency of 2.45 GHz, achieving a return loss of −46.64 dB, a bandwidth of 16%, and a gain of 7.2 dBi. Different structures have been analyzed for use as wearable antennas, including perpendicular monopoles [7] and planar microstrip monopole antennas [12, 13]. Planar monopole antennas have a small area, but significant energy goes into the human body because their radiation is omnidirectional. This work focuses on an efficient narrowband, low-profile, small-form-factor antenna design. Since monopole antennas are simple to design, efficient, and have relatively high reactive impedance over the frequency range, their suitability can be further explored for on-body medical devices. Though monopole antenna impedances vary in an isolated chamber, impedance matching can be controlled by the designer without the need for extraneous matching components. With the use of the emerging defected ground structure techniques, the antenna parameters can be improved [17–19].


From the literature survey, it can be concluded that miniaturization of antennas is a promising approach to enhance scalability of wearable devices, and the merits of monopoles can be utilized to explore its suitability. Hence, this work is directed to design a narrowband, low-profile, efficient monopole antenna for the ISM band.

3 Antenna Design and Analysis
In order to achieve a narrow bandwidth and compactness, an inverted G-shape patch with a meandering element was simulated with a full ground plane, a half ground plane, and a defected ground structure, using FR4 as the substrate material. An antenna with a meandering line is used to transform the monopole antenna design. The variations were experimented with the objective of increasing the radiating part and decreasing the electrical length so as to achieve miniaturization, with the experience of each simulation combined with the next to achieve the desired results. With parametric variations in HFSS, the optimal design was derived with dimensions of 23 × 20 × 1.6 mm³, including a finite ground plane of 8 × 20 mm². The defect in the ground is etched on the bottom side, as shown in Fig. 1. A substrate dielectric constant of 4.4 with a thickness of 1.6 mm was used for the simulation. Table 1 lists the derived parameters of the proposed antenna in Fig. 1. The designed inverted G-shape patch with a meandering element is an inset-fed antenna. Using the defected ground technique from [17], it is inferred that as the length of the slots in the ground plane increases, the resonant frequency is reduced.
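As a rough sanity check on the derived dimensions, the electrical length of a quarter-wave monopole on FR4 can be estimated with the common (εr + 1)/2 approximation for the effective permittivity; this first-order sketch is an assumption-laden estimate, not the HFSS design procedure used above:

import math

c = 3.0e8                       # speed of light (m/s)
f = 2.54e9                      # target resonant frequency (Hz)
eps_r = 4.4                     # FR4 dielectric constant
eps_eff = (eps_r + 1) / 2       # crude effective permittivity (assumption)
L_quarter = c / (4 * f * math.sqrt(eps_eff))
print(f"quarter-wave monopole length ~ {L_quarter * 1e3:.1f} mm")  # ~18 mm

The roughly 18 mm estimate is consistent with fitting a meandered monopole into the 23 × 20 mm² footprint.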

Fig. 1 Proposed antenna design view: front and back

Table 1 Dimensions of the antenna (all sizes in mm)

Patch:  M = 3, A1 = 11, A2 = 8, A3 = 3, A4 = 7.5, A5 = 1, A6 = 5, A7 = 2, A8 = 0.8
Ground: Q = 20, Z = 23, Zg = 8, Q2 = 20
Slot:   c1 = 2.5, c2 = 3

Therefore, using Optimetrics, the slot dimensions are adjusted to achieve a frequency of 2.54 GHz. It has been observed that defects in the ground plane disturb the current distribution, depending on the dimension and shape of the defect, thereby changing the input impedance. Thus, by varying the slots, the excitation and wave propagation through the substrate are controlled, and a better degree of compactness is obtained. The variations are shown in Table 2; after multiple variations, design 6, with a slot size of 3 × 2.5 mm², achieved the desired return loss, as shown in Fig. 2. The gain of the proposed antenna is found to be 3.8 dBi, as shown in Figs. 3 and 4, with directivity equal to −3 dB. Figure 5 shows the impedance Smith chart plot of the proposed design, which is found to be matched to a 50 Ω impedance. The defected ground structure has improved the radiation without the use of additional circuits.

4 Fabrication Results and Analysis
The proposed design 6 (Table 2) is fabricated using low-profile FR4 substrate material of thickness 1.6 mm and dielectric constant 4.4. The front and back views of the fabricated antenna are presented in Fig. 6. Figure 7 shows the return loss versus frequency plot and the Smith chart plot for the fabricated antenna. Table 3 compares the simulated and fabricated designs; the return loss for both is found to be the same, −16.54 dB. It is observed that resonance occurs at 2.27 GHz for the fabricated antenna. It can be concluded that the proposed antenna shows a good compromise between the simulated and fabricated results.


Table 2 Ground plane and slot variations of the antenna

Antenna design | Ground (Zg × Q2, mm) | Slot (c2 × c1, mm) | Frequency (GHz) | Return loss (dB) | Gain (dBi)
Design 1 | 11 × 20 | 4 × 4 | 3.5 | −10.001 | 0.15
Design 2 | 11 × 20 | 3 × 3 | 3.5 | −17.57 | 0.07
Design 3 | 10 × 20 | 3 × 3 | 3 | −8.84 | −11.79
Design 4 | 9 × 20 | 3 × 3 | 3.5 | −0.0004 | 1.21
Design 5 | 8 × 20 | 3 × 3 | 2.54 | −14.24 | 1.21
Design 6 | 8 × 20 | 3 × 2.5 | 2.54 | −16.54 | 3.89
Design 7 | 7 × 20 | 3 × 2.5 | 3.5 | −10.29 | 0.99
Design 8 | 6 × 20 | 3 × 2.5 | 3 | −9 | 0.99

Fig. 2 Return loss plot of proposed design

Fig. 3 2D gain plot of proposed design

The actual measurement data are improved in comparison with the simulation data; the meandered-line model and the defected ground technique are able to address the miniaturization and matching problems. Table 4 compares the antenna parameters with the work done by other researchers, indicating that the proposed design and the technique used enable miniaturization. The design satisfies the required conditions


Fig. 4 3D gain plot of proposed design

Fig. 5 Smith chart plot of proposed design

Fig. 6 Front and back view of fabricated antenna

of a compact structure meeting the set performance objectives, thus providing an improved solution as a reference for future designers.


Fig. 7 Return loss plot and Smith chart plot of fabricated antenna

Table 3 Comparison of simulated and fabricated design

Sr. No. | Parameter of comparison | Simulation results | Fabrication results
1 | Return loss | −16.54 dB | −16.54 dB
2 | Frequency | 2.54 GHz | 2.27 GHz
3 | Bandwidth | 60 MHz | 130 MHz
4 | Impedance | 50 Ω (approx.) | 60 Ω

Table 4 Proposed antenna work and existing work comparison

Parameter | [20] | [21] | [22] | [23] | Proposed antenna
Substrate material | FR4 | Rogers RO4003C | Rogers Arlon DiClad 880 | FR4 | FR4
Dielectric constant | 4.3 | 3.38 | 2.2 | 4.4 | 4.4
Frequency (GHz) | 2.2 | 2.53 | 2.4 | 2.6 | 2.54
Size of antenna (mm³) | 45 × 38 × 0.065 | 40 × 41.5 × 0.508 | 57 × 46 × 0.07 | 31.3 × 34.9 × 1.6 | 23 × 20 × 1.6
Return loss (dB) | −19 | −21 | > −10 | > −10 | −16.54


5 Conclusion
The inverted G-shaped antenna with a meandering element and a monopole patch has been proposed and analyzed in this paper. The miniaturized design is realized with a high-dielectric-constant FR4 substrate and a meandered line, thereby controlling the size. The fundamental constraints were modeled and estimated with the HFSS software, and the results of the fabricated design with the defected ground structure are found to match. The effect of using a defected ground structure was successfully experimented with different dimensions, and its role in dimension reduction is successfully demonstrated. The performance of the different designs is evaluated on the basis of their radiation, bandwidth, and return loss characteristics. The proposed design provides a good compromise between volume, bandwidth, and efficiency, so it can be concluded that it is usable for medical devices. The work is limited by the fixed substrate material and thickness; in the future, other substrate materials can be explored to achieve further miniaturization.

References
1. Ahadi M, Nourinia J, Ghobadi C (2021) Square monopole antenna application in localization of tumors in three dimensions by confocal microwave imaging for breast cancer detection: experimental measurement. Wirel Pers Commun 116:2391–2409. https://doi.org/10.1007/s11277-020-07801-5
2. Rodriguez-Duarte DO, Tobón Vasquez JA, Scapaticci R, Crocco L, Vipiana F (2021) Assessing a microwave imaging system for brain stroke monitoring via high fidelity numerical modelling. IEEE J Electromagn RF Microw Med Biol 5(3)
3. Zhang ZY, Fu G, Gong SX, Zuo SL, Lu QY (2010) Sleeve monopole antenna for DVB-H applications. Electron Lett 46:879–880. https://doi.org/10.1049/el.2010.1035
4. Ahmad S, Paracha KN, Ali Sheikh Y, Ghaffar A, Dawood Butt A, Alibakhshikenari M, Soh PJ, Khan S, Falcone F (2021) A metasurface-based single-layered compact AMC-backed dual-band antenna for off-body IoT devices. IEEE Access 9
5. Hall PS, Hao Y (2012) Antenna and propagation for body-centric wireless communications. Artech House
6. Jiang ZH, Cui Z, Yue T, Zhu Y, Werner DH (2017) Compact, highly efficient, and fully flexible circularly polarized antenna enabled by silver nanowires for wireless body-area networks. IEEE Trans Biomed Circ Syst 11(4)
7. Hall PS (2007) Antennas and propagation for on-body communication systems. IEEE Antenn Propag Mag 49:41–58
8. Ammann MJ, Chen ZN (2003) A wide-band shorted planar monopole with Bevel. IEEE Trans Antenn Propag 51:901–903. https://doi.org/10.1109/TAP.2003.811061
9. Suh SY, Stutzman W, Davis WA (2004) A new ultrawideband printed monopole antenna: the planar inverted cone antenna (PICA). IEEE Trans Antenn Propag 52:1361–1364. https://doi.org/10.1109/TAP.2004.827529
10. Elsheakh D, Elsadek HA, Abdallah E, Elhenawy H, Iskander MF (2009) Enhancement of microstrip monopole antenna bandwidth by using EBG structures. IEEE Antenn Wirel Propag Lett 8:959–962. https://doi.org/10.1109/LAWP.2009.2030375
11. Al-Zoubi A, Yang F, Kishk A (2009) A broadband center-fed circular patch-ring antenna with a monopole like radiation pattern. IEEE Trans Antenn Propag 57(3):789–792. https://doi.org/10.1109/TAP.2008.2011406


12. Peng L, Ruan CL (2007) A microstrip fed monopole patch antenna with three stubs for dual-band WLAN applications. J Electromagn Waves Appl 21:2359–2369. https://doi.org/10.1163/156939307783134263
13. Liu J, Xue J, Wong Q, Lai HW, Long Y (2013) Design and analysis of a low-profile and broadband microstrip monopolar patch antenna. IEEE Trans Antenn Propag 61:11–18. https://doi.org/10.1109/TAP.2012.2214996
14. Rahaman A, Hossain QD (2018) Design of a miniature microstrip wide band antenna for on-body biomedical telemetry. In: International conference on smart systems and inventive technology (ICSSIT 2018). IEEE Xplore Part Number: CFP18P17-ART, ISBN: 978-1-5386-5873-4
15. Yang ZJ, Xiao S (2018) A wideband implantable antenna for 2.4 GHz ISM band biomedical application. National Natural Science Foundation of China under Grant 61331007 and 61731005, IEEE 978-1-5386-1851-6/18
16. Anisur Rahaman M, Hossain QD (2019) Design and overall performance analysis of an open-end slot feed miniature microstrip antenna for on-body biomedical applications. In: International conference on robotics, electrical and signal processing techniques (ICREST)
17. Yi N et al (2010) Characterization of narrowband communication channels on the human body at 2.45 GHz. IET Microw Antenn Propag 4:722–732
18. Khandelwal MK, Kanaujia BK, Kumar S (2017) Defected ground structure: fundamentals, analysis, and applications in modern wireless trends. Int J Antenn Propag 2018527:22
19. Abdel Halim AS (2019) Low-profile wideband linear polarized patch antenna using metasurface: design and characterization. Res Rev J Eng Technol. ISSN: 2319-9873
20. Zhang H, Chen D, Zhao C (2020) A novel printed monopole antenna with folded stepped impedance resonator loading. IEEE Access 8:146831–146837
21. Zhang H, Chen D, Zhao C (2020) A novel printed monopole antenna with stepped impedance hairpin resonator loading. IEEE Access 8:96975–96980
22. Johnson AD, Manohar V, Venkatakrishnan SB, Volakis JL (2020) Low-cost S-band reconfigurable monopole/patch antenna for CubeSats. IEEE Open J Antenn Propog 1:598–603
23. Modak S, Khan T, Laskar RH (2020) Penta-notched UWB monopole antenna using EBG structures and fork-shaped slots. Radio Sci 55:01–11

Chapter 28

A CPW Fed Grounded Annular Ring Embedded Dual-Band Dual-Sense Circular Polarization Antenna for 5G/Wi-MAX and C-Band Satellite Applications
Krishna Chennakesava Rao Madaka and Pachiyannan Muthusamy

1 Introduction
In mobile communication systems, the positions of the transmitter and receiver are not necessarily fixed; rather, they continuously change with respect to each other. This may result in zero signal reception due to polarization mismatch if conventional linearly polarized (LP) antennas are used. Circularly polarized (CP) radiation is more immune to polarization mismatch losses, multipath fading, and antenna orientation, and a multiband circularly polarized antenna provides an improved communication link with reduced antenna size. Various techniques have been reported to implement multiband dual-sense antennas accounting for polarization diversity. In [1], a C-shaped grounded stub and, in [2], an L-patch with grounded rectangular stubs in a square slot are used to achieve dual-sense characteristics; in [3], parasitic elements are introduced; in [4], a dielectric resonator is loaded with a circular patch; in [5], a cylindrical dielectric resonator with truncated notches and a pair of arc-shaped slots, and in [6], a rectangular DRA with an asymmetrical square ring, are used to obtain dual-sense polarization. The use of asymmetric resonators excited by a substrate integrated waveguide has been studied in [7], and a two-port feeding technique in [8]; a circular slot with a tri-strip embedded corner-truncated rectangular patch [9], a dual-polarized monopole antenna with a parasitic annular ring in a quasi-pentagonal slot [10], and a slanted patch in an asymmetric square slot and ground [11] have also been studied to obtain dual-sense behavior. However, all these reported techniques have the major constraints of large antenna size and complex antenna design. In this proposed work,


a single-port antenna is studied to obtain dual-band dual-sense (DBDS) characteristics using a grounded annular ring, which is suitable for 5G communication (2.7–3.7 GHz) and satellite communication (6.8–7.6 GHz) applications [12–14].

2 Antenna Design and Analysis
The schematic representation and the geometrical dimensions (in mm) of the proposed (30 mm × 30 mm) antenna are illustrated in Fig. 1 and Table 1, respectively. A square slot is engraved in the ground plane and a slitted rectangular patch is placed in it. The radiating patch is excited by a 50 Ω feed of 3.2 mm line width, located 0.4 mm from the ground plane. A fire-retardant FR4 substrate of 1.6 mm height, dielectric constant 4.4, and loss tangent 0.02 is used to realize this antenna. λg/2 auxiliary stubs are attached to the rectangular patch to obtain two orthogonal field components of the same amplitude in phase quadrature.
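The λg/2 stub length can be estimated to first order; a sketch assuming the common (εr + 1)/2 CPW effective-permittivity approximation and the lower-band centre of 3.2 GHz (both assumptions, not values taken from the design procedure):

import math

c = 3.0e8                      # speed of light (m/s)
f = 3.2e9                      # lower CP band centre (Hz)
eps_eff = (4.4 + 1) / 2        # crude CPW effective permittivity for FR4 (assumption)
lam_g = c / (f * math.sqrt(eps_eff))
print(f"guided wavelength ~ {lam_g * 1e3:.1f} mm, lambda_g/2 stub ~ {lam_g * 5e2:.1f} mm")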

Fig. 1 a Proposed antenna geometry. b Fabricated antenna

Table 1 Geometrical dimensions (mm): l1 = 26.2, l2 = 16.1, lg = 4.2, lpp = 8.1, ls = 8.1, l = 30, w = 30, H = 1.6, wf = 5.2, wp = 1.4, f = 3.2, r = 2


2.1 Prototype Analysis
Hierarchical steps in the improvement of the antenna are demonstrated in Fig. 2 with the antenna prototypes; their resultant return loss and axial ratio (AR) values are sketched in Figs. 3 and 4, respectively.

Fig. 2 Prototype analysis

Fig. 3 Comparison of S11


Fig. 4 Comparison of AR

The design is initiated by engraving a modified square slot with a narrow slit in the quasi ground plane and placing a rectangular patch, as portrayed in prototype I. It resonates with poor return loss characteristics and a poor axial ratio (AR > 30 dB). By controlling the flared ground plane at the feeding port, the input impedance is gradually transformed. In prototype II, two half-wavelength rectangular stubs are connected to the patch at its diagonally opposite ends to obtain orthogonal field components with a 90° phase difference. This improves the impedance matching and also produces a dual band with linear polarization characteristics, with AR values ranging between 7.1 and 10.4 dB. The 3 dB axial ratio bandwidth (ARBW) is tuned to improve the CP by connecting another half-guided-wavelength stub parallel to the auxiliary stub, as shown in prototype III. Three parallel slits of width 0.3 mm are etched in the patch and a semicircular ring is attached, as shown in prototypes IV and V, respectively, for further improvement of the return loss in the lower band. The dual-sense CP nature is obtained with the intruded annular ring in the square slot, as figured in prototype VI. From Fig. 3, it is evident that the proposed prototype VI antenna resonates with good return loss characteristics in both resonating bands and provides wide impedance bandwidths extending from 2.6 to 3.9 GHz and 6.6 to 8.7 GHz. From Fig. 4, it is observed that prototypes I and II are linearly polarized. The CP nature is introduced from prototype III; the axial ratios observed in prototypes III, IV and V are poor in the lower band. The axial ratios in both resonating bands are improved in prototype VI. The proposed antenna of prototype VI exhibits good circular polarization with ARBW extending from 2.7 to 3.7 GHz and 6.8 to 7.6 GHz.


2.2 CP Mechanism and Analysis
The distribution of surface current vectors in both circularly polarized bands is sketched in Figs. 5 and 6. The sense of polarization in the azimuthal plane at 3.2 and 7.3 GHz is studied using the advancing current vectors at 0°, 90°, 180° and 270°. With the +z axis as the propagation direction, the predominant current vectors rotate anticlockwise in the lower band, as depicted in Fig. 5, and clockwise in the higher band, as shown in Fig. 6, hence conforming to right-handed circular polarization (RHCP) for the lower radiating band and left-handed circular polarization (LHCP) for the higher radiating band.

Fig. 5 Surface current vectors at 0°, 90°, 180° and 270° (3.2 GHz)

Fig. 6 Surface current vectors at 0°, 90°, 180° and 270° (7.3 GHz)

3 Results and Discussion
The measured return loss of the presented antenna is sketched in Fig. 7. It shows the −10 dB impedance bandwidth (ImBW) extending over the 2.6–3.9 GHz and 6.6–8.7 GHz frequency bands. Good agreement with slight deviation is observed between the measured and simulated return loss; the observed deviation is due to connector and soldering losses. The two diagonally connected auxiliary stubs of the radiating patch and the intruded annular ring provide the broadband CP characteristics. The 3 dB ARBW extends from 2.7 to 3.7 GHz and 6.8 to 7.6 GHz, as illustrated in Fig. 8. The gain of the antenna in both resonating bands is sketched in Fig. 9, confirming a flat gain of ≈3 dBi with a variation of ±0.2 dBi in both bands. The dual-sense CP nature of the antenna is investigated using normalized RHCP and LHCP radiation patterns in the xz-plane (E-plane) and yz-plane (H-plane) at 3.2 and 7.3 GHz.
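The axial ratio follows directly from the RHCP and LHCP field components through a standard relation; a minimal sketch (the example values are placeholders):

import numpy as np

def axial_ratio_db(E_R, E_L):
    # AR = (|E_R| + |E_L|) / ||E_R| - |E_L||, in dB;
    # AR <= 3 dB is the usual circular-polarization criterion
    r, l = np.abs(E_R), np.abs(E_L)
    return 20 * np.log10((r + l) / np.abs(r - l))

# e.g. a dominant RHCP component with 30 dB polarization purity:
print(axial_ratio_db(1.0, 10 ** (-30 / 20)))   # ~0.55 dB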


Fig. 7 Measured versus simulated return loss

Fig. 8 Axial ratio

At 3.2 GHz, right-hand polarization is dominant with good polarization purity (>30 dB) in both the xz and yz planes, thus confirming RHCP in the lower band, as depicted in Fig. 10a and b. At 7.3 GHz, left-hand polarization is dominant with good polarization purity (>30 dB) in both the xz and yz planes, thus confirming LHCP in the higher frequency band, as plotted in Fig. 10c and d.


Fig. 9 Antenna gain (in dBi)

The proposed circularly polarized dual-sense antenna is compact compared to the state-of-the-art antennas referred to in Table 2.

4 Conclusion
A CPW-fed dual-band dual-sense circularly polarized antenna has been investigated in this letter. The antenna exhibits both RHCP (2.7–3.7 GHz) and LHCP (6.8–7.6 GHz) characteristics. A precise transformation of the input impedance is realized by flaring the ground at the feed, which results in dual resonating bands. Circular polarization is generated by using auxiliary stubs of length λg/2. Dual-sense characteristics are achieved by perturbing the square slot with an annular ring embedded in the ground plane. The lower radiating band is right-handed circularly polarized while the higher radiating band is left-handed circularly polarized. The presented antenna has a compact geometry with a simplified single-port structure and can be implemented for 5G communication (2.7–3.7 GHz) and satellite communication (6.8–7.6 GHz) applications.


Fig. 10 Radiation patterns of normalized RHCP and LHCP. a 3.2 GHz (xz-plane), b 3.2 GHz (yz-plane), c 7.3 GHz (xz-plane), d 7.3 GHz (yz-plane)

Table 2 Comparison of DBDS antennas

References | l × w | fc (GHz) | Dual-sense
[2] | 1.3 λg × 1.3 λg | 3.1 | Yes
[3] | 1.3 λg × 1.5 λg | 3.5 | Yes
[5] | 2.8 λg × 2.8 λg | 5.3 | No
[6] | 1.1 λg × 0.96 λg | 2.7 | Yes
[9] | 0.8 λg × 0.8 λg | 3.1 | No
[10] | 1.7 λg × 1.7 λg | 4.9 | No
Proposed | 0.7 λg × 0.7 λg | 3.2 | Yes


References
1. Chen YY, Jiao YC, Zhao G, Zhang F, Liao ZL, Tian Y (2011) Dual-band dual-sense circularly polarized slot antenna with a C-shaped grounded strip. IEEE Antenn Wirel Propag Lett 10:915–918
2. Rui X, Li J, Wei K (2016) Dual-band dual-sense circularly polarized square slot antenna with simple structure. Electron Lett 52(8):578–580
3. Saini RK, Dwari S, Mandal MK (2017) CPW-fed dual-band dual-sense circularly polarized monopole antenna. IEEE Antenn Wirel Propag Lett 16:2497–2500
4. Pan YM, Zheng SY, Li W (2014) Dual-band and dual-sense omnidirectional circularly polarized antenna. IEEE Antenn Wirel Propag Lett 13:706–709
5. Zhou YD, Jiao YC, Weng ZB, Ni T (2015) A novel single-fed wide dual-band circularly polarized dielectric resonator antenna. IEEE Antenn Wirel Propag Lett 15:930–933
6. Sahu NK, Sharma A, Gangwar RK (2018) Design and analysis of wideband composite antenna with dual-sense circular polarization characteristics. Microw Opt Technol Lett 60(8):2048–2054
7. Kumar K, Dwari S, Mandal MK (2018) Dual-band dual-sense circularly polarized substrate integrated waveguide antenna. IEEE Antenn Wirel Propag Lett 17(3):521–524
8. Saini RK, Dwari S (2016) A broadband dual circularly polarized square slot antenna. IEEE Trans Antenn Propag 64(1):290–294
9. Khan MI, Chandra A, Das S (2019) A dual band, dual polarized slot antenna using coplanar waveguide. Adv Comp Commun Contr. Lect Notes Netw Syst 41:95–103
10. Madaka KC, Muthusamy P (2020) Mode investigation of parasitic annular ring loaded dual band coplanar waveguide antenna with polarization diversity characteristics. Int J RF Microwave Comput Aided Eng 30(4). https://doi.org/10.1002/mmce.22119
11. Fu Q, Feng Q, Chen H (2021) Design and optimization of CPW-fed broad band circularly polarized antenna for multiple communication systems. Progr Electromagn Res Lett 99:65–75
12. Tang H, Zong X, Nie Z (2018) Broadband dual-polarized base station antenna for fifth-generation (5G) applications. Sensors 18(8):2701
13. Federal Communications Commission (2019) https://docs.fcc.gov/public/attachments/FCC-19130A1.pdf, Regulations 2019/12/16
14. Intelsat (2013) Polarization. http://www.intelsat.com/wp-content/uploads/2013/02/Polarization.pdf

Chapter 29

An Analytical Appraisal on Recent Trends and Challenges in Secret Sharing Schemes
Neetha Francis and Thomas Monoth

1 Introduction
Security is a challenging matter in the present scenario, where everyone is connected to a public network and data are usually stored on large servers. Anybody can steal the private data of an organization when it is kept in a public place, yet organizations need to protect their data from disclosure. One way to protect information is conventional encryption. But what happens when the encrypted information is corrupted, or when the secret key is lost? Secret sharing addresses this problem and provides solutions for ensuring both confidentiality and reliability. Instead of storing valuable data in a single place, the data are distributed and stored at several places; when the need arises, they can be reconstructed from the distributed shares. The loss of a cryptographic key is equivalent to data loss, as the original data cannot be retrieved without the encryption key, so the security of secret keys used in cryptographic algorithms is very important. A key kept in a single location is highly undependable, as a single misfortune, such as a computer failure or the sudden death of the person possessing the secret, may lead to serious problems. An obvious answer is to store copies of the key at several places, but this makes the situation even worse, since it provides more chances for hackers and is hence vulnerable to different types of attacks. The secret sharing-based solution provides better key management that ensures both confidentiality and reliability. This paper presents a comprehensive study of various Secret Sharing Schemes (SSS). Recent research developments in secret sharing are reviewed and


an analysis of different SSS is conducted. The major issues are verifiability, cheating detection and cheater identification. In order to overcome the challenges in existing SSS, new techniques using threshold SSS and Verifiable Secret Sharing (VSS) can be developed using various mathematical models.

2 Secret Sharing Schemes
The concept of secret sharing is to begin with a secret and split it into pieces called shares or shadows, which are given to shareholders in such a way that the collective shares of chosen subsets make reconstruction of the secret possible. Secret sharing provides a strong key management scheme that is secure and reliable. The key is made secure by distributing it to n shareholders; if t or more shareholders are present, they can reconstruct it by combining their separate shares. An authorized set is any subset of shareholders comprising t or more participants. This method is called a t-out-of-n threshold scheme and is denoted (t, n), where n is the total number of shareholders and t is the threshold value. Knowledge of fewer than t shares will not reveal any information about the secret. The size of the share is very important in a SSS: the efficiency of the scheme can be measured by the information rate, which is the ratio of the size of the secret to the size of a share, and a SSS with information rate equal to one is considered ideal. In a secret sharing scheme, the dealer is assumed to be honest; however, a dishonest dealer may send inconsistent shares to the participants. To avoid such malicious behavior, protocols need to be implemented which permit the participants to validate the consistency of the shares. VSS lets the shareholders confirm that their shares are consistent. There are two types of VSS protocols, interactive proofs and non-interactive proofs. In a Publicly VSS scheme, besides the participants, everyone can check whether the shares are properly allocated. Shamir [1] proposed a scheme based on Lagrange's interpolating polynomials. For a (t, n) threshold scheme, the dealer picks a random polynomial of degree t − 1:

q(x) = a_0 + a_1 x + … + a_{t−1} x^{t−1}    (1)

where a_0 is the secret S, and chooses a prime p such that p ≥ n + 1. The dealer then generates n shares S_1 = q(1), S_2 = q(2), …, S_n = q(n) and securely distributes them to the n participants. Shamir's scheme is depicted in Fig. 1. Let q(x) = a_0 + a_1 x + … + a_{t−1} x^{t−1}, where a_0 = S. The n shadows are computed by evaluating q(x) at n different values x_1, x_2, …, x_n, with x_i ≠ 0 for any i:

S_i = q(x_i), 1 ≤ i ≤ n    (2)

Each point (x_i, S_i) is a point on the curve defined by the polynomial. The values x_1, x_2, …, x_n need not be private and could simply be the numbers 1, …, n. As


Fig. 1 Shamir’s Secret Sharing Scheme

t points uniquely determine the polynomial q(x) of degree t − 1, the secret S can be reconstructed from t shares. If P is the set of participants and the access structure A ⊆ P is such that |A| ≥ t, then q(x) can be reconstructed using Lagrange's interpolation formula with the t shares of the participants:

q(x) = Σ_{j=1}^{t} S_{i_j} · Π_{1≤k≤t, k≠j} (x − x_{i_k}) / (x_{i_j} − x_{i_k})    (3)

Since S = q(0), we can rewrite the formula as

S = q(0) = Σ_{j=1}^{t} S_{i_j} · Π_{1≤k≤t, k≠j} x_{i_k} / (x_{i_k} − x_{i_j})    (4)

The above method is illustrated with a numerical example. Given S = 9406, n = 5, t = 3. Pick a prime p larger than the secret: p = 104729. Generate two random coefficients, 55142 and 238. The polynomial is

q(x) = 9406 + 55142x + 238x²

Evaluate q(x) mod p at x = 1, 2, 3, 4, 5:

q(1) = 64786 mod 104729 = 64786
q(2) = 120642 mod 104729 = 15913
q(3) = 176974 mod 104729 = 72245
q(4) = 233782 mod 104729 = 24324
q(5) = 291066 mod 104729 = 81608

Hence the shares are (1, 64786), (2, 15913), (3, 72245), (4, 24324), (5, 81608). Suppose we have the shares for x = 2, 3 and 5. To reconstruct, apply Lagrange's interpolation method and compute q(0).


q(0) = Σ_{i=1}^{3} y_i · Π_{j≠i} (0 − x_j) / (x_i − x_j)  (mod p)

= 15913 · ((−3)(−5)) / ((−1)(−3)) + 72245 · ((−2)(−5)) / ((1)(−2)) + 81608 · ((−2)(−3)) / ((3)(2))  (mod 104729)

= 15913 · 5 + 72245 · (−5) + 81608 · 1  (mod 104729)

= −200052 (mod 104729) = 9406

Thus the secret S = 9406 is recovered successfully.
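The whole example can be reproduced with a short implementation of Eqs. 1–4; a minimal Python sketch (prime-field arithmetic mod p; Python 3.8+ is assumed for the modular inverse pow(d, -1, p)):

import random

def make_shares(secret, t, n, p):
    # random polynomial of degree t-1 with constant term a0 = secret (Eq. 1)
    coeffs = [secret] + [random.randrange(p) for _ in range(t - 1)]
    share = lambda x: sum(c * pow(x, k, p) for k, c in enumerate(coeffs)) % p
    return [(x, share(x)) for x in range(1, n + 1)]    # Eq. 2 with x_i = i

def reconstruct(shares, p):
    # Lagrange interpolation at x = 0 (Eq. 4)
    secret = 0
    for xj, yj in shares:
        num, den = 1, 1
        for xk, _ in shares:
            if xk != xj:
                num = num * (-xk) % p
                den = den * (xj - xk) % p
        secret = (secret + yj * num * pow(den, -1, p)) % p
    return secret

# the worked example above:
p = 104729
print(reconstruct([(2, 15913), (3, 72245), (5, 81608)], p))   # prints 9406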

3 Recent Research Advances in Secret Sharing Schemes
Threshold SSS were proposed independently by Shamir and Blakley [2], and since then much research has been carried out in this area. Both schemes implement t-out-of-n schemes: polynomial-based constructions are used by Shamir, whereas vector space constructions are used by Blakley. Schemes based on number theory were also introduced for threshold secret sharing. The Mignotte scheme [3] is based on the Chinese Remainder Theorem (CRT) and modular arithmetic: the shares are created using a special sequence of integers called a Mignotte sequence, and the secret is reconstructed by solving a set of congruence equations using the CRT. This scheme is not perfect. Another, perfect, CRT-based scheme was introduced by Asmuth and Bloom [4]; it also uses a special sequence of pairwise co-prime positive integers. Threshold Secret Sharing (TSS) schemes are thus primarily based on polynomials, vector spaces and number theory. The following section gives a brief description of the various secret sharing methods that appeared in the literature during the period 2006 to 2022. Bai [5] presented a robust (k, n) threshold SSS with k access levels using matrix projection. The secrets are represented as the members of a square matrix S. Here, the size of the share is comparatively smaller than the size of the secret. Due to its information concealment ability, it has many desirable properties; even though the technique is not a perfect SSS, the secrets are made secure. The scheme is capable of sharing multiple secrets and can be used as it is efficient, secure and reliable. Kaya et al. [6] studied the possibility of applying threshold cryptography with the Asmuth-Bloom SSS and proposed three functional secret sharing schemes for ElGamal and RSA. These schemes were based on the Asmuth-Bloom SSS and are considered the first security-enhanced systems of this kind. It would be a noteworthy improvement if there were a method to obtain the messages or signatures without the correction phase; also, extra characteristics like robustness and proactivity can be integrated into future techniques. The concepts described in that paper helped to obtain functional secret sharing for various public key cryptographic systems.
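Since several of the surveyed schemes ([3, 4, 6, 18]) build on the CRT, a minimal sketch of CRT-based recovery in the Asmuth-Bloom style is given here; the tiny (2, 3) sequence 5, 11, 13, 17 and the secret are illustrative values chosen to satisfy the sequence condition, not parameters from the cited papers:

def crt(residues, moduli):
    # Chinese Remainder Theorem: the unique y (mod prod(moduli)) with
    # y = r_i (mod m_i) for pairwise co-prime moduli
    M = 1
    for m in moduli:
        M *= m
    y = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        y += r * Mi * pow(Mi, -1, m)
    return y % M

# (2, 3) Asmuth-Bloom style example: m0 = 5, moduli 11, 13, 17 (11*13 > 5*17)
m0, moduli = 5, [11, 13, 17]
S, A = 3, 20
y = S + A * m0                       # blinded secret, y < 11*13
shares = [y % m for m in moduli]     # [4, 12, 1]
y_rec = crt(shares[:2], moduli[:2])  # any 2 shares suffice
print(y_rec % m0)                    # 3, the secret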


Tang and Yao [7] proposed a novel (t, n) TSS method based on Secure Multiparty Computation (SMC), in which the secret K remains protected even if t − 1 users are fraudulent, provided the discrete logarithm problem is hard; it also relies on multi-prover zero-knowledge arguments. As in the distribution protocol of Shamir's TSS scheme, the dealer distributes the secret K; an SMC protocol allows any set of t users to rebuild K, and the t participants can confirm that they possess K using multi-prover zero-knowledge arguments. Wang and Wong [8] studied the communication efficiency of secret reconstruction in SSS. They proved that there is a relation between the communication cost and the number of users included in the reconstruction process, relaxed the need for confidential point-to-point communication channels assumed in traditional methods, and showed that partial broadcast channels are enough to perform secret reconstruction. An interesting research challenge is to discover more efficient structures with optimal or suboptimal communication complexity. Lao and Tartary [9] analyzed the properties of threshold-changeable schemes. A novel CRT-based secret sharing scheme is introduced which allows several threshold alterations after the initial set-up phase without requiring any communication with the dealer. One advantage of their construction is that the secret is always guaranteed to be reconstructed after any threshold change, unlike other schemes where recovery is only probabilistic. Handling users who deviate from the threshold update procedure is one of the challenging issues faced by the system. Bai and Zou [10] introduced a new and safe proactive secret sharing (PSS) technique built on matrix projection. This technique permits sharing of more than one secret, whereas Shamir's method allows sharing of one secret at a time. The method also concentrated on forming a distributed PSS scheme to withstand passive attacks, which are hard to fix; a matrix is generated using Pythagorean triples, which ensures security against passive attacks. Lin and Harn [11] proposed two variations of Shamir's SSS. In the first method, each participant preserves both the x and y coordinates of a polynomial point as their confidential share; any t private shares, together with some public shares, then allow secret reconstruction. These revised techniques are proved to be ideal and perfect. The suggested method utilizes polynomials to generate shares for the participants and applies Lagrange's interpolation method to rebuild the secret; a multi-level TSS method is created for secret reconstruction and proved to be safe and secure. Wang et al. [12] proposed a multiple SSS based on the matrix projection method. This method has the benefit that there is no restriction on the number of secrets that can be shared, and it is not required to fill dummy elements into the secret matrix. It attains a share size that does not differ from that of a single secret. The proactive feature of the matrix projection technique increases the scheme's overall security: it can periodically update shares without changing the secrets. The method is not completely verifiable, owing to the features of the projection matrix. As it is only required to modify the public remainder matrix to distribute another set of secrets, the scheme is said to be dynamic with respect to secret change.

350

N. Francis and T. Monoth

required to modify the public remainder matrix to distribute another set of secrets, the scheme is said to be dynamic to secret change. Shi and Zhong [13] investigated the issue of changing the threshold limit of Shamir’s threshold technique without assistance of the dealer. The challenging issue of the existing known methods is that it requires the dealer to pre-calculate the public information as per each threshold limit or announce the public function in advance. A new method with the threshold value increasing in the semi-honest model is presented in this paper. In the proposed technique, all users work together to take the role of dealer and to furnish the share renewal process. Each user stores only one share which has the same size as that of the secret. Hence the proposed method is perfect, secure and ideal. The challenge is how to fulfill the protocol in the malicious model. Lin and Harn [14] proposed a new technique to model a multi-SSS with unconditional security. Here, the dealer creates a polynomial with t-1 degree to allow n participants to share t master secrets. Every participant stores a single private share and applies this share to retrieve t secrets successively. This multi threshold scheme turns out to be more efficient which uses Shamir’s scheme. Sun et al. [15] utilized and optimized SSS rather than Lagrange’s method. Two way authentications are offered to guarantee that only the approved sets of participants can retrieve the correct session key. Every participant has to keep only one secret share for all sessions. The secret shares stored by participants can be used at all times instead of altering with various group keys as they exist before the generation of the group key. Farras and Padro [16] presented a normal description for the group of hierarchical access structures and an overall representation of the ideal hierarchical access structures. It is shown that each hierarchical matroid port allows an ideal linear SSS over a finite field. An open challenge is the optimization of the size of the shares with respect to the size of the secret in SSS. Zhang et al. [17] analyzed the security of four recently presented VSS. The study revealed that each of these techniques is vulnerable to cheating by dishonest dealers. The dealer has to announce some repeated information for the purpose of consistency testing. The dealer randomly provides secrets for sharing and also one reusable share can be stored by every shareholder. Every shareholder can find out the cheating by other shareholders with the help of a non-interactive protocol. The dealer and the shareholders are connected by an open channel. The dealer is not aware of the other user’s shadow. Singh et al. [18] presented multi-level SSS based on CRT. In this method, participants are categorized into various security subsets and each participant will hold a part of multi-secret. Multiple secrets are distributed between the users as each subset in successive order. Upper level shares can be used by a lower level subset to recover the secret. Verification is offered to identify cheating in the proposed technique. Asmuth-Bloom sequence is utilized to allocate multiple secrets. It has a controlled range to share a single secret. The shares are reusable in this method. It is unconditionally secure and efficient. Muthukumar and Nandhini [19] discussed two algorithms for securely sharing medical data in the cloud. SSS used polynomials to divide the data whereas the
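Several of the schemes above, including Singh et al. [18] and the original Asmuth-Bloom construction [3], rest on the Chinese Remainder Theorem. The following is a minimal, illustrative Mignotte-style (2, 3) sketch; the moduli and the toy secret are assumptions chosen for demonstration, not parameters from any cited paper.

```python
# Minimal Mignotte-style (2, 3) CRT secret sharing sketch (illustrative only).
from math import prod

# Pairwise co-prime moduli; the secret must lie strictly between the largest
# single modulus (17) and the smallest product of 2 moduli (11 * 13 = 143).
moduli = [11, 13, 17]          # assumed toy parameters
secret = 100

shares = [(m, secret % m) for m in moduli]   # each share is (modulus, residue)

def reconstruct(subset):
    """Recover the secret from any 2 (modulus, residue) pairs via CRT."""
    M = prod(m for m, _ in subset)
    x = 0
    for m, r in subset:
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)         # modular inverse of Mi mod m
    return x % M

assert reconstruct(shares[:2]) == secret     # shares 1 and 2 suffice
assert reconstruct(shares[1:]) == secret     # so do shares 2 and 3
```

A single share reveals only the secret modulo one small number, which is why such CRT schemes are listed as "not perfect" in Table 1 below.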

Muthukumar and Nandhini [19] discussed two algorithms for securely sharing medical data in the cloud. The SSS uses polynomials to divide the data, whereas the information dispersal algorithm uses matrices. The approach decreases transmission cost and space complexity and increases the security of the system. It is used for dynamic databases where the participants are not involved, and it has the capability of sharing highly confidential data in a multi-cloud environment. Deepika and Sreekumar [20] presented two modifications of an SSS using Gray code and XOR. The shares are created with Gray code and the secret is retrieved by applying the XOR operation to the shares. Using this method, two different schemes, a 7-out-of-7 scheme and a 3-out-of-7 scheme, are constructed. Two groups of shares, a Qualified set and a Forbidden set, are also created: the Qualified set includes 3 of the 7 shares and the Forbidden set includes 4 of the 7 shares. The presented technique can be used with algorithms in cryptography and secret sharing. Basit et al. [21] proposed a hierarchical, multi-stage SSS based on a one-way function and polynomials. This technique has the same security level as Shamir's method; it differs in the level of hardness of the one-way function. Shareholders are divided into various levels based on a hierarchical access structure, and every level has a separate threshold value. Only one share of a multi-secret is held by each shareholder, thereby decreasing a participant's difficulty in holding more than one share. It is not essential to refresh the shares for future communication. Shares of higher-level shareholders can be used for reconstructing the secrets if the number of available shareholders is smaller than the threshold. Babu et al. [22] considered a multi-stage SSS based on CRT. This method requires only n − t + k + 2 public values. To reconstruct the secret, only one Lagrange interpolation polynomial is needed. All shareholders can together verify whether the share submitted by a shareholder is exact or not. By computing n additional public values and publishing them on the bulletin board, cheater identification can be performed by the participants. The size of the secret is increased k times, even though each shareholder stores only one share for all the secrets. Liu et al. [23] discussed cheating issues in bivariate polynomial-based SSS and proposed two algorithms for cheater identification. The first can detect cheaters among the m participants who are included in secret reconstruction. The second attains cheater identification with greater capability: it is performed with the cooperation of the remaining n − m participants who are not engaged in secret reconstruction. The proposed algorithms are thus competent in terms of their cheater identification capabilities. Jia et al. [24] proposed a threshold changeable SSS in which the threshold can be modified in the interval [t, t′] without renewing the shares. In this method, another threshold can be initiated at any time using the public broadcast channel. The scheme makes use of a suggested sequence of nested closed intervals of large co-prime numbers. Harn et al. [25] presented an extended SSS, called secret sharing with secure secret reconstruction, in which the secret can be safeguarded in the retrieval stage from both inside and outside attacks. Outsiders have to capture all the published shares to reconstruct the secret, and since a larger number of shares is required in reconstruction, security is enhanced. The fundamental scheme is extended so that the reconstructed secret is only available to participants.
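To illustrate the XOR-based reconstruction used by Deepika and Sreekumar [20], the sketch below implements a generic n-out-of-n XOR split; the Gray-code share construction of the original paper is not reproduced, and the message and share count are assumptions.

```python
# Illustrative n-out-of-n XOR secret splitting (not the exact Gray-code
# construction of [20]): n-1 random shares plus one "keystone" share whose
# XOR with the others restores the secret.
import secrets
from functools import reduce

def split_xor(secret: bytes, n: int) -> list[bytes]:
    shares = [secrets.token_bytes(len(secret)) for _ in range(n - 1)]
    keystone = bytes(reduce(lambda a, b: a ^ b, col)
                     for col in zip(secret, *shares))
    return shares + [keystone]

def combine_xor(shares: list[bytes]) -> bytes:
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*shares))

msg = b"top secret"
parts = split_xor(msg, 7)          # a 7-out-of-7 scheme, as in [20]
assert combine_xor(parts) == msg   # all 7 shares are required
```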


Kandar et al. [26] presented a VSS with cheater identification. The method enables combiner verification by the shareholders, to verify whether the request for share submission is from an authenticated combiner or not. The scheme has been proved to withstand different types of attacks. In this method, each user is allotted a shadow share, which eliminates the risk of reconstruction by combining the shares of fewer than the minimum number of participants. The authenticity of the combiner is also checked by each participant before submitting their shares, which eliminates the risk of an opponent acting as the combiner. Meng et al. [27] proposed a threshold changeable SSS using a bivariate symmetric polynomial which is both cheating-immune and prevents illegal attacks by participants. In the basic threshold changeable scheme, during secret reconstruction the threshold is permitted to increase from t to the exact number of participating shareholders. If valid shares are produced by all participants, then the secret can be recovered. Moreover, a revised TCSS scheme is proposed in order to reduce the coefficients of the shares for each participant. Huang et al. [28] used the error correction ability of QR codes to propose an (n, n) TSSS. A secret QR code can be divided and encrypted into n cover QR codes. The created QR codes still contain cover messages, so that unauthorized people cannot detect the presence of secret messages during transmission over a public channel. The secret QR code can be easily recreated using the XOR operation if all n authorized participants provide their shares. The method is proved to be both feasible and robust. Yuan et al. [29] proposed a hierarchical multi-SSS based on linear homogeneous recurrence (LHR) relations and a one-way function. This method decreases the computational complexity of HSSS from exponential time to polynomial time. It can simultaneously share multiple secrets, and every participant holds only a single share during the execution. This method is both perfect and ideal; it also avoids the verification of non-singularity of the matrices required in earlier methods. Ding et al. [30] analyzed the security of existing secure secret reconstruction schemes based on bivariate polynomials. A theoretical model for the construction of secure secret reconstruction schemes in the dealer-free and non-interactive scenario is proposed. The share sizes are identical to those of other existing insecure (t, n) SSR schemes.
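Since a majority of the surveyed schemes build on Shamir's polynomial construction [1], a minimal sketch of share generation and Lagrange-interpolation recovery over a prime field follows; the prime and the toy secret are illustrative assumptions (a real implementation would also draw coefficients from a cryptographically secure source).

```python
# Minimal Shamir (t, n) sketch over GF(P): shares are points on a random
# degree-(t-1) polynomial with the secret as constant term; any t points
# recover the secret by Lagrange interpolation at x = 0.
import random  # for brevity; use the secrets module in practice

P = 2**31 - 1  # a Mersenne prime, assumed field size for this toy example

def make_shares(secret, t, n):
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    poly = lambda x: sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, poly(x)) for x in range(1, n + 1)]

def recover(points):
    # Lagrange interpolation evaluated at x = 0, all arithmetic mod P.
    secret = 0
    for j, (xj, yj) in enumerate(points):
        num, den = 1, 1
        for m, (xm, _) in enumerate(points):
            if m != j:
                num = num * (-xm) % P
                den = den * (xj - xm) % P
        secret = (secret + yj * num * pow(den, -1, P)) % P
    return secret

shares = make_shares(123456789, t=3, n=5)
assert recover(shares[:3]) == 123456789   # any 3 of the 5 shares suffice
assert recover(shares[2:]) == 123456789
```

Any t points determine the degree-(t − 1) polynomial uniquely, while any t − 1 points reveal nothing about the constant term, which is what makes the scheme perfect.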

4 Comparative Analysis of Various Secret Sharing Schemes

An analysis is made of the different methods used in secret sharing, along with their advantages and challenges. A summary of the various modifications of SSS explained in the previous sections is shown in Table 1. From Fig. 2, it can be seen that 55% of the research works are based on polynomial methods, 26% use CRT-based methods, 16% use matrix-based methods and 3% use vector-based methods. Threshold secret sharing schemes are mainly based on polynomials, vector spaces, matrices and number theory. Other threshold schemes are hierarchical threshold secret sharing, weighted threshold secret sharing and compartmented secret sharing. Based on this review, the polynomial-based method can be considered the best choice.


Table 1 Comparison of various SSS

| Sl. No | Authors and year | SS schemes used | Techniques used | Advantages/Challenges |
|---|---|---|---|---|
| 1 | Shamir [1] | Threshold | Polynomial based | Information theoretic security, minimal, extensible, dynamic, flexible, not verifiable |
| 2 | Blakley [2] | Threshold | Vector space based | Secret is an element of a vector space, shares are n distinct (t−1)-dimensional hyperplanes, not perfect |
| 3 | Asmuth and Bloom [3] | Threshold | CRT based | Uses a special sequence of pairwise co-prime integers, perfect |
| 4 | Mignotte [4] | Threshold | CRT based | Uses a Mignotte sequence of integers, not perfect |
| 5 | Bai [5] | Threshold | Matrix projection based | Information concealment capability, smaller share size, not perfect |
| 6 | Kaya et al. [6] | Threshold | CRT based | Function sharing schemes for RSA, ElGamal and Paillier cryptosystems, robustness, proactivity has to be integrated |
| 7 | Tang and Yao [7] | Threshold | Polynomial based | Secure multiparty computation, proof by zero-knowledge arguments |
| 8 | Wang and Wong [8] | Threshold | Polynomial based | Partial broadcast channels for secret reconstruction, easy to implement, smaller share size, requires secure multiparty cryptographic protocols |
| 9 | Lou and Tartary [9] | Threshold | CRT based | Allows multiple threshold changes, perfect security, dealer-free environment |
| 10 | Bai and Zou [10] | Proactive | Matrix projection based | Shares multiple secrets, counters passive adversary attacks, no measures to withstand active attacks |
| 11 | Lin and Harn [11] | Threshold | Polynomial based | Ideal, perfect, multi-level threshold SS |
| 12 | Wang et al. [12] | Multiple threshold | Matrix projection based | Constant share size, partially verifiable, dynamic |
| 13 | Shi and Zhong [13] | Threshold | Polynomial based | Participants renew shares, secure, perfect, ideal |
| 14 | Lin and Harn [14] | Multi threshold | Polynomial based | Shareholder keeps one private share, unconditional security |
| 15 | Sun et al. [15] | Threshold | Polynomial based | Provides mutual authentication, reduced storage cost, improved computation efficiency |
| 16 | Farras and Padro [16] | Hierarchical threshold | Polymatroid based | Total characterization of ideal hierarchical access structures, optimization of length of shares |
| 17 | Zhang et al. [17] | Verifiable | Polynomial based | No secret channels, reusable shadows, detects cheating, lacks consistency test of information |
| 18 | Singh et al. [18] | Multi-level multi-stage threshold | CRT based | Detects cheating, reusable shares, unconditionally secure, efficient |
| 19 | Muthukumar and Nandhini [19] | Threshold | Polynomial and matrix based | Reduced transmission overhead and space complexity, shares highly sensitive data in a multi-cloud environment |
| 20 | Deepika and Sreekumar [20] | Threshold | Gray code and XOR operation | Used as a cryptographic algorithm for SS and visual SS, no information loss, can be used for visual cryptography |
| 21 | Basit et al. [21] | Hierarchical multi-stage multi-secret threshold | Polynomial based | Unconditionally secure, shares can be reused, participant's risk in keeping multiple shares is minimized, no verification |
| 22 | Babu et al. [22] | Multi-stage threshold | Polynomial based | Fewer public values, cheater identification, size is increased by k times |
| 23 | Liu et al. [23] | Threshold | Polynomial based | Identifies cheaters via the m participants engaged in secret reconstruction and the remaining n−m participants not engaged in reconstruction |
| 24 | Jia et al. [24] | Threshold | CRT based | Threshold can be changed in an integer interval, smaller share size and low complexity for recovery |
| 25 | Harn et al. [25] | Threshold | Polynomial based | Secure from insider and outsider attacks, uses a symmetric bivariate polynomial to generate the shares, enhanced security |
| 26 | Kandar et al. [26] | Verifiable | Polynomial based | Cheater identification, combiner verification by the shareholders |
| 27 | Meng et al. [27] | Threshold | Polynomial based | Based on univariate and bivariate symmetric polynomials, prevents illegal participant attacks, reduces coefficients of shares, dealer-free, non-interactive and cheating immune |
| 28 | Huang et al. [28] | Threshold | Polynomial based | Utilizes the error correction capacity of QR codes, feasible, high robustness, higher security |
| 29 | Yuan et al. [29] | Hierarchical multi-secret | Polynomial based | Reduces computational complexity, shares multiple secrets, each participant holds only one share, both perfect and ideal |
| 30 | Ding et al. [30] | Threshold | Polynomial based | Based on asymmetric bivariate polynomials, easy to construct, same share size, dealer-free and non-interactive |

Fig. 2 Graphical representation of techniques used in SSS (Polynomial 55%, CRT 26%, Matrix 16%, Vector 3%)


5 Conclusion

A comprehensive study of SSS for information security is presented in this paper, which also compares and analyzes the recent research advances in secret sharing made by different researchers. From the literature it is clear that several SS methods have been investigated to overcome the challenges of the fundamental methods proposed by Shamir and Blakley. Most of the studies are based on modifications of TSS schemes, as they are easy to implement. Hence, future studies can focus on multi-level schemes using TSS. As there is a need to improve the efficiency and security of existing SS methods and to obtain a cheating-immune system, an approach combining TSS with VSS schemes is suggested based on this review.

References

1. Shamir A (1979) How to share a secret. Commun ACM 22(11):612–613
2. Blakley GR (1979) Safeguarding cryptographic keys. In: Managing requirements knowledge. IEEE Computer Society, New York, p 313
3. Asmuth C, Bloom J (1983) A modular approach to key safeguarding. IEEE Trans Inf Theory 29(2):208–210
4. Mignotte M (1982) How to share a secret. In: Workshop on cryptography. Springer, Berlin, Heidelberg, pp 371–375
5. Bai L (2006) A strong ramp secret sharing scheme using matrix projection. In: 2006 international symposium on a world of wireless, mobile and multimedia networks. IEEE, pp 5–656
6. Kaya K, Selçuk AA (2007) Threshold cryptography based on Asmuth-Bloom secret sharing. Inf Sci 177(19):4148–4160
7. Tang C, Yao ZA (2008) A new (t, n)-threshold secret sharing scheme. In: International conference on advanced computer theory and engineering. IEEE, pp 920–924
8. Wang H, Wong DS (2008) On secret reconstruction in secret sharing schemes. IEEE Trans Inf Theory 54(1):473–480


9. Lou T, Tartary C (2008) Analysis and design of multiple threshold changeable secret sharing schemes. In: International conference on cryptology and network security. Springer, Berlin, Heidelberg, pp 196–213
10. Bai L, Zou X (2009) A proactive secret sharing scheme in matrix projection method. Int J Secur Netw 4(4):201–209
11. Lin C, Harn L, Ye D (2009) Ideal perfect multilevel threshold secret sharing scheme. In: Fifth international conference on information assurance and security, vol 2. IEEE, pp 118–121
12. Wang K, Zou X, Sui Y (2009) A multiple secret sharing scheme based on matrix projection. In: 33rd annual IEEE international computer software and applications conference, vol 1. IEEE, pp 400–405
13. Shi R, Zhong H (2009) A secret sharing scheme with the changeable threshold value. In: International symposium on information engineering and electronic commerce. IEEE, pp 233–236
14. Lin C, Harn L (2012) Unconditionally secure multi-secret sharing scheme. In: IEEE international conference on computer science and automation engineering, vol 1. IEEE, pp 169–172
15. Sun Y, Wen Q, Sun H, Li W, Jin Z, Zhang H (2012) An authenticated group key transfer protocol based on secret sharing. Procedia Eng 29:403–408
16. Farras O, Padró C (2012) Ideal hierarchical secret sharing schemes. IEEE Trans Inf Theory 58(5):3273–3286
17. Liu Y, Zhang F, Zhang J (2016) Attacks to some verifiable multi-secret sharing schemes and two improved schemes. Inf Sci 329:524–539
18. Singh N, Tentu AN, Basit A, Venkaiah VC (2016) Sequential secret sharing scheme based on Chinese remainder theorem. In: IEEE international conference on computational intelligence and computing research. IEEE, pp 1–6
19. Muthukumar KA, Nandhini M (2016) Modified secret sharing algorithm for secured medical data sharing in cloud environment. In: Second international conference on science technology engineering and management. IEEE, pp 67–71
20. Deepika MP, Sreekumar A (2017) Secret sharing scheme using gray code and XOR operation. In: Second international conference on electrical, computer and communication technologies. IEEE, pp 1–5
21. Basit A, Kumar NC, Venkaiah VC, Moiz SA, Tentu AN, Naik W (2017) Multi-stage multi-secret sharing scheme for hierarchical access structure. In: International conference on computing, communication and automation. IEEE, pp 557–563
22. Babu YP, Kumar TP, Swamy MS, Rao MV (2017) An improved threshold multistage secret sharing scheme with cheater identification. In: International conference on big data analytics and computational intelligence. IEEE, pp 392–397
23. Liu Y, Yang C, Wang Y, Zhu L, Ji W (2018) Cheating identifiable secret sharing scheme using symmetric bivariate polynomial. Inf Sci 453:21–29
24. Jia X, Wang D, Nie D, Luo X, Sun JZ (2019) A new threshold changeable secret sharing scheme based on the Chinese Remainder Theorem. Inf Sci 473:13–30
25. Harn L, Xia Z, Hsu C, Liu Y (2020) Secret sharing with secure secret reconstruction. Inf Sci 519:1–8
26. Kandar S, Dhara BC (2020) A verifiable secret sharing scheme with combiner verification and cheater identification. J Inf Secur Appl 51:102430
27. Meng K, Miao F, Huang W, Xiong Y (2020) Threshold changeable secret sharing with secure secret reconstruction. Inf Process Lett 157:105928
28. Huang PC, Chang CC, Li YH, Liu Y (2021) Enhanced (n, n)-threshold QR code secret sharing scheme based on error correction mechanism. J Inf Secur Appl 58:102719
29. Yuan J, Yang J, Wang C, Jia X, Fu FW, Xu G (2022) A new efficient hierarchical multi-secret sharing scheme based on linear homogeneous recurrence relations. Inf Sci 592:36–49
30. Ding J, Ke P, Lin C, Wang H (2022) Bivariate polynomial-based secret sharing schemes with secure secret reconstruction. Inf Sci 593:398–414

Chapter 30

A Comparative Study on Sign Language Translation Using Artificial Intelligence Techniques

Damini Ponnappa and Bhat Geetalaxmi Jairam

1 Introduction

Gestures are naturally used to convey meaning among humans, and evolution in the IT area has had a vital impact on the manner in which individuals interact with each other. Research has always been centered around the exchange of information [1]. Communication facilitates interaction between humans to reciprocate emotions and intentions. The community of deaf-mutes faces plenty of challenges in communicating with hearing people [2]. Typically, this problem is solved by a dedicated person who serves as an interpreter to facilitate communication. Nevertheless, a substitute solution must be offered, as a human translator, unlike a computer program, may not be available at any given time [1]. Sign Language (SL) is a form of nonverbal communication [3]. In SL, gestures and signs are grouped to form a single language. SL uses fingers, hands, arms, eyes, head, face, and more to communicate, and each gesture has its own meaning; understanding the gestures is the key to understanding the meaning of the words. When a person using gestural SL solely for communication tries to communicate with someone who does not understand it, communication breaks down. It is important to note that every nation possesses its own sign language; in India it is referred to as Indian Sign Language (ISL) [4]. Gesture is the general means of communication for people who face hardship in speaking and hearing. Despite not being able to communicate with most people, they can interact well with each other [1]. Various analysts are engaged in developing designs that are transforming how humans and computers interact with each other in light of advances in science and engineering.


Fig. 1 Major subsets of Artificial Intelligence

Computer programs are developed in such a fashion that they can help in the translation of sign language to a textual format, covering frames that are both static and dynamic [3]. With recent advancements in artificial intelligence, several algorithms categorized under deep learning and machine learning have been developed; these improve the quality and accuracy of sign language prediction.

1.1 Artificial Intelligence (AI)

The replication of human intelligence by machines that can simulate human actions and even think like humans is labeled artificial intelligence. The term may also apply to any machine that exhibits human characteristics, such as problem-solving and learning. Artificial intelligence is proving to be highly useful in the prediction of sign language due to its rapid development. The most popular subsets of AI are machine learning and deep learning, as shown in Fig. 1.

1.2 Machine Learning (ML)

The subsection of AI that enables machines to learn and improve from experience rather than being explicitly programmed is called machine learning. Data acquisition and self-learning are functions of machine learning. To begin the machine learning process, real-time data or collected information is used. In order to draw conclusions from the examples presented, it seeks to find patterns in the data. By removing the need for human intervention and modifying the behavior of the computers, ML enables computers to train by themselves.


1.3 Deep Learning (DL)

Deep learning is a subdivision of AI and ML that imitates the way humans gain insight. DL is highly useful in areas that involve collecting, inspecting and depicting enormous proportions of data; it accelerates these tasks and reduces their complexity. At its most fundamental level, DL can be considered a means of automating predictive analytics. DL algorithms are built in a hierarchy of increasing complexity and abstraction, unlike classic ML algorithms, which are comparatively straightforward.

2 Literature Review

• Someshwar et al. [5] used TensorFlow, a library utilized for the design and development of the model. To serve the purpose of image recognition, a DL algorithm, the Convolutional Neural Network (CNN), was employed. The CNN algorithm assists in translating the images into the form of a matrix which is recognized by the developed model and made ready for classification. OpenCV behaves as the eyes of the system, capturing and processing real-time gestures made using hands.
• Yusnita et al. [1] made use of Computer Vision, which is concerned with obtaining images with the help of image processing and extracting the key specifics of the image. A classification procedure compares and classifies the current gesture performed by the user using the trained model. The basis of the procedure is ML, and an Artificial Neural Network (ANN) is the categorization method employed in the conducted research.
• Guo et al. [6] introduced a basic Convolutional Neural Network for image categorization. An examination was conducted by taking into consideration alternative learning-rate settings and different optimization algorithms. This provided the optimal parameters for picture classification using the Convolutional Neural Network. It was also observed how the composition of different Convolutional Neural Networks affects picture classification results.
• Harini et al. [3] generated a model for the recognition of sign language that transforms signs into text in both static and dynamic frames. The gestures are pre-processed and photographed with a webcam. Background subtraction is employed to remove the background in the pre-processing stage, allowing the model to adjust to any change in the background. A key challenge faced with software-based designs is that the image must be correctly collected and filtered. Figure 2 depicts a summary of the layers used in the proposed CNN model.

Fig. 2 Overview of the CNN model

The Convolution Layer. This layer performs the task of identifying specific characteristics of an image. Feature extraction is performed by filtering the input signal using the image and kernel matrices; each cell of the resulting feature map is the dot product of an image patch with the kernel matrix.
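As a concrete illustration of the convolution operation just described, the NumPy sketch below computes the dot product of the kernel with every image patch (a "valid" cross-correlation, as CNN libraries implement it); the toy image and kernel are assumptions.

```python
# Sliding-window "convolution" (cross-correlation) as used in CNN layers:
# each output cell is the dot product of an image patch with the kernel.
import numpy as np

def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 "image"
edge_kernel = np.array([[1., 0., -1.]] * 3)        # simple vertical-edge filter
feature_map = conv2d_valid(image, edge_kernel)     # shape (3, 3)
print(feature_map)
```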



The Max-pooling Layer. It may be necessary to reduce a convolved image without sacrificing its features if it is too large. In a max-pooling window, only the highest value in a certain area is chosen.

The Flattening Layer. The multi-dimensional matrices are converted to 1D arrays in the flattening layer so that they can be fed into a classifier.

• Shinde and Kagalkar [7] utilized the Canny Edge Detection (CED) technique, since it has higher accuracy and consumes less time. The method effectively removes noise from the input and detects a clear image for the next level of processing. The rate of error produced by the algorithm is relatively low, with localized edge points and single responses. Java NetBeans was used to create the system.
• Badhe and Kulkarni [4] applied to the system the movements identified from the collected input photos and transformed them into a graded setup, i.e. the gestures made by hands are expressed in the form of English words. The input in video format is separated into single frames, and every frame is sent to the function which handles pre-processing. Each frame passes through numerous filters to minimize unnecessary regions and improve the speed.
• Mapari and Kharat [8] developed a system based on a type of supervised machine learning algorithm called the Support Vector Machine (SVM). The data was obtained from students who already knew how to make sign gestures or had undergone training to do so. A still camera with 1.3 million pixels was used to record the data. Because only a few motions were taken into account, the precision of the trained model was determined to be 93.75%.
• Wu and Nagahashi [9] used a novel training methodology based on an AdaBoost classifier for training on the images' Haar features. This AdaBoost operates on Haar-like characteristics, including the variation between frames, to study the skin color by instantly detecting whether the hand is left or right. The classifier has an enhanced tracking algorithm that captures and processes the patch of the hand from the preceding frame to generate its fresh patch for the present frame. The algorithm correctly anticipates gestures 99.9% of the time.

Fig. 3 Example of Haar-like features

As shown in Fig. 3, a collection of rectangle masks is used to calculate Haar-like features. The sum of pixel intensities inside the white-colored rectangle is deducted from the sum of pixel intensities inside the black-colored rectangle to determine the value of each feature. AdaBoost is an ML technique that is applied in a step-by-step manner. The approach chooses weak classifiers depending on the Haar-like characteristics before combining them to improve execution: all the weak classifiers are combined to form a single strong classifier. To reduce training error, the stronger classifier adaptively modifies the weights of the samples. This type of weight adjustment for the stronger classifier is too slow to operate in real time. As a result, the weak classifiers are stacked in a cascade, with subsequent classifiers being trained exclusively on examples that have passed through the previous classifiers.
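As an assumption-level illustration (not the authors' code), the sketch below evaluates a two-rectangle Haar-like feature in constant time using an integral image, the standard trick that makes such cascades fast.

```python
# Two-rectangle Haar-like feature via an integral image: the sum of any
# rectangle is obtained from 4 lookups, so each feature costs O(1).
import numpy as np

def integral_image(img: np.ndarray) -> np.ndarray:
    # Pad with a leading row/column of zeros so ii[i, j] = sum of img[:i, :j].
    return np.pad(img.cumsum(0).cumsum(1), ((1, 0), (1, 0)))

def rect_sum(ii, top, left, h, w):
    return (ii[top + h, left + w] - ii[top, left + w]
            - ii[top + h, left] + ii[top, left])

def haar_two_rect(ii, top, left, h, w):
    """Black (right) half sum minus white (left) half sum of an h x w window."""
    half = w // 2
    return (rect_sum(ii, top, left + half, h, half)
            - rect_sum(ii, top, left, h, half))

img = np.random.rand(24, 24)           # toy grayscale patch (assumed size)
ii = integral_image(img)
print(haar_two_rect(ii, top=4, left=6, h=8, w=12))
```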

• Nath and Anu [10] applied the Perceptron, a neural network mechanism, to develop the system. SciPy, a built-in Python tool, was used to create the design. Characteristics including accuracy, F1 score and recall are used in the system's evaluation. The developed model uses a strategy called pruning to help cut the network size and increase performance. The hidden layers were gradually increased from 10 to 120 throughout the training phase.
• Sajanraj and Beena [2] made use of the ARM CORTEX A8 processor to implement sign recognition in the system. To capture and process images in real time, the OpenCV Python library was used. Haar training characteristics were employed to predict both positive and negative pictures.
• Rao et al. [11] designed a model that was trained on a dataset that included 300 distinct ISL number pictures. In 22 epochs, the system's accuracy was 99.56%. Various activation functions and learning rates were used to test the model. The Keras API, backed by TensorFlow, was utilized to build the backend. The algorithm correctly anticipated the static symbols when it was tested with a hundred photos for every sign.
• Bantupalli and Xie [12] employed 4 convolution layers, where the window size varied in the system; it also included the activation function called ReLu.


The developed model was put to the test with three different pooling algorithms, with stochastic pooling proving to be the most effective. For feature extraction, stochastic pooling (2 layers) was used.
• Suresh et al. [13] employed the initial CNN design for the recognition of gestures, to draw out spatial characteristics. The Recurrent Neural Network (RNN) model was utilized to bring out temporal data taken from the video stream. By making use of identical samples for training and testing, the CNN and RNN models were tested separately. This guarantees that neither the CNN nor the RNN makes use of test data to improve prediction while the training phase is in progress. To train both models, the ADAM optimizer, which minimizes loss, was utilized.
• Jiang and Chen [14] proposed a system to predict SL motions produced by users; the designed model was constructed utilizing a 2-layered CNN. For classifying and comparing prediction accuracy, two separate models were utilized. The optimizers, namely SGD and Adam, both using the categorical cross-entropy cost function, were applied separately to optimize the output. Even with blurry images and under varying lighting situations, the model was found to accurately anticipate gestures. The developed system identified a total of six distinct SLs, with SGD achieving a precision of 99.12% and Adam a precision of 99.51%. The accuracy is higher when the Adam optimizer is used.
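As an illustration of the kind of compact model compared in [14], here is a two-convolution-layer Keras sketch compiled with the Adam optimizer; the input shape, layer widths and class count are assumptions rather than the cited papers' configurations.

```python
# A small 2-convolution-layer CNN of the kind compared in [14], compiled
# with the Adam optimizer and categorical cross-entropy (assumed shapes).
from tensorflow import keras
from tensorflow.keras import layers

num_classes = 6  # e.g., six distinct signs, as in [14]

model = keras.Sequential([
    layers.Input(shape=(64, 64, 1)),          # assumed grayscale input size
    layers.Conv2D(32, 3, activation="relu"),  # feature extraction
    layers.MaxPooling2D(),                    # keep strongest activations
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),                         # multi-D maps -> 1-D vector
    layers.Dense(num_classes, activation="softmax"),
])

model.compile(optimizer="adam",               # swap in "sgd" to compare optimizers
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Swapping the optimizer string lets the Adam-versus-SGD comparison described above be reproduced on any dataset.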

3 Comparison of Artificial Intelligence Techniques

Table 1 gives the accuracy of each AI technique employed for sign language translation, along with suggestions to improve it. The performance of the AI algorithms based on accuracy is depicted in Fig. 4.

4 Conclusion

This paper presents an analysis of prior work on the recognition and translation of sign language employing several artificial intelligence techniques. It is understood that the input dataset and the selection of features are equally critical in acquiring better prediction results. This comparative analysis observed that the Convolutional Neural Network and the AdaBoost classifier are the prominent techniques that individually yield higher accuracy in detecting and predicting sign language.


Table 1 Comparison table of existing artificial intelligence techniques

| SL No | AI technique used | Dataset input | Method used | Remarks |
|---|---|---|---|---|
| 1 | Convolutional Neural Network [5] | Real-time hand gestures | CNN is used for image recognition and for making the classifier ready | CNN has an individual accuracy of 99.91%, which is commendable |
| 2 | Artificial Neural Network [1] | SIBI—an Indonesian SL | ANN is the image classification method used here | ANN has an individual accuracy of 90%, which can be improved by increasing the count of hidden layers |
| 3 | Convolutional Neural Network [6] | The MNIST dataset | CNN is the image classification method used here | CNN has an individual accuracy of 99.91%, which is commendable |
| 4 | Convolutional Neural Network [3] | Real-time hand gestures | CNN is used for the analysis and classification of images | CNN has an individual accuracy of 99.91%, which is commendable |
| 5 | The Canny Edge Detection technique [7] | Real-time hand gestures | CED removes noise from the input and helps in detecting a clear image for processing | CED has an accuracy of 91.56%, which can be improved by employing more parameters |
| 6 | Gesture Recognition Algorithm [4] | Indian Sign Language | GRA conducts data acquisition and pre-processing of signs to follow movements made by hands | GRA has an accuracy of 97.5%, which can be improved by employing more parameters |
| 7 | Support Vector Machine [8] | Real-time data from students who knew sign language | SVM is the image classification method used here | SVM has an accuracy of 93.75%, which can be improved by using an ensemble of SVMs |
| 8 | An AdaBoost classifier based on Haar-like features [9] | Videos containing sign language | An AdaBoost classifier is used for training the model centered on the images' Haar-like features | AdaBoost has an individual accuracy of 99.9%, which is commendable |
| 9 | Perceptron, a neural network mechanism [10] | Real-time hand gestures | Perceptron uses the pruning technique to reduce the size and improve the system's efficiency | Perceptron has a precision of 88%, which can be improved by increasing the count of hidden layers |
| 10 | Convolutional Neural Network, Haar classifier [2] | Indian Sign Language | The combination of CNN and Haar classifier was used for classification and image recognition | The accuracy of the model is higher at 99.56% under normal light conditions compared with 97.26% under low-light conditions; this can be improved by integrating with other neural networks |
| 11 | Convolutional Neural Network with Keras API [11] | Indian Sign Language | CNN was used for image recognition and classification along with the Keras API, which was used at the backend | The CNN with Keras model has an accuracy of 99.56%, which is acceptable but can be enhanced by making use of ensemble learning |
| 12 | Convolutional Neural Network with ReLu activation function, stochastic pooling [12] | A Kaggle dataset containing the twenty-six American sign language letters | CNN was used for image recognition and classification along with the ReLu activation function; stochastic pooling was applied to extract the features | The model has an accuracy of 99.3%, which can be enhanced by employing ensemble learning |
| 13 | Convolutional Neural Network, Recurrent Neural Network, Adam optimizer [13] | Video streams | CNN was used for gesture recognition and RNN was used for feature extraction; to train the two models, the Adam optimizer was utilized | The model has higher accuracy (99.91%) when CNN is used; the performance of RNN can be improved with ensemble learning |
| 14 | Convolutional Neural Network, SGD optimizer, and Adam optimizer [14] | 6 different sign languages | 2-layer CNN was used for the prediction of sign language, where the SGD optimizer and Adam optimizer were used for the optimization of output | The model has higher accuracy of 99.51% with the Adam optimizer and 99.12% with the SGD optimizer; SGD generalizes better and the Adam optimizer converges faster |

Fig. 4 Performance of the Techniques used based on Accuracy

5 Future Work

As per the analysis, it is clear that some techniques, such as the Support Vector Machine and the Artificial Neural Network, do not provide the expected accuracy, as the count of hidden layers employed is limited. This concern can perhaps be overcome in the future by increasing the count of hidden layers employed and by using alternative combinations or hybrid AI algorithms which apply ensemble learning over the existing techniques.

References

1. Yusnita L, Rosalina R, Roestam R, Wahyu R (2017) Implementation of real-time static hand gesture recognition using artificial neural network. CommIT J 11(2):85
2. Sajanraj TD, Beena MV (2018) Indian sign language numeral recognition using region of interest convolutional neural network. In: 2nd international conference on inventive communication and computational technologies, Coimbatore, India
3. Harini R, Janani R, Keerthana S, Madhubala S, Venkatasubramanian S (2020) Sign language translation. In: 6th international conference on advanced computing and communication systems, Coimbatore, India
4. Badhe PC, Kulkarni V (2015) Indian sign language translator using gesture recognition algorithm. In: IEEE international conference on computer graphics, vision and information security, Bhubaneswar, India
5. Someshwar D, Bhanushali D, Chaudhari V, Swathi N (2020) Implementation of virtual assistant with sign language using deep learning and TensorFlow. In: Second international conference on inventive research in computing applications, Coimbatore, India
6. Guo T, Dong J, Li H, Gao Y (2017) Simple convolutional neural network on image classification. In: IEEE 2nd international conference on big data analytics, pp 1–2
7. Shinde A, Kagalkar R (2015) Sign language to text and vice versa recognition using computer vision in Marathi. In: National conference on advances in computing, Kochi, India
8. Mapari R, Kharat G (2012) Hand gesture recognition using neural network. Int J Comput Sci Netw 1(6):48–50
9. Wu S, Nagahashi H (2013) Real-time 2D hands detection and tracking for sign language recognition. In: 8th international conference on system of systems engineering, Maui, HI, USA
10. Nath GG, Anu VS (2017) Embedded sign language interpreter system for deaf and dumb people. In: International conference on innovations in information embedded and communication systems, Coimbatore, India
11. Rao GA, Syamala K, Kishore PVV, Sastry ASCS (2018) Deep convolutional neural networks for sign language recognition. In: Conference on signal processing and communication engineering systems, Vijayawada, India
12. Bantupalli K, Xie Y (2018) American sign language recognition using deep learning and computer vision. In: IEEE international conference on big data, Seattle, WA, USA
13. Suresh S, Mithun H, Supriya MH (2019) Sign language recognition system using deep neural network. In: 5th international conference on advanced computing & communication systems, Coimbatore, India
14. Jiang S, Chen Y (2017) Hand gesture recognition by using 3DCNN and LSTM with Adam optimizer. In: Advances in multimedia information processing, Harbin, China

Chapter 31

WSN-IoT Integration with Artificial Intelligence: Research Opportunities and Challenges

Khyati Shrivastav and Ramesh B. Battula

1 Introduction

Sensor networks are formed not only from text data but also from audio, video, images and data from small-scale industrial sectors. In multimedia sensor networks there is clustering of data as well as a need for power efficiency. A robust clustering strategy affects packet data transmission under different traffic conditions. Cluster head formation, network lifetime and the data packet delivery ratio are addressed in several methods; the packet delivery ratio is used for finding out the number of successful packet transfers, as discussed by Salah ud din et al. [1]. For obtaining quality of service (QoS) in IoT, actual smart sensors need to be used for city network management with no human intervention. IoT-enabled devices are used in smart cities and in intelligent, balanced networks, as recommended by Keshari et al. [2]. IoT networks are unique networks used for a range of localized data, error localizations and other variances; IoT does not apply to all categories of environment. Localization in forests, oceans and buildings may be free-range or involve some interconnecting ranges. The IoT objective is to set up networks which perform better with minimal resources used in systems. There are IoT devices in industrial regions, schools, colleges and campuses, both outside and inside; buildings, traffic, oceans, deserts, etc., are also networked in WSN-IoT. The global positioning system (GPS) is prevalent for finding positions, but in other situations, such as indoor or other building environments, localization is less feasible, as proposed by Barshandeh et al. [3].
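The packet delivery ratio mentioned above is simply the fraction of transmitted packets that reach their destination; a one-line sketch with assumed counts is given below.

```python
# Packet delivery ratio (PDR): successfully received packets / packets sent.
def packet_delivery_ratio(received: int, sent: int) -> float:
    return received / sent if sent else 0.0

print(packet_delivery_ratio(received=942, sent=1000))  # assumed counts -> 0.942
```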


IoT is set up for millions or billions of devices in industry, smart cities and houses, where human interference is least possible. This IoT has to be integrated with machine learning (ML), deep learning (DL) and other industrial analytics for the growth of fifth generation (5G) and blockchain systems. This system has seven layers in its architecture, comprising physical devices, device-to-device (D2D) communication, edge computing, data storage, network management, application and other collaborative things. From an innovative standpoint, technologies have to be derived for IoT communities and against attackers. Sharma et al. [4] proposed that more research is thus required in the fields of IoT, blockchain, data sciences, cloud and fog computing, etc. There are many data compression techniques for wireless sensor networks. Data collection takes place on a large scale for applications related to healthcare monitoring, industrial and agricultural management and other areas of industry which come under WSN. Communication takes place in the presence of short-range devices such as Bluetooth, ZigBee, etc. Parameters of complexity include information processing, static or dynamic memory, data exchange, redundant data acquisition, security, real-time data management, robustness and consciousness for the realization of QoS, as stated by Ketshabetswe et al. [5]. There is energy minimization using modulation techniques for IoT. WSNs have a short lifespan for a given power supply. An M-ary quadrature amplitude modulation (M-QAM) system is used as the modulation technique. Abu-Baker et al. [6] proposed that cluster density and sizes are important in energy savings and form lower and upper bounds for the calculation of energy consumption and throughput. For 5G IoT networks, most importantly, there is a requirement for security and privacy. Protection of IoT networks from various unfair Internet means is a prime need in today's world. The latest technologies are required for the hybrid combination of things related to machine learning, AI and other devices, per Kumar et al. [7]. Communication and computing latency are among the factors in the coverage of wireless sensor networks for attaining server stability; there are trade-offs among latency, network coverage and stability, per Chen et al. [8]. Cloud-based AI engines are an application for host server regions to use sensor nodes for data exchange policies in a group or cluster. Various constructed or obstructed IoT networks have AI-based components for making decisions. IoT sensor nodes (SNs) and low-power wide area network (WAN) technologies work together in long-range communication. A well-planned IoT network follows the literature-based range-predicting learning techniques of Lami and Abdulkhudhur [9]. Big amounts of data are considered in the concentration of wireless sensor networks. Query-driven and query-finding WSNs work in collaboration with normal low-power sensor nodes to solve the data collection and aggregation problems related to agriculture, traffic monitoring, and earthquake and flood detection. Various protocols, issues and other applications exist for time-driven and hybrid architectures in prioritizing data transmission and achieving accuracy in a process, per Sahar et al. [10]. The Internet has emerged for sharing, with technologies to connect communication, embedded actuator and sensor nodes, readers, target IDs, etc., for establishing communication with protocols and applications of smart cities, structural health of buildings, waste management, air quality and noise monitoring, up to a certain level of traffic congestion.


Challenges of the IoT paradigm include privacy, compatibility, scalability, mobility management and cost-effectiveness. Kassab and Darabkh [11] proposed that disconnection is a problem while establishing connections among heterogeneous devices. Services of IoT now provide seamless connectivity between the virtual and physical worlds. IoT-related terms such as web-of-things (WoT), IoT cloud of things, as well as machine-to-machine (M2M) communication, are suitable in such systems, where a major factor is providing systems with sensing capability. Connectivity, storage, computational and other capabilities have led to the development of fog, cloud networking, server-based and nano-technological solutions which can work with IoTs, per Colakovic and Hadzialic [12].

2 Related Works

WSN clustering and routing methods have been investigated in previous research works. Nowadays, with the development of algorithms/protocols for IoT, it has become easier to integrate WSN-IoTs for specific AI applications. They are therefore highlighted in the following subsections.

2.1 Algorithms Related to WSN as Well as IoT

Intelligent computation and adaptation are related to WSNs in terms of deployment, changing topology structure, computation, storage and communication capability. Applications and protocols should be energy efficient, scalable and robust; the environment or context can be changing, and intelligent behavior needs to be demonstrated, as noted by Serpen et al. [13]. Wireless sensor networks have a high amount of big-data complexity related to data which is either inside or outside the networks. Data collected from networks are distributed and centralized along with other collected data outputs, per Fouad et al. [14]. The development of IoT-based energy efficient algorithms maximizes lifetime based on an analytical hierarchical process and a genetic clustering protocol (LiM-AAP-G-CP) for IoT-based areas divided into different IoT areas, dimensions, sizes of field areas for sensor nodes and residual-energy selections, as presented by Darabkh et al. [15]. Virtual and physical devices, entities and components which are heterogeneous in nature are the IoT things, which can be millions and billions in number. Internet of things devices communicate intelligently with each other and vary in their structure, abilities and other issues. Business and social network structure enabling technologies provide new open challenges for IoT, per Li et al. [16]. The Internet of things is built on the basis of challenges faced by wireless sensor networks and radio frequency identification (RFID) tags. There are layers of presentation in IoT, viz. the application layer, perception layer, presentation layer and transportation layer. Time and memory are some of the physical constraints, while other limitations include energy processing, etc.


There is a huge amount of data generated by IoT which needs to be set up in WSN-IoT networks or heterogeneous systems, and there is a danger of security attacks, threats, etc., per Jing et al. [17]. Vehicular networks, smart cities and grids also come under IoT, ranging across research areas related to memory networks, cellular networks, social networking, etc. Wireless sensor and actuator networks (WSANs), e-health issues, cloud computing and software defined networking come under the latest areas of research. For cellular networks and machine-to-machine communications, there are cluster heads and cluster neighbors for the selection of suitable 5G networks. Medium access control (MAC) and routing protocols also work with IoT in contexts related to the networking or transport layers. Geographical information and geo-social distances come under location-based services of IoT, per Rachedi et al. [18]. There is an implementation of a multi-level time-sensitive network protocol based on a platform of real-time communication systems using distributions in a variety of ways. Quality of service profiles account for packet loss during global positioning system (GPS) signal acquisition or the connection of such signals with these systems. For sensors and adapted systems there is a need to calculate latency, throughput, etc., per Agarwal et al. [19]. IoT affects millions of devices or sensors, such as smart phones and wearables, which are deployed and used for various purposes in platforms for smart farming, grids, manufacturing, etc. They involve large volumes, a heterogeneous nature of data and useful data analysis of IoT; cloud data and real-time data can be an essential part of IoT. Georgakopoulos and Jayaraman [20] presented security and privacy as important for developing large- as well as small-scale devices. WSN appears across different areas of day-to-day life, such as embedded systems, sensors and actuators, for future use. A scalable cloud for the end-user meets demands for huge amounts of data to be usefully utilized for IoT operations in industry, government and non-government organizations, per Gubbi et al. [21]. WSN is later to be involved in social IoTs (SIoTs), with the necessary resources protected for privacy matters. Connecting people, objects, vehicles, etc., is used for IoT applications. The source location protected protocol based on dynamic routing (SLPDR) is for protecting source-location privacy. Transmission delays, secure networks and lifetime with different energy consumption levels are some of its characteristics, as presented by Han et al. [22]. Low-cost IoT devices are playing an important role in research, development and sensor deployment for the integrated software and hardware components in this environment. Standard protocols, ease of development, and simulation or emulation are used with Contiki and routing protocols. Storage, memory capabilities, efficiency and connectivity are important terms associated with these protocols, per Zikria et al. [23]. From an application point of view, IoT-based museum processes are for the preservation of security, with AI and IoT maintaining infrastructure, wireless systems, etc. For preserving culture using smart city systems, sensor networks and the web are important things associated with test, validity and other investigative cases, as proposed by Konev et al. [24].


5G-enabled IoT is used as the backbone of data in blockchain-based industrial automation with smart city, home, agriculture and healthcare applications. Billions of things/data items connected together for various 5G applications involve different devices and protocols with a centralized architecture. Industry-related applications with high connectivity among different networks are specifically useful for healthcare systems and other dynamic processes, per Mistry et al. [25]. Hybrid techniques are used for IoT applications with sensors, mobile devices, cluster nodes, real-time data and other methods. Low cost and minimum energy utilization and management make hybrid approaches useful, with WSN and IoT integrated on a chip as a system-on-chip (SoC) together with sensor arrays, power supply, distribution units and wireless communication interfaces. In this integration process, various nodes act in different ways. Nowadays, AI and WSN are the most important IoT conceptualization for the cyber security approaches of smart city monitoring and e-governance systems; this can be pictured through its application in various sectors, per Sundhari and Jaikumar [26]. There are virtual and real devices for smart and intelligent management of things. Networking, processing, security and privacy are important terms related to IoT things and devices. Saving time, decision-making methods, health care, home automation, smart cities, vehicles and parking are some tasks and systems governed by IoT. Multimedia and wireless ad hoc networks (WANETs) can also be part of such IoT options, per Goyal et al. [27]. Fog computing is a term associated with real-time applications of IoT and latency-related areas. Linking IoT smart systems, cloud centers, and businesses associated with potential technologies and environments has led to thinking about the weaknesses and strengths of IoT technologies with AI/ML; interoperability is also important for these sections of society, per Zahmatkesh and Turjman [28]. IoT comprises hugely heterogeneous devices whose data will be combined with other sensor networks for achieving QoS and QoE. Clustering in the form of cluster members and cluster heads is modeled according to the selection of cluster heads for secure communication, including the different numbers of cluster heads used in sensor networks. Quality of experience is important for 5G purposes in terms of coverage, latency, longer lifetime and reliability, to provide fast, secure systems, per Kalkan [29]. IoT is a new communication area or technology used for people, things and physical or biological entities. Agriculture sectors, live field views and financial areas are affected by IoT environments. In the agricultural sector, land area, crop, field size, water and humidity are all parameters for efficient monitoring of agricultural land and validation of performance in real-time areas. Computer, Internet and IoT communication technologies are giving the best outcomes in the government sector; they use RFID, Internet protocols (IP), etc. IoT deals with data updating to the level of balanced energy consumption among sensors, per Pachayappan et al. [30]. Health care, smart cities and fitness are some of the different applications of IoT. There are body sensor networks with their unique advantages, leading to motivations for different features, decision and data extraction at the various levels of applying logic-wise sensors at different times.
Fusion leads to different sensor

374

K. Shrivastav and R. B. Battula

data collected from body parts associated with the cloud level for scalable workflow connection and security by Gravina et al. [31]. Sensors fulfill the task in an efficient or inefficient manner depending on how they are used for all applications in smart things, computations, energy consumed and transmission range. These parameters are decided at various stages for a small or large set of devices. Deploying sensors for collection of data and communication among users at end-side or middle level is associated with topology related contexts. Mobility, bio-inspired protocols, latencies and challenges are features associated with motivation and guidance to researchers Hamidouche et al. [32]. A conventional approach or a non-conventional approach such as dynamic security is used for machine learning applications to overcome attacks, and powerful technologies need to be developed for secure approaches. Spoofing, denial of service and jamming are some problems of WSN which lead to changing things in the future if algorithmic models are developed for them. They present comfort based, easy going life, smoothness and secure things against attacks or unwanted effects with the help of ML-based algorithms for filling gaps in such technologies by Tahsien et al. [33]. Spectrum is shared with cognitive radio networks with IoT for solutions in smart things associated with packets and their scheduling. The worldwide network has all sensor devices connected for handling latency, short or long time responses. Fairness of queuing delay, packets dropped or complexity associated issues lead to nodes, servers, IoT devices that deal with complex issues of spectrum and index by Tarek et al. [34]. Systems based on IoT and other service technologies are an important criterion for web services and heterogeneous fusion of information. There are RFID, sensors, GPS and other laser scanningbased approaches. Intelligent transport, protection of environment, work of government and monitoring of industries with position and track of things can have smart home and residential capabilities. Applications of research and new modern stages have homes, intelligent district and component techniques available for future-based applications proposed by Li and Yu [35]. Major research issues for WSN-IoT toward AI. In collaboration with WSN-IoT, there is a need for analysis of the algorithmic models and approaches for IoT. These are formatted according to some parameters which are prominent features or research areas for intelligent systems. • • • • • • •

• IoT connectivity with AI
• Heterogeneity at different levels among devices
• Scalability of the networks where IoT and AI exist together
• Size, shape and dynamic nature of the network
• Choice of appropriate AI algorithms/protocols for WSN/IoT applications
• Clustering based on AI techniques
• AI-oriented WSN/IoT layers.

Summary of Existing WSN, IoT Algorithms/Protocols. The following Table 1 summarizes some recent algorithms/protocols for clustering/routing in WSN-IoT systems.


Table 1 Algorithms/protocols associated with WSN-IoT and their advantages

• PE-WMoT [1] (power efficient multimedia of things): cluster head and sub-cluster head selection with a fuzzy technique for network lifetime enhancement
• GWOAP [2] (gray wolf optimization affinity propagation): a fitness function is used for minimizing the cost of communication in SDN-IoT networks; deployed with multiple microcontrollers in smart cities for balancing traffic load
• Modified ALDC [5] (adaptive lossless data compression): data bits are reduced in size; compression is done for energy efficiency
• Distance-based adaptive step function [6] (adaptive modulation with clustering): minimum energy consumption between cluster members and cluster heads
• DXN [9] (dynamic AI-based analysis and optimization of IoT networks): nodes are freely available and the lifetime of sensor nodes is extended with balanced energy consumption
• LiM-AHP-G-CP [15] (lifetime maximization based on the analytical hierarchical process and a genetic clustering protocol): IoT cluster head selection and their hop decision
• TSN [19] (multi-level time-sensitive networking protocol): network traffic and data are sent on the basis of set priorities
• SLPDR [22] (source location protection protocol based on dynamic routing): boundary nodes are used in the packet forwarding process, with dummy packets sent
• HCSM and DSOT [26] (IoT-assisted hierarchical computation strategic making and dynamic stochastic optimization technique): enhancing lifetime and sustainability for smart buildings and cities in IoT contexts
• SUTSEC [29] (SDN utilized trust-based secure clustering): energy-efficient, reliable communication with user-context-based key distribution

3 WSN-IoTs and AI Collaboration

Earlier developments in WSN-IoT have been a motivating factor for their interconnection with AI, so there are some common regions of research interest in WSN-IoT, AI-IoT and WSN-AI. More prominently, a joint WSN-IoT-AI intelligent system can be developed to utilize all three sectors in an interdependent manner. This is shown below in Fig. 1. AI techniques, with minor or major changes, can be designed to work with WSN-IoT according to the desired application. A layered architecture in the form of a top-to-bottom approach is depicted in Fig. 2. Different layers for the working of WSN/IoT have existed since the development of their protocols and algorithms. Presently, there is a need for a top-to-bottom layered architecture with some unique layers on which WSN/IoT can work with AI to form smart and intelligent systems like smart cities, industries or cyber-physical systems. If WSN-IoT clustering is to be connected with AI algorithms, the flowchart in Fig. 3 below provides a brief and concise idea for the development of such an intelligent system.

Fig. 1 Representation of common areas/regions of WSN, IoT and AI

Fig. 2 Top to bottom layered approach for WSN-IoT with AI


Fig. 3 Flowchart for AI algorithm in clustering of WSN-IoT intelligent systems

4 Conclusion and Future Scope

WSN has, from its beginnings, been an essential network in the field of wireless communicating devices, which are small in size and low in cost. As the years passed, the new term multimedia of things emerged, and later, with the development of devices that can be connected to the Internet, WSN-IoT came into existence. Further, clustering/routing approaches have been developed to work together with AI algorithms/protocols. AI algorithms can be applied to WSN-IoT networks for the collection, aggregation, clustering and sending of data, to solve issues related to the transmission of data packets, balanced power consumption and energy efficiency. In the future, WSN-IoT and AI can work together, considering various communication parameters like mobility, security, heterogeneity and reliability, to achieve QoS and QoE. AI-based intelligent systems need to be designed and developed for different real-world applications; analysis of such systems and comparison with previously developed models will prove their efficiency and robustness. Mathematical modeling of the data collected by heterogeneous sensors, together with a clustering protocol using an AI approach, could be a motivating factor attracting researchers to work in this field.

References

1. Salah ud din M, Rehman MAU, Ullah R, Park CW, Kim DH, Kim B (2021) Improving resource-constrained IoT device lifetimes by mitigating redundant transmissions across heterogeneous wireless multimedia of things. Digital Commun Netw (Elsevier) 1–17
2. Keshari SK, Kansal V, Kumar S (2021) A cluster based intelligent method to manage load of controllers in SDN-IoT networks for smart cities. Scalable Comput: Pract Experience 22(2):247–257
3. Barshandeh S, Masdari M, Dhiman G, Hosseini V, Singh KK (2021) A range-free localization algorithm for IoT networks. Int J Intell Syst 1–44
4. Sharma P, Jain S, Gupta S, Chamola V (2021) Role of machine learning and deep learning in securing 5G-driven industrial IoT applications. Ad Hoc Netw 123(3):1–38
5. Ketshabetswe KL, Zungeru AM, Mitengi B, Lebekwe CK, Prabaharan SRS (2021) Data compression algorithms for wireless sensor networks: a review and comparison. IEEE Access 9:136872–136891
6. Abu-Baker A, Alshamali A, Shawaheen Y (2021) Energy-efficient cluster-based wireless sensor networks using adaptive modulation: performance analysis. IEEE Access 9:141766–141777
7. Kumar GEP, Lydia M, Levron Y (2021) Security challenges in 5G and IoT networks: a review. In: Velliangiri S, Gunasekaran M, Karthikeyan P (eds) Secure communication for 5G and IoT networks. EAI/Springer innovations in communication and computing. Springer, Cham
8. Chen Y, Liu J, Siano P (2021) SGedge: stochastic geometry-based model for multi-access edge computing in wireless sensor networks. IEEE Access 9:111238–111248
9. Lami I, Abdulkhudhur A (2021) DXN: dynamic AI-based analysis and optimization of IoT networks connectivity and sensor nodes performance. Signals 2:570–585
10. Sahar G, Bakar KA, Rahim S, Khani NAKK, Bibi T (2021) Recent advancement of data-driven models in wireless sensor networks: a survey. Technologies 9(76):1–26
11. Kassab W, Darabkh KA (2020) A–Z survey of internet of things: architectures, protocols, applications, recent advances, future directions and recommendations. J Netw Comput Appl 163:1–49
12. Colakovic A, Hadzialic M (2018) Internet of things (IoT): a review of enabling technologies, challenges and open research issues. Comput Netw 144:17–39
13. Serpen G, Li J, Liu L (2013) AI-WSN: adaptive and intelligent wireless sensor network. Procedia Comput Sci 20:406–413
14. Fouad MM, Oweis NE, Gaber T, Ahmed M, Snasel V (2015) Data mining and fusion techniques for WSNs as a source of the big data. Procedia Comput Sci 65:778–786
15. Darabkh KA, Kassab WK, Khalifeh AF (2020) Maximizing the lifetime of wireless sensor networks over IoT environment. In: Fifth international conference on fog and mobile edge computing (FMEC). IEEE, Paris, France, pp 1–5
16. Li S, Xu LD, Zhao S (2015) The internet of things: a survey. Inf Syst Front 17:243–259
17. Jing Q, Vasilakos AV, Wan J, Lu J, Qiu D (2014) Security of the internet of things: perspectives and challenges. Wireless Netw 20:2481–2501
18. Rachedi A, Rehmani MH, Cherkaoui S, Rodrigues JJPC (2016) The plethora of research in internet of things (IoT). IEEE Access 4:9575–9579
19. Agarwal T, Niknejad P, Barzegaran MR, Vanfretti L (2019) Multi-level time-sensitive networking (TSN) using the data distribution services (DDS) for synchronized three-phase measurement data transfer. IEEE Access 7:131407–131417
20. Georgakopoulos D, Jayaraman PP (2016) Internet of things: from internet scale sensing to smart devices. Computing 1–18
21. Gubbi J, Buyya R, Marusic S, Palaniswami M (2013) Internet of things (IoT): a vision, architectural elements, and future directions. Future Gener Comput Syst 29(7):1645–1660
22. Han G, Zhou L, Wang H, Zhang W, Chan S (2018) A source location protection protocol based on dynamic routing in WSNs for the social internet of things. Future Gener Comput Syst 82:689–697
23. Zikria YB, Afzal MK, Ishmanov F, Kim SW, Yu H (2018) A survey on routing protocols supported by the Contiki internet of things operating system. Future Gener Comput Syst 82:200–219
24. Konev A, Khaydarova R, Lapaev M, Feng L, Hu L, Chen M, Bondarenko I (2019) CHPC: a complex semantic-based secured approach to heritage preservation and secure IoT-based museum processes. Comput Commun 148:240–249
25. Mistry I, Tanwar S, Tyagi S, Kumar N (2020) Blockchain for 5G-enabled IoT for industrial automation: a systematic review, solutions and challenges. Mech Syst Signal Process 135:1–21
26. Sundhari RPM, Jaikumar K (2020) IoT assisted hierarchical computation strategic making (HCSM) and dynamic stochastic optimization technique (DSOT) for energy optimization in wireless sensor networks for smart city monitoring. Comput Commun 150:226–234
27. Goyal P, Sahoo AK, Sharma TK (2021) Internet of things: architecture and enabling technologies. Mater Today: Proc 34(3):719–735
28. Zahmatkesh H, Turjman FA (2020) Fog computing for sustainable smart cities in the IoT era: caching techniques and enabling technologies: an overview. Sustain Cities Soc 59:1–15
29. Kalkan K (2020) SUTSEC: SDN utilized trust based secure clustering in IoT. Comput Netw 178:1–11
30. Pachayappan M, Ganeshkumar C, Sugundan N (2020) Technological implication and its impact in agricultural sector: an IoT based collaboration framework. Procedia Comput Sci 171:1166–1173
31. Gravina R, Alinia P, Ghasemzadeh H, Fortino G (2017) Multi-sensor fusion in body sensor networks: state-of-the-art and research challenges. Inf Fusion 35:68–80
32. Hamidouche R, Aliouat Z, Gueroui AM, Ari AAA, Louail L (2018) Classical and bio-inspired mobility in sensor networks for IoT applications. J Netw Comput Appl 121:70–88
33. Tahsien SM, Karimipour H, Spachos P (2020) Machine learning based solutions for security of internet of things (IoT): a survey. J Netw Comput Appl 161:1–18
34. Tarek D, Benslimane A, Darwish M, Kotb AM (2020) A new strategy for packets scheduling in cognitive radio internet of things. Comput Netw 178:1–11
35. Li B, Yu J (2011) Research and application on the smart home based on component technologies and internet of things. Procedia Eng 15:2087–2092

Chapter 32

Time Window Based Recommender System for Movies

Madhurima Banerjee, Joydeep Das, and Subhashis Majumder

1 Introduction

Shopkeeper: "Good morning, how can I help you?"
Customer: "Can you show me some blue shirts?"
Shopkeeper: "Yes, of course, there are several designs of blue shirts in my shop... here they are... I think this one will definitely look good on you."
Customer: "Yes, you are right."
Shopkeeper: "Since it is a formal shirt, I guess you would need a trouser and a tie. Let me show you some choices in trousers and ties."
Customer: "Sure."

The above is a very common scenario: a shopkeeper analyses the requirements of a customer and then recommends items. Reasons for the recommendation:
1. The customer cannot possibly know all the options available for a certain item in that shop.
2. To make the options readily available to the customer so that he does not get frustrated fishing for items.
3. To make him feel important.
4. To show the customer more items associated with the item he wants to purchase.
5. To increase the sale of items in the shop.

We find that recommendation systems have always existed in our lives and have always been in practice. E-commerce badly needed recommendation systems to add a personal touch to the selling process that would otherwise be missing on an online platform.


More importantly, with the availability of data over the Internet, the concept of recommender systems has successfully emerged as a support system serving customers. Recommender systems recommend items and products to users depending on their requirements and liking [2, 9, 13, 14]. The system analyzes the need of a customer by leveraging existing data from the Internet and generates a list of items that the customer may be interested in. Content-based filtering, collaborative filtering and hybrid filtering are the three well-known methods of generating recommendations, but with the growing availability of data and the need for better recommendations, these methods are combined with other dimensions like context, demography and time. Demographic parameters such as age, gender and locality might affect people's preferences for various items. Huynh et al. [8] rightly observed: "Similar users may prefer different items in a different context." Thus, time might also prove to be an important factor influencing user preferences. In our work, we have incorporated the dimension of time into our recommendation algorithm. Suppose User A used a site and rated movies on it in year X, and User B used the same site 10 years later. The question is: if user-user collaborative filtering is used, should User A be considered a suitable neighbor for recommending movies to User B? The Internet is getting overloaded with information about users; staying with the example of movie recommendation sites, there might be information in the system ranging over several years. As the database keeps growing, so does the un-clustered dataset used for prediction, and as a result the time needed to compute recommendations also increases. The aim of this paper is to find out how the temporal context can be used to cluster the dataset so that the recommendation quality becomes more accurate and the recommendation process can be scaled. The rest of the paper is organized as follows: in Sect. 2, we provide background information and past work related to context-aware recommender systems. Section 3 outlines our contribution, while Sects. 4 and 5 present our clustering and recommendation schemes, respectively. In Sect. 6, we describe our experimental settings, while in Sect. 7, we report and analyze our results. We conclude in Sect. 8, discussing future research directions.

2 Related Work

In recent years, there has been an increasing trend of incorporating contextual information in recommendation algorithms [10, 12]. Das et al. proposed a scalable collaborative filtering algorithm by clustering the users of the system on the basis of contextual attributes [4]; the authors utilized age, gender and time as the contextual attributes and projected the ratings of the users as vectors in the contextual space. Qi et al. [15] mention that in Quality of Service recommender systems, the time context makes the data dynamic; without time, the QoS data would be static, which would not give accurate results. In their work, they extended the traditional LSH-based service recommendation system to incorporate a


time factor, and proposed a new time-aware and privacy-preserving service recommendation approach to improve recommendation accuracy. In another work, Ahmadian et al. [1] pointed out that most recommendation methods focus on prediction accuracy, but the sanctity of the data being used should also be considered: the likes and dislikes of a user are likely to change over time, so it is important to consider the time factor in the recommendation model as well. De Zwart [5] showed that user-user correlations change when time is taken into consideration; the author incorporates a temporal recency factor, meaning that more recent ratings should contribute more to the prediction. In his paper on temporal dynamics, Koren [11] states that one challenge for recommendation systems is data that drifts with time; in almost every aspect of recommendation we get drifting data, so data and preferences change with time, and some other circumstantial aspects are present as well. Wasid and Ali [17] have also worked on movie recommendation; according to them, traditional recommendation systems do not work efficiently for clusters that are based on multiple criteria, and they proposed a method where the neighbors of a user are found based on their Mahalanobis distance within the user cluster. Recently, a time-aware music recommender system has been proposed [16]; its authors state that for music recommendation time is a very important factor, since users' musical preferences change over time. Music as a product is different in that one user can listen to the same piece several times, and, unlike most other products, a song is not bought singularly. In their work, they considered the "time of day" as a context for recommending music.

3 Our Contribution

In this manuscript, we study and establish that time is an important context for user-user collaborative filtering. We propose a clustering approach in which one of the clusters is identified as the cluster of contemporary users. The concept of contemporary users as considered in this paper is as follows. Let TS_{u,m} denote the timestamp at which a user u rated a movie m. We convert this timestamp into years in order to find an active year of the user u. Since u might have rated more than one movie, we find all the active years of u; the average of these active years is then used as the pivotal year TS_u of u. A user x is considered a contemporary of the target user u if the timestamp TS_x lies between TS_u − n and TS_u + n, where the quantity 2n + 1 is a chosen number of years for deciding whether two users are contemporary. The year range {TS_u − n to TS_u + n} is considered as u's contemporary years; in other words, if the timestamp TS_x of a user x falls within the range of contemporary years of u, then x is a contemporary user in relation to u. All other users in the database are divided into sets U_1, U_2, ..., U_n, whose users fall into different timestamps beyond the contemporary years.
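To make the membership test concrete, here is a minimal Python sketch of the contemporary-user check described above; the function name and parameters are our own illustrative choices, not from the paper.

```python
# Illustrative sketch of the contemporary-user test described above.
# Names are our own; the paper only defines the concept.

def is_contemporary(ts_target: int, ts_other: int, n: int) -> bool:
    """Return True if the pivotal year ts_other falls within the
    contemporary window [ts_target - n, ts_target + n] of the target user."""
    return ts_target - n <= ts_other <= ts_target + n

# Example: with a 5-year window (2n + 1 = 5, so n = 2), a user whose
# pivotal year is 27 is contemporary to a target whose pivotal year is 26.
assert is_contemporary(26, 27, n=2)
assert not is_contemporary(26, 30, n=2)
```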


Our aim now is to show that predictions computed using the preferences of the contemporary users of a target user yield better results. At first, the years were divided into ranges of five years taking TS_u as the pivotal year, and it was found that the data sparsity beyond 10 years is too high to consider. Finally, after some deliberation, the range of years for a target user u with timestamp TS_u was set to TS_u − 12 to TS_u + 12, i.e., 12 years preceding and succeeding the pivotal year of the target user. The range was chosen such that adequate similar users are available to estimate the result. The year of rating is used to cluster the users into windows of {5, 7, 9} years, and we intend to find the window that gives the best prediction for the target user.

4 Clustering Scheme

The clustering scheme used in this work is very simple. In most clustering methods, users are assigned to a pre-defined cluster [3]. To study the importance of the time context, in this work clusters are created around individual users, and we then reach a conclusion as to whether time makes any difference to the recommendations. As stated earlier, TS_c is the average of the years of rating in the dataset for the target user c. For every other user t in the dataset, the timestamp TS_t is calculated and the clustering is done as per the procedure presented in Algorithm 1. An example of our clustering process is shown pictorially in Figs. 1, 2 and 3, where TS_c indicates the timestamp of the target user c and TS_t indicates the timestamp of another user t in reference to the timestamp of user c. Note that in Figs. 1, 2 and 3, the colored bubbles represent the clusters of contemporary users.

Algorithm 1: Clustering the Users

Input: Set of users U, window size y, pivotal year TS_c of the target user
Output: Clusters of users based on TS_c

1. Form the cluster of contemporary users CL_x such that the timestamp x of every user in it lies between TS_c − y/2 and TS_c + y/2
2. Let t = TS_c − y/2
3. while (t > TS_c − 12) do
4.     Form a cluster CL_y such that the timestamp of every user in it lies in the window max(t − y, TS_c − 12) to (t − 1) years
5.     t = t − y
6. end
7. Let t = TS_c + y/2
8. while (t < TS_c + 12) do
9.     Form a cluster CL_y such that the timestamp of every user in it lies in the window (t + 1) to min(t + y, TS_c + 12) years
10.    t = t + y
11. end
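A minimal Python rendering of Algorithm 1 is sketched below, assuming users are given as a mapping from user id to pivotal year; the data structures and names are our own illustrative choices, not from the paper.

```python
# Minimal sketch of Algorithm 1, assuming `timestamps` maps each user id to
# its pivotal year TS and `ts_c` is the pivotal year of the target user.

def cluster_users(timestamps: dict, ts_c: int, y: int) -> list:
    """Partition users into the contemporary window around ts_c plus
    windows of y years on either side, bounded by ts_c +/- 12."""
    half = y // 2
    # The contemporary window, then windows moving backward to ts_c - 12
    # and forward to ts_c + 12, exactly as in Algorithm 1.
    windows = [(ts_c - half, ts_c + half)]
    t = ts_c - half
    while t > ts_c - 12:
        windows.append((max(t - y, ts_c - 12), t - 1))
        t -= y
    t = ts_c + half
    while t < ts_c + 12:
        windows.append((t + 1, min(t + y, ts_c + 12)))
        t += y
    # Assign every user whose pivotal year falls in a window to that cluster.
    return [
        {u for u, ts in timestamps.items() if lo <= ts <= hi}
        for lo, hi in windows
    ]

# Example: 5-year windows around a target whose pivotal year is 26.
clusters = cluster_users({"a": 25, "b": 27, "c": 20, "d": 33}, ts_c=26, y=5)
# clusters[0] is the contemporary cluster (years 24-28): {"a", "b"}
```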


Fig. 1 Clustering for time slot of 5 years

Fig. 2 Clustering for time slot of 7 years

Fig. 3 Clustering for time slot of 9 years

5 Recommendation Scheme

In this work, we use user-user collaborative filtering, where the correlation between two users is calculated using Pearson's correlation coefficient. Suppose we have two users p and q. The Pearson correlation coefficient between them is calculated using Eq. 1:

\[ \text{sim}(p,q) = \frac{\sum_{i \in I} (r_{p,i} - \bar{r}_p)(r_{q,i} - \bar{r}_q)}{\sqrt{\sum_{i \in I} (r_{p,i} - \bar{r}_p)^2}\,\sqrt{\sum_{i \in I} (r_{q,i} - \bar{r}_q)^2}} \tag{1} \]

where I is the set of items rated by both users p and q, \bar{r}_p and \bar{r}_q are the average ratings given by p and q, while r_{p,i} and r_{q,i} are the ratings of users p and q on item i, respectively. For all users having a Pearson correlation greater than a threshold value, the prediction for the target user u for an item p is calculated as


\[ \text{pred}_{u,p} = \bar{r}_u + k \sum_{j=1}^{n} \text{sim}(u,j)\,(r_{j,p} - \bar{r}_j) \tag{2} \]

where n denotes the number of users similar to u, while \bar{r}_u and \bar{r}_j represent the average ratings of user u and its neighboring user j, respectively; k is a normalizing factor and sim(u, j) is the correlation or similarity between u and j. After prediction, the Root Mean Square Error (RMSE) [7] for the training set is calculated. Three sets of results are calculated for each user, with the threshold Pearson correlation taken as 0.55, 0.65 and 0.75. We calculate the Pearson correlation of a target user with every other user in the system. The entire recommendation module is presented in Algorithm 2, in which the subroutine User_Cluster() clusters the users using Algorithm 1.
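The following self-contained Python sketch implements Eqs. 1 and 2 over a simple in-memory ratings dictionary. The data layout and the choice of the normalizing factor k as 1/Σ|sim| are our assumptions for illustration; the paper calls k simply a normalizing factor without fixing it.

```python
import math

# Sketch of Eqs. 1 and 2. `ratings` maps user -> {item: rating}; this layout
# and the choice k = 1 / sum(|sim|) are assumptions for illustration.

def pearson(ratings, p, q):
    common = set(ratings[p]) & set(ratings[q])       # items rated by both
    if not common:
        return 0.0
    mp = sum(ratings[p].values()) / len(ratings[p])  # average rating of p
    mq = sum(ratings[q].values()) / len(ratings[q])  # average rating of q
    num = sum((ratings[p][i] - mp) * (ratings[q][i] - mq) for i in common)
    dp = math.sqrt(sum((ratings[p][i] - mp) ** 2 for i in common))
    dq = math.sqrt(sum((ratings[q][i] - mq) ** 2 for i in common))
    return num / (dp * dq) if dp and dq else 0.0

def predict(ratings, u, item, neighbours, threshold=0.55):
    mu = sum(ratings[u].values()) / len(ratings[u])
    terms, norm = 0.0, 0.0
    for j in neighbours:
        if item not in ratings[j]:
            continue
        s = pearson(ratings, u, j)
        if s <= threshold:                           # keep only similar users
            continue
        mj = sum(ratings[j].values()) / len(ratings[j])
        terms += s * (ratings[j][item] - mj)
        norm += abs(s)
    return mu + terms / norm if norm else mu         # Eq. 2 with k = 1/norm
```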

Algorithm 2: Recommendation Algorithm

Input: Pearson correlation threshold r, set of users L
Output: RMSE of the predicted recommendations for every cluster

1. for each user c in L do
2.     for each user I in L do
3.         Find PC_{c,I}        // PC_{c,I} = Pearson correlation between c and I
4.     end
5. end
6. for y in {5, 7, 9} do
7.     for each user c in L do
8.         Call User_Cluster(L − c, y, TS_c)        // TS_c = timestamp of user c
9.         for each cluster CL do
10.            for each r in {0.55, 0.65, 0.75} do
11.                For all users I in CL, find the RMSE of the prediction for c using each I where PC_{c,I} > r
12.            end
13.        end
14.    end
15.    for each r in {0.55, 0.65, 0.75} do
16.        for each user c in L do
17.            Find the cluster CL with minimum RMSE for user c for Pearson correlation r
18.        end
19.    end
20.    Count the number of minimum RMSE under each cluster
21.    Find the percentage of minimum RMSE in each cluster
22. end

Table 1 ML-10M dataset: example

User | Item | Rating | Timestamp
1 | 122 | 5 | 838985046
1 | 185 | 5 | 838983525
1 | 231 | 5 | 838983392
1 | 292 | 5 | 838983421
1 | 316 | 5 | 838983392

6 Experimental Settings

6.1 Data Description

We have tested our algorithms on the MovieLens-10M (ML-10M) [6] dataset. The dataset contains 10,000,054 ratings from 71,567 users on 10,681 movies; these ratings are integers on a scale from 1 to 5. Note that the ML-10M dataset includes only those users who have rated at least 20 movies. The dataset also contains the timestamp of each rating in seconds since the Unix epoch (01 Jan 1970, 00:00:00 UTC). An example of the dataset is shown in Table 1. From the data reported in Table 1, the timestamp TS_{u,m} of a user u for a movie m is calculated in years as follows:

\[ TS_{u,m} = \text{Round}(\text{Timestamp}_{u,m} / 31{,}536{,}000) \tag{3} \]

where Timestamp_{u,m} is the timestamp of user u rating movie m as given in the table, and 31,536,000 is the number of seconds in a non-leap year of 365 days. To keep things simple we avoided treating leap years separately; in any case, doing so would only have shifted the boundaries between the years slightly from where they are now. The timestamps given in Table 1 are expressed in seconds, and we divide them by 31,536,000 to convert seconds into years. For example, consider the timestamp 838,985,046 given in Table 1: dividing 838,985,046 by 31,536,000 gives 26 years. Since the dataset contains timestamps in seconds since the Unix epoch (01 Jan 1970, 00:00:00 UTC), 26 years signifies the year 1970 + 26 = 1996. Since 1970 is the zeroth year in the dataset, and in this paper we do not consider any context that depends on the marked year in the Gregorian calendar, we simplify the calculation and consider 26 as the active year of the user. One user can rate several movies at different timestamps; we consider all the years calculated from the different timestamps as the active years of that user. The arithmetic mean of the active years of a target user is termed the pivotal year, around which the clusters for the target user are created. The pivotal year TS_u for a user u is calculated as follows:


\[ TS_u = \text{Round}\!\left(\frac{\sum_{j=1}^{n} TS_{u,j}}{n}\right) \tag{4} \]

where n is the number of movies u has rated and TS_{u,j} is the timestamp (in years) at which u gave the rating to movie j. Note that TS_{u,j} is calculated using Eq. 3.
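The sketch below illustrates Eqs. 3 and 4 under the paper's 31,536,000-second year. Note that although Eq. 3 is written with Round(), the paper's worked example (838,985,046 → year 26, i.e., 1996) corresponds to truncation, which is what this illustrative sketch uses.

```python
SECONDS_PER_YEAR = 31_536_000  # 365-day year, as simplified in the paper

def active_year(unix_timestamp: int) -> int:
    """Eq. 3: convert a rating timestamp (seconds since the Unix epoch)
    into an active year counted from 1970. The paper writes Round(), but
    its worked example (838985046 -> 26, i.e. 1996) matches truncation,
    so truncation is used in this sketch."""
    return unix_timestamp // SECONDS_PER_YEAR

def pivotal_year(unix_timestamps) -> int:
    """Eq. 4: the rounded arithmetic mean of a user's active years."""
    years = [active_year(ts) for ts in unix_timestamps]
    return round(sum(years) / len(years))

# The Table 1 example: every rating of user 1 maps to year 26 (1970 + 26 = 1996).
print(pivotal_year([838985046, 838983525, 838983392, 838983421, 838983392]))  # 26
```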

6.2 Evaluation Metric

In order to evaluate the accuracy of our proposed recommendation approach, the RMSE (Root Mean Square Error) metric has been used. We calculate the RMSE for every target user considering each cluster of 5, 7 and 9 years separately. The RMSE for a user u considering a cluster CL is defined as

\[ \text{RMSE}_{u,CL} = \sqrt{\frac{\sum_{j=1}^{n} \left(\text{ratingActual}_{u,j} - \text{ratingPredicted}_{u,j,CL}\right)^2}{n}} \tag{5} \]

where ratingActual_{u,j} is the original rating given by user u to movie j, ratingPredicted_{u,j,CL} is the predicted rating of user u for movie j calculated by the algorithm on the basis of similar users in cluster CL, and n is the total number of movies rated by u. For a user in the training dataset, all movies rated by that user are predicted using each of the clusters created. Lower RMSE values denote better prediction accuracy: if RMSE_{u,CL} < RMSE_{u,CL_1}, where CL and CL_1 are two different clusters, then the predictions computed on the basis of similar users in cluster CL are better (closer to the actual ratings) than those computed on the basis of similar users in cluster CL_1.
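As a small illustration of Eq. 5 and of how the best cluster is chosen for a user, consider this Python sketch; the data structures are our assumptions, with the predicted ratings taken as precomputed.

```python
import math

# Sketch of Eq. 5: RMSE of the predictions made for user u from one cluster.
# `actual` and `predicted` are parallel lists over the movies u has rated.

def rmse(actual, predicted):
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )

# Picking the best cluster for a user: the one with the lowest RMSE.
# `per_cluster_predictions` maps a cluster label to its prediction list.
def best_cluster(actual, per_cluster_predictions):
    return min(per_cluster_predictions,
               key=lambda cl: rmse(actual, per_cluster_predictions[cl]))
```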

7 Results and Discussions

In Tables 2 and 3 we tabulate the RMSE values of two sample users, user1 and user2, for the 5-year clusters, where the clusters are created based on the timestamps of user1 and user2 respectively. In Table 2, we notice that the first two clusters (2nd and 3rd columns) contain no RMSE data; this is because no similar users corresponding to the time ranges of the first and second clusters were available for user1, and thus no prediction could be calculated. Similarly, in Table 3, no similar users corresponding to the time range of the first cluster (2nd column) were available for user2, and therefore no prediction could be calculated.

Table 2 RMSE data for user1 using 5 years clusters

Pearson correlation | TSt−12 to TSt−8 | TSt−7 to TSt−3 | TSt−2 to TSt+2 | TSt+3 to TSt+7 | TSt+8 to TSt+12
0.55 | – | – | 0.456399 | 0.482681 | 0.548482
0.65 | – | – | 0.413239 | 0.459307 | 0.5488
0.75 | – | – | 0.397759 | 0.462992 | 0.581031

Table 3 RMSE data for user2 using 5 years clusters

Pearson correlation | TSt−12 to TSt−8 | TSt−7 to TSt−3 | TSt−2 to TSt+2 | TSt+3 to TSt+7 | TSt+8 to TSt+12
0.55 | – | 0.659119 | 0.504781 | 0.55534 | 0.618151
0.65 | – | 0.390583 | 0.437442 | 0.4728 | 0.66903
0.75 | – | 0.369447 | 0.387289 | 0.406254 | 0.626839














T St + 8 to T St + 12



Table 4 Percentage of target users having similar users in 5 years clusters

TSt−12 to TSt−8 | TSt−7 to TSt−3 | TSt−2 to TSt+2 | TSt+3 to TSt+7
24.62311558 | 60.8040201 | 100 | 94.97487437

60.8040201

100

94.97487437

Table 5 Percentage of target users having similar users in 7 years clusters All years = T St + 11 39.1959799

In Tables 4, 5 and 6 we have tabulated the percentage of target users having similar users in the different clusters, i.e., we have calculated the number of similar users (considering the entire ML-10M data) in each cluster for each user in the training dataset. To explain further, let us consider an example. In Tables 2 and 3, we find that the RMSE value in the first cluster (2nd column) for both the users (considering Pearson Correlation 0.55) are empty. It implies that for both the users, similar users are not available in that cluster. So, if we consider the 2 users as the only users in the system, then the percentage of users having similar users in the first cluster would be (0/2 ∗ 100) = 0%. For the 2nd cluster (3rd column), user1 does not have any RMSE value, hence no similar users. However for user2, we have got RMSE values. Thus 1 out of 2 users have similar users in second cluster and accordingly the percentage of users having similar users in the 2nd cluster is (1/2 ∗ 100) = 50%. In this way we calculate the percentages in the different clusters. From the data reported in Tables 4, 5 and 6, it is clear that the density of available data is 100% for time slot containing the contemporary years (T St − 2 to T St + 2), whereas it is evident that density of data reduces as we go beyond the contemporary

390

M. Banerjee et al.

Table 6 Percentage of target users having similar users in 9 years clusters All years < = T St − 5 T St − 4 to T St + 4 All years > = T St + 5 41.20603015

100

80.90452261

Table 7 Percentage of minimum RMSE under time slot of 5 years cluster Year slot of 5 years Pearson T St − 12 to T St − 7 to T St − 2 to T St + 3 to correlation T St − 8 T St − 3 T St + 2 T St + 7 0.55 0.65 0.75

2.010050251 4.020100503 3.015075377

7.035175879 3.51758794 6.030150754

83.41708543 85.42713568 87.93969849

7.537688442 6.532663317 2.512562814

Table 8 Percentage of minimum RMSE under time slot of 7 years cluster Year slot of 7 Years Pearson All years = T St + 11 0 0 0.502512563

Table 9 Percentage of minimum RMSE under time slot of 9 years cluster Year slot of 9 years Pearson correlation All years = T St + 5 0.55 0.65 0.75

6.030150754 4.020100503 5.527638191

93.46733668 94.97487437 91.45728643

0.502512563 1.005025126 3.015075377

years. Therefore it can be concluded that there is a considerable improvement in data density when data is clustered around the active years of a target user. From Table 5 it is clear that beyond 10 years the density of available similar users has reduced drastically. So it can be said that as we move away from the contemporary years, it is less likely to get similar users corresponding to a target user. In Tables 7, 8 and 9 we have tabulated the percentage of minimum RMSE obtained from each cluster. In the clusters of Tables 7, 8 and 9, we have reported the percentages of users for whom we got the best RMSE (minimum) from that cluster for the different Pearson’s correlation values. The best results have been marked in blue and bold. For example, in Table 7, we can see that the percentage of minimum RMSE for the 1st cluster for Pearson Correlation of 0.55 is 2.01. It means that in the training dataset, for only 2% of the population, best results were obtained from the 1st cluster, whereas,

32 Time Window Based Recommender System for Movies Table 10 Average RMSE for pivotal cluster Pearson No of clusters Clustering on 5 correlation years 0.55 0.65 0.75

0.4377384 0.38532707 0.359144155







Fig. 4 RMSE comparisons for different Pearson Correlation values

for 83.42% of the population, the best results were obtained from the 3rd cluster. From Tables 7, 8 and 9, it is clear that in more than 80% of the cases the minimum RMSE for the individual users was obtained from the cluster of contemporary years. We present the average RMSE of the time slot based clusters and the average RMSE of the unclustered data in Table 10. We observe in Table 10 that, irrespective of the Pearson correlation value, the average RMSE for the 5 years cluster is minimum (marked in blue), indicating that clustering users into clusters of 5 years around the pivotal year gives much better results than the unclustered data and the other clusters considered in this study. From the subgraphs of Fig. 4, it can be concluded that the percentage of minimum RMSE and the average RMSE both increase with larger time slots. This is quite obvious, because when the number of years in the time



slot increases, more users get clustered into the larger time slots, resulting in the maximum number of minimum RMSE values falling in the larger time slot. Thus we see that as the number of years increases, the bars in the chart become taller for all three Pearson correlation values. Again, from Fig. 4 it is clear that the average RMSE for the 5-year slot is minimum: due to the context of time, the users falling in the 5-year time slot, i.e., TSt − n to TSt + n with n = 2, are closer to the target user, yielding better results than the 7- or 9-year slots, which contain users with a larger difference in years (n > 2). Further, from Table 10 we find that the best average RMSE is given by the time slot of 5 years with a Pearson correlation of 0.75. From this, the inference can be drawn that a 0.75 correlation filters better correlated users than correlations of 0.65 and 0.55, and applying a time slot of 5 years gives very close neighbors, yielding good predictions. From Table 10 it is also clear that the average RMSE for the clustered dataset is better than that for the un-clustered dataset, which does not consider the context of time.

8 Conclusion and Future Work

Time is an important context when designing a recommendation system: items, users and age groups all change over a period of time. While clustering users based on time slots, an important task is to judiciously choose the length of the slot; too large or too narrow a time slot might yield bad predictions. In this work, we have studied the effect of the time context at the individual level. This method helps us study the effect of the context of time, but it would not be feasible for predicting in a real-time environment, where clusters are predefined and, for scalability, users are grouped into clusters during an offline process. As future scope, a dynamic method of clustering around a user can be proposed to enhance prediction results.

References

1. Ahmadian S, Joorabloo N, Jalili M, Ahmadian M (2022) Alleviating data sparsity problem in time-aware recommender systems using a reliable rating profile enrichment approach. Expert Syst Appl 187:115849
2. Anelli VW, Di Noia T, Di Sciascio E, Ferrara A, Mancino ACM (2021) Sparse feature factorization for recommender systems with knowledge graphs. In: Fifteenth ACM conference on recommender systems, pp 154–165
3. Das J, Banerjee M, Mali K, Majumder S (2019) Scalable recommendations using clustering based collaborative filtering. In: 2019 international conference on information technology, pp 279–284
4. Das J, Majumder S, Mali K (2017) Context aware scalable collaborative filtering. In: 2017 international conference on big data analytics and computational intelligence (ICBDAC), pp 184–190
5. De Zwart T (2018) Time-aware neighbourhood-based collaborative filtering. Res Paper Bus Analyt 1–46
6. Harper FM, Konstan JA (2016) The movielens datasets: history and context. ACM Trans Inter Intell Syst 5(4)
7. Herlocker JL, Konstan JA, Terveen LG, Riedl J (2004) Evaluating collaborative filtering recommender systems. ACM Trans Inform Syst 22(1):5–53
8. Huynh HX, Phan NQ, Pham NM, Pham V, Hoang Son L, Abdel-Basset M, Ismail M (2020) Context-similarity collaborative filtering recommendation. IEEE Access 8:33342–33351
9. Jalili M, Ahmadian S, Izadi M, Moradi P, Salehi M (2018) Evaluating collaborative filtering recommender algorithms: a survey. IEEE Access 6:74003–74024
10. Jeong SY, Kim YK (2021) Deep learning-based context-aware recommender system considering contextual features. Appl Sci 12(1):45
11. Koren Y (2009) Collaborative filtering with temporal dynamics. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, KDD '09, pp 447–456
12. Kulkarni S, Rodd SF (2020) Context aware recommendation systems: a review of the state of the art techniques. Comp Sci Rev 37:100255
13. Lu J, Wu D, Mao M, Wang W, Zhang G (2015) Recommender system application developments: a survey. Decis Supp Syst 74:12–32
14. Musto C, Gemmis MD, Lops P, Narducci F, Semeraro G (2022) Semantics and content-based recommendations. In: Ricci F, Rokach L, Shapira B (eds) Recommender systems handbook. Springer, pp 251–298
15. Qi L, Wang R, Hu C, Li S, He Q, Xu X (2019) Time-aware distributed service recommendation with privacy-preservation. Inform Sci 480:354–364
16. Sánchez-Moreno D, Zheng Y, Moreno-García MN (2020) Time-aware music recommender systems: modeling the evolution of implicit user preferences and user listening habits in a collaborative filtering approach. Appl Sci 10(15):5324
17. Wasid M, Ali R (2018) An improved recommender system based on multi-criteria clustering approach. Proc Comp Sci 131:93–101

Chapter 33

Approximate Multiplier for Power Efficient Multimedia Applications

K. B. Sowmya and Rajat Raj

1 Introduction

Approximate computing is a novel method in digital design that relaxes the requirement for exact calculation to gain considerable power, speed and area advantages. This method is becoming increasingly significant for embedded and mobile systems that must operate under severe energy and speed limitations. Approximate computing can be advantageous in several error-tolerant scenarios; multimedia processing, image multiplication and machine learning are some examples. Multipliers are essential components in microprocessors, DSPs and embedded systems and have a variety of uses, including filtering and convolutional neural networks. Unfortunately, due to their complex logic architecture, multipliers are among the most power-hungry digital components [1, 2]. A multiplier comprises three basic blocks: partial product generation, partial product reduction and carry-propagate addition. A partial product P_{j,i} is often produced by an AND gate (i.e., P_{j,i} = A_i B_j), where A_i and B_j are the ith and jth LSBs of the inputs A and B, respectively. Commonly used partial product accumulation structures include the Dadda tree, the Wallace tree and carry-save adder arrays; in each layer, the adders function concurrently without carry propagation, and the same technique is repeated until only two rows of partial products remain. Approximations can be added to any of these blocks, as shown in Fig. 1 [3]. Often, while optimizing one parameter, a restriction on another parameter must be taken into account; specifically, getting the necessary performance (speed) while



Fig. 1 Fundamental arithmetic operation of 4*4 unsigned multiplication

taking into account the restricted power budget of portable devices is a difficult issue. The multiplier is one of the most common arithmetic blocks, used in a variety of applications, notably signal processing. There are two major designs for multipliers: sequential and parallel. While sequential designs consume little power, they have a relatively long delay; parallel architectures, on the other hand (e.g., the Wallace tree and Dadda), are fast while consuming a lot of power. In high-performance applications, parallel multipliers can create hotspots on the device due to their high power consumption [4].

2 Related Work

In the literature, numerous designs have been presented. Narayanamoorthy et al. [5] proposed multiplier designs that can balance computational precision and energy consumption at design time; the suggested multiplier may operate with an average computational error of 1% while using less energy per operation than an exact multiplier. The design and study of two approximate 4–2 compressors for use in a multiplier are presented by Momeni et al. [6]; the findings demonstrate that the suggested designs significantly reduce power dissipation, latency, and


transistor count compared to an exact design, and two of the proposed multiplier designs also show excellent image multiplication capabilities. The approximate 4–2 compressor proposed by Chang et al. [7] can reduce power consumption and delay by 56% and 39%, respectively, in comparison with the exact compressor; according to the simulation findings, the resulting approximate 4–2 compressor-based multiplier may decrease power consumption and latency by 33% and 30%, respectively, compared to the exact multiplier. Reddy et al. [8] propose the design of a new approximate 4-to-2 compressor; for optimal use of the proposed compressor and to minimize error, a modified Dadda multiplier architecture is presented, and the multiplier's effectiveness is analyzed in some image processing applications, where it typically produces images that are 85% structurally similar to the exact output image.

3 4:2 Compressors

A 4:2 compressor is used to shorten the latency of the parallel multiplier's partial product summation stage. As illustrated schematically in Fig. 2a, the compressor has four equal-weighted inputs (M1–M4) and an input carry (C_in), as well as two outputs (Sum and Carry) and an output C_out [4]. The output Sum has the same weight as the inputs, while Carry and C_out have double the weight [9]. Because of the compressor's architecture, Carry does not depend on C_in. An exact 4:2 compressor is constructed internally by serially coupling two full adders, as illustrated in Fig. 2b. C_out, Carry and Sum are given by Eqs. 1–3:

\[ C_{out} = M_3(M_1 \oplus M_2) + M_1\,\overline{M_1 \oplus M_2} \tag{1} \]

\[ Carry = C_{in}(M_1 \oplus M_2 \oplus M_3 \oplus M_4) + M_4\,\overline{M_1 \oplus M_2 \oplus M_3 \oplus M_4} \tag{2} \]

\[ Sum = C_{in} \oplus M_1 \oplus M_2 \oplus M_3 \oplus M_4 \tag{3} \]

The truth table of the exact compressor is shown in Table 1. Multipliers have a large area, long latency and high power consumption; as a result, developing a multiplier that is fast and low-power is a major challenge. However, as area and speed are typically antagonistic, improvements in speed result in larger areas. In multiplication, a number (the multiplicand) is multiplied by another number (the multiplier) to give a result (the product). In AI and DSP applications, multiplication is certainly a performance-deciding operation. Many applications need parallel operations at high speed with acceptable precision, which necessitates high-speed multiplier designs. Approximation in multipliers enables faster computation with less power-hungry hardware and lower complexity and latency, while retaining acceptable accuracy. Partial product summation is the multiplication step that cannot be completed quickly, due to the propagation delay in adder networks. Compressors are used to shorten the propagation time; at each level, they compute the sum and carry at the same time [4].

Fig. 2 a Block diagram b Full adder-based 4:2 compressor configuration
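To make Eqs. 1–3 concrete, the following Python sketch behaviourally models the exact 4:2 compressor and checks the invariant that regenerates every row of Table 1. This is an illustrative model only; the authors' implementation is in Verilog HDL.

```python
# Behavioural model of the exact 4:2 compressor of Eqs. 1-3.
# Illustrative Python only; the paper's design is realized in Verilog HDL.
from itertools import product

def exact_compressor(m1, m2, m3, m4, cin):
    x12 = m1 ^ m2
    x = x12 ^ m3 ^ m4
    cout = m3 if x12 else m1      # Eq. 1: Cout is independent of Cin
    carry = cin if x else m4      # Eq. 2
    s = cin ^ x                   # Eq. 3
    return cout, carry, s

# Sanity check: the three outputs always encode the number of ones among
# the five inputs as 2*Cout + 2*Carry + Sum, which regenerates Table 1.
for bits in product((0, 1), repeat=5):
    cout, carry, s = exact_compressor(*bits)
    assert 2 * cout + 2 * carry + s == sum(bits)
```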

4 16*16 Bit Approximate Dadda Multiplier Architecture

Two alternative approximate 4:2 compressor architectures are used to construct a 16 × 16 Dadda multiplier. One is a high-speed variant, while the other is a modified version of it. As the error of the high-speed architecture is higher, it is utilized for the first few LSBs, while the modified version is used in the middle of the architecture, as explained in Fig. 3. For the last few bits, the exact multiplier is utilized, because they carry the most weight in the final result [10]. The simulation results are compared with the exact multiplier. SUM is generated using a multiplexer (MUX)-based design technique: the XOR gate's output serves as the MUX's select line, so (M3M4) is picked when the select line is high, while (M3 + M4) is selected when it is low. The suggested 4:2 compressor can simplify the carry generation logic using an OR gate, at the cost of inserting an error with error distance (ED) 1 into the truth table of the exact compressor [11, 12]. Eqs. 4–5 are the logical formulations for realizing SUM and CARRY:

\[ SUM = (M_1 \oplus M_2)\,M_3 M_4 + \overline{M_1 \oplus M_2}\,(M_3 + M_4) \tag{4} \]

\[ CARRY = M_1 + M_2 \tag{5} \]

An error has been introduced for the binary input combinations 0011, 0100, 1000 and 1111, so as to guarantee that equal positive and negative deviations with error distance = 1 (the minimum) are attained. The term "ED" refers to the difference between the exact and


Table 1 Exact multiplier truth table

A1 A2 A3 A4 Cin | Cout Carry Sum
0  0  0  0  0   |  0    0    0
0  0  0  0  1   |  0    0    1
0  0  0  1  0   |  0    0    1
0  0  0  1  1   |  0    1    0
0  0  1  0  0   |  0    0    1
0  0  1  0  1   |  0    1    0
0  0  1  1  0   |  0    1    0
0  0  1  1  1   |  0    1    1
0  1  0  0  0   |  0    0    1
0  1  0  0  1   |  0    1    0
0  1  0  1  0   |  0    1    0
0  1  0  1  1   |  0    1    1
0  1  1  0  0   |  1    0    0
0  1  1  0  1   |  1    0    1
0  1  1  1  0   |  1    0    1
0  1  1  1  1   |  1    1    0
1  0  0  0  0   |  0    0    1
1  0  0  0  1   |  0    1    0
1  0  0  1  0   |  0    1    0
1  0  0  1  1   |  0    1    1
1  0  1  0  0   |  1    0    0
1  0  1  0  1   |  1    0    1
1  0  1  1  0   |  1    0    1
1  0  1  1  1   |  1    1    0
1  1  0  0  0   |  1    0    0
1  1  0  0  1   |  1    0    1
1  1  0  1  0   |  1    0    1
1  1  0  1  1   |  1    1    0
1  1  1  0  0   |  1    0    1
1  1  1  0  1   |  1    1    0
1  1  1  1  0   |  1    1    0
1  1  1  1  1   |  1    1    1


Fig. 4 a Dual-stage compressor b High-speed compressor c Exact multiplier

approximate 4:2 compressor outputs. The approximation is made here by removing C_out. This brings a problem when the inputs are '1111': in that case CARRY and SUM are both set to '11' and a −1 error is introduced [4]. In addition to the MUX, the high-speed area-efficient compressor design requires one AND, one XOR and two OR gates; AND and OR gates each need six transistors. The paper offers a design using NOR and NAND gates, as depicted in Fig. 4b, to minimize the transistor count [4]. Even though the sum and carry obtained by the modified design are not identical to those generated by the suggested 4:2 compressor architecture, the inaccuracy is eliminated by cascading the compressors in multiples of 2 [13–17]. The proposed and exact compressor designs are utilized to build 16*16 Dadda multipliers: levels 1 to 17 of the 16*16 Dadda multiplier use approximate compressors, whereas levels 18 to 32 use exact compressors. Figure 5 illustrates a 16*16 input multiplication procedure utilizing the proposed compressors. When there are two stages of cascaded partial products for summation, the improved dual-stage compressors are utilized; all other partial product levels below 14 employ full adders, half adders and the suggested high-speed area-efficient 4:2 compressors. The simulation is run in Vivado using Verilog HDL. The high-speed compressor block takes a minimum number of cells and nets compared to the dual-stage compressor; the analysis is done using the Vivado design suite and shown in Table 2. The tradeoff is the error involved with each block.
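As an illustration of the approximate design of Eqs. 4 and 5 and its stated error behaviour, the Python sketch below compares the approximate compressor output against the exact ones-count. It reproduces the error pattern described above (ED of ±1 at inputs 0011, 0100 and 1000, and −1 at 1111, with zero error elsewhere); it is a behavioural model under these assumptions, not the authors' Verilog.

```python
# Behavioural model of the approximate high-speed 4:2 compressor of
# Eqs. 4-5 (Cout removed), compared against the exact ones-count.
from itertools import product

def approx_compressor(m1, m2, m3, m4):
    x12 = m1 ^ m2
    s = (m3 & m4) if x12 else (m3 | m4)   # Eq. 4: MUX selected by M1 xor M2
    carry = m1 | m2                        # Eq. 5
    return carry, s

for bits in product((0, 1), repeat=4):
    carry, s = approx_compressor(*bits)
    exact_value = sum(bits)                # true ones-count of M1..M4
    approx_value = 2 * carry + s
    ed = approx_value - exact_value        # error distance
    print(bits, ed)
# ED is -1 at 0011 and 1111, +1 at 0100 and 1000, and 0 for all other inputs.
```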


Fig. 5 Approximate 16 × 16 multiplier with proposed 4:2 compressor

Table 2 Comparison of the architectural designs

Design | Cells | Nets
Dual-stage compressor | 9 cells | 12 nets
High-speed 4:2 compressor | 8 cells | 11 nets
Exact compressor | 11 cells | 15 nets

4.1 Comparison of High-Speed 4:2 Compressor with Dual-Stage Compressor

To compare the device utilization and power consumption of both designs, an 8*8-bit Dadda multiplier is tested in both configurations, i.e., with only the high-speed 4:2 compressor and with only the dual-stage compressor. The simulation and synthesis are done in the Vivado design suite.


5 Implementation Results and Analysis

The exact and approximate multipliers were functionally verified using the Vivado simulator with 16-bit input data. The Vivado design suite includes the Vivado simulator, a compiled-language simulator that enables mixed-language simulation using Verilog, SystemVerilog, VHDL and SystemC, as well as IP-centric and system-centric development environments; it allows both behavioral and timing simulation. The built-in logic simulator, i.e., the ISE simulator, is used for high-level logic synthesis in Vivado. The Vivado tool was used to synthesize both multipliers, and the device utilization and power consumption are given in Table 3. Figures 6 and 8 depict the synthesized designs for the exact multiplier and the approximate multiplier, respectively, while Figs. 7 and 9 illustrate the simulation results for the two architectures implemented in Verilog HDL. Specific test benches were created, and the results of the behavioral simulation are given in Figs. 7 and 9. During simulation, the time resolution was set to 1 ps. Table 4 displays the component distribution used to synthesize the exact and approximate multipliers; the exact multiplier is significantly more complicated and requires more LUTs. The proposed multiplier reduces total on-chip power, LUT count and latency.

Table 3 Comparison of high-speed 4:2 compressor with dual-stage compressor

 | Dual-stage compressor | High-speed 4:2 compressor
Slice LUTs (32,600) | 48 | 48
IO (150) | 33 | 33
Total on-chip power | 9.647 W | 9.647 W
Data path delay (max at slow process corner) | 6.448 ns | 9.752 ns
Data path delay (min at fast process corner) | 1.974 ns | 2.248 ns

Fig. 6 RTL schematic of exact 16 bit multiplier


Fig. 7 Exact 16 bit multiplier output waveform

Fig. 8 RTL schematic of approximate 16 bit multiplier

Fig. 9 Approximate 16 bit multiplier output waveform

6 Conclusion

Two approximate 4:2 compressor topologies are presented in this study. First, a high-speed area-efficient compressor architecture is designed and evaluated, in terms of delay, against a modified dual-stage compressor without affecting the accuracy metrics for


Table 4 Device utilization and power consumption

 | Exact 16 bit multiplier | Approx. 16 bit multiplier
Slice LUTs (32,600) | 421 | 285
IO (150) | 54 | 65
Total on-chip power | 38.867 W | 33.056 W
Data path delay (max at slow process corner) | 17.185 ns | 15.417 ns
Data path delay (min at fast process corner) | 1.974 ns | 1.974 ns

an 8-bit multiplier. When compared to the exact multiplier, a 16-bit multiplier design using a mix of the approximate compressor designs and the exact compressor yielded a significant reduction in latency, area and power. The architecture was developed and synthesized with the Vivado design suite, which includes the Vivado simulator. The proposed approximate 4:2 compressor multiplier designs are intended for error-tolerant applications. Future work involves developing an approximate multiplier that can be enhanced further to reduce delay and power usage, since the area reductions are obtained at the expense of greater power usage and an increased critical path delay compared to separated configurations.

References 1. Zacharias N, Lalu V (2020) Study of approximate multiplier with different adders. In: 2020 International conference on smart electronics and communication (ICOSEC), pp 1264–1267. https://doi.org/10.1109/ICOSEC49089.2020.9215425 2. Esposito D, Strollo AGM, Napoli E, De Caro D, Petra N (2018) Approximate multipliers based on new approximate compressors. IEEE Trans Circuits Syst I Regul Pap 65(12):4169–4182. https://doi.org/10.1109/TCSI.2018.2839266 3. Jiang H, Liu C, Liu L, Lombardi F, Han J (2017) A review, classification, and comparative evaluation of approximate arithmetic circuits. ACM J Emerg Technol Comput Syst (JETC) 13:1–34. https://doi.org/10.1145/3094124 4. Edavoor PJ, Raveendran S, Rahulkar AD (2020) Approximate multiplier design using novel dual-stage 4:2 compressors. IEEE Access, vol 8, pp 48337–48351. https://doi.org/10.1109/ ACCESS.2020.2978773 5. Narayanamoorthy S, Moghaddam HA, Liu Z, Park T, Kim NS (2015) Energy-efficient approximate multiplication for digital signal processing and classification applications. IEEE Transa Very Large-Scale Integr (VLSI) Syst 23(6):1180–1184. [6858039]. https://doi.org/10.1109/ TVLSI.2014.2333366 6. Momeni A, Han J, Montushi P, Lombardi F (2015) Design and analysis of approximate compressors for multiplication. IEEE Trans Comput 64:984–994 7. Chang Y-J et al (2019) Imprecise 4–2 compressor design used in image processing applications. IET Circuits Devices Syst 13:848–856 8. Reddy KM, Vasantha MH, Nithin Kumar YB, Dwivedi D (2019) Design and analysis of multiplier using approximate 4–2 compressor. Int J Electron Commun (AEÜ) 107:89–97 9. Van Toan N, Lee J (2020) FPGA-based multi-level approximate multipliers for highperformance error-resilient applications. IEEE Access 8:25481–25497. https://doi.org/10. 1109/ACCESS.2020.2970968



Chapter 34

A Study on the Implications of NLARP to Optimize Double Q-Learning for Energy Enhancement in Cognitive Radio Networks with IoT Scenario

Jyoti Sharma, Surendra Kumar Patel, and V. K. Patle

1 Introduction Conservation systems across the globe have become popular enough to be considered smart systems in national infrastructure development. A country becomes smart [1] in many respects through technological development, especially in the computing science arena and through developments in information technology. These developments advance at an ever-increasing pace: in the era of mechanical engineering a technology lasted for years, whereas in the information-technology or digital era a technology may last only days, or in certain cases hours. The contribution made by these technologies has been immeasurable. The Network Lifetime Aware Routing Protocol (NLARP) has contributed its best algorithm, which incorporates the [2] automation system and has been utilized in many distinctive ways, resulting in substantial conservation, especially in energy harvesting and control systems, apart from conservation systems. Q-learning [3] has contributed equally in the same area, but it has been considered over-optimistic in most areas. A recent trend in the information technology sector is the Internet of Things (IoT), which is utilized across various sectors and optimizes the areas where it has been adopted. The study takes four distinct areas into consideration and tries to measure the implications of the Network Lifetime Aware Routing Protocol (NLARP) by incorporating double Q-learning [4]. Cognitive radio networks together with

J. Sharma · V. K. Patle Computer Science and IT, Pt. Ravishankar Shukla University, Raipur, Chhattisgarh, India S. K. Patel (B) Department of Information Technology, Government Nagarjuna P.G. College of Science, Raipur, Chhattisgarh, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_34


the Internet of Things (IoT), and the contributions drawn from them, will enhance the energy systems being adopted in the country.

2 Methods An embedded system is developed using cognitive radio networks with the Internet of Things for Network Lifetime Aware Routing. The system is built around the double Q-learning algorithm, whose developments have shown positive enhancements in many earlier energy-conservation projects, and the same can be used here; Q-learning is a widely used reinforcement learning algorithm [5]. Q-learning, as proposed by Watkins, has shown good performance but exhibits an overestimation error over time; therefore, incorporating double Q-learning into the same system serves the purposes of this paper: (i) optimization of the Network Lifetime Aware Routing Protocol and (ii) energy enhancement in cognitive radio networks. Figure 1 clarifies how the system is applied. Applying the double Q-learning algorithm to [6] enhance and overcome the challenges faced by cognitive radio networks is explained below.

$$Q_{t+1}(s_t, a_t) = Q_t(s_t, a_t) + \alpha_t(s_t, a_t)\left[r_t + \gamma \max_a Q_t(s_{t+1}, a) - Q_t(s_t, a_t)\right] \quad (1)$$

Q-learning is an off-policy method, whereas SARSA is on-policy; by using the maximum Q value available in the next state, Q converges to Q*, the optimal action-value function. The steps are repeated as follows. Initialize Q(s, a).

Fig. 1 Key challenges in cognitive radio networks and their machine learning-based solutions [7]


Pick the start state s_t. Repeat: select a based on Q(s, ·) using the ε-greedy rule; execute a and observe r and s'; then update

$$Q(s, a) \leftarrow Q(s, a) + \alpha\left[r + \gamma \max_{a'} Q(s', a') - Q(s, a)\right]$$

and set s ← s', until s is a terminal state.

Q-learning backs up the value of a*, the best next action with the highest Q value Q(s', a'), while the actions actually taken follow the behavior policy rather than a*. With this off-policy scheme, Q-learning follows the optimal policy along the edge, whereas SARSA follows a non-optimal policy along the edge under the ε-greedy rule: when ε = 0, SARSA is used online, and as ε → 0 gradually, both converge to the optimal policy. In the update above, values are found for all state–action pairs Q(s', a'); this indicates how an action performs in network state s_t, after which the optimal policy selects the best action as (Fig. 2)

$$V^*(s_t) = \max_{a_t} Q^*(s_t, a_t) \quad (2)$$
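To make the loop above concrete, the following is a minimal sketch of tabular Q-learning with an ε-greedy behavior policy. The environment interface (`reset`, `step`), the state/action counts, and all hyperparameter values are illustrative assumptions, not details taken from the study.

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning with an epsilon-greedy behavior policy.

    `env` is assumed to expose reset() -> state and
    step(action) -> (next_state, reward, done); this interface is an
    illustrative placeholder, not part of the original paper.
    """
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection
            if np.random.rand() < epsilon:
                a = np.random.randint(n_actions)
            else:
                a = int(np.argmax(Q[s]))
            s_next, r, done = env.step(a)
            # off-policy update: bootstrap from the greedy next action (Eq. 1)
            Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
            s = s_next
    return Q
```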

For each possible next state s_{t+1}, the transition probability P(s_{t+1} | s_t, a_t) applies, and continuing with the best policy thereafter gives the expected cumulative

Fig. 2 Q-learning Q(s’, a’)


reward V(s_{t+1}), with later steps discounted by one step. Using the expected reward and the action a_t, Bellman's equation (Bellman, 1957) is applied as follows. First, initialize V(s); then perform and repeat, for every s ∈ S and for every a ∈ A:

$$Q(s, a) \leftarrow E\{r \mid s, a\} + \gamma \sum_{s' \in S} P(s' \mid s, a)\, V(s')$$

$$V(s) \leftarrow \max_a Q(s, a)$$

until V(s) converges. From the Q*(s_t, a_t) values, a greedy search strategy at each step gives the ideal sequence of steps that maximizes the collective reward. In the update above, Q_t(s, a) gives the value of action a in state s at time t, and the reward r_t is drawn from a fixed reward distribution R : S × A × S → ℝ, where E{r_t | (s, a, s') = (s_t, a_t, s_{t+1})} = R_{sa}^{s'}. The next state s_{t+1} is determined by a fixed state-transition distribution P : S × A × S → [0, 1], where P_{sa}^{s'} gives the probability of ending up in state s' after performing a in s, and Σ_{s'} P_{sa}^{s'} = 1.

The learning rate α_t(s, a) ∈ [0, 1] ensures that the update averages over the possible randomness in the rewards and transitions, so that the estimates converge in the limit to the optimal action-value function. Hence, the optimal value function [8] is the solution of the following set of equations:

$$\forall s, a: \quad Q^*(s, a) = \sum_{s'} P_{sa}^{s'}\left[R_{sa}^{s'} + \gamma \max_{a'} Q^*(s', a')\right] \quad (3)$$

This expression addresses the problem of overestimation and thus brings the network to meet the requirements of the system we develop. The single estimator technique can be used to approximate the value of the next state by maximizing over the estimated action values in that network state. The maximum obtained via the single estimator method is

$$\max_i E\{X_i\} = \max_i E\{\mu_i\} \approx \max_i \mu_i(S) \quad (4)$$

The maximum obtained via the double estimator method is

$$\max_i E\{X_i\} = \max_i E\{\mu_i^B\} \approx \mu^B(a^*) \quad (5)$$

Therefore, the maximum will be obtained from the implemented Network Lifetime Aware Routing; a sketch of the corresponding double Q-learning update follows.
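The following is a minimal sketch of double Q-learning, which realizes the double estimator of Eq. (5) with two value tables that select and evaluate actions separately. As before, the environment interface and hyperparameters are illustrative assumptions.

```python
import numpy as np

def double_q_learning(env, n_states, n_actions, episodes=500,
                      alpha=0.1, gamma=0.95, epsilon=0.1):
    """Double Q-learning: two estimators QA and QB reduce the
    overestimation bias of the single-estimator maximum in Eq. (4)."""
    QA = np.zeros((n_states, n_actions))
    QB = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # behave epsilon-greedily w.r.t. the combined estimate
            if np.random.rand() < epsilon:
                a = np.random.randint(n_actions)
            else:
                a = int(np.argmax(QA[s] + QB[s]))
            s_next, r, done = env.step(a)
            if np.random.rand() < 0.5:
                # update A: select with QA, evaluate with QB (cf. Eq. 5)
                a_star = int(np.argmax(QA[s_next]))
                QA[s, a] += alpha * (r + gamma * QB[s_next, a_star] - QA[s, a])
            else:
                # update B: select with QB, evaluate with QA
                b_star = int(np.argmax(QB[s_next]))
                QB[s, a] += alpha * (r + gamma * QA[s_next, b_star] - QB[s, a])
            s = s_next
    return QA, QB
```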

3 Background Theorem: Consider a state s in which all the true optimal action values are equal, Q*(s, a) = V*(s) for some V*(s). Let Q_t be arbitrary value estimates that are unbiased overall, in the sense that Σ_a (Q_t(s, a) − V*(s)) = 0, but not all correct, such that (1/m) Σ_a (Q_t(s, a) − V*(s))² = C for some C > 0, where m ≥ 2 is the number of actions in state s. Under these conditions,

$$\max_a Q_t(s, a) \geq V^*(s) + \sqrt{\frac{C}{m-1}},$$

and this lower bound is tight [9]. Under identical conditions, the lower bound on the absolute error of the double Q-learning estimate is zero. This method is one of the baselines [10] through which the study develops the system and assesses the impact of Network Lifetime Aware Routing. Implementing machine learning [1] aspects is an added advantage to the system, where the Internet of Things contributes its best and enhances the facility. Cognitive radio is a widespread technology in which secondary devices acquire the idle range of spectrum from the primary user. It detects false intruders in the [11, 12] cognitive network and raises the appropriate alarms. The cognitive radio network also commonly employs Q-learning, and implementing double Q-learning is an added advantage.

3.1 Cognitive Radio Network 3.2 The History and Technology of the Internet of Things (IoT) IoT technology has become very familiar nowadays, but its history began some time ago, connecting many devices ranging from cars, medical devices, mobile phones, [13] and computers to many more. That was just a beginning: it moved further to the majority of devices used inside the home, and from homes to cities to make them smart (many of India's cities are witnesses); the benefits have now extended to agriculture as well. All these [9] applications consume energy using cognitive radio networks, and this study takes up the Network Lifetime Aware Routing Protocol that connects all the said technologies and logics (Figs. 3 and 4).


Fig. 3 CR surrounded by different RATs [7]

Fig. 4 Impact of connected life, understanding IoT [14]

4 Implications of NLARP The study has identified the available platforms in the form of technologies, models, algorithms, and networks, using intelligent and fast cognitive radio networks among others. A common phenomenon [15] under study is double Q-learning together with the Internet of Things. All the mentioned technologies and logics have been applied to find the best outcome and to measure the implications of implementing the Network Lifetime Aware Routing Protocol (NLARP) to enhance energy in the cognitive radio network. The aim of the analysis is to enhance the energy in this particular [16] network or routing plan, for which the following suggestions and findings have been identified.


4.1 Network Layer for IoT Devices The graphical representations below describe how the [17] network is layered in the IoT as embedded in the devices we utilize in day-to-day life. Both figures list the implications for the devices where it has been applied (Figs. 5 and 6). The figures depict the application of IoT in the regular or routine [19] usage of human life and advanced technology. They are also evidence of the state and the [20] implications of the desired NLARP devices that have been used for

Fig. 5 Network layer for IoT devices [18]

Fig. 6 Other IoT devices in daily use [18]


Fig. 7 Value chain [23]

routine activity in human life. They also indicate that the energy-consuming devices have been very successful both in conservation and in performance.

4.2 The GSMA's Visualization of IoT Services—The Linked Life The understanding of any network started and became familiar among people through the mobile services offered by the service-providing [21] companies. The figure below shows where the embedded system can be enhanced; the application flow is based on the value chain created through the system built with the help of all the [22] technologies and logics adopted in the study (Fig. 7).

4.3 Matrix of Energy and Security Visualization by Using Payoff in a Coalition Game The table below makes an [23] important impact using the network; it synthesizes the impact on energy and security while keeping energy constant or decreasing, as shown in Table 1.


Table 1 Energy- and security-related matrix representation using payoff in a coalition game

Payoff matrix | No interference | Interference with noise in selected network
There is no interference | Power and security are the same thing | System and network security enhances with constant level of energy and also decreasing energy
Interference in the specified channel | System and network security enhances with constant level of energy and also decreasing energy | System privacy increases with decreasing energy
Interference with noise in indolent network system | System privacy increases with decreasing energy | System and network security enhances with constant level of energy and also decreasing energy

Table 2 Spectrum utilization using Q-learning

Monitoring of energy related to spectrum scenario | Coalition method with Q-learning, battery at 50% | Non-coalition method, battery at 50%
Energy spent | 56.1 | 51.2
Energy consumption | 49.3 | 47.1
Spectrum utilization, packets sent | 55,681 | 50,572

4.4 Spectrum Utilization Using Q-Learning Table 2 builds on the information and impact from the previous table while keeping the [23] spectrum as the key point. Energy and spectrum were monitored using a coalition method with Q-learning, with battery usage also taken into consideration. The output [13] shows a marked change in energy spent, energy consumed, and spectrum utilization; the table shows a reduction that yields benefits over time (Table 2).

5 Conclusion The Network Lifetime Aware Routing Protocol has shown significant implications: with double Q-learning utilized, the attempt at energy enhancement using cognitive radio networks with an embedded Internet of Things was found to be positive. The implications have been measured from the history onward, showing that the spread of applications is vast across almost all segments. Furthermore, the application has


moved beyond mobile services into the medical segment, the crux being energy enhancement. Reviewing this history together with the double Q-learning integration gives confidence to take the concept forward, and savings and earnings of money have been witnessed in many respects. Spectrum utilization has also shown a great and positive impact over time. The impact was tested with the Q-learning system along with battery utilization under two different strategies, coalition and non-coalition. All the implications were found to be positive, and energy will definitely be enhanced by following the proposed logic, system, and technology.

References 1. Bindhu V (2020) Constraints mitigation in cognitive radio networks using computing. J Trends Comput Sci Smart Technol 2(1):1–10 2. Gu Y, Chen H, Zhai C, Li Y, Vucetic B (2019) Minimizing age of information in cognitive radio-based IoT systems: underlay or overlay? IEEE Internet Things J 6:10273–10288 3. Azade Fotouhi MD (2021) Deep Q-learning for two-hop communications of drone base station. J Sens 21(6):1–14 4. Albaire NB (2021) Cognitive radio based internet of things: applications, challenges and future research aspects. Int J Eng Inf Syst 5(5):58–62 5. Wenli Ning XH (2020) Reinforcement learning enabled cooperative spectrum sensing in cognitive radio networks. J Commun Networks 22(1):12–21 6. Koushik AF (2019) Intelligent spectrum management based on transfer actor-critic learning for rateless transmissions in cognitive radio networks. J IEEE 1–11 7. Upadhye A, Saravanan P (19 June 2021) A survey on machine learning algorithms for applications in cognitive radio networks, arXiv:2106.10413v1 [eess.SP] 8. Marco Lombardi FP (2021) Internet of Things: a general overview between architectures, protocols and applications. J Inf 12(2):12–87 9. Thuslimbanu DK (2014) Spectrum holes sensing policy for cognitive radio network. Int J Adv Res Comput Sci Technol 2(1):170–175 10. Zhou JS (2020) Dependable scheduling for real-time workflows on cyber-physical cloud systems. IEEE Trans Ind Inf 109(1):1–10 11. Sharma DK (2018) A machine learning based protocol for efficient routing in opportunistic networks. IEEE Syst J 12(3):2207–2213 12. Jiang T (2011) Reinforcement learning-based spectrum sharing for cognitive radio. New York, Department of Electronics University of York 13. Zhang WZ (2018) Satellite mobile edge computing: improving QoS of high-speed satellite terrestrial networks using edge computing techniques. IEEE Network 97(c):70–76 14. https://www.gsma.com/iot/wp-content/uploads/2014/08/cl_iot_wp_07_14.pdf. Accessed 13 April 2022 15. Djamel Sadok CM (2019) An IoT sensor and scenario survey for data researchers. J Braz Comput Soc 25(4):2–17 16. Zikira HY (2020) Cognitive radio networks for internet of things and wireless sensor network. J Sens 20(5288):1–6 17. Liu XM (2021) Movement based solutions to energy limitation in wireless sensor networks: state of the art and future trends. IEEE Networks 9(1):188–193 18. Nilsson E, Anderson D (2018) Internet of things a survey about thoughts and knowledge. National Category Engineering and Technology


19. Wu Z (2020) Scheduling-guided automatic processing of massive hyperspectral image classification on cloud computing architectures. IEEE Trans Cybern 51(7):1–14 20. Marchese M, Patrone F (2018) Energy-aware routing algorithm for DTN-nanosatellite networks. In: Proceedings of IEEE global communications conference, Abu Dhabi 21. Zhao YM (2020) On hardware trojan-assisted power budgeting system attack targeting many core systems. J Syst Archit 109(10):1–11 22. Zhang WG (2017) IRPL: an energy efficient routing protocol for wireless sensor networks. J Syst Archit 11(3):35–49 23. Vimal Shanmuganathan LK (2021) EECCRN: energy enhancement with CSS approach using Q-learning and coalition game modelling in CRN. Inf Technol Control 50(1) 24. Suresh P (2014) A state of the art review on the internet of things (IoT) history, technology and fields of deployment. In: 2014 International conference on science engineering and management research (ICSEMR), pp 1–8 25. Jyoti Sharma SK (2020) Hybrid firefly optimization with double Q-learning for energy enhancement in cognitive radio networks. Int J Eng Res Technol 7(3):5227–5232 26. Deng XH (2020) Task allocation algorithm and optimization model on edge collaboration. J Syst Archit 110:1–14 27. Sun YZ (2019) An efficient and scalable framework for processing remotely sensed big data in cloud computing environments. IEEE Trans Geosci Remote Sens 4294–4308

Chapter 35

Automatic Generation Control Simulation Study for Restructured Reheat Thermal Power System

Ram Naresh Mishra

1 Introduction A crucial component in improving power system operation is automatic generation control (AGC), which is used to adjust the frequency in response to customer power demands [1]. Basically, bulk power systems contain control areas representing coherent sets of generators, which must keep the frequency and tie-line power close to fixed values. The system frequency is sensitive to load variations of the power system, whereas reactive power has lower sensitivity to frequency deviations [2]. The control of active and reactive power can therefore be treated independently. When a frequency deviation occurs, power generation no longer matches the load requirement. Numerous studies have been published that consider various types of traditional AGC schemes [2, 3]. Companies for power generation, transmission, and distribution, and individual power producers (IPPs), are the different firms that carry out their tasks with a monitoring unit acting as independent system operator in a restructured environment [3]. A constant level of power transmission keeps the power system functioning well and reliably. With the increasing number of industries, the power system has become exceptionally complex, and variations in the active power demand of industries cause changes in system frequency. An imbalance between supply and load degrades power system performance and complicates the control operation

R. N. Mishra (B) GLA University, Mathura, UP 281406, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_35


[4]. Control-related issues of power systems mainly arise during the implementation and operation processes. Load frequency control (LFC), also called frequency regulation of power systems, is essential for automatic generation control. The major goal of LFC is to diminish the frequency deviation, regulate the tie-line interchange power, and ensure zero steady-state error. Christie and Bose [5] highlighted increased system performance with better stability under lower interruptions in the system. Implementation and operational difficulties become a major problem for interconnected power systems owing to variations in structure and system size. The frequency and load demand should stay within certain limits at all times to achieve reliable power delivery and increase power system performance. AGC provides better solutions for achieving reliable power delivery and increasing system performance, but only through LFC. In addition, AGC maintains the scheduled tie-line power and the required system frequency to correct imbalances in the system. To remove the imbalance between power generation and power demand, the set point of the power generating source is automatically changed by adjusting the speed [5]. In a deregulated environment, Donde et al. [6] applied AGC to interconnected energy networks using the concepts of the DPM and the area participation factor (apf) to represent bilateral agreements. Several references to various aspects of load frequency control (LFC)/AGC of electrical systems in restructured environments can be found in the literature [4, 5]. Muthuraman et al. [7] discussed two-area power systems in a reformed environment based on PSO-optimized LFC. Ranjan and Lal Bahadur [11] discussed integral control schemes for multi-area deregulated AGC systems. Hassan [12] investigated LFC models for conventional as well as smart power systems. In this research, the AGC of an interconnected two-area power framework in a restructured condition is developed using a PSOA-optimized PID controller. The paper is organized as follows. Section 2 presents the proposed power system model, and Sect. 3 discusses the use of the PSOA-optimized PID controller. Section 4 focuses on the analysis and findings of the simulation. The conclusion is given in Sect. 5.

2 Restructured Reheat Thermal Power System Model A two-area restructured reheat thermal power system is taken into consideration for this study's examination. Figure 1 depicts the contracted power of the DISCOs. Figure 2 displays the transfer function block diagram for this system. System data are provided in [11].

ΔP_tie^Scheduled = (Demand from GENCOs in control area 1 to DISCOs in control area 2) − (Demand from GENCOs in control area 2 to DISCOs in control area 1)


Fig. 1 Contracted power of DISCOs

$$\Delta P_{tie}^{Scheduled} = \sum_{i=1}^{3}\sum_{j=3}^{4} cpf_{ij}\,\Delta P_{Lj} \; - \; \sum_{i=4}^{6}\sum_{j=1}^{2} cpf_{ij}\,\Delta P_{Lj} \quad (1)$$

$$\Delta P_{tie}^{Actual} = \frac{2\pi T_{12}}{s}\,(\Delta F_1 - \Delta F_2) \quad (2)$$

$$\Delta P_{tie}^{Error} = \Delta P_{tie}^{Actual} - \Delta P_{tie}^{Scheduled} \quad (3)$$

In a stable state, ΔP_tie^Error decreases to zero. The area control errors (ACEs) for the two areas are

$$ACE_1 = B_1\,\Delta F_1 + \Delta P_{tie}^{Error} \quad (4)$$

$$ACE_2 = B_2\,\Delta F_2 + \alpha_{12}\,\Delta P_{tie}^{Error} \quad (5)$$

Here B_i is the frequency bias constant of the ith area (pu MW/Hz), ΔF_i is the frequency deviation of the ith area (Hz), i = 1, 2, and α_12 is the size ratio of the control areas. Each area has three GENCOs, and the ACE signal is distributed among them according to their apfs for AGC. The sum of each area's apfs must, therefore, equal 1, and each


Fig. 2 Two-area restructured reheat thermal power system model

area's variation in contracted local load demand can be stated as

$$\Delta P_{L1}^{LOC} = \Delta P_{L1} + \Delta P_{L2} \quad (6)$$

$$\Delta P_{L2}^{LOC} = \Delta P_{L3} + \Delta P_{L4} \quad (7)$$


$$DPM = \begin{pmatrix} cpf_{11} & cpf_{12} & cpf_{13} & cpf_{14} \\ cpf_{21} & cpf_{22} & cpf_{23} & cpf_{24} \\ cpf_{31} & cpf_{32} & cpf_{33} & cpf_{34} \\ cpf_{41} & cpf_{42} & cpf_{43} & cpf_{44} \\ cpf_{51} & cpf_{52} & cpf_{53} & cpf_{54} \\ cpf_{61} & cpf_{62} & cpf_{63} & cpf_{64} \end{pmatrix} \quad (8)$$
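As a numeric illustration of Eqs. (1), (4), and (5), the following sketch computes the scheduled tie-line power and the ACEs from a DPM. The DPM shown is the poolco-based matrix used later in Sect. 4.1, while the load demands, bias constants, size ratio, and deviations are placeholder values, not study data.

```python
import numpy as np

dpm = np.array([            # rows: GENCOs 1-6, columns: DISCOs 1-4
    [0.3333, 0.3333, 0, 0],
    [0.3333, 0.3333, 0, 0],
    [0.3333, 0.3333, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
])
dP_L = np.array([0.005, 0.005, 0.0, 0.0])   # DISCO load demands (pu MW)

# Eq. (1): area-1 GENCOs (rows 0-2) serving area-2 DISCOs (cols 2-3),
# minus area-2 GENCOs (rows 3-5) serving area-1 DISCOs (cols 0-1).
p_sched = (dpm[0:3, 2:4] @ dP_L[2:4]).sum() - (dpm[3:6, 0:2] @ dP_L[0:2]).sum()

B1, B2, alpha12 = 0.425, 0.425, -1.0        # assumed bias constants / ratio
dF1, dF2, p_err = -0.002, -0.001, 0.0001    # assumed deviations
ace1 = B1 * dF1 + p_err                     # Eq. (4)
ace2 = B2 * dF2 + alpha12 * p_err           # Eq. (5)
print(p_sched, ace1, ace2)
```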

3 Application of PSOA for Tuning PID Controller The PID controller's primary job is to act on the feedback error so as to obtain the best outcome. The parameters of the PID controller are its three gains. Its transfer function is

$$g(s) = \frac{u(s)}{e(s)} = k_P\left(1 + \frac{1}{t_I s} + t_D s\right)$$

where k_I = k_P / t_I and k_D = k_P t_D. Kennedy and Eberhart created the particle swarm optimization algorithm (PSOA), an optimization technique based on population [8]. To get the best outcomes, the PID controller is tuned by obtaining the controller gains with the PSOA. The literature reports that ITAE is a superior objective function; therefore, to optimize the gains of PID controllers using PSOA, the objective function is chosen as the integral of time-weighted absolute error (ITAE). PSOA keeps updating based on this objective function until no more iterations are possible. The PSOA flowchart is displayed in Fig. 3 [9], and a minimal sketch of this tuning loop is given after the flowchart.


Fig. 3 Flowchart of PSOA
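The sketch below follows the flowchart in broad strokes: particles hold candidate (k_P, k_I, k_D) triples and the ITAE cost is minimized. The plant simulator `simulate_response` (returning sampled times and absolute errors) is an assumed user-supplied interface, and the swarm parameters are typical textbook values rather than the study's settings.

```python
import numpy as np

def itae(gains, simulate_response):
    """ITAE objective: integral of time-weighted absolute error.
    `simulate_response(gains)` is assumed to run the two-area AGC model
    and return sampled times t and |e(t)| (e.g. |ACE1| + |ACE2|)."""
    t, abs_err = simulate_response(gains)
    return np.trapz(t * abs_err, t)

def pso_tune_pid(simulate_response, n_particles=20, iters=50,
                 lb=np.zeros(3), ub=np.array([2.0, 2.0, 2.0]),
                 w=0.7, c1=1.5, c2=1.5):
    dim = 3  # kP, kI, kD
    x = lb + (ub - lb) * np.random.rand(n_particles, dim)   # positions
    v = np.zeros_like(x)                                    # velocities
    pbest = x.copy()
    pbest_f = np.array([itae(p, simulate_response) for p in x])
    g = pbest[np.argmin(pbest_f)]                           # global best
    for _ in range(iters):
        r1, r2 = np.random.rand(2, n_particles, dim)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lb, ub)
        f = np.array([itae(p, simulate_response) for p in x])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        g = pbest[np.argmin(pbest_f)]
    return g  # best (kP, kI, kD) found
```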

4 Simulation Results and Discussion

4.1 Poolco-Based Transaction (PBT) In this instance, DISCOs 1 and 2 have agreed that the load they request from the GENCOs belongs to their respective areas. No GENCOs are required to supply power to the DISCOs in area 2; therefore, there are no corresponding cpfs in the DPM.


The total load fluctuation (0.01 pu MW) affects only area 1, while the total load fluctuation in area 2 is zero. The DPM for PBT is as follows [10]:

$$DPM = \begin{pmatrix} 0.3333 & 0.3333 & 0 & 0 \\ 0.3333 & 0.3333 & 0 & 0 \\ 0.3333 & 0.3333 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$

Figure 4 shows the dynamic responses of the tie-line power deviation (pu MW), the system frequency deviation, and the GENCO power responses (pu MW) for both areas subjected to PBT. The overall performance of the PSO-tuned PID controller in managing frequency and tie-line power variations under PBT for the restructured thermal (reheat) power system is demonstrated in terms of settling time and peak undershoot/overshoot.

4.2 Bilateral-Based Transaction (BBT) In this situation, every DISCO has a contract with every GENCO, and they all accept the terms of that contract [10].

$$DPM = \begin{pmatrix} 0.2 & 0.1 & 0.3 & 0 \\ 0.2 & 0.2 & 0.1 & 0.1666 \\ 0.1 & 0.3 & 0.1 & 0.1666 \\ 0.2 & 0.1 & 0.1 & 0.3336 \\ 0.2 & 0.2 & 0.2 & 0.1666 \\ 0.1 & 0.1 & 0.2 & 0.1666 \end{pmatrix}$$

According to the cpfs, each DISCO requests only 0.005 pu MW of power from the GENCOs in its region. As a result, each area's overall load disturbance is 0.01 pu. Figure 5, obtained under BBT, demonstrates the dynamic responses of the system frequency deviations for each area, the tie-line power deviation (pu MW), and the GENCO power responses (pu MW) for both areas. The overall performance of the PSO-tuned PID controller in managing frequency and tie-line power variations under BBT for the restructured thermal (reheat) power system is tested in terms of settling time and peak undershoot/overshoot.


Fig. 4 Dynamic responses for PBT



Fig. 4 (continued)

4.3 Contract Violation-Based Transaction (CVBT) In this instance, DISCO-1 demands an additional 0.003 pu MW beyond its contracted power from the GENCOs in its region. As a result, the updated figure for area 1's load demand is 0.013 pu MW [10], where 0.01 pu MW represents the total load disturbance that occurs in each region. Figure 6 shows the dynamic responses of the tie-line power deviation (pu MW), the system frequency deviation (Hz) for each area, and the GENCO power responses (pu MW) for both areas. The overall effectiveness of the PSO-tuned PID controller in managing frequency and tie-line power variations under CVBT for the restructured thermal (reheat) power system is tested in terms of settling time and peak undershoot/overshoot.

5 Conclusions In this paper, a PSOA-tuned PID controller is used to study the AGC of a two-area restructured reheat power system. Three types of power transactions are taken into consideration: poolco, bilateral, and contract violation. Where


Fig. 5 Dynamic responses for BBT



Fig. 5 (continued)

DISCOs break their contracts, larger frequency variations occur. Furthermore, when an agreement is broken, the response of the tie-line power deviation deteriorates. This simulation study demonstrates that the overall performance of the PSOA-tuned PID controller in regulating frequency and tie-line power deviations for the restructured thermal (reheat) power system is validated for poolco-, bilateral-, and contract violation-based power transactions. Performance is measured in terms of settling time and peak undershoot/overshoot. Additionally, the power generation of each control area's GENCOs is adequate.

Fig. 6 Dynamic responses for CVBT


Fig. 6 (continued)

References 1. Elgerd OI (1971) Electric energy systems theory: an introduction, 2nd edn. McGraw Hill Education, New Delhi, India 2. Ibraheem, Kumar P, Kothari DP (2005) Recent philosophies of automatic generation control strategies in power systems. IEEE Trans Power Syst 20(1):346–357 3. Shayeghi H, Shayanfar HA, Jalili A (2009) Load frequency control strategies: a state-of-the-art survey for the researcher. Energy Convers Manage 50(2):344–353 4. Mishra RN, Chaturvedi DK, Kumar P (2020) Recent philosophies of AGC techniques in deregulated power environment. J Inst Eng India Ser B. https://doi.org/10.1007/s40031-020-00463-8 5. Christie RD, Bose A (1996) Load frequency control issues in power system operations after deregulation. IEEE Trans Power Syst 11:1191–1200


6. Donde V, Pai MA, Hiskens IA (2001) Simulation and optimization in an AGC system after deregulation. IEEE Trans Power Syst 16(3):481–489 7. Muthuraman, Priyadarsini A, Arumugom (2016) PSO optimized load frequency control of two area power system in deregulated environment. J Electr Electr Syst 5(4). https://doi.org/10.4172/2332-0796.1000210 8. Kennedy J, Eberhart RC (1995) Particle swarm optimization. In: Proceedings of IEEE international conference on neural networks, pp 1942–1948 9. Alam MN (2016) Particle swarm optimization: algorithm and its codes in MATLAB. https://doi.org/10.13140/RG.2.1.4985.3206 10. Sharma M, Dhundhara S, Arya Y (2020) Frequency stabilization in deregulated energy system using coordinated operation of fuzzy controller and redox flow battery. Int J Energy Res, Wiley, 7457–7473 11. Kumar R, Prasad LB (2021) Performance analysis of automatic generation control of multi-area restructured power system. ICAECT. https://doi.org/10.1109/ICAECT49130.2021.9392417 12. Alhelou HH, Hamedani-Golshan M-E, Zamani R, Heydarian-Forushani E, Siano P (2018) Challenges and opportunities of load frequency control in conventional, modern and future smart power systems: a comprehensive review. Energies 11:2497. https://doi.org/10.3390/en11102497

Chapter 36

Processing and Analysis of Electrocardiogram Signal Using Machine Learning Techniques

Gursirat Singh Saini and Kiranbir Kaur

1 Introduction The electrocardiogram (ECG) is a diagnostic tool which detects the electrical activity present in the heart during the pumping of blood. Interpreting this electrical activity can help us understand various underlying abnormalities of the heart. The ECG can give significant data on a person's heart rhythm, increased thickness of heart muscle, a past heart attack, indications of diminished oxygen delivery to the heart, and issues with conduction of the electrical current from one part of the heart to another. An ECG poses no danger: electricity is not passed through the body, and there is no risk of shock. The ECG can be used to identify issues inside the heart related to conduction of impulses, or issues in contraction of heart muscles due to damaged muscle fiber.

1.1 Conditions that Are Diagnosed with the ECG 1. 2. 3. 4.

It detects if the heart rate is abnormally fast. It detects if the heart rate is abnormally slow. If the waves or waveform of the ECG is not as per normal, it can depict underlying issues in the heart. It can depict if there is a trace of past heart attack.

G. S. Saini (B) · K. Kaur Department of Computer Engineering and Technology, Guru Nanak Dev University, Amritsar 143005, India e-mail: [email protected] K. Kaur e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_36



5. Evidence of an evolving and acute cardiac attack.
6. It can depict if any heart damage is there due to less supply of blood to the heart during a heart attack.
7. Heart diseases can have an effect on the heart and can be seen in the ECG.
8. Lung diseases like emphysema, etc., can also show deviations in the ECG.
9. Congenital heart abnormalities can also be seen as variation of the ECG from the normal waveform. If the wave is not normal, it can also suggest an imbalance of electrolytes like sodium, potassium, etc.
10. Magnitude of peaks can tell about inflammation or enlargement of the heart.

1.2 Limitations of the ECG The ECG is a single graph and does not show the dynamics of the heart's condition during various periods of the day. It is possible for a person to have a heart problem while the ECG is normal; in this case, an ECG taken under increased stress can reveal the underlying problems in the heart. Sometimes, variation in the ECG cannot be directly interpreted and can represent more than one issue in the heart, and the variations are sometimes not that specific. This can be resolved by doctor consultation and other cardiovascular tests.

2 ECG Signal Processing In the past few years, the electrocardiogram has played a vital role in heart disease detection, emotional and stress assessment, human–computer interface (HCI), etc. Karthikeyan et al. [1] reported that digital signal processing and data analysis are the most commonly applied methods in biomedical engineering research. These methods find a wide range of applications in ECG signal processing in particular, and extensive research has been done in this field over the past 30 years. Researchers and industries across the world have progressed a lot and have been quite successful in acquiring the ECG signal and then processing it for further detection and classification. The major aim of ECG signal processing is to improve accuracy and reproducibility and to deduce features which are difficult to extract from the signal by viewing it with the naked eye. At times, the signal is recorded under stress conditions, so the ECG is corrupted by various kinds of noise arising from the other physiological activities of the person. Therefore, disturbance removal, called noise reduction, is a crucial aim of electrocardiogram signal processing; the noise can mask the waveforms of interest so thoroughly that their presence is only revealed after a first stage of signal processing. Hamilton [2] pointed out that to identify intermittent disturbances in the rhythm of the heart, electrocardiographic signals may be recorded for several days. Hence, the large volumes of data that come


out of the ECG recording will quickly fill the available storage space. Another application which involves a large amount of data is the transmission of signals across public telephone networks. This is another important aim of ECG signal processing since, in both cases, data compression is an important step. Liu et al. [3] explained that signal processing has contributed importantly to the latest understanding of the electrocardiogram and its changing features as determined by beat morphology and rhythm changes; a standard ECG printout cannot reveal either of these two oscillatory signal properties. A few fundamental algorithms have been developed which process the signal with different kinds of artifacts and noises, detect heartbeats, extract basic ECG measurements of wave durations as well as amplitudes, and compress the data for better transmission and storage. This is common to various ECG analyses, like ambulatory testing, stress monitoring, resting ECG interpretation, or care monitoring, as reported by Sornmo and Laguna [4]. Sometimes these algorithms are incorporated into another algorithm, without disturbing their sequence, to improve performance. The noise in ambulatory ECG scanning is much greater than in an ECG taken at rest; thus, the complexity of the algorithms changes depending on the application of the ECG diagnostic system. ECG signal analysis gives us relevant data about the features of the signal under analysis, which can be further processed by high-end algorithms, depending on the application, to obtain the desired results about the morphology of the heart.

3 Significance of Denoising The noises present in the ECG signal can hinder its accurate interpretation. The noises are of the three types mentioned above. Denoising the ECG signal yields a clean signal carrying a great deal of information about the morphology of the heart, which can be used to finely interpret any underlying heart problems present. Recently, with the advent of MATLAB and its increasing popularity, MATLAB tools and functions can be used to write and develop code that reduces unwanted noise in the waveform. Hence, new techniques for denoising the ECG signal have been implemented using MATLAB. Noise such as baseline wander and power-line interference can be removed by designing specific filters. EMG noise overlaps with the QRS complex in the spectral domain, so this noise is more difficult to remove from the signal; but as the ECG is recurrent, we can take more ECG cycles to reduce misinterpretation due to EMG noise. During pre-processing, the signal is filtered to obtain a high-quality signal for feature extraction. Filtering should not affect the information in the ECG but only the noise; it is done to remove any type of distortion or noise present in the signal. Jeyarani and Singh [5] explained the use of three different filters for the three major types of noise in the ECG signal. The frequency of baseline wander is low, with a frequency band below 1 Hz; this noise can be removed by using a high-pass filter with a cut-off frequency near 1 Hz.


The power-line interference noise can be reduced by using a notch filter with the notch at 50 Hz. A moving average filter can be used to reduce high-frequency noise: it smooths the signal by averaging values around each instant of time. A minimal filtering sketch along these lines is given below.
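The following sketch implements this three-filter chain in Python with SciPy, as an illustrative stand-in for the MATLAB implementation described above; the sampling rate of 360 Hz matches the MIT-BIH recordings, while the filter orders, quality factor, and window length are assumed values.

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch

FS = 360  # Hz, MIT-BIH sampling rate

def remove_baseline_wander(ecg, fs=FS, cutoff=1.0, order=4):
    """High-pass filter near 1 Hz removes baseline wander (< 1 Hz)."""
    b, a = butter(order, cutoff / (fs / 2), btype="highpass")
    return filtfilt(b, a, ecg)  # zero-phase, so peaks are not shifted

def remove_powerline(ecg, fs=FS, mains=50.0, q=30.0):
    """Notch filter at 50 Hz removes power-line interference."""
    b, a = iirnotch(mains / (fs / 2), q)
    return filtfilt(b, a, ecg)

def moving_average(ecg, window=5):
    """Moving average smooths residual high-frequency noise."""
    kernel = np.ones(window) / window
    return np.convolve(ecg, kernel, mode="same")

def preprocess(ecg):
    return moving_average(remove_powerline(remove_baseline_wander(ecg)))
```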

4 Related Works Over the last many years, numerous papers have been reported on the study of ECG signals using various techniques. Chazal and Reilly [6] reported a study focused on a patient-adapting heartbeat classifier using ECG morphology and heartbeat interval features. Übeyli [7] illustrated ECG beat classification using multiclass support vector machines with error-correcting output codes and highlighted the advantages of multiclass support vector machines. Pourbabaee and Lucas [8] studied automatic detection and prediction of paroxysmal atrial fibrillation by using ECG signal feature classification methods and described the merits of this technique. Melgani and Bazi [9] reported the classification of ECG by employing different methods, viz., the particle swarm optimization method and support vector machine classification. They applied such methods to detect the working of the heart in different forms such as normal, atrial premature beat, ventricular premature beat, paced beat, and right and left bundle branch block. In this work, the research analysis was carried out by considering data from the MIT-BIH arrhythmia database of twenty patients, with around 40,438 total pulses, reaching a highest accuracy level of 89.72%. Llamedo and Martinez [10] described heartbeat classification using feature selection driven by database generalization criteria and presented significant findings that may play a very important role in predicting the actual problem in the heart of the patient. Zia et al. [11] discussed the role of efficient and simplified adaptive noise cancellers for ECG sensor-based remote health monitoring. Here, noise cancellation from ECG signals is performed using error normalization-based adaptive filters on real signals with different artifacts obtained from the MIT-BIH database; performance is reported in terms of multiply-and-accumulate (MAC) operations and signal-to-noise ratio (SNR). Swathi et al. [12] presented a study of R peak detection and feature extraction for the diagnosis of heart diseases and detected the R peaks obtained from the denoised ECG signal with 97.56% accuracy; they achieved 80% classifier accuracy by applying an algorithm to the MIT-BIH arrhythmia database. An investigation focusing on computational techniques for ECG analysis and interpretation in light of their contribution to medical advances was illustrated by Lyon et al. [13]. A big data classification approach using LDA with an enhanced SVM method for ECG signals in the cloud was reported by Varatharajan et al. [14]. They used image processing filters, viz., finite impulse response (FIR) and infinite impulse response (IIR) filters, to remove unwanted noise, and achieved 92% accuracy by employing a support vector machine (SVM) and linear discriminant analysis (LDA).


5 Various Techniques Used for Analysis and Prediction 5.1 Wavelet Transform Technique Earlier methods of analyzing ECG signals were based on time-domain analysis, but the ECG carries information in the frequency domain as well. To extract the information present in the frequency domain, the fast Fourier transform (FFT) is used. The shortcoming of this method is that the exact location of a frequency component in the time domain cannot be found, so a time–frequency representation is not possible. For this, the short-time Fourier transform (STFT) is used to get a time–frequency representation of the ECG signal. The drawback of this method is that the STFT does not give very accurate time–frequency localization of the signal. This disadvantage can be overcome by the wavelet transform, which decomposes the signal into coefficients. Karthikeyan et al. [15] described that the wavelet transform has its own time and frequency space. These coefficients can be used to analyze the ECG signal, as they carry information about both time and frequency band, making the transform suitable for denoising the ECG. Saritha et al. [16] highlighted that wavelets can be broadly classified into two major types: the continuous wavelet transform (CWT) and the discrete wavelet transform (DWT).
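As an illustration of DWT-based denoising, here is a short sketch using the PyWavelets library rather than the authors' MATLAB code; the wavelet choice, decomposition level, and universal-threshold rule are common defaults, not the study's settings.

```python
import numpy as np
import pywt

def dwt_denoise(ecg, wavelet="db4", level=4):
    """Denoise an ECG trace by soft-thresholding DWT detail coefficients."""
    coeffs = pywt.wavedec(ecg, wavelet, level=level)
    # noise level estimated from the finest detail coefficients (MAD rule)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thresh = sigma * np.sqrt(2 * np.log(len(ecg)))  # universal threshold
    coeffs[1:] = [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(ecg)]
```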

5.2 Machine Learning and Predictive Analysis Machine learning is a branch of artificial intelligence in which predictive models are generated from a training dataset. The training dataset consists of known outcomes, and the models are generated from the known properties of the training dataset. When applied to a test set of ECG signals (unscreened signals), the generated models give the probable normality or abnormality of each ECG signal. In our study, we have employed a machine learning approach to generate binary classifiers which can classify a set of ECG signals as normal or abnormal. The following techniques have been used for this purpose. Weka uses a 2 × 2 confusion matrix which consists of four parts: true positives (TP) for normal signals correctly classified as normal, false positives (FP) for abnormal signals incorrectly predicted as normal, true negatives (TN) for abnormal signals classified as abnormal, and false negatives (FN) for normal signals incorrectly classified as abnormal. Since false negatives are more important, misclassification costs are set on false negatives to minimize their number. However, increasing the cost for false negatives simultaneously increases the false positive rate. We have placed a limit of 20% to control the rate of false positives: the misclassification cost for false negatives is incremented until the rate of false positives reaches 20%. The misclassification


cost setting in Weka depends on the base classifier used. We use the following four classifiers.

Naive Bayes classifier. Based on Bayes' theorem, it assumes that the presence of one descriptor has no effect on another, i.e., all descriptors are independent. The overall probability of a signal's class is taken as the product of all the descriptor-based probabilities.

Random forest classifier. Developed by Leo Breiman, it is an ensemble classifier which uses multiple decision trees; the output is the mode of the individual trees' outputs. It is the most accurate of the classifiers considered here.

J48 (implementation of the C4.5 decision tree learner). Developed by J. Ross Quinlan, it uses a decision tree in which one attribute of the data is taken and the data are split into subsets, one for every value of the attribute. The decision is made by the attribute carrying the maximum information.

Decision tables. A decision table is a visual representation for selecting or specifying the tasks to be performed based on conditions; it represents conditional logic by creating a list of tasks.

The proposed method in Fig. 1 starts with dataset collection, for which the MIT-BIH dataset has been used. This is followed by denoising of the data as described above and then by detection of peaks and features. Finally, machine learning classifiers are applied to the extracted features to detect the normality or abnormality of the ECG signal and, in the case of an abnormal ECG, to detect the disease. A sketch of this classification stage is given below. Fig. 1 Method flow diagram
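The study itself uses Weka; purely as an illustration, a roughly equivalent pipeline in Python with scikit-learn might look as follows. The feature matrix `X` and labels `y` are assumed outputs of the peak/interval extraction step (the file names are placeholders), class weights stand in loosely for Weka's misclassification costs, and decision tables have no direct scikit-learn analogue.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# X: one row of extracted features (R-R, Q-T, P-R intervals, peak
# amplitudes, ...) per record; y: 1 = normal, 0 = abnormal.
X = np.load("ecg_features.npy")   # placeholder file names
y = np.load("ecg_labels.npy")

# Penalizing mistakes on normal beats more, as a loose analogue of the
# misclassification-cost tuning done in Weka (weights are illustrative).
weights = {0: 1.0, 1: 3.0}

classifiers = {
    "Naive Bayes": GaussianNB(),
    "Random forest": RandomForestClassifier(n_estimators=200,
                                            class_weight=weights),
    "Decision tree (C4.5-like)": DecisionTreeClassifier(
        criterion="entropy", class_weight=weights),
}

for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```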


6 Results and Discussion 6.1 Step-Wise Results for a Normal ECG Signal Figure 2 depicts the original ECG signal given as input, which has been used for denoising and for detecting whether it is normal or abnormal. For this purpose, we performed denoising of the signal using the discrete wavelet transform technique; the denoised signal is shown in Fig. 3. Figure 4 shows the removal of baseline wander from the ECG signal obtained in Fig. 3, which includes removal of low-frequency artifacts like breathing, electrically charged electrodes, body movements, etc., by designing specific filters. Fig. 2 Zoomed original signal

Fig. 3 Denoised signal


After obtaining a completely denoised ECG signal, the peaks are detected to extract the features, and by applying machine learning classifiers to these features, the algorithm decides on the normality or abnormality of the ECG signal. Figure 5 depicts the normality of the signal obtained by denoising the original ECG signal. A sketch of R-peak detection is given after the figure captions below.

Fig. 4 ECG after baseline wandering removal with new baseline at 0 mV

Fig. 5 Detected peaks and final result showing that the ECG signal is normal and the denoising percentage
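A minimal R-peak detection sketch with SciPy follows; it is illustrative only, and the height and spacing thresholds are assumptions rather than the study's tuned values.

```python
from scipy.signal import find_peaks

def detect_r_peaks(ecg, fs=360):
    """Locate R peaks in a denoised ECG trace.

    Assumes R peaks dominate the signal after denoising; the minimum
    height and the 0.3 s refractory spacing are illustrative choices.
    """
    min_height = 0.6 * max(ecg)
    peaks, _ = find_peaks(ecg, height=min_height, distance=int(0.3 * fs))
    rr_intervals = [(b - a) / fs for a, b in zip(peaks, peaks[1:])]
    return peaks, rr_intervals
```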


Table 1 Conditions for detecting various heart diseases

Diseases | Conditions (s = seconds, mV = millivolts)
Ventricular tachycardia | R-R interval < 0.6 s and QRS interval > 0
Long Q-T syndrome | Q-T interval > 0.57 s
Sinus bradycardia | R-R interval > 1 s or P-P interval > 1 s
Hyperkalemia | Q-T interval < 0.35 s and tall T (Tamp > 0.4 mV)
Hypokalemia | Q-T interval > 0.43 s and flat T (Tamp < 0.05 mV)
Hypercalcemia | Q-T interval < 0.35 s
Hypocalcemia | Q-T interval > 0.43 s
First degree Atrio-ventricular block | P-R interval > 0.20 s
Right atrial enlargement (RAE) | Pamp > 0.25 mV
Myocardial ischemia | Tamp > 0.5 mV
Atrial flutter | P-P or R-R interval < 0.6 s and QRS interval < 0.12 s and regular tachycardia and visible P and atrial rate > ventricular rate

6.2 Results for Abnormal Signals After listing the diseases and the corresponding conditions sufficient to detect each disease, an algorithm was written for proper detection using basic "if-else" statements. Table 1 lists the diseases along with the conditions sufficient to detect them. Following the information in the table, code was written to detect the listed diseases: if the conditions match in the input signal, the corresponding disease is reported. The steps remain the same for abnormal signals; the only difference is that the detection of the disease should be done accurately, and the output should display the name of the disease(s) after the program is run on the input signal. A rule sketch along these lines is shown below. One of the input signals gives the following result, showing possible diseases. In Fig. 6, as the height of the P-wave is more than 0.25 mV for some P peaks, right atrial enlargement is diagnosed; and as the height of the T-wave is more than 0.5 mV in most cases, myocardial ischemia is diagnosed, based on the conditions described in Table 1. Either of the two diseases can be confirmed by further medical tests.
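The following sketch shows such if-else rules for a few rows of Table 1; the feature dictionary and its key names are hypothetical, and a full implementation would cover every row of the table.

```python
def diagnose(f):
    """Apply a few of the Table 1 rules to a dict of extracted features.

    `f` is assumed to hold interval durations in seconds and peak
    amplitudes in mV, e.g. f["rr"], f["qt"], f["pr"], f["p_amp"],
    f["t_amp"]; these key names are illustrative placeholders.
    """
    findings = []
    if f["qt"] > 0.57:
        findings.append("Long Q-T syndrome")
    if f["rr"] > 1.0:
        findings.append("Sinus bradycardia")
    if f["qt"] < 0.35 and f["t_amp"] > 0.4:
        findings.append("Hyperkalemia")
    if f["pr"] > 0.20:
        findings.append("First degree atrio-ventricular block")
    if f["p_amp"] > 0.25:
        findings.append("Right atrial enlargement (RAE)")
    if f["t_amp"] > 0.5:
        findings.append("Myocardial ischemia")
    return findings or ["Normal by these rules"]

print(diagnose({"rr": 0.8, "qt": 0.38, "pr": 0.16, "p_amp": 0.3, "t_amp": 0.6}))
```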

7 Conclusion This study aimed to process ECG signals and analyze them so as to detect whether a signal is normal or abnormal. Exhaustive research and study were done before the work began. After reviewing a large number of research


Fig. 6 Detection of right atrial enlargement (RAE) and myocardial ischemia

papers related to biomedical engineering and ECG signal processing, we carried out the work and now have positive results. We have developed code which first generates the ECG signal in MATLAB, then denoises the signal, removes the baseline wander, and detects the peaks; finally, machine learning algorithms are applied to the extracted dataset for computational prediction of normal and abnormal ECG signals and detection of the disease if abnormal. We have succeeded in completely denoising the original ECG signal using the wavelet transform method. The peaks and all the diseases for which we wrote algorithms are successfully and accurately detected. This research work can potentially be utilized for real-time detection of diseases if implemented in hardware and can be very useful for doctors and organizations in the medical field.

References 1. Karthikeyan P, Murugappan M, Yaacob S (2011) Review on stress inducement stimuli for assessing human stress using physiological signals. In: Taib MN, Adnan R, Samad AM, Tahir NM, Hussain Z, Rahiman MHF (eds) 2011 IEEE 7th international colloquium on signal


processing and its applications. Penang, Malaysia, pp 420–425 2. Hamilton P (2002) Open source ECG analysis software documentation. E.P Limited, Somerville, USA, pp 101–104 3. Liu X, Zheng Y, Phyu MW, Endru FN, Navaneethan V, Zhao B (2012) An ultra-low power ECG acquisition and monitoring ASIC system for WBAN applications. IEEE J Emerg Sel Top Circuits Syst 2(1):60–70 4. Sornmo L, Laguna P (2006) Electrocardiogram (ECG) signal processing. Wiley Encyclopedia Biomed Eng 1–16 5. Jeyarani AD, Singh TJ (2010) Analysis of noise reduction techniques on QRS ECG waveform by applying different filters. Recent advances in space technology services and climate change 2010 (RSTS & CC-2010), pp 149–152 6. de Chazal P, Reilly RB (2006) A patient-adapting heartbeat classifier using ECG morphology and heartbeat interval features. IEEE Trans Biomed Eng 53:2535–2543 7. Übeyli ED (2007) ECG beats classification using multiclass support vector machines with error correcting output codes. Digit Signal Process 17:675–684 8. Pourbabaee B, Lucas C (2008) Automatic detection and prediction of paroxysmal atrial fibrillation based on analyzing ECG signal feature classification methods. In: 2008 Cairo international biomedical engineering conference, pp 1–8 9. Melgani F, Bazi Y (2008) Classification of electrocardiogram signals with support vector machine and particle swarm optimization. IEEE Trans Inf Technol Biomed 12(5):667–677 10. Llamedo M, Martinez JP (2011) Heartbeat classification using feature selection driven by database generalization criteria. IEEE Trans Biomed Eng 58:616–625 11. Rahman MZUR, Shaik RA, Reddy DVR (2012) Efficient and simplified adaptive noise cancellers for ECG sensor based remote health monitoring. IEEE Sens J 12(3):566–573 12. Swathi ON, Ganesan M, Lavanya R (2017) R peak detection and feature extraction for the diagnosis of heart diseases. In: 2017 International conference on advances in computing, communications and informatics (ICACCI). Manipal, India, pp 2388–2391 13. Lyon A, Minchole A, Martinez JP, Laguna P, Rodriguez B (2018) Computational techniques for ECG analysis and interpretation in light of their contribution to medical advances. J R Soc Interface 15(138):20170821 14. Varatharajan R, Manogaran G, Priyan MK (2018) A big data classification approach using LDA with an enhanced SVM method for ECG signals in cloud computing. Multimedia Tools Appl 77(8):10195–10215 15. Karthikeyan P, Murugappan M, Yaacob S (2012) ECG signal denoising using wavelet thresholding techniques in human stress assessment. Inter J Electr Eng Inf 4(2):306–319 16. Saritha C, Sukanya V, Murthy YN (2008) ECG signal analysis using wavelet transforms. Bulg J Phys 35:68–77

Chapter 37

Design of High Voltage Gain DC-DC Converter with Fuzzy Logic Controller for Solar PV System Under Dynamic Irradiation Conditions

CH Hussaian Basha, G. Devadasu, Nikita Patil, Abhishek Kumbhar, M. Narule, and B. Srinivasa Varma

1 Introduction

At present, the use of nonrenewable energy sources for power generation is decreasing drastically because of their limited availability on earth. From the literature, the conventional energy sources are classified as oil, coal, fuel wood, thermal, and nuclear [1]. The disadvantages of conventional energy sources are compensated by applying renewable power resources; the most popular nonconventional resources are geothermal, hydropower, tidal, marine energy, solar, and wind [2–5]. In this article, solar power is used to supply electricity to the grid. Solar is the most attractive and popular source because of its unlimited availability in the environment; its features are good flexibility, zero noise generation, and high abundance. A solar PV cell works like a basic P–N junction diode: photons strike the P- and N-type semiconductors, so free electrons move from one region to another [6]. The voltage generated by a single cell is 0.75–0.8 V, which is not directly useful to customers. To obtain a higher voltage rating, cells are connected in series and parallel to form a module, many modules form a panel, and the interconnection of panels forms an array. When the supply voltage

CH Hussaian Basha (B) Department of EEE, Nitte Meenakshi Institute of Technology, Bengaluru, Karnataka, India e-mail: [email protected] G. Devadasu CMR College of Engineering and Technology (Autonomous), Hyderabad, India N. Patil · A. Kumbhar · M. Narule · B. S. Varma Department of EE, Nanasaheb Mahadik College of Engineering, Peth, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Yadav et al. (eds.), Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-8742-7_37


Fig. 1 Structure of PV with DC-DC converter for grid-connected solar system

required by the customer is high, the modules are connected in series; otherwise, they are connected in parallel [7]. From the literature, PV panels are classified as polycrystalline silicon, carbon-nanotube, graphene, thin-film, monocrystalline, and cesium-based. Among these, the monocrystalline manufacturing technique gives the highest efficiency [8], so most researchers work on mono-type PV cell technologies. Many research topics are involved in solar PV power systems, such as maximum power point tracking (MPPT) design, the type of PV cell circuit used, and the interfacing of converters, inverters, and load. Here, the major focus is the MPPT methodology. Solar power systems exhibit nonlinear voltage and current characteristics under varying irradiation conditions, so finding the operating power point of the PV array is one of the major tasks in grid-connected PV systems [9]. In this article, a flexible MPPT controller is designed to transfer the maximum power from source to load under dynamic irradiation conditions; the schematic structure of the proposed PV system is given in Fig. 1. Another disadvantage of PV is the high per-unit power installation and generation cost, which is limited by using a boost converter. At present, two types of converter topologies are recommended to step up the voltage: isolated (with a transformer) and non-isolated (without a transformer) [10]. Isolated converters require an additional rectifier in the circuit, so the converter cost is high and more space is needed in the PV system. To overcome this disadvantage, in this article an inductor-coupled, non-isolated, single-switch, high-voltage-gain DC-DC converter is applied to improve the voltage conversion ratio of the PV system. The series/parallel scaling of cells into an array is illustrated by the short sketch below.
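A minimal numeric illustration of the series/parallel composition just described; all cell values and counts below are example assumptions, not the panel used in this paper.

```python
# Illustrative cell-to-array scaling: series connections add voltages,
# parallel strings add currents. All values are example assumptions.
v_cell, i_cell = 0.78, 8.0          # single-cell voltage (V) and current (A)
cells_in_series = 60                # cells per module
modules_in_series = 10              # modules per string
parallel_strings = 2

v_array = v_cell * cells_in_series * modules_in_series   # ~468 V
i_array = i_cell * parallel_strings                      # 16 A
print(f"array: {v_array:.0f} V, {i_array:.0f} A, {v_array*i_array/1e3:.2f} kW")
```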

2 Design of Solar PV Array

As discussed above, PV cells are manufactured from different silicon materials. In this article, the PV model is designed using


Fig. 2 Mathematical modeling of PV

monocrystalline silicon. From the literature, plenty of PV cell circuit topologies are available. The basic PV cell models are the ideal diode, the single-diode PV cell, the double-diode PV cell, and the triple-diode PV cell [11]. In this work, a single-diode PV cell is applied to implement the PV array; its attractive features are easy design and implementation. The variables considered for the design of the single-diode PV cell are the series resistance (R_s), the diode diffusion current (I_0), the parallel resistance (R_p), the cell output current (I_PV), and the air mass (a). From Fig. 2, the PV array output current is obtained as

$$I_{PV} = I_{ce} - I_0\left(e^{\frac{V_{pv} + I_{pv} R_s}{n_s V_t}} - 1\right) - \frac{V_{pv} + I_{pv} R_s}{R_p} \qquad (1)$$

$$V_t = \frac{A\, k\, T_{STC}}{q} \qquad (2)$$

In Eqs. (1) and (2), I_PV and V_PV are the PV array output current and output voltage, respectively; n_s and V_t are the number of series-connected cells per string and the PV thermal voltage; and T_STC and A are the junction operating temperature at standard test conditions and the quality factor of the P–N diode. From Eq. (1), the PV system parameters are derived as follows:

$$I_{sc} = I_{pv} - I_0\left(e^{\frac{I_{sc} R_s}{n_s V_t}} - 1\right) - \frac{I_{sc} R_s}{R_p} \qquad (3)$$

$$0 = I_{pv} - I_0\left(e^{\frac{V_{oc}}{n_s V_t}} - 1\right) - \frac{V_{oc}}{R_p} \qquad (4)$$

$$I_{MPPT} = I_{pv} - I_0\left(e^{\frac{V_{MPPT} + I_{MPPT} R_s}{n_s V_t}} - 1\right) - \frac{V_{MPPT} + I_{MPPT} R_s}{R_p} \qquad (5)$$
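Equation (1) is implicit in the current, so it is usually solved numerically. The sketch below does this by bisection, which is safe because the residual is strictly decreasing in the current. R_s, R_p, and n_s follow Table 1, while the photocurrent, saturation current, quality factor, and the 15 V test voltage are assumed module-scale values used only for illustration.

```python
# Bisection solve of the implicit single-diode equation (1); R_s, R_p, n_s
# follow Table 1, while I_ph (the photocurrent, i.e. I_ce/I_pv of Eqs. (1)
# and (3)-(5)), I_0, A, and the test voltage are assumed module-scale values.
import numpy as np

k, q = 1.380649e-23, 1.602176634e-19
A, T_stc, n_s = 1.3, 298.15, 24
R_s, R_p = 0.27328, 86.803
I_ph, I_0 = 13.0, 1e-9                  # assumed photo / saturation currents
V_t = A * k * T_stc / q                 # thermal voltage, Eq. (2)

def i_pv(v, iters=60):
    # residual of Eq. (1); strictly decreasing in i, so bisection converges
    f = lambda i: I_ph - I_0 * (np.exp((v + i * R_s) / (n_s * V_t)) - 1) \
                  - (v + i * R_s) / R_p - i
    lo, hi = 0.0, I_ph
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

print(f"I at 15 V: {i_pv(15.0):.2f} A")   # toy operating point below V_oc
```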

From the nonlinear I–V and P–V characteristics, at the peak power point of the solar cell the derivative of the power with respect to the voltage vanishes, as given in Eq. (6). Similarly, at short circuit the derivative of the current with respect to the voltage equals the negative reciprocal of the shunt resistance, as given in Eq. (7). The detailed design constraints of the solar array are given in Table 1.

448

CH Hussaian Basha et al.

Table 1 Solar array design parameters

Parameter | Symbol | Value
Maximum power | P_MPPT | 5.188 kW
Open-circuit voltage | V_oc | 497.38 V
Voltage at maximum power point | V_MPPT | 402.0 V
Number of parallel strings | N_p | 2
Series cells per string | n_s | 24
Current at maximum power point | I_MPPT | 12.89 A
Parallel resistance | R_p | 86.803 Ω
Series resistance | R_s | 0.27328 Ω

$$\left.\frac{dP_{pv}}{dV}\right|_{V = V_{MPPT}} = 0 \qquad (6)$$

$$\left.\frac{dI}{dV}\right|_{I = I_{sc}} = -\frac{1}{R_p} \qquad (7)$$
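As a numeric counterpart to Eq. (6), the following continues the previous sketch: it sweeps the voltage, builds the P–V curve, and takes its maximum, which is where dP/dV changes sign. The sweep range is again the assumed module scale.

```python
# Locate the MPP numerically (Eq. (6)): sweep V, form P = V*I with i_pv()
# from the previous sketch, and take the argmax of the power curve.
v = np.linspace(0.1, 18.0, 400)
i = np.array([i_pv(vk) for vk in v])
p = v * i
m = int(np.argmax(p))                    # dP/dV = 0 in the interior
print(f"V_mpp={v[m]:.2f} V, I_mpp={i[m]:.2f} A, P_mpp={p[m]:.1f} W")
```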

3 Analysis of Single Switched Power Converter

In most articles, the PV array power is fed directly to the inverter for converting direct current to alternating current; the desirable characteristics of this arrangement are small size, fair transmission power losses, high efficiency, and low cost, but it gives a low utilization factor for the solar PV [12]. Here, the coupled-inductor concept is used in the proposed converter to improve the voltage conversion ratio of the supply system, and interfacing it between the supply and the load is a challenging task. The advantage of the coupled-inductor technique is that it helps obtain a fast and accurate dynamic response. The proposed structure of the converter is given in Fig. 3.

Fig. 3 Single switch inductor coupled converter

Table 2 DC-DC coupled converter and DC-AC values

Parameter | Description | Value
r_in | Internal input resistance of converter | 32.2 mΩ
c_s2 | Secondary coupled capacitor | 11.0 mF
c_in | Primary coupled capacitor | 11.00 mF
l_0t | Output inductor | 14.13 mH
c_s1 | Secondary capacitor | 9.98 mF
l_i | Input inductor | 15.60 mH
r_0t | Output resistor | 72.01 mΩ
r_total | Total equivalent resistor | 0.278 Ω
l_x | Magnetizing inductor | 12.88 mH
c_x | Source side capacitor | 0.62 mF

From Fig. 3, it can be seen that the solar system improves the boost converter output power without adjusting the duty cycle of the converter. Here, the windings are placed on the two outer limbs of the core to obtain the high voltage gain of the converter, and a small gap is provided between the two limbs of the inductor to minimize leakage. The design parameters of the converter are given in Table 2. The inductor core is designed with a small magnetic flux swing so that the converter operates at high efficiency. In Fig. 3, the terms R_in and R_0t are the internal resistances of the primary and secondary windings. The switch is an insulated gate bipolar transistor (IGBT). The benefits of the power converter are high flexibility, moderate electromagnetic interference, improved efficiency, and low switching and conduction power losses. The output voltage of the converter and the corresponding primary and secondary inductances are derived in terms of the transformer turns ratio:

$$\frac{V_0}{V_{PV}} = \frac{1 + N_{ratio} D}{D} \qquad (8)$$

$$L_i = \frac{L_{total}}{\left(1 + N_{ratio}\right)^2} \qquad (9)$$

$$L_{0t} = \frac{N_{ratio}^2}{\left(1 + N_{ratio}\right)^2}\, L_{total} = N_{ratio}^2\, L_i \qquad (10)$$

$$R_{in} = \frac{R_{total}}{1 + N_{ratio}} \qquad (11)$$

$$R_{0t} = \frac{N_{ratio}\, R_{total}}{1 + N_{ratio}} \qquad (12)$$

In Eqs. (8)–(10), the terms L_i, L_0t, and L_total are the primary-winding, secondary-winding, and total winding inductances, and R_in, R_0t, and R_total are their corresponding internal resistances. The winding turns ratio is denoted N_ratio. The design values of the inverter and slider are given in Table 3; a worked numerical use of Eqs. (8)–(12) follows below.

Table 3 Design values of sliding technique

Parameter | Description | Value
C(s) | Integral output | 1.01
K_i | Integral gain | 5.4
A | Error signals | 0.20
ω_h | High-pass frequency | 10 rad/s
ω | Reference frequency | 100 rad/s
S | Slider surface | 1.5
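To make Eqs. (8)–(12) concrete, the short calculation below evaluates the gain and the winding splits. The turns ratio and duty cycle are assumptions chosen so the gain roughly matches the 410 V to 1200 V step-up reported in Sect. 5; R_total matches Table 2, while L_total is an assumed value.

```python
# Worked use of Eqs. (8)-(12); N_ratio, D, and L_total are assumed values,
# R_total is taken from Table 2.
N, D = 1.25, 0.6
V_pv = 410.0                          # PV-side voltage (V), from Sect. 5
L_total, R_total = 30e-3, 0.278       # H (assumed), Ohm (Table 2)

V_0  = V_pv * (1 + N * D) / D         # Eq. (8)  -> ~1196 V
L_i  = L_total / (1 + N) ** 2         # Eq. (9)
L_0t = N ** 2 * L_i                   # Eq. (10)
R_in = R_total / (1 + N)              # Eq. (11)
R_0t = N * R_total / (1 + N)          # Eq. (12)
print(f"V_0={V_0:.0f} V, L_i={L_i*1e3:.2f} mH, L_0t={L_0t*1e3:.2f} mH, "
      f"R_in={R_in*1e3:.1f} mOhm, R_0t={R_0t*1e3:.1f} mOhm")
```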

3.1 Sliding Controller for Single Switch Converter

In this work, a low-order filter is used to obtain a good time-varying response of the system. The advantages of the chosen low-order filter are a wide input and output operating range, a high voltage conversion ratio of the converter, a highly stable output voltage, and effective handling of nonlinear behavior. This low-pass filter removes the harmonics of the converter output DC-link capacitor. Based on the proper selection of state variables, the magnetizing current of the converter's coupled winding is determined as given in Eq. (13):

$$I_{mag} = \begin{cases} I_i, & t \in T_{on} \\ I_i \left(1 + N\right), & t \in T_{off} \end{cases} \qquad (13)$$

In Eq. (13), the on-time of the IGBT is T_on = DT, and the blocking period of the switch is T_off = (1 − D)T; the total switching period is T = T_on + T_off. A short numeric sketch of this bookkeeping follows.
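A few lines make the bookkeeping of Eq. (13) explicit; the duty cycle, switching period, input current, and turns ratio are example values only.

```python
# Switching intervals and the piecewise magnetizing current of Eq. (13);
# D, T, I_i, and N are example values only.
D, T, I_i, N = 0.6, 1e-4, 10.0, 1.25
T_on, T_off = D * T, (1 - D) * T      # conduction / blocking intervals

def i_mag(t):
    # I_i while the switch conducts, I_i*(1+N) while it blocks
    return I_i if (t % T) < T_on else I_i * (1 + N)
```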

3.2 Design and Analysis of Adaptive MPPT Controller

As discussed previously, conventional power point tracking controllers are not suitable for the widely varying insolation of PV arrays. Here, an MPPT controller is proposed for obtaining the maximum power of the PV module; under diverse time-varying irradiation conditions, the adaptive controller works based on the rotation of the PV arrays. The demodulator angle (φ_d) and the integrator gain (K_i) are selected as 1 and 5.4, respectively. A lead compensator, denoted A(s), is used for the correction of error signals.
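The adaptive controller itself is only outlined here, so the sketch below uses a generic perturb-and-observe duty-cycle update as a stand-in for a tracking loop of this kind; the step size and duty limits are assumptions, and this is not the paper's controller.

```python
# Generic perturb-and-observe MPPT step (a stand-in illustration, not the
# paper's adaptive controller); step size and duty limits are assumed.
def mppt_step(v, i, state, delta=0.005):
    p = v * i
    if p < state["p_prev"]:           # power fell: reverse the perturbation
        state["dir"] *= -1
    state["duty"] = min(0.95, max(0.05, state["duty"] + state["dir"] * delta))
    state["p_prev"] = p
    return state["duty"]

state = {"duty": 0.5, "dir": 1, "p_prev": 0.0}
# duty = mppt_step(v_meas, i_meas, state)   # called once per control period
```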

37 Design of High Voltage Gain DC-DC Converter with Fuzzy Logic …

451

4 Design and Analysis of Two-Leg Inverter

The design of the proposed PV-fed grid-connected system is given in Fig. 4 [13]. In this work, a two-leg power inverter circuit is used for the regular supply of PV power to the load. As indicated in Fig. 4, phase 'x' supplies the line voltage V_xy and phase 'y' supplies V_yx; the root-mean-square value of the inverter voltage is calculated from the sum of V_xy and V_yx. The inverter produces the phase voltages V_px and V_py with corresponding phase currents I_px and I_py. The filter inductor (L_x) is connected in series with the line, and the capacitor (C_x) is connected in parallel with the grid. The mathematical model of the inverter is

$$\frac{dI_j}{dt} = \frac{2}{3 L_t}\left(V_j - 2 R_t I_j - V_{pj} - I_s R_t\right) - I_U \qquad (14)$$

$$I_U = \frac{1}{3 L_t}\left(V_s - 2 R_t I_s - V_{ps} - I_j R_t\right) \qquad (15)$$

$$\frac{dI_{inve,j}}{dt} = \frac{1}{C_x}\left(I_j + I_{inve,j} - I_{pj}\right) \qquad (16)$$

$$\frac{dV_{cs1}}{dt} = \frac{-1}{C_{s1}}\left(u_a I_{inve,x} + u_b I_{inve,y}\right) + \frac{I_{mag}\left(1 - u\right)}{C_{s1}\left(1 + N_{ratio}\right)} \qquad (17)$$

$$\frac{dV_{cs2}}{dt} = \frac{1}{C_{s2}}\left(\left(1 - u_a\right) I_{inve,x} + \left(1 - u_b\right) I_{inve,y}\right) + \frac{I_{mag}\left(1 - u\right)}{C_{s2}\left(1 + N_{ratio}\right)} \qquad (18)$$
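As one way to see Eqs. (17) and (18) in action, the sketch below advances the two DC-link capacitor voltages by a forward-Euler step; the switching functions u, u_a, u_b and the inverter currents are assumed external inputs, the turns ratio is an assumed value, and the capacitors follow Table 2.

```python
# Forward-Euler update of the DC-link voltages per Eqs. (17)-(18); u, u_a,
# u_b, and the currents are assumed inputs, capacitors follow Table 2.
C_s1, C_s2, N_ratio = 9.98e-3, 11.0e-3, 1.25
dt = 1e-5                                       # integration step (s)

def dc_link_step(v1, v2, u, u_a, u_b, i_x, i_y, i_mag):
    dv1 = -(u_a * i_x + u_b * i_y) / C_s1 \
          + i_mag * (1 - u) / (C_s1 * (1 + N_ratio))      # Eq. (17)
    dv2 = ((1 - u_a) * i_x + (1 - u_b) * i_y) / C_s2 \
          + i_mag * (1 - u) / (C_s2 * (1 + N_ratio))      # Eq. (18)
    return v1 + dt * dv1, v2 + dt * dv2
```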

Fig. 4 Working analysis of proposed inverter circuit

452

CH Hussaian Basha et al.

Table 4 States of operation of the three-phase inverter

Node x | Node y | T1 | T2 | T3 | T4 | V_inve_x | V_inve_y | V_inve_z
0 | 0 | 1 | 0 | 1 | 0 | −0.33 V_0 | −0.33 V_0 | 0.66 V_0
0 | 1 | 1 | 1 | 0 | 0 | −V_0 | V_0 | 0
1 | 0 | 0 | 0 | 1 | 1 | V_0 | −V_0 | 0
1 | 1 | 0 | 1 | 0 | 1 | 0.33 V_0 | 0.33 V_0 | −0.66 V_0
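Table 4 maps directly onto a small lookup, sketched below, from the node states (x, y) to the gating pattern and the per-phase outputs in multiples of V_0; the entries are transcribed from the table.

```python
# Table 4 as a lookup: (x, y) node states -> (T1..T4 gating, outputs in V0).
STATES = {
    (0, 0): ((1, 0, 1, 0), (-0.33, -0.33, 0.66)),
    (0, 1): ((1, 1, 0, 0), (-1.0, 1.0, 0.0)),
    (1, 0): ((0, 0, 1, 1), (1.0, -1.0, 0.0)),
    (1, 1): ((0, 1, 0, 1), (0.33, 0.33, -0.66)),
}

def inverter_outputs(x, y, v0):
    gates, scale = STATES[(x, y)]
    return gates, tuple(s * v0 for s in scale)

# e.g. inverter_outputs(0, 1, 600.0) -> ((1, 1, 0, 0), (-600.0, 600.0, 0.0))
```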

where the subscripts j, K ∈ {x, y}. From Eqs. (17) and (18), the DC-link capacitor voltage depends purely on the converter transformation ratio and the sliding surface of the converter, and the inverter output voltages vary with the magnetizing current of the coupled inductor. The detailed operation of the inverter is given in Table 4. The open-delta phase voltages and currents are expressed as v_px, v_py, i_px, and i_py. In principle, a sliding controller could be used for this three-phase inverter, but it has the drawback of requiring many sensors to sense all the error signals for every switching-pulse generation, and it is not effective for operating the inverter: a sliding technique is complex to implement, requires high-cost sensors, and is less accurate in controlling the grid voltage and current. To avoid these disadvantages, a fuzzy logic-based PWM generator is used in this work to generate the switching pulses for the DC-AC converter. The working diagram of the fuzzy controller is given in Fig. 5. Three operations are involved in this controller: fuzzification, inference, and defuzzification. The DC-DC circuit voltages and load voltages are fed to the fuzzification stage, which converts the real values into linguistic functions. The middle stage, inference, evaluates all converter- and inverter-related supply and output parameters. The last operation, defuzzification, converts the linguistic functions back into crisp values for the inverter. The main objective of the fuzzy logic PWM is to supply the maximum solar power to the grid at constant frequency. The phase-voltage displacement angle is obtained from Eq. (19), and the root-mean-square value of the inverter reference voltage is calculated from Eq. (20):

$$\alpha = K_P\left(V_0^{ref} - V_0\right) + K_i \int \left(V_0^{ref} - V_0\right) dt \qquad (19)$$

$$V_{ph}^{ref} = \frac{V \sin(\varsigma)}{\sin(\varsigma - \alpha)} \qquad (20)$$
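A direct transcription of Eqs. (19) and (20) into a discrete control step is sketched below; the gains, time step, and grid angle ς are assumed values, and the integral is accumulated rectangle-wise.

```python
# Discrete form of Eqs. (19)-(20); K_p, K_i, dt, and the grid angle are
# assumed values; alpha is assumed small so sin(sigma - alpha) stays nonzero.
import math

K_p, K_i, dt = 0.5, 5.4, 1e-4
integral = 0.0

def pwm_reference(v0_ref, v0, v_grid, sigma):
    global integral
    err = v0_ref - v0
    integral += err * dt
    alpha = K_p * err + K_i * integral                             # Eq. (19)
    v_ph_ref = v_grid * math.sin(sigma) / math.sin(sigma - alpha)  # Eq. (20)
    return alpha, v_ph_ref
```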


Fig. 5 Working flow of fuzzy pulse generation
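The fuzzification / inference / defuzzification pipeline of Fig. 5 can be illustrated with a deliberately tiny controller: three triangular memberships on the voltage error, singleton rule outputs, and a weighted-average defuzzifier. The membership ranges and rule values are illustrative assumptions, not the rule base used in the paper.

```python
# Minimal fuzzification -> inference -> defuzzification pass; membership
# shapes and the rule table are illustrative assumptions only.
def tri(x, a, b, c):
    # triangular membership rising on [a, b] and falling on [b, c]
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def fuzzy_pwm_adjust(err):
    # fuzzification: degrees for Negative / Zero / Positive error
    mu = {"N": tri(err, -2, -1, 0), "Z": tri(err, -1, 0, 1), "P": tri(err, 0, 1, 2)}
    out = {"N": -0.05, "Z": 0.0, "P": 0.05}   # inference: singleton outputs
    total = sum(mu.values()) or 1.0           # defuzzification: weighted mean
    return sum(mu[k] * out[k] for k in mu) / total

# e.g. fuzzy_pwm_adjust(0.4) nudges the modulation index upward
```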

5 Discussion of Simulation Results

As discussed above, the solar PV system's output voltage varies continuously because of sudden changes in atmospheric conditions, as shown in Fig. 6. In this manuscript, a single-diode PV array is implemented, as shown in Fig. 2. The nonlinear curves of the solar system indicate that an MPP exists on the power curve for each irradiation condition. At 1000 W/m², the peak voltage of the proposed PV system is 415 V, and the corresponding current and power are 23.33 A and 9.681 kW, respectively. Similarly, at the second irradiation level of 750 W/m², the PV array voltage, current, and power are 406 V, 16.82 A, and 6.828 kW, respectively. Finally, at 500 W/m², the solar array peak power, voltage, and current are 5.181 kW, 402 V, and 12.89 A, respectively; a quick arithmetic check of these readings is given below. The overall system is designed in Simulink. The proposed system involves two stages: DC-DC power conversion and DC-AC power transmission to the grid. The solar system's output power waveform at the peak operating point is given in Fig. 7 for the different irradiation conditions. From Fig. 7, the solar array power is maximum at 1000 W/m² and is held constant up to 4 s; the rise time of the power is 0.05 s, and the settling time is 0.8 s. After 4 s, the power reduces from 9.681 kW to 6.828 kW between 4 s and 4.2 s, and finally the PV power reduces from 6.828 kW to 5.181 kW at 500 W/m². The solar power is fed to the inductor-coupled, wide-input, wide-output power converter to increase the effective utilization of the solar PV installation. The converter steps up the PV voltage from 410 V to 1200 V through its winding turns ratio, which also makes it useful for automotive applications. Since the converter topology uses a single switch, the ripple in the converter output voltage is greatly reduced, as shown in Fig. 8. Fig. 8 shows that when the irradiation steps down from 1000 W/m² to 800 W/m², the corresponding voltage is disturbed for up to 0.7 s.
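The quoted operating points can be cross-checked with P = V·I; the loop below reproduces the reported powers to within rounding.

```python
# Consistency check of the reported MPP readings (P = V * I)
for v, i in [(415, 23.33), (406, 16.82), (402, 12.89)]:
    print(f"{v} V x {i} A = {v * i / 1e3:.3f} kW")
# -> 9.682, 6.829, 5.182 kW vs. the quoted 9.681, 6.828, 5.181 kW
```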


Fig. 6 Current versus voltage curves and P–V curves at 1000, 750, and 500 W/m2

Fig. 7 Solar array output power at different irradiation conditions


However, this distortion is very small compared with other conventional controllers, because the slider works effectively through the proper selection of the sliding surface and the adaptive controller gives the optimum duty cycle to the converter. Similarly, the inverter DC-link capacitors generate the maximum AC voltage for commercial load applications. The obtained DC-link capacitor voltage waveforms and their zoomed views are shown in Fig. 9. From Fig. 9, it can be said that the DC capacitors work under balanced conditions based on their output voltages; if the capacitors gave unbalanced voltages, the switches in the inverter would break down, so the capacitor values should be equal for the IGBT switches to operate efficiently. The obtained DC-link capacitor voltages at 1000 W/m² are 600 V and 580 V. The inverter

Fig. 8 Inductor coupled single switch converter output voltage at different irradiation conditions

Fig. 9 Inverter capacitor voltages at time varying insolation condition


output is connected to a filter (L_f–C_f) to eliminate unwanted harmonic components in the grid currents, as shown in Fig. 10. The nominal voltage and operating current of the inverter are 20.5 V and 15.2 A, respectively. The per-unit grid voltages and currents are given in Fig. 11. Here, the voltage and current are in phase, so the phase-angle difference between them is zero and the grid operates at unity power factor.

Fig. 10 3-phase network currents at dynamic insolation condition of solar PV

Fig. 11 Per unit three phase grid currents and voltages at time varying irradiation condition


6 Conclusion

The proposed MPPT controller traces the operating point of the PV system efficiently and effectively, with high convergence speed. The merits of this MPPT technique are low fluctuation, high error-detection accuracy, independence from the PV array installation, and a small number of required sensors. The sliding controller gives the optimum duty cycle to the boost converter for achieving the