Cybernetics, Cognition and Machine Learning Applications: Proceedings of ICCCMLA 2019 (Algorithms for Intelligent Systems) 9811516316, 9789811516313

This book provides a collection of selected papers presented at the International Conference on Cybernetics, Cognition and Machine Learning Applications (ICCCMLA 2019).


English Pages 337 [315] Year 2020


Table of contents:
Preface
Contents
About the Editors
1 A Novel Design and Implementation of Pipeline Inspection System
1 Introduction
2 Related Work
3 Proposed Methodology
3.1 Proposed Design
3.2 Mathematical Model
4 Hardware Design
4.1 Infrared Sensor (IR)
4.2 Motor Driver (L293D) and DC Motors
4.3 Ultrasonic Sensor
4.4 Smoke Sensor (MQ-6)
4.5 Temperature Sensor (DHT)
4.6 Bluetooth (HC-05)
4.7 Architecture of the Bot
5 Software Implementation
5.1 Experimental Results and Discussions
6 Conclusion
References
2 An In-Depth Survey on Massive Open Online Course Outcome Prediction
1 Introduction
1.1 K-Means Classification
1.2 Linear Regression
1.3 Hidden Markov Model
1.4 Fuzzy Classification
2 Literature Review
3 Conclusion
References
3 Prediction of Cardiovascular Diseases Using HUI Miner
1 Introduction
2 Related Work
3 HUI Mining
4 Methodology
4.1 Acquisition of Data Set
4.2 Processing of Data Set
4.3 Patient Alert System Through Email
5 Results
6 Conclusions and Future Scope
References
4 Selection of Connecting Phrases in Weather Forecast
1 Introduction
2 Related Works
3 Experiments
3.1 Dataset
3.2 Setup
4 Results
5 Discussions
5.1 Comparison of Our Result with a Previous Work
6 Conclusion and Future Work
References
5 Analysis of Bayesian Regularization and Levenberg–Marquardt Training Algorithms of the Feedforward Neural Network Model for the Flow Prediction in an Alluvial Himalayan River
1 Introduction
2 Study Area
3 Methodology
4 Results and Discussions
5 Conclusion
References
6 Data Mining on Cloud-Based Big Data
1 Introduction
1.1 The Service Models
2 Background Review
3 Discussion
3.1 HDFS Architecture
3.2 Grid Computing Tools
3.3 Big Data Challenges and Issues
4 Conclusion
References
7 Performance Analysis of gTBS and EgTBS Task Based Sensing in Wireless Sensor Network
1 Introduction
2 Proposed Work
3 System Model
4 Performance Evaluation
5 Conclusion
References
8 A Secured Framework for Cloud Computing
1 Introduction
2 Related Work
3 Different Framework for Mobile Cloud Computing
4 Topic Initiative
5 Proposed Work
6 Conclusion
References
9 A Secure Shoulder Surfing Resistant Hybrid Graphical User Authentication Scheme
1 Introduction
2 Background
3 Proposed Scheme
4 Security Analysis
4.1 Password Space
4.2 Password Entropy
4.3 Secure from Attacks
5 User Study
6 Conclusion
References
10 Origin Identification of a Rumor in Social Network
1 Introduction
2 Related Work
3 Methodology
3.1 Diffusion Model
3.2 Candidate Partition
3.3 Origin Identification of a Rumor
4 Experimental Study and Results
5 Conclusion
References
11 Luminance [Y] Utility to Compact Color Video
1 Introduction
2 Luminance-Based Coding
3 Experimental Result
4 Conclusion
References
12 Apriori Algorithm and Decision Tree Classification Methods to Mine Educational Data for Evaluating Graduate Admissions to US Universities
1 Introduction
2 Review of Related Work
3 Apriori Algorithm
3.1 Apriori Property
3.2 Evaluation Metrics
4 Classification and Regression Trees
4.1 CART Algorithm
4.2 Evaluation Metrics
5 Methodology
5.1 Data Collection and Analysis
5.2 Data Preprocessing
6 Results
6.1 Admissions into Top Ranked Universities
6.2 Admissions into Mid-Ranked Universities
6.3 Admissions into Low-Ranked Universities
7 Discussion
8 Conclusion
References
13 ABC-Based Algorithm for Clustering and Validating WSNs
1 Introduction
2 Artificial Bee Colony
3 ABC-Based Clustering
3.1 Initial Population
3.2 Neighborhood Algorithm
4 Clustering WSN Using ABC
5 Simulation
5.1 Clustering Validation
5.2 Simulation Scenarios
6 Conclusion
References
14 Improving SPV-Based Cryptocurrency Wallet
1 Introduction
1.1 Fundamental Prerequisites of the Cryptocurrency Payment System
1.2 Decentralized Information Sharing Over Internet
1.3 Digital Signature
1.4 Wallet
2 Literature Review
2.1 Current Trends and Acceptance of Wallet Currencies
3 Challenges in Various Available Wallet Types
4 New Approach to Handle the Above Issues
4.1 Assumptions
4.2 What Can We Alter?
4.3 Comparison Chart
5 Conclusion
References
15 Modelling Fade Transition in a Video Using Texture Methods
1 Introduction
2 Shot Transition
3 Methodology
3.1 Texture Methods
3.2 Extraction of Texture Features
3.3 Machine Learning for Video Transition
3.4 Learning Phase
3.5 Verification Phase
3.6 Identification Phase
4 Experimental Study and Results
5 Evaluation
6 Conclusion
Appendix 1
Appendix 2
Appendix 3
References
16 Video Shot Detection and Summarization Using Features Derived From Texture
1 Introduction
2 Related Work
3 Feature Extraction
3.1 Gray Level Co-occurrence Matrix
3.2 Texture Spectrum
4 Video Shot Detection
5 Video Summarization
5.1 Affinity Propagation
6 Experimental Results
6.1 Video Shot Detection
6.2 Video Summarization
6.3 Analysis
7 Conclusion
References
17 Numerical Approximation of Caputo Definition and Simulation of Fractional PID Controller
1 Introduction
2 Fractional Calculus
3 Properties of Fractional Diffintegration
4 Fractional Numerical Method
5 Proof of Lemma 1
6 MATLAB: Algorithm and Simulation
7 MATLAB Simulink Model
8 Fractional PID
9 Conclusion
References
18 Raitha Bandhu
1 Introduction
2 Literature Survey
3 Working
4 Implementation
5 Conclusion
References
19 Queuing Theory and Optimization in Banking Models
1 Introduction
2 Background and Existing Research
3 Field Research
4 Performance of Machines
5 Performance of Servers Shown Graphically
6 Discussion on Obtained Results
7 Conclusion
References
20 A Survey on Network Coverage, Data Redundancy, and Energy Optimization in Wireless Sensor Network
1 Introduction
2 Issues in Wireless Sensor Network
2.1 Coverage Area
2.2 Data Redundancy
3 Approaches for Energy Optimization in WSN
3.1 Ant Colony Optimization
3.2 Deterministic Sensing
3.3 Probabilistic Sensing
3.4 Low-Energy Adaptive Clustering Hierarchy
4 Gap Analysis
4.1 Limited Node Mobility
4.2 Heterogeneous Network with Obstacles
4.3 Optimization of Wake-Up Rate of Sleeping Nodes
4.4 Data Redundancy
4.5 Node Failure Probability
4.6 Sensing Void
5 Conclusion
References
21 An Offline-Based Intelligent Motor Vehicle Driver Behavior Analysis Using Driver’s Eye Movements and an Inexperienced and Unauthorized Driver Access Control Mechanism
1 Introduction
2 Proposed System
2.1 Block Diagrams
3 Future Work
4 Conclusion
References
22 Remote Monitoring and Maintenance of Patients via IoT Healthcare Security and Interoperability Approach
1 Introduction
2 Literature Survey
3 Proposed Method
3.1 Authentication
3.2 Encryption
3.3 Operational Efficiency
3.4 Automatic Data Entry
4 Results
4.1 Comparative Analysis
5 Conclusion
References
23 Survey of Object Detection Algorithms and Techniques
1 Introduction
2 Literature Survey
3 Objectives
4 Methodologies
5 Discussion
6 Performance Analysis and Comparison
7 Conclusion and Future Scope
References
24 Study on Recent Methods of Secure Provable Multi-block Level Data Possession in Distributed Cloud Servers Using Cloud-MapReduce Method
1 Introduction
2 Related Works
3 Research Gaps
4 Conclusion and Future Work
References
25 Insolent Tube Trickle Revealing Classification
1 Introduction
2 Methodology
3 Hardware Components
3.1 Flow Sensor
3.2 GPRS Module
3.3 Microcontroller
4 Construction and Working
4.1 Placement of Flow Sensor
4.2 Flow Rate Monitoring System
4.3 Leak Detection Algorithm
4.4 Integrated Gas Cut-off System
5 Results and Discussion
References
26 Evolving Database for New Generation Big Data Applications
1 Introduction
2 Big Data Application
3 Big Data Challenges
4 Techniques for Big Data Processing
4.1 Mathematical Analytics Techniques
5 Data Analytics Techniques
5.1 Data Mining
6 Research Methodology
7 Conclusion
References
27 Connotation Imperative Mining with Regression for Optimization Methodology
1 Introduction
2 Association Rule Mining
3 Related Works
4 Apriori Algorithm
5 Proposed Algorithm
6 Results
7 Implementation
8 Conclusion
9 Future Scope
Reference
28 Study on Recent Method of Dynamic Time-Based Encryption (DTBE) and Searchable Access Control Scheme for Personal Health Record in Cloud Computing
1 Introduction
2 Related Works
3 Conclusion and Future Work
References
29 Learning of Operational Spending and Its Accompanying Menace and Intimidations
1 Introduction
2 Business System Concepts
3 Elements of a System
4 Control
5 Boundaries and Interface
6 Made Information Systems
7 Systems Models
8 Static System Models
9 Testing Libraries
References
30 Learning on Demonstration of Control Classification Expending HVDC Links
1 Introduction
2 Literature Review
3 Converter Performance Analysis
4 AC/DC Power Flow
5 Conclusions
References
31 Cloud Computing Security in the Aspect of Blockchain
1 Introduction
1.1 The Blockchain Overview
1.2 The Blockchain Ontology
2 Research Methodology
2.1 Risks Involved in Cloud Computing and Security Issues
3 Current Trends and Challenges
3.1 Risks Involved in Cloud Computing
3.2 Threats in Cloud Computing
4 Concluding Remarks and Future Work
References
32 Analysis of Various Routing Protocol for VANET
1 Introduction
1.1 Applications of VANETs
2 Background
3 Result with Simulation
4 Simulation and Result
5 Comparison
6 Conclusion and Future Work
References
33 Correction to: Study on Recent Method of Dynamic Time-Based Encryption (DTBE) and Searchable Access Control Scheme for Personal Health Record in Cloud Computing
Correction to: Chapter 28 in: V. K. Gunjan et al. (eds.), Cybernetics, Cognition and Machine Learning Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-15-1632-0_28


Algorithms for Intelligent Systems Series Editors: Jagdish Chand Bansal · Kusum Deep · Atulya K. Nagar

Vinit Kumar Gunjan · P. N. Suganthan · Jan Haase · Amit Kumar · Balasubramanian Raman Editors

Cybernetics, Cognition and Machine Learning Applications Proceedings of ICCCMLA 2019

Algorithms for Intelligent Systems Series Editors Jagdish Chand Bansal, Department of Mathematics, South Asian University, New Delhi, Delhi, India Kusum Deep, Department of Mathematics, Indian Institute of Technology Roorkee, Roorkee, Uttarakhand, India Atulya K. Nagar, Department of Mathematics and Computer Science, Liverpool Hope University, Liverpool, UK

This book series publishes research on the analysis and development of algorithms for intelligent systems with their applications to various real world problems. It covers research related to autonomous agents, multi-agent systems, behavioral modeling, reinforcement learning, game theory, mechanism design, machine learning, meta-heuristic search, optimization, planning and scheduling, artificial neural networks, evolutionary computation, swarm intelligence and other algorithms for intelligent systems. The book series includes recent advancements, modification and applications of the artificial neural networks, evolutionary computation, swarm intelligence, artificial immune systems, fuzzy system, autonomous and multi agent systems, machine learning and other intelligent systems related areas. The material will be beneficial for the graduate students, post-graduate students as well as the researchers who want a broader view of advances in algorithms for intelligent systems. The contents will also be useful to the researchers from other fields who have no knowledge of the power of intelligent systems, e.g. the researchers in the field of bioinformatics, biochemists, mechanical and chemical engineers, economists, musicians and medical practitioners. The series publishes monographs, edited volumes, advanced textbooks and selected proceedings.

More information about this series at http://www.springer.com/series/16171

Vinit Kumar Gunjan · P. N. Suganthan · Jan Haase · Amit Kumar · Balasubramanian Raman

Editors

Cybernetics, Cognition and Machine Learning Applications Proceedings of ICCCMLA 2019


Editors Vinit Kumar Gunjan Department of Computer Science and Engineering CMR Institute of Technology Kandlakoya, Telangana, India

P. N. Suganthan School of Electrical and Electronics Engineering Nanyang Technological University Singapore, Singapore

Jan Haase Helmut Schmidt University Hamburg, Germany

Amit Kumar Bioaxis DNA Research Centre (P) Ltd. Hyderabad, Telangana, India

Balasubramanian Raman Department of Computer Science and Engineering Indian Institute of Technology Roorkee, Uttarakhand, India

ISSN 2524-7565    ISSN 2524-7573 (electronic)
Algorithms for Intelligent Systems
ISBN 978-981-15-1631-3    ISBN 978-981-15-1632-0 (eBook)
https://doi.org/10.1007/978-981-15-1632-0

© Springer Nature Singapore Pte Ltd. 2020, corrected publication 2020

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Preface

This book provides a collection of selected papers presented at the International Conference on Cybernetics, Cognition and Machine Learning Applications (ICCCMLA 2019), which was held in Goa, India, on 16–17 August 2019. It serves as a comprehensive knowledge bank for the fast-growing areas of cybernetics, machine learning and cognitive science research. The editors have aimed to facilitate a cohesive view of the framework for computational intelligence and its related applied research disciplines by focusing on modern approaches in machine learning, cognitive science and their applications. The book can act as a reference for scholars intending to pursue research in these fields, and it offers a valuable guide for researchers and industry practitioners who wish to keep abreast of the latest developments in the dynamic, exciting and interesting research discipline of communication engineering, driven by next-generation IT-enabled techniques. Chapters related to the development of communication systems using advanced cybernetics, data processing, swarm intelligence, cyber-physical systems, applied mathematics, and embedded and real-time systems are included. Some chapters present key algorithms and theories that form the core of the technologies and applications concerned, spanning cloud-based big data, wireless sensor networks, clustering, cryptocurrency, optimization techniques, encryption, blockchain and network routing protocols. Software modules in deep learning algorithms are also discussed. All in all, this proceedings gives a broad insight into modern approaches to artificial intelligence, cognition, machine learning, modelling and network performance measures.

Vinit Kumar Gunjan (Kandlakoya, India)
P. N. Suganthan (Singapore, Singapore)
Jan Haase (Hamburg, Germany)
Amit Kumar (Hyderabad, India)
Balasubramanian Raman (Roorkee, India)

Contents

1 A Novel Design and Implementation of Pipeline Inspection System ... 1
R. M. Rajesh, B. Rakesh, P. Rohan, C. K. Santosh, Suhas Shirol and S. Ramakrishna

2 An In-Depth Survey on Massive Open Online Course Outcome Prediction ... 11
Gaurav, Gagandeep Kaur and Poorva Agrawal

3 Prediction of Cardiovascular Diseases Using HUI Miner ... 23
D. V. S. S. Aditya and Anjali Mohapatra

4 Selection of Connecting Phrases in Weather Forecast ... 33
Pratyoy Das and Sudip Kumar Naskar

5 Analysis of Bayesian Regularization and Levenberg–Marquardt Training Algorithms of the Feedforward Neural Network Model for the Flow Prediction in an Alluvial Himalayan River ... 43
Ruhhee Tabbussum and Abdul Qayoom Dar

6 Data Mining on Cloud-Based Big Data ... 51
Anil Kuvvarapu, Anusha Nagina Kunuku and Gopi Krishna Saggurthi

7 Performance Analysis of gTBS and EgTBS Task Based Sensing in Wireless Sensor Network ... 61
Pooja Jadhav and Ajitsinh Jadhav

8 A Secured Framework for Cloud Computing ... 73
Sana M. Bagban and H. A. Tirmare

9 A Secure Shoulder Surfing Resistant Hybrid Graphical User Authentication Scheme ... 79
Shailja Varshney, Mohammad Sarosh Umar and Afrah Nazir

10 Origin Identification of a Rumor in Social Network ... 89
Sushila Shelke and Vahida Attar

11 Luminance [Y] Utility to Compact Color Video ... 97
Neha Shammi Wahab

12 Apriori Algorithm and Decision Tree Classification Methods to Mine Educational Data for Evaluating Graduate Admissions to US Universities ... 103
Pranav Manjunath and Kushal Naidu

13 ABC-Based Algorithm for Clustering and Validating WSNs ... 117
Abdo M. Almajidi, V. P. Pawar, Abdulsalam Alammari and Nadhem Sultan Ali

14 Improving SPV-Based Cryptocurrency Wallet ... 127
Adeela Faridi and Farheen Siddiqui

15 Modelling Fade Transition in a Video Using Texture Methods ... 139
Jharna Majumdar, N. R. Giridhar and M. Aniketh

16 Video Shot Detection and Summarization Using Features Derived From Texture ... 163
Jharna Majumdar, M. P. Ashray, H. M. Madhan and Dhanush M. Adiga

17 Numerical Approximation of Caputo Definition and Simulation of Fractional PID Controller ... 177
Sachin Gade, Mahesh Kumbhar and Sanjay Pardeshi

18 Raitha Bandhu ... 195
G. V. Dwarakanath and M. Rahul

19 Queuing Theory and Optimization in Banking Models ... 205
Runu Dhar, Abhishek Chakraborty and Jhutan Sarkar

20 A Survey on Network Coverage, Data Redundancy, and Energy Optimization in Wireless Sensor Network ... 215
Asha Rawat and Mukesh Kalla

21 An Offline-Based Intelligent Motor Vehicle Driver Behavior Analysis Using Driver's Eye Movements and an Inexperienced and Unauthorized Driver Access Control Mechanism ... 225
Jai Bharath Kumar Gangone

22 Remote Monitoring and Maintenance of Patients via IoT Healthcare Security and Interoperability Approach ... 235
Madhavi Latha Challa, K. L. S. Soujanya and C. D. Amulya

23 Survey of Object Detection Algorithms and Techniques ... 247
Kamya Desai, Siddhanth Parikh, Kundan Patel, Pramod Bide and Sunil Ghane

24 Study on Recent Methods of Secure Provable Multi-block Level Data Possession in Distributed Cloud Servers Using Cloud-MapReduce Method ... 259
B. Rajani, V. Purna Chandra Rao and E. V. N. Jyothi

25 Insolent Tube Trickle Revealing Classification ... 265
Sara Begum

26 Evolving Database for New Generation Big Data Applications ... 273
K. Raja Shekar and B. Bhoomeshwar

27 Connotation Imperative Mining with Regression for Optimization Methodology ... 281
B. Bhoomeshwar and K. Raja Shekar

28 Study on Recent Method of Dynamic Time-Based Encryption (DTBE) and Searchable Access Control Scheme for Personal Health Record in Cloud Computing ... 289
E. V. N. Jyothi, V. Purna Chandra Rao and B. Rajani

29 Learning of Operational Spending and Its Accompanying Menace and Intimidations ... 295
Rachana Pembarti and C. S. R. Prabhu

30 Learning on Demonstration of Control Classification Expending HVDC Links ... 303
B. Ravi Teja and Swathi Sharma

31 Cloud Computing Security in the Aspect of Blockchain ... 309
K. Praveen Kumar and Gautam Rampalli

32 Analysis of Various Routing Protocol for VANET ... 315
Akansha Bhati and P. K. Singh

Correction to: Study on Recent Method of Dynamic Time-Based Encryption (DTBE) and Searchable Access Control Scheme for Personal Health Record in Cloud Computing ... C1
E. V. N. Jyothi, V. Purna Chandra Rao and B. Rajani

About the Editors

Vinit Kumar Gunjan is Associate Professor in the Department of Computer Science and Engineering at CMR Institute of Technology, Hyderabad (affiliated to Jawaharlal Nehru Technological University, Hyderabad). An active researcher, he has published research papers at IEEE, Elsevier and Springer conferences, and has authored several books and edited volumes in Springer series, most of which are indexed in the SCOPUS database. He received the prestigious Early Career Research Award in 2016 from the Science and Engineering Research Board, Department of Science & Technology, Government of India. A Senior Member of IEEE and an active volunteer of the IEEE Hyderabad Section, he has served as Treasurer, Secretary and Chairman of the IEEE Young Professionals Affinity Group and the IEEE Computer Society. He has organized many technical and non-technical workshops, seminars and conferences of IEEE and Springer; during this tenure he had the honour of working with top IEEE leaders and was awarded the Best IEEE Young Professional Award in 2017 by the IEEE Hyderabad Section.

P. N. Suganthan is Professor at Nanyang Technological University, Singapore, and Fellow of IEEE. He is a founding co-editor-in-chief of Swarm and Evolutionary Computation (2010–), an SCI-indexed Elsevier journal. His research interests include swarm and evolutionary algorithms, pattern recognition, forecasting, randomized neural networks, deep learning, and applications of swarm, evolutionary and machine learning algorithms. His publications have been well cited (Google Scholar citations: ~33k), and his SCI-indexed publications have attracted over 1000 SCI citations in a calendar year since 2013. He was selected as one of the highly cited researchers in computer science by Thomson Reuters every year from 2015 to 2018. He served as the General Chair of IEEE SSCI 2013, is an IEEE CIS Distinguished Lecturer (DLP) for 2018–2020, has been a member of the IEEE (S'91, M'92, SM'00, Fellow'15) since 1991, and was an elected AdCom member of the IEEE Computational Intelligence Society (CIS) in 2014–2016.


Jan Haase is Lecturer at the Helmut Schmidt University of the Federal Armed Forces, Hamburg, and temporary professor at the University of Lübeck. An active volunteer of IEEE, he chaired the IEEE Region 8 conference coordination subcommittee in 2013–14 and the IEEE Austria Section in 2010–12. His research interests include embedded systems, modelling and simulation, and power estimation.

Amit Kumar, PhD, is a passionate forensic scientist, entrepreneur, engineer, bioinformatician and IEEE volunteer. In 2005 he founded the first private DNA testing company, BioAxis DNA Research Centre (P) Ltd, in Hyderabad, India, with a US collaborator. He has vast experience in training more than 1000 crime investigation officers and has helped more than 750 criminal and non-criminal cases reach justice by offering analytical services in his laboratory. He was a member of the IEEE Strategy Development and Environmental Assessment (SDEA) committee of IEEE MGA. He is a Senior Member of IEEE and has been a very active IEEE volunteer at Section, Council, Region, Technical Society (Computational Intelligence, and Engineering in Medicine and Biology) and IEEE MGA levels in several capacities. He has driven a number of IEEE conferences, conference leadership programs, entrepreneurship development workshops, and innovation- and internship-related events. He is currently also a Visiting Professor at SJB Research Foundation and Vice Chairman of the IEEE India Council and the IEEE Hyderabad Section.

Balasubramanian Raman has been Professor in the Department of Computer Science and Engineering at the Indian Institute of Technology Roorkee since 2013. He obtained an MSc degree in Mathematics from Madras Christian College (University of Madras) in 1996 and a PhD from the Indian Institute of Technology Madras in 2001. He was a postdoctoral fellow at the University of Missouri, Columbia, USA, in 2001–2002 and a postdoctoral associate at Rutgers, the State University of New Jersey, USA, in 2002–2003. He joined the Department of Mathematics at the Indian Institute of Technology Roorkee as Lecturer in 2004 and became Assistant Professor in 2006 and Associate Professor in 2012. He was a Visiting Professor and a member of the Computer Vision and Sensing Systems group at the University of Windsor, Canada, during May–August 2009. So far he has published more than 190 papers in reputed journals and conferences. His research areas include vision geometry, digital watermarking using mathematical transformations, image fusion, biometrics, secure image transmission over wireless channels, content-based image retrieval and hyperspectral imaging.

Chapter 1

A Novel Design and Implementation of Pipeline Inspection System R. M. Rajesh, B. Rakesh, P. Rohan, C. K. Santosh, Suhas Shirol and S. Ramakrishna

Department of Electronics & Communication Engineering, KLE Technological University, Vidyanagar, Hubballi, India

1 Introduction

In this modern world, robotics is one of the emerging engineering fields of the current era. Robots are designed to eliminate human intervention in work and to operate in remote environments, and today they are commonly used well beyond the large production industries. A pipe inspection system improves inspection efficiency by reducing the time and manpower involved in the whole inspection process [1–4]. Industrial plants generally need improved security and efficiency, for which such pipe inspection techniques should be adopted. Operations such as inspection, maintenance and cleaning are expensive, so robotics offers one of the most attractive solutions. Pipelines are vital tools for the transportation of all kinds of liquids such as gas, water and fuel oils, and different industries are equipped with large pipelines to fulfill the requirements of their work. Pipelines connect different parts of a plant and help in transporting various kinds of substances such as gas, water and fuel oils [5–8]. The maintenance of these pipelines is a very tedious job and involves considerable human labor. The main motive for developing robots is to reduce human interaction with such work; hence the growth of robotics has been tremendous in this technologically advanced era. Environmental factors affect the pipeline over time, and leakage, corrosion and wear-out of the pipe due to external environmental factors are difficult to detect. Robots therefore play a vital role in inspecting pipelines and are considered one of the good alternatives [9–11].

2 Related Work

Much research has been undertaken to achieve inspection of pipelines, and many papers have been published on this topic. There are different methods of achieving pipeline inspection using various microcontrollers, but the result of each action is not definite; outcomes are partly random and partly under the control of the pipeline inspection robot. Pipe inspection robots employ a number of approaches to provide the desired information, and the method employed depends on the sensors used by the robot and the parameter being measured. Building on the techniques cited above, the proposed work focuses on an algorithm that checks the entire environment, such as the battery condition, the temperature and every sensor, before starting the motion [1–3, 5–8, 12–14]. The proposed technique calculates and monitors the gas concentration, the temperature and the distance of a crack from the beginning of the pipe, and continuously uploads these values to the cloud as the robot moves inside the pipe [3, 4, 9–11, 15]. In this project the robot is designed to move within pipelines to monitor the cracks and damage present inside. Since the inside of the pipeline is slippery (owing to the viscosity of the transported fluids), the robot is designed to hold the pipe through caterpillar wheels, which provide a better gripping action. The main area of application is in industries that operate pipeline networks (water, gas, oil and its derivatives, or any other fluid) and spend large sums of money on early failure detection every year. The system can also be used in conventional power plants, refineries, chemical and petrochemical plants, domestic water supply pipelines, natural gas pipelines and so on, so it may help many industries at a very low cost.

3 Proposed Methodology

In this paper the robot is designed to move inside the pipeline to monitor the cracks and damage present inside. Since the inside of the pipeline is slippery, the robot is designed to hold the pipe through caterpillar wheels, which provide a better gripping action. The proposed technique calculates and monitors the gas concentration, the temperature and the distance of a crack from the beginning of the pipe, and continuously uploads these values to the cloud as the robot moves inside the pipe. The design of the pipeline inspection robot comprises the robot anatomy and the sensors needed to inspect the conditions inside the pipe. The robot anatomy usually includes the chassis, the wheels and the motors that drive them. A variety of sensors are available to detect and inspect factors such as major cracks and obstacles; one should select sensors that are robust, economical and sufficiently sensitive. Based on the NaturalGas.org list, pipelines are classified by length and diameter into interstate pipelines (24–36 inches in diameter), main pipelines (16–48 inches in diameter) and lateral pipelines (6–16 inches in diameter). The proposed prototype of the robot is intended for lateral pipelines of 6–16 inches in diameter.

3.1 Proposed Design

Figure 1 shows the block diagram of the pipeline inspection system. Infrared (IR) sensors are used to detect cracks and obstacles inside the pipeline. The NodeMCU is the most suitable controller: it provides a simple programming environment for beginners while remaining flexible enough for advanced users. Motor drivers, an L293D and a 4-channel relay, are used to drive the motors. There are four motors for the wheels and one motor carrying an IR sensor for the detection of cracks. The robot detects the crack by incorporating the abovementioned components, and the statuses of the crack, temperature and gas values are continuously uploaded to the cloud.

Fig. 1 Functional block diagram of the pipeline inspection system

3.2 Mathematical Model

The time taken by the ultrasonic waves to travel from the transmitter to the receiver, together with the velocity of the waves, is used to measure the distance of the object ahead. An ultrasonic sensor is therefore used to calculate the approximate distance whenever a crack is detected by the IR sensor. The distance is calculated from the ultrasonic sensor data as given in Eq. 1:

Distance = (T × S) / 2    (1)

where S is the speed of sound (approximately 341 m/s) and T is the reflection (echo) time.
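As a quick illustration of Eq. 1 (a sketch only; the function and example values are ours, not from the paper), the echo time reported by the ultrasonic sensor can be converted to a distance in Python:

```python
SPEED_OF_SOUND = 341.0  # approximate speed of sound in air, m/s

def crack_distance(echo_time_s: float) -> float:
    """One-way distance to the reflecting surface, in meters (Eq. 1).

    The ultrasonic pulse travels out and back, so the one-way
    distance is (time x speed) / 2.
    """
    return (echo_time_s * SPEED_OF_SOUND) / 2.0

# Example: an echo time of 0.5 ms corresponds to roughly 8.5 cm.
print(f"{crack_distance(0.0005) * 100:.1f} cm")
```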

4 Hardware Design

The hardware of the developed pipeline inspection robot includes infrared (IR) sensors for identifying major cracks. Other associated components are the microcontroller, motor drivers, DC motors, an ultrasonic sensor, a smoke sensor, a temperature sensor, and a Bluetooth module.

4.1 Infrared Sensor (IR)

The IR sensor is used to detect cracks inside the pipeline. It is a general-purpose proximity sensor whose module consists of an IR emitter and receiver pair. Whenever the emitted light falls on an object, it is reflected back to the receiver, which in turn switches on the sensor. Object detection by the IR sensor is shown in Fig. 2.

Fig. 2 Crack detection by IR sensor

4.2 Motor Driver (L293D) and DC Motors

An L293D and a 4-channel relay are used as motor drivers. These drivers allow each DC motor to run in either direction, which is essential for moving the bot forwards and backwards. Normal DC motors are used for the movement of the bot: three DC motors provide locomotion, and one more motor, attached to the IR sensor, rotates continuously and stops whenever a crack is detected. 12 V, 100 rpm DC motors were chosen for the design.

4.3 Ultrasonic Sensor

The ultrasonic sensor has an operating voltage of 5 V. Its theoretical measurement range is 2–450 cm, whereas its practical range is 2–80 cm with an accuracy of 3 mm; it covers an angle of less than 15° and draws about 15 mA of operating current.

4.4 Smoke Sensor (MQ-6)

The gas sensor module consists of a steel exoskeleton in which the sensing element is placed. A current, known as the heating current, is passed through the sensing element via the connecting leads. Gases reaching the heated sensing element get ionized and are absorbed, which changes the resistance of the sensing element and hence the value of the current flowing out of it.

4.5 Temperature Sensor (DHT)

The DHT sensor measures both temperature and humidity, either or both of which can be read. The sensor works by reading the voltage across a diode: as the temperature increases, the voltage drop across the base and emitter terminals of a transistor changes, and this change is captured by the sensor. When the voltage difference is amplified, the analog signal generated by the device is directly proportional to the temperature.

4.6 Bluetooth (HC-05)

The Bluetooth module (HC-05) is designed for wireless communication and can be used in a master or slave configuration. It communicates with the microcontroller over a serial port.

4.7 Architecture of the Bot

The bot consists of a main chassis with six hanging notches that hold three motors with three wheels, separated by standing notches. The bot is nearly 150 mm wide and 194 mm long. A flexible mechanism is achieved by changing the wheels and their grip. The whole design of the bot targets a particular length and diameter of pipe, chosen by referring to the standard pipeline parameters chart given in the NaturalGas.org list (2018).

5 Software Implementation

The whole system is initiated by serial communication and sensor initialization. The system uses the NodeMCU microcontroller, which controls the motor speed by producing a PWM signal and drives all the DC motors through serial input. Figure 3 shows the flowchart of the pipeline inspection system: the infrared (IR) sensors detect cracks and obstacles inside the pipeline, and the motor drivers (an L293D and a 4-channel relay) drive four motors for the wheels and one motor that carries the IR sensor for crack detection. The robot detects the crack by incorporating the abovementioned components, and the statuses of the crack, temperature and gas values are continuously uploaded to the cloud.

Fig. 3 Flowchart of the pipeline inspection system
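To make the control loop concrete, here is a minimal sketch of this logic, assuming MicroPython firmware on the NodeMCU; the pin assignments, duty values and the cloud-upload hook are illustrative assumptions, not taken from the paper:

```python
# Hypothetical MicroPython sketch of the control loop (pin numbers,
# duty values and the cloud hook are illustrative, not from the paper).
from machine import Pin, PWM, ADC
import dht
import time

ir_crack = Pin(14, Pin.IN)           # IR sensor output: 1 when a crack is seen
gas_adc = ADC(0)                     # MQ-6 analog output (0-1023 on ESP8266)
dht_sensor = dht.DHT11(Pin(5))       # DHT temperature/humidity sensor
wheel_pwm = PWM(Pin(4), freq=1000)   # motor speed control via PWM

wheel_pwm.duty(1023)  # full speed; 0-1023 here maps to the paper's 0-1 range

while True:
    dht_sensor.measure()
    readings = {
        "temperature": dht_sensor.temperature(),
        "humidity": dht_sensor.humidity(),
        "gas": gas_adc.read(),
        "crack": ir_crack.value(),
    }
    if readings["crack"]:
        wheel_pwm.duty(0)  # pause the bot while the crack is logged
    # A real build would push `readings` to the cloud here, as in Fig. 3.
    print(readings)
    time.sleep(1)
```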

5.1 Experimental Results and Discussions

The pipeline inspection system is designed to detect cracks in the pipe. The bot travels inside the pipe to detect cracks, and meanwhile the system uploads all the sensor data to the cloud. The parameters considered during implementation are listed in Table 1, and Table 2 lists the different sensor data uploaded to the cloud. The PWM value ranges from 0 to 1, i.e., the duty range 0–255 is mapped to 0–1. The proposed algorithm was also tested on a 10-inch pipe, shown in Fig. 4, which has a length of 20 ft and a diameter of 10 inches.


Table 1 Parameters considered during implementation

| Sl. no | Parameter                | Value          |
|--------|--------------------------|----------------|
| 1      | Length of pipe           | 20 ft          |
| 2      | Diameter of the pipe     | 10 inch        |
| 3      | Diameter of wheel        | 2.5 cm         |
| 4      | PWM value                | 1 (high speed) |
| 5      | Gas concentration range  | 100–10,000 ppm |

Table 2 Different sensors' data

| Temperature (°C) | Humidity (%) | Gas concentration (ppm) | Distance (cm) | Crack status |
|------------------|--------------|-------------------------|---------------|--------------|
| 37               | 24           | 435                     | 4             | 0            |
| 32               | 26           | 500                     | 6             | 0            |
| 35               | 28           | 439                     | 5             | 1            |
| 36               | 28           | 532                     | 9             | 0            |
| 37               | 26           | 513                     | 7             | 0            |
| 37               | 30           | 620                     | 10            | 0            |
| 36               | 32           | 530                     | 13            | 0            |

Fig. 4 Pipe

Depending on the battery level, the bot is able to cover the length of the pipe with all the necessary calibrations, and it stops for a while whenever a crack is detected in the pipe. Table 2 lists the different sensor data uploaded to the cloud.


6 Conclusion

The position of the crack, the temperature and the gas concentration were continuously updated by uploading the sensor data to the cloud using a client–server architecture, and the same values were monitored. The designed system successfully inspected pipe damage, supporting the transportation of gases and liquids without faults. The motion of the system inside the pipe, in either direction, was provided by a Bluetooth-controlled actuation system. In the future, this idea can be extended to domestic water supply pipelines throughout the world, refineries, chemical and petrochemical plants, conventional power plants, natural gas pipelines, etc.

References

1. Wilfong GT (1990) Motion planning for an autonomous vehicle. In: Cox IJ, Wilfong GT (eds) Autonomous robot vehicles. Springer, New York, NY
2. Ryew S, Baik SH, Ryu SW, Jung KM, Roh SG, Choi HR (2000) In-pipe inspection robot system with active steering mechanism. In: Proceedings of intelligent robots and systems
3. Choi HR, Ryew SM (2002) Robotic system with active steering capability for internal inspection of urban gas pipelines. https://doi.org/10.1016/S0957-4158(01)00022-8
4. Xu X, Yan G-Z, Yan B (2002) A new style pipeline inspection robot system
5. Suzumori K, Hori K, Miyagawa T (1998) A direct-drive pneumatic stepping motor for robots: designs for pipe-inspection microrobots and for human-care robots
6. Suzumori K, Wakimoto S, Takata M (2003) A miniature inspection robot negotiating pipes of widely varying diameter. In: 2003 IEEE international conference on robotics and automation (Cat. No. 03CH37422)
7. Barnea DI, Silverman HF (1972) A class of algorithms for fast digital registration. IEEE Trans Comput C-21(2)
8. Tanimoto S, Pavlidis T (1975) A hierarchical data structure for picture processing. https://doi.org/10.1016/S0146-664X(75)80003-7
9. Kwon Y-S, Lee B, Whang I-C, Yi B-J (2010) A pipeline inspection robot with a linkage type mechanical clutch. In: Intelligent robots and systems (IROS)
10. Thornton SM, Lewis FE, Zhang V, Kochenderfer MJ, Christian Gerdes J (2018) Value sensitive design for autonomous vehicle motion planning. In: 2018 IEEE intelligent vehicles symposium (IV), Changshu, pp 1157–1162. https://doi.org/10.1109/IVS.2018.8500441
11. Harikiran GC, Menasinkai K, Shirol S (2016) Smart security solution for women based on Internet of Things (IoT). In: International conference on electrical, electronics, and optimization techniques (ICEEOT)
12. Muramatsu M, Namiki N, Koyama R, Suga Y (2000) Autonomous mobile robot in pipe for piping operations. In: 2000 IEEE/RSJ international conference on intelligent robots and systems (IROS 2000) (Cat. No. 00CH37113)
13. Tavakoli M, Marques L, de Almeida AT (2010) Development of an industrial pipeline inspection robot. Ind Robot Int J 37(3):309–322. https://doi.org/10.1108/01439911011037721
14. Yagi Y, Kawato S, Tsuji S (1991) Collision avoidance using omnidirectional image sensor. In: IEEE international conference on robotics and automation
15. Suzuki M, Yukawa T, Satoh Y, Okano H (2006) Mechanisms of autonomous pipe-surface inspection robot with magnetic elements. In: IEEE international conference on systems, man and cybernetics

Chapter 2

An In-Depth Survey on Massive Open Online Course Outcome Prediction Gaurav, Gagandeep Kaur and Poorva Agrawal

Symbiosis Institute of Technology (SIU), Pune, India

1 Introduction

The MOOC concept emerged in 2008 out of the OER (Open Educational Resources) movement. MOOCs are web-based courses with interactive involvement and open access. They are loaded with all the required materials that are extensively used in conventional education systems, such as study materials, videos, problem sets and lectures, and they also provide a forum that helps students, professors and others establish a community among themselves. MOOCs are the latest addition to the distance education system. Many MOOCs were influenced by connectivist theory, which highlights that knowledge mainly arises through a network of connections. In 2012, venture capitalist attention and media buzz around the MOOC industry increased, and the number of MOOC providers affiliated with top colleges and universities, such as Udacity, Coursera and edX, grew rapidly. Choosing MOOCs has many advantages: zero tuition fees, open contact or communication with top-level professors, monitoring of student success ratios through data collected by computer programs, increased access to higher education, a good alternative to formal education, support for sustainable development goals, and good online collaboration. The vital disadvantage of MOOCs is their low completion rate. Some of the major challenges for MOOCs include the requirement of digital education, self-goal setting by participants, transition and language barriers, and the commitment of students to these courses.

Despite the many benefits and flexibilities of open online courses, the completion rate is low. A system is therefore required that can predict students' course completion from the available data. To achieve this, various techniques can be used, including k-means classification, linear regression, the hidden Markov model and fuzzy classification, as described below.

1.1 K-Means Classification

K-means is a method of classifying components with different features and values into groups, and it can be used in the selection process for massive online courses. The grouping works by minimizing the total squared distance between the given components and the corresponding centroids, where a centroid is the center of mass of a geometric object of uniform density. K-means is one of the simplest and most powerful unsupervised learning algorithms used to solve clustering problems. The initial partitioning or clustering can be done in distinct ways, e.g., dynamically or by randomly choosing from the higher or lower boundary [1]. The k-means algorithm has a weak relationship with the k-nearest neighbor classifier, a famous machine learning technique for categorization. The algorithm is easily applied to the categorization of big datasets and has been used successfully in distinct domains such as marketing, computer vision, agriculture, geostatistics and astronomy; it is also commonly used as a data preprocessing step for other algorithms. K-means clustering is powered by the Euclidean distance, as noted in [2] and given in Eq. 1:

D = √((x1 − x2)² + (y1 − y2)²)    (1)
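As an illustrative sketch (the feature choice and data are invented, not from the surveyed work), scikit-learn's KMeans can group learners by activity features, minimizing the total squared Euclidean distance of Eq. 1 to the cluster centroids:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical learner features: [videos watched, quizzes attempted]
X = np.array([[40, 9], [35, 8], [5, 1], [3, 0], [20, 4], [22, 5]])

# Partition learners into three groups by minimizing the total squared
# Euclidean distance (Eq. 1) between the points and their centroids.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print(kmeans.labels_)           # cluster index assigned to each learner
print(kmeans.cluster_centers_)  # centroid (center of mass) of each group
```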

1.2 Linear Regression

The linear regression method is used to analyze the course selection process through regression. This method shows the relationship between two variables; it is a type of statistical analysis and a very important analysis tool [3]. Linear regression uses statistical computation to fit a trend line to a set of data points, where the trend can represent anything from a company's financial performance to the total number of people diagnosed with blood cancer. It defines the relationship between the dependent and the independent variable. Among the various techniques used to compute a linear regression, the most common is the least squares technique, which estimates the unknown parameters in the given data by minimizing the sum of the vertical distances between the trend line and the data points. Although the computation can be involved, linear regression also plays a vital role in the area of AI. The linear regression equation is depicted in Eq. 2 [4]:

Y = Xβ    (2)

where Y is the objective (dependent) variable, X is the explanatory (independent) variable, and β is the coefficient to be estimated.
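A minimal sketch of the least squares fit described above, using NumPy; the data and the feature choice are invented for illustration:

```python
import numpy as np

# Hypothetical data: weekly forum posts (x) vs. final grade (y).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([35.0, 42.0, 55.0, 58.0, 71.0, 80.0])

# Least squares fit of Y = X @ beta with an intercept column, minimizing
# the sum of squared vertical distances to the trend line (Eq. 2).
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

print(f"intercept = {beta[0]:.2f}, slope = {beta[1]:.2f}")
print("predicted grade at 6 posts/week:", beta[0] + beta[1] * 6)
```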

1.3 Hidden Markov Model

To evaluate the course selection process in depth, a Hidden Markov Model (HMM) can be used. An HMM is one of the simplest ways of representing sequential data. It is a type of statistical model, a variant of the Markov chain: in an HMM the states are hidden or unobservable, whereas a Markov chain has states that are visible to the observer. The HMM uses probabilities that depend on the present and past states to forecast a variable's future state [5]. The distinction between an HMM and a Markov Model (MM) is that in an HMM the state is unobservable to an observer while in an MM it is observable, although the output is visible in both. HMMs are utilized in various machine learning and data mining tasks and are especially known for their use in temporal pattern recognition and reinforcement learning. A hidden Markov model chain is shown in Fig. 1 [6].

Fig. 1 Hidden Markov model
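The following toy sketch shows how an HMM of the kind in Fig. 1 assigns a likelihood to an observation sequence via the forward algorithm; the states, observations and all probabilities below are invented for illustration:

```python
import numpy as np

# Two hidden states (0 = engaged, 1 = disengaged); values are invented.
start = np.array([0.8, 0.2])            # initial state distribution
trans = np.array([[0.7, 0.3],           # state transition probabilities
                  [0.4, 0.6]])
emit = np.array([[0.6, 0.3, 0.1],       # P(observation | state);
                 [0.1, 0.3, 0.6]])      # observations: 0=submit, 1=view, 2=idle

def forward(observations):
    """Likelihood of an observation sequence under the HMM."""
    alpha = start * emit[:, observations[0]]
    for obs in observations[1:]:
        alpha = (alpha @ trans) * emit[:, obs]
    return alpha.sum()

print(forward([0, 0, 1, 2]))  # probability of a weekly activity trace
```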

1.4 Fuzzy Classification

Fuzzy classification can be used to classify the online course selection procedures with more clarity. It is the method of collecting or grouping components having the same features into a fuzzy set. The membership function of the fuzzy set is described by the truth value of a fuzzy propositional function; a fuzzy propositional function is an expression that becomes a fuzzy proposition once a variable is assigned to it. There are many techniques for fuzzy classification, such as fuzzy k-nearest neighbors and fuzzy c-means, and these techniques rely on fuzzy pattern matching and fuzzy rules. Some sample rules of fuzzy classification are shown in Fig. 2 [7].

Fig. 2 Sample fuzzy rules
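In the spirit of the sample rules in Fig. 2, here is a toy sketch of fuzzy rule evaluation; the membership function and the rule itself are invented for illustration:

```python
def high(x, lo=0.0, hi=1.0):
    """Degree to which x is 'high' on [lo, hi] (simple ramp membership)."""
    return max(0.0, min(1.0, (x - lo) / (hi - lo)))

def fuzzy_and(a, b):
    """Rule conjunction using the standard min t-norm."""
    return min(a, b)

# Rule: IF activity is high AND quiz score is high THEN completion is likely.
activity, quiz = 0.8, 0.55
likely_completion = fuzzy_and(high(activity), high(quiz))
print(f"membership in 'likely to complete': {likely_completion:.2f}")  # 0.55
```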

2 Literature Review

Qiu et al. [8] studied the learning behavior patterns of students in Massive Open Online Courses (MOOCs). The authors conducted a deep analysis of students' learning patterns across videos, assignments and course forums and, based on this analysis, proposed the Latent Dynamic Factor Graph (LadFG), which incorporates students' learning patterns in a unified framework. It captures the homophily correlations and dynamic information among students and projects students' learning behavior into a latent space. They considered two prediction tasks: assignment grade and certificate earner. The first task estimates how students will perform on their assignments, and the second identifies the learners who will earn a certificate after finishing the course. The results of both tasks demonstrate the effectiveness and sustainability of the proposed model.

Ramesh et al. [9] proposed a framework that uses different behavioral patterns to distinguish between students' forms of online engagement, i.e., passive or active. For example, some students use a MOOC only to watch lectures and solve quizzes without interacting with other community members, while others post their views in the forum and ask and answer questions. The authors used these engagement patterns to predict student survival and to explain behavioral changes over time (see Fig. 3). Using real data collected from different online courses, they showed that the proposed framework can successfully predict students' behavioral patterns and course completion.


Fig. 3 User activity in MOOC [9]

They also revealed that their latent model also predicts survival of student in the program/course early. The authors also carried out a quantitative examination of the student synergy with the MOOC and recognized the activities of students that are good symbols for their survival at different period of online courses. Kloft et al. [10] proposed a machine learning structure to predict the dropout rate of students in MOOC exclusively from clickstream and forum data. The Machine Learning (ML) algorithm—Support Vector Machine (SVM) recorded the student data weekly and helped in searching out the behavioral change in students over time. This prediction model easily predicts the students that drop out significantly using their current and past data history in comparison to another baseline technique where SVM classifier attributes are organized based on the set labels of the classification. The authors also realized that prediction of participants that drop out is better at the course end in comparison to the course beginning where they receive weak signals. Mattingly et al. [11] presented the important analytics inventiveness to date and their involvement in establishing a culture, i.e., changing from information and reporting to inform and validate the action. They also examined the academic analytics and learning and their applicability in the distance education field in undergraduate and graduate courses as it affected faculty, students and educational institutions. The main priority is to analyze the collection, measurement, analysis, and data reporting as a prognosticator of the success of student and program curriculum and departmental process drivers. Studying and academic analytics in college studies are utilized to identify the success of students by noticing students study patterns and role of academic courses and educational institution in their success. The authors also examined what is being done to support students and if it is effective; if not then what steps do the educators take to improve it. They also examined the use of data to make new metrics and notify a steady improvement cycle. Alias et al. [12] presented how students’ browsing behavior captured in the log file can be clustered using Self-Organizing Maps (SOM) techniques. The authors analyzed the browsing behavior obtained from an e-learning environment for one semester. SOM gave great visualization in clusters for unsupervised learning. Also, the authors were able to conduct experiments in minimal amount of time using data mining techniques. The result indicated the pattern of a student’s behavior. The clusters formed showed that browsing is active in certain attributes in comparison between other clusters of students. Students in cluster 2 are eager to use the learning material prepared in the learning environment from the start of the studying process until the final weeks that consists of formal teaching and studying in class. Using

16

Gaurav et al.

SOM, the authors were able to visualize the self-organization groups of the students based on their geographic location, for example, it was found from the data of students pursuing their masters in USA that students from Andhra Pradesh, India were more in number as compared to students from other Indian states. Wen et al. [13] presented a study regarding the measurement of MOOC student engagement based on linguistic analysis on forum posts. Based on how much personal interpretation are contained in post, the authors tried to measure the cognitive engagement using the abstraction dictionary. The results of the studies proved that if there is increase in the learner motivation, then the participants’ personal view on their post, the risk of student dropout becomes lower for the course. Kizilcec et al. [14] applied an approach for classifying student’s involvement with MOOCs as easy, informative, and scalable. They defined the trajectories of learners as longitudinal patterns of involvement and classification are dependent on assessments and video lectures which are two main features of MOOCs course. In the analysis of MOOCs computer science course, the classifier found prototypical trajectories of involvement. It was observed that remarkable learners affects the trajectories who were continuously engaged learners through the course without receiving assessments. The prototypical trajectories are besides very useful framework to search out the differentiation of learner involvement among distinct program’s framework or instructional proceed. Comparison of learners from different trajectories and programs across demographics, video access, and forum participation is also reported by the authors. The result of these comparisons provides information about future interventions, design, and research for MOOCs. He et al. [15] explored the exact and early recognition of students who are expected of incompletion of their courses. The authors made a predictive model weekly, which depends on multiple course offerings. The authors predicted student at risk of failure, and predicted it with accuracy. To be effective, the expected prediction probabilities should be well calibrated and smoothed across the weeks. Depending totally on logistic regression, they proposed to transfer studying algorithms to trade-off accuracy and smoothness to limit the difference chances of failure between consecutive weeks. Experimental outcomes on two services of Coursera MOOC proved the success of their algorithms. Chaplot et al. [16] analyzed the significance of sentiment analysis on MOOCs forum posts in finding the number of students dropping out and neural network’s effectiveness in modeling given problem. The authors used the forum posts database and clickstream log from Coursera MOOC. The database contains 3 million students click logs and around 5000 forum posts. They used the lexicon depend on approach to remove the sentiment from forum posts. They proved that their proposed algorithm based on artificial neural network knocks the state-of-the-art algorithm in terms of Cohen’s kappa. The proposed algorithm can also be used in smart schools using the digital technique for interactions and learning. Romero et al. [17] presented an assessment of the state of the art to admire to EDM and surveys the maximum related work in this field so far. Every study has been categorized, not only on the basis of data type and data mining methods used but also by means of the educational task. EDM has been added as an upcoming


The rapid growth of EDM is reflected in the increasing number of articles published each year in journals and international conferences, and in the growing number of effective tools for executing data mining algorithms in academic environments. Ong et al. [18] proposed a new construct to improve understanding of engineers' acceptance of e-learning and demonstrated the effect of computer self-efficacy on behavioral intention to use e-learning. They examined the applicability of the Technology Acceptance Model (TAM) in analyzing engineers' decisions to adopt e-learning, an important technology management problem. The survey sample comprised 140 engineers from six international companies, and the results strongly supported the continued use of TAM in explaining engineers' intention to use e-learning. Genuer et al. [19] proposed the use of random forests for variable selection. Random forests, introduced by Leo Breiman in 2001, are used mainly for regression and classification problems and rely on a model aggregation scheme. The authors investigated two important variable selection problems: the first is to find the variables important for interpretation, and the second, more restrictive, is to select a small set of variables sufficient for a good prediction model. Their contribution is twofold: insight into the behavior of the variable importance index produced by the random forest approach, and a variable selection strategy based on ranking the explanatory variables. Dalipi et al. [20] presented a comprehensive analysis of research on the Support Vector Machine (SVM) and other machine learning algorithms for predicting, and explaining the reasons behind, student dropout in MOOCs. The review highlighted both student-related and MOOC-related factors that contribute to dropout. They also identified the difficult challenges in predicting MOOC dropout and gave recommendations to researchers for solving these problems in time using machine learning techniques.

3 Conclusion

This paper analyzes past research on predicting the outcome of Massive Open Online Courses and finds that most research papers still leave large gaps in identifying the outcome of massive online courses. The work discussed is the latest in this field and will be useful for understanding the current state of the art in predicting the outcome of online courses. The paper briefly discusses the different algorithms applied to predict the outcome of online courses, and further summarizes the benefits of existing techniques and directions for improving them (see Table 1). Future directions in this field are as follows:


Table 1 Research areas and future directions

| Research area | Solutions | Advantages | Future directions |
|---|---|---|---|
| Student behavior | LadFG [8] to incorporate student data such as demographics, forum activity, and learning behavior | Models and predicts students' learning behavior in MOOCs | Incorporation of human feedback into the system |
| Student engagement | PSL (probabilistic soft logic) [9] | Predicts the chances of student survival early in the course | Additional factors such as tenacity, motivation, and self-regulation can be used for the prediction |
| Student behavior and engagement | Machine learning method based on support vector machines (SVM); clickstream data to monitor student behavior, i.e., the activity of the student using data from the current and past week [10] | The system is very sensitive to small changes in behavior over time | Use of non-scalar features such as country, browser, etc. can help develop a more stable system |
| Data examining | Learning and academic analytics [11]; learning management systems (LMSs), content management systems (CMSs), and learning content management systems (LCMSs) make this process more streamlined | Analyzes data pertaining to learning analytics with the help of machine learning | Metadata on experience, motivation, and learning can be used to design an improved system |
| Students' browsing behavior | Self-organizing map (SOM) clustering [12] | SOM provided great visualization in clustering | SOM clustering of education data using GPULib |
| Linguistic reflection of student engagement | Logistic regression, LIBLINEAR, ridge (L2) regularization [13] | Efficient prediction of student motivation through analysis of their language | Inclusion of social interaction in forums, such as who talks to whom, to properly understand social learning in MOOCs |
| Clustering student data | Clusters based on "watching course video" and "assessments" [14] | Presented a framework for student engagement and disengagement in MOOCs; trajectory frameworks are important for comparing learner engagement | Development of simple cognitive tools; addressing learners' prior knowledge; ubiquity of media multitasking |
| Identification of "students at risk" | LR-SEQ (sequentially smoothed logistic regression) and LR-SIM (simultaneously smoothed logistic regression) [15] | Logistic-regression-based technique for earlier identification of at-risk students | Efficiency can be increased with the participation of course instructors |
| Predicting student attrition | Student attrition using sentiment analysis and artificial neural networks [16] | Efficient weekly data analysis using neural networks | Find students who are likely to drop out and take the necessary steps to prevent it |
| Analyzing data | Educational data mining [17] | The technique can analyze large datasets to understand student behavior | More unified and collaborative studies instead of the current plethora of individual proposals and lines |
| Acceptance of e-learning | Technology acceptance model (TAM) [18] | Direct effects on e-learning of factors like computer self-efficacy, perceived usefulness, perceived ease of use, and perceived credibility | Additional variables such as subjective norm, gender, internet experience, and level of education can improve the efficiency of the system |
| Variable selection | Random forest [19] | Solved the issue of variable selection | Future work can focus on overcoming drawbacks such as complexity and run time |
| Analysis of factors related to students and MOOCs | Machine learning architectures such as logistic regression, deep neural networks, support vector machines, hidden Markov models, recurrent neural networks, natural language processing techniques, and decision trees [20] | Highlights student-related and MOOC-related factors [20] such as lack of motivation and time, insufficient background knowledge and skills, course design, isolation and lack of interactivity, and hidden costs | More data needs to be collected for developing and analyzing deep learning algorithms to reduce the MOOC student dropout rate |

(i) Incorporating human feedback and course-instructor input into the system can help improve accuracy and increase student participation in the course.
(ii) Future systems can use additional factors for prediction, such as tenacity, motivation, self-regulation, social interaction on forums, subjective norm, gender, internet experience, level of education, and non-scalar features.

References

1. K-Means Clustering for Classification. Towards Data Science, 2 Aug 2017. https://towardsdatascience.com/kmeans-clustering-for-classification-74b992405d0a
2. Sayyed S, Deolekar R (2017) A heuristic approach to detect novelty data using improved level set methods. In: 2017 international conference on computing methodologies and communication (ICCMC). IEEE, pp 751–756
3. Statistics Solutions (2013) Linear regression. What is linear regression. www.statisticssolutions.com/what-is-linear-regression/
4. Hirose H, Soejima Y, Hirose K (2012) NNRMLR: a combined method of nearest neighbor regression and multiple linear regression. In: 2012 IIAI international conference on advanced applied informatics. IEEE, pp 351–356
5. Hidden Markov Model. Medium, 31 Aug 2017. https://medium.com/@kangeugine/hidden-markov-model-7681c22f5b9
6. Khiatani D, Ghose U (2017) Weather forecasting using hidden Markov model. In: 2017 international conference on computing and communication technologies for smart nation (IC3TSN). IEEE, pp 220–225
7. Amezcua J, Melin P, Castillo O (2015) A new classification method based on LVQ neural networks and fuzzy logic. In: 2015 annual conference of the North American Fuzzy Information Processing Society (NAFIPS) held jointly with 2015 5th world conference on soft computing (WConSC). IEEE, pp 1–5
8. Qiu J et al (2016) Modeling and predicting learning behavior in MOOCs. In: Proceedings of the ninth ACM international conference on web search and data mining, pp 93–102


9. Ramesh A, Goldwasser D, Huang B, Daumé H, Getoor L (2014) Learning latent engagement patterns of students in online courses. In: Proceedings of the twenty-eighth AAAI conference on artificial intelligence, pp 1272–1278
10. Kloft M et al (2014) Predicting MOOC dropout over weeks using machine learning methods. Knowl Manag E-Learn 4(3):60–65
11. Mattingly KD, Rice MC, Berge ZL (2012) Learning analytics as a tool for closing the assessment loop in higher education. Knowl Manag E-Learn 4(3):236–247
12. Alias UF, Ahmad NB, Hasan S (2015) Student behavior analysis using self-organizing map clustering technique. ARPN J Eng Appl Sci 10(23):17987–17995
13. Wen M, Yang D, Rosé CP (2014) Linguistic reflections of student engagement in massive open online courses. In: Proceedings of the 8th international conference on weblogs and social media, ICWSM 2014, pp 525–534
14. Kizilcec RF, Piech C, Schneider E (2013) Deconstructing disengagement: analyzing learner subpopulations in massive open online courses. In: Proceedings of LAK '13, p 10
15. He J, Bailey J, Rubinstein BIP (2015) Identifying at-risk students in massive open online courses. In: Proceedings of the 29th AAAI conference on artificial intelligence, pp 1749–1755
16. Chaplot DS, Rhim E, Kim J (2015) Predicting student attrition in MOOCs using sentiment analysis and neural networks. In: 17th international conference on artificial intelligence in education AIED-WS 2015, vol 1432, pp 7–12
17. Romero C, Ventura S (2010) Educational data mining: a review of the state of the art. IEEE Trans Syst Man Cybern Part C Appl Rev 40(6):601–618
18. Ong CS, Lai JY, Wang YS (2004) Factors affecting engineers' acceptance of asynchronous e-learning systems in high-tech companies. Inf Manag 41(6):795–804
19. Genuer R, Poggi J-M, Tuleau-Malot C (2010) Variable selection using random forests. Pattern Recognit Lett 31(14):2225–2236
20. Dalipi F, Imran AS, Kastrati Z (2018) MOOC dropout prediction using machine learning techniques: review and research challenges. IEEE. 978-1-5386-2957-4/18

Chapter 3

Prediction of Cardiovascular Diseases Using HUI Miner

D. V. S. S. Aditya and Anjali Mohapatra

1 Introduction

Cardiovascular diseases (CVD) [1] are the leading cause of death across the globe except in Africa. CVD accounted for 25.8% of deaths (12.3 million) in 1990, rising to 32.1% (17.9 million) in 2015 [1]. Over the past three decades, developing countries have witnessed more deaths from CVD than developed countries. Statistics show that 80% of CVD deaths in men and 75% in women are due to either coronary artery disease or stroke. In the USA, 11% of the population aged 20–40 years has CVD; 37% of those aged 40–60; and 71% of those aged 60–80. Coronary artery disease or stroke occurs mostly in adults around the age of 80 in developed countries, compared with around 68 in developing countries [1]. In general, CVD can be detected seven to ten years earlier in men than in women. The root cause varies with the type of disease: high blood pressure, smoking, diabetes mellitus, lack of physical activity, obesity, high blood cholesterol, poor diet or irregular eating habits, and excessive alcohol consumption cause coronary artery disease and stroke. Hypertension is estimated to account for around 13% of CVD deaths, while tobacco accounts for 9%, diabetes 6%, lack of activity 6%, and excessive weight 5%. Rheumatic heart disease may result from untreated strep throat [1]. At present, patients have to undergo several medical tests whose results are analyzed and judged by doctors based on their education and clinical experience. At times, analysis or prediction of the disease may lead to a wrong decision.

D. V. S. S. Aditya (B) · A. Mohapatra
IIIT Bhubaneswar, Bhubaneswar, Odisha, India
e-mail: [email protected]
A. Mohapatra
e-mail: [email protected]

© Springer Nature Singapore Pte Ltd. 2020
V. K. Gunjan et al. (eds.), Cybernetics, Cognition and Machine Learning Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-15-1632-0_3




Especially in developing countries like India, with lower literacy rates, biased decisions play a major role. Medical statistics reveal a serious CVD problem, particularly in the developing world. Many patients visit hospitals for periodic medical checkups, and during these visits a huge amount of data is collected by the hospitals. Proper analysis of the collected data is often not carried out because of a lack of skills among medical staff, human negligence, or technical errors, which can cost a person's life. In order to utilize the bulk volumes of collected data and to decrease the probability of errors, it has become necessary to introduce advanced technological solutions in the healthcare industry. This data is to be utilized in the form of data sets containing frequent itemsets. Frequent itemsets, however, do not consider a utility function and therefore have limitations. Utility mining has emerged as an important area that addresses the limitations of frequent itemsets and takes care of the utility of patterns in data sets, reflecting the user's goals.

2 Related Work

A lot of research has been carried out in this area and many investigations are still ongoing; a few relevant works are presented here. Liu et al. [2] proposed an algorithm that determines high utility patterns in a single phase without generating candidates. Yao et al. [3] reviewed and defined a unified utility function and presented a framework to incorporate multiple measures of utility in the mining process. Liu and Qu [4] proposed an algorithm that stores utility information in utility lists and uses heuristics to prune the search space. Tseng et al. [5] developed the algorithms TKU (mining Top-K Utility itemsets) and TKO (mining Top-K utility itemsets in One phase), where k denotes the desired number of high utility itemsets to be mined. Chu et al. [6] proposed an algorithm for effective mining of high utility itemsets from large databases with consideration of negative item values. In this work, data collected during experimentation, in the form of data sets, is used to predict the risk of CVD based on the utility of activities performed by a person under test conditions. HUI miner is used to calculate the utility values of the data sets, and high risk is predicted when the utility value is higher than a preset threshold value for each patient.

3 HUI Mining

High utility itemset (HUI) mining is an extended version of frequent pattern mining [7]. Frequent pattern mining algorithms aim to find frequent patterns and frequent itemsets in a transactional database. Algorithms such as LCM, FP-Growth, Apriori, and Eclat belong to the frequent pattern mining category. These algorithms accept a transactional database and a minimum support threshold as input and return the groups of items that appear in at least the minimum number of transactions. Frequent pattern mining has important limitations in the analysis of customer transactions [7]: (a) an item can appear at most once in a transaction, so quantities are ignored, and (b) all items are treated as equally important, with no utility weighting. Frequent pattern mining therefore finds many patterns that may not be useful for analyzing a particular business. For example, people purchasing daily needs like bread and milk constitute a frequent pattern, but this pattern may not be interesting from a business perspective because it does not generate much profit. Furthermore, frequent pattern mining algorithms overlook rare patterns that generate huge profits, such as purchases for celebrations where a diversified menu of food items is served [7].

HUI mining is designed to overcome these limitations. Here the transactional database records purchase quantities as well as the unit profit of each item. A high utility mining algorithm aims to find the groups of items in a database that generate high profit when sold together: it takes a minimum utility threshold from the user and returns all itemsets whose utility meets or exceeds that threshold [7, 8]. HUI mining holds attention for two reasons: (i) from a practical point of view, it identifies the groups of items that generate huge profit from a customer transaction database, rather than items that are merely purchased frequently; (ii) from a research point of view, it stimulates a lot of interest and thought. Frequent pattern mining rests on the Apriori (anti-monotonicity) property, which is useful for pruning the search space: if an itemset is infrequent, all its supersets are also infrequent and can be pruned, since the supersets of an itemset must have lower or equal support. HUI mining does not have this property; given an itemset, the utility of its supersets may be higher, lower, or the same [7, 8].

A HUI mining algorithm finds high utility itemsets from a transactional database with utility information, i.e., item quantities and unit prices, and is therefore of great importance in business applications. It accepts such a database together with a user-given minimum utility threshold and returns the high utility itemsets whose utility equals or exceeds the threshold. The HUI miner algorithm can be applied effectively in business applications as well as in other domains where a good amount of data is available, such as health care.
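To make the utility computation concrete, here is a minimal brute-force sketch in Python over a toy transaction database. The items, quantities, and unit profits are invented for illustration; a practical miner such as HUI-Miner avoids full enumeration by maintaining utility lists, but the definition of itemset utility is the same.

```python
from itertools import combinations

# Toy transaction database: each transaction maps item -> purchase quantity.
transactions = [
    {"bread": 2, "milk": 1, "cake": 1},
    {"bread": 1, "cake": 2},
    {"milk": 3, "cake": 1},
]
unit_profit = {"bread": 1, "milk": 2, "cake": 5}  # external utility per item

def utility(itemset, txn):
    """Utility of an itemset in one transaction: sum of quantity x unit profit,
    counted only if the transaction contains every item of the itemset."""
    if not all(item in txn for item in itemset):
        return 0
    return sum(txn[item] * unit_profit[item] for item in itemset)

def high_utility_itemsets(db, min_util):
    items = sorted({item for txn in db for item in txn})
    result = {}
    # Enumerate every candidate itemset: utility is not anti-monotone, so the
    # Apriori-style pruning of frequent pattern mining cannot be applied.
    for k in range(1, len(items) + 1):
        for itemset in combinations(items, k):
            total = sum(utility(itemset, txn) for txn in db)
            if total >= min_util:
                result[itemset] = total
    return result

print(high_utility_itemsets(transactions, min_util=10))
```

Note how the threshold acts as the only filter: an itemset such as {bread, cake} can qualify even though the more frequent {bread, milk} does not, which is exactly the behavior that distinguishes utility mining from frequent pattern mining.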

26

D. V. S. S. Aditya and A. Mohapatra

4 Methodology

The HUI mining algorithm is applied to data collected through experimentation, which is new in the healthcare domain. The data contains numerical values from different sensors, such as an accelerometer and a gyroscope, for different patients at different instances. The methodology is implemented in three steps: (a) acquisition of the data set, (b) processing of the data set, and (c) a patient alert system through email, as presented in Fig. 1.

4.1 Acquisition of Data Set

The data set is taken from experiments conducted at the University of California, Irvine repositories [9–11] with 30 human subjects of different ages.

Fig. 1 Flowchart of the proposed methodology


Fig. 2 Data acquisition using smartphone

Each human subject performs activities such as WALKING and WALKING-UPSTAIRS while wearing the device shown in Fig. 2. A Samsung Galaxy SII smartphone was used as the wearable device, and tri-axial linear acceleration and tri-axial angular velocity data were acquired at a frequency of 50 Hz with its embedded accelerometer, along with ECG apparatus. The accelerometer and ECG signals were preprocessed with noise filters, and samples were taken from fixed-width windows of 2.56 s with 50% overlap, each window carrying 128 readings. The accelerometer signal combines gravitational and body-motion components, which were separated using a Butterworth low-pass filter into gravity and body acceleration. A cutoff of 0.3 Hz was used, since the gravitational force has only low-frequency components. Variables were calculated from each window in the time and frequency domains to obtain a feature vector [9–11].
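As a rough sketch of this preprocessing pipeline, the following Python fragment separates gravity and body acceleration with a 0.3 Hz Butterworth low-pass filter and slices the result into 128-sample windows with 50% overlap. The filter order and the random placeholder signal are assumptions; the text does not specify them.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 50.0                       # sampling frequency (Hz)
acc = np.random.randn(1000)     # placeholder for one raw accelerometer axis

# Low-pass Butterworth filter with 0.3 Hz cutoff isolates the slowly varying
# gravity component; subtracting it leaves the body-motion component.
sos = butter(3, 0.3, btype="lowpass", fs=FS, output="sos")
gravity = sosfiltfilt(sos, acc)
body_acc = acc - gravity

# Fixed-width windows of 2.56 s (128 samples at 50 Hz) with 50% overlap.
win, hop = 128, 64
windows = [body_acc[s:s + win] for s in range(0, len(body_acc) - win + 1, hop)]
print(len(windows), "windows of", win, "samples each")
```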

4.2 Processing of Data Set

The data set with utility information is taken as a .csv file and supplied as input to the HUI mining algorithm. The algorithm processes the data and calculates the utility for each patient at different instances. When the activity utility is greater than the preset threshold value, the system alerts the patient by email about the abnormality. The data set is processed in two ways: (i) patient wise and (ii) equipment wise.

28

D. V. S. S. Aditya and A. Mohapatra

First, patient-wise data is processed by considering the tri-axial information from the accelerometer and ECG while the patient performs different activities. In this case, each file is dedicated to one patient; that is, the sensor data in a .csv file belongs to one particular patient. In a real-life scenario, however, medical staff or hospitals may show bias toward patients based on their income level, the complexity of the problem, the cost of the treatment, and so on. To overcome such real-world problems, the data set is also considered equipment wise. Here, each file of the data set contains the data of different patients measured by one sensor, and each row belongs to one patient's data taken from that instrument while the patient performs an activity. The sum of each row represents the activity utility of that patient; if the utility is higher than the preset threshold, an abnormality is detected for that patient and an email warning is sent, as sketched below.
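A minimal sketch of the equipment-wise check might look as follows, assuming each row of the .csv file holds one patient's readings from a single sensor; the file name is a placeholder, and the accelerometer threshold is the one stated later in this chapter.

```python
import csv

ACC_THRESHOLD = 12300  # preset accelerometer threshold used in this work

def flag_abnormal(csv_path, threshold):
    """Return (row number, activity utility) for every patient whose
    row sum exceeds the threshold."""
    flagged = []
    with open(csv_path, newline="") as f:
        for patient_id, row in enumerate(csv.reader(f), start=1):
            activity_utility = sum(float(v) for v in row if v.strip())
            if activity_utility > threshold:
                flagged.append((patient_id, activity_utility))
    return flagged

for pid, util in flag_abnormal("accelerometer.csv", ACC_THRESHOLD):
    print(f"Patient {pid}: utility {util:.1f} exceeds the threshold")
```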

4.3 Patient Alert System Through Email

The aim here is to put a patient for whom an abnormality has been detected into a warning state. The Simple Mail Transfer Protocol (SMTP) [12] is used to route and send an email to the patient. The login credentials of each patient should be provided in the database so that the system can send an alert to the patient for whom the abnormality is detected.
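A hedged sketch of such an alert using Python's standard smtplib, in the spirit of [12], is shown below; the SMTP server, addresses, and credentials are placeholders.

```python
import smtplib
from email.message import EmailMessage

def send_alert(patient_email, utility_value, threshold):
    """Send an abnormality warning to one patient over SMTP."""
    msg = EmailMessage()
    msg["Subject"] = "Health alert: abnormal activity utility detected"
    msg["From"] = "alerts@example.org"
    msg["To"] = patient_email
    msg.set_content(
        f"Your activity utility {utility_value:.1f} exceeded the preset "
        f"threshold {threshold}. Please consult your physician."
    )
    # Placeholder server and credentials; STARTTLS encrypts the session.
    with smtplib.SMTP("smtp.example.org", 587) as server:
        server.starttls()
        server.login("alerts@example.org", "app-password")
        server.send_message(msg)
```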

5 Results

The HUI mining algorithm was implemented in Python and run on an Intel Core i5 processor. The code was run on the data set taken equipment wise, with the data of different patients grouped together. The age of the patient is also taken into consideration, as it makes a difference: for example, a heart rate of 200 while cycling may be normal for a 22-year-old but not for a 70-year-old. Therefore, the mining algorithm was run with a preset age of 50 years, a patient-wise threshold of 12942, and equipment-wise thresholds of 12300 for the accelerometer and 642 for the ECG. The output of the algorithm for the patient-wise data set is presented in Fig. 3; two patients in the input data set were found to have an abnormality. The output for the equipment-wise data set for the accelerometer is shown in Fig. 4, where three patients were found with an abnormality, and the output for the ECG is shown in Fig. 5, where three patients were detected with an abnormality. The detected abnormalities in all cases indicate that the utility value is greater than the given threshold values.


Fig. 3 Output of the HUI mining algorithm-dataset considered patient wise

Fig. 4 Output of the HUI mining algorithm-dataset considered equipment wise-accelerometer

An email alert is sent whenever a patient's utility value is greater than the threshold; a sample alert is shown in Fig. 6.

6 Conclusions and Future Scope

Ninety percent of cardiovascular diseases can be prevented by adopting a systematic and disciplined daily routine that enhances human metabolism. Healthy eating habits, regular physical activity, and avoiding tobacco and alcohol consumption can keep CVD away, as our ancestors advised.


Fig. 5 Output of the HUI mining algorithm-dataset considered equipment wise-ECG

Fig. 6 Email alert sent by HUI mining algorithm

Similarly, the risk of CVD can be reduced in people with hypertension, strep throat, diabetes, and elevated blood lipids through early detection and proper medical treatment. In this work, a HUI mining algorithm was developed in Python to detect abnormalities in humans from sensor data, with an email alert system. HUI miner finds groups of itemsets in a store's transactional database that generate the highest profit, by processing the database's utility information through itemset quantities and unit prices. Researchers have applied the algorithm to analyze markets and businesses; here, HUI miner is applied successfully to process a large amount of medical data to determine high utility (risk) both patient wise and equipment wise. Abnormalities in human subjects are detected in both cases, and an automated email is sent to those found abnormal.


A tool based on the HUI mining algorithm has been developed using standard data sets, with threshold values determined from the ranges of the equipment used, and it is working efficiently. An interface can be developed to put the tool into practice, and absolute values of medical data can be embedded in the code by accepting them from users.

References

1. https://en.wikipedia.org/wiki/Cardiovascular_disease. Accessed Apr 2019
2. Liu J, Wang K, Fung BCM (2016) Mining high utility patterns in one phase without generating candidates. IEEE Trans Knowl Data Eng 28(5)
3. Yao H, Hamilton HJ, Geng L (2006) A unified framework for utility based measures for mining itemsets. In: Proceedings of UBDM'06, 20 Aug 2006, Philadelphia, Pennsylvania, USA. ACM 1-59593-440
4. Liu M, Qu J (2012) Mining high utility item sets without candidate generation. In: Proceedings of CIKM'12, 29 Oct–2 Nov 2012, Maui, HI, USA. ACM 978-1-503
5. Tseng VS, Wu C-W, Fournier-Viger P, Yu PS (2016) Efficient algorithms for mining top-K high utility item sets. IEEE Trans Knowl Data Eng 28(1)
6. Chu C-J, Tseng VS, Liang T (2009) An efficient algorithm for mining high utility item sets with negative item values in large databases. Appl Math Comput 215:767–778. Elsevier
7. http://data-mining.philippe-fournier-viger.com/introduction-high-utility-itemset-mining
8. Liu M, Qu JF (2012) Mining high utility itemsets without candidate generation. In: Proceedings of CIKM 2012, pp 55–64
9. Reyes-Ortiz J-L, Oneto L, Samà A, Parra X, Anguita D (2015) Transition-aware human activity recognition using smartphones. Neurocomputing
10. https://archive.ics.uci.edu/ml/datasets/Activity+Recognition+from+Single+Chest-Mounted+Accelerometer
11. https://archive.ics.uci.edu/ml/datasets/MHEALTH+Dataset
12. https://www.tutorialspoint.com/python/python_sending_email.htm

Chapter 4

Selection of Connecting Phrases in Weather Forecast

Pratyoy Das and Sudip Kumar Naskar

1 Introduction

Natural Language Generation (NLG) is a subfield of Artificial Intelligence that focuses on generating natural language texts from nontextual sources such as tabular data, knowledge bases, meaning representations, and images. One of the most popular problems in NLG is generating textual weather forecasts from numeric weather data. A lot of parallel data-text weather corpora are available that can be studied to explore the relationship between numeric data and textual forecast descriptions. Table 1 shows a time series of wind speeds and wind directions together with the corresponding parallel textual forecast. The terms "S-SSE" and "S-SW" in the text description denote the direction of the wind, and the ranges (e.g., 22–28, 28–32) denote the maximum and minimum wind speed forecasts at different instances. Although a single wind speed is given at each time in the table, the forecast is given as a range of speeds in the text, since forecast writers prefer to do it this way; systems producing automatic weather forecasts, like SumTime-Mousam, also generate a range of wind speeds in their output. The word "Increasing" denotes the increase of wind speed from 22–28 to 28–32. "By Afternoon" denotes the time "12:00:00" at the second forecast point. "Veering" denotes the clockwise change in wind direction from S-SSE to S-SW. "Easing" denotes the subsequent decrease of wind speed from 28–32 to 18–22. "Later" denotes that there is a considerable time difference between the second forecast point (12:00:00) and the third (24:00:00).

P. Das · S. K. Naskar (B)
Jadavpur University, Kolkata, India
e-mail: [email protected]
P. Das
e-mail: [email protected]

© Springer Nature Singapore Pte Ltd. 2020
V. K. Gunjan et al. (eds.), Cybernetics, Cognition and Machine Learning Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-15-1632-0_4


Table 1 Wind time series and parallel text forecast

| Time | Speed | Direction | Text |
|---|---|---|---|
| 06:00:00 | 25 | SSE | SSE 22-28 INCREASING |
| 09:00:00 | 27 | SSE | 28-32 BY AFTERNOON |
| 12:00:00 | 30 | S | VEERING AND EASING |
| 15:00:00 | 24 | S | SW 18-22 LATER |
| 18:00:00 | 22 | SW | |
| 21:00:00 | 20 | SW | |
| 24:00:00 | 20 | SW | |

We refer to phrases like “Increasing”, “Veering”, “Easing”, “Later”, “By Afternoon” as connecting phrases. In this work, we try to predict the connecting phrases in the textual forecast, when the intermediate weather speed range and direction segments (e.g., “22–28”, “SSE”; “28–32”, “18-22”, “SW” in the above example) are available, using a decision tree classifier. We do not generate the whole forecast as such. The motivation behind attempting to predict the connecting phrases rather than generating the entire textual forecast is that the forecasts basically consist of the connecting phrases sandwiched between the data of wind direction and speed as shown below. S-SSE 22–28 INCREASING 28–32 BY AFTERNOON VEERING AND EASING S-SW 18–22 LATER Therefore, once we get the connecting phrases, we can fit them in appropriate places between wind speed and direction data to generate the forecast. To get the number of segments and the weather data at the segments for the forecast, we use the textual human forecasts of the corpus. In this study, we have also tried to determine the factors (e.g., wind speed, wind direction, etc.) on which these connecting phrases depend. The work presented in this paper essentially is a lexical selection task of NLG.

2 Related Works

A lot of NLG work has been carried out with parallel data-text weather corpora, especially the SumTime-Meteo corpus [1, 2]. Reiter et al. presented a study on the usage of time phrases in the SumTime-Meteo corpus by associating numeric weather data with wind phrases [3]. Inspired by the corpus, Sripada et al. created SumTime-Mousam, a system that produces textual weather forecasts for offshore oilrig applications [4]. There are some interesting papers on SumTime-Mousam: Yu et al. [5] describe the segmentation algorithm used by SumTime-Mousam to produce textual forecasts, and Reiter et al. [6] describe how different "connecting phrases" are chosen by SumTime-Mousam. Forecasts by humans and by SumTime-Mousam were evaluated by people at Aerospace and Marine International who are involved in marine and offshore oil rig operations and regularly read marine weather forecasts.


They preferred the forecasts produced by the SumTime-Mousam system, and those produced by SumTime-Mousam and post-edited by experts, to forecasts written by human experts. Belz [7] describes an NLG system based on a probabilistic synchronous context-free grammar (PSCFG), built on work done with the SumTime-Meteo corpus. Sowdaboina et al. [8] used machine learning techniques to perform content selection in order to summarize time-series data. Three actions were performed sequentially: (1) identifying the number of segments, (2) estimating segments and identifying representative points from them, and (3) selecting verbs to depict the change of weather condition. The third task is similar to the work we report here, although they restricted their work to selecting words describing a change of wind speed. There have also been efforts to generate weather forecast texts using case-based reasoning (CBR) methods. Adeyanju [9] describes a CBR-based NLG system that achieves results comparable to earlier NLG systems like [4, 7]. Dubey et al. [10] describe a content selection task using CBR and demonstrated that their method was empirically better than purely data-driven or top-down linguistic models. Dubey et al. [11] proposed an end-to-end CBR approach to generate textual weather forecasts; this work extends [10], with text generation also done using CBR.

3 Experiments

3.1 Dataset

For our experiments, we used the SumTime-Meteo corpus, a parallel data-text corpus containing 1,045 human forecasts and the corresponding weather data. It contains both the numerical data, i.e., the weather conditions used by forecasters to predict the weather, and the output texts written by the forecasters [1]. Of particular interest to us was the PurseTuple.csv file, in which the part of the forecast text summarizing the behavior of the wind at 10-meter height was parsed into tuples, with each tuple describing wind data at a particular point in time, e.g.,

"N-NNW 28-32 SOON INCREASING 35-40 GUSTS 55, EASING 25-30 GUSTS 45 LATER"

The above forecast is parsed into three tuples, "N-NNW 28-32", "SOON INCREASING 35-40 GUSTS 55", and "EASING 25-30 GUSTS 45 LATER", and the individual pieces of information, such as maximum wind speed, minimum wind speed, connecting verb, and time phrase, are stored under the respective columns.
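Purely as an illustration (the corpus file already stores these fields in separate columns), the individual pieces could be pulled from a raw forecast string with simple regular expressions; the verb list below is an assumed, non-exhaustive sample.

```python
import re

forecast = "N-NNW 28-32 SOON INCREASING 35-40 GUSTS 55, EASING 25-30 GUSTS 45 LATER"

direction = re.match(r"[A-Z-]+", forecast).group()   # 'N-NNW'
ranges = re.findall(r"(\d+)-(\d+)", forecast)        # [('28','32'), ('35','40'), ('25','30')]
gusts = re.findall(r"GUSTS\s+(\d+)", forecast)       # ['55', '45']
# A few connecting verbs seen in such forecasts; the list is not exhaustive.
verbs = re.findall(r"\b(INCREASING|EASING|VEERING|BACKING|RISING)\b", forecast)

print(direction, ranges, verbs, gusts)
```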


3.2 Setup

We manually analyzed the dataset and found that the connecting phrases can be categorized into four categories, as given below.

• Category 1: Adverbs expressing rate of change (e.g., gradually, rapidly)
• Category 2: Verbs describing change in wind direction (e.g., backing, veering)
• Category 3: Verbs describing change in wind speed (e.g., rising, decreasing)
• Category 4: Time phrases (e.g., by midnight, later)

Category 4 can be further divided into two subcategories: (i) relative time phrases (e.g., soon, later) and (ii) absolute time phrases (e.g., by midnight, by evening).

We made the assumption that the connecting phrases depend on two instances of weather data: the preceding one and the succeeding one. For example, the word "rising" needs two ranges of wind speed to show that there has been an increase in wind speed from one range to another. Since the dataset had one instance per tuple, we modified it for our experiments. The attributes of our modified dataset are given below.

• Previous high wind speed
• Previous low wind speed
• Previous wind direction
• Previous time
• Current high wind speed
• Current low wind speed
• Current wind direction
• Current time
• Category 1 phrase
• Category 2 phrase
• Category 3 phrase
• Category 4.1 phrase
• Category 4.2 phrase

For the Category 1 connecting phrases dealing with adverbs, there are five classes: "gradually", "steadily", "slowly", "quickly", and "rapidly". However, there is a gross mismatch between the number of elements in each class: 88% of the tuples have no adverb at all, and "gradually" dominates all other adverbs, accounting for 66% of the adverbs in the dataset. There are actually very few examples of the remaining classes.
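A minimal sketch of the decision tree setup on the modified dataset, using scikit-learn, is shown below. The column names, the file name, and the choice of Category 3 as the target are illustrative assumptions, not the exact configuration used in the paper.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("modified_tuples.csv")   # placeholder for the modified dataset

features = ["prev_high_speed", "prev_low_speed", "prev_direction", "prev_time",
            "curr_high_speed", "curr_low_speed", "curr_direction", "curr_time"]
target = "category3_phrase"               # e.g., rising / easing / none

X = df[features].copy()
# Encode categorical attributes (directions, times) as integers.
for col in ["prev_direction", "curr_direction", "prev_time", "curr_time"]:
    X[col] = LabelEncoder().fit_transform(X[col].astype(str))
y = df[target].fillna("none")

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)
clf = DecisionTreeClassifier(max_depth=6, random_state=0).fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))
# Feature importances hint at which factors the connecting phrase depends on.
print(dict(zip(features, clf.feature_importances_.round(3))))
```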

3 Methodology

In the current study, the methodological approach followed the steps shown in Fig. 3. Based on feasibility and data availability, the input parameters chosen to predict the flood discharge were: the discharge of nine tributaries of the river Jhelum (Bringi, Aripath, Sandran, Lidder, Aripal, Watlara, Romshu, Vishow, and Rambira), rainfall (1990–2018), potential evapotranspiration (1990–2018), and the normalized difference vegetation index (NDVI) (1990–2018). The output was selected as the discharge of the river Jhelum at the Padshahi Bagh station.


Fig. 2 Flood Inundation depth of Jhelum river—September 2014. Source [5]

Fig. 3 Hierarchical steps followed in the methodology




Table 1 Statistical parameters to compare performance of flood prediction models

| Statistical parameter | Equation |
|---|---|
| Mean squared error (MSE) | MSE = (1/n) Σ (y_p − y_a)² |
| Root mean squared error (RMSE) | RMSE = (MSE)^(1/2) |
| Coefficient of determination (R²) | R² = 1 − Σ (y_p − y_a)² / Σ (y_a − y_m)² |
| Absolute average deviation (AAD) | AAD = (1/n) Σ (abs(y_p − y_a)/y_a) × 100 |

Note: sums run over the n observations; n = number of observations, y_p = value predicted by the model, y_a = actual value, y_m = average of the actual values

The NN models developed had two hidden layers, with ten neurons in the second hidden layer; in both hidden layers the transfer function used was tan-sigmoid. The statistical parameters used to validate the results are given in Table 1.
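For reference, the four statistics of Table 1 can be computed directly; the following NumPy sketch uses invented discharge values purely to show the call.

```python
import numpy as np

def evaluate(y_pred, y_true):
    """MSE, RMSE, R2, and AAD exactly as defined in Table 1."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    mse = np.mean((y_pred - y_true) ** 2)
    rmse = np.sqrt(mse)
    r2 = 1.0 - np.sum((y_pred - y_true) ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    aad = np.mean(np.abs(y_pred - y_true) / y_true) * 100.0
    return {"MSE": mse, "RMSE": rmse, "R2": r2, "AAD": aad}

# Invented discharge values, purely to show the call.
print(evaluate([100.0, 220.0, 310.0], [110.0, 200.0, 300.0]))
```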

4 Results and Discussions

Figure 4 illustrates the network diagram of the developed models. The equations relating the output predicted by the models to the provided targets are given in Table 2. The Bayesian Regularization training algorithm was good at predicting lower values of discharge in the river; however, it was not good at predicting discharge at the higher levels that correspond to flood discharge. The Levenberg–Marquardt training algorithm proved to be the better training function for predicting flood discharges. Training for the NNbr model stopped at the 8th epoch, whereas the NNlm model took 6 epochs to minimize the errors. Regression plots were generated at the end of training for both models.

Fig. 4 Neural network for flood prediction


Table 2 Linear regression equations between neural network predicted output and target value

| Model | Training data | Test data | All data |
|---|---|---|---|
| NNbr | o = 0.95 × t + 7.5 | o = 0.65 × t + 44 | o = 0.8 × t + 25 |
| NNlm | o = 0.86 × t + 24 | o = 0.93 × t + 21 | o = 0.89 × t + 20 |

Note: o = predicted output, t = target value

The regression plots generated in MATLAB for each NN model after training are provided in Figs. 5 and 6.

Fig. 5 Regression plots for NNbr


Fig. 6 Regression plots for NNlm

The actual discharge is plotted against the predicted discharge for both models (20% of the data) in Fig. 7, and the comparison of the error statistics is given in Fig. 8. The results show that the R² values of NNbr and NNlm were 88.63% and 95.84%, respectively, and their MSE values were 1.29% and 0.21%, respectively. The deviation of the peak discharge of the 2014 floods, as predicted by the models, from the actual flood discharge is given in Table 3. The NNlm model's predicted peak discharge was just 1.87% less than the actual discharge; hence, it can prove to be a viable solution to the flood forecasting problem of the Jhelum River. Whenever a flood threat is anticipated, all the input parameters can be fed to the model, the output values simulated, and warnings issued.


Fig. 7 Predicted versus actual discharge curves (Qactual, Qpredicted NNbr, Qpredicted NNlm; discharge in cumecs)

Fig. 8 Comparison of the MSE, RMSE, R², and AAD of NNbr and NNlm models

Table 3 Peak discharge of 2014 floods as predicted by the NNbr and NNlm models

| Model | Predicted discharge (m³/s) | Actual discharge (m³/s) | Variation (%) |
|---|---|---|---|
| NNbr | 1183.41285 | 2299.05723 | −48.526 |
| NNlm | 2255.875321 | 2299.05723 | −1.878 |

5 Conclusion

The enhanced NN technique emerged as a sound approach for forecasting flood discharge in circumstances where prior knowledge of catchment behavior is not available. The NN models can be fed the data with little pre-processing, and the findings can be extracted quickly. In summary, the contribution of this study is as follows:

50

R. Tabbussum and A. Q. Dar

• For flood prediction in the River Jhelum, the NNlm model, with its high prediction precision, is recommended.
• With an R² of 95.839%, MSE of 0.2128%, RMSE of 4.61, and AAD of 2.29%, the NNlm model outperformed the NNbr model.
• The analysis corroborated that the NNlm model's forecasted peak discharge was only 1.87% lower than the actual September 2014 flood discharge. It can be a practicable solution to the region's flooding misfortunes.

References

1. Hong H, Panahi M, Shirzadi A, Ma T, Liu J, Zhu A (2017) Flood susceptibility assessment in Hengfeng area coupling adaptive neuro-fuzzy inference system with genetic algorithm and differential evolution. Sci Total Environ. https://doi.org/10.1016/j.scitotenv.2017.10.114
2. Bhat MS, Alam A, Ahmad B, Kotlia BS, Farooq H (2018) Flood frequency analysis of river Jhelum in Kashmir basin. Quat Int 0–1. https://doi.org/10.1016/j.quaint.2018.09.039
3. Chang F, Hsu K, Chang L (2019) Flood forecasting using machine learning methods
4. Rao GS, Farooq M, Sree M (2016) Satellite-based assessment of the catastrophic Jhelum floods of September 2014, Jammu & Kashmir, India. Geomat Nat Hazards Risk. https://doi.org/10.1080/19475705.2016.1218943
5. Eptisa C (2018) Jhelum and Tawi flood recovery project
6. Saleh SF, Rather FF, Jabbar MJ (2017) Floods and mitigation techniques with reference to Kashmir. Int J Eng Sci Comput 7:6359–6363
7. Goodarzi L, Banihabib ME, Roozbahani A, Dietrich J (2019) Bayesian network model for flood forecasting based on atmospheric ensemble forecasts. Nat Hazards Earth Syst Sci Discuss 1–19. https://doi.org/10.5194/nhess-2019-44
8. Fotovatikhah F, Herrera M, Shamshirband S, Chau KW, Ardabili SF, Piran MJ (2018) Survey of computational intelligence as basis to big flood management: challenges, research directions and future work. Eng Appl Comput Fluid Mech 12:411–437. https://doi.org/10.1080/19942060.2018.1448896
9. Deepika Y (2011) Stream flow forecasting using Levenberg-Marquardt algorithm approach. Int J Water Resour Environ Eng 3:30–40
10. Adamowski J, Karapataki C (2010) Comparison of multivariate regression and artificial neural networks for peak urban water-demand forecasting: evaluation of different ANN learning algorithms. J Hydrol Eng 15:729–743. https://doi.org/10.1061/(asce)he.1943-5584.0000245
11. Romshoo SA, Altaf S, Rashid I, Dar RA (2018) Climatic, geomorphic and anthropogenic drivers of the 2014 extreme flooding in the Jhelum basin of Kashmir, India. Geomat Nat Hazards Risk 5705. https://doi.org/10.1080/19475705.2017.1417332

Chapter 6

Data Mining on Cloud-Based Big Data

Anil Kuvvarapu, Anusha Nagina Kunuku and Gopi Krishna Saggurthi

1 Introduction

Information technology (IT) has experienced significant advancement and discoveries over the years, and more advancements are to come, with new developments being made consistently across the industry. Among the business segments of the IT industry, cloud computing has been one of the fastest growing for the past few years, finding relevance in both business and private environments. Cloud computing offers storage, retrieval, manipulation, and access of data on the cloud. Because data is stored in the cloud, there is no need to purchase expensive storage devices, and the data can be accessed from anywhere, provided the user has an internet connection and login details. One of the advantages that has driven the adoption of cloud computing is that users can select pay-as-you-go (PAYG) or usage-based pricing for cloud services. Business strategies and practices aim to maximize output from limited input, and cloud computing has proven its ability to decrease the computing costs a business incurs. Therefore, cloud computing improves the efficiency and effectiveness of business processes and strategies [1].

A. Kuvvarapu · G. K. Saggurthi
University of Michigan, Ann Arbor, MI, USA
e-mail: [email protected]
G. K. Saggurthi
e-mail: [email protected]
A. N. Kunuku (B)
GRIET, Hyderabad, India
e-mail: [email protected]

© Springer Nature Singapore Pte Ltd. 2020
V. K. Gunjan et al. (eds.), Cybernetics, Cognition and Machine Learning Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-15-1632-0_6




Computing resources availed by cloud computing include networks, servers, storage, applications, and services. Cloud computing has five basic elements, three service models, and four deployment models, as described below. The basic elements are:

• On-demand self-service: enhances automation of the provisioning of computing capabilities;
• Broad network access: enables users to access cloud services over the network using various platforms;
• Resource pooling: allocation of pooled services for several users;
• Rapid elasticity: allows computing resources to be provisioned according to demand;
• Measured service: the ability of the provider and users to automatically manage and monitor the usage of computing resources.

1.1 The Service Models

• Software as a Service (SaaS): the provision of software and applications that facilitate access to cloud services by users.
• Platform as a Service (PaaS): the provision of a platform where users can develop and deploy web applications without needing to maintain complex infrastructure.
• Infrastructure as a Service (IaaS): the provision of basic (virtual) hardware and networking resources.

The deployment models of cloud computing are private cloud, community cloud, public cloud, and hybrid cloud. In a private cloud, the services are availed specifically for one organization or individual. A community cloud allocates cloud services exclusively to a defined network of users. The public cloud provides cloud services to the general public. A hybrid cloud, as the name suggests, offers private and public cloud options together. The major issue with a hybrid cloud is security; hence, cloud providers often add an extra layer of security (e.g., VPN over SaaS) to ensure that user data is secure [2].

2 Background Review

Despite the advantages that cloud computing has over traditional computing methods, there are challenges that have resulted in a heated debate on its effectiveness for use in business and private environments. Weinstein and Jaques [3] identify the major challenges facing cloud computing as control, performance, invisibility, security and privacy, associated bandwidth costs, vendor lock-in and standards, and the transfer of large volumes of data, among others.


There are fears about the security of data stored in, accessed from, and retrieved from the cloud. Antonopoulos and Gillam [4] identify security and privacy as the chief concerns of chief information officers and IT executives, and cases of violation of, or changes in, users' social behaviors have been reported. This is because data and applications in cloud computing are usually stored or run on virtual infrastructure outside an organization's firewall, so users rely on service providers rather than themselves for the protection of their data and applications. This paper focuses on the issues associated with the transfer of large volumes of data, structured and unstructured. Chang et al. [2] argue that transferring large volumes of data to the cloud is not only time consuming but also cost prohibitive, especially for an organization with standard network bandwidth. Big data technologies have enhanced the capacity of cloud computing to address these issues: conventional approaches to managing high volumes of data are not sufficient, and hence Big Data is proving to be a highly important IT advancement [5]. One indicator of the role of Big Data in the increased use of cloud computing is the rapid adoption of Infrastructure as a Service (IaaS) and Platform as a Service (PaaS) technologies. Hassanien [6] argues that PaaS has contributed to the increased scalability of computing resources and the reduction of costs. Big data has increased computing and storage capacity, as the two technologies, PaaS and IaaS, can be scaled simultaneously; big data therefore enables cloud computing to handle large volumes of data with ease. However, Raj and Deka [5] state that the management and storage of big data raise their own distinct challenges, chiefly the indexing and storage of large volumes of unstructured data and the associated inability to effectively retrieve and distribute it. Both structured and unstructured data have significantly increased in the cloud, and handling large volumes of unstructured data has been a major challenge. Since most big data is unstructured or semi-structured, techniques and tools for handling it have had to be developed; this gave rise to Hadoop and its MapReduce framework.

3 Discussion

Hadoop is open-source software that makes possible the distributed processing of large volumes of data across several servers. Its increased popularity is evident in nonrelational platforms and as a common component of analytical environments. In a nonrelational platform, data need not be stored in a specific format for processing, and Hadoop supports several programming languages as a supplement to basic SQL. Hadoop has the ability to handle the unstructured and semi-structured data


that is increasing in big data technology. Franks [7] describes unstructured data as data formatted in a complex manner such that it cannot easily be transformed into an analysis-ready form. The computing solutions offered by Hadoop have the following characteristics:

• Scalable: new nodes can easily be added to the cluster without changing data formats, data loading, etc.
• Cost-effective: parallel computing on commodity servers significantly reduces the cost per terabyte of storage, making the system affordable.
• Flexible: it can handle any type of data, from any source.
• Fault tolerant: when a node goes offline, the system redirects work to another location of the data so that processing continues, minimizing the effect of the fault.

Hadoop has two major subsystems: the Hadoop Distributed File System (HDFS) and MapReduce. HDFS manages file storage and global access for users. It is designed with a replication mechanism and separate servers: name nodes for metadata and data nodes for application data. The main function of name nodes is to maintain the mapping between file blocks and data nodes, while data nodes send heartbeat messages (with a default interval of 3 s) to the name node to confirm that the data is safe and accessible.

MapReduce is organized around a job tracker and task trackers. The job tracker accepts job requests, splits the data input, defines tasks, assigns tasks for parallel processing, and handles errors; task trackers run map or reduce tasks as ordered by the job tracker. After submission, the data is split into tasks, which are assigned to mappers. Map output files go to the sort phase, where the intermediate results are sorted and passed to the shuffle phase; the shuffled data is then moved to the reducers, whose number is determined by the user. The reducers aggregate the data and allow easy retrieval, and the final output is made available for use (see the sketch after Fig. 1). This conversion of unstructured data into structured data through MapReduce makes it easier to access or transfer large data sets to and/or from the cloud.

Apache Hadoop consists of five daemons (background processes), each running in its own JVM (Java Virtual Machine): (1) Name Node, (2) Data Node, (3) Secondary Name Node, (4) Job Tracker, and (5) Task Tracker (Benslimane and Hongming, 63–67). As shown in Fig. 1, Name Node and Data Node form the HDFS layer, while Job Tracker and Task Tracker keep track of and perform real-time execution of the jobs in the MapReduce layer. A Hadoop cluster comprises a master node and multiple slave nodes. The master node runs the master daemons for each layer, i.e., Name Node for the HDFS storage layer and Job Tracker for the MapReduce processing layer; the other machines run the "slave" daemons, Data Node for the HDFS layer and Task Tracker for the MapReduce layer. A master node in Hadoop can also play the role of a slave.


Fig. 1 Hadoop Daemons
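To make the map, sort/shuffle, and reduce steps concrete, the following is a minimal in-memory Python sketch using word count as the classic example; Hadoop distributes these same steps across data nodes and task trackers.

```python
from collections import defaultdict

def map_phase(split):
    # Emit one (word, 1) pair per word in the input split.
    return [(word, 1) for word in split.split()]

def shuffle_phase(mapped):
    # Sort the pairs and group values by key, mimicking sort-and-shuffle.
    groups = defaultdict(list)
    for key, value in sorted(mapped):
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Aggregate each key's values into a final count.
    return {key: sum(values) for key, values in groups.items()}

splits = ["big data on the cloud", "data mining on big data"]
mapped = [pair for split in splits for pair in map_phase(split)]
print(reduce_phase(shuffle_phase(mapped)))
```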

The Hadoop core architecture is built on the Hadoop distributed file system. The most advantageous feature of HDFS is its fault tolerance: by delivering fast data transfer between the nodes and enabling Hadoop to carry on delivering service even in the event of node failures, it decreases the risk of catastrophic failure.

3.1 HDFS Architecture

See Fig. 2.

Fig. 2 HDFS Architecture

3.1.1 Name Node

This daemon runs on the master node. The name node manages and stores metadata about the file system in a special file named 'fsimage'. The metadata resides in main memory, acting as a cache to provide faster access to client requests. The name node is attached to slave data nodes that execute the clients' I/O tasks. Name nodes contain their own namespace and hence are internally capable of managing their set of files [8].

3.1.2 Data Nodes

These are the primary elements that store data blocks and serve read/write requests for the files residing in HDFS. In a Hadoop cluster, a data node daemon runs on each slave node, controlled by the name node above it. The data blocks in data nodes can be replicated according to the configured replication factor to maintain high availability and reliability.

Secondary Name Node: This is a backup for the name node. Its main responsibility is to identify changes by reading the file system logs and apply them to the fsimage file. Figure 3 shows the process inside the secondary name node.

The core structural design of Hadoop is the MapReduce software framework. MapReduce's main responsibility is to analyze and process data sets in parallel on multinode clusters of commodity hardware in a reliable, scalable, and fault-tolerant way. Processing and data analysis proceed in two steps, named the map phase and the reduce phase [9]. According to Fig. 4, MapReduce processes input data by performing split, sort, and merge operations on it: the data is first split into chunks and then processed in two phases, the map phase followed by the reduce phase, which run as parallel processes across the nodes.

Fig. 3 Inside Secondary Name Node


Fig. 4 Map phase and reduce phase

Besides Hadoop, there are several techniques and tools available for big data analysis. The following section further discusses these techniques and their comparison.

3.2 Grid Computing Tools

HPC and grid computing tools share some characteristics with Hadoop: they share the file system across the SAN and work across the cluster in a distributed manner. In that approach, however, compute nodes sit idle while large volumes of data are fetched. The MapReduce component instead exploits data locality, so data is accessed quickly and computed on the node where it resides. The grid approach typically uses an API such as MPI (Message Passing Interface), which gives the user explicit control over the data flow mechanism. Coordinating tasks on a vast distributed system is complex and challenging, but MapReduce shields the programmer from much of that complexity.


3.3 Big Data Challenges and Issues

3.3.1 Data Management

• Storage and transportation: With the advent of internet technologies and social media, a large amount of varied data is being produced at a rapid rate. Hence, in the coming years, big data will face issues related to the storage and transportation of data.
• Processing: Processing such a large volume of data for analytics requires not only a considerable amount of time but also high-performance computing devices.

3.3.2 Data Privacy and Security

As big data comprises varied types of data, including users' personal or sensitive information (financial records, health records, etc.), keeping this data secure from attacks such as hacking is a challenge.

4 Conclusion

To perform efficient data mining of unstructured data sets in cloud computing, the Hadoop system is applied together with an incremental tree algorithm. The MapReduce function is used to split unstructured data sets and transform them into structured data. The incremental tree algorithm, applied to the MapReduce-processed data, can efficiently find and analyze the data. Because this approach is distributed, the analysis time can be reduced in proportion to the performance of the computers in the distributed network. Although this mechanism is very efficient for cloud-based data mining, there are some implementation issues, such as the need for skilled personnel and the complexity of the system structure.

References

1. Aljawarneh S (2015) Advanced research on cloud computing design and applications
2. Chang V, Walters RJ, Wills G (2015) Delivery and adoption of cloud computing services in contemporary organizations
3. Weinstein J, Jaques T (2013) The government manager's guide to project management. Tysons Corner, VA
4. Antonopoulos N, Gillam L (2010) Cloud computing: principles, systems, and applications. Springer, London
5. Raj P, Deka GC (2014) Handbook of research on cloud infrastructures for big data analytics


6. Hassanien AE (2015) Big data in complex systems: challenges and opportunities
7. Franks B (2014) The analytics revolution: how to improve your business by making analytics operational in the big data era
8. Faghri F (2010) Failure scenario as a service (FSaaS) for Hadoop clusters. In: Proceedings of the workshop on secure and dependable middleware for cloud monitoring and management. ACM, Beijing, China
9. Dean J et al (2004) MapReduce: simplified data processing on large clusters. In: OSDI'04: proceedings of the 6th conference on symposium on operating systems design & implementation. USENIX Association, Berkeley, CA, USA

Chapter 7

Performance Analysis of gTBS and EgTBS Task Based Sensing in Wireless Sensor Network

Pooja Jadhav and Ajitsinh Jadhav

1 Introduction

Efficient design and implementation of wireless sensor networks (WSNs) has become a hot area of research in recent years, owing to the vast potential of sensor networks to enable applications that connect the physical world to the virtual world. Nowadays, wireless sensor networks appear in many applications such as environmental monitoring, home security, medical monitoring, traffic flow estimation, military operations, temperature sensing, surveillance, and industrial machine monitoring. A WSN consists of a large number of sensor nodes deployed in a random fashion, with one or more sinks or base stations (BSs), spread over a specific area whose changes we want to monitor. Sensor nodes sense and process data and communicate over a wireless medium while running on a limited battery; the information sensed by the sensor nodes is collected at the base station. All the sensor nodes communicate through a wireless medium [1], which may be radio frequency, infrared, or any other medium having no wired connection. The major characteristics of sensor nodes used to evaluate the performance of a WSN include node mobility, fault tolerance, scalability, communication failures, node heterogeneity, and dynamic network topology. Many more challenging issues need to be considered while designing routing protocols in a wireless sensor network. Several network design issues arise for WSNs, such as energy, sensor location, limited hardware resources, and massive and random node deployment.

Limited energy capacity: Due to the limited power supply of sensor nodes, routing protocol design must cope with various challenges. A sensor node becomes faulty


when it attains a threshold and is unable to work properly, which affects the network performance.

Sensor locations: The next important challenge in routing design is to cope with the sensor locations. Different techniques need to be used to find the locations of the sensor nodes in order to collect data from them.

Limited hardware resources: Sensor nodes have limited storage capacity; hence there are restrictions on data processing, which poses a challenge for routing protocol design.

Massive and random node deployment: In most applications, node deployment is carried out randomly in a given area or massively over a hostile environment. Massive and random node deployment affects network performance, which poses another challenge in routing protocol design.

Network characteristics and unreliable environment: Another important challenge in a wireless sensor network is topology. Network characteristics such as sensor mobility affect the topology, and sensor addition or deletion due to node failure also affects network performance.

Data aggregation: Data aggregation is a main task of the routing protocol for achieving energy efficiency in the network. Multiple nodes produce similar data packets, so the data must be aggregated into unique packets to reduce transmissions and receptions and save energy.

Network lifetime deteriorates as sensors consume energy, which is spent on the different activities carried out by the sensor nodes, such as sensing data and transmitting the sensed data; more energy is consumed in transmitting data than in sensing it. In order to achieve an energy-efficient WSN that avoids the above-stated issues, task-based scheduling is performed. The performance of the proposed approach (EgTBS) is compared with the existing approach (gTBS) and evaluated using the network simulator ns2.

2 Proposed Work

The scope of the work is as follows:

• Implementation of the basic green Task-Based Sensing (gTBS) scheme.
• Implementation of weighted routing in gTBS based on two parameters, closeness to the sink and residual energy of the nodes. Appropriate weights are assigned to the parameters, and the result is named Enhanced gTBS (EgTBS).
• Performance analysis and comparison of gTBS and EgTBS based on the following parameters: event delivery ratio (EDR), delay, average residual energy, and control overhead.


3 System Model

• green Task-Based Sensing (gTBS)

A basic gTBS scheme is proposed. The main objective of the scheme is task-based sensing in a wireless sensor network. The scheme starts with ID assignment in the wireless sensor network: an ID and an order are assigned to every node. The base station (sink) is assigned ID = 0 and order = 0; order 1 is then assigned to the nodes closest to the sink, and higher orders are assigned likewise. A gradient is the node nearest to the sink that assures data flow from sink to sensor [1], and in gTBS the gradient selection is based only on closeness to the sink. The design combines power adaptation with a sleep–wake-up technique, which poses challenges in synchronization, network efficiency, and node availability. This combination has the advantage of avoiding much detail in the routing tables: only the nodes belonging to the intended task are active, while the rest of the network sleeps but remains ready to receive the next task. Since only the nodes intended for the requested task are triggered and woken up, energy efficiency is obtained across the overall network. The performance evaluation of the system is carried out using Network Simulator 2; event delivery ratio, delay, average residual energy, and control overhead are the performance parameters considered for the gTBS scheme. The contributions are as follows (the ID assignment step is sketched after the list):

1. ID assignment,
2. gradient-based transmission,
3. a power adaptation scheme,
4. a task-driven sleep and wake-up scheme, and
5. performance evaluation.
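A rough illustration of the ID/order assignment step, using NetworkX: the sink gets order 0 and every other node's order is its hop distance from the sink, so order-1 nodes are the candidate gradients. The random graph and sink ID below are placeholders, not the paper's topology.

```python
import networkx as nx

def assign_orders(graph, sink=0):
    # gTBS-style order assignment: order = hop distance from the sink.
    return nx.single_source_shortest_path_length(graph, sink)

g = nx.erdos_renyi_graph(20, 0.2, seed=1)   # placeholder topology
orders = assign_orders(g)
gradients = [n for n, o in orders.items() if o == 1]
print("order-1 gradient candidates:", gradients)
```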

• Enhanced green Task-Based Sensing (EgTBS)

Merging power adaptation with the sleep–wake-up technique poses challenges in terms of network efficiency, and the gTBS scheme described above has a further issue: it dissipates energy in finding the closest node to the sink. To avoid this, Enhanced green Task-Based Sensing (EgTBS) performs weighted routing, which considers two parameters, closeness to the sink and residual energy of the nodes, with an appropriate weight assigned to each. The routing load of the closest node is thereby distributed across the network, avoiding the energy-inefficient closest node. Figure 1 shows the comparison of gTBS and EgTBS.
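The weighted gradient selection of EgTBS might then look like the sketch below. The paper only says that "appropriate" weights are assigned to the two parameters, so the 0.5/0.5 split, the scoring function, and the field names are illustrative assumptions.

```python
def select_gradient(candidates, w_close=0.5, w_energy=0.5):
    # EgTBS-style weighted routing: score candidates by closeness to the
    # sink and by normalized residual energy, then pick the best scorer.
    def score(node):
        closeness = 1.0 / (1 + node["hops_to_sink"])       # nearer => higher
        energy = node["residual_energy"] / node["initial_energy"]
        return w_close * closeness + w_energy * energy
    return max(candidates, key=score)

nodes = [
    {"id": 3, "hops_to_sink": 1, "residual_energy": 2.0, "initial_energy": 10.0},
    {"id": 7, "hops_to_sink": 2, "residual_energy": 9.5, "initial_energy": 10.0},
]
# Node 7 wins: it is one hop farther but has far more energy left, so the
# drained closest node is bypassed and the routing load is spread out.
print(select_gradient(nodes)["id"])  # 7
```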


Fig. 1 Comparison of gTBS and EgTBS

4 Performance Evaluation

The main objective of this work is to evaluate the performance of gTBS and EgTBS in order to determine which scheme works better. More precisely, the work compares the two stated schemes under the performance parameters EDR, delay, average residual energy, and control overhead. The NS2 simulator is used to obtain the performance figures. The performance analysis of the stated systems is carried out under three cases:

1. 500 m * 500 m with the number of nodes varying from 50 to 100
2. 800 m * 800 m with the number of nodes varying from 50 to 100
3. 1000 m * 1000 m with the number of nodes varying from 50 to 100

Table 1 Simulation model

Simulator:          Network Simulator 2
Number of nodes:    50–100
Area:               (1) 500 m x 500 m, (2) 800 m x 800 m, (3) 1000 m x 1000 m
Interface type:     Phy/WirelessPhy
MAC type:           IEEE 802.11
Queue type:         DropTail/Priority queue
Queue length:       50 packets
Propagation type:   Two-ray ground
Routing protocol:   AODV
Transport agent:    UDP
Application agent:  CBR
Initial energy:     10 J
Simulation time:    50 s

For each case, the performance parameters for both gTBS and EgTBS are simulated, and the obtained results are compared. The simulation model is given in Table 1.

• Event Delivery Ratio (EDR): The event delivery ratio is the fraction of all generated packets in the network that are received. EDR is calculated for both stated systems by varying the number of nodes from 50 to 100 and is compared for the three given cases. The graphs are based on the values obtained in the simulation results (Table 2).

Table 2 Event delivery ratio

(a) EDR (%) for 500 m * 500 m
No. of nodes   50     60     70     80     90     100
gTBS           0.5    1      0.75   0.75   0.875  0.4166
EgTBS          0.75   0      0.75   0.75   0.625  0.5

(b) EDR (%) for 800 m * 800 m
No. of nodes   50     60     70     80     90     100
gTBS           0.75   0.75   1      0.5    1      0.5
EgTBS          1      0.75   0.5    0.5    0.75   0.4166

(c) EDR (%) for 1000 m * 1000 m
No. of nodes   50     60     70     80     90     100
gTBS           0.75   0.75   0.75   0.25   0.375  0.1667
EgTBS          0.75   0.75   1      0.75   0.5    0.5

Fig. 2 a EDR (%) for 500 m * 500 m; b EDR (%) for 800 m * 800 m; c EDR (%) for 1000 m * 1000 m

Figure 2 compares the EDR of gTBS with that of EgTBS. Compared to gTBS, the EgTBS scheme improves the EDR by 33%, 25%, and 66.68% for each

case, respectively. This gain is due to unicasting packets and lowering the transmission power.

• Delay: This is the time taken for a packet to reach the sink from the source sensor, i.e., the wait before the given data is delivered at the sink. It is calculated for the three stated cases by varying the number of nodes from 50 to 100, and graphs are plotted from the obtained results. The plotted graphs compare the delay of gTBS and EgTBS (Table 3).

Table 3 Delay

(a) Delay (s) for 500 m * 500 m
No. of nodes   50     60     70     80     90     100
gTBS           0.082  0.120  0.069  1.119  0.762  10.29
EgTBS          0.103  0      0.145  0.439  3.175  0.8171

(b) Delay (s) for 800 m * 800 m
No. of nodes   50     60     70     80     90     100
gTBS           0.028  0.067  0.053  0.289  0.628  0.652
EgTBS          0.785  0.112  0.122  0.205  0.280  0.3405

(c) Delay (s) for 1000 m * 1000 m
No. of nodes   50     60     70     80     90     100
gTBS           0.071  0.090  0.094  0      0.332  0
EgTBS          0.061  0.014  2.204  0.164  0.607  0.880

Fig. 3 a Delay (s) for 500 m * 500 m; b delay (s) for 800 m * 800 m; c delay (s) for 1000 m * 1000 m

Figure 3 compares the delay of gTBS and EgTBS for every stated case. A reduction in delay is observed for EgTBS relative to gTBS; therefore, EgTBS improves the delay.

• Average Residual Energy: In a wireless sensor network, the energy level of the nodes is represented by the energy model. The initial energy gives the energy level at the start of the simulation and is used as an input for a given node. A node discharges some amount of energy in transmitting and receiving data packets.


Table 4 Average residual energy

(a) Average residual energy (J) for 500 m * 500 m
No. of nodes   50     60     70     80     90     100
gTBS           4.943  5.174  4.957  4.916  4.435  4.632
EgTBS          4.941  5.175  4.957  4.915  4.434  4.631

(b) Average residual energy (J) for 800 m * 800 m
No. of nodes   50     60     70     80     90     100
gTBS           4.941  5.176  4.959  4.918  4.435  4.631
EgTBS          4.941  5.175  4.961  4.918  4.436  4.633

(c) Average residual energy (J) for 1000 m * 1000 m
No. of nodes   50     60     70     80     90     100
gTBS           4.943  5.176  4.958  4.916  4.436  4.635
EgTBS          4.941  5.175  4.958  4.917  4.437  4.635

The remaining amount of energy after transmitting and receiving packets is the residual energy of the node. The average residual energy for each case is calculated, and the obtained results are plotted as a comparison between gTBS and EgTBS (Table 4). Figure 4 shows this comparison. We observe that the gTBS and EgTBS values for average residual energy are the same: EgTBS utilizes no extra energy, so the remaining energy is unchanged, which shows that EgTBS achieves its improvements without additional energy consumption.

• Control Overhead: This is the total number of control packets used in the given network operation; it measures the control traffic in the network. This performance parameter is calculated for the three given cases, and the obtained results are compared for gTBS and EgTBS (Table 5). Figure 5 compares the control overhead for the stated cases; we observe from the graphs that the control overhead values of the gTBS and EgTBS schemes are nearly the same.
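All four performance parameters reduce to simple aggregations over the simulation trace; the sketch below shows one way to compute them. The record layout is our own illustration, not the actual ns2 trace format.

```python
def summarize_trace(events, residual_energy):
    # events: one dict per packet with 'kind' ('data'/'control'), a
    # 'received' flag and send/receive timestamps; residual_energy maps
    # node id -> joules remaining at the end of the run.
    data = [e for e in events if e["kind"] == "data"]
    received = [e for e in data if e["received"]]
    edr = len(received) / len(data)                      # event delivery ratio
    delay = sum(e["t_rx"] - e["t_tx"] for e in received) / len(received)
    avg_residual = sum(residual_energy.values()) / len(residual_energy)
    control_overhead = sum(1 for e in events if e["kind"] == "control")
    return edr, delay, avg_residual, control_overhead

events = [
    {"kind": "data", "received": True, "t_tx": 1.0, "t_rx": 1.1},
    {"kind": "data", "received": False, "t_tx": 2.0, "t_rx": None},
    {"kind": "control", "received": True, "t_tx": 2.5, "t_rx": 2.6},
]
print(summarize_trace(events, {1: 4.94, 2: 5.17}))  # ~(0.5, 0.1, 5.055, 1)
```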



Fig. 4 a Average residual energy (J) for 500 m * 500 m; b average residual energy (J) for 800 m * 800 m; c average residual energy (J) for 1000 m * 1000 m

Table 5 Control overhead

(a) Control overhead (packets) for 500 m * 500 m
No. of nodes   50      60      70      80      90      100
gTBS           40,761  49,199  57,623  66,856  78,429  103,561
EgTBS          40,948  49,118  57,598  67,380  78,343  103,050

(b) Control overhead (packets) for 800 m * 800 m
No. of nodes   50      60      70      80      90      100
gTBS           40,152  48,225  40,152  65,387  77,915  103,550
EgTBS          40,283  48,335  56,371  65,260  78,003  102,443

(c) Control overhead (packets) for 1000 m * 1000 m
No. of nodes   50      60      70      80      90      100
gTBS           40,032  48,612  57,246  66,111  78,267  99,748
EgTBS          40,104  48,654  57,218  65,824  78,225  99,545


Fig. 5 a Control overhead (packets) for 500 m * 500 m; b control overhead (packets) for 800 m * 800 m; c control overhead (packets) for 1000 m * 1000 m

5 Conclusion

The green Task-Based Sensing (gTBS) and Enhanced green Task-Based Sensing (EgTBS) schemes differ in gradient selection. In order to achieve an energy-efficient wireless sensor network, the gradient is selected on the basis of both closeness to the sink and residual energy of the node. We evaluated both schemes under the performance parameters EDR, delay, average residual energy, and control overhead using Network Simulator 2. The EgTBS scheme improves the EDR by 33–66.68%; moreover, delay is reduced by 16.68–66.68%. EgTBS does not utilise extra energy for gradient selection, and hence we obtained the same average residual energy and control overhead for both stated schemes. Thus, based on these results, we claim that EgTBS improves the energy efficiency of the wireless sensor network.

References

1. Alhalafi A, Sboui L, Naous R, Shihada B (2016) gTBS: a task-based sensing for energy-efficient wireless sensor network. In: IEEE conference on computer communications workshops, ISBN 978-1-4673-9955-5
2. Prayati A, Antonopoulos C, Stoyanova T, Koulamas C, Papadopoulos G (2010) A modeling approach on the TelosB WSN platform power consumption. J Syst Softw 83(8):1355–1363
3. Tripathi A, Yadav N, Dadhich R (2015) Secure-spin with cluster for data centric wireless sensor networks. In: Proceedings of the fifth international conference on advanced computing and communication technologies (ACCT). IEEE, pp 347–351
4. Kubisch M, Karl H, Wolisz A, Zhong LC, Rabaey J (2003) Distributed algorithms for transmission power control in wireless sensor networks. Proc IEEE Wirel Commun Netw (WCNC) 1:558–563


5. Charalambos S, Vasos V (2012) Source-based routing trees for efficient congestion control in wireless sensor networks. In: Proceedings of the IEEE 8th international conference on distributed computing in sensor systems (DCOSS). IEEE, pp 378–383
6. Yao Y, Cao Q, Vasilakos AV (2015) EDAL: an energy-efficient, delay-aware, and lifetime-balancing data collection protocol for heterogeneous wireless sensor networks. IEEE/ACM Trans Netw 23(3)
7. Jiang B, Ravindran B, Hyeonjoong C (2013) Probability-based prediction and sleep scheduling for energy-efficient target tracking in sensor networks. IEEE Trans Mob Comput 12(4)
8. Alhalafi A, Javaid N, Iqbal A, Khan ZA, Alrajeh N (2013) On adaptive energy-efficient transmission in WSNs. Int J Distrib Sens Netw 10:10
9. Son D, Krishnamachari B, Heidemann J (2004) Experimental study of the effects of transmission power control and blacklisting in wireless sensor networks. In: IEEE communications society conference on sensor and ad hoc communications and networks (SECON '04). IEEE, pp 289–298

Chapter 8

A Secured Framework for Cloud Computing

Sana M. Bagban and H. A. Tirmare

1 Introduction

Smartphones provide a wide range of applications, such as image processing, handling files with different extensions, speech recognition, and many more, and the demand for applications and smartphones with such resources is increasing. Mobile cloud computing is a concept that integrates the cloud and the mobile device to extend battery life and increase application performance. A framework is designed around one or more constraints, such as energy consumption, CPU utilization, execution time, and memory storage. The AES technique is used to encrypt data for security purposes. Cloud computing offers different deployment models: private, public, and hybrid cloud. Frameworks that leverage energy efficiency focus on running services on cloud datacenters with a minimal instance of computation at runtime. In computational offloading, the main idea is to migrate computational tasks from the mobile device to a server in order to save energy on the mobile device. To make the offloading decision, we need to monitor network usage and bandwidth: transferring data requires high bandwidth and network connectivity, and the system relies on a wireless connection such as 4G, 3G, or Wi-Fi. Offloading refers to techniques for relieving both memory and computation. To log in to the cloud and gain access, security is provided by signing up with the user's Gmail account; a notification with a verification link is sent to allow access to the cloud. Different modules and architecture systems have been proposed.



2 Related Work

Previous work has addressed many challenges in the computational offloading process.

(1) Kovachev et al. [1] provide adaptive extension of Android mobile devices into the cloud. Their experiments evaluate the MACS framework with two test phone applications: the first involves face detection, and the second processes a video file and provides time points for video navigation.
(2) Khan et al. [2] pose the challenge of smartphone resource constraints, where devices are limited in computational power, storage, and battery capacity. The paper describes mobile cloud architecture, application models, and computational offloading processes.
(3) Chun et al. [3] present the CloneCloud concept, which automatically transfers mobile applications to the cloud. CloneCloud uses static and dynamic profiling for partitioning the application, and the paper describes methods for argument passing and execution in the cloud.
(4) Shiraz et al. [4] present Distributed Application Processing Frameworks (DAPFs) for SMDs in the MCC domain. The challenges of distributed applications on such devices are defined; using a thematic taxonomy, the paper reviews current offloading frameworks and analyzes their implications and critical aspects. There is still a lack of CPU capacity and memory storage, which limits smart mobile devices; the key research area focuses on the application layer for creating new software-level solutions.
(5) Kosta et al. [5] propose the ThinkAir framework, which makes it simple for a developer to use the smartphone with the cloud. ThinkAir is evaluated with a range of benchmarks, from simple microbenchmarks to more complex applications: the N-queens algorithm, face detection, and a virus scan application.
(6) Zhou et al. [6] propose the use of multiple cloud resources, namely ad hoc clouds, cloudlets, and a context-aware cloud computing algorithm. The main contributions are a design for mobile cloud computing with a nearby mobile cloud (cloudlet) and cost estimation performed locally on mobile devices.

3 Different Frameworks for Mobile Cloud Computing

Various frameworks support the offloading process in the mobile cloud; some of them are:

(1) Energy-Efficient Computational Offloading Framework (EECOF): This framework processes intensive applications on the mobile cloud. The main aim is to focus on services on datacenters with a minimal instance of code migration at runtime, which decreases the energy cost of the mobile application's remote processing component. The main component is the Orchestrator, which switches the operation (online/offline) of the mobile application. The Preferences Manager reads and writes data from persistent storage during activation and deactivation of the mobile application. The Upload Manager sends the requested file to synchronize the server node, and the Download Manager provides the download service of the distributed application process.

(2) Mobile Augmentation Cloud Service (MACS): This is service-based computing. Android applications use MACS, which benefits from offloading parts of the application to a local or remote cloud. MACS enables the execution of elastic mobile applications; application code that should not be offloaded remains local across multiple services. MACS was evaluated with two test phone applications, the N-queens problem and face detection on video files; file storing and retrieving depends on the network connectivity, i.e., Wi-Fi.

(3) MAUI: This framework is based on fine-grained offloading to the mobile cloud. MAUI has the advantage of combining profiling information to reach the appropriate offloading decision. Through program annotations, MAUI identifies the methods that can be uploaded remotely; it profiles applications and takes the network connection into account.

(4) Cuckoo: This framework applies to Android applications developed with the activity/service model. The Cuckoo framework offloads services directly to the cloud. A Cuckoo application is simple, static, and context-aware, and the framework provides security from malicious attacks on applications.

(5) Phone2Cloud: The Phone2Cloud framework offloads application data to the cloud. It uses an algorithm for the offloading decision that takes the network bandwidth, CPU workload, and power consumption into account, as sketched below.
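A Phone2Cloud-style decision can be sketched as a simple energy comparison; the linear power model, parameter names, and numbers below are illustrative assumptions, not the framework's published algorithm.

```python
def should_offload(input_bytes, local_time_s, bandwidth_bps,
                   p_compute_w, p_transfer_w):
    # Offload only if it saves energy on the device: local energy is
    # compute power times local runtime; offload energy is transfer power
    # times the time to ship the input over the current bandwidth.
    transfer_time_s = 8 * input_bytes / bandwidth_bps
    e_local = p_compute_w * local_time_s
    e_offload = p_transfer_w * transfer_time_s   # device idles while cloud runs
    return e_offload < e_local

# 5 MB of input over a 20 Mbit/s Wi-Fi link vs. a 20 s local computation:
print(should_offload(5e6, local_time_s=20, bandwidth_bps=20e6,
                     p_compute_w=1.2, p_transfer_w=0.8))  # True
```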

4 Topic Initiative

Existing systems provide framework performance that depends on the behavior and methods used for offloading. Frameworks that use offloading to offload tasks of a mobile application have some drawbacks:

• Data transfer from mobile to cloud requires high bandwidth.
• Partitioning a mobile-cloud application requires high system availability.
• MCC must be able to handle heterogeneous computing resources.
• Sometimes it is difficult for mobile devices to get a signal because of traffic congestion.
• Offloading is not always effective at saving energy.


(Figure 1 components: Mobile Device, Wireless Connection, Cloud; Application Core, Estimator, Cloud Methods, Profiler, Network Bandwidth, Decision Maker, Mobile Manager, Cloud Manager.)

Fig. 1 Basic system architecture

5 Proposed Work

Figure 1 shows how data is transferred from the mobile device to cloud storage over a wireless connection. The decision maker determines the offloading choice based on two modules: the first is the network and bandwidth module, which checks the network status of the smartphone and the current bandwidth value; the second is the profiler, which provides information about the method name, input size, execution time, and memory usage. The decision maker decides whether to execute locally or to upload the data remotely. Remote execution takes place in cloud storage, where the cloud manager manages the execution on the cloud. An AES algorithm is used for security purposes (Fig. 1).
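The chapter states only that AES is used for security, so the sketch below fixes the remaining choices ourselves: AES-256 in GCM mode via the pyca/cryptography library, with a fresh random nonce stored alongside each ciphertext.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_for_cloud(plaintext: bytes, key: bytes) -> bytes:
    nonce = os.urandom(12)                    # 96-bit nonce, never reused
    ciphertext = AESGCM(key).encrypt(nonce, plaintext, None)
    return nonce + ciphertext                 # keep nonce with the blob

def decrypt_from_cloud(blob: bytes, key: bytes) -> bytes:
    nonce, ciphertext = blob[:12], blob[12:]
    return AESGCM(key).decrypt(nonce, ciphertext, None)

key = AESGCM.generate_key(bit_length=256)
blob = encrypt_for_cloud(b"offloaded task data", key)
assert decrypt_from_cloud(blob, key) == b"offloaded task data"
```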

6 Conclusion

The offloading technique offloads data either remotely to the cloud or locally on the mobile device for storage. Various applications and frameworks are used for transferring data. By storing data directly in the cloud, we are able to reduce memory usage, execution time, and storage on the mobile device. Offloading data to the cloud requires high bandwidth and network connectivity, so the data transfer process can only take place where there is a network connection.


References

1. Kovachev D, Yu T, Klamma R (2012) Adaptive computation offloading from mobile devices into the cloud. In: 2012 10th IEEE international symposium on parallel and distributed processing with applications
2. Khan AUR, Othman M, Madani SA (2013) A survey of mobile cloud computing application models. IEEE Commun Surv Tutor
3. Chun BG, Ihm S, Patil A, Maniatis P, Naik M (2011) CloneCloud: elastic execution between mobile device and cloud. In: Conference on computer systems, pp 301–314
4. Shiraz M, Gani A, Khokhar RH (2013) A review on distributed application processing frameworks in smart mobile devices for mobile cloud computing. IEEE Commun Surv Tutor 15(3):1294–1313
5. Kosta S, Aucinas A, Hui P, Mortier R, Zhang X (2012) ThinkAir: dynamic resource allocation and parallel execution in the cloud for mobile code offloading. In: IEEE INFOCOM, pp 945–953
6. Zhou B, Dastjerdi AV, Calheiros RN, Srirama SN, Buyya R (2015) A context-sensitive offloading scheme for mobile cloud computing service. In: 2015 IEEE 8th international conference on cloud computing

Chapter 9

A Secure Shoulder Surfing Resistant Hybrid Graphical User Authentication Scheme

Shailja Varshney, Mohammad Sarosh Umar and Afrah Nazir

1 Introduction

In the present day, when online transactions have increased to an unprecedented level, protection from unauthorized access to information has become an essential part of network security. Traditionally, there are three authentication approaches: token-based, biometric, and knowledge-based [1]. The textual password is a widely used knowledge-based user authentication scheme. A text password can be a combination of random numbers, characters, and special characters. This scheme can provide strong security, but it suffers from memorability issues and is exposed to brute force, dictionary, guessing, and shoulder surfing attacks, among others. Graphical password schemes were developed to overcome the problems of text password schemes [2]. Studies have shown that humans memorize pictures better than text strings [3, 4]. According to a user study, users can easily recollect image-based passwords: a user can set a complex image password and still recall the pictures after a long time. However, most image-based password schemes suffer from shoulder surfing and brute force attacks. A shoulder surfing attack can occur by direct observation or by capturing video with an external camera [5].

In this paper, a hybrid graphical authentication scheme is discussed that is secure from dictionary, brute force, and shoulder surfing attacks. The authentication scheme is a combination of graphical and text password schemes. In this


scheme, images are combined with dynamic graphics. The proposed scheme is robust, memorable, and resistant to security attacks. The rest of the paper is organized as follows: Sect. 2 describes the background; Sect. 3 describes the functionality of the proposed scheme; Sect. 4 analyzes protection from security attacks; Sect. 5 presents the results and user study; and Sect. 6 concludes.

2 Background

Knowledge-based authentication means "you know something" by which the system can identify you, for example a PIN or a password. Knowledge-based authentication is divided into two schemes: text-based password schemes and graphical password schemes. Graphical password schemes are further divided into four groups: cognometric, drawmetric, locimetric, and hybrid schemes [6, 7].

A. Cognometric Scheme

Cognometric is also known as a recognition-based scheme. It deals with distinguishing images from an image portfolio in the sequence given at registration time [8]. Dhamija and Perrig proposed a scheme in 2000 based on the cognometric approach [9]: the user selects images from a set of random images at registration time and, at login time, has to identify the preselected images among a set of distracter images. This scheme suffers from shoulder surfing attacks. Brostoff et al. proposed the PassFaces technique, in which the user chooses face images as a password at registration time and clicks on the preselected images during authentication; this scheme suffers from spyware and shoulder surfing attacks [10]. M. Sarosh Umar and Qasim Rafiq proposed the Select-to-Spawn technique, a graphical authentication scheme that allows the user to create a password at registration time: a single image is divided into 4 × 4 grid cells, the user chooses a grid cell from the image, and images are spawned into new images at each level. At login time, the user has to recognize the images at each level in the registered sequence [4].

B. Drawmetric Scheme

The drawmetric scheme is also known as a pure recall-based authentication scheme. The user creates a password by drawing something on a 2D grid and, at login time, has to reproduce that drawing on a 2D grid or on a blank canvas. The drawing can be one steady stroke or multiple strokes separated by "pen-ups". Ziran Zheng et al. proposed a stroke-based textual password authentication scheme based on the recall method, in which the user draws a shape or stroke on the grid and enters characters as the password at authentication time [11].


C. Locimetric Scheme

The locimetric scheme is also known as a cued-recall system. The user creates a password by choosing locations in an image and, at login time, has to click on the preselected locations of the image in sequence to enter the password [12]. Susan Wiedenbeck et al. proposed a scheme in which the user can click on any places in the picture, in a specific order, to create a password [13, 14].

D. Hybrid Scheme

A hybrid scheme is a combination of two or more password schemes. Swaleha Saeed et al. proposed a hybrid password authentication scheme that integrates a recognition-based scheme with dynamic graphics; at login time, colored objects are associated with the images [15]. Gao et al. proposed a hybrid scheme named "Passhands", a combination of recognition-based schemes and a biometric technique, in which palm images of the human hand are processed [16].

3 Proposed Scheme

A. Registration Phase

In the registration phase, the user registers with their details and creates a password from images, as shown in Fig. 1. The user also selects images in the registration phase; the image details are associated with playing cards. These image details increase the password strength of the scheme and its resistance to brute force attacks.

Fig. 1 Registration screen


In this scheme, there is no need to store a large number of images on the device. After the user clicks on the submit button, the user's details are stored in the database.

B. Login Phase

The login phase is responsible for verifying the correct user. In this phase, the user has to recognize their password images from the portfolio of dummy images. The login phase is divided into two parts: the first is graphical authentication, and the second is textual authentication. In the graphical authentication phase, there are 16 images, each combined with a background color that changes continuously. The user has to recognize their password image with the correct background color; the random assignment of background colors to the images enhances the security of the system. In phase 2, the textual password phase, a previously registered security question is presented. If the user wants to update their security question and answer, they can do so any time after a successful login through phase 1 and phase 2.

In the proposed work there are 12 background colors; these colors are assigned to the cards and move across the cards randomly. In login phase 1 (the graphical authentication phase), an 8 × 2 grid is used, as shown in Fig. 2. The grid contains the password image and 15 dummy images, and the background colors change every second. The user has to click on the button at the instant the password image appears with the correct color, in the correct order; this process is repeated four times. Since the 12 background colors repeat in a cyclic manner, the background color on an image repeats every 13th second. If the user recollects all the password images correctly, login phase 2 proceeds as shown in Fig. 3, where the user has to choose the correct security question and enter the respective answer as given previously.

Fig. 2 Login screen phase 1


Fig. 3 Login screen phase 2

This scheme is secure from shoulder surfing, dictionary, and brute force attacks.

4 Security Analysis

4.1 Password Space

Password space is defined as the set of all viable passwords that can be used in the authentication system. The ability of a knowledge-based authentication system to withstand a brute force attack depends on the password space: brute force attacks are resisted more effectively as the password space grows. In the textual password scheme, the password space is 95^N, where N is the length of the password and 95 is the number of printable characters. If 8 is the maximum length of the password, then the password space of that scheme is approximately 6.6 * 10^15. The password space of the proposed scheme is described below:

$$\sum_{N=1}^{4} (C \times B \times T)^N \times 95^{15} \tag{1}$$

where
C is the total number of images in phase 1,
B is the number of different background colors,
T is the time hold for each login screen, and
N is the total number of password images.

In login phase 2, 95 is the total number of printable characters and 12 is the length of the text password.
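A quick numeric check of formula (1), with C = 16, B = 12, and T = 16 as in the brute-force analysis of Sect. 4.3; the textual exponent is taken as printed in the formula.

```python
C, B, T = 16, 12, 16          # images per screen, colors, time slots
graphical = sum((C * B * T) ** n for n in range(1, 5))   # sum over N = 1..4
total = graphical * 95 ** 15  # textual factor, exponent as printed in (1)
print(f"graphical part: {graphical:.2e}")   # about 8.91e13
print(f"total password space: {total:.2e}")
```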

84

S. Varshney et al.

Table 1 Password entropy comparison

Scheme name                                  Formula                  Entropy (bits)
Text-based scheme                            6 * log2(95)             39.41
PassFaces (4 runs, 9 images)                 4 * log2(9)              12.67
Object-based graphical user authentication   5 * log2(16 × 16 × 5)    51.6
Proposed scheme                              4 * log2(52 × 16 × 12)   53.14

4.2 Password Entropy

Password entropy predicts how difficult it is to crack a given password with guessing, brute force, or dictionary attacks. Password entropy can be calculated from the characters in the password or the password length. Password entropy for a graphical password scheme can be calculated with the formula given below:

N * log2(|L| |O| |C|)

In this formula, N is the length of the password, L is the set of all locations (the locus alphabet), C is the color alphabet, and O is the object alphabet. In the proposed scheme, N is 4, because the graphical password has 4 images; L is the total number of images used in the scheme, i.e., 52; O is the number of object images, i.e., 16; and C is the number of background colors, i.e., 12. The password entropy of the proposed scheme is thus 53.14 bits for login phase 1. The password entropy for login phase 2 is 8 * log2(95), i.e., 52.5 bits. Table 1 compares the entropy of various schemes with that of the proposed scheme: the password entropy of the proposed scheme is larger than the others, so it is hard for attackers to crack the password by random guessing.
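The entropy values in Table 1 follow directly from the formula and can be reproduced in a few lines; small last-digit differences from the table are rounding artifacts.

```python
from math import log2

schemes = {
    "Text-based scheme":            6 * log2(95),
    "PassFaces (4 runs, 9 images)": 4 * log2(9),
    "Object-based scheme":          5 * log2(16 * 16 * 5),
    "Proposed scheme (phase 1)":    4 * log2(52 * 16 * 12),
    "Proposed scheme (phase 2)":    8 * log2(95),
}
for name, bits in schemes.items():
    print(f"{name}: {bits:.2f} bits")
```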

4.3 Secure from Attacks

a. Shoulder Surfing Attack Analysis

To analyze the shoulder surfing attack, a team of shoulder surfers was assembled and allowed to look at the password from a short distance. Ten users were invited to perform the authentication five times each in the presence of the shoulder surfers. Notably, out of ten users, the shoulder surfers were able to guess the password of only one, so the probability of a successful shoulder surfing attack is 1/50, i.e., 0.02. Because a time factor is included in this scheme, it is difficult for a shoulder surfer to determine the instants at which the user clicked on the submit button, as these differ in every session. The background colors associated with the images change every second, so a simple screenshot will not suffice to determine the time instant. The combination of dynamic graphics and random behavior makes the proposed scheme highly resistant to shoulder surfing attacks.


Table 2 Password space comparison

Technique name                               N = 2         N = 3         N = 4
Graphical authentication scheme              5929          456,533       3.51 * 10^7
PassPoints                                   139,129       5.1 * 10^7    1.9 * 10^9
Object-based graphical user authentication   3.6 * 10^8    8.5 * 10^10   7.0 * 10^12
Proposed scheme                              1.3 * 10^17   2.4 * 10^16   7.25 * 10^21

b. Brute Force Attack Analysis

The password space of the proposed scheme can be calculated with formula (1), where C = 16, B = 12, and T = 16. Table 2 compares the password space of various authentication schemes and shows that the password space of the proposed scheme is larger than that of other graphical authentication schemes. It can therefore be concluded that the proposed scheme provides better resistance to brute force attacks.

5 User Study

This section describes the results of our scheme. To demonstrate the performance of the proposed scheme, we conducted a short user study and evaluated two performance metrics with 20 participants.

A. Memorability Analysis

Training is required to memorize the password in this scheme; this training takes place when users register in the system and log in for the first time. To analyze password memorability, we asked the experimental participants to log in to their accounts one week later. If the number of retries exceeded 5, that login attempt was marked as a failure.

B. Feasibility Study

The feasibility study analyzes how easy the system is to use, even for beginners. To carry it out, we recorded the registration time and login time of each user. Login time is defined as the duration between the server receiving a request and the server giving its response; here it includes the time for the user to give their username and to pass login phase 1 and login phase 2. Figure 4 shows a graph of the login time, in seconds, of ten users for phase 1 and phase 2. It can be concluded from the graph that the average login time is 6.17 s.



Fig. 4 Login time graph

6 Conclusion

This paper proposed a hybrid graphical user authentication technique that is secure from shoulder surfing, dictionary, and brute force attacks. The scheme is a combination of knowledge-based authentication and dynamic graphics. The security analysis shows that the scheme effectively resists shoulder surfing attacks because authentication relies not merely on clicking a button but on clicking it at the correct instant. This feature enhances security without sacrificing usability. The authentication scheme can be used in public places, ATMs, access control, etc.

References

1. Fatima R, Siddiqui N, Umar MS, Khan MH (2019) A novel text-based user authentication scheme using pseudo-dynamic password. In: Information and communication technology for competitive strategies. Springer, Singapore, pp 177–186
2. Lashkari AH, Manaf AA, Masrom M, Daud SM (2011) Security evaluation for graphical password. In: DICTAP 2011, Part I, CCIS 166, pp 431–444
3. Eljetlawi AM (2008) Study and develop a new graphical password. M.Tech project report, Universiti Teknologi Malaysia, Nov 2008
4. Umar MS, Rafiq MQ (2012) Select-to-spawn: a novel recognition-based graphical user authentication scheme. In: International conference on signal processing, computing and control. IEEE, June 2012
5. Ramanan S, Bindhu JS (2014) A survey on different graphical password authentication techniques. Int J Innov Res Comput Commun Eng 2(12)
6. Suo X, Zhu Y, Owen GS (2005) Graphical passwords: a survey. In: 21st annual computer security applications conference (ACSAC'05). IEEE, Dec 2005
7. Karode A, Mistry S, Chavan S (2013) Graphical password authentication system. Int J Eng Res Technol (IJERT) 2(9)
8. Zaheer Z, Khan A, Umar MS, Khan MH (2019) One-tip secure: next-gen of text-based password. In: Information and communication technology for competitive strategies. Springer, Singapore, pp 235–243


9. Dhamija R, Perrig A (2000) Déjà Vu: a user study using images for authentication. In: 9th USENIX security symposium
10. Brostoff S, Sasse M (2000) Are passfaces more usable than passwords? A field trial investigation, pp 405–424
11. Zheng Z, Liu X, Yin L, Liu Z (2009) A stroke-based textual password authentication scheme. In: First international workshop on education technology and computer science. IEEE, May 2009
12. Umar MS, Rafiq MQ, Ansari JA (2012) Graphical user authentication: a time interval based approach. In: International conference on signal processing, computing and control. IEEE, June 2012
13. Wiedenbeck S, Waters J, Birget JC, Brodskiy A, Memon N (2005) Design and longitudinal evaluation of a graphical password system. Int J Hum Comput Stud 63:102–127
14. Waters WJ, Sobrado L, Birget JC (2006) Design and evaluation of a shoulder-surfing resistant graphical password scheme. In: Proceedings of the working conference on advanced visual interfaces, pp 177–184
15. Saeed S, Umar MS (2015) A hybrid graphical user authentication scheme. In: Communication, control and intelligent systems (CCIS). IEEE, Nov 2015
16. Gao H, Ma L, Qiu J, Liu X (2011) Exploration of a hand-based graphical password scheme. In: 4th international conference on security of information and networks, pp 143–150

Chapter 10

Origin Identification of a Rumor in Social Network

Sushila Shelke and Vahida Attar

1 Introduction

Nowadays, people connect to a variety of social networks such as Twitter, Facebook, and Reddit, which rapidly leads to massive data dissemination. Because of its constant updates, a social network is a regular resource for dispersing trending discussions and hot topics, which may involve unverified claims about events or incidents that became known around the globe. According to an analytical study of social networks, there will be 3.02 billion monthly active users globally, roughly 33% of the Earth's entire population, by the year 2021 [1]. The rise in network connections uncovers a wide number of threats such as viruses and rumors with severe results [2] and negative effects on individuals and society. Controlling the circulation of rumors is vital and demanding for individuals, organizations, government agencies, election commissions, etc., wherever there is a need for finding the source. Distinguishing the origin of a rumor in a social network quickly and precisely is not simple, given the complex dissemination process and the continuous, evolutionary progress of the network.

In the process of origin or source detection, there are three main modules: a diffusion model, a likelihood estimator, and a performance metric. In recent work, many researchers have utilized snapshot-based observation, which considers snapshots of the network at different time slots, and monitor-based observation, in which a few nodes act as monitor nodes that keep track of the arrival of rumors [3, 4]. The time required by snapshot-based observation depends on the number of snapshots; therefore, researchers have moved their attention toward the monitor-based approach. Identifying the source of a rumor quickly and accurately is necessary to control the diffusion in less time.


In the monitor-based approach, accuracy depends on the number of monitor nodes, which may be selected randomly [5], as the nodes with the highest betweenness centrality [6], etc. A small number of monitor nodes were selected for source detection in earlier work [7, 8]. In this approach, only a few nodes need to be watched for origin identification; however, selecting the monitor nodes and collecting the input from all the observer nodes is a difficult task in large, complicated networks and needs more computation time. In recent work, the random propagation delay for each edge is assumed to follow a Gaussian distribution [6] or an exponential distribution [9], whereas the proposed model uses a progressive approach for the delay during the rumor-spreading process.

In the proposed method, our main focus is to limit the search space for finding the source in a large social network. The model follows a discrete-time susceptible-infected (SI) model for rumor diffusion in the network. The whole network is divided into different partitions, and then the candidate partition (i.e., the partition where the source node is likely to be present) is identified based on the smallest arrival time of the rumor. Within the candidate partition, the origin of the rumor is identified by applying a maximum likelihood estimator (MLE) adopted from [5], and the nodes with higher betweenness centrality are selected as observer nodes for the real-world network.

2 Related Work

The dispersal of a rumor in a network creates numerous risks, such as incorrect decisions in catastrophic situations and harm to the reputation of individuals or organizations. The spread of a rumor in a network can be constrained by early discovery of rumors and identification of the rumor source. Given its wide scope, recent decades saw large improvements in origin detection techniques in different areas, such as viruses in computer networks [10], gas leakage in wireless sensor networks [11], propagation sources in complex networks [12], and sources of rumors in social networks [13], which are indirectly related to origin detection of a rumor. Various aspects of the source detection process, including network structure, diffusion models, centrality measures, and evaluation metrics, are studied in [12, 14].

In recent work, a substantial amount of research has addressed source detection using multiple observations of networks and using selected observer nodes; the computation time for processing multiple snapshots is greater than for monitor nodes. In monitor-based approaches, the mainly used diffusion models are Susceptible-Infected (SI) [6, 7], Susceptible-Infected-Recovered (SIR) [4], and Independent Cascade (IC) [13]. Pinto et al. [5] proposed a strategy in which observer nodes keep track of post arrival times; they assume that the rumor diffuses along a BFS tree, and gathering the data from all observer nodes is time-consuming. Paluch et al. [7] ignore the observer nodes with low-quality information and select only vital observers with maximum likelihood, which improves origin detection under the SI model in contrast to [5]. Xu and Chen


[13] utilized a dynamic IC model and a rumor quantifier metric to locate the origin of a rumor; here, the precision of source detection relies on the number of observer nodes. Jiang et al. [4] proposed a method for temporal networks utilizing the SIR diffusion model and outlined a novel MLE that accounts for the dynamic evolution of the network; they concluded that monitor-based examination gives good accuracy for origin detection.

The proposed method is derived from [6, 7]. In [7], only the nearest observers with minimum arrival time are considered, together with a discrete-time SI model and a Gaussian distribution for the propagation delay, whereas in the proposed approach observer nodes are selected based on their betweenness centrality. Louni and Subbalakshmi [6] utilized a continuous-time SI model on a weighted graph and Louvain's method [15] for partitioning the graph into clusters. Their algorithm has two stages: multiple instances of the network are used, then the network is partitioned to find the candidate cluster where the source node belongs; an approximately similar MLE is applied to find the candidate cluster, and the source is estimated on different instances of the graph. The planned work differs from [6] in that we find the candidate partition using the node that infects the vertex with minimum arrival time in a partition-connected graph, and we use a single snapshot of the graph to reduce the computation of multiple instances. They follow a Gaussian distribution for the random propagation delay, whereas our approach focuses on progressive delay in rumor propagation. The accuracy of detecting the candidate partition is 100%, and source estimation is within 0–2 hops in a synthetic network and 0–4 hops in a real-world network.

3 Methodology

In the proposed model, we focus on reducing the search space by partitioning the network and selecting the candidate partition based on the smallest arrival time of the rumor. We consider an undirected graph whose edge weights are assigned uniformly at random between 0 and 1. The origin of the rumor is unknown in the network; therefore, a discrete-time SI diffusion model is adopted to spread the rumor, with the assumption that the rumor originates from a single source.

3.1 Diffusion Model

In the discrete-time SI diffusion model, each vertex has one of two statuses: susceptible, i.e., a node at least one of whose neighbors has acquired the rumor, and infected, i.e., a node that has received the rumor. An already infected node u can infect all its susceptible neighbors with equal infection rate β. The propagation delay for each edge follows the progressive approach. The average propagation time for each edge is mean = 1/β, and the variance is var = (1 − β)/β². In


the diffusion model, the actual source is selected randomly from the whole network at time 0; the source node then infects each of its neighbors with rate β and a constant delay for every neighbor. The propagation delay is the same for all neighbors because, in a real network like Twitter, when a user tweets a message it becomes accessible to all of the user's followers at once. At each time step, an infected node infects its susceptible neighbors with rate β, and the propagation delay is incremented by one. Therefore, nodes with late arrival of the rumor can be distinguished from early infected nodes.
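A minimal simulation of this discrete-time SI process, assuming a NetworkX graph; the parameters and the random topology below are placeholders.

```python
import random
import networkx as nx

def si_diffusion(graph, source, beta=0.4, max_steps=20, seed=7):
    # Discrete-time SI: at each step every infected node tries to infect
    # each susceptible neighbor with rate beta; all nodes infected in the
    # same step share one timestamp, mirroring a tweet reaching all
    # followers at once, and the delay grows by one per step.
    rng = random.Random(seed)
    infection_time = {source: 0}
    for t in range(1, max_steps + 1):
        frontier = {}
        for u in list(infection_time):          # spreaders so far
            for v in graph.neighbors(u):
                if v not in infection_time and rng.random() < beta:
                    frontier[v] = t
        infection_time.update(frontier)         # new infections spread next step
    return infection_time

g = nx.erdos_renyi_graph(200, 0.05, seed=3)     # placeholder network
times = si_diffusion(g, source=0)
print(f"{len(times)} nodes infected after 20 steps")
```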

3.2 Candidate Partition

To reduce the search space, the candidate partition is identified. Louvain's partitioning method [15] is used to partition the infected graph. Rumors go viral rapidly in the initial stage, i.e., within minutes or hours [16]. Under this assumption, the node with the smallest infection time is the one infected at the earliest stage and is used for identifying the candidate partition. For example, let G be the network with different partitions and Gp be the partition-connected graph; identify the node w in Gp with minimum infection time, determine the node x that infected w, and detect the partition P where node x is present. Partition P is then the candidate partition.
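Phase I can be sketched in a few lines on top of the diffusion output above; louvain_communities is available in recent NetworkX releases, and the infected_by map (who infected whom) is assumed to have been recorded during diffusion.

```python
import networkx as nx

def candidate_partition(graph, infection_time, infected_by):
    # Partition the infected graph with Louvain, take the node w with the
    # smallest recorded infection time, look up the node x that infected
    # w, and return x's partition as the candidate partition.
    partitions = nx.algorithms.community.louvain_communities(graph, seed=1)
    w = min(infection_time, key=infection_time.get)
    x = infected_by.get(w, w)        # the true source has no infector
    for p in partitions:
        if x in p:
            return p
```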

3.3 Origin Identification of a Rumor

The proposed approach works in two phases. The infected graph diffused by the SI model, along with weights and infection times, is given as input to Phase-I, which identifies the candidate partition; in Phase-II, the origin or source is identified on the candidate partition. We utilize the MLE that was used with Gaussian-joint diffusion in [5], shown in (1). In this MLE, s is a node, d is the observed delay vector (2), μ_s is the deterministic delay vector (3), |SP(u, v)| is the length of the shortest path between vertices u and v, Λ_s is the covariance matrix of delays (4), and k indexes the observers.

\text{score}(s) = \frac{\exp\left( -\frac{1}{2} (\mathbf{d} - \boldsymbol{\mu}_s)^{T} \Lambda_s^{-1} (\mathbf{d} - \boldsymbol{\mu}_s) \right)}{|\Lambda_s|^{1/2}}   (1)

where

[\mathbf{d}]_k = t_{k+1} - t_1   (2)

[\boldsymbol{\mu}_s]_k = \mu \cdot \left( |SP(s, o_{k+1})| - |SP(s, o_1)| \right)   (3)

[\Lambda_s]_{k,i} = \sigma^2 \cdot \begin{cases} |SP(o_1, o_{k+1})| & \text{if } k = i \\ |SP(o_1, o_{k+1}) \cap SP(o_1, o_{i+1})| & \text{if } k \neq i \end{cases}   (4)

In Phase-I, the candidate node is identified by finding the node with minimum infection time in the partition-connected graph, which gives the candidate partition as output. In Phase-II, the MLE is applied to the candidate partition to find the source. The observer nodes are selected based on higher betweenness centrality and are further sorted by their infection time. In this phase, the observed delay vector is evaluated using Eq. (2) from the infection time of the first observer relative to the other observers. In the beginning, the first observer node is selected as a temporary source node and the diffusion tree for the shortest paths from its neighbors to all observers is constructed. The deterministic delay using Eq. (3), the covariance matrix using Eq. (4) and the score using Eq. (1) are evaluated for all nodes of the tree. The neighbor with the maximum score is selected for the next iteration. This process continues until all neighbors have a lower score than the current maximum. Finally, the node with the maximum score is taken as the estimated source. The accuracy of source detection is determined by the shortest-path distance between the actual and the estimated source node.
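A minimal sketch of the score of Eqs. (1)–(4) for one candidate node is given below (Python/NumPy). The observer selection and the diffusion-tree iteration of Phase-II are omitted, and the shared-path term of Eq. (4) is computed on node sets; mu and sigma2 would follow the progressive-delay moments 1/β and (1 − β)/β² of Sect. 3.1.

import numpy as np
import networkx as nx

def source_score(G, s, observers, t, mu, sigma2):
    # observers = [o1, ..., o_{K+1}], sorted by infection time t[o]
    paths = nx.single_source_shortest_path(G, observers[0])   # SP(o1, .)
    dist = lambda a, b: nx.shortest_path_length(G, a, b)
    K = len(observers) - 1
    d = np.array([t[observers[k + 1]] - t[observers[0]]
                  for k in range(K)], dtype=float)            # Eq. (2)
    mu_s = np.array([mu * (dist(s, observers[k + 1]) - dist(s, observers[0]))
                     for k in range(K)])                      # Eq. (3)
    Lam = np.zeros((K, K))                                    # Eq. (4)
    for k in range(K):
        for i in range(K):
            if k == i:
                Lam[k, i] = sigma2 * (len(paths[observers[k + 1]]) - 1)
            else:
                shared = set(paths[observers[k + 1]]) & set(paths[observers[i + 1]])
                Lam[k, i] = sigma2 * max(len(shared) - 1, 0)
    diff = d - mu_s
    return (np.exp(-0.5 * diff @ np.linalg.inv(Lam) @ diff)
            / np.sqrt(abs(np.linalg.det(Lam))))               # Eq. (1)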

4 Experimental Study and Results

We have tested the proposed model on a synthetic dataset of Erdos–Renyi (ER) random graphs [17] of sizes ranging from 200 to 2000. The ER model constructs random graphs with N nodes in which all E possible edges have equal probability of creation. Further, we have tested our model on real datasets of Facebook and Twitter [18], publicly available at [19]; the details of the datasets are shown in Table 1. The ER graphs are generated with probability 0.5 and the weights are uniformly assigned between 0 and 1.

Table 1 Details of real-world datasets

Network  | # Nodes | # Edges   | Diameter
Facebook | 4039    | 88,234    | 8
Twitter  | 81,306  | 1,768,149 | 7

The experimental results are reported for two metrics: Distance Error (DE), which is the smallest hop distance between the actual source and the source estimated by the algorithm, and Average Distance Error (ADE). Figure 1a shows the frequency of DE for various network sizes. The observers are selected based on higher betweenness centrality with a density of 15%, and the experiment was run 10 times. It can be observed that a 1-hop distance appears most often, and for N = 500 a 0-hop distance is achieved. The accuracy as ADE is shown in Fig. 1b, which is tested for various observer selection methods, Degree, Betweenness and Closeness centralities, among which betweenness centrality shows the best accuracy; this method is therefore used for the real networks.

Fig. 1 Distribution of DE on the ER network with observer density = 15% (a Distance Error (DE); b Average Distance Error (ADE))

The actual network to be searched for the estimated source is reduced to the size of the candidate partition, as shown in Fig. 2; the search space is reduced by approximately 79–80%, depending on the size of the candidate partition. The execution time required for diffusion and for detection of the candidate partition is very small compared to source estimation. The proposed model was also tested on the real-world networks of Facebook and Twitter; Fig. 3 shows the frequency distribution of DE on these networks, which lies within 0–2 hops for Facebook and 0–4 hops for Twitter. The model also shows good performance when the observer density is 20%, as shown in Fig. 3b.

Fig. 2 Actual network size against reduced network size (ER graph)

Fig. 3 Distribution of Distance Error (DE) on real-world networks (a observer density = 15%; b observer density = 20%)

5 Conclusion

The major focus of this paper is to reduce the search space for finding the origin of a rumor in a network. To reduce the search space, we have proposed a model which partitions the graph and then finds the candidate partition by identifying the candidate node that infects the node with minimum arrival time in the partition-connected graph. We have also proposed a progressive approach for the propagation delay. The experiments reveal that the progressive approach helps to identify the candidate partition faster, and that selecting observer nodes by betweenness centrality gives good accuracy. We have demonstrated the proposed model on synthetic and real-world networks: in a synthetic network it achieves an accuracy of 0–2 hops distance, while in a real-world network it achieves 0–4 hops distance. In the future, we plan to experiment with a joint probability distribution in the diffusion model and to focus more on improving the accuracy. We also plan to extend the model to identify multiple sources.

References

1. Social Network Statistics. https://www.statista.com/topics/1164/socialnetworks/. Accessed May 2019
2. Moya I, Chica M, Sáez-Lozano JL, Cordón O (2017) An agent-based model for understanding the influence of the 11-M terrorist attacks on the 2004 Spanish elections. Knowl-Based Syst 123:200–216
3. Wang Z, Dong W, Zhang W, Tan CW (2014) Rumor source detection with multiple observations. ACM SIGMETRICS Perform Eval Rev 42:1–13
4. Jiang J, Wen S, Yu S, Xiang Y, Zhou W (2016) Rumor source identification in social networks with time-varying topology. IEEE Trans Depend Secur Comput 15(1). https://doi.org/10.1109/TDSC.2016.2522436


5. Pinto PC, Thiran P, Vetterli M (2012) Locating the source of diffusion in large-scale networks. Phys Rev Lett 109:68702
6. Louni A, Subbalakshmi KP (2018) Who spread that rumor: finding the source of information in large online social networks with probabilistically varying internode relationship strengths. IEEE Trans Comput Soc Syst 5:335–343. https://doi.org/10.1109/TCSS.2018.2801310
7. Paluch R, Lu X, Suchecki K, Szymański BK, Hołyst JA (2018) Fast and accurate detection of spread source in large complex networks. Sci Rep 8:2508
8. Spinelli B, Celis LE, Thiran P (2016) Observer placement for source localization: the effect of budgets and transmission variance. In: 2016 54th annual Allerton conference on communication, control, and computing (Allerton), pp 743–751
9. Cai K, Xie H, Lui JCS (2018) Information spreading forensics via sequential dependent snapshots. IEEE/ACM Trans Netw 26:478–491
10. Shah D, Zaman T (2010) Detecting sources of computer viruses in networks. In: Proceedings of the ACM SIGMETRICS international conference on measurement and modeling of computer systems—SIGMETRICS '10. ACM Press, New York, USA, p 203
11. Shu L, Mukherjee M, Xu X, Wang K, Wu X (2016) A survey on gas leakage source detection and boundary tracking with wireless sensor networks. IEEE Access 4:1700–1715
12. Jiang J, Wen S, Yu S, Xiang Y, Zhou W (2017) Identifying propagation sources in networks: state-of-the-art and comparative studies. IEEE Commun Surv Tut 19:465–481. https://doi.org/10.1109/COMST.2016.2615098
13. Xu W, Chen H (2016) Scalable rumor source detection under independent cascade model in online social networks. In: Proceedings of 11th international conference on mobile ad-hoc and sensor networks, MSN 2015, pp 236–242. https://doi.org/10.1109/MSN.2015.36
14. Shelke S, Attar V (2019) Source detection of rumor in social network–a review. Online Soc Netw Media 9:30–42
15. Blondel VD, Guillaume J-L, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks. J Stat Mech Theory Exp 2008:P10008
16. Friggeri A, Adamic L, Eckles D, Cheng J (2014) Rumor cascades. In: Eighth international AAAI conference on weblogs and social media
17. Erdős P, Rényi A (1960) On the evolution of random graphs. Publ Math Inst Hungar Acad Sci 5:17–61
18. Leskovec J, Mcauley JJ (2012) Learning to discover social circles in ego networks. In: Advances in neural information processing systems, pp 539–547
19. Stanford Large Network Dataset Collection. https://snap.stanford.edu/data/

Chapter 11

Luminance [Y] Utility to Compact Color Video

Neha Shammi Wahab

1 Introduction

The demand for and use of digital media are increasing exponentially, while the expensive resources required remain scarce; there is no linear relationship between the resources required and the number of users. Nowadays, in this smart world, everyone is fascinated by bright, distinct colors in every medium. In general, the use of technology, especially mobile technology, is not limited to educators or youngsters: its spark has spread like wildfire from young children of three or four to the elderly. For that reason, the very expensive and scarce resource called bandwidth has to be utilized intelligently, which is the foremost requirement in this tech-savvy modern world. The latest smartphones already provide various effects such as negative, sepia and grayscale. Compression of color video is a solution to this challenge.

2 Luminance-Based Coding

The color video selected is a video scene clip. It is converted into the YCbCr format: Y is the luminance component (black-and-white information) and Cb, Cr are the chrominance components (color information). Only the Y frames are considered for scene change detection. All Y frames of the video are split into blocks of the same size, and scene change detection is performed blockwise between consecutive frames. The difference of two blocks is converted into binary form; a threshold T1 is required for this conversion. After this, a threshold T2 is required to detect whether the resultant block is responsible for a scene change. Equation (1) gives the average of an individual block of a frame, where M × N is the block size and i, j are the horizontal and vertical indexes of the block within the frame.



p_{ij}^{A} \equiv \frac{1}{M \times N} \sum_{x=1}^{M} \sum_{y=1}^{N} p_{ij}(x, y)   (1)

Equation (2) gives the bit (binary) form of the pixels:

p_{ij}^{bit}(x, y) \equiv \begin{cases} 1 & \text{if } p_{ij}^{A} < p_{ij}(x, y) \\ 0 & \text{otherwise} \end{cases}   (2)

Equation (3) computes the difference between the corresponding blocks of two consecutive frames:

D_{ij}^{S} \equiv \frac{1}{M \times N} \sum_{x=1}^{M} \sum_{y=1}^{N} \left( p_{ij}^{bit\_prev}(x, y) \oplus p_{ij}^{bit\_curr}(x, y) \right)   (3)

Equation (4) marks a candidate block whose difference exceeds the threshold as a scene-change block (Fig. 1):

CB_{ij} \equiv \begin{cases} 1 & \text{if } D_{ij}^{S} \geq T2 \\ 0 & \text{otherwise} \end{cases}   (4)

See Tables 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 and 11.

3 Experimental Result

Sum of first block of first frame = 7514
Sum of second block of first frame = 7956
Sum of first block of second frame = 7956
Sum of second block of second frame = 8424
Average of first block of first frame = 208.72
Average of second block of first frame = 221
Average of first block of second frame = 221
Average of second block of second frame = 234
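A minimal NumPy sketch of Eqs. (1)–(4) is shown below; it reproduces the bit codes and XOR results of Tables 5–10 for a pair of corresponding blocks. The threshold value T2 used here is an illustrative assumption, since the paper does not state it.

import numpy as np

def bit_code(block):
    # Eqs. (1)-(2): threshold a block against its own average
    avg = block.mean()                       # Eq. (1)
    return (block > avg).astype(int)         # Eq. (2)

def scene_change_block(prev_block, curr_block, T2=0.3):
    # Eq. (3): normalized XOR difference; Eq. (4): candidate-block decision
    diff = np.logical_xor(bit_code(prev_block), bit_code(curr_block))
    return int(diff.mean() >= T2)

# First block of the first frame (Table 1) and of the second frame (Table 3)
b1 = np.array([[104, 156, 156, 156, 156]] + [[156, 234, 234, 234, 234]] * 5)
b2 = np.array([[156, 234, 234, 234, 234]] * 6)
print(bit_code(b1))                  # matches Table 5
print(bit_code(b2))                  # matches Table 7
print(scene_change_block(b1, b2))    # XOR of Tables 5 and 7, cf. Table 9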


Fig. 1 Block diagram of the paper: current frame and previous frame → RGB–YCbCr conversion → luminance coding → thresholding → scene change detection

Table 1 First block of first frame

104  156  156  156  156
156  234  234  234  234
156  234  234  234  234
156  234  234  234  234
156  234  234  234  234
156  234  234  234  234

Table 2 Second block of first frame

156  156  156  156  156
234  234  234  234  234
234  234  234  234  234
234  234  234  234  234
234  234  234  234  234
234  234  234  234  234


Table 3 First block of second frame

156  234  234  234  234
156  234  234  234  234
156  234  234  234  234
156  234  234  234  234
156  234  234  234  234
156  234  234  234  234

Table 4 Second block of second frame

234  234  234  234  234
234  234  234  234  234
234  234  234  234  234
234  234  234  234  234
234  234  234  234  234
234  234  234  234  234

Table 5 Bit codes of first block of first frame

0  0  0  0  0
0  1  1  1  1
0  1  1  1  1
0  1  1  1  1
0  1  1  1  1
0  1  1  1  1

Table 6 Bit codes of second block of first frame

0  0  0  0  0
1  1  1  1  1
1  1  1  1  1
1  1  1  1  1
1  1  1  1  1
1  1  1  1  1

Table 7 Bit code of first block of second frame

0  1  1  1  1
0  1  1  1  1
0  1  1  1  1
0  1  1  1  1
0  1  1  1  1
0  1  1  1  1


Table 8 Bit code of second block of second frame

0  0  0  0  0
0  0  0  0  0
0  0  0  0  0
0  0  0  0  0
0  0  0  0  0
0  0  0  0  0

Table 9 XOR result of first block of both frames

0  1  1  1  1
0  0  0  0  0
0  0  0  0  0
0  0  0  0  0
0  0  0  0  0
0  0  0  0  0

Table 10 XOR result of second block of both frames

0  0  0  0  0
1  1  1  1  1
1  1  1  1  1
1  1  1  1  1
1  1  1  1  1
1  1  1  1  1

Table 11 Luminance coding-based result

Name                  | Value
Video                 | Video scene clip
Width                 | 120
Height                | 160
Total frames selected | 20
Block size            | 5 × 6
Total blocks          | 10,400
Scene change blocks   | 191


4 Conclusion

Luminance-based scene change detection operates on small blocks of a frame. Working on blocks instead of the whole frame gives a more accurate and error-free result: to remove redundancy and irrelevance, small blocks of the frame rather than the entire frame are used to detect scene changes, making the result more efficient and precise. For color video optimization, the basic black-and-white information, which is in fact the root of the video, is useful for the compactness and conciseness of color video.


Chapter 12

Apriori Algorithm and Decision Tree Classification Methods to Mine Educational Data for Evaluating Graduate Admissions to US Universities

Pranav Manjunath and Kushal Naidu

1 Introduction

Educational Data Mining (EDM) is an emerging discipline, concerned with developing methods for exploring unique types of data from educational settings [1]. Data mining algorithms are used extensively to discover insights that can help students and educators make pedagogical decisions. Two such mining techniques used in EDM are association rule mining and predictive classification models. Association rule mining is one of the major data mining techniques employed to assess strong relationships and correlations among items in transactional databases. Classification is a data mining function that assigns items in a collection to target classes; the main goal of a classification model is to accurately predict the target class for each record in the data. An association mining problem consists of two subparts: (i) frequent itemsets and (ii) association rules. The Apriori algorithm is used for mining frequent itemsets for Boolean association rules [1]. Apriori uses a 'bottom-up', breadth-first search approach, where frequent subsets are extended one item at a time (candidate generation) and groups of candidates are tested against the data. Apriori discovers patterns with a frequency above the minimum support threshold. The Apriori algorithm is extensively used in market analysis, decision support systems and financial forecasting. One potential method of predicting admission outcomes is to train a machine to accurately classify the data. Classification predictive modelling is the task of approximating a mapping function (f) from input variables (GRE score, TOEFL score, number of research papers) to a discrete output variable (possibility of getting admitted).


The data is divided into two sets: the set in which the class distribution is known is called the training set, and the other, in which the class distribution is unknown, is called the test set. A classification algorithm usually performs two steps: induction and deduction. In the induction step, it uses the training set to induce a model, an abstract knowledge representation. In the deduction step, the induced model is employed to classify instances of the test set, whose class information is unknown. Unlike the majority of classifiers, decision trees are an extremely comprehensible classification model, since they can easily be represented in graphical form and can also be represented as a set of classification rules. They are one of the most effective methods for data mining and have been widely used in several disciplines [2]. A further advantage is that they can be expressed in natural language as an IF-THEN rule set, which is mutually exclusive and exhaustive for classification. They are often the preferred method [3] because they produce classification rules that are easier to interpret than those of other classification methods. The experimental results show that Classification and Regression Trees (CART) is the best algorithm for the classification of the data. This is especially helpful in application domains in which understanding the reasons that lead to a certain prediction is equally or more important than the prediction itself. In this study, we use the Apriori algorithm to generate frequent itemsets to understand the overall attributes of a student who gets into a top-, mid- or lower-ranked university in the US. Following the Apriori algorithm, we employ the CART technique to predict the chances of a student getting into a university of the desired ranking. We use evaluation metrics such as confusion matrix, classification accuracy, precision, recall and F1 score to validate our model.

2 Review of Related Work

EDM methods have been employed by several researchers to analyse data to evaluate student performance and/or predict admissions to a certain college.

• Jha and Ragha [1] developed an improved Apriori algorithm for Educational Data Mining (EDM) using a bottom-up approach along with a standard deviation functional model to mine frequent educational data patterns.
• Mashat et al. [4] employed the Apriori algorithm to generate association rules for undergraduate admissions to King Abdulaziz University (KAU). By taking attributes such as gender, high school study type, high school grade, area and application status, they were able to generate rules to conclude similar attributes of students who get admitted to or rejected from the college. In addition, they have employed decision tree classification, specifically the ID3 algorithm, to provide an analytical view of the university admission system [5].
• Raut et al. used the Decision Tree algorithm to measure student performance and laid out recommendations for the future development of performance [6].


• Feng et al. [7] developed a model using the Self Organizing Map (SOM) neural network, association rule and Fayyad data mining model to establish a university admissions decision-making model in China. • Al-Radaideh et al. used a decision tree classification model to evaluate student data to study the main attributes that may affect student performance in courses [8]. • Kolo et al. used a decision tree model for predicting academic performance. They took the parameters such as student grade, status, gender, finance, motivation to build the predictive decision tree using SPSS predictive approach and the CHAID approach [9]. • Waters et al. [10] illustrate GRADE, a statistical machine learning system developed to support the work of the graduate admissions committee at the University of Texas at Austin Department of Computer Science (UTCS). GRADE uses historical admissions data to predict the likeliness of the committee to admit each new applicant.

3 Apriori Algorithm

3.1 Apriori Property

The Apriori property states that all subsets of a frequent itemset must be frequent. In other words, if an itemset is infrequent, all its supersets will be infrequent. The Apriori algorithm in [11] is the general algorithm used to identify frequent itemsets.

3.2 Evaluation Metrics

• Support: Support is a measure of how frequently the collection of items occurs together, as a percentage of all transactions (1). For example, the support of a frequent 1-itemset containing item A is

\text{Support}\{A\} = \frac{\text{freq}(A)}{N}   (1)

where freq(A) is the number of transactions which contain item A and N is the total number of transactions. Similarly, for a frequent 2-itemset {A, B}, the support is calculated by (2):

\text{Support}\{A, B\} = \frac{\text{freq}(A, B)}{N}   (2)


Min Support (min_sup) is a threshold value assigned by the user. If the support of an itemset is below min_sup, the itemset will be eliminated.

• Confidence: Confidence of the association rule (A → B) is the ratio of the number of transactions that include both the antecedent and the consequent of the rule to the number of transactions that include the antecedent of the rule (3).

\text{Confidence}\{A \rightarrow B\} = \frac{\text{freq}(A \cup B)}{\text{freq}(A)}   (3)

Min Confidence (min_conf) is a threshold value assigned by the user.
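As an illustration of these measures, the sketch below uses the Apriori implementation of the mlxtend library on a small one-hot-encoded student table; the column names, the data, and the min_sup/min_conf values are hypothetical.

import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# One-hot transactions: each row is a student, each column a coded attribute
df = pd.DataFrame({'GRE-1': [1, 1, 0, 1], 'CGPA-1': [1, 1, 1, 0],
                   'RESEARCH': [1, 0, 1, 1], 'ACCEPTED': [1, 1, 0, 1]}).astype(bool)

# Frequent itemsets with support >= min_sup, cf. Eqs. (1)-(2)
frequent = apriori(df, min_support=0.5, use_colnames=True)

# Association rules filtered by confidence >= min_conf, cf. Eq. (3)
rules = association_rules(frequent, metric='confidence', min_threshold=0.7)
print(rules[['antecedents', 'consequents', 'support', 'confidence']])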

4 Classification and Regression Trees

4.1 CART Algorithm

Classification and Regression Trees, or CART for short, is a term used to refer to decision tree algorithms that can be used for classification or regression predictive modelling problems. The representation of the CART model is a binary tree. Each root node represents a single input variable (x) and a split point on that variable. The leaf nodes of the tree contain an output variable (y) which is used to make a prediction. The complete algorithm is presented in [12].
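A minimal sketch of fitting such a binary tree with scikit-learn's CART-based DecisionTreeClassifier is given below; the tiny data set and the max_depth value are illustrative assumptions, not values reported by the authors.

import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy training data in the spirit of the dataset of Sect. 5
X = pd.DataFrame({'GRE': [325, 300, 318, 334, 310, 329],
                  'CGPA': [9.1, 7.8, 8.4, 9.5, 8.0, 9.0],
                  'Research': [1, 0, 0, 1, 0, 1]})
y = [1, 0, 0, 1, 0, 1]   # 1 = accepted, 0 = rejected

cart = DecisionTreeClassifier(criterion='gini', max_depth=3)  # binary splits
cart.fit(X, y)

# The induced tree can be printed as an IF-THEN style rule set
print(export_text(cart, feature_names=list(X.columns)))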

4.2 Evaluation Metrics

• Confusion Matrix: A confusion matrix is a table that is often used to describe the performance of a classification model on a set of test data for which the true values are known (Table 1).

True Positives (TP): cases in which the model predicted Positive and the actual value is Positive.
True Negatives (TN): cases in which the model predicted Negative and the actual value is Negative.
False Positives (FP): cases in which the model predicted Positive but the actual value is Negative.

Table 1 Confusion matrix

Actual values | Predicted: Class 1 | Predicted: Class 2
Class 1       | TP                 | FN
Class 2       | FP                 | TN


False Negatives (FN): cases in which the model predicted Negative but the actual value is Positive.

• Classification Accuracy: Classification Accuracy (4) is the number of correct predictions made divided by the total number of predictions made.

\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}   (4)

• Recall: Recall (5) is the number of true positive predictions divided by the number of positive class values in the test data. It is also called Sensitivity or the True Positive Rate. A low recall indicates a large number of False Negatives.

\text{Recall} = \frac{TP}{TP + FN}   (5)

• Precision: Precision (6) is the number of true positive predictions divided by the total number of positive class values predicted. It is also called the Positive Predictive Value (PPV). A low precision indicates a large number of False Positives.

\text{Precision} = \frac{TP}{TP + FP}   (6)

• F1 Score: The F1 score (7) conveys the balance between precision and recall.

\text{F1 Score} = \frac{2 \times \text{Recall} \times \text{Precision}}{\text{Recall} + \text{Precision}}   (7)
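These metrics can be computed directly from the confusion matrix, or with scikit-learn, as in the sketch below (the label vectors are illustrative).

from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()  # Table 1 layout
print('Accuracy :', accuracy_score(y_true, y_pred))        # Eq. (4)
print('Recall   :', recall_score(y_true, y_pred))          # Eq. (5)
print('Precision:', precision_score(y_true, y_pred))       # Eq. (6)
print('F1 score :', f1_score(y_true, y_pred))              # Eq. (7)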

5 Methodology

5.1 Data Collection and Analysis

The dataset we used [13] is a 500-student record dataset that consists of eight major columns. An overview of the methodology is displayed as a block diagram in Fig. 1.

Fig. 1 Block diagram of the methodology

• GRE Scores. The GRE is a standardized test of verbal reasoning, quantitative reasoning and analytical writing skills. Top universities use these test scores as a metric to evaluate whether a student has the capabilities to succeed in their school. In the dataset, GRE scores are continuous variables ranging from 290 to 340.
• TOEFL Scores. Continuous variables ranging from 92 to 120.
• University Ranking. On a scale of 5, where 5 indicates the best college.
• Statement of Purpose (SOP). The Statement of Purpose is a student-written essay which tells the admission committee who you are, why you are applying, why you are a good candidate and what you want to do in the future. In this dataset, this is a measure of the strength of the SOP on a scale of 5, where 5 indicates an extremely strong SOP.
• Letter of Recommendation (LOR). A Letter of Recommendation is a letter in which usually a mentor evaluates the skills, work habits and achievements of an individual applying for a job, for admission to graduate school or for some other professional position. In the dataset, this is a measure of the strength of the LOR on a scale of 5, where 5 indicates an extremely strong LOR.
• CGPA. The CGPA of students during undergraduate study; CGPA in the data ranges from 6.8 to 9.92.
• Research Experience. A binary value: 1 indicates the candidate has some research experience and 0 indicates no research experience.
• Chance of Admit. Values between 0.34 and 0.97; a value of 0.97 indicates a 97% chance of getting admission from the college.

5.2 Data Preprocessing

We separated the data on the Chance of Admit: if the Chance of Admit is > 0.72, the student is accepted into the college. We used the value 0.72 to attain an entropy close to 0.5 before applying the algorithms that determine the frequent itemsets and the induced rules, to ensure that no informational bias is introduced into the algorithms. As the Apriori algorithm works only on discrete values, we converted the continuous values, such as the GRE and TOEFL scores. To convert these continuous variables, we used a conversion coding scheme that binned the continuous values into discrete groups of data. Table 2 summarizes the conversion coding structure we used.

Table 2 Conversion coding structure

Attribute    | Original values | Converted values
GRE scores   | 320–340         | 'GRE-1'
             | 300–319         | 'GRE-2'
TOEFL scores | …               | …
CGPA         | …               | …
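A minimal pandas sketch of this conversion for the GRE attribute is given below, following the bins visible in Table 2; since the extracted table shows only the 'GRE-1' and 'GRE-2' codes, the 'GRE-3' label for the 290–299 range is a hypothetical placeholder.

import pandas as pd

students = pd.DataFrame({'GRE': [334, 305, 321, 295, 312],
                         'Chance of Admit': [0.93, 0.62, 0.80, 0.45, 0.71]})

# Bin continuous GRE scores into the discrete codes of Table 2
students['GRE_code'] = pd.cut(students['GRE'],
                              bins=[289, 299, 319, 340],
                              labels=['GRE-3', 'GRE-2', 'GRE-1'])

# Class label used for the separation described in Sect. 5.2
students['ACCEPTED'] = (students['Chance of Admit'] > 0.72).astype(int)
print(students)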

… ≥ 9 AND GRE ≥ 320 AND RESEARCH = 1 AND TOEFL ≥ 110 THEN 'ACCEPTED'.
IF … = 3 AND CGPA ≥ 9 AND LOR ≥ 4 THEN 'ACCEPTED'.
IF GRE …

… counter then
    Replace csi with a randomly produced food source using Eq. (1);
Memorize the best food source achieved so far;
until the stopping condition is met;

5 Simulation

We demonstrate the effectiveness and performance of the proposed ABC-based algorithm for clustering and for validation of the clustering result with various sensor sets. The developed method is simulated in MATLAB; the scenario consists of 150 sensor nodes generated randomly and distributed to cover an area of 30 × 30 m. The sensor nodes in this scenario can sense, and can send to and receive from, other nodes in the application. The other parameters set for this algorithm are presented in Table 1.

Table 1 Network parameters

Parameters             | Values
Distributed type       | Random
Network size           | 30 m × 30 m
Number of sensor nodes | 100, 150, 200
Initial power energy   | 0.5 J
Packet size            | 1024 bits

5.1 Clustering Validation

To compute the purity validation for each sensor set, we report the purity of the clustering, which is defined as

\text{Purity}(S, C) = \frac{1}{N} \sum_{k} \max_{j} |S_k \cap C_j|   (4)

where S_k denotes the k-th cluster, C_j the j-th class, and N the total number of sensor nodes.
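A minimal sketch of Eq. (4) follows; Python is used here for consistency with the other sketches even though the paper's simulation is in MATLAB, and the label vectors are illustrative.

from sklearn.metrics import confusion_matrix

def purity(true_labels, cluster_labels):
    # Eq. (4): majority-class count per cluster, summed and divided by N
    cm = confusion_matrix(true_labels, cluster_labels)
    return cm.max(axis=0).sum() / cm.sum()

true_labels    = [0, 0, 0, 1, 1, 1, 2, 2, 2]   # reference grouping S
cluster_labels = [0, 0, 1, 1, 1, 1, 2, 2, 2]   # clusters C found by ABC
print(purity(true_labels, cluster_labels))      # 8/9, about 0.889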

5.2 Simulation Scenarios

Scenario 1: A set of 150 sensor nodes with different locations is generated, where the distances between sensor nodes are unequal and the positions of the sensors are placed randomly to cover a network of size 30 × 30 m². The initial sensor set is plotted in Fig. 1. The grouping result of the ABC-based method with K = 3 clusters is shown in Fig. 2.

Fig. 1 Initial deployed network topology of 150 sensor positions


Fig. 2 Clustering result with three clusters of the ABC algorithm

Table 2 Clustering result and validation

Sensor sets | Sensors | Features | Clusters | Validations
NW-100      | 100     | 2        | 3        | 97.7
NW-150      | 150     | 2        | 3        | 100
NW-200      | 200     | 2        | 4        | 98.7

The proposed ABC-based method was examined on three sensor sets, performing both clustering and validation of the clustering result; we applied external validation (purity) to validate the grouping of sensor nodes in the wireless networks for all sensor sets. The distributed sensor sets were randomly produced to form the topology of the wireless network, and the sensor nodes are stationary. We ran each simulation 20 times for the ABC-based algorithm. The validation in terms of correctly grouped sensors is recorded in Table 2 (Fig. 3).

6 Conclusion

In this work, Artificial Bee Colony (ABC), a swarm intelligence and optimization technique, is used for clustering a WSN and validating the grouping result. We develop a metaheuristic ABC-based method that partitions wireless networks into smaller subclusters, which maximizes the lifetime of the WSN and saves the energy of the nodes by avoiding additional communication across the network range and minimizing energy consumption during operation. Simulation results demonstrate that the proposed ABC-based method can effectively minimize data transmission, use less energy, improve raw sensed-data collection, and prolong the wireless network lifetime.


Fig. 3 Clustering validation

As a future direction, we plan to incorporate the ABC-based method with the SPC method and to compare the ABC-based method with ABC-based SPC methods as part of the validation of the clustering result, as well as to identify dead sensor nodes in the field.


Chapter 14

Improving SPV-Based Cryptocurrency Wallet

Adeela Faridi and Farheen Siddiqui

1 Introduction

Cryptocurrency is a digital medium of exchange that depends on cryptographic techniques to verify transactions. Most cryptocurrencies, such as Bitcoin, are decentralized and consensus-based. Cryptocurrency transactions use the underlying Blockchain technology, which is a digitally signed financial ledger. Each and every transaction can be seen on the public ledger, and every transaction that is executed is added to the Blockchain [1].

1.1 Fundamental Prerequisites of the Cryptocurrency Payment System

1. All exchanges should be made over the Internet (based on permissionless/permissioned Blockchain standards) between participating nodes/users.
2. There should be no single authority that processes transactions.
3. Users should be anonymous and recognized only by their virtual identity [1].


1.2 Decentralized Information Sharing Over the Internet

Satisfying the first two prerequisites from our list, (1) and (2), by removing a central authority for information exchange over the Internet, is already possible: a peer-to-peer (P2P) network is all that is required [1]. Information sharing in P2P networks is like sharing information among close friends: if you share information with someone in the network, it will eventually reach every other member of the network [1]. The only difference is that in the digital network this information will not be altered in any way, and one can implement or use one of the existing open-source P2P protocols to support the new digital currency.

1.3 Digital Signature

When signing a paper document, all you do is append your signature to the text of the document. A digital signature is similar: you simply add your personal data to the document you are signing. A hashing algorithm is then applied to the (original data/cash + your personal data) to produce the digitally signed document. Clearly, the HASH value computed for the original document will differ from the HASH value computed for the document with the appended signature. This fulfils point (3): it is how we obtain your virtual identity, which is defined as the personal data you added to the document before you computed that HASH value. Figure 1 shows how a document is signed and how its verification process is carried out [1]. As shown in the figure, signing a document requires a private key, and the moment the document is signed, the verification of the digitally signed document can be carried out.

Fig. 1 Verification and signature of a document [1]

Next, the best way to ensure that your signature is secure (the signature cannot be replicated and nobody can execute any transaction on your behalf) is to keep the signing key to yourself and to provide a different method (public-key cryptography, also known as asymmetric cryptography) for someone else to validate the digitally signed document. To secure private keys, we use encrypted wallets, because the private key is the most important component of a transaction. Once the private key of a person is leaked, an attacker can easily steal the user's digital currency, i.e., Bitcoin; the digital currency cannot be recovered unless the private key is recovered.
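A minimal sketch of signing and verifying a document with public-key cryptography is given below (Python cryptography library). Ed25519 is used purely for illustration; Bitcoin itself uses ECDSA over secp256k1.

import hashlib
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

document = b'transfer 1.5 BTC to address X'
print('HASH of original data:', hashlib.sha256(document).hexdigest())

private_key = Ed25519PrivateKey.generate()   # kept secret by the signer
public_key = private_key.public_key()        # shared for validation

signature = private_key.sign(document)       # the digitally signed document
try:
    public_key.verify(signature, document)   # anyone can validate it
    print('signature valid')
except InvalidSignature:
    print('signature invalid or document altered')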

1.3.1 Hash Tree

In a hash tree, the leaf nodes store the hashes of the data blocks. As shown in Fig. 2, adjacent hash values are combined to find the hash value of the concatenated string. This process continues until the hash value of the root node of the hash tree is obtained [2]. The hash tree is used for verifying data that is already present in the system and for transferring the stored data between different systems.
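A minimal sketch of building such a hash (Merkle) tree root with SHA-256 follows (Python standard library only); duplicating the last hash on levels with an odd number of nodes mirrors Bitcoin's convention and is an assumption here.

import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(blocks):
    # leaf nodes: hashes of the data blocks
    level = [sha256(b) for b in blocks]
    # combine adjacent hashes level by level until only the root remains
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])          # duplicate last hash if odd
        level = [sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

transactions = [b'tx1', b'tx2', b'tx3', b'tx4']
print(merkle_root(transactions).hex())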

Fig. 2 The structure of the blockheader and hash tree [11]

1.4 Wallet

A wallet can be used to check, store and make transactions of any digital currency. The private key of a cold wallet can be stored offline so that it cannot be stolen by attackers; a private key stored offline is considered more secure than one stored online, because attackers can easily obtain a user's private key over the Internet. Some hardware wallets, such as the Ledger Nano S, have been highlighted because people can make a secure transaction at any time and from virtually any part of the world. Similarly, in hardware wallets the private key is generated offline and remains in the wallet unless the person loses it. There are three different types of wallets, namely the central wallet, the full-node wallet and the SPV wallet; out of these, the SPV wallet is considered the more secure and portable one because there is no involvement of a third party.

1.4.1 Simplified Payment Verification (SPV)

To limit the size of the database, an SPV node stores only the blockheaders in its local database. It obtains the information it needs from the nodes present in the Blockchain for the verification of a transaction; it then calculates the relevant hash values and compares them with the real ones. There is a possibility that information may be leaked, because SPV nodes request the data for a single transaction from the nodes present in the Blockchain.

The paper is organized into the following sections. Section 1 covers some basics pertaining to cryptocurrency principles, components and their execution in different scenarios. Section 2 discusses the evolution and history of Blockchain, its relationship with Bitcoin and other cryptocurrencies, and the awareness among the masses and governments needed to drive Blockchain technology. Section 3 discusses the challenges of the various cryptocurrency wallets. Section 4 presents a detailed comparison among the available wallets and the proposed solution, which is shown to be feasible to implement while improving on and covering the loopholes of the existing wallets.
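The SPV-style check can be sketched as follows: a client that stores only the blockheader (which contains the Merkle root) verifies a single transaction from a short Merkle proof by recomputing hashes up the tree and comparing with the stored root. The proof format used here is a simplifying assumption, reusing the sha256 helper of the previous sketch.

import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def verify_spv(tx: bytes, proof, merkle_root: bytes) -> bool:
    # proof: list of (sibling_hash, sibling_is_left) pairs, leaf to root
    h = sha256(tx)
    for sibling, is_left in proof:
        h = sha256(sibling + h) if is_left else sha256(h + sibling)
    return h == merkle_root   # compare with the root in the blockheader

# Example with four transactions
hashes = [sha256(t) for t in [b'tx1', b'tx2', b'tx3', b'tx4']]
root = sha256(sha256(hashes[0] + hashes[1]) + sha256(hashes[2] + hashes[3]))
proof_tx2 = [(hashes[0], True), (sha256(hashes[2] + hashes[3]), False)]
print(verify_spv(b'tx2', proof_tx2, root))   # True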

2 Literature Review

Bitcoin was the first of its genre to rule the cryptocurrency world, so most of the research, technology and background are based on this virtual currency. Throughout the 1990s, many different schemes were presented in order to remove the bank from online purchases and to allow the currency to be partitioned into different units; none of these schemes achieved significant deployment. Proof-of-Work, one of the algorithms underlying Blockchain, was proposed in the early 1990s; it has been used for detecting spam mails, although it was not originally meant for these purposes, and HashCash finally emerged as a replacement for digital micropayments. Another use of Proof-of-Work is to examine which node is the one most damaged by multiple attacks. To control the supply of money, auditable


e-cash was initiated by bank organizers in the late 1990s to overcome the problem of double-spending and to protect the validity of digital currency. B-money, brought forward in 1998, is considered to be the only system where all the transactions can be broadcast; B-money, which was presented on the cypherpunks mailing list, received little attention from the academic research community. Smart contracts, which appeared in the early 1990s, enabled organizations to clearly specify cryptographically enforced agreements, which signified Bitcoin's potential. Bitcoin saw its first commercial operation as a currency in May 2010 when, according to one source, a customer paid 10,000 Bitcoins to another customer for a pizza order. After this, a number of dealers started using this service and accepted Bitcoins for transactions, which raised the price of a Bitcoin to approximately US$1200 in late 2013 and US$20,000 in 2017. There was also controversy about Bitcoin being linked with crime: the FBI shut down the black-market website Silk Road, which operated from February 2012 to October 2013, and botnet operators found Bitcoin to be a growing source of income in many fields. In 2014, CryptoLocker, a computer virus that extracted millions of dollars from users by demanding ransom in exchange for their encrypted files, gained notoriety. A number of victims have had their Bitcoins stolen or lost due to such compromised exchanges. However, the newly introduced system had flaws which do not guarantee fairness between the compliant majority and the miners. Additionally, there were various big risks associated with this set of rules, such as DoS attacks, and some issues, such as the assessment of which consensus protocol to use, are still under the radar. Nobody has a scientific model that can answer questions about how Bitcoin behaves under different parameters in different circumstances, and until now it has not been possible to compare Bitcoin's success with that of other digital currencies. The current market capitalization of Bitcoin is more than 72 billion dollars, incorporating millions of users transacting per day. It therefore becomes necessary to examine users' views of the virtual currency Bitcoin and how people manage it. The research paper "The other side of the coin" [3] reports a survey of 990 Bitcoin users via a Coin Management Tool coined by the authors; it found that it was still difficult to manage Bitcoin, that encryption and backup were lacking, and that many users were unaware of the security features and desired to unburden the management of Bitcoins to a third party. 22.5% of the users experienced security breaches and lost their Bitcoins [3]. So the first step is to improve the usability of Bitcoin through suggestions to both expert and non-expert users and by increasing the awareness among common users via various media platforms [3]. Bitcoin is getting the most attention and adoption compared to other digital currencies. Bitcoin implements a distributed time-stamping service and works on a P2P network that ensures all Bitcoin users can view all the transaction details [4]. The transactions stored in Bitcoin blocks are broadcast over the network; every block is connected to the others to form the Bitcoin Blockchain. Usually, a Bitcoin installation requires disc space of more than 18 GB, and it takes a lot of time to download and locally index the blocks and transactions that exist in the Blockchain. Over time the Bitcoin transaction volume is expected to grow,


which results in an increased size of the Blockchain. With the growth of Bitcoin, the disc usage also grows, which may affect the trust of Bitcoin clients in broadcasting the correct blocks and transactions over the network. This is the major concern for people using mobile devices to verify or make payments [4]. To solve this problem, the Bitcoin developers deliver a lightweight client which supports SPV for devices such as smartphones and private servers. This SPV mode downloads only the portion of the Blockchain that is required for a particular transaction [4]. These filters embed every address used by the SPV clients and are outsourced to more powerful Bitcoin nodes; those nodes then forward to the SPV clients the transactions relevant to their Bloom filters. Bloom filters raise serious privacy issues in existing SPV client implementations. More specifically, it has been shown that a considerable number of the addresses of an SPV user who embeds fewer than about 20 addresses can be leaked by its own Bloom filter [4]. Moreover, it has been shown that users' addresses are mostly leaked when the adversary makes use of two different Bloom filters from the same node, and this behaviour can be responsible for information leakage which is harmful to the user's privacy. Hardware wallets have also been proposed to handle Bitcoin transactions, with the claim that payments are fast and secure. A wallet named BlueWallet, paired with the user's computer or mobile phone, can be used to sign and authorize different transactions [5]. Furthermore, by delegating the responsibility for unsigned transactions to another party, BlueWallet can be combined with a point of sale (PoS) and used as an electronic wallet. Several privacy measures can be implemented; BlueWallet ensures all transactions are made from a trusted origin.

2.1 Current Trends and Acceptance of Wallet Currencies

Countries like Iran and Venezuela have started accepting virtual currency, and Ohio, one of the states of the United States, has adopted virtual currencies as a way to pay tax; on the other hand, other developed and developing nations have started barring the creation and use of cryptocurrencies due to security threats, as the currency is created from nothing. In India, the RBI has barred Indian banks from serving Bitcoin and cryptocurrency exchanges, and multiple crackdowns have been made by the government on cryptocurrencies like Bitcoin. The RBI states that:

Technological innovations including those underlying virtual currencies have the potential to improve the efficiency and inclusiveness of the financial system however virtual currencies also variously referred to as cryptos and cryptoassets raise concerns about the consumer protection market integrity and money laundering among others [6].


It has been seen in countries like India, China and South Korea that the ban on major cryptocurrencies has led to the rise of local cryptocurrency exchanges. Companies like Samsung have come forward and defended cryptocurrencies such as ETH by announcing a cryptocurrency wallet for Samsung's new flagship phone, the Galaxy S10. According to a report from CoinDesk Korea, the Samsung Blockchain Wallet is at present compatible only with ether (ETH) and Ethereum-based ERC20 tokens; Bitcoin is not yet approved to be part of the S10. The firm revealed the first essentials of the storage solution, the "Blockchain Keystore," a cold wallet, which appears to have three extensive properties: payments to merchants, digital signatures, and cryptocurrency storage and transfers [6]. The Samsung Blockchain Wallet will be used in combination with the Blockchain Keystore and is considered to make the transaction process simpler for newcomers to the technology, according to CoinDesk Korea. Through the supported CoinDuck dapp, clients can also make payments to merchants. Another tech giant, Sony (NYSE: SNE), is accepting cryptocurrency and building new products for the space [6]. One of Sony Corporation's subsidiaries, Sony CSL, has previously announced that it developed a physical cryptocurrency hardware wallet using IC card technology, and it also mentioned plans to commercialize it in the near future. Zebpay, an app-enabled cryptocurrency exchange and India's first Bitcoin exchange, was recently under the Indian government's radar to stop transactions. Last year it issued a statement: "The curb on bank accounts has crippled our, and our customer's, ability to transact business meaningfully. At this point, we are not capable to discover a reasonable way to perform the cryptocurrency exchange business. So the outcome is we are stopping our exchange activities" [7].

3 Challenges in Various Available Wallet Types

In a recent research white paper, "2018 Global Cryptocurrency Wallet Security White Paper" by Cheetah Mobile Blockchain Research Lab, concerns were highlighted over private keys stored in wallets [8]. The authors advanced security suggestions for wallet clients and delineated the security benchmarks which all secure digital-currency wallets ought to follow.

In another research paper, in the Journal of Internet Banking and Commerce, "Blockchain: Bitcoin Wallet Cryptography Security, Challenges and Countermeasures", we find that although the base of Blockchain technology is strong and the underlying technology is difficult to break through, attackers have devised different ways of attacking the crypto wallets of users to gain access to private keys via transferred information while the user is completing and verifying a transaction [9].

According to the research article "Blockchain Security in Cloud Computing: Use Cases, Challenges, and Solutions" in Symmetry, the private key stored in our systems or mobile devices can be hacked by an attacker to steal or gain access to our digital currency such as Bitcoin. There are various methods and ways to secure the private key, but current research is still incomplete and in the development phase. There are various soft wallets which are more secure to use and do not require any intermediary, but they can end up taking a lot of space to store all the blocks of a Blockchain; the required space can be more than 150 GB [10]. There are some soft wallets, such as ArcBit and BitGo, that do not take up a large space but end up using an intermediary [10]. Electrum is a lightweight wallet that does not require an intermediary to carry out the transaction, but its privacy might be at risk [10]. Devices with limited hardware, such as mobile phones and tablets, can easily carry out secure transactions by using SPV: it is not required to store all the blocks in local storage, as only the headers of the Blockchain are stored [10]. When a transaction takes place, the adjacent transaction data, connected through the root node or header, is checked. However, an attacker can still steal the private key of a user in the existing SPV wallet [10]. Secondly, the verification process cannot be guaranteed if the original address or the blockheader is modified by an attacker. All in all, the major concern is to secure the private key in all the existing wallets so as to carry out secure transactions without the involvement of a third party.

4 New Approach to Handle the Above Issues

4.1 Assumptions

1. All the available information extracted from all the nodes of the Blockchain is trustworthy. There are algorithms in Blockchain that ensure that the information extracted from all the nodes of a Blockchain is the most recently added, i.e., the newest, information.
2. Transactions carried out at both ends have a secure and trusted environment. We can consider this assumption valid because nowadays nearly every device is equipped with TrustZone.
3. Some cryptographic algorithms, such as SHA-256, are safe to implement. This assumption can also be considered valid.

4.2 What Can We Alter?

(1) Make the running environment partitioned: divide the system into two different parts, hardware and software, considered as a secure and an insecure zone. The secure zone covers protected information such as authentication, biometrics and digital signatures, which are executed in a secure environment. The remaining tasks can be executed in the insecure environment, which may include the operating system and the different applications used by the users. The environment should have the ability to switch between the secure and insecure zones via fast interrupt requests.
(2) Prefer a software wallet, which is more portable than hardware wallets (which have to be bought online from Amazon or other organizations). It does not get damaged or stolen and is affordable at a nominal fee or none.
(3) Use efficient filters to pass on only the information that is necessary, in turn protecting the user data. For the verification task, the SPV system requires all the nodes that exist in the Blockchain, which exposes the entire data, which may be sensitive [2]. By using an efficient filter, only the minimal information regarding the transaction process is required.
(4) Displaying the address privately, to the user only, will decrease the chances of their cryptocurrency getting hacked by an attacker; but the person should make sure the address is accurate and comes from the official target.
(5) Finally, to make sure whether the transaction is secure or not, we need to run the verification process in a secure environment which is not dependent on the operating system [2]. If the operating system and the transaction processes are carried out in different environments, then there will be more privacy for the blockheader.

4.3 Comparison Chart

The comparison chart bears the advantages and disadvantages of the available cryptocurrency wallet types. Table 1 shows the advantages of the improved SPV wallet over soft wallets, hardware wallets and SPV wallets. The table also explains which wallet is best suited to which conditions.

Wallets that suit best under different conditions:

1. Hardware Wallet:
   • Operating system should not be infected.
2. Soft Wallet:
   • Operating system should not be infected.
   • Transaction should be done within a private network from trusted devices.
3. SPV Wallet:
   • Operating system should not be infected.


Table 1 Improved SPV wallet over soft wallets, hardware wallets and SPV wallets

Parameter | Hardware wallets | Soft wallets | SPV wallet | Improved SPV wallet
Space requirement | Sufficient space available | Space issues, cannot store all blocks; other soft wallets require less space but depend on a trusted third party | Least | Least
Private key security | High protection, since the private key is offline | Very low protection, private key easily gets exposed | Low protection, private key may get exposed during verification of a transaction | High protection, private key generation occurs in a secure zone, hence unexposed
Destination address modification while making a transaction | Possible | Possible | Possible | Impossible, as it is part of the secure/trusted zone
Information privacy | Privacy may get compromised | Compromised | Compromised | Privacy maintained by using filters
Attack on OS | May affect the transaction | May affect the transaction | May affect the transaction | Transaction execution is not affected in the secure/trusted zone

5 Conclusion

The demand for and investment in the cryptocurrency industry have been increasing day by day, though there are some hurdles on its path to development: governments have still not accepted cryptocurrencies as a mode of payment due to security concerns and questions about their existence (cryptocurrency created from nothing). Another hurdle is the maintenance of the servers, which consume a lot of energy to run and to generate new currency. The latter factor is not under our control, but the former one can still be addressed. The paper concludes that it is possible to design a secure SPV wallet which stores the private key of the user and does not expose the complete information about the transaction at the time of verification. It will not matter whether the operating system is malicious or not, because in this SPV design the transaction is carried out on a secured platform which uses an appropriate filter for the verification process.


References

1. Thomas DS, Cryptocurrency for dummies: bitcoin and beyond. Lead Technical Editor @Portal
2. Dai W, Deng J, Wang Q, Zou CC, Jin H (2018) A secure blockchain lightweight wallet based on trustzone. IEEE
3. Krombholz K, Judmayer A, Gusenbauer M, Weippl E, The other side of the coin. SBA Research, Vienna, Austria
4. Gervais A, Karame GO, Gruber D, Capkun S (2014) On the privacy provision of bloom filters in lightweight bitcoin clients. New Orleans, Louisiana, USA
5. Decker C (2016) Blue wallet: the secure bitcoin wallet, 7 September
6. Khatri Y (2019) Samsung unveils cryptocurrency wallet, dapps for galaxy S10 phone, March 11
7. https://www.businesswire.com/news/home/20180216005322/en/Cheetah-Mobile-Releases-White-Paper-Global-Cryptocurrency
8. Latifa E-R, Ahemed EK, Mohamed EG, Omar A (2017) Blockchain: bitcoin wallet cryptography security, challenges and countermeasures. J Internet Bank Commer, December
9. Park JH, Park JH (2017) Blockchain security in cloud computing: use cases, challenges, and solutions. Symmetry, Basel, Switzerland
10. https://economictimes.indiatimes.com/small-biz/startups/newsbuzz/bitcoin-exchange-app-zebpay-pulls-the-plug-following-policy-restrictions/articleshow/65992734.cms?from=mdr
11. https://image.slidesharecdn.com/nosqldatabases-slideshare-110227120448-phpapp01/95/nosql-databases-why-what-and-when-91-728.jpg?cb=1298888093/
12. Nakamoto S (2009) Bitcoin: a peer-to-peer electronic cash system
13. Hassan IH (2017) Blockchain expenses-resources need to generate cryptocurrency. Eur Acad Res 8, November

Chapter 15

Modelling Fade Transition in a Video Using Texture Methods

Jharna Majumdar, N. R. Giridhar and M. Aniketh

1 Introduction

A video has a hierarchical structure: broken down into units, it consists of scenes, scenes break down into shots, and shots break down into frames or images. A shot is an uninterrupted sequence of frames captured by a single video recorder. A video shot transition is a technique used to combine different shots in the film-making process to achieve a continuous flow in the video, which is key to bringing out certain emotions. Transitions may occur over a single frame or over a number of frames. Transitions which occur over a single frame are called abrupt cut transitions, whereas transitions which occur over a number of frames are called gradual transitions. Gradual transitions are of three types: fade, wipe and dissolve. Video shot transition detection, also known as shot change detection, is the identification of changes in the scene content of a video sequence. Shot transition detection is the initial step for video segmentation, video summarization, and video indexing and retrieval. Many techniques have been devised to detect an abrupt cut; but because in a soft transition, i.e. a gradual transition, the change of shot occurs over a sequence of frames, detecting gradual transitions is not easy and the algorithms become complex.

Gajera and Mehta [1] detected gradual transitions such as fade and dissolve; the algorithm they developed for Fade Transition calculated the mean of the DC image. Porter et al. [2], with the assistance of inter-frame coefficients and block-based motion estimation, locate gradual changes; this is done by tracking image blocks through the video sequence and recognizing changes caused by shot transitions. Truong et al. [3] presented enhanced algorithms for automatic fade and dissolve detection in video analysis. They conceived new two-step


algorithms for fade and dissolve, together with a technique for eliminating false positives from the list of detected candidate transitions. Instead of selecting thresholds by the traditional trial-and-error approach, robust adaptive thresholds are derived analytically from mathematical models of the transitions. Bansod et al. [4] presented a novel methodology for video shot detection based on the analysis of temporal slices, which are extracted from the video by cutting through the sequence of video frames and gathering a temporal signature. Song et al. [5], using Columbia's Consumer Video data set along with the TRECVID 2014 Multimedia Event Detection data set, proposed an adaptive Support Vector Machine model to extract the location of key segments in a video. Sun et al. [6] presented a novel method for detecting cut and gradual transitions; their key idea is that a pixel in a frame usually has a pixel value close to it within its neighbourhood in the adjacent frame. The method has low computational complexity and outperforms state-of-the-art methods. Fani et al. [7] came up with an effective shot boundary detection method for both gradual and cut transitions. Candidate segment selection is carried out using frame histograms and an adaptive threshold, by means of a K-means classifier; discriminating feature vectors obtained from singular value decomposition, along with some difference measures, are fed into a support vector machine, and the gradient of a metric associated with each segment is evaluated to locate the boundary of the transition. Zhang et al. [8] proposed an effective system for event recognition based on a semantic-visual knowledge base: event-centric concepts and the relationships between them are encoded from the lexical databases FrameNet and WordNet, a learning model is proposed to learn a noise-resistant classifier, and the event-centric semantic concepts are finally utilized for event representation. Zhu et al. [9] gave a unique tag to each shot, thereby solving the video-to-shot tag problem. A Graph Sparse Group Lasso model is proposed to reconstruct the visual features of test shots from those of training videos, and a tagging rule is derived from the learnt correlations; to build the model, constraints such as temporal-spatial knowledge, intra-group sparsity and inter-group sparsity are considered. Duan et al. [10] proposed an unsupervised approach for the identification of video shots. A dictionary of features, called video words, is extracted from small patches in the video, and appropriate video words are selected for modelling by Information Projection; the category of a video shot is identified by treating the problem as an unsupervised graph partition task, with each vertex of the graph representing a video shot, and a Stochastic Cluster Sampling technique is used for the graph partition. Tippaya et al. [11] proposed a multi-modal visual feature-based shot boundary detection method. It analyses the discontinuity signal to learn the behaviour of videos, using candidate segment selection to collect probable shots and the cumulative moving average of the discontinuity signal to identify shot boundaries. The algorithm was tested on Sport and Documentary video data sets and was found to give high recall and precision values for cut transition detection and significantly lower values for gradual transitions. Fu et al. [12] present a novel two-step framework for shot transition detection: first, Local Linear Embedding is used to extract manifold features, with virtual frames added to ensure that the embeddings do not


collapse to a single point; a KNN classifier is then used for the classification of the transition.

In this research paper, we define a new approach to detect Fade Transition following the principles of Machine Learning. It consists of two phases, the Learning Phase and the Identification Phase. In the Learning Phase, each frame of the video sequence is transformed into a texture domain, and the response of the histogram properties [13] is analysed along with their gradient descent sign. Based on these observations, a rule is framed, from which a polynomial is generated for all the input data sets. In the Identification Phase, an unknown video sequence is fed as input to the system, and the polynomial equation obtained in the Learning Phase is used to identify whether the video contains a transition or not. The remainder of this paper is structured as follows: Sect. 2 discusses Fade Transitions that can occur in a video; Sect. 3 deals with the methodology employed for this work, followed by experimentations and evaluations in Sects. 4 and 5, respectively.

2 Shot Transition

There are three types of gradual transitions: fade, dissolve and wipe. In this paper we focus our attention on Fade Transition; the methodology and the algorithm developed are given in the subsequent sections. Fade Transition: a fade is a transition to or from a blank image, in contrast to a cut, where there is no such transition. It is a gradual change between a scene and a constant picture (fade-out) or between a constant picture and a scene (fade-in) (Fig. 1).

3 Methodology

3.1 Texture Methods

Image texture is characterized as a function of the spatial variation in pixel intensities (grey values). Image texture gives us information about the spatial arrangement of the colour or greyscale distribution in an image or a selected region of an image. In the current research work we make use of three texture methods, namely the Grey Level Co-occurrence Matrix (GLCM) [14], the Statistical method [15] and the Laws Texture method [16]. These methods are used to transform the video into different texture domains; details of these methods are discussed in the appendices.

Fig. 1 Fade transition



3.2 Extraction of Texture Features

3.2.1 Grey Level Co-occurrence Matrix (GLCM) [14]

The GLCM is one of the best-known texture analysis techniques; estimates of image properties are obtained from second-order statistics. A co-occurrence matrix describes the patterns of neighbouring pixels in an image at a given distance d: each entry (i, j) of the GLCM records the number of occurrences of the pair of grey levels i and j at a separation d in the original picture. The co-occurrence matrix describes pixels that are
• horizontally adjacent to one another, P0,
• vertically adjacent to one another, P90, and
• diagonally adjacent to one another, P45 and P135.
Haralick proposed 14 features, out of which we use six: Energy, Entropy, Contrast, Maximum Probability, Variance and Homogeneity (Appendix 1). Figure 2 shows the six features.
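To make these feature definitions concrete, the sketch below computes the six GLCM features from a grey-level image using scikit-image; the function name, the normalisation choices and the small epsilon in the entropy term are our own illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from skimage.feature import graycomatrix  # scikit-image >= 0.19

def glcm_features(image, distance=1, angle=0.0):
    """Six Haralick-style features from a normalised GLCM.

    `image` is a 2-D uint8 grey-level array; `distance` and `angle`
    select the pixel pairing (angle 0 corresponds to P0, pi/2 to P90).
    """
    glcm = graycomatrix(image, distances=[distance], angles=[angle],
                        levels=256, symmetric=True, normed=True)
    P = glcm[:, :, 0, 0]              # normalised co-occurrence matrix
    i, j = np.indices(P.shape)
    mu = np.sum(i * P)                # mean grey level weighted by P
    eps = 1e-12                       # avoid log(0) in the entropy term
    return {
        "energy":      np.sum(P ** 2),
        "entropy":     -np.sum(P * np.log(P + eps)),
        "contrast":    np.sum(P * (i - j) ** 2),
        "max_prob":    P.max(),
        "homogeneity": np.sum(P / (1.0 + (i - j) ** 2)),
        "variance":    np.sum(P * (i - mu) ** 2),
    }
```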

3.2.2 Statistical Method [15]

The statistical method estimates the coarseness and directionality of texture and describes the shape and distribution of the entities. It produces geometrical properties of connected regions in a sequence of binary images. NOC0 (mean), NOC0 (variance), NOC1 (mean) and NOC1 (variance) are the four properties used (Appendix 2). Figure 3 shows the Statistical Texture method features.

Fig. 2 GLCM texture features, a input image, b energy, c entropy, d contrast, e max probability, f homogeneity, g variance


Fig. 3 Statistical texture method features. a Input image, b NOC1 variance, c NOC1 mean, d NOC0 variance, e NOC0 mean

3.2.3 Laws Energy Texture [16]

Laws described a novel texture energy approach to texture analysis; Laws' texture energy features identify points of high 'texture energy' in an image. The two-dimensional convolution kernels regularly used for texture discrimination are generated from the following set of one-dimensional convolution kernels of length 5:

L5 = [ 1  4  6  4  1]
E5 = [−1 −2  0  2  1]
S5 = [−1  0  2  0 −1]
W5 = [−1  2  0 −2  1]
R5 = [ 1 −4  6 −4  1]

These mnemonics stand for Level, Edge, Spot, Wave and Ripple. From these one-dimensional convolution kernels we can produce 25 different two-dimensional kernels. These are rotationally variant, so some of them are combined to form convolution kernels which are invariant to rotation; the ten kernels obtained after combining are rotationally invariant (Appendix 3). Figure 4 shows the Laws texture features.
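Under the assumption that each two-dimensional kernel is the outer product of two one-dimensional kernels and that the rotation-invariant pair sums follow Appendix 3, the sketch below generates the ten combined Laws kernels and applies them to an image; the helper names are hypothetical.

```python
import numpy as np
from scipy.ndimage import convolve

# One-dimensional Laws kernels (Level, Edge, Spot, Wave, Ripple).
KERNELS_1D = {
    "L5": np.array([ 1,  4, 6,  4,  1]),
    "E5": np.array([-1, -2, 0,  2,  1]),
    "S5": np.array([-1,  0, 2,  0, -1]),
    "W5": np.array([-1,  2, 0, -2,  1]),
    "R5": np.array([ 1, -4, 6, -4,  1]),
}

def laws_energy_maps(image):
    """Return the ten rotation-invariant Laws maps F1..F10 (Appendix 3).

    Each 2-D kernel is the outer product of two 1-D kernels; the pair
    (AB, BA) is summed to obtain rotational invariance.
    """
    pairs = [("E5", "L5"), ("S5", "L5"), ("W5", "L5"), ("R5", "L5"),
             ("S5", "E5"), ("W5", "E5"), ("R5", "E5"),
             ("W5", "S5"), ("R5", "S5"), ("R5", "W5")]
    maps = {}
    for n, (a, b) in enumerate(pairs, start=1):
        k = np.outer(KERNELS_1D[a], KERNELS_1D[b]) + \
            np.outer(KERNELS_1D[b], KERNELS_1D[a])
        maps[f"F{n}"] = np.abs(convolve(image.astype(float), k))
    return maps
```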

Fig. 4 Laws texture features. a Original image, b F1, c F2, d F3, e F4, f F5, g F6, h F7, i F8, j F9, k F10


3.3 Machine Learning for Video Transition

The objective of this work is to model Fade Transition in a video using the video transformed into the texture domain. The modelling is carried out in two phases: a Learning Phase and an Identification Phase. To illustrate the idea behind this research, consider the Energy feature of the GLCM texture method. For each frame of an input video containing Fade Transition, we extract the texture feature Energy to generate a new video that corresponds to the Energy feature of GLCM texture. To analyse the response of the new video during the transition and to formulate the rules, we calculate for each frame of the transformed video the characteristic properties of its histogram, such as Entropy, Skewness, Kurtosis and Spatial Frequency [13]. During the transition we study the response of each frame of the transformed video by observing the change in the values of these characteristic properties, using the gradient descent sign to study the nature of the change.
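A minimal sketch of these per-frame measurements is given below, assuming the standard statistical definitions of skewness and kurtosis and the usual row/column-frequency definition of spatial frequency; the exact formulas of [13] may differ.

```python
import numpy as np
from scipy.stats import skew, kurtosis, entropy

def frame_properties(frame):
    """Histogram properties of one texture-transformed frame (2-D array)."""
    counts, _ = np.histogram(frame, bins=256)
    p = counts / counts.sum()                    # grey-level distribution
    vals = frame.astype(float).ravel()
    # row and column frequencies for the usual spatial-frequency measure
    rf = np.sqrt(np.mean(np.diff(frame.astype(float), axis=1) ** 2))
    cf = np.sqrt(np.mean(np.diff(frame.astype(float), axis=0) ** 2))
    return {"entropy": entropy(p),
            "skewness": skew(vals),
            "kurtosis": kurtosis(vals),
            "spatial_frequency": float(np.hypot(rf, cf))}

def gradient_descent_sign(values):
    """+1 where (next frame - present frame) >= 0, else -1 (Sect. 3.4)."""
    d = np.diff(np.asarray(values, dtype=float))
    return np.where(d >= 0, 1, -1)
```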

3.4 Learning Phase

The entire process of detecting a transition follows the principles of Machine Learning. A large number of videos from different categories containing Fade Transition are used as input for the Learning Phase. Using the texture methods described in Sect. 3.2 (Grey Level Co-occurrence Matrix, Statistical Texture and Laws Texture Energy), the response of the gradient descent of the histogram features is studied to form the rules. To start with, an input video containing Fade Transition is converted to the texture domain, and the histogram properties [13] are calculated for each transformed frame. Property values are normalized to the range 0-1 for ease of comparison. A graphical representation is used to observe the response of the histogram properties during the transition [13]. Based on the response of these properties, rules are formulated using the gradient descent sign; only those rules are retained that are found to be consistent for any category of video. A consistent response here refers to a rule which is unique to a transition and whose response is the same for different categories of video. The process is shown in Fig. 5a. In GLCM texture, the Energy, Homogeneity and Maximum Probability features gave a consistent response for Fade Transition. In Laws Texture Energy, texture feature F5 showed a consistent response to Fade Transition. Statistical Texture failed to produce any output, as will be shown in the results section. These selected texture features are used in the Learning Phase to study the response of Fade Transition for any unknown video. As an example, the GLCM Energy feature is applied to the input frame sequence shown in Fig. 6; the input video then corresponds to the Energy feature of the GLCM texture method. For these frames, normalized values of the histogram properties are plotted. The graph of an individual property will vary, but its behaviour remains the same. Figure 7 shows the response of Skewness and Kurtosis.


Fig. 5 a Pictorial representation of the Learning Phase, b pictorial representation of the Identification Phase

Fig. 6 Input frame sequence for Fade Transition

Fig. 7 The response of Skewness and Kurtosis for GLCM texture's Energy feature for the video sequence shown in Fig. 6 (normalized value versus frame number)


Fig. 8 a Response of Skewness, b response of Kurtosis for GLCM texture's Energy feature (normalized value versus frame number)

Figure 8a, b shows the response of Skewness and Kurtosis, respectively. Now the gradient descent (next frame minus present frame) of an individual property over two successive frames is considered, i.e. the difference between the frames. If the difference is greater than or equal to 0, the gradient descent sign is set to +1; if the difference is less than 0, it is set to −1. The normalized values of Skewness and Kurtosis are shown in Fig. 9a and c, respectively, and their gradient descent signs in Fig. 9b and d. The gradient descent values of the histogram properties are used to formulate the rules. From Fig. 9b and d we observe that the gradient descent signs of Skewness and Kurtosis are almost the same. From this observation we conclude that the responses of Skewness and Kurtosis have a linear relationship, i.e. they follow each other. The same relationship is found for four other input data sets containing Fade Transition, so a plot of Skewness versus Kurtosis is obtained for five input Fade Transition videos: Fig. 10a shows the data obtained from the input videos, and Fig. 10b the polynomial fitted to these data.

Fig. 9 a Value of Skewness for each frame: 0.032, 0, 0.026, 0.011, 0.05, 1, 0.02, 0.14, 0.03, 0.20; b gradient descent sign of Skewness: +1, −1, +1, −1, −1, +1, −1, +1, −1; c value of Kurtosis for each frame: 0.001, 0, 0.005, 0.022, 0.01, 1, 0.005, 0.034, 0.004, 0.06; d gradient descent sign of Kurtosis: +1, −1, +1, −1, −1, +1, −1, +1, −1

Table 1 Polynomial coefficients for Eq. 1

a0 = −0.005, a1 = −0.4276, a2 = 31.842, a3 = −350.81, a4 = 1758.81, a5 = −4709.1, a6 = 7270.58, a7 = −6531.5, a8 = 3187.19, a9 = −655.57

Fig. 10 Plot of Skewness (x-axis) versus Kurtosis (y-axis): a experimental data, b experimental data with polynomial fit

The respective polynomial equation for the curve shown in Fig. 10b is

k = a_0 + a_1 s + a_2 s^2 + a_3 s^3 + a_4 s^4 + a_5 s^5 + a_6 s^6 + a_7 s^7 + a_8 s^8 + a_9 s^9   (Eq. 1)

where k is Kurtosis and s is Skewness.

The respective polynomial coefficients are shown in Table 1. Hence the rule formulated is: (1) Skewness and Kurtosis have a linear relationship, i.e. they follow each other.
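As an illustration of the curve fitting used in the Learning Phase, the sketch below fits a ninth-degree polynomial of Kurtosis against Skewness and evaluates it on new data; the function names and the use of NumPy's polynomial module are our own assumptions, not the authors' implementation.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def learn_polynomial(skewness, kurtosis, degree=9):
    """Learning Phase: fit k = a0 + a1*s + ... + a_d*s^d (cf. Eq. 1, Table 1).

    Returns the coefficients ordered a0..a_d.
    """
    return P.polyfit(np.asarray(skewness), np.asarray(kurtosis), degree)

def predict_kurtosis(coeffs, skewness):
    """Identification Phase: evaluate the learnt polynomial on new Skewness."""
    return P.polyval(np.asarray(skewness), coeffs)
```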

3.5 Verification Phase

With the help of the polynomial obtained in the Learning Phase, it is verified whether the Kurtosis values produced by the polynomial match the experimental Kurtosis data (Figs. 11, 12 and 13).


Fig. 11 a Experimental data of Skewness, b experimental data of Kurtosis, c data of Kurtosis obtained from polynomial Eq. 1:

Skewness | Kurtosis (Experimental) | Kurtosis (Polynomial)
0.032 | 0.001 | 0.001
0 | 0 | 0
0.026 | 0.005 | 0.005
0.011 | 0.022 | 0
0.05 | 0.01 | 0.021
1 | 1 | 0.99
0.02 | 0.005 | 0.001
0.14 | 0.034 | 0.072
0.03 | 0.004 | 0.007
0.20 | 0.06 | 0.078

Fig. 12 Representation of the three data series of Fig. 11 (experimental Kurtosis, Skewness and polynomial Kurtosis; normalized value versus frame number)

Fig. 13 The plot of Skewness versus Kurtosis for four input data sets

Table 2 Quality of fit values for five data sets in the Learning Phase and four data sets in the Identification Phase

Quality of fit | Learning phase | Identification phase
SSE | 0.07297 | 0.07369
MSE | 0.001459 | 0.00184
RMSE | 0.03819 | 0.04289

3.6 Identification Phase

In this phase, the polynomial equation along with the rules from the Learning Phase is used to detect the kind of transition a video contains. Texture features which give a consistent response across videos of different categories are selected to form the rules during the Learning Phase, and these rules are verified with known data. Given a raw input video, using the rules obtained in the Learning Phase, we can identify the type of transition the video contains. For GLCM's Energy texture feature, the corresponding polynomial equation is Eq. 1 and the coefficients are shown in Table 1; the rule obtained is: (1) Skewness and Kurtosis have a linear relationship, i.e. they follow each other. In this phase a total of four Fade data sets are used as input. To measure the quality of the polynomial fit, the Sum of Squared Errors (SSE), Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) are used. Green dots represent the actual value of Kurtosis against Skewness; the red line is the curve generated by feeding the Skewness values into the polynomial obtained in the Learning Phase. From each video sequence 10 frames are used, so a total of 40 frames are used (Table 2).
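A small helper for the three quality-of-fit measures named above might look as follows; this is a sketch under the standard definitions of SSE, MSE and RMSE.

```python
import numpy as np

def quality_of_fit(k_experimental, k_polynomial):
    """SSE, MSE and RMSE between experimental and polynomial Kurtosis."""
    err = np.asarray(k_experimental, float) - np.asarray(k_polynomial, float)
    sse = float(np.sum(err ** 2))
    mse = sse / err.size
    return {"SSE": sse, "MSE": mse, "RMSE": float(np.sqrt(mse))}
```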

4 Experimental Study and Results

A total of 30 videos from various categories are used: 10 each containing Fade, Dissolve and Wipe transitions. The number of input data sets for the Learning Phase is 5, and for the Identification Phase 4. To obtain texture features consistent with Fade Transition, more than 1500 graphs were analysed. In this section we present the results in three parts.

Part I: Modelling Fade Transition using Grey Level Co-occurrence Matrix (GLCM) texture

Feature: Homogeneity

Learning Phase
The algorithm discussed in Sect. 3 is implemented for 5 input data sets containing Fade Transition. The input video is converted to GLCM's Homogeneity feature.


Fig. 14 The plot of Skewness versus Kurtosis

Figure 14 represents the plot of Skewness versus Kurtosis. Green dots represent the experimental data and the blue curve represents the polynomial obtained for the experimental data. The rule obtained is: (1) Skewness and Kurtosis have a linear relationship, i.e. they follow each other. The respective polynomial equation for the curve shown in Fig. 14 is

k = a_0 + a_1 s + a_2 s^2 + a_3 s^3 + a_4 s^4 + a_5 s^5 + a_6 s^6   (Eq. 2)

where k is Kurtosis and s is Skewness.

The respective polynomial coefficients are given in Table 3.

Identification Phase
The polynomial Eq. 2 obtained in the Learning Phase is used here. Green dots represent the actual value of Kurtosis against Skewness; the red line is the curve generated by feeding the Skewness values into the polynomial obtained in the Learning Phase (Fig. 15).

Feature: Maximum Probability

Learning Phase
The algorithm discussed in Sect. 3 is implemented for five input data sets containing Fade Transition. The input video is converted to GLCM's Maximum Probability feature. Figure 16a represents the plot of Skewness versus Kurtosis, and Fig. 16b the plot of Entropy versus Spatial Frequency. Green dots represent the experimental data and the blue curve the polynomial obtained for it. The rules obtained are:
1. Skewness and Kurtosis have a linear relationship, i.e. they follow each other.
2. Entropy and Spatial Frequency have a linear relationship, i.e. they follow each other.

Fig. 15 The plot of Skewness versus Kurtosis

Fig. 16 a The plot of Skewness versus Kurtosis, b plot of Entropy versus Spatial Frequency

The respective polynomial equation for the curve shown in Fig. 16a is

k = a_0 + a_1 s + a_2 s^2 + a_3 s^3 + a_4 s^4 + a_5 s^5 + a_6 s^6 + a_7 s^7 + a_8 s^8   (Eq. 3)

where k is Kurtosis and s is Skewness; the respective polynomial coefficients are given in Table 5. The respective polynomial equation for the curve shown in Fig. 16b is

sf = a_0 + a_1 e + a_2 e^2 + a_3 e^3 + a_4 e^4 + a_5 e^5 + a_6 e^6 + a_7 e^7 + a_8 e^8 + a_9 e^9 + a_10 e^10   (Eq. 4)

where sf is Spatial Frequency and e is Entropy; the respective polynomial coefficients are given in Table 6.

Identification Phase
The polynomial Eqs. 3 and 4 obtained in the Learning Phase are used here. In Fig. 17a, green dots represent the actual value of Kurtosis against Skewness; the red line is the curve generated by feeding the Skewness values into the polynomial obtained in the Learning Phase. Similarly, in Fig. 17b green dots represent the actual value of Spatial Frequency against Entropy.


Fig. 17 a Plot of skewness versus Kurtosis, b plot of entropy versus spatial frequency

The red line is the curve generated by feeding the Entropy values into the polynomial obtained in the Learning Phase.

Part II: Modelling Fade Transition using the Laws Texture method

Feature: F5

Learning Phase
The algorithm discussed in Sect. 3 is implemented for five input data sets containing Fade Transition. The input video is converted to Laws' F5 feature. Figure 18 represents the plot of Skewness versus Spatial Frequency. Green dots represent the experimental data and the blue curve represents the polynomial obtained for the experimental data. The rule obtained is: (1) Skewness and Spatial Frequency invert each other.

Fig. 18 The plot of Skewness versus Spatial Frequency


Fig. 19 The plot of skewness versus spatial frequency

The respective polynomial equation for the curve shown in Fig. 18 is

p = a_0 + a_1 s + a_2 s^2 + a_3 s^3 + a_4 s^4 + a_5 s^5 + a_6 s^6 + a_7 s^7   (Eq. 5)

where p is Spatial Frequency and s is Skewness; the respective polynomial coefficients are given in Table 9.

Identification Phase
The polynomial Eq. 5 obtained in the Learning Phase is used here. Green dots represent the actual value of Spatial Frequency against Skewness; the red line is the curve generated by feeding the Skewness values into the polynomial obtained in the Learning Phase (Fig. 19).

Feature: F3

Learning Phase
The algorithm discussed in Sect. 3 is implemented for five input data sets containing Fade Transition. The input video is converted to Laws' F3 feature. Figure 20 represents a plot between Contrast and Entropy for Laws' F3 feature. The experimental data are spread all over the graph, so it is not possible to fit a polynomial to obtain a relationship between Contrast and Entropy.

Part III: Modelling Fade Transition using the Statistical Texture method

Feature: Variance NOC1

Learning Phase
The algorithm discussed in Sect. 3 is implemented for five input data sets containing Fade Transition. The input video is converted to the Statistical method's Variance NOC1 feature. Figure 21 represents a plot between Mean and Kurtosis for this feature. The experimental data are spread all over the graph, so it is not possible to fit a polynomial to obtain a relationship between Mean and Kurtosis (Tables 3, 4, 5 and 6).


Fig. 20 The plot between Contrast and Entropy

Fig. 21 The plot between Mean and Kurtosis

Table 3 Polynomial coefficients for Eq. 2

a0 = −0.0076, a1 = 1.7989, a2 = −10.251, a3 = 33.318, a4 = −46.899, a5 = 29.9741, a6 = −6.9272

Table 4 Quality of fit values for five data sets in the Learning Phase and four data sets in the Identification Phase

Quality of fit | Learning phase | Identification phase
SSE | 0.14671 | 0.18858
MSE | 0.00293 | 0.00471
RMSE | 0.05413 | 0.006863

Table 5 Polynomial coefficients for Eq. 3

a0 = 0.0005, a1 = 0.20599, a2 = 8.77101, a3 = −73.192, a4 = 286.698, a5 = −577.19, a6 = 620.688, a7 = −338.16, a8 = 73.189

Table 6 Polynomial coefficients for Eq. 4

a0 = −0.085, a1 = 11.686, a2 = −258.06, a3 = 2539.58, a4 = −12630, a5 = 36146.7, a6 = −63669, a7 = 70633.4, a8 = −48384, a9 = 18788.8, a10 = −3176.7

On comparing the quality of the polynomial fits in Tables 2, 4, 7 and 8, the Maximum Probability feature of GLCM texture gives the best result. The respective rule is: (1) Skewness and Kurtosis have a linear relationship, i.e. they follow each other.

Table 7 Quality of fit values for five data sets in the Learning Phase and four data sets in the Identification Phase for Skewness and Kurtosis

Quality of fit | Learning phase | Identification phase
SSE | 0.0462 | 0.11279
MSE | 0.000924 | 0.002819
RMSE | 0.0304 | 0.05309

Table 8 Quality of fit values for five data sets in the Learning Phase and four data sets in the Identification Phase for Entropy and Spatial Frequency

Quality of fit | Learning phase | Identification phase
SSE | 0.78801 | 0.15179
MSE | 0.01576 | 0.00379
RMSE | 0.12554 | 0.006156


5 Evaluation

The previous section concluded that polynomial 4, i.e. the Maximum Probability feature of GLCM texture, gives the best result. In this section we feed in three Fade videos (Fade video 1, Fade video 2 and Fade video 3) in order to evaluate the result obtained. The algorithm discussed in Sect. 3 is implemented for the frame sequences shown in Figs. 22, 23 and 24. Each input video is converted to GLCM's Maximum Probability feature, and polynomial 4 obtained in Sect. 4 is evaluated (Tables 9, 10, 11, 12, 13, 14, 15 and 16).

Fig. 22 Frame sequence of Fade Transition

Fig. 23 Frame sequence of Fade Transition

Fig. 24 Frame sequence of Fade Transition

Table 9 Polynomial coefficients for Eq. 5

a0 = 0.0443, a1 = 0.1259, a2 = −0.0084, a3 = 0.00027, a4 = −4.38e−06, a5 = 3.54e−08, a6 = −1.30e−10, a7 = 1.49e−13

Table 10 Quality of fit values for five data sets in the Learning Phase and four data sets in the Identification Phase

Quality of fit | Learning phase | Identification phase
SSE | 0.14671 | 0.18858
MSE | 0.00293 | 0.00471
RMSE | 0.05413 | 0.006863

Table 11 Experimental data of Skewness and Kurtosis along with the value of Kurtosis obtained from polynomial 4 for Fade video 1

Skewness | Kurtosis (Experimental) | Kurtosis (Polynomial)
0 | 0 | 0.0005
0.10031 | 0.0455515 | 0.0593063
0.471687 | 0.303246 | 0.341571
0.505725 | 0.334882 | 0.376819
0.461615 | 0.293567 | 0.331025
1 | 1 | 1.01048
0.851823 | 0.756121 | 0.751584
0.738473 | 0.595775 | 0.598527
0.511025 | 0.340736 | 0.382219
0.410735 | 0.249803 | 0.278134

Table 12 Quality of fit for Table 11

SSE | 0.0074816
MSE | 0.00074816
RMSE | 0.0273525

6 Conclusion

Texture methods can distinguish Fade Transition from the rest. Grey Level Co-occurrence Matrix (GLCM) and Laws Texture Energy gave a consistent response for differentiating Fade Transition, compared with our previous work [13], which made use of histogram properties alone. The methodology discussed in this work is to be applied to Dissolve and Wipe Transitions to study their behaviour. In this paper we have explored only the texture domain; other domains, such as Wavelet, should also be explored.

Table 13 Experimental data of Skewness and Kurtosis along with the value of Kurtosis obtained from polynomial 4 for Fade video 2

Skewness | Kurtosis (Experimental) | Kurtosis (Polynomial)
0.614324 | 0.330269 | 0.480712
0.970193 | 0.928107 | 0.961538
1 | 1 | 1.01048
0.997019 | 0.994909 | 1.00587
0.42334 | 0.161315 | 0.291072
0.391485 | 0.140276 | 0.258789
0.364271 | 0.123274 | 0.232619
0.312565 | 0.0957368 | 0.188129
0.112854 | 0.0223466 | 0.0673987
0 | 0 | 0.0005

Table 14 Quality of fit for Table 13

SSE | 0.0773857
MSE | 0.00773857
RMSE | 0.0879691

Table 15 Experimental data of Skewness and Kurtosis along with the value of Kurtosis obtained from polynomial 4 for Fade video 3

Skewness | Kurtosis (Experimental) | Kurtosis (Polynomial)
0.730343 | 0.574893 | 0.589797
0.612519 | 0.433578 | 0.479086
1 | 1 | 1.01048
0.865577 | 0.768754 | 0.774424
0.559894 | 0.377597 | 0.430417
0.637257 | 0.458815 | 0.501282
0.988982 | 0.987114 | 0.993093
0.923623 | 0.870971 | 0.877688
0 | 0 | 0.0005
0.0210997 | 0.0107535 | 0.0081181

Table 16 Quality of fit for Table 15

SSE | 0.00711661
MSE | 0.000711661
RMSE | 0.026677



Acknowledgements The authors express their sincere gratitude to Prof. N. R. Shetty, Advisor, and Dr. H. C. Nagaraj, Principal, Nitte Meenakshi Institute of Technology, for giving constant encouragement and support to carry out research at NMIT. The authors also thank the Vision Group on Science and Technology (VGST), Government of Karnataka, for acknowledging their research and for providing financial support to set up the infrastructure required to carry out the research.

Appendix 1: Grey Level Co-occurrence Matrix (GLCM)

Haralick proposed 14 features, of which we use the following six:

(1) Energy: the Angular Second Moment, also known as Uniformity or Energy; the sum of squares of the entries in the GLCM: Energy = \sum_{i,j=0}^{N-1} P_{i,j}^2
(2) Entropy: the amount of information in the image needed for image compression: Entropy = \sum_{i,j=0}^{N-1} P_{i,j} (-\ln P_{i,j})
(3) Contrast: the difference between the highest and the smallest values of an adjacent set of pixels: Contrast = \sum_{i,j=0}^{N-1} P_{i,j} (i - j)^2
(4) Maximum Probability: records, in the centre pixel of the window, the largest P_{i,j} value found within the window: MP = \max(P_{i,j})
(5) Homogeneity: weighs values by the inverse of the Contrast weight, with weights decreasing exponentially away from the diagonal: Homogeneity = \sum_{i,j=0}^{N-1} P_{i,j} / (1 + (i - j)^2)
(6) Variance: a measure of the dispersion of the values around the mean; it is similar to entropy.

Appendix 2: Statistical Texture method

The two geometrical attributes NOC1(α) and NOC0(α) are characterized using the following formulas:

1. Sample mean = \frac{1}{\sum_{\alpha=1}^{n_l-1} g(\alpha)} \sum_{\alpha=1}^{n_l-1} \alpha \, g(\alpha)

2. Sample variance = \frac{1}{\sum_{\alpha=1}^{n_l-1} g(\alpha)} \sum_{\alpha=1}^{n_l-1} (\alpha - \text{sample mean})^2 \, g(\alpha)


Appendix 3: Laws features

The 25 kernels are rotationally variant, so some of them are combined to form convolution kernels which are invariant to rotation. Rotational invariance has a major impact on pattern recognition and pattern classification, so we denote these new features with an appended 'R' for 'rotational invariance':

E5L5TR = E5L5T + L5E5T (F1)
S5L5TR = S5L5T + L5S5T (F2)
W5L5TR = W5L5T + L5W5T (F3)
R5L5TR = R5L5T + L5R5T (F4)
S5E5TR = S5E5T + E5S5T (F5)
W5E5TR = W5E5T + E5W5T (F6)
R5E5TR = R5E5T + E5R5T (F7)
W5S5TR = W5S5T + S5W5T (F8)
R5S5TR = R5S5T + S5R5T (F9)
R5W5TR = R5W5T + W5R5T (F10)

References

1. Gajera AC, Mehta RG (2006) Detecting fade and dissolve transition in compressed domain. J Inf Knowl Res Electron Commun Eng
2. Porter S, Mirmehdi M, Thomas B. Detection and classification of shot transitions. Department of Computer Science, University of Bristol, Bristol BS8 1UB, UK
3. Truong BT, Dorai C, Venkatesh S (2000) Improved fade and dissolve detection for reliable video segmentation. In: Proceedings of the IEEE international conference on image processing (ICIP 2000). IEEE
4. Bansod AM, Kuttarmare AP, Aher MB, Chavan SA. Comparison of techniques for detection of fades and dissolves in video sequences. IOSR J Electr Electron Eng (IOSR-JEEE)
5. Song H, Wu X, Yu W, Jiam Y (2017) Extracting key segments of videos for event detection by learning from web sources. IEEE Trans Multimed
6. Sun J, Wan Y (2014) A novel metric for efficient video shot boundary detection. IEEE
7. Fani M, Yazdi M (2014) Shot boundary detection with efficient prediction of transitions positions and spans by use of classifiers and adaptive thresholds. In: Iranian conference on electrical engineering
8. Zhang X, Yang Y, Zhang Y, Luan H, Li J, Zhang H, Chua T-S (2015) Enhancing video event recognition using automatically constructed semantic-visual knowledge base. IEEE Trans Multimed
9. Zhu X, Huang Z, Cui J, Shen HT (2003) Video-to-shot tag propagation by graph sparse group Lasso. IEEE Trans Multimed
10. Duan X, Lin L, Chao H (2013) Discovering video shot categories by unsupervised stochastic graph partition. IEEE Trans Multimed
11. Tippaya S, Sitjongsataporn S, Tan T, Mehmood Khan M, Chamnongthai K (2017) Multi-modal visual features-based video shot boundary detection. IEEE


12. Fu M, Zhang D, Kong M, Luo B (2010) Shot transition detection based on locally linear embedding. In: 2010 international conference on measuring technology and mechatronics automation
13. Peter Hudson A, Giridhar NR, Hari Venkata Deepak S, Majumdar J (2017) Transition based on histogram properties. Int J Eng Res Technol (IJERT) 6(10)
14. Haralick RM, Shanmugam K, Dinstein I (1973) Textural features for image classification. IEEE Trans Syst Man Cybern 3(6):610-621
15. Haralick RM (1979) Statistical and structural approaches to texture. Proc IEEE 67(5), May 1979
16. Laws K (1980) Rapid texture identification. In: Proceedings of SPIE, image processing for missile guidance, vol 238, pp 376-380

Chapter 16

Video Shot Detection and Summarization Using Features Derived From Texture

Jharna Majumdar, M. P. Ashray, H. M. Madhan and Dhanush M. Adiga

1 Introduction

• A feature represents various attributes of an image, such as color, shape, texture and orientation. Feature extraction involves representing an input image as a feature or a set of features.
• A video may be composed of a number of scenes or scenarios. A series of frames/images in a video that belongs to a particular scene is known as a shot. When traversing a video, the transition from one shot to another can be categorized as a hard or a soft transition. Our work is focused on detecting video shots that fall into the category of hard transitions. A simple approach, the histogram difference method, has been adopted to detect changes in transition.
• Video summarization eases the process of analyzing the contents of a video. The important step in video summarization is key-frame selection from the input video to summarize the rest of the frames. A summary of a video can be created by extracting significant key-frames with a key-frame extraction method. This process is known as video summarization (Fig. 1).

This paper is focused on the extraction of key-frames using texture features, namely the Gray Level Co-occurrence Matrix (GLCM) and the texture spectrum. Video shot detection in the intensity domain may lead to results that are susceptible to illumination conditions, noise, and variance in rotation. To make the process of video shot detection immune to noise and illumination conditions, and invariant to rotation, we transform the input video from the intensity domain to the texture domain (Figs. 2, 3).



Fig. 1 Hard cut transition from one shot to other between frames 736 and 737

Fig. 2 Video summarization process by selecting the key-frames from a shot

The rest of the paper is organized in the following way: related work is discussed in Sect. 2; Sect. 3 throws light on the various texture feature extraction processes; the video shot detection method adopted is discussed in Sect. 4, followed by the video summarization process in Sect. 5; experimental results and analysis of the outcome are presented in Sect. 6.

2 Related Work

• There are a number of techniques that can be adopted to accomplish the task of detecting video shots in a video. There is also a wide range of feature extraction methods available that can be used to extract a number of features from an image. Combined, they yield a number of ways to accomplish the task.


Fig. 3 Video shot detection framework

• The features of the gray level co-occurrence matrix have proven promising for video shot detection in the work carried out by Priyanka and Jharna Majumdar [1]. However, no comparison is carried out in their work to find the best feature among the six Haralick features considered. A series of works by Li Wang and Dong-Chen He paves the way for a number of applications of the texture features of an image [2-4].
• Upesh Patel et al. proposed two methods for shot detection [5]. The first makes use of the traditional pixel-wise difference between two frames with an adaptive threshold to determine whether a shot change has occurred; the other uses the color histograms of the two frames being compared. The techniques employed for measuring the similarity between two histograms are discussed in the further sections. It is proven in the work carried out by Jharna Majumdar et al. [6] that the histogram difference method yields better results than the traditional pixel difference method. Lakshmi Priya [7] discusses a new approach to video cut transition detection based on the block-wise histogram differences of two consecutive frames of a video sequence in RGB color space; a new method of finding the threshold for detecting the shots is also put forward in this work.

3 Feature Extraction

The feature extraction process involves representing an image as a feature or a set of features. Primitive or low-level image features can be either general features, such as color, texture and shape, or domain-specific features. Using color as a feature proves to be ineffective, mainly because multiple images with different contents may have similar color histograms, and because the results vary depending on the illumination conditions; identifying a transition or cut becomes a problem when the illumination conditions vary in the video (Fig. 4).


Fig. 4 Classification of texture-based feature extraction method


3.1 Gray Level Co-occurrence Matrix

Initially proposed by Robert Haralick et al. [8], the Gray Level Co-occurrence Matrix (GLCM) is one of the most popular statistical methods for examining texture. It works by considering the spatial relationship of pixels in an image. A total of 14 features were originally proposed by Haralick for the GLCM. Since six features of the GLCM provided promising results [1], the question arises which feature among the six provides the best performance for video shot detection. To answer this question, a comparison of the six features is carried out to find the best or optimal one for our application of video shot detection (Fig. 5).

Fig. 5 An overview of the Haralick features extracted from the intensity image


3.2 Texture Spectrum

Li Wang and D. C. He put forward a promising new statistical approach for analyzing texture [2]. Over a series of publications [2-4], various aspects, ranging from the introduction and working of the texture spectrum to its application in texture characterization and classification, are explained in detail. The texture unit is the most basic unit in an image that best characterizes the local texture aspects of the image. A texture unit is computed by applying (1) to each element of a 3 × 3 window of the image:

E_i = 0 if V_i < V_0;  E_i = 1 if V_i = V_0;  E_i = 2 if V_i > V_0   (1)

where V_i for i = {1, 2, ..., 8} is the set of neighboring pixels of the central pixel V_0, and E_i denotes the element of the texture unit formed for i = {1, 2, ..., 8}; each element E_i occupies the same position as pixel i. The computation of the texture unit is illustrated in Fig. 6. Each element of a texture unit takes one of three possible values (0, 1, 2) after conversion. By combining all eight elements in a clockwise or anticlockwise manner, we get the texture unit number; there are 3^8 = 6561 possible values that a texture unit number can take. The Texture Unit Number (N_TU) is calculated using (2):

N_TU = \sum_{i=1}^{8} E_i \cdot 3^{i-1}   (2)

where N_TU represents the texture unit number and E_i is the ith element of the texture unit set TU = {E_1, E_2, ..., E_8}. The value of N_TU varies from 0 to 6560 (Table 1). In the formulas of Table 1, S(i) is the occurrence frequency of the texture unit numbered i = 0, 1, ..., 6560, and K(i) is the number of pairs that have the same value in the element pairs (E_1, E_5), (E_2, E_6), (E_3, E_7) and (E_4, E_8) (Fig. 7).
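As a sketch of Eqs. (1) and (2), the function below computes the texture unit number of a 3 × 3 window; the clockwise neighbour ordering starting at the top-left corner is an assumption, since any fixed ordering yields a valid labelling.

```python
import numpy as np

def texture_unit_number(window):
    """Texture unit number N_TU of a 3x3 grey-level window (Eqs. 1 and 2)."""
    v0 = window[1, 1]
    # neighbours taken clockwise from the top-left corner (an assumption)
    neighbours = np.array([window[0, 0], window[0, 1], window[0, 2],
                           window[1, 2], window[2, 2], window[2, 1],
                           window[2, 0], window[1, 0]])
    e = np.where(neighbours < v0, 0, np.where(neighbours == v0, 1, 2))
    powers = 3 ** np.arange(8)          # 3^(i-1) for i = 1..8
    return int(np.sum(e * powers))      # value in [0, 6560]
```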

Fig. 6 a 3 × 3 window of an image. b Corresponding texture unit


Table 1 Features of texture spectrum

Black and white symmetry: BWS = [1 − \sum_{i=0}^{3279} |S(i) − S(6560 − i)| / \sum_{i=0}^{6560} S(i)] × 100

Degree of direction: DD = [1 − (1/6) \sum_{m=1}^{3} \sum_{n=m+1}^{4} \sum_{i=0}^{6560} |S_m(i) − S_n(i)| / (2 \sum_{i=0}^{6560} S_m(i))] × 100

Geo symmetry: GS = [1 − (1/4) \sum_{j=1}^{4} \sum_{i=0}^{6560} |S_j(i) − S_{j+4}(i)| / (2 \sum_{i=0}^{6560} S_j(i))] × 100

Center symmetry: CS = \sum_{i=0}^{6560} S(i) [K(i)]^2

Fig. 7 An Overview of the features extracted by texture spectrum

4 Video Shot Detection

A similarity or distance measure is needed to compare the similarity or dissimilarity of the contents of a pair of image frames in a video. From this measure, an adaptive threshold is computed to carry out the video shot detection process: a shot change is said to have occurred at a particular frame when the frame's distance measure is greater than the adaptive threshold. The adaptive threshold is calculated from the mean and the standard deviation of the frames in the video (3):

Adaptive Threshold = Mean + α × Standard Deviation   (3)

Table 2 Distance measures used for comparison

Chi-square distance: D = (1/N^2) \sum_{i=1}^{256} (h_1[i] − h_2[i])^2 / max(h_1[i], h_2[i]), for h_1[i] ≠ 0 and h_2[i] ≠ 0

Bhattacharyya distance: D = \sqrt{1 − \sum_i \sqrt{H_1(i) · H_2(i)}}

Minkowski distance (order 3): D = (\sum_i |f_i(I) − f_i(I′)|^3)^{1/3}

In the above equation, the mean and the standard deviation are calculated from the feature values generated by the feature extraction methods; α is a predetermined value that regulates the number of shots detected. The method of video shot detection chosen in this work is the histogram difference method, which has been proven effective in comparison with the pixel-wise difference method. A histogram is plotted for every frame of the feature video generated by the above feature extraction methods. Once the histogram is plotted, a distance measure is used to find the change in content between consecutive frames: a high value of the distance measure for a pair of consecutive frames indicates that the content of the frame has changed drastically. This technique is used to detect the transition or cut. A number of distance measures can be used for the histogram difference; we use the three distance measures listed in Table 2.
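A minimal sketch of this detection rule, assuming a precomputed sequence of inter-frame histogram distances, is shown below; the function name and the NumPy-based formulation are our own.

```python
import numpy as np

def detect_shot_boundaries(distances, alpha=4.0):
    """Flag frames whose histogram distance to the previous frame exceeds
    the adaptive threshold of Eq. (3): mean + alpha * standard deviation."""
    d = np.asarray(distances, dtype=float)
    threshold = d.mean() + alpha * d.std()
    return np.flatnonzero(d > threshold)   # indices of boundary frames
```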

5 Video Summarization

A brief video highlighting the key and vital aspects of the original video is created by the process of video summarization. We have chosen to carry out this task by identifying the key-frames in a shot and combining them to form a summarized version of the video, adopting key-frame identification by clustering. Traditional methods such as the K-means clustering algorithm require the value of K (the number of clusters) to be specified as input by the user. To make the selection of key-frames automatic for any input video, we have chosen Affinity Propagation as our clustering algorithm. A video may contain a number of shots; summarization takes place by identifying the key-frames in each video shot, so the video must first be divided into shots and each shot fed to the clustering algorithm.


5.1 Affinity Propagation

The Affinity Propagation algorithm was proposed by Brendan J. Frey et al. [9] in 2007. A similarity measure is used to compare the histograms of consecutive frames; a similarity value is generated for each pair, and this set of similarities serves as input to the clustering algorithm. The algorithm uses the median of the input similarities as the regulator that controls the number of clusters formed; this value is known as the preference. Each S(i, i) is initialized with the median of the input similarities, where i ranges over the frames of a video shot. The minimum of the input values can also be taken as the preference if a small number of clusters is desired.

Algorithm. Input: similarity values of N data points.
• Compute the N × N similarity matrix of the N data points using the negative squared Euclidean distance (4):

S(i, j) = −||x_i − x_j||^2   (4)

• Assign the preference value (the median) to S(i, i), i.e. the diagonal elements of the matrix.
• Initialize the availability matrix to zero.
• Compute the responsibility matrix (5):

r(i, j) = s(i, j) − max_{k ≠ j} {a(i, k) + s(i, k)}   (5)

• Compute the availability matrix (6):

a(i, j) = min{0, r(j, j) + \sum_{k ∉ {i, j}} max{0, r(k, j)}}   (6)

• Compute the self-availability (7):

a(i, i) = \sum_{k ≠ i} max{0, r(k, i)}   (7)

After the algorithm is run, combining the availability and responsibility matrices yields the exemplars: for point i, the value of j that maximizes a(i, j) + r(i, j) identifies the exemplar. The algorithm can be iterated a number of times to obtain accurate results.
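In practice the clustering step can be delegated to an existing implementation; the sketch below uses scikit-learn's AffinityPropagation, which by default uses the negative squared Euclidean affinity and the median preference, matching the description above. The helper name is hypothetical.

```python
from sklearn.cluster import AffinityPropagation

def select_key_frames(features):
    """Cluster the frames of one shot and return exemplar (key-frame) indices.

    `features` is an (n_frames, n_dims) array, e.g. per-frame histogram
    features; the exemplars of the clusters are taken as key-frames.
    """
    ap = AffinityPropagation(random_state=0).fit(features)
    return ap.cluster_centers_indices_
```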


6 Experimental Results

Quality metrics are computed to measure the performance of the video shot detection method with the different feature extraction methods described before. The quality metrics procedure calculates three parameters:

1. Recall: the probability of an existing cut being detected,

V = C / (C + M)   (8)

2. Precision: the probability of a detected cut being correct,

P = C / (C + F)   (9)

3. F1: a combined measure of recall and precision,

F1 = 2 P V / (P + V)   (10)

where C is the number of correctly detected cuts, M the number of missed cuts and F the number of false detections.
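A direct transcription of Eqs. (8)-(10), assuming counts of correct, missed and false detections are available, might look as follows.

```python
def quality_metrics(c, m, f):
    """Recall V, precision P and F1 from Eqs. (8)-(10).

    c: correctly detected cuts, m: missed cuts, f: false detections.
    """
    recall = c / (c + m)
    precision = c / (c + f)
    f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, f1
```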

6.1 Video Shot Detection

See Tables 3 and 4 and Fig. 8.

6.2 Video Summarization

Only the optimal feature extraction methods identified for video shot detection are considered for the video summarization process, because the other methods failed in the prerequisite step, i.e. video shot detection (Table 5).

6.3 Analysis

The following observations can be made regarding the feature extraction methods and distance measures from Tables 3 and 4:


Table 3 F1 measure of GLCM features (bold in the original indicates the highest values)

Feature | Distance | α | Video-1 (13 shots) | Video-2 (17 shots) | Video-3 (16 shots)
Energy | Chi-square | 4 | 0.85 | 0.69 | 0.66
Energy | Chi-square | 5 | 0.83 | 0.58 | 0.60
Energy | Bhattacharyya | 4 | 0.88 | 0.71 | 0.76
Energy | Bhattacharyya | 5 | 0.72 | 0.64 | 0.72
Energy | Minkowski | 4 | 0.74 | 0.58 | 0.81
Energy | Minkowski | 5 | 0.72 | 0.52 | 0.72
Entropy | Chi-square | 4 | 0.61 | 0.74 | 0.85
Entropy | Chi-square | 5 | 0.60 | 0.64 | 0.66
Entropy | Bhattacharyya | 4 | 0.38 | 0.69 | 0.72
Entropy | Bhattacharyya | 5 | 0.21 | 0.58 | 0.47
Entropy | Minkowski | 4 | 0.69 | 0.57 | 0.81
Entropy | Minkowski | 5 | 0.57 | 0.58 | 0.72
Contrast | Chi-square | 4 | 0.86 | 0.58 | 0.72
Contrast | Chi-square | 5 | 0.7 | 0.58 | 0.47
Contrast | Bhattacharyya | 4 | 0.55 | 0.64 | 0.76
Contrast | Bhattacharyya | 5 | 0.55 | 0.58 | 0.54
Contrast | Minkowski | 4 | 0.88 | 0.64 | 0.72
Contrast | Minkowski | 5 | 0.86 | 0.64 | 0.60
IDM | Chi-square | 4 | 0.96 | 0.64 | 0.66
IDM | Chi-square | 5 | 0.81 | 0.52 | 0.54
IDM | Bhattacharyya | 4 | 0.63 | 0.64 | 0.66
IDM | Bhattacharyya | 5 | 0.55 | 0.52 | 0.60
IDM | Minkowski | 4 | 0.81 | 0.58 | 0.85
IDM | Minkowski | 5 | 0.81 | 0.52 | 0.76
DM | Chi-square | 4 | 0.86 | 0.58 | 0.72
DM | Chi-square | 5 | 0.7 | 0.52 | 0.66
DM | Bhattacharyya | 4 | 0.55 | 0.58 | 0.66
DM | Bhattacharyya | 5 | 0.47 | 0.52 | 0.54
DM | Minkowski | 4 | 0.76 | 0.69 | 0.72
DM | Minkowski | 5 | 0.7 | 0.64 | 0.66
Max. probability | Chi-square | 4 | 0.81 | 0.64 | 0.54
Max. probability | Chi-square | 5 | 0.76 | 0.58 | 0.54
Max. probability | Bhattacharyya | 4 | 0.76 | 0.74 | 0.72
Max. probability | Bhattacharyya | 5 | 0.69 | 0.64 | 0.60
Max. probability | Minkowski | 4 | 0.76 | 0.58 | 0.89
Max. probability | Minkowski | 5 | 0.78 | 0.58 | 0.76


Table 4 F1 measure of texture spectrum features (bold in the original indicates the highest values)

Feature | Distance | α | Video-1 (13 shots) | Video-2 (17 shots) | Video-3 (16 shots)
Geo symmetry | Chi-square | 4 | 0.64 | 0.49 | 0.72
Geo symmetry | Chi-square | 5 | 0.57 | 0.45 | 0.72
Geo symmetry | Bhattacharyya | 4 | 0.55 | 0.49 | 0.69
Geo symmetry | Bhattacharyya | 5 | 0.50 | 0.56 | 0.72
Geo symmetry | Minkowski | 4 | 0.91 | 0.61 | 0.72
Geo symmetry | Minkowski | 5 | 0.76 | 0.49 | 0.66
Center symmetry | Chi-square | 4 | 0.86 | 0.52 | 0.72
Center symmetry | Chi-square | 5 | 0.86 | 0.45 | 0.72
Center symmetry | Bhattacharyya | 4 | 0.91 | 0.69 | 0.81
Center symmetry | Bhattacharyya | 5 | 0.86 | 0.58 | 0.76
Center symmetry | Minkowski | 4 | 0.86 | 0.45 | 0.66
Center symmetry | Minkowski | 5 | 0.7 | 0.45 | 0.60
Black and white symmetry | Chi-square | 4 | 0.25 | 0.42 | 0.6
Black and white symmetry | Chi-square | 5 | 0.28 | 0.40 | 0.66
Black and white symmetry | Bhattacharyya | 4 | 0.24 | 0.37 | 0.55
Black and white symmetry | Bhattacharyya | 5 | 0.56 | 0.44 | 0.64
Black and white symmetry | Minkowski | 4 | 0.76 | 0.49 | 0.72
Black and white symmetry | Minkowski | 5 | 0.7 | 0.46 | 0.72
Degree of direction | Chi-square | 4 | 0.68 | 0.52 | 0.72
Degree of direction | Chi-square | 5 | 0.61 | 0.38 | 0.72
Degree of direction | Bhattacharyya | 4 | 0.64 | 0.52 | 0.76
Degree of direction | Bhattacharyya | 5 | 0.60 | 0.45 | 0.72
Degree of direction | Minkowski | 4 | 0.91 | 0.69 | 0.76
Degree of direction | Minkowski | 5 | 0.81 | 0.58 | 0.66

• Of the six Haralick features considered in the GLCM method, the Inverse Difference Moment (IDM) exhibits the best performance, followed by Energy.
• Among the four texture spectrum features, Center Symmetry exhibits the best performance.
• Of the three distance measures under consideration, the Minkowski distance can be considered the most suitable for video shot detection by the histogram difference method.
• Most of the feature extraction methods perform well with an alpha value (α) of 4.


Fig. 8 A graph depicting adaptive threshold during video shot detection

Table 5 Key-frames extracted in the summarized video (bold in the original indicates the smaller values)

Method | Feature | Video-1 (1345 frames) | Video-2 (1265 frames) | Video-3 (1379 frames)
GLCM | Energy | 173 | 246 | 270
GLCM | IDM | 276 | 263 | 318
Texture spectrum | Center symmetry | 277 | 400 | 265

7 Conclusion

Earlier methods of video shot detection and summarization involved processing videos directly in the intensity domain and using traditional clustering algorithms that require human intervention to select the number of clusters. Our work provides an alternative approach: detecting video shots by converting the video from the intensity domain to the texture domain before applying the shot detection techniques, thereby providing comparatively accurate results. The shot detection approach is different from, and more efficient than, the traditional pixel-wise difference method, and the summarization process is made automatic, without needing to know the number of key-frames to be selected from each shot.

Acknowledgements The authors express their sincere gratitude to Prof. N. R. Shetty, Advisor and Dr. H. C. Nagaraj, Principal, Nitte Meenakshi Institute of Technology for giving constant encouragement and support to carry out research at NMIT. The authors extend their thanks and gratitude to the Vision Group on Science and Technology (VGST), Government of Karnataka to acknowledge their research and providing financial support to set up the infrastructure required to carry out the research.


References

1. Priyanka AR, Majumdar J (2015) Video shot detection using texture feature. Int J Sci Res (IJSR)
2. Haralick RM, Shanmugam K, Dinstein IH (1973) Textural features for image classification. IEEE Trans Syst Man Cybern
3. Wang L, He DC (1990) A new statistical approach for texture analysis. Photogram Eng Remote Sens 56(1):61-66
4. He D-C, Wang L (1990) Texture unit, texture spectrum, and texture analysis. IEEE Trans Geosci Remote Sens 28(4):509-512
5. Wang L, He D-C (1990) Texture classification using texture spectrum. Pattern Recogn 23(8):905-910
6. Majumdar J, Aniketh M, Abhishek B, Hegde N (2017) Video shot detection in transform domain. In: 2017 2nd international conference for convergence in technology (I2CT)
7. Frey BJ, Dueck D (2007) Clustering by passing messages between data points. Sci Mag 315
8. Patel U, Shah P, Panchal P (2013) Shot detection using pixel-wise difference with adaptive threshold and color histogram method in compressed and uncompressed video. Int J Comput Appl 64(4):0975-8887
9. Lakshmi Priya GG, Domnic S (2010) Video cut detection using block based histogram differences in RGB color space

Chapter 17

Numerical Approximation of Caputo Definition and Simulation of Fractional PID Controller

Sachin Gade, Mahesh Kumbhar and Sanjay Pardeshi

1 Introduction

The process control industry consists of more than 90% Proportional Integral Derivative (PID) control action. Optimum values of the PID parameters (Kp, Ki and Kd) are essential for smooth, accurate and precise control action. The fractional PID controller emerged to overcome the limitations of the integer-order PID controller. Fractional PID is based on fractional calculus and uses the concepts of fractional-order modeling and the fractional transfer function. In integer-order differentiation, the operator D = d/dx is often used and can be extended to the nth-order operator D^n = d^n/dx^n, where n is a positive integer. The historical birth of fractional-order calculus dates to 1695, when L'Hopital asked Leibniz what meaning could be understood when n was a fraction [1]. Since then the concept of fractional calculus has drawn the attention of many mathematicians, but it was not until 1884 that the general operator theory was built up. The operator D^n is defined for differentiation and integration of arbitrary order, where n can be integer or fractional, positive or negative, real or complex, rational or irrational. Differentiation and integration are generalized by the operator {}_a D_t^\alpha, where a and t are the limits of the operator and \alpha \in R.



The integro-differential operator is given as [2-6]

{}_a D_t^\alpha = \begin{cases} d^\alpha/dx^\alpha, & \alpha > 0 \\ 1, & \alpha = 0 \\ \int_a^t (dx)^{-\alpha}, & \alpha < 0 \end{cases}   (1)

Three equivalent definitions are popularly used for differintegration: the Grunwald-Letnikov (GL) definition, the Riemann-Liouville (RL) definition and the Caputo definition. In many engineering applications, the Caputo definition is preferred [7, 8]. The Laplace-domain (S-domain) representations of the integer-order and fractional-order PID controllers are

C(S)/E(S) = K_P + K_I/S + K_D S   (Integer-order PID)   (2)

C(S)/E(S) = K_P + K_I/S^\lambda + K_D S^\mu   (Fractional-order PID)   (3)

The first fractional order PIλDμ controller was proposed by Podlubny [7], who proved that the fractional order PIλDμ controller shows a better response than the traditional integer order PID controller. Vinagre et al. [9] suggested frequency-based auto-tuning of the fractional order PIλDμ controller. The traditional Ziegler–Nichols three-parameter (KP, KI, KD) tuning method was used for the fractional order PIλDμ controller in [10]. A state-space approach for fractional order systems was proposed in [11]. Nataraj et al. proposed Quantitative Feedback Theory (QFT) for the robust design of fractional order controllers [12]. Parameter tuning of the fractional PI controller using the ITAE criterion is discussed in [13, 14]. In recent years, parameter optimization of the fractional order PIλDμ controller using constraint solvers has become more popular. Particle Swarm Optimization (PSO) and Genetic Algorithm (GA) based parameter optimization of the fractional order PIλDμ controller, in which an objective function is minimized, was reported in [15, 16]. Optimized parameters of the fractional order PID controller have also been obtained offline using methods such as the Artificial Bee Colony (ABC) algorithm, Ant Colony Optimization (ACO) algorithm, and Bacterial Swarm Optimization (BSO) [17–19]. However, very little attention has been given to the direct implementation of fractional numerical methods in a suitable environment. This may be due to the software complexity of the discretization methods suggested so far in the literature. Properties of fractional differintegration are discussed in the following sections, and a suitable new numerical approximation method is also introduced.


2 Fractional Calculus

Fractional differintegration has an integral of convolution type whose kernel is of power-law type [20].

Definition 1 (Riemann–Liouville integral) For $\alpha > 0$ and $f(t): t \in [a, b]$,

$$
{}_{RL}D_{a,t}^{-\alpha} f(t) = \frac{1}{\Gamma(\alpha)} \int_a^t (t - s)^{\alpha - 1} f(s)\, ds, \tag{4}
$$

$$
{}_{RL}D_{t,b}^{-\alpha} f(t) = \frac{1}{\Gamma(\alpha)} \int_t^b (s - t)^{\alpha - 1} f(s)\, ds,
$$

where $\Gamma(\cdot)$ is Euler's Gamma function, $\Gamma(n) = \int_0^{\infty} e^{-t}\, t^{n-1}\, dt$.
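For concreteness, here is a short worked instance of Eq. (4), added as an illustration (it is not part of the original text): the half-order RL integral of $f(t) = t$ with $a = 0$ is

$$
{}_{RL}D_{0,t}^{-1/2}\, t = \frac{1}{\Gamma(1/2)} \int_0^t (t - s)^{-1/2}\, s\, ds = \frac{1}{\sqrt{\pi}} \cdot \frac{4}{3} t^{3/2} = \frac{4\, t^{3/2}}{3\sqrt{\pi}},
$$

which agrees with the general power rule $D_t^{-\alpha}\, t^n = \frac{\Gamma(n+1)}{\Gamma(n+1+\alpha)}\, t^{n+\alpha}$, the integration counterpart of the differentiation formula used later in Ex. 1.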

Definition 2 (Grünwald–Letnikov) If $\alpha > 0$ and $f(t): t \in [a, b]$, then

$$
{}_{GL}D_{a,t}^{\alpha} f(t) = \lim_{\substack{h \to 0 \\ Nh = t - a}} h^{-\alpha} \sum_{j=0}^{[N]} (-1)^j \binom{\alpha}{j} f(t - jh), \tag{5}
$$

$$
{}_{GL}D_{t,b}^{\alpha} f(t) = \lim_{\substack{h \to 0 \\ Nh = b - t}} h^{-\alpha} \sum_{j=0}^{[N]} (-1)^j \binom{\alpha}{j} f(t + jh),
$$

where $[N]$ takes integer values only.
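The truncated Grünwald–Letnikov sum in Eq. (5) can be evaluated directly on a uniform grid. The following is a minimal Python sketch (the chapter's own experiments use MATLAB, which is not listed; Python is used here purely for illustration). The function name and the test case f(t) = t are illustrative choices; the binomial weights use the standard recursion $w_0 = 1$, $w_j = w_{j-1}\,(1 - (\alpha + 1)/j)$.

```python
import numpy as np

def gl_differintegral(f_vals, alpha, h):
    """Truncated Grunwald-Letnikov differintegral of sampled f
    on a uniform grid of step h, following Eq. (5)."""
    n = len(f_vals)
    # w[j] = (-1)^j * C(alpha, j), built by the standard recursion
    w = np.empty(n)
    w[0] = 1.0
    for j in range(1, n):
        w[j] = w[j - 1] * (1.0 - (alpha + 1.0) / j)
    out = np.empty(n)
    for k in range(n):
        # sum_j w_j * f(t_k - j*h): past samples in reverse order
        out[k] = h ** (-alpha) * np.dot(w[:k + 1], f_vals[k::-1])
    return out

# Half-order derivative of f(t) = t on [0, 1]; the exact result
# is t^(1/2) / Gamma(3/2) (compare Ex. 1 in Sect. 6)
t = np.linspace(0.0, 1.0, 101)
d_half = gl_differintegral(t, alpha=0.5, h=t[1] - t[0])
```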

Definition 3 (mth order differintegration) If $\alpha > 0$ and $f(t): t \in [a, b]$, then

$$
{}_{RL}D_{a,t}^{\alpha} f(t) = \frac{d^m}{dx^m}\Big[{}_{RL}D_{a,t}^{-(m-\alpha)} f(t)\Big] = \frac{1}{\Gamma(m - \alpha)} \frac{d^m}{dx^m} \int_a^t (t - s)^{m - \alpha - 1} f(s)\, ds, \tag{6}
$$

$$
{}_{RL}D_{t,b}^{\alpha} f(t) = \frac{(-1)^m}{\Gamma(m - \alpha)} \frac{d^m}{dx^m} \int_t^b (s - t)^{m - \alpha - 1} f(s)\, ds,
$$

where $m - 1 \le \alpha \le m$ and m is a positive integer.


Definition 4 (Caputo) If $\alpha > 0$ and $f(t): t \in [a, b]$, then

$$
{}_{C}D_{a,t}^{\alpha} f(t) = {}_{C}D_{a,t}^{-(m-\alpha)}\big[f^{(m)}(t)\big] = \frac{1}{\Gamma(m - \alpha)} \int_a^t (t - s)^{m - \alpha - 1} f^{(m)}(s)\, ds, \tag{7}
$$

$$
{}_{C}D_{t,b}^{\alpha} f(t) = \frac{(-1)^m}{\Gamma(m - \alpha)} \int_t^b (s - t)^{m - \alpha - 1} f^{(m)}(s)\, ds,
$$

where $m - 1 \le \alpha \le m$ and m is a positive integer.

Definition 5 (Riesz) If $\alpha > 0$ and $f(t): t \in [a, b]$, then

$$
{}_{RZ}D_{t}^{\alpha} f(t) = C_{\alpha}\big[{}_{RL}D_{a,t}^{\alpha} f(t) + {}_{RL}D_{t,b}^{\alpha} f(t)\big], \tag{8}
$$

$$
C_{\alpha} = \frac{1}{2\cos(\alpha \pi / 2)}, \quad \alpha \ne 2k + 1;\ k = 0, 1, \ldots
$$

In general, take the initial value $a = 0$. If $f(t)$ is smooth and continuous, $f \in C^m[a, b]$, then

$$
{}_{RL}D_{a,t}^{\alpha} f(t) = {}_{GL}D_{a,t}^{\alpha} f(t), \tag{9}
$$

$$
{}_{RL}D_{a,t}^{\alpha} f(t) = {}_{C}D_{a,t}^{\alpha} f(t) + \sum_{k=0}^{m-1} \frac{f^{(k)}(a)\,(t - a)^{k - \alpha}}{\Gamma(k + 1 - \alpha)}, \quad f \in C^m[a, t],\ m - 1 \le \alpha \le m, \tag{10}
$$

$$
{}_{RL}D_{a,t}^{\alpha}\big[f(t) - \phi(t)\big] = {}_{C}D_{a,t}^{\alpha} f(t), \quad \phi(t) = \sum_{k=0}^{m-1} \frac{f^{(k)}(a)\,(t - a)^{k}}{\Gamma(k + 1)}. \tag{11}
$$

If $f^{(k)}(a) = 0$, or $a = -\infty$, then

$$
{}_{RL}D_{a,t}^{\alpha} f(t) = {}_{C}D_{a,t}^{\alpha} f(t), \tag{12}
$$

$$
\lim_{\alpha \to 0^+} D_{a,t}^{-\alpha} f(t) = f(t), \quad \alpha > 0, \tag{13}
$$

$$
\lim_{\alpha \to m^-} {}_{RL}D_{a,t}^{\alpha} f(t) = \lim_{\alpha \to m^-} {}_{C}D_{a,t}^{\alpha} f(t) = f^{(m)}(t), \quad (m - 1) < \alpha < m, \tag{14}
$$

$$
\lim_{\alpha \to (m-1)^+} {}_{RL}D_{a,t}^{\alpha} f(t) = f^{(m-1)}(t), \tag{15}
$$

$$
\lim_{\alpha \to (m-1)^+} {}_{C}D_{a,t}^{\alpha} f(t) = f^{(m-1)}(t) - f^{(m-1)}(0). \tag{16}
$$

3 Properties of Fractional Differintegration

Linear operator:

$$
D^{\alpha}\big(\lambda f(t) + \mu g(t)\big) = \lambda D^{\alpha} f(t) + \mu D^{\alpha} g(t). \tag{17}
$$

Leibniz rule (derivative of the product $f(t)g(t)$):

$$
{}_{RL}D_{a,t}^{\alpha}\big(f(t)\, g(t)\big) = \sum_{k=0}^{\infty} \binom{\alpha}{k} g^{(k)}(t)\, f^{(\alpha - k)}(t). \tag{18}
$$

Composite function: if $g(t) = F(h(t))$, then

$$
{}_{RL}D_{a,t}^{\alpha}\big(g(t)\big) = \frac{(t - a)^{-\alpha}}{\Gamma(1 - \alpha)}\, g(t) + \sum_{k=1}^{\infty} \binom{\alpha}{k} \frac{(t - a)^{k - \alpha}}{\Gamma(1 - \alpha + k)}\, g^{(k)}(t), \tag{19}
$$

$$
{}_{RL}D_{a,t}^{\alpha}\big(F(h(t))\big) = \frac{(t - a)^{-\alpha}}{\Gamma(1 - \alpha)}\, F(h(t)) + \sum_{k=1}^{\infty} \binom{\alpha}{k} \frac{\Gamma(k + 1)\,(t - a)^{k - \alpha}}{\Gamma(1 - \alpha + k)} \sum_{m=1}^{k} F^{(m)}(h(t)) \sum \prod_{r=1}^{k} \frac{1}{a_r!} \left(\frac{h^{(r)}(t)}{r!}\right)^{a_r},
$$

where the innermost sum extends over all combinations with $\sum_{r=1}^{k} r\, a_r = k$ and $\sum_{r=1}^{k} a_r = m$.

Semigroup properties (from Definitions 1 and 3), for $\alpha, \beta > 0$ and $f(t): t \in [a, b]$:

$$
D_{a,t}^{-\alpha} D_{a,t}^{-\beta} f(t) = D_{a,t}^{-\beta} D_{a,t}^{-\alpha} f(t) = D_{a,t}^{-\alpha - \beta} f(t). \tag{20}
$$

$$
\lim_{t \to a} D_{a,t}^{-\alpha} f(t) = \lim_{t \to b} D_{t,b}^{-\alpha} f(t) = 0, \quad \alpha > 0,
$$

$$
D_{a,t}^{\alpha} D_{a,t}^{-\alpha} f(t) = f(t), \quad \alpha > 0. \tag{21}
$$

If $m - 1 \le \alpha \le m$ and $f \in C^m[a, t]$, then

$$
D_{a,t}^{-\alpha} D_{a,t}^{\alpha} f(t) = f(t) - \sum_{j=1}^{m} \Big[D_{a,t}^{\alpha - j} f(t)\Big]_{t=a} \frac{(t - a)^{\alpha - j}}{\Gamma(\alpha - j + 1)}. \tag{22}
$$

4 Fractional Numerical Method

Finite difference method [21]: the interval [0, T] is uniformly divided into "n" subintervals with time step $\Delta t = T/n$ and $t_k = k \Delta t$, $k = 0, 1, \ldots$; let $\lambda(t_k)$ be the exact value of a function $\lambda(t)$ at time step $t_k$. Then

$$
{}_{C}D_{a,t}^{\alpha} \lambda(t_{k+1}) = {}_{C}\Delta_{t}^{\alpha} \lambda(t_{k+1}), \qquad
{}_{C}\Delta_{t}^{\alpha} \lambda(t_{k+1}) \cong \frac{(\Delta t)^{-\alpha}}{\Gamma(2 - \alpha)} \sum_{j=0}^{k} W_j \big[\lambda(t_{k+1-j}) - \lambda(t_{k-j})\big], \tag{23}
$$

where $W_j = (j + 1)^{1-\alpha} - j^{1-\alpha}$, $j = 0, 1, \ldots, k$.

Kernel-based spatial approximation:

$$
u(x, t) \approx U(x, t) = \sum_{k=1}^{N} \lambda_k(t)\, \Phi\!\left(\frac{x - \xi_k}{c}\right), \quad x \in \Omega, \tag{24}
$$

where "c" is a scaling parameter and $\Phi(\cdot)$ is a radial basis function.

Compact difference scheme [22, 23]: with $x_i = ih$ $(0 \le i \le M)$ and $t_k = k\tau$ $(0 \le k \le N)$,

$$
{}_{C}D_{0,t}^{\alpha} u(x_i, t_k) = \frac{\tau^{-\alpha}}{\Gamma(2 - \alpha)} \Bigg[u_i^{k} - \sum_{j=1}^{k-1} \big(a_{k-j-1} - a_{k-j}\big)\, u_i^{j} - a_{k-1}\, u_i^{0}\Bigg] + O(\tau^{2-\alpha}), \tag{25}
$$

where $a_j = (j + 1)^{1-\alpha} - j^{1-\alpha}$, $j = 0, 1, \ldots, k$.

Cao, Li and Chen approximation [24]:

$$
\Big[{}_{C}D_{0,t}^{\alpha} f(t)\Big]_{t=t_n} = \frac{\tau^{-\alpha}}{\Gamma(2 - \alpha)} \sum_{j=0}^{n} g_j\, f_{n-j} + r_n. \tag{26}
$$

Lemma 1 (Proposed numerical approximation)

$$
y^{\alpha}(x_n) \approx \frac{1}{\big[h^{\alpha}(1 - \alpha)\big]\Gamma(1 - \alpha)} \Bigg[y(x_n)\, G_{n,n}^{\alpha} - y(x_0)\, G_{n,1}^{\alpha} + \sum_{k=1}^{n-1} y(x_k)\, M_{n,k}^{\alpha}\Bigg] + O(h)^{2-\alpha}, \tag{27}
$$

where $M_{n,k}^{\alpha} = G_{n,k}^{\alpha} - G_{n,k+1}^{\alpha}$ and $G_{n,k}^{\alpha} = (n - k + 1)^{1-\alpha} - (n - k)^{1-\alpha}$.

Lemma 2 (Discretized fractional integration)

$$
y^{\alpha}(x_n) \approx \frac{h^{\alpha}}{\Gamma(1 + \alpha)} \Bigg[y(x_n) + y(x_0)\, G_{n,1}^{1-\alpha} + \sum_{k=1}^{n-1} y(x_k)\, G_{n,k}^{1-\alpha}\Bigg] - O\!\left(\frac{1}{12} h^{3-\alpha} f''(\xi)\right), \tag{27A}
$$

for $0 < \alpha < 1$ and $f(t): t \in [0, t]$.

5 Proof of Lemma 1

From Eq. (7), with $m = 1$ and $a = 0$,

$$
\Gamma(1 - \alpha)\, {}_{C}D_{0,t}^{\alpha} f(x) = \int_0^t \frac{f^{(1)}(x)\, dx}{(t - x)^{\alpha}}. \tag{28}
$$

Let $x_n = nh$, $y_n = y_n(x_n) = y_n(nh)$ and $x_k = kh$, $y_k = y_k(x_k) = y_k(kh)$. Then

$$
\Gamma(1 - \alpha)\, y^{\alpha}(x_n) \approx \int_0^{x_n} \frac{f^{(1)}(x)\, dx}{(x_n - x)^{\alpha}},
$$

$$
\Gamma(1 - \alpha)\, y^{\alpha}(x_n) \approx \sum_{k=1}^{n} \int_{x_{k-1}}^{x_k} \frac{f^{(1)}(x_k)\, dx}{(nh - x)^{\alpha}}. \tag{29}
$$

From Taylor's series, neglecting higher derivatives, $f^{(1)}(x_k) \cong \frac{f(x_k) - f(x_{k-1})}{h}$, which can be written as $f^{(1)}(x_k) \cong \frac{y(x_k) - y(x_{k-1})}{h}$. Then

$$
\Gamma(1 - \alpha)\, y^{\alpha}(x_n) \approx \sum_{k=1}^{n} \frac{y(x_k) - y(x_{k-1})}{h} \int_{(k-1)h}^{kh} \frac{dx}{(nh - x)^{\alpha}},
$$

$$
\Gamma(1 - \alpha)\, y^{\alpha}(x_n) \approx \sum_{k=1}^{n} \frac{y(x_k) - y(x_{k-1})}{h} \left[\frac{(-1)(nh - x)^{1-\alpha}}{1 - \alpha}\right]_{(k-1)h}^{kh},
$$

$$
\Gamma(1 - \alpha)\, y^{\alpha}(x_n) \approx \sum_{k=1}^{n} \frac{y(x_k) - y(x_{k-1})}{h} \cdot \frac{-1}{1 - \alpha} \Big[(nh - kh)^{1-\alpha} - (nh - (k - 1)h)^{1-\alpha}\Big]
$$

$$
\approx \sum_{k=1}^{n} \frac{y(x_k) - y(x_{k-1})}{h} \cdot \frac{(-1)\, h^{1-\alpha}}{1 - \alpha} \Big[(n - k)^{1-\alpha} - (n - (k - 1))^{1-\alpha}\Big],
$$

$$
\Gamma(1 - \alpha)\, y^{\alpha}(x_n) \approx \sum_{k=1}^{n} \frac{y(x_k) - y(x_{k-1})}{h^{\alpha}(1 - \alpha)} \Big[(n - (k - 1))^{1-\alpha} - (n - k)^{1-\alpha}\Big],
$$

$$
\big[h^{\alpha}(1 - \alpha)\big]\, \Gamma(1 - \alpha)\, y^{\alpha}(x_n) \approx \sum_{k=1}^{n} \big[y(x_k) - y(x_{k-1})\big] \Big[(n - k + 1)^{1-\alpha} - (n - k)^{1-\alpha}\Big].
$$

Let $G_{n,k}^{\alpha} = (n - k + 1)^{1-\alpha} - (n - k)^{1-\alpha}$, a 3-dimensional function (of n, k and α). Then

$$
\big[h^{\alpha}(1 - \alpha)\big]\, \Gamma(1 - \alpha)\, y^{\alpha}(x_n) \approx \sum_{k=1}^{n} \big[y(x_k) - y(x_{k-1})\big]\, G_{n,k}^{\alpha} = \sum_{k=1}^{n} y(x_k)\, G_{n,k}^{\alpha} - \sum_{k=1}^{n} y(x_{k-1})\, G_{n,k}^{\alpha}
$$

$$
= \Bigg[y(x_n)\, G_{n,n}^{\alpha} + \sum_{k=1}^{n-1} y(x_k)\, G_{n,k}^{\alpha}\Bigg] - \Bigg[y(x_0)\, G_{n,1}^{\alpha} + \sum_{k=2}^{n} y(x_{k-1})\, G_{n,k}^{\alpha}\Bigg]
$$

$$
= y(x_n)\, G_{n,n}^{\alpha} - y(x_0)\, G_{n,1}^{\alpha} + \sum_{k=1}^{n-1} y(x_k)\, G_{n,k}^{\alpha} - \sum_{k=1}^{n-1} y(x_k)\, G_{n,k+1}^{\alpha}
$$

$$
= y(x_n)\, G_{n,n}^{\alpha} - y(x_0)\, G_{n,1}^{\alpha} + \sum_{k=1}^{n-1} y(x_k)\, \big(G_{n,k}^{\alpha} - G_{n,k+1}^{\alpha}\big).
$$

Let $M_{n,k}^{\alpha} = G_{n,k}^{\alpha} - G_{n,k+1}^{\alpha}$; then

$$
\big[h^{\alpha}(1 - \alpha)\big]\, \Gamma(1 - \alpha)\, y^{\alpha}(x_n) \approx y(x_n)\, G_{n,n}^{\alpha} - y(x_0)\, G_{n,1}^{\alpha} + \sum_{k=1}^{n-1} y(x_k)\, M_{n,k}^{\alpha},
$$

$$
y^{\alpha}(x_n) \approx \frac{1}{\big[h^{\alpha}(1 - \alpha)\big]\Gamma(1 - \alpha)} \Bigg[y(x_n)\, G_{n,n}^{\alpha} - y(x_0)\, G_{n,1}^{\alpha} + \sum_{k=1}^{n-1} y(x_k)\, M_{n,k}^{\alpha}\Bigg].
$$

This proves the lemma. It can easily be shown that $O(h)^{2-\alpha}$ is the remainder.

Corollary 1 For $\alpha = 1$: $G_{n,k}^{1} = M_{n,k}^{1} = 0$, with $G_{n,k}^{1} = 1$ only if $n = k$. For $\alpha = -1$: $y^{-1}(x_n) \approx \frac{h}{2}\big[y(x_n) - y(x_0)\big] + O(h)^3$, with $G_{n,1}^{-1} = 2n - 1$. At the singular point $\alpha = 0$: $y^{0}(x_n) \approx y(x_n) - y(x_0) + O(h)^2 = f(t)$, with $G_{n,k}^{0} = 1$ and $M_{n,k}^{0} = 0$.

6 MATLAB: Algorithm and Simulation

MATLAB indexing runs from 1 to N (columns as samples), whereas the variables in the above section run from 0 to n. Integer order differentiation is a local process, whereas fractional order differintegration is a global process: past data are required for processing the current data. Hence, the modified MATLAB-based equation (using Lemma 1), after simplification, is given as

$$
y^{\alpha}(f(x)) \approx \frac{1}{h^{\alpha}\, \Gamma(2 - \alpha)} \sum_{n=1}^{N} \Bigg[y_n^{\alpha} - y_1^{\alpha}\, G_{n-1,1}^{\alpha} + \sum_{k=2}^{n-1} M_{n-1,k-1}^{\alpha}\, y_k^{\alpha}\Bigg]. \tag{30}
$$

At the singular point,

$$
y^{0}(f(x)) \approx \sum_{n=1}^{N} \big(y_n^{0} - y_1^{0}\big) = f(x). \tag{31}
$$

Ex. 1: The exact solution for $f(t) = t^n$ is $D_t^{\alpha}[t^n] = \frac{\Gamma(n+1)}{\Gamma((n+1) - \alpha)}\, t^{n-\alpha}$.

The first study of this numerical experiment considers the effect of α on the mean square error at h = 0.1, with $f(t) = t$, $t \in [0, 10]$ and $\alpha \in [0.1, 0.9]$, where α increases in steps of 0.001 (Fig. 1). The Mean Square Error (MSE) is almost constant for α = 0.1 to 0.3; after that a linear variation is observed up to α = 0.5; but beyond this point the MSE increases exponentially at a sharper rate, and a large variation in MSE is observed for a small variation of α. Thus it can be concluded that for small values of α there is very little variation in MSE, but as α increases the MSE becomes more sensitive to very small changes of α. A second observation concerns negative values of α, where the MSE is most sensitive and varies with very small changes in the parameter α; minimizing the sensitivity at negative values remains a further scope for work. From these two observations it is concluded that the most stable region of operation, where the MSE is least sensitive, is 0 < α < 0.5. This could be extended to −0.5 < α < 0.5 by accepting some variation in MSE; the MSE is most sensitive near α = ±1. Let h = 0.1, α = 0.5, f(t) = t, t ∈ [0, N], where "N" is the term of computation; increasing the value of "N" produces a considerable increase in the MSE (Table 1).
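Since the MATLAB source itself is not listed in the chapter, the following Python sketch reproduces the flavor of this experiment under the definitions of $G_{n,k}^{\alpha}$ and $M_{n,k}^{\alpha}$ in Eq. (27). The function names, the grid, and the test case f(t) = t with h = 0.1 and α = 0.5 are illustrative choices, not the authors' code.

```python
import numpy as np
from math import gamma

def G(n, k, alpha):
    """G-function of Eq. (27): (n-k+1)^(1-alpha) - (n-k)^(1-alpha)."""
    return (n - k + 1) ** (1 - alpha) - (n - k) ** (1 - alpha)

def caputo_lemma1(y, alpha, h):
    """Approximate Caputo derivative of samples y at x_n = n*h (Lemma 1).
    Note (1-alpha)*Gamma(1-alpha) = Gamma(2-alpha), as used in Eq. (30)."""
    N = len(y) - 1
    c = 1.0 / (h ** alpha * gamma(2.0 - alpha))
    out = np.zeros(N + 1)
    for n in range(1, N + 1):
        s = y[n] * G(n, n, alpha) - y[0] * G(n, 1, alpha)
        for k in range(1, n):
            s += y[k] * (G(n, k, alpha) - G(n, k + 1, alpha))  # M_{n,k}
        out[n] = c * s
    return out

# Ex. 1 with f(t) = t: exact derivative is t^(1-alpha) / Gamma(2-alpha)
h, alpha = 0.1, 0.5
t = np.arange(0.0, 10.0 + h / 2, h)
approx = caputo_lemma1(t, alpha, h)
exact = t ** (1.0 - alpha) / gamma(2.0 - alpha)
print("MSE =", np.mean((approx - exact) ** 2))
```

For the linear test function the first-order difference in the proof is exact, so the sketch yields an MSE at rounding level, consistent with the very small MSE values reported for f(t) = t in Table 2.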

Fig. 1 Effect of alpha on MSE (plot of alpha vs. mean square error)

Table 1 Effect of "N" on MSE

| N   | 1          | 10         | 100        |
|-----|------------|------------|------------|
| MSE | 3.2776e−32 | 3.0517e−30 | 4.0034e−28 |

Table 2 Mean square error (MSE), step (h), and computation time (s)

| F(t) | Alpha | h    | MSE        | Minimum time | Maximum time | Average time |
|------|-------|------|------------|--------------|--------------|--------------|
| t    | 0.5   | 0.1  | 3.0517e−30 | 6.853e−05    | 0.0013603    | 0.00076994   |
| t    | 0.5   | 0.01 | 3.1969e−29 | 6.7206e−05   | 0.014064     | 0.0070956    |
| t    | −0.5  | 0.1  | 4.9966e−30 | 6.7868e−05   | 0.0015004    | 0.00082456   |
| t    | −0.5  | 0.01 | 2.0805e−29 | 7.1841e−05   | 0.014965     | 0.0075543    |
| t    | 0.9   | 0.1  | 2.0338e−29 | 6.7868e−05   | 0.0015924    | 0.00083805   |
| t    | 0.9   | 0.01 | 1.1583e−27 | 6.9192e−05   | 0.01532      | 0.0074833    |
| t    | −0.9  | 0.1  | 2.8631e−29 | 6.6875e−05   | 0.0015279    | 0.00082415   |
| t    | −0.9  | 0.01 | 2.1003e−28 | 7.0186e−05   | 0.014826     | 0.0074532    |
| t^2  | 0.5   | 0.1  | 2.0215e−04 | 6.7206e−05   | 0.0013891    | 0.00077764   |
| t^2  | 0.5   | 0.01 | 2.1450e−07 | 6.6213e−05   | 0.013776     | 0.0069416    |
| t^2  | −0.5  | 0.1  | 1.6740e−05 | 6.7206e−05   | 0.0015487    | 0.00084224   |
| t^2  | −0.5  | 0.01 | 1.7381e−09 | 6.522e−05    | 0.015049     | 0.0075521    |
| t^2  | 0.9   | 0.1  | 0.0048     | 8.6408e−05   | 0.001504     | 0.0008102    |
| t^2  | 0.9   | 0.01 | 3.0585e−05 | 6.6213e−05   | 0.015405     | 0.0075385    |
| t^2  | −0.9  | 0.1  | 6.7874e−05 | 6.4557e−05   | 0.001511     | 0.00079262   |
| t^2  | −0.9  | 0.01 | 6.7689e−09 | 6.7206e−05   | 0.015464     | 0.0076088    |

If the length of time approaches infinity, then the MSE approaches $(\gamma)^{2-\alpha}$, such that $[(\gamma)^{2-\alpha} \in f(t)]$. If the function is linear and smooth, the MSE is small; for the nonlinear function (t²) the MSE is larger (Table 2). However, for the linear (t) and nonlinear (t²) functions there is no considerable difference in computation time for the same values of h, t and α.

7 MATLAB Simulink Model

The proposed fractional numerical method was tested in the MATLAB environment by comparing the results from the standard MATLAB tool set with those of the proposed method. The fractional numerical method was further approximated, taking the first principal term, to fit the algorithm into the MATLAB Simulink environment. The error is not greater than 0.4 for fractional integration, and the fractional derivative exactly matches the output of the MATLAB integer order derivative block. This method can be used for the implementation of a fractional-based non-integer order controller in real time. The present fractional algorithm is used (Sect. 8) for controlling a nonlinear plant with the control strategy of a fractional order PID scheme simulated in the MATLAB Simulink environment (Figs. 2, 3, 4).

Fig. 2 Implementation of MATLAB Simulink model

Fig. 3 Comparison of integration

Fig. 4 Comparison of derivative

8 Fractional PID

Let us consider the linear plant [24]

$$
G_1 = \frac{1}{0.000064\, s^4 + 0.009984\, s^3 + 0.2578\, s^2 + 1.248\, s + 1}.
$$

The manually selected fractional PID parameters and the tuned integer order PID parameters are listed in Table 3. To ensure the effectiveness of the proposed fractional order controller, its performance is compared with a tuned integer order PID controller. The fractional control scheme is more effective on the various performance measures listed in Table 4. A considerable decrease in overshoot, from 7.59 to 0.56%, is observed for a simulation time of 4 s (Fig. 5).

Table 3 Controller parameters

| Controller       | Kp   | Ki   | Kd | N   | λ    | μ |
|------------------|------|------|----|-----|------|---|
| Integer order    | 1.72 | 1.88 | 0  | 100 | –    | – |
| Fractional order | 1    | 0.1  | 0  | –   | −0.1 | 0 |

Table 4 Controller performance

| Controller       | Rise time (s) | Settling time (s) | Overshoot % | Integral mean square error | Peak control signal | % saving in control effort |
|------------------|---------------|-------------------|-------------|----------------------------|---------------------|----------------------------|
| Integer order    | 0.702         | 2.43              | 7.59        | 4.6                        | 2                   | –                          |
| Fractional order | 1.1           | 1.5               | 0.56        | 4.806                      | 2                   | Same control effort        |

Fig. 5 Comparison of controller performance for unit step input

For the unit step response, the fractional order controller is comparatively faster than the integer order controller, settling earlier than the traditional PID controller. The same behavior is repeated when a train of pulses is applied at the input (Fig. 6); the pulse has magnitude 1 and a period of 10 s with 50% duty cycle, and the simulation time is 60 s. The control effort of the fractional order PID controller is smoother than that of the integer order PID (Fig. 7). This happens because fractional differintegration is a global process whereas the integer order derivative is a local process, which is why the fractional controller shows the memory and filtering effect more efficiently. This fractional PID controller is implemented using the proposed fractional numerical method; hence, conversion of the fractional controller to a higher order integer controller is not required. The present method uses minimum bandwidth, as the control effort is smooth, and the scheme is thus more resilient to disturbance inputs.

Fig. 6 Comparison of controller performance for unit pulse input

Fig. 7 Comparison of controller efforts for unit pulse input
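As a rough illustration of how the controller law in Eq. (3) can be evaluated with a discretized differintegral, the Python sketch below forms the fractional PID output for a given error signal. The gains and orders are taken from the fractional row of Table 3; the GL-style weights, the open-loop unit-step error, and the function names are illustrative assumptions, not the authors' Simulink implementation.

```python
import numpy as np

def gl_op(x, alpha, h):
    """Apply D^alpha to samples x with Grunwald-Letnikov weights;
    a negative alpha acts as fractional integration."""
    w = np.empty(len(x))
    w[0] = 1.0
    for j in range(1, len(x)):
        w[j] = w[j - 1] * (1.0 - (alpha + 1.0) / j)
    return np.array([h ** (-alpha) * np.dot(w[:k + 1], x[k::-1])
                     for k in range(len(x))])

def fractional_pid(e, h, Kp, Ki, Kd, lam, mu):
    """Controller output of Eq. (3): u = Kp*e + Ki*D^(-lam)e + Kd*D^(mu)e."""
    return Kp * e + Ki * gl_op(e, -lam, h) + Kd * gl_op(e, mu, h)

# Fractional gains from Table 3; a unit-step error as a stand-in input
h = 0.01
t = np.arange(0.0, 4.0, h)
e = np.ones_like(t)
u = fractional_pid(e, h, Kp=1.0, Ki=0.1, Kd=0.0, lam=-0.1, mu=0.0)
```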

9 Conclusion

A new method of numerical approximation of the Caputo fractional differintegration has been derived. Numerical solutions can be obtained effectively using the present fractional numerical method. The numerical study demonstrated minimum mean square error, and the method is accurate, precise, feasible, and effective in providing the solution. The present method can be extended further for real-time implementation in digital/computer/embedded systems, and in the future to mth order fractional numerical solutions. The G and M functions are 3D; some of their properties are listed in this paper, and there is extensive future scope for these functions in the fields of physics, engineering, control, and image processing. A MATLAB Simulink block has been successfully developed for model-based simulation, with negligible error. For the first time, a fractional PID controller is simulated without converting it to an equivalent approximate higher order controller and optimizing the same. The performance of the fractional order PID controller is much smoother and faster compared with the integer order PID controller; the control effort is smooth and has a memory-like effect.

References

1. Caponetto R, Dongola G, Fortuna L, Petras I (2010) Fractional order systems: modelling and control applications. World Scientific Series on Nonlinear Science, Series A, vol 72. World Scientific Publishing Co. Pvt. Ltd
2. Sabatier J, Agrawal OP, Tenreiro Machado JA (2007) Advances in fractional calculus: theoretical developments and applications in physics and engineering. Springer
3. Das S (2007) Functional fractional calculus for system identification and controls. Springer
4. Oldham KB, Spanier J (2006) The fractional calculus: theory and applications of differentiation and integration to arbitrary order. Dover Books on Mathematics
5. Kilbas AA, Srivastava HM, Trujillo JJ (2006) Theory and applications of fractional differential equations. Elsevier
6. Ross B (ed) (1975) Fractional calculus and its applications. Springer, Berlin
7. Podlubny I (1999) Fractional differential equations. Academic Press, San Diego
8. Miller KS (1995) Derivatives of non-integer order. Math Mag 68(3):1–68
9. Vinagre BM et al (2006) On auto-tuning of fractional order PIλDμ controllers. In: Proceedings of IFAC workshop on fractional differentiation and its application (FDA'06), Porto, Portugal
10. Valerio D et al (2006) Tuning of fractional PID controllers with Ziegler-Nichols-type rules. Signal Process 86:2771–2784
11. Dorcak L et al (2001) State-space controller design for the fractional order regulated system. In: International Carpathian Control Conference, Krynica, Poland
12. Nataraj PSV, Tharewal S (2007) On fractional-order QFT controllers. J Dyn Syst Meas Control 129:212–218
13. Boudjehem B, Boudjehem D (2012) Parameter tuning of a fractional-order PI controller using the ITAE criteria. In: Fractional dynamics and control. Springer Science Business Media
14. Padula F, Visioli A (2011) Tuning rules for optimal PID and fractional-order PID controllers. J Process Control 21:69–81
15. Cao J, Cao B-G (2006) Design of fractional order controllers based on particle swarm optimization. In: 1st IEEE Conference on Industrial Electronics and Applications, pp 1–6
16. Cao J, Jin L, Cao B-G (2005) Optimization of fractional order PID controllers based on genetic algorithms. In: Proceedings of 2005 International Conference on Machine Learning and Cybernetics, vol 9, pp 5686–5689
17. Khubalkar S, Junghare A, Aware M, Das S (2017) Modeling and control of a permanent-magnet brushless DC motor drive using a fractional order proportional-integral-derivative controller. Turk J Electr Eng Comput Sci 25:4223–4241
18. Das S, Saha S, Das S, Gupta A (2011) On the selection of tuning methodology of FO-PID controllers for the control of higher order processes. ISA Trans 50(3):376–388
19. Jin Y, Branke J (2005) Evolutionary optimization in uncertain environments: a survey. IEEE Trans Evol Comput 9:303–317
20. Bhrawy AH et al (2014) A spectral tau algorithm based on Jacobi operational matrix for numerical solution of time fractional diffusion-wave equations. J Comput Phys
21. Brunner H et al (2010) Numerical simulations of two-dimensional fractional subdiffusion problems. J Comput Phys 229(18):6613–6622
22. Cao J, Li C, Chen Y (2014) Compact difference method for solving the fractional reaction–subdiffusion equation with Neumann boundary value condition. Int J Comput Math. https://doi.org/10.1080/00207160.2014.887702
23. Langlands TAM, Henry BI (2005) The accuracy and stability of an implicit solution method for the fractional diffusion equation. J Comput Phys 205:719–736
24. Avinash KM, Bongulwar MR, Patre BM (2015) Tuning of fractional order PID controller for higher order process based on ITAE minimization. IEEE Indicon 1570186367
25. Center for Biotechnology Information. http://www.ncbi.nlm.nih.gov

Chapter 18

Raitha Bandhu

G. V. Dwarakanath and M. Rahul

1 Introduction

A farmer is a person engaged in agriculture who grows crops. Farmers are the backbone of the Indian economy. They produce the crops consumed by everyone; they get up early in the morning and go to the fields to tend their crops. In earlier days they cultivated with oxen, and now they use cultivators. They go to the farm, sow seeds, and water their crops regularly, and they have to protect the crop from various pests. They face many problems, such as financial issues and harm to crops from pests, and some farmers do not know which crop to grow on their land based on the climate and their soil. To help farmers with all these issues, we are developing an application called "Raitha Bandhu".

These days everyone is familiar with smartphones. "Raitha Bandhu" is an Android-based application created to help farmers cultivate crops using scientific methods instead of traditional methods. The application helps farmers with crop cultivation, pest control tips from agricultural experts, and insurance for their crops. Its main aim is to help farmers: here the farmers can learn which climatic conditions are suitable for growing a particular crop and which crop suits their soil, and using this information they can grow their crops accordingly. If they have any pest issue with their crops, they can upload a photo of the crop, and the agricultural experts will suggest the necessary steps to be followed for pest management. Farmers can also register for crop insurance by providing the necessary information, which will later be inspected by the insurance company/bank and approved if all the information is correct.

G. V. Dwarakanath (B) · M. Rahul
Department of MCA, BMS Institute of Technology and Management, Bangalore 560064, India
e-mail: [email protected]
© Springer Nature Singapore Pte Ltd. 2020
V. K. Gunjan et al. (eds.), Cybernetics, Cognition and Machine Learning Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-15-1632-0_18


The agricultural experts analyze the farmers' issues and then provide solutions to their problems. Farmers get tips and solutions from various agricultural experts, which they can adopt in their cultivation to increase the yield of their crops. Farmers can also check the status of the insurance for their crop, whether it is approved or rejected, and if it is approved they can see what amount of money has been approved for their crop.

2 Literature Survey

Raitha Bandhu is an application that helps farmers solve their agriculture-related issues. Farmers cannot find a single app offering crop growing tips, a discussion forum, and an insurance portal together; they have to download a separate application for each feature, such as an agricultural forum, crop cultivation tips, and insurance, which increases the storage used on their smartphones. To solve this issue we have developed an application named "Raitha Bandhu", where farmers get the above-mentioned features in a single application and save storage space.

One existing application is a discussion forum for farmers and agricultural experts supported in two languages, i.e., Hindi and English. Its main aim is to help farmers grow crops more effectively, which helps increase their yield and income using the latest technology, and it helps farmers understand how to grow crops effectively [1]. Another application helps farmers find solutions to their queries regarding agriculture; it provides information about the pesticides, seeds, fertilizers, and dosages needed for farming crops, and allows farmers to buy agricultural products and equipment through the application [2]. Another is run by a team of agricultural doctors and experts who help farmers with tips for better crop cultivation; it also gives weather forecast information, and has a discussion forum and an agricultural shop where farmers can buy agricultural products. It is mainly developed to help the farmers of Gujarat, Maharashtra, and Rajasthan [3]. Another application gives information on the pests, diseases, and weeds affecting crops and the pest management steps to be taken; it also provides agricultural web links, i.e., information collected from online resources such as the agricultural department, market prices, and pest management tips [4]. Another provides information on more than 100 crops and is supported in seven languages; it suggests how to cultivate crops using the latest techniques and which crop to grow based on seasons and climatic conditions, and provides irrigation information. It helps farmers gain information on the crops they are growing, gives the latest agricultural news, has videos in local languages, and includes a discussion forum where farmers can interact with agricultural experts [5]. Another application, developed by a Gujarat agricultural organization especially for the agricultural community, has audio and video that help farmers learn about the latest techniques of agriculture; the videos are supported in two languages, i.e., Gujarati and Hindi. Its main goal is to provide farmers the necessary information regarding cultivation and pest management tips using videos [6]. Another is India's number one agricultural shopping app, where farmers can buy the necessary things for agriculture; it gives product descriptions by agricultural experts and specialists and offers cash-on-delivery [7]. Another gives information on crop production and crop protection; it allows users to interact with agricultural specialists and scientists, and to share audio and video so that the experts can analyze the issue and give a solution [8]. Finally, one application helps farmers buy, sell, and exchange agriculture-related products without the interference of a middleman; farmers can post ads by entering the details and submitting them, to be viewed by other farmers, and it has an open discussion forum for the users [9].

3 Working

The Android application Raitha Bandhu consists of the following users, i.e., three modules:

• Farmer
• Forum
• Banker

Farmer: The farmer verifies themselves with an OTP and registers; they can then check how crops can be grown effectively and send an insurance request to a bank or insurance company.

Forum: An interactive forum where agricultural experts and farmers can interact. If farmers have any issue, they post it to the forum, and the various pest experts look at it and try to provide a solution to the problem.

Banker: The banker/insurance company goes through the insurance requests submitted by the farmers and can approve or reject them.

4 Implementation

This part includes information about the technology used and the control flow of the project. The project is implemented using Java and XML, written in the Android Studio IDE, which is easy for a developer to understand and implement. The steps are:

• Install Android Studio.
• Create a project by giving the project a name.
• Design the front-end using XML.
• Code the backend using Java.
• Create a database in Firebase.
• Build the APK file and run it.
• The output is displayed.

Android Studio: Android Studio is an open-source mobile application development platform for creating Android applications, and the official IDE for Android development. Applications can be written using Java and Kotlin. It has the Android Virtual Device (emulator) used to debug and run applications, supports Gradle-based builds, and offers template-based wizards with basic designs and components. Users can create a user interface with drag-and-drop, and Android Wear applications can also be developed using this IDE.

Firebase: A platform built for developing mobile and web applications. With Firebase, user data can be stored in the cloud. It supports features such as authentication, a real-time database, a storage system, testing, analytics, and cloud messaging, and is also used for web hosting services.

XML: In this project we use the XML language for the front-end structure of the application. XML stands for Extensible Markup Language; it is easily understood by both computers and users, and the code is organized and kept simple, which makes it easy to understand.

Java: Java is a programming language that allows developers to write once, run anywhere (WORA): once compiled, code can run on all Java platforms without recompiling. Java is used with Android Studio because it builds packages very quickly, and it is a robust and secure programming language (Figs. 1, 2, 3, 4, 5, 6).

Fig. 1 Home page

Fig. 2 Main page

Fig. 3 Insurance request page

Fig. 4 Forum home page

Fig. 5 Crop growing tips

Fig. 6 Process insurance

5 Conclusion

The "RAITHA BANDHU Android App" is designed to help farmers acquire more knowledge about agriculture. The application is very easy to use, as it has a good user interface, and users can interact with it without any difficulties. The application has fulfilled the following objectives:

• It helps farmers know which crop to grow in which season, based on the soil content of their farm.
• It allows farmers to request crop insurance from the registered banks/insurance companies.
• It has a forum where farmers and pest experts can interact; farmers can ask the pest experts any queries regarding their crops, which are resolved by the various pest experts.

The banks/insurance companies view the crop insurance requests produced by the farmers, process the request forms, and then decide whether to approve or reject the requests.

References

1. My Agri Guru, developed by MyAgriGuru, February 2017. Available in Google Playstore
2. Agro-Medix, developed by AgroMedix AgriTech Solutions, January 2018. Available in Google Playstore
3. AgroStar Agri, developed by AgroStar, May 2016. Available in Google Playstore
4. CropInfo India, developed by Arun Gulbadher, February 2016. Available in Google Playstore
5. KrishiHub, developed by KrishiHub, November 2017. Available in Google Playstore
6. AgriMedia Video App, developed by Digital AgriMedia, September 2017. Available in Google Playstore
7. FarmKey, developed by FarmKey-Agriculture Shop, December 2018. Available in Google Playstore
8. AgriApp, developed by AgriApp, September 2014. Available in Google Playstore
9. FarmBazaar, developed by FarmBazaar Agri Solutions, October 2017. Available in Google Playstore
10. Kisaan Helpline, developed by Ample eBusiness, January 2015. Available in Google Playstore
11. IFFCO Kisan, developed by IFFCO Kisan, September 2015. Available in Google Playstore

Mr. Dwarakanath, G. V., MCA, Assistant Professor, Master of Computer Applications, BMS Institute of Technology and Management, Bangalore, Karnataka, India, 2013–2014, has published nearly 3 research papers in National/International Journals and 16 papers in National/International Conferences. He wishes to place on record his sincere gratitude to BMS Education Trust, BMS Institute of Technology and Management, Bangalore, and expresses thanks to all those who helped in bringing out this paper.

Mr. Rahul, M., MCA Student, Department of MCA, BMS Institute of Technology and Management, Bangalore-560064, India, wishes to place on record his sincere gratitude to Guide Dwarakanath G V., Dept. of MCA staff, BMS Education Trust, BMS Institute of Technology and Management, Bangalore and expresses thanks to all those who helped in bringing out this paper.

Chapter 19

Queuing Theory and Optimization in Banking Models

Runu Dhar, Abhishek Chakraborty and Jhutan Sarkar

1 Introduction

There is hardly any person in the world who has not spent some time waiting in line for some kind of service. The study of queuing theory is very important in modern-day society: it appears in various fields, from telecommunications to everyday queuing at railway counters, ration shops, toll gates, dentists' offices, doctors' chambers, banks, and much more. If the number of arrivals to a system exceeds the number of requests the system can serve per unit of time, a queue forms; thus a system is said to contain a queue when it is congested. In this paper, the system of a banking firm is addressed. Nowadays banking units are considered among the most necessary and important units of public life; the banking sector is central both to a growing economy and to advancement in a developed one. Arrivals to the banking firm's system are customers, and the system is investigated in order to provide better service to the customers and to minimize the time needed to provide service. Here we focus on analyzing the queues to optimize the banks' services, reduce customers' waiting time, and increase service satisfaction. The main research question is whether the time spent at banks could be improved if a queuing system is implemented judiciously. Arrival rate, service rate, and utilization factor [1, 2] are the three important elements of queuing theory. In the context of the banking scenario, the arrival rate means the number of customers who arrive at the bank in each unit of time. The service rate is the number of customers serviced in each unit of time, and the utilization factor measures the efficiency of the work performed by the banking system; in other words, arrivals divided by departures or serviced clients [3, 4]. The system will not be effective when the utilization factor is more than one, since a utilization factor above one indicates that the number of customers who arrive is greater than the number serviced in each unit of time. When queuing is implemented in an intelligent manner, waiting times and customer satisfaction improve [5]. To develop queuing systems they must be understood in detail, and queuing theory is used to do this, leading to a Queuing Management System (QMS). A QMS can be implemented to manage the efficiency of a queuing system, and the servicing methods can be changed to obtain better efficiency in the model. The paper is structured as follows. An overview of background and existing research is given in Sect. 2. Section 3 narrates the fieldwork of the problem. In Sect. 4, the flow of customers to the bank and customers being served by the three machines are given. Section 5 outlines the graphical representation of the performance of the servers and shows the calculation of different results. The obtained results are discussed in Sect. 6. Lastly, in Sect. 7, conclusions from the obtained results are outlined.

R. Dhar (B) · A. Chakraborty · J. Sarkar
Department of Applied Mathematics, MBB University, Agartala, Tripura, India
e-mail: [email protected]
A. Chakraborty e-mail: [email protected]
J. Sarkar e-mail: [email protected]
© Springer Nature Singapore Pte Ltd. 2020
V. K. Gunjan et al. (eds.), Cybernetics, Cognition and Machine Learning Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-15-1632-0_19

2 Background and Existing Research

The concept of a queue is a familiar issue even in developed economies. There is hardly any person in the world who has not spent some time waiting in line for some kind of service, and frustration from waiting in line is inevitable. It often occurs in everyday life when people queue at the railway ticket counter, the ration shop, the big bazaar, the banking hall, or ATM booths. The formal study of waiting in line is found in queuing theory, which is also part of the study of operations management as a whole. This paper investigates the role of queuing models in banking systems in providing better service to customers. In banks, this theory can be applied to measure multiple factors, namely, arrival and waiting time of customers, queue length, service time, arrival rate, service rate, utilization factor, etc. The theory is applicable to banks with a large number of customers where multiple service points provide service. Many researchers have introduced and implemented different methods to improve specific aspects of the banking model. A QMS with better performance can use remote and local service [6]. The M/M/1 queue model can simplify the modeling process [7]. It is a fact that queuing delay can have a negative impact on the efficiency of a system [8]; thus different servicing models may be introduced into the system to assess performance improvements. Besides these, many other researchers [9–12] have investigated queueing theory to improve service quality, teller efficiency, service time, queue length, and waiting time in the banking scenario. Motivated by these works, we are interested in introducing queueing models in the banking sector to provide better service to the customers.
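For reference, the textbook M/M/1 measures mentioned above can be computed as in the short Python sketch below. The formulas are the standard ones (see, e.g., [2]); the example rates are illustrative and not taken from the study.

```python
def mm1_metrics(lam, mu):
    """Standard M/M/1 measures for arrival rate lam and service rate mu;
    the model is stable only when rho = lam/mu < 1."""
    rho = lam / mu                 # utilization factor
    return {
        "rho": rho,
        "L": rho / (1 - rho),      # mean number of customers in the system
        "Lq": rho**2 / (1 - rho),  # mean queue length
        "W": 1 / (mu - lam),       # mean time in the system
        "Wq": rho / (mu - lam),    # mean waiting time in the queue
    }

# e.g. 30 arrivals/hour served at 36 customers/hour (illustrative rates)
print(mm1_metrics(30.0, 36.0))
```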


3 Field Research

We applied a quantitative research method and collected data from the daily record of the flow of customers through the queuing system over a week at Tripura Gramin Bank, Badharghat Branch, Tripura. We analyzed different queuing models and techniques and established a suitable model to provide better service to the customers. The chosen model estimates the actual time required, so that service to the customers can be improved.

4 Performance of Machines

Tables 1, 2, 3, 4 and 5 show the records of the flow of customers to the bank from Monday to Friday, served per hour through machine 1, machine 2 and machine 3; Table 6 summarizes the machines' performance.

Table 1 Flow of customers and performance of machines on Monday

| Time            | M1 arriving | M1 served | M2 arriving | M2 served | M3 arriving | M3 served |
|-----------------|-------------|-----------|-------------|-----------|-------------|-----------|
| 11 a.m.–12 noon | 42          | 32        | 38          | 34        | 13          | 13        |
| 12 noon–1 p.m.  | 38          | 34        | 35          | 31        | 12          | 11        |
| 1 p.m.–2 p.m.   | 40          | 35        | 31          | 30        | 10          | 10        |
| 2 p.m.–3 p.m.   | 37          | 34        | 40          | 38        | 15          | 12        |
| 3 p.m.–4 p.m.   | 33          | 33        | 25          | 24        | 10          | 8         |
| 4 p.m.–5 p.m.   | 30          | 30        | 27          | 27        | 7           | 6         |

Table 2 Flow of customers and performance of machines on Tuesday

| Time            | M1 arriving | M1 served | M2 arriving | M2 served | M3 arriving | M3 served |
|-----------------|-------------|-----------|-------------|-----------|-------------|-----------|
| 11 a.m.–12 noon | 40          | 35        | 35          | 33        | 11          | 11        |
| 12 noon–1 p.m.  | 39          | 32        | 33          | 31        | 14          | 12        |
| 1 p.m.–2 p.m.   | 30          | 29        | 38          | 35        | 10          | 9         |
| 2 p.m.–3 p.m.   | 42          | 38        | 30          | 30        | 11          | 7         |
| 3 p.m.–4 p.m.   | 35          | 30        | 27          | 25        | 12          | 11        |
| 4 p.m.–5 p.m.   | 32          | 30        | 29          | 27        | 9           | 7         |


Table 3 Flow of customers and performance of machines on Wednesday

| Time            | M1 arriving | M1 served | M2 arriving | M2 served | M3 arriving | M3 served |
|-----------------|-------------|-----------|-------------|-----------|-------------|-----------|
| 11 a.m.–12 noon | 45          | 34        | 33          | 30        | 15          | 15        |
| 12 noon–1 p.m.  | 41          | 35        | 27          | 23        | 16          | 15        |
| 1 p.m.–2 p.m.   | 33          | 25        | 29          | 27        | 9           | 9         |
| 2 p.m.–3 p.m.   | 27          | 21        | 41          | 39        | 10          | 6         |
| 3 p.m.–4 p.m.   | 40          | 39        | 27          | 25        | 7           | 7         |
| 4 p.m.–5 p.m.   | 33          | 33        | 25          | 25        | 12          | 9         |

Table 4 Flow of customers and performance of machines on Thursday

| Time            | M1 arriving | M1 served | M2 arriving | M2 served | M3 arriving | M3 served |
|-----------------|-------------|-----------|-------------|-----------|-------------|-----------|
| 11 a.m.–12 noon | 33          | 29        | 25          | 24        | 8           | 8         |
| 12 noon–1 p.m.  | 42          | 39        | 38          | 36        | 9           | 9         |
| 1 p.m.–2 p.m.   | 27          | 26        | 29          | 24        | 12          | 11        |
| 2 p.m.–3 p.m.   | 39          | 34        | 33          | 30        | 7           | 7         |
| 3 p.m.–4 p.m.   | 38          | 37        | 31          | 28        | 15          | 12        |
| 4 p.m.–5 p.m.   | 25          | 24        | 27          | 25        | 9           | 9         |

Table 5 Flow of customers and performance of machines on Friday

| Time            | M1 arriving | M1 served | M2 arriving | M2 served | M3 arriving | M3 served |
|-----------------|-------------|-----------|-------------|-----------|-------------|-----------|
| 11 a.m.–12 noon | 25          | 33        | 31          | 30        | 12          | 11        |
| 12 noon–1 p.m.  | 42          | 38        | 38          | 35        | 7           | 7         |
| 1 p.m.–2 p.m.   | 49          | 46        | 25          | 24        | 15          | 12        |
| 2 p.m.–3 p.m.   | 36          | 30        | 34          | 34        | 8           | 8         |
| 3 p.m.–4 p.m.   | 33          | 26        | 27          | 24        | 6           | 5         |
| 4 p.m.–5 p.m.   | 21          | 22        | 24          | 22        | 4           | 4         |


Table 6 Analysis of the machines' performance (independently)

| Day   | Measure                         | M1 arriving | M1 served  | M2 arriving | M2 served  | M3 arriving | M3 served  |
|-------|---------------------------------|-------------|------------|-------------|------------|-------------|------------|
| Day 1 | Total arrival or service rate   | 220         | 198        | 196         | 184        | 67          | 60         |
| Day 1 | Average arrival or service rate | 36.6666667  | 33         | 32.6666667  | 30.6666667 | 11.1666667  | 10         |
| Day 2 | Total arrival or service rate   | 218         | 194        | 192         | 181        | 67          | 57         |
| Day 2 | Average arrival or service rate | 36.333333   | 32.333333  | 32          | 30.1666667 | 11.1666667  | 9.5        |
| Day 3 | Total arrival or service rate   | 219         | 187        | 182         | 169        | 69          | 61         |
| Day 3 | Average arrival or service rate | 36.5        | 31.1666667 | 30.333333   | 28.1666667 | 11.5        | 10.1666667 |
| Day 4 | Total arrival or service rate   | 204         | 189        | 183         | 167        | 60          | 56         |
| Day 4 | Average arrival or service rate | 34          | 31.5       | 30.5        | 27.833333  | 10          | 9.333333   |
| Day 5 | Total arrival or service rate   | 206         | 195        | 179         | 169        | 52          | 47         |
| Day 5 | Average arrival or service rate | 34.333333   | 32.5       | 29.833333   | 28.1666667 | 8.333337    | 7.833333   |
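As a worked check of Table 6, the Day 1 totals and averages for machine 1 follow directly from the six hourly counts in Table 1. A minimal Python snippet (illustrative, with the Monday counts hard-coded):

```python
# Monday's hourly (arriving, served) counts for machine 1, from Table 1
machine1 = [(42, 32), (38, 34), (40, 35), (37, 34), (33, 33), (30, 30)]

total_arriving = sum(a for a, _ in machine1)        # 220, as in Table 6
total_served = sum(s for _, s in machine1)          # 198
avg_arrival_rate = total_arriving / len(machine1)   # 36.67 per hour
avg_service_rate = total_served / len(machine1)     # 33.0 per hour
print(total_arriving, total_served, avg_arrival_rate, avg_service_rate)
```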


5 Performance of Servers Shown Graphically

The data are shown graphically in Figs. 1, 2 and 3 for the different machines.

Calculations:

Machine 1: utilization rate ρ1 = λ1/μ1 = 1.12244117
Machine 2: utilization rate ρ2 = λ2/μ2 = 1.075875708
Machine 3: utilization rate ρ3 = λ3/μ3 = 1.115151515
Average utilization rate ρ = λ/μ = 1.102213363

The above results show that if the machines act independently, the utilization factor of each machine is greater than 1, and so is their average. This means a queue line will form if they serve independently. We now calculate further:

Expected inter-arrival time (1/λ) = 0.03943044917 min
Service rate (μ) = 23.00925745 per min
P0 (probability of zero units in the system) = 0.2653907582
Average queue length Lq = 0.05437804932
Average waiting time Wa = 0.02290076 min
Average time customers spend in the queue Wq = 0.00214415091 min
PW (probability that an arrival has to wait for service) = 0.09362791158
Average time spent in the system WS = 0.045604911891 min

Series 1

Series 3

250

200

150

100

50

0 Day 1

Day 2

Day 3

Day 4

Day 5

Fig. 1 Graphical presentation of performance of Machine 1 from Monday to Friday

Fig. 2 Graphical presentation of performance of Machine 2 from Monday to Friday

Average number of customers in the system Ls = 1.15659141232
System utilization ρ = λ/(Mμ) = 0.3384306091
The system capacity Mμ = 69.02777235
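The quantities above follow the standard M/M/c formulas; a small Python sketch of how they can be reproduced is given below. The rates passed in are illustrative stand-ins chosen to roughly match the study's figures (λ and μ per minute, M = 3 machines), not the exact values used by the authors.

```python
from math import factorial

def mmc_metrics(lam, mu, c):
    """Standard M/M/c measures: P0, probability of waiting (Erlang C),
    queue length, waiting times, and number in system (Little's law)."""
    r = lam / mu                       # offered load
    rho = r / c                        # system utilization, lam / (M*mu)
    p0 = 1.0 / (sum(r**n / factorial(n) for n in range(c))
                + r**c / (factorial(c) * (1.0 - rho)))
    pw = r**c / (factorial(c) * (1.0 - rho)) * p0   # P(arrival must wait)
    lq = pw * rho / (1.0 - rho)        # average queue length
    wq = lq / lam                      # average wait in the queue
    ws = wq + 1.0 / mu                 # average time in the system
    ls = lam * ws                      # average number in the system
    return {"P0": p0, "PW": pw, "Lq": lq, "Wq": wq, "WS": ws, "Ls": ls,
            "rho": rho}

# Three machines serving in parallel; per-minute rates are illustrative
print(mmc_metrics(lam=23.36, mu=23.01, c=3))
```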

6 Discussion on Obtained Results

We observed that when the machines work independently, they are not adequate to serve the requirement, as shown above. But when they work simultaneously, we obtain better system behavior: the probability that an arrival has to wait is 0.0936279. The parallel working of the machines minimizes a customer's waiting time, and the average time spent in the system reduces to 0.04560491 min. The utilization factor for the system is 0.3384306. Moreover, the capacity of the system consisting of three machines is found to be 69.027 per hour.


Fig. 3 Graphical presentation of performance of Machine 3 from Monday to Friday

7 Conclusion

The present case study concludes that the notion of queuing theory is necessary for an organization seeking to improve its operations. We conducted the research at Tripura Gramin Bank, Badharghat Branch, Tripura, where varying numbers of people come to the bank at different hours. We assume that potential customers will start to balk when they see more people than usual already queuing, and that the bank, when fully occupied at its current capacity, will not be able to give better service. We have seen in the case study that if the machines do not work simultaneously, they are not enough to provide the service, irrespective of the number of machines servicing. The case study also shows that the machines of the respective bank should work simultaneously, or the number of machines should be increased, to improve service to the customers. But if the number of machines is increased, then the expenditure of the bank also increases. So we suggest operating the machines simultaneously, to reduce waiting time for the customers as well as to minimize the expenditure of the bank.


References

1. Allen O (1980) Queuing models of computer systems. Computer 4:275–283
2. Gross D (2008) Fundamentals of queuing theory. Wiley
3. Blanchard BS, Fabrycky WJ (1990) Systems engineering and analysis. Prentice Hall, New Jersey
4. Bolch G, Greiner S, de Meer H, Trivedi KS (2006) Queuing networks and Markov chains: modeling and performance evaluation with computer science applications. Wiley
5. Nosek RA, Wilson JP (2001) Queuing theory and customer satisfaction: a review of terminology, trends, and applications to pharmacy practice. Hosp Pharm 36(3):275–279
6. Bagchi S (2015) Analyzing distributed remote process execution using queuing model. J Internet Technol 16(1):163–170
7. Mahmood K, Chilwan A, Østerbø O, Jarschel M (2015) Modeling of OpenFlow-based software-defined networks, the multiple node case. IET Networks 4(5):278–284
8. Al-Mogren A, Iftikhar M, Imran M, Xiong N, Guizani S (2015) Performance analysis of hybrid polling schemes with multiple classes of self-similar and long-range dependent traffic input. J Internet Technol 16(4):615–628
9. Hammond D, Mahesh S (1995) A simulation and analysis of bank teller manning. In: Winter Simulation Conference Proceedings, pp 1077–1080
10. Sarkar A, Mukhopadhyay AR, Ghosh SK (2011) Improvement of service quality by reducing waiting time for service. Simul Model Pract Theory 19(7):1689–1698
11. Madadi N, Roudsari AH, Wong KY, Galankashi MR (2013) Modeling and simulation of a bank queuing system. In: Fifth international conference on computational intelligence, modeling and simulation, pp 209–215
12. Xiao H, Zhang G (2010) The queuing theory application in bank service optimization. In: International Conference on Logistics Systems and Intelligent Management, vol 2, pp 1097–1100

Chapter 20

A Survey on Network Coverage, Data Redundancy, and Energy Optimization in Wireless Sensor Network

Asha Rawat and Mukesh Kalla

1 Introduction

A wireless sensor network (WSN) is used to monitor the occurrence of certain events in an area. It consists of a large number of inexpensive sensor nodes with limited battery, deployed to monitor a particular area of interest. The sensor nodes are usually small in size, with an inbuilt microcontroller and radio transceiver to sense events, process the data, and transmit it. Each sensor node has four major components: (1) a sensing transducer, (2) a data processor, (3) a radio transceiver, and (4) a battery with limited energy. They are extensively used in environmental condition monitoring, wildlife habitat monitoring, security surveillance, etc. (Fig. 1).

Fig. 1 A sensor node's sensing range and communication range

Each sensor node has a sensing range and a communication range. A sensor node monitors its area of interest within its sensing range r_s; it also has a communication range r_c [1, 2]. The radio coverage bounded by r_c is the region in which an active sensor node can talk with at least one other sensor node. Sensing coverage (r_s) ensures proper event monitoring, while radio coverage (r_c) ensures proper data transmission/communication within the WSN. The network coverage can be defined as the overall coverage by all active sensor nodes in the WSN. The sensor network lifetime can be defined as the time duration for which the wireless sensor network performs its sensing activity efficiently and transmits data toward the base station or sink node. At times, additional sensor nodes are deployed in order to maintain a sufficient coverage degree C_d. However, when multiple sensor nodes are deployed to monitor the same sensing area, there is a possibility of unnecessary coverage redundancy, which results in wasted energy. Since a WSN node has limited battery lifetime, it is essential to identify the redundant nodes and switch them off so that they can be utilized later. Further in this paper, many protocols aimed at ensuring energy-efficient coverage of WSNs are discussed.

A. Rawat (B) · M. Kalla
Department of Computer Science & Engineering, SPSU, Udaipur, India
e-mail: [email protected]
M. Kalla e-mail: [email protected]
© Springer Nature Singapore Pte Ltd. 2020
V. K. Gunjan et al. (eds.), Cybernetics, Cognition and Machine Learning Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-15-1632-0_20
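To make the notion of network coverage concrete, the following Python sketch estimates, by Monte Carlo sampling, the fraction of a square field covered when nodes are deployed uniformly at random with sensing range r_s. The field size, node count, and function name are illustrative assumptions, not drawn from any of the surveyed papers.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

def coverage_fraction(n_nodes, r_s, field=100.0, samples=20000):
    """Monte Carlo estimate of the fraction of a field x field square
    covered by n_nodes random sensors with sensing range r_s."""
    nodes = rng.uniform(0.0, field, size=(n_nodes, 2))
    points = rng.uniform(0.0, field, size=(samples, 2))
    # A sample point is covered if some node lies within r_s of it
    d2 = ((points[:, None, :] - nodes[None, :, :]) ** 2).sum(axis=2)
    return float((d2.min(axis=1) <= r_s**2).mean())

# e.g. 60 nodes with a 10 m sensing range on a 100 m x 100 m field
print(coverage_fraction(n_nodes=60, r_s=10.0))
```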

2 Issues in Wireless Sensor Networks

2.1 Coverage Area

Coverage can be interpreted as how efficiently the sensor network monitors a field of interest; it is one of the major open areas for research. Wang et al. [3] propose a mathematical model by improving the original whale swarm optimization algorithm. The authors improvise this strategy by deducing the fitness level of each whale and taking into consideration the best individual whale (node) with the smallest fitness value. The fitness level refers to the initial whale population, so that the individual whales (nodes) are not poorly positioned initially. The algorithm yields a decent coverage area after a number of iterations, once all the whales (nodes) are properly positioned; however, the energy required to sense the area is very large. Boulem et al. [4] state a hybrid approach combining the characteristics of random deployment and deterministic deployment. The authors divide the entire coverage area into subareas and deploy a large number of sensor nodes in each subarea. A special node, called the sink node, is deployed at the center to collect all information from neighboring nodes and send it to the base station. The algorithm works in two steps: an anticipate-configuration step and a scheduling-process step. The algorithm ensures a sufficient coverage area; however, since the sensing range and communication range are the same, the rate of data redundancy increases because two consecutive nodes can sense the same data. This may consume more energy, eventually leading to an imbalance in the energy consumption of the network.


2.2 Data Redundancy

Ouadou et al. [5] propose a redundant data elimination scheme based on filtering for integrated sensor–RFID networks. The authors take into consideration both intra-path and extra-path redundant data. The approach proceeds in three phases: a preliminary phase, a network discovery phase (NDP), and a network reading phase (NRP). In the NDP, the sink identifies the end nodes of the sensor network by broadcasting the preliminary message; end nodes then send a discovery message (DM) containing the list of IDs of tags in their reading range. During the NRP, each node scrutinizes the tag list received from its neighbor and checks whether there is intra-redundant data; if so, it filters the data accordingly, adds the IDs of the tags it discovered itself, and forwards the list until the data reaches the sink. The authors simulate in a deterministic controlled environment; energy depletion during the NDP and NRP is not taken into consideration, and all the nodes in the network are assumed to have adequate residual battery energy for processing. Priya and Enoch [6] state an algorithm for eliminating data redundancy using a hash function. The Rabin–Karp algorithm is used to find any set of patterns in the incoming data. The algorithm constructs a tree node structure hierarchically: it divides the incoming data into chunks using Rabin–Karp hashing and generates representative chunks for each source generating packets. It then compares the representative chunks with the received packet to detect redundancy; if redundant data is found, it encodes the redundant file with fixed-size metadata and shrinks the packet. The simulation is done for very few nodes and few packets; in a real-time environment, incoming data are abundant and the sensor nodes number in the hundreds, so the algorithm's performance could be verified better in a more realistic environment. Al-Qurabat et al. [7] present a Distributed Data Aggregation protocol (DiDA) for improving the network lifetime of sensor networks. The main objective of the algorithm is to sense the data, convert them into segments, and reduce the dimension of each segment to remove redundancy. Temperature is taken as the sensor node measurement, and redundant data are reduced by reducing the dimensionality of the series of data sensed by the sensor nodes. The authors achieve this by implementing the Adaptive Piecewise Constant Approximation (APCA) technique. Simulation has been performed in OMNeT++, and the results are more effective when compared with an algorithm that did not implement APCA. However, the data loss percentage increases with increasing values of the threshold and sensor readings; also, the authors have implemented the algorithm only in simulation and not in a real test bed, which may give varying results.


3 Approaches for Energy Optimization in WSN 3.1 Ant Colony Optimization Heuristic states a rule that comes from experience and helps to think through things, like the process of elimination, or the process of trial and error. Nasir et al. [8] propose ACO algorithm with a heuristic approach by applying a swarm intelligence methodology that was inspired by the foraging behavior of ants that work as a team to find the shortest path between the nest and food. It determines the optimal route from source to destination through a metaheuristic approach. However, the algorithm does not take into consideration the cascading effect. Acharya et al. [9] discussion lie on the theory that when nodes are deployed at a distant from the base station then the nodes undergo unequal energy dissipation while transmitting to the base station. The algorithm implements a chain using ACO instead of greedy algorithm (PEGASIS) [10] and proved that their approach yields better result. The algorithm allows the nodes to transmit in unequal times depending upon how far the nodes are from the base station. The main objective of the algorithm is to utilize both local information (visibility) and also information about good solution obtained in the past (pheromone), when constructing new solution [9]. A chain is constructed with ACO instead of PEGASIS which shows increased network lifetime. However, the improvement fades away gradually due to rapid node deaths. Li et al. [11] propose a model for selecting the next node hop by taking into consideration the distance between the nodes and residual energy of the node. If the distance is minimum but the residual energy of the next hop node is low, then the algorithm would not select this path. The algorithm drastically reduces the cascading effect since the ants will choose the path with high residual energy node even if the pheromone concentration of the path is relatively low. Based on the fact that ACO has a slow convergence behavior which tends to premature at an early stage, Li et al. [12] have proposed a hybrid routing algorithm by combining the Artificial Fish Swarm Algorithm (AFSA) [13] and Ant Colony Optimization (ACO) [14]. The authors have taken into consideration preying, swarming, and follow behavior of the artificial fish for the initial routes discovery purpose. These routes are used as a heuristic factor for the ACO algorithm, improving the energy efficiency for the overall hybrid protocol namely Artificial Fish Swarm Ant Colony Optimization (AFSACO) for the initial route establishment, state vector of artificial fish swarm is attached to each sensor node [12]. The AF when execute a good preying behavior and swarming behavior will leave a trail for neighboring AF to reach the food. The satisfactory feasible routes obtained from AFSA are utilized as heuristic information for the initial pheromone value in ACO route discovery [12]. The authors have assumed that nodes are isomorphic. Also, the link is assumed to be symmetric. This indicates that the performance of AFSACO might degrade in a heterogeneous network. The algorithm is difficult to be implemented in a small scale network. Since the processing is high, the performance seems to be efficient in large
The algorithm has been verified in simulation and compared with other non-hybrid algorithms; AFSACO has yet to be tested on a real test bed to confirm its actual performance.
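The energy-aware next-hop rule of [11] can be pictured as a selection probability that weighs pheromone against a heuristic combining inverse distance and residual energy. The Python sketch below is a minimal illustration in the spirit of that rule; the exponents alpha and beta and the exact weighting are assumptions of this sketch, not the published parameter values.

import random

def choose_next_hop(neighbors, alpha=1.0, beta=2.0):
    """neighbors: dicts with 'pheromone', 'distance', and residual 'energy'.
    A short link to an energy-depleted node is penalized, which reduces the
    cascading effect even when that path's pheromone level is high."""
    weights = [(n['pheromone'] ** alpha) * ((n['energy'] / n['distance']) ** beta)
               for n in neighbors]
    r, acc = random.uniform(0, sum(weights)), 0.0
    for n, w in zip(neighbors, weights):    # roulette-wheel selection
        acc += w
        if r <= acc:
            return n
    return neighbors[-1]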

3.2 Deterministic Sensing

Connected target k-coverage (CTCk) can work efficiently on a heterogeneous network. Yu et al. [15] state that a node becomes active if and only if:
1. the node has a chance of covering multiple targets,
2. the node has higher battery life, and
3. the node covers an area not covered by other sensor nodes.

If these conditions are not all satisfied, the node enters sleep mode. However, if the central node fails, communication failure can follow, leading to untimely node deaths, and the network lifetime and structure will eventually collapse.

Ye et al. [16] propose the Probing Environment Adaptive Sleeping (PEAS) algorithm, in which a node works in three states: sleeping, probing, and active. Network lifetime is prolonged by keeping only the necessary nodes active and the rest sleeping. The major advantage of PEAS is that it achieves maximum network coverage while waking a minimum number of nodes. However, the node overhead is high, since the algorithm does not maintain node information, and sensing voids are possible, since the algorithm does not guarantee full sensing coverage. Once a working node becomes active, it never enters the sleep state again, which causes sensing voids and an energy imbalance in the network. PEAS performs best in unfavorable environments where a large number of sensor nodes are deployed; the node density has to be kept large, or some probing nodes may fail to discover any active node within their probing range. If this situation persists, the probing node eventually enters the active state, reducing network lifetime.

Probing Environment and Collaborating Adaptive Sleeping (PECAS) is an extension of PEAS. Gui and Mohapatra [17] state that an active node, while advertising its residual energy, enters the sleep state after a specific interval of time; a sleeping node, knowing that the active node may soon sleep, starts probing again. The highlight of this logic is that PECAS does not allow a working node to operate continuously until its total energy is exhausted. However, PECAS carries a large message-exchange overhead from probing nodes when an active node's residual energy is almost exhausted. In both PEAS and PECAS the nodes are randomly deployed, and sensor nodes transfer data using ad hoc routing, which may exhaust the battery faster and reduce network lifetime.


3.3 Probabilistic Sensing

Ahmed et al. [18] explore the problem of determining the sensing capability of sensors in a probabilistic environment, under a number of assumptions about threshold value, shadowing deviation, transmitting power, etc. The algorithm works in two phases: in the first, each sensor node gathers information about its neighbors; the second determines the neighbors' contribution toward detection, i.e., the cumulative detection probability of the neighboring nodes whose sensing ranges intersect with the node's (a hedged sketch of this computation follows after this subsection). The authors thus state a probabilistic coverage algorithm to evaluate area coverage under various assumptions; however, it does not address the data redundancy that may arise from the nodes' overlapping coverage areas.

Wang et al. [19] propose a method in which two-level fitness values are evaluated. The genetic algorithm they implement ensures a maximum number of disjoint sets with full coverage, and the two-level fitness with biased attention is designed to cover a larger percentage of the area. The genetic algorithm is implemented in a multiprocessor distributed environment.

The Probabilistic Coverage Protocol (PCP) proposed by Hefeeda and Ahmadi [20] is a distributed coverage protocol. PCP activates a set of sensors forming a triangular lattice to cover the area: it starts by selecting an activator sensor, which in turn activates six other sensors deployed at the vertices of a hexagon with the activator at the center. PCP operates quickly, since scheduling active and sleeping nodes takes less time than in a deterministic protocol, and the hexagonal structure avoids sensing voids. PCP also selects passive nodes with more residual energy to enhance the lifetime of the network. However, forming the coverage area through the activator and the surrounding nodes takes time, so the protocol converges slowly, and the failure probability of the nodes is not considered.
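The cumulative detection probability evaluated in the second phase of [18] is typically computed as the complement of every overlapping sensor missing the event. The sketch below assumes independent detections, which is an assumption of this illustration rather than necessarily of the original protocol.

from math import prod

def cumulative_detection(probs):
    """probs: individual detection probabilities of the node and the
    neighbors whose sensing ranges intersect with it."""
    return 1.0 - prod(1.0 - p for p in probs)

# Three sensors each detecting with probability 0.6:
# cumulative_detection([0.6, 0.6, 0.6]) == 1 - 0.4**3 == 0.936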

3.4 Low-Energy Adaptive Clustering Hierarchy

Heinzelman et al. [21] introduced the protocol architecture LEACH (low-energy adaptive clustering hierarchy) for energy-efficient cluster-based routing. All nodes are organized into clusters, each with one node as cluster head. The cluster head receives data from the cluster members and performs data aggregation before sending it to the base station. The algorithm has three major stages. In the cluster-head selection stage, every cluster member gets an equal opportunity to become cluster head, depending on energy availability; once a member has served as cluster head, it does not get another chance to become cluster head again. In the cluster-formation stage, each cluster head advertises itself, and each non-cluster-head node selects the cluster head requiring minimum communication energy.
In the steady-state stage, cluster members communicate with the cluster head following a TDMA schedule to reduce energy dissipation. The time delay, however, is not optimized: a small delay may produce large queues, while a large delay may leave the channel idle. Intra-cluster communication must also be managed properly to avoid data redundancy, which would waste node energy unnecessarily.

Cevik and Ozyurt [22] present three models for selecting a cluster head in the Low-Energy Adaptive Clustering Hierarchy (LEACH) algorithm. In the first approach, a waiting time is calculated between the node and the center of the cluster, taking into consideration the node's residual energy. The second model calculates the distance between the central node of the cluster and the target sink, again considering residual energy. In the third approach, every node has an equal probability of being selected as cluster head, and the distance of each node to the sink is calculated.

Singh et al. [23] state that 70% of the energy is consumed in transmitting data between sensor nodes and the base station. The main objective of hierarchical routing is to preserve the energy of the sensor nodes through multihop communication within a cluster and through data aggregation that eliminates data redundancy. Standard LEACH operates in two phases: a set-up phase, in which all nodes participate in cluster-head selection based on some probability (the standard threshold formula is sketched after this subsection), and a steady-state phase, in which nodes transmit data to the selected cluster head, which aggregates it before sending it to the base station. The authors modify standard LEACH by considering a heterogeneous network with two energy levels: normal nodes having less energy, and advanced nodes having more energy that can be considered for cluster heads. The proposed protocol has the same two phases as standard LEACH. In the first round of cluster-head selection it behaves exactly like standard LEACH; in subsequent rounds, the remaining energy of each cluster head is checked, and if it is greater than the threshold value the node continues as cluster head; otherwise a new cluster head is selected just as in standard LEACH. These steps eliminate the disadvantage of standard LEACH, which replaces all existing cluster heads in every selection round even when an existing cluster head has enough energy. A comparison with LEACH and MODLEACH [24] shows an improvement in network lifetime; however, the rate of individual node deaths is higher in the proposed algorithm.
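For reference, the stochastic cluster-head election of standard LEACH uses the well-known threshold T(n) = p / (1 - p (r mod 1/p)), where p is the desired fraction of cluster heads and r the current round; nodes that have served within the last 1/p rounds are excluded, matching the "no second chance" behavior described above. A minimal sketch:

import random

def elect_cluster_head(p, r, served_recently):
    """A node draws a uniform number in [0, 1) and becomes cluster head
    for round r if the draw falls below the LEACH threshold T(n)."""
    if served_recently:               # served within the last 1/p rounds
        return False
    t = p / (1 - p * (r % int(1 / p)))
    return random.random() < t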


4 Gap Analysis

4.1 Limited Node Mobility

Most authors assume the nodes to be static. If the nodes were mobile, the possibility of sensing voids could be reduced by repositioning the mobile nodes to cover the unsensed areas.

4.2 Heterogeneous Network with Obstacles

The schemes of [15, 23] perform efficiently in a heterogeneous environment. However, most of the protocols [3, 8, 9, 11, 12, 21–23] degrade there. Such protocols work in a homogeneous environment with known resources and devices, but are difficult to adapt to heterogeneous environments, in which the probability of data delay and energy depletion increases.

4.3 Optimization of the Wake-Up Rate of Sleeping Nodes

The probing nodes of [16, 17] still need to wake frequently to probe, even when the algorithm is tuned to a precise wake-up time, since that time may not be accurate. The accuracy and precision of the wake-up time need further work.

4.4 Data Redundancy

Since sensing areas may overlap to increase network lifetime [5–7] and decrease sensing voids, the probability of two or more nodes sending the same data is high. The overhead at the sink increases, since it must remove the redundant data before forwarding it.

4.5 Node Failure Probability

Most protocols do not consider the possibility of node failure. It is difficult to deploy sensor nodes manually in areas such as mountains or dense forest; in such cases the nodes are deployed from a helicopter and dropped over the area from a great height. Since a node is a hardware device, the probability of failure increases when it is dropped from such a height.


4.6 Sensing Void

Sensing voids still exist [16, 17], since the number of nodes is limited and cannot cover the entire area. Moreover, as the distance between a sensor node and the monitored event increases, the sensing quality decreases.

5 Conclusion

Network coverage, data redundancy, and energy optimization are the most fundamental issues in WSNs; they directly impact the quality of service and the network lifetime. Good handling of the first two issues eventually leads to efficient energy optimization. In this paper, we have presented the various works addressing these fundamental issues of WSNs. We also discussed research gaps such as data redundancy, node failure probability, node scheduling, and sensing voids, which remain major areas of research. These gaps are discussed to identify the parameters that can be improved to increase node and network lifetime.

References

1. Akkaya K, Younis M (2005) A survey on routing protocols for wireless sensor networks. Sci Direct Adhoc Netw 3(3):325–349
2. Mulligan R, Ammari H (2009) Coverage in wireless networks: a survey. Netw Protoc Algorithm 2(2):27–53
3. Wang L, Wu W, Qi J, Jia Z (2018) Wireless sensor network coverage optimization based on whale group algorithm. Comput Sci Inf Syst:569–583
4. Boulem A, Dahmani Y, Maatoug A, De-runz C (2018) Area coverage optimization in wireless sensor network by semi-random deployment. Sensornets:85–90
5. Ouadou M, Zytoune O (2017) Proactive redundant data filtering scheme for combined RFID and sensor networks. Electronics:6–72
6. Priya D, Enoch S (2018) The effect of packet redundancy elimination technique in sensor networks. J Comput Sci:740–746
7. Al-Qurabat A, Idrees A (2017) Distributed data aggregation protocol for improving lifetime of wireless sensor network. Qalaai Zanist J:204–215
8. Nasir H, Ku-Muhamud K, Kamioka E (2017) Ant colony optimization approaches in wireless sensor network: performance evaluation. J Comput Sci:153–164
9. Acharya A, Seetharam A, Bhattacharyya A (2009) Balancing energy dissipation in data gathering wireless sensor networks using ant colony optimization. In: ICDCN 2009, vol 5408. Springer, Berlin Heidelberg, pp 437–443. https://doi.org/10.1007/978-3-540-92295-7_52
10. Lindsey S, Raghavendra CS (2001) PEGASIS: power-efficient gathering in sensor information systems. In: Proceedings IEEE ICC 2001, pp 1125–1130
11. Li P, Nie H, Qiu L, Wang R (2017) Energy optimization of ant colony algorithm in wireless sensor network. Int J Distrib Sens Netw 13(4)
12. Li X, Keegan B, Mtenzi F (2018) Energy efficient hybrid routing protocol based on the artificial fish swarm algorithm and ant colony optimization for WSNs. Sensors 18:3351
13. Li X (2002) An optimizing method based on autonomous animals: fish swarm algorithm. Syst Eng Theory Pract:32–38
14. Dorigo M, Birattari M (2011) Ant colony optimization. In: Encyclopedia of machine learning. Springer, Berlin, Germany, pp 36–39
15. Yu J, Chen Y, Ma L, Huang B, Cheng X (2016) On connected target k-coverage in heterogeneous wireless networks. Sensors 16(1)
16. Ye F, Zhong G, Lu S, Zhang L (2002) PEAS: a robust energy conserving protocol for long-lived sensor network. In: Proceedings of the international conference on distributed computing systems, pp 28–37
17. Gui C, Mohapatra P (2004) Power conservation and quality of surveillance in target tracking sensor networks. In: Proceedings of the annual international conference on mobile computing and networking (ACM MobiCom '04), pp 129–143
18. Ahmed N, Kanhere S, Jha S (2005) Probabilistic coverage in wireless sensor network. In: Proceedings of the IEEE conference on local computer networks (LCN '05), pp 672–681
19. Wang Z, Zhan Z, Zhang J (2018) Solving the energy efficient coverage problem in wireless sensor networks: a distributed genetic algorithm approach with hierarchical fitness evaluation. Energies 11:3526. https://doi.org/10.3390/en11123526
20. Hefeeda M, Ahmadi H (2010) Energy-efficient protocol for deterministic and probabilistic coverage in sensor networks. IEEE Trans Parallel Distrib Syst
21. Heinzelman W, Chandrakasan A, Balakrishnan H (2002) An application-specific protocol architecture for wireless microsensor networks. IEEE Trans Wireless Commun 1(4):660–670
22. Cevik T, Ozyurt F (2015) Impacts of structural factors on energy consumption in cluster-based wireless sensor networks: a comprehensive analysis. Int J Adhoc Sens Ubiquitous Comput (IJASUC) 6(1):01–19
23. Singh J, Singh B, Shaw S (2014) A new LEACH-based routing protocol for energy optimization in wireless sensor network. In: IEEE international conference on computer and communication technology, pp 181–186
24. Mahmood D, Javaid N, Mahmood S, Qureshi S, Memon A, Zaman T (2013) MODLEACH: a variant of LEACH for WSN. In: 26th IEEE Canadian conference on electrical and computer engineering

Chapter 21

An Offline-Based Intelligent Motor Vehicle Driver Behavior Analysis Using Driver's Eye Movements and an Inexperienced and Unauthorized Driver Access Control Mechanism

Jai Bharath Kumar Gangone

1 Introduction

Every nation faces the problem of determining whether the person driving a motor vehicle is eligible to drive it by holding a valid motor driving license. Establishing this simply by observing the driver is very difficult. Even with very stringent driving rules and close supervision of the roads by traffic police personnel, catching people who drive without a valid driving license and violate traffic rules remains a very difficult job, and this tendency is seen more in youth than in adults or the elderly. According to the World Health Organization's recent youth and road safety statistics, every day just over 1000 young people under the age of 25 are killed in road traffic crashes around the world. Road traffic injuries are the leading cause of death globally among 15–19-year-olds, while for those in the 10–14 and 20–24 age brackets they are the second leading cause of death [1]. Another survey reports that about 719 people die in red-light-running crashes [2]. To avoid such situations, this paper suggests a method that starts the engine of the vehicle if and only if the driver is eligible to drive, holds a valid driving license, and is authorized by the vehicle's primary owner. The implementation uses a Raspberry Pi-based hardware module and a secure mobile application driven by the primary owner. The Raspberry Pi is connected to a camera used to recognize the driver by his face in real time [3]. A secure Android application is developed [4] to fetch the data from the Regional Transport Authority (RTA) and load it into a MySQL database on a memory chip placed in the Raspberry Pi.

J. B. K. Gangone (B)
Department of Computer Engineering, Faculty of Electrical and Computer Engineering, Bahir Dar Institute of Technology, Bahir Dar University, Bahir Dar, Ethiopia
e-mail: [email protected]

© Springer Nature Singapore Pte Ltd. 2020
V. K. Gunjan et al. (eds.), Cybernetics, Cognition and Machine Learning Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-15-1632-0_21


The data from the RTA database, regardless of its representation, is first fetched into the local memory of the Android application through a secure communication mechanism called Kerberos and stored as a JSON object in the Android device's local memory [5]. This JSON object is then transferred into the MySQL [6] database residing on the memory chip of the Raspberry Pi through a wired or wireless medium. Once the data has been entered into the memory chip on the Raspberry Pi, the system works in offline mode from then onwards. The system is installed so that the ignition mechanism works only if the Raspberry Pi signals high to it, and the Raspberry Pi signals high only when the driver is eligible and authenticated. The system also handles hardware failures, hazardous situations, and emergencies using a numeric keypad with 10 digit keys: if the hardware module fails, the primary owner can securely generate a One Time Password (OTP) and communicate it to the driver currently driving the vehicle, who can then enter it through the numeric keypad to start the engine. In this paper the model is presented for a motor car, though there is no limitation to applying it to any type of vehicle.

The system implementation is explained in the following sections. It starts by implementing a secure Kerberos-based mobile application that accesses the RTA license database as JSON using the APIs provided by the RTA software development and maintenance team; it then implements a hardware module with a Raspberry Pi, a camera module, and deep learning algorithms to recognize the driver; and finally it implements a USB or wireless interface to transfer the JSON data from the mobile phone's local storage to the MySQL database on the memory chip mounted on the Raspberry Pi. Power is provided by the vehicle battery itself through voltage regulators. The device is installed so that the driver's face faces the camera of the hardware module. The license data from the RTA contains fields such as the driver's name, date of birth, license number, license type (2-wheeler, 4-wheeler, heavy-weight vehicle), license issue date, expiry date, and face data. The vehicle manufacturing information, such as manufacturer details, manufacturing year, vehicle type (2-wheeler, 4-wheeler, heavy-weight vehicle), engine number, and engine power, is entered once into the MySQL database using a GUI on the Raspberry Pi; the GUI is then disabled to prevent further modification, as this information is constant throughout the life cycle of the vehicle [7].

A Markov Dynamic Model (MDM) analysis is applied to the driver's eye movements captured by the camera [8]. Eye movements are a good source of information for estimating a person's cognitive state, and eye fixation patterns are useful for identifying a person's different states. Eye-movement patterns are idiosyncratic, meaning that gaze patterns vary from person to person, yet fixation locations are similar across people. A driver's eye movements exhibit different patterns for different tasks, such as lane following, curve negotiation, car following, stopping, and lane changing. It has been observed that the gaze location leads the current situation by about 2–5 s, which is useful for predicting upcoming actions.
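A minimal sketch of the JSON-to-MySQL transfer step might look as follows; the table name, column names, and connection credentials are illustrative assumptions of this sketch, not the paper's actual schema.

import json
import mysql.connector  # from the mysql-connector-python package

def load_licenses(json_path):
    # JSON object previously fetched from the RTA server by the secure app.
    with open(json_path) as f:
        records = json.load(f)
    db = mysql.connector.connect(host="localhost", user="rta",
                                 password="secret", database="rta_offline")
    cur = db.cursor()
    sql = ("INSERT INTO licenses (name, dob, license_no, license_type, "
           "issue_date, expiry_date, face_data) "
           "VALUES (%s, %s, %s, %s, %s, %s, %s)")
    for r in records:
        cur.execute(sql, (r["name"], r["dob"], r["license_no"], r["type"],
                          r["issued"], r["expires"], r["face_data"]))
    db.commit()
    db.close()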


2 Proposed System

2.1 Block Diagrams

2.1.1 Block Diagram for Communication Between the RTA Database and the Mobile Application

The Block Diagram for the RTA Database to the Mobile Application Using Kerberos
The block diagram (Fig. 1) shows how the mobile app is used to access the data from the RTA server in a secure way; the security method used, Kerberos, is explained below.

Structure of the Data at the RTA Server and the Mobile Device
See Fig. 2.

Fig. 1 Kerberos secure communication

Fig. 2 Structure of the data at RTA and at mobile app

2.1.2 Block Diagram for Communication Between the Mobile Application and the Hardware Module

The Block Diagram Showing Transfer of Data from the Mobile App to the Raspberry Pi
The block diagram (Fig. 3) shows how the mobile application communicates with the Raspberry Pi in wired and wireless modes.

Structure of the Data at the Mobile Device and the Raspberry Pi MySQL DB
See Fig. 4.

Fig. 3 Communication between Mobile App and RB Pi

Fig. 4 Format of data at Mobile App and RB Pi

2.1.3 Hardware Module

Components Required
See Table 1.

Raspberry Pi Pin and Hardware Connection Layout
See Fig. 5.

Table 1 Bill of materials
Component name       Quantity
Raspberry Pi 3 B+    1
Camera Module V2     1
LED                  2 (1 red + 1 green)
Buzzer               1
Resistors (200 k)    2
Jumper wires         Adequate

Fig. 5 Raspberry Pi pin diagram and connection layout


Connection Diagram
See Table 2.

Table 2 Component connections to Raspberry Pi
Component name   Raspberry Pi pin number
Green LED        GPIO 24
Red LED          GPIO 18
Buzzer           GPIO 22

2.1.4 Sleep Detection

Sleep can be detected using the eye aspect ratio and the landmark coordinates. The algorithm extracts the facial landmarks into a NumPy array, localizes the eyes, and detects them using OpenCV's contour-drawing method; the aspect ratio of the eyes then indicates whether an eye is closed or open. The figures below show the aspect ratio for closed and open eyes. Code for the other parts is not included in this paper, but the implementation is complete (Fig. 6).
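A minimal version of the eye-aspect-ratio test described above can be sketched as follows. The six-landmark eye layout follows the common 68-point facial-landmark convention (e.g., dlib), and the 0.2 threshold is an assumed illustrative value.

import numpy as np

def eye_aspect_ratio(eye):
    """eye: 6x2 array of landmark coordinates p1..p6 around the eye contour."""
    a = np.linalg.norm(eye[1] - eye[5])   # vertical distance p2-p6
    b = np.linalg.norm(eye[2] - eye[4])   # vertical distance p3-p5
    c = np.linalg.norm(eye[0] - eye[3])   # horizontal distance p1-p4
    return (a + b) / (2.0 * c)

EAR_THRESHOLD = 0.2   # assumed; below this the eye is treated as closed

def eyes_closed(left_eye, right_eye):
    ear = (eye_aspect_ratio(left_eye) + eye_aspect_ratio(right_eye)) / 2.0
    return ear < EAR_THRESHOLD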

2.1.5 Hardware Working Principle

The hardware module takes power from the motor vehicle battery and is always running. The camera module continuously senses for a face; if no face is sensed, it remains in the same state. If a face is sensed, the module checks the state of the vehicle, i.e., whether it is in moving mode or stop mode. If the Ignition_ON variable is set, the vehicle is in the moving state; otherwise it is in the rest state.

Fig. 6 a Facial landmark coordinates. b Top left: open-eye aspect ratio, top right: closed-eye aspect ratio. Bottom: eye aspect ratio over time plot; the dip indicates an eye blink


If the vehicle is in the rest state, the module checks the driver's eligibility and authenticity by verifying against the RTA data in the MySQL database. If the driver is an ineligible and unauthenticated user, the red LED glows and a long beep sounds. If the driver is an eligible and authenticated person, the system verifies his license validity information; if the license is valid, the green LED glows, a short beep sounds, and at this point the Ignition_ON variable is set. This process runs repeatedly and continuously (Fig. 7).
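Under the pin mapping of Table 2, the LED and buzzer signaling in this flow could be sketched with the RPi.GPIO library as below; the beep durations are assumed values for illustration.

import time
import RPi.GPIO as GPIO

GREEN_LED, RED_LED, BUZZER = 24, 18, 22   # BCM numbering, per Table 2

GPIO.setmode(GPIO.BCM)
GPIO.setup([GREEN_LED, RED_LED, BUZZER], GPIO.OUT, initial=GPIO.LOW)

def signal(driver_valid):
    """Green LED and a short beep for a valid driver; red LED and a long
    beep otherwise. The caller sets Ignition_ON only when this returns True."""
    led, beep = (GREEN_LED, 0.3) if driver_valid else (RED_LED, 2.0)
    GPIO.output(led, GPIO.HIGH)
    GPIO.output(BUZZER, GPIO.HIGH)
    time.sleep(beep)
    GPIO.output(BUZZER, GPIO.LOW)
    return driver_valid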

Fig. 7 Hardware working flow chart


3 Future Work

At present, the proposed model is limited to testing driver eligibility and authentication. In the future, features such as analysis of driving behavior, speed management, and driver sleep-state detection will be added. More importantly, if motor vehicles can be provided with a reliable internet facility, the entire system can be moved online and driver authentication can be performed directly against the RTA database, removing the need to update the MySQL database in the hardware module frequently. An online system is always up to date and requires no human intervention.

4 Conclusion

The most attractive feature of this model is that it is an offline model, with the mobile application handled by the primary owner as its key. The results of the proposed model show that it is an efficient way to control access to a motor vehicle by blocking ineligible and unauthorized drivers. It also handles hardware failure and emergencies using a numeric keypad and a One Time Password. The model works with any RTA regardless of its database structure, as it uses JSON as the data exchange format. The face recognition algorithm is not formulated here, as it is prescribed in one of the references, and only the main screen of the mobile application is included due to space constraints. By adopting this model we can surely stop ineligible and unauthorized access to motor vehicles, in turn saving the lives of many young people, in the hope that no family will suffer anymore by losing their near and dear ones.

Acknowledgements I gratefully acknowledge the financial support of Bahir Dar Institute of Technology (BiT), Bahir Dar University (BDU) for generously supporting and encouraging me to carry on this research.

References

1. (2018) Youth and road safety by World Health Organization. https://www.who.int/management/programme/ncd/Youth%20and%20Road%20Safety.pdf
2. (2018) Road safety fact sheet by US road safety agency 2018. https://www.atsol.com/mediacenter/fact-sheets/
3. Vinay A, Gupta A, Bharadwaj A, Srinivasan A, Balasubramanya Murthy KN, Natarajan S (2018) Deep learning on binary patterns for face recognition. In: International conference on computational intelligence and data science (ICCIDS 2018). ScienceDirect Procedia Comput Sci (Elsevier), pp 76–83
4. Bao L, Lo D, Xia X, Li S (2017) Automated Android application permission recommendation. Sci China Inf Sci (Springer)


5. Kumari A, Kushwaha DS (2017) Kerberos style authentication and authorization through CTES model for distributed systems. In: International conference on information processing, computer networks and intelligent computing. Springer, pp 457–462
6. Izquierdo JLC, Cabot J (2018) Composing JSON-based web APIs. In: International conference on web engineering, ICWE 2018. Springer, pp 390–399
7. Bell C (2017) Introducing the MySQL 8 document store. Springer. ISBN 978-1-4842-2725-1
8. Liu A, Salvucci D (2017) Modeling and prediction of human driver behavior. In: International conference on human-computer interaction, New Orleans, LA, Aug 2017

Chapter 22

Remote Monitoring and Maintenance of Patients via IoT Healthcare Security and Interoperability Approach

Madhavi Latha Challa, K. L. S. Soujanya and C. D. Amulya

1 Introduction

A smart environment is an emerging vision of embedding pervasive intelligence into ubiquitous things, including physical devices, cyber things, sensors, computational elements, human thinking, and social people. Human lifestyles have been changing tremendously, embracing smart things through various applications such as the smart city, smart agriculture, smart transport, smart health care, smart home, etc. [1]. Moreover, the smart environment connects to the everyday objects in our daily lives through a persistent network. This is achieved through the Internet of Things (IoT), in which physical electronics are embedded with internet connectivity and used for remote control and monitoring purposes. IoT has been evolving through the merging of various technologies: machine learning tools, sensors, and real-time embedded systems. IoT is gradually increasing the importance of healthcare solutions by connecting with wireless medical devices [2, 3]. Smart healthcare systems need to integrate various technologies, such as micro, nano, and wearable computing, pervasive computing technologies, and integrated circuits, to monitor chronic-disease patients. Furthermore, these systems are useful for monitoring heart, panic, and epilepsy-related attacks of drivers through the Internet of Vehicles (IoV) to prevent accidents [4–6].

M. L. Challa (B)
Assistant Professor, Department of CSE, CMR College of Engineering & Technology, Kandlakoya, Medchal, Hyderabad, India
e-mail: [email protected]

K. L. S. Soujanya (B)
Department of Computer Science and Engineering, CMRCET, Kandlakoya, Hyderabad, India
e-mail: [email protected]

C. D. Amulya
Department of CSE, Swami Ramananda Tirtha Institute of Science & Technology, Nalgonda, Telangana, India
e-mail: [email protected]

© Springer Nature Singapore Pte Ltd. 2020
V. K. Gunjan et al. (eds.), Cybernetics, Cognition and Machine Learning Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-15-1632-0_22


The key requirement for smart health care is interoperability, which allows information to be shared among various resources [7]. A major issue for smart health care is centralized data storage, which leads to fragmentation of data, slow access, and concerns about the quality and quantity of data available for research. A huge number of records is generated in hospitals every day; these records are maintained and stored in a centralized database [8]. However, such records may be unavailable for accessing patients' information, and data can be lost. The authorized persons of a centralized database have to ensure the reliability of centralized data over an untrustworthy network [9]. The majority of patients are keen to be involved in smart health care and to make their own decisions about their health [10]. This leads to personalization of patients' treatment and health care through smart devices and sensors, which store and transfer the recorded data remotely to the concerned doctors; the main purpose of transferring remote data is the continuous observation of a particular patient's chronic conditions [10]. IoT technologies in health care include smartwatches, fitness bands, contact lenses, wireless sensors, and microchips under the skin. These wireless, sensitive systems can be compromised by attacks and vulnerabilities such as eavesdropping, data modification, malicious attacks, data theft, etc. To avoid these, various security solutions are followed, covering system security, physical security, administrator security, and information security [11]. Many researchers have applied specific privacy concerns to IoT healthcare applications, such as identity, location, footprint, owner, and query privacy [12–15].

2 Literature Survey

IoT objects, or smart things, are embedded with a physical incarnation, network communication, and computing capabilities; they can sense real-world effects and trigger actions accordingly [16, 17]. Various researchers have worked on IoT environments and applications, changing the world into a smart world in innovative ways to manage organizational and personal knowledge flows. This pattern can increase ecosystem innovation by enhancing individuals' knowledge capacity [18–20]. Ashton [21] coined the term IoT in 1999, where it was first used for RFID; within a decade, IoT had evolved into various multidisciplinary applications such as the smart home, smart city, smart health care, smart parking, smart transportation, and so on. In the present scenario, IoT technology helps citizens through connected smart objects. Security is one of the major requirements of smart health care, ensuring the confidentiality, integrity, and availability of medical information [8, 22–25]. Medical information can be gathered from patient records and body sensors, and as medical records are converted from physical to digital format, security and protection requirements arise, such as authentication, authorization, access control, nonrepudiation, encryption, and so on.


The IoT environment is the new revolution of the Internet, rapidly reaching into multidisciplinary research in the healthcare environment. The traditional healthcare system is evolving to become smarter through the advancement of various IoT devices such as smartphones, wired/wireless devices, and sensors. Patient-centered care at lower cost in the hospital environment is a major issue in smart health, which can be solved by applying innovative IoT solutions. Scuotto et al. [26, 27], Bresciani et al. [28], and Vrontis et al. [29] opined that the latest smart-device technologies in the healthcare environment improve value-added services to patients and provide more opportunities for smart health environments. Recently, IoT research has focused on smart healthcare monitoring and devices. Advancing the smart environment in the healthcare sector is very expensive due to the installation of sensor technologies with IoT devices, constant monitoring, and continuous management [30]. The tremendous growth of IoT and machine learning techniques applied to hospital and patient data will generate new possibilities and facilities for mHealth and eHealth services. However, smart healthcare services need to address several medical needs and challenges for flexible, cost-efficient, and consistent monitoring. Remote smart healthcare enhancement is therefore one of the solutions for the well-being of organizers as well as users. Remote smart health monitoring provides services promptly over long and short distances and gives assurance to patients who need quick and critical attention. In addition, it contributes to patients in various significant ways, such as disease prevention, early diagnosis, treatment, rehabilitation, and so forth.

3 Proposed Method

The IoT healthcare security and interoperability (IHSI) approach is used to secure, exchange, and communicate data through various modules covering authentication, encryption, operational efficiency, and automatic data entry for data received from physical devices and sensors. The IHSI approach has several benefits: well-communicating systems, reduced redundancy, reduced cost, reduced radiation exposure, improved patient care and monitoring, and improved patient safety. Its main goal is cost-effective, continuous clinical monitoring. The IHSI approach is embedded with intelligence and constant connectivity; to improve the efficiency of hospital management and patient safety and care, it requires a smart-hospital infrastructure of smart panels, sensors, uninterruptible power supplies (UPS), relays, hubs, wireless routers, and real-time location systems. The IHSI modules are as follows: (a) authentication, (b) encryption, (c) operational efficiency, (d) automatic data entry.


3.1 Authentication

In the authentication module, certificate validation and request–response validation exchange are the two validation techniques used among the user nodes. For certificate validation, the X.509 standard is used [31]. The procedure consists of two tasks: first, the IoT device receives a certificate presented by an unknown node; second, the IoT device validates the certificate signature under the X.509 standard. The certificate is genuine when it validates, and in that case the username is provided to the user as trustworthy.

Request–response validation exchange is an additional security procedure with three steps. First, a 64-bit request is generated by the trusted nodes (blood pressure, temperature, ECG, heartbeat, etc.). This information is sent to the unknown nodes near the patient. The response is calculated using a 256-bit secret key known only to the trusted nodes. The response is then sent back by the unknown node and compared with the one computed locally using the same procedure. A hash-function-based message authentication code is used to calculate this response [32]; this is a well-studied, secure cryptographic construction.

Authentication algorithm for the IHSI approach:

Me  = message received from IoT device
Ke  = secret key
Ipc = inner padding constant
Opc = outer padding constant
⊕  = XOR operation

Authen_mod(Me, Ke) {
    inner  = H(Concat(Ke ⊕ Ipc, Me))     // inner hash over the padded key and the message
    Output = H(Concat(Ke ⊕ Opc, inner))  // outer hash (the HMAC construction of RFC 2104 [32])
}
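Since [32] is RFC 2104, the pseudocode above corresponds directly to HMAC as provided by Python's standard library. A minimal runnable sketch follows; SHA-256 is an assumed hash choice of this sketch.

import hmac
import hashlib

def authen_mod(message: bytes, key: bytes) -> str:
    # HMAC(K, m) = H((K xor opad) || H((K xor ipad) || m)), per RFC 2104.
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def verify(message: bytes, key: bytes, received_mac: str) -> bool:
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(authen_mod(message, key), received_mac)

# Example: a trusted node answers a 64-bit challenge with the keyed digest.
challenge = b"\x01\x02\x03\x04\x05\x06\x07\x08"
mac = authen_mod(challenge, b"a-256-bit-shared-secret-key-value")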

3.2 Encryption

The encryption module of the IHSI approach uses public-key cryptography with short key sizes, so memory and power consumption are very low. It follows the IEEE 802.15.6 standard for the encryption process [33]. The proposed encryption module provides robust security compared with other cryptographic algorithms such as Rivest–Shamir–Adleman (RSA) and the Digital Signature Algorithm (DSA). Establishing a safer environment requires cumulative computational capability and continuously increasing key sizes.

Bp = base point
Pk = private key (order)
PuK = public key

Encry_shkeyGen() {    // shared key generation
    Pk1 = rand()
    Pk2 = rand()
    PuK1 = Pk1 * Bp     // public key generated from the private key at one end
    PuK2 = Pk2 * Bp     // public key generated from the private key at the other end
    ShKey = Pk1 * PuK2  // shared key; Pk2 * PuK1 yields the same point at the other end
    // Encryption
    Cip_txt = Encrpt(Pl_txt, ShKey)
    // Decryption
    Pl_txt = Decrypt(Cip_txt, ShKey)
}
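A runnable equivalent of the shared-key generation above can be sketched with elliptic-curve Diffie–Hellman from the Python cryptography package; the curve (SECP256R1) and the HKDF key-derivation step are assumptions of this sketch rather than parameters taken from the paper.

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

# Each side draws a private key Pk and publishes PuK = Pk * Bp.
pk1 = ec.generate_private_key(ec.SECP256R1())
pk2 = ec.generate_private_key(ec.SECP256R1())

# Pk1 * PuK2 equals Pk2 * PuK1, so both ends derive the same shared secret.
shared1 = pk1.exchange(ec.ECDH(), pk2.public_key())
shared2 = pk2.exchange(ec.ECDH(), pk1.public_key())
assert shared1 == shared2

# Derive a symmetric key (ShKey) from the shared point for Encrypt/Decrypt.
sh_key = HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
              info=b"ihsi-session").derive(shared1)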

3.3 Operational Efficiency

The world is becoming one large connected network built on advanced technologies. IoT in particular is improving operational efficiency in smart hospitals in terms of patient satisfaction, on-time information delivery, decision making, and safety, thereby improving patient care and facility management. IoT devices provide sensing, smart control, smart management, and embedded connectivity, and can identify patients' information remotely; they allow doctors, patients, or patients' attendants to connect, gather, analyze, and respond to information to improve performance and reduce losses. Operational efficiency in the IHSI approach focuses on managing healthcare facilities throughout the smart hospital using advanced technologies such as radio-frequency identification (RFID) and real-time location systems (RTLS). The main purpose of tracking technology is better and quicker care of patients when they need it. In-patients can always be tagged and connected with IoT devices so that their health records are stored and monitored; when a patient's health measurements change drastically, the concerned doctors and caretakers respond immediately.

3.4 Automatic Data Entry

Information is collected automatically by smart healthcare IoT devices, such as sensors attached to a patient's body or clipped to the patient's clothes, from the hospital surroundings without any action performed manually. Services are then provided according to the patient's condition, such as an alarm system, patients'
cross-information processing, scheduling of medical activities, personal management, and so on. Moreover, the IHSI approach includes remote monitoring of patients' chronic diseases and medication orders through IoT health devices, such as infusion pumps and beds equipped with sensors, which can send patients' data to caretakers or caregivers.

4 Results

The IHSI approach builds on recent technological advancements that allow low-cost IoT health devices to be connected and to collect information directly from patients. The patients' health status can therefore be monitored using wireless and wired IoT health devices, and the gathered information can be sent to the main server to maintain the patients' records. In the past decade, healthcare solutions have been very expensive to integrate with vendors or third-party software systems. The IHSI approach, however, is a very low-cost programmable IoT environment that accommodates current trends and new frontiers. In the proposed IHSI approach, emerging IoT health devices offer not only low cost but also standalone capabilities, such as RAM (minimum 1 MB), protocols (Ethernet, IP, TCP, UDP, CoAP, WiFi, Bluetooth), microcontroller resources, GPIOs, etc., for interacting with physical or electrical transducers. Each IoT healthcare device has its own computing and data storage capacity and preprocesses the raw data before sending it to the required parties in reduced form, so the communication overhead of data transmission is reduced. Moreover, all the devices are connected to each other to form the network and ensure data availability.

The simulation was performed on the Ubuntu operating system with a 400 × 200 m2 network coverage area, one e-health server application provider, 7 clients with mobility between 2 and 15 m/s in all cases, 12 personal servers, and a simulation time of 30 min. In this paper, communication cost, computational cost, throughput, and end-to-end delay are analyzed to determine the network performance of the proposed approach. Table 1 shows the modules and their security parameters. The proposed approach supports various security features against attacks such as man-in-the-middle attacks, un-traceability, insider attacks, anonymity, mutual authentication, stolen smart card, and mobile device attacks.

Table 1 Modules and measurements of IHSI approach
Modules                  Security measurements
Authentication           Communication cost
Encryption               Computational cost
Operational efficiency   Throughput (bps)
Auto data entry          End-to-end delay (s)


Table 2 IHSI security parameters for communication and computation cost analysis
Parameters                                          Communication cost analysis   Computation cost analysis
Messages                                            4                             –
Client and e-health server application provider     510 bits                      20.18 ms
e-health server and personal server/client server   602 bits                      2.08 ms
Personal server and user/clients                    509 bits                      2.92 ms
Total cost                                          1621 bits                     25.18 ms

Table 3 Measured values for the considered security parameters
Security parameter   Measured value
TP                   79.76
E-ED                 0.00567

The communication cost has been analyzed based on the message exchanges between two parties, and the computation cost based on the login, encryption, and authentication times (Table 2). Throughput (TP) is an important network performance parameter that measures the number of bits transmitted per unit time. The formula for throughput is:

TP = (total number of received packets × packet size) / total time in seconds

TP = (Rp × P) / T_total    (1)

End-to-end delay (E-ED) is a network measurement parameter used to find the average time for data to travel from sender to receiver. It is calculated as follows (Table 3):

E-ED = [ Σ_{i=0}^{Np} (receiver_time(i) − sender_time(i)) ] / Np    (2)

where Np is the total number of packets.
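Equations (1) and (2) can be evaluated directly from a packet trace. A small sketch, assuming the trace is a list of (send_time, receive_time, size_bits) tuples, is:

def throughput(packets, total_time):
    """Eq. (1): total received bits divided by the total time in seconds."""
    return sum(size for _, _, size in packets) / total_time

def end_to_end_delay(packets):
    """Eq. (2): average receiver-minus-sender time over all packets."""
    return sum(rx - tx for tx, rx, _ in packets) / len(packets)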

4.1 Comparative Analysis

In this comparative analysis, the proposed approach is compared with the schemes of He et al. [34] and Shen et al. [35], which are closely related to the IHSI approach. The security measurements compared are computational cost, communication cost, throughput, and end-to-end delay. Table 4 and Fig. 1 show the results of this study.


Table 4 Comparative analysis of security measurements
Security measurements   Proposed IHSI approach   He et al. [34]   Shen et al. [35]
Computational cost      25.18 ms                 10827.40 ms      243.60 ms
Communication cost      1621 bits                1472 bits        1920 bits
Throughput              79.76                    4.98             6.22
End-to-end delay        0.00567                  0.01363          0.01404

Fig. 1 Comparative analysis of security measurements

The computation cost of the proposed approach is 25.18 ms, far less than that of the other schemes, so the IHSI approach requires minimal computational cost. The IHSI approach needs less communication cost than the scheme of Shen et al. [35]; although it is slightly higher than that of He et al. [34], it remains in an acceptable range while providing more security functionality for that communication cost. The throughput of the IHSI approach is much higher than that of the other schemes: since more messages are exchanged using the IHSI approach, throughput also increases, achieving good performance relative to the other schemes. The E-ED of the IHSI approach is also much lower than the others, because small messages are used for the login, encryption, and authentication processes.


5 Conclusion

Smart healthcare systems monitor patients' health through various IoT devices integrated with medical equipment. They consist of sensors in and around the patient's body that gather information and store it on hospital servers to support the patient's recovery and treatment. The major issue in smart health care is the maintenance of centralized data, which brings inaccessibility of patients' records, data loss, cyber-attacks, and unreliability. In this paper, the IoT healthcare security and interoperability (IHSI) approach has been proposed to solve these issues. The IHSI approach consists of four modules, i.e., authentication, encryption, operational efficiency, and automatic data entry, which serve data security, data exchange, and data communication. The IHSI approach helps reduce cost and redundancy and improve patient care and safety. Its security has been measured using the communication cost, computation cost, throughput, and end-to-end delay parameters, and the approach has been compared with the related schemes of He et al. [34] and Shen et al. [35]. The results show better security with lower computation and communication costs, as well as higher throughput and lower delay.

References

1. Ma J, Yang LT, Apduhan BO, Huang R, Barolli L, Takizawa M (2005) Towards a smart world and ubiquitous intelligence: a walkthrough from smart things to smart hyperspaces and UbicKids. Int J Pervasive Comput Commun 1(1):53–68
2. Chen M, Miao Y, Hao Y, Hwang K (2017) Narrow band internet of things. IEEE Access 5:20557–20577
3. Ahmed YE, Ahmed AIA, Gani A, Imran M, Guizani M (2017) Internet of things architecture: recent advances, taxonomy, requirements, and open challenges. IEEE Wirel Commun Mag 24(3):10–16
4. Chen M, Tian Y, Fortino G, Zhang J, Humar I (2018) Cognitive internet of vehicles. Comput Commun 120:58–70
5. Lin D, Tang Y, Labeau F, Yao Y, Imran M, Vasilakos AV (2017) Internet of vehicles for E-health applications: a potential game for optimal network capacity. IEEE Syst J 11(3):1888–1896
6. Zheng Z et al (2017) An overview of blockchain technology: architecture, consensus, and future trends. In: 2017 IEEE international congress on big data (BigData congress). IEEE. https://doi.org/10.1109/BigDataCongress.2017.85
7. Azaria A et al (2016) Medrec: using blockchain for medical data access and permission management. In: International conference on open and big data (OBD). IEEE. https://doi.org/10.1109/obd.2016.11
8. Khan SI, Hoque ASL (2016) Privacy and security problems of national health data warehouse: a convenient solution for developing countries. In: 2016 international conference on networking systems and security (NSysS). IEEE. https://doi.org/10.1109/NSysS.2016.7400708
9. Jin T et al (2017) BlockNDN: a bitcoin blockchain decentralized system over named data networking. In: 2017 ninth international conference on ubiquitous and future networks (ICUFN). IEEE. https://doi.org/10.1109/ICUFN.2017.7993751
10. Zhang Y, Chen M, Guizani N, Wu D, Leung VCM (2017) SOVCAN: safety-oriented vehicular controller area network. IEEE Commun Mag 55(8):94–99
11. Al Ameen M, Liu J, Kwak K (2012) Security and privacy issues in wireless sensor networks for healthcare applications. J Med Syst 36(1):93–101
12. Ding D, Conti M, Solanas A (2016) A smart health application and its related privacy issues. In: Smart city security and privacy workshop (SCSP-W). IEEE. https://doi.org/10.1109/scspw.2016.7509558
13. Fernando R et al (2016) Consumer oriented privacy preserving access control for electronic health records in the cloud. In: 2016 IEEE 9th international conference on cloud computing (CLOUD). IEEE. https://doi.org/10.1109/CLOUD.2016.0086
14. Sajid A, Abbas H (2016) Data privacy in cloud-assisted healthcare systems: state of the art and future challenges. J Med Syst 40(6), Article 155
15. Sinha SR, Park Y (2017) Dealing with security, privacy, access control, and compliance. In: Building an effective IoT ecosystem for your business. Springer, pp 155–176
16. Miorandi D, Sicari S, De Pellegrini F, Chlamtac I (2012) Internet of things: vision, applications and research challenges. Ad Hoc Netw 10(7):1497–1516. https://doi.org/10.1016/j.adhoc.2012.02.016
17. Marston S, Li Z, Bandyopadhyay S, Zhang J, Ghalsasi A (2011) Cloud computing—the business perspective. Decis Support Syst 51(1):176–189
18. Darroch J (2005) Knowledge management, innovation and firm performance. J Knowl Manag 9(3):101–115
19. Cetindamar D, Phaal R, Probert D (2009) Understanding technology management as a dynamic capability: a framework for technology management activities. Technovation 29(4):237–246
20. Santoro G, Vrontis D, Thrassou A, Dezi L (2017) The internet of things: building a knowledge management system for open innovation and knowledge management capacity. Technol Forecast Soc Chang (forthcoming)
21. Ashton K (2009) That 'internet of things' thing. RFiD J 22(7):97–114
22. Abouelmehdi K et al (2017) Big data security and privacy in healthcare: a review. Procedia Comput Sci 113:73–80
23. Au MH et al (2017) A general framework for secure sharing of personal health records in cloud system. J Comput Syst Sci. https://doi.org/10.1016/j.jcss.2017.03.002
24. Kshetri N (2017) Blockchain's roles in strengthening cyber security and protecting privacy. Telecommun Pol 41(10):1027–1038
25. Small A, Wainwright D (2017) Privacy and security of electronic patient records: tailoring multimethodology to explore the socio-political problems associated with role based access control systems. Eur J Oper Res 265(1):344–360
26. Scuotto V, Ferraris A, Bresciani S (2016) Internet of things: applications and challenges in smart cities: a case study of IBM smart city projects. Bus Process Manag J 22(2):357–367
27. Scuotto V, Santoro G, Papa A, Carayannis EG (2017) Triggering open service innovation through social media networks. Mercati & Competitività
28. Bresciani S, Ferraris A, Del Giudice M (2017) The management of organizational ambidexterity through alliances in a new context of analysis: internet of things (IoT) smart city projects. Technol Forecast Soc Chang (forthcoming)
29. Vrontis D, Thrassou A, Santoro G, Papa A (2017) Ambidexterity, external knowledge and performance in knowledge-intensive firms. J Technol Transf 42(2):374–388
30. Chan M, Estève D, Fourniols J-Y, Escriba C, Campo E (2012) Smart wearable systems: current status and future challenges. Artif Intell Med 56(3):137–156
31. Boeyen S, Santesson S, Polk T, Housley R, Farrell S, Cooper D (2008) Internet X.509 public key infrastructure certificate and certificate revocation list (CRL) profile (RFC 5280). https://doi.org/10.17487/rfc5280
32. Krawczyk H, Bellare M, Canetti R (1997) RFC 2104: HMAC: keyed-hashing for message authentication
33. Ullah S, Mohaisen M, Alnuem MA (2013) A review of IEEE 802.15.6 MAC, PHY, and security specifications. Int J Distrib Sens Netw 9(4):950704. https://doi.org/10.1155/2013/950704
34. He D, Zeadally S, Kumar N, Lee JH (2017) Anonymous authentication for wireless body area networks with provable security. IEEE Syst J 11(4):2590–2600
35. Shen J, Chang S, Shen J, Liu Q, Sun X (2018) A lightweight multi-layer authentication protocol for wireless body area networks. Future Generat Comput Syst 78:956–963

Chapter 23

Survey of Object Detection Algorithms and Techniques

Kamya Desai, Siddhanth Parikh, Kundan Patel, Pramod Bide and Sunil Ghane

1 Introduction

Humans detect objects in their surroundings on a daily basis, enabling them to perceive their environment well. In a bid to make machines autonomous and able to navigate the human world, it is imperative that a machine perceive the environment in a manner similar to humans. Object detection allows a machine to analyze the environment and detect the objects around it, helping it recognize its surroundings and perform a multitude of tasks. Object detection has found application in almost every field, such as surveillance, vehicle navigation, autonomous robot navigation, and face/people detection. Basically, to locate objects in an image, object detection techniques draw bounding boxes covering the objects. Algorithms such as R-CNN, Fast R-CNN, and YOLO have been devised to identify such occurrences quickly. The algorithms addressed and surveyed in this paper include Fast R-CNN, Faster R-CNN, Mask R-CNN, and YOLO.

2 Literature Survey

R-CNN, devised by Ross Girshick et al., works on the principle of extracting 2000 regions, named region proposals, from an image using selective search.

K. Desai (B) · S. Parikh (B) · K. Patel · P. Bide · S. Ghane Computer Engineering Department, Sardar Patel Institute of Technology, Mumbai 400055, India e-mail: [email protected] S. Parikh e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2020 V. K. Gunjan et al. (eds.), Cybernetics, Cognition and Machine Learning Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-15-1632-0_23

247

248

K. Desai et al.

regions for more accuracy. Therefore, now work is reduced to only 2000 precise regions generated with the assistance of a selective search algorithm as given below. Selective Search: 1. The first sub-segmentation should help in creation of a number of candidate regions. 2. Related smaller regions should be recursively combined into larger ones using greedy algorithm. 3. The desired candidate region proposals are created using the generated regions above. Issues of R-CNN are that classification of 2000 region proposals for each image takes up valuable training time of the network, approximately 47 s per test image. Moreover, learning is hindered to an extent as selective search is not a dynamic algorithm. To solve the issues with R-CNN and basically fasten the R-CNN algorithm, Ross Girshick et al. came up with Fast R-CNN. In this algorithm, a convolutional feature map is created by feeding input image to the convolutional neural network instead of region proposals being fed. One can say it is similar to the R-CNN in some ways. The convolutional feature map thus created is studied to extract region of proposals from it. Once identified, they are bent into squares with the use of Rol pooling layer and reshaped into fixed sizes. This is done to enable it to be fed into a fully connected layer. Softmax layer from the Rol feature vector is used to predict the class of the proposed region along with offset values of the bounding box. Fast R-CNN turns out to be faster than traditional R-CNN because it does not require 2000 region proposals to be fed to the CNN every time. On the contrary, the feature map is generated from the working of convolution operation once per image only [1–3]. The time-consuming and slow process of selective search is used to extract region proposals in both of the above algorithms(R-CNN and Fast R-CNN) which degrades the network performance. Shaoqing Ren et al. devised an object detection algorithm that can find out region proposals without using the selective search algorithm. This algorithm is quite similar to Fast R-CNN up to the point of providing image as input to a convolutional network and generation of feature maps. The change comes in the next step where selective search was used on the feature map initially to extract region proposals. In this algorithm, region proposals are predicted with the assistance of a separate network. The resulting predicted region proposals are subject to refiguring by a Rol pooling layer which then classifies the image and offset values for bounding boxes are calculated. The above process yielded results faster than the previous algorithms and hence this algorithm was named Faster R-CNN [4–6]. The techniques discussed above used the bounding boxes approach. This can even be extended to locating pixels inside an image. Kaiming He along with a team of researchers devised Mask R-CNN which allowed pixel-level segmentation along with object detection. An extra branch is added to the Faster R-CNN which outputs a binary mask. This mask is used to detect if a given pixel is a part of the object. The procedure followed by the researchers initially led to some inaccuracies. But this problem was eliminated by adopting a method known as RoI Align which adjusts the RoIPool in


All the algorithms discussed so far are region-based, as they localize objects in images using regions. YOLO, which stands for You Only Look Once, is different from such algorithms in that it uses a single convolutional network to predict the bounding boxes and their respective class probabilities. Briefly, YOLO works by taking an image and splitting it into an S × S grid. Within each grid cell, M bounding boxes are considered, and for each box the network outputs a class probability and offset values. A bounding box whose class probability is above a certain threshold is used to locate an object within the image. The previous algorithms are much slower than YOLO, which runs at a resounding 45 frames per second. However, YOLO struggles with the detection of small objects within an image due to the spatial constraints of the method [10-16].

The rest of this paper is organized as follows: Sect. 3 sets the objectives of this survey, Sect. 4 outlines the methodology used to carry out the survey, Sect. 5 discusses the surveyed papers in a tabulated format, Sect. 6 analyzes the performance of the algorithms, and the last section presents the conclusion and future work.
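To make the grid-based prediction described above concrete, here is a minimal, schematic decoding of a YOLO-style output tensor in NumPy. The tensor layout (S × S cells, M boxes of 5 values each) follows the description above; the shapes, random tensor and threshold are illustrative assumptions, not the exact YOLO implementation:

```python
# Schematic decoding of a YOLO-style S x S grid prediction (illustrative only).
import numpy as np

S, M = 7, 2                         # grid size and boxes per cell (assumed)
# Each box carries (x, y, w, h, confidence); pred has shape (S, S, M, 5)
pred = np.random.rand(S, S, M, 5)   # stand-in for a network's output
THRESHOLD = 0.6

detections = []
for row in range(S):
    for col in range(S):
        for b in range(M):
            x, y, w, h, conf = pred[row, col, b]
            if conf > THRESHOLD:    # keep boxes above the class threshold
                # offsets are relative to the cell; convert to image fractions
                cx = (col + x) / S
                cy = (row + y) / S
                detections.append((cx, cy, w, h, conf))

print(f"{len(detections)} boxes kept above confidence {THRESHOLD}")
```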

3 Objectives

In this study, we review object detection techniques applied to various applications over the past 3 years. Papers published from 2016 onward are reviewed, with the motivation of finding out how widely such techniques are used, the accuracy with which they perform, and their limitations, if any.

4 Methodologies

In this study, a number of electronic databases were searched on the topic. Only the most recent papers and articles (from the past 2-3 years) were considered relevant for the study. The searches covered object detection algorithms, with a major focus on the evolution of R-CNN techniques in the field of object detection. Popular electronic databases used as sources included IEEE Xplore, Google, and Science Direct, and different keywords were used to perform an extensive search on the topic. Table 1 lists the databases used for the literature search along with their websites.


Table 1 Electronic databases

Name of database | Access method | Website name
IEEE Xplore      | Online search | https://ieeexplore.ieee.org
Science Direct   | Online search | https://www.sciencedirect.com
Google           | Online search | https://www.google.com

Fig. 1 Object detection techniques

5 Discussion

The selected studies and papers are summarized below in tabular format under the headings methodology, application, efficiency, accuracy, research results, and limitations and gaps. The methodology column describes the method/algorithm used in the paper, while efficiency denotes how the proposed algorithm affects performance. Accuracy helps in determining the false positives/negatives, and limitations and gaps are the problems identified with the method. Figure 1 shows the various object detection techniques covered in this survey paper (Table 2).

Table 2 Comparative study

[1] Methodology: A two-step Fast R-CNN consisting of a convolutional network that extracts features and an RoI network.
Application and dataset: An object detector robust to conditions such as occlusions, deformations and illuminations.
Efficiency: Partially addressed, in comparison with OHEM models.
Accuracy: Addressed, through tables.
Research results: Learns invariances in object detectors, such as occlusions and deformations; also boosts detection performance on VOC and COCO significantly.
Limitations and gaps: Not addressed.

[2] Methodology: A scale-aware Fast R-CNN that uses two subnetworks to detect small and large pedestrians, fused through a gate function.
Application and dataset: Pedestrian detection. Datasets: Caltech, INRIA, ETH and KITTI.
Efficiency: Addressed by means of comparison graphs.
Accuracy: Addressed.
Research results: Shared convolutional layers were used, and the two subnetworks were unified into a single architecture to detect different instances of pedestrians.
Limitations and gaps: Detects only pedestrians against the available background.

[3] Methodology: A Fast R-CNN method where features are pooled from the last convolutional layer for each bounding-box proposal.
Application and dataset: Scale-dependent pooling with layerwise cascaded rejection classifiers. Datasets: PASCAL, KITTI.
Efficiency: Addressed.
Accuracy: Addressed; elaborately discussed with a detailed representation of the improvement.
Research results: Scale-dependent pooling increases the accuracy of identifying small objects, and the cascaded classifiers reject negative object proposals.
Limitations and gaps: Not addressed.

[5] Methodology: Faster R-CNN modified specifically for vehicle detection on the KITTI dataset.
Application and dataset: Vehicle detection. Dataset: KITTI.
Efficiency: Not addressed.
Accuracy: Extensively addressed, graphically represented.
Research results: Better performance on easy examples of the KITTI dataset; worse performance on moderate and hard examples.
Limitations and gaps: Accuracy on moderate and hard examples is worse.

[6] Methodology: Faster R-CNN with two modules, the first being a Region Proposal Network and the second a Fast R-CNN that refines proposals.
Application and dataset: Face detection. Datasets: Face Detection Dataset and Benchmark (FDDB), WIDER face dataset.
Efficiency: Partially addressed; sharing of convolutional layers.
Accuracy: Addressed, graphically represented for different datasets.
Research results: Effective face detection performed.
Limitations and gaps: Special patterns of human faces not considered.

[4] Methodology: Two-stage approach: pretrain a designed CNN, then fine-tune the Faster R-CNN using real and simulated GPR data.
Application and dataset: Buried object detection. Dataset: 100 real B-scan records collected from France.
Efficiency: Not addressed.
Accuracy: More simulated data can provide detections with more accuracy.
Research results: Better performance than the COD method.
Limitations and gaps: Quantitative evaluation is yet to be performed.

[7] Methodology: Two-stage Mask R-CNN using a Region Proposal Network and a Fast R-CNN classifier plus a binary mask prediction branch.
Application and dataset: Inshore ship detection. Dataset: optical remote sensing dataset collected from Google Earth.
Efficiency: Not addressed.
Accuracy: Partially addressed; soft non-maximum suppression is used to increase average precision.
Research results: The proposed framework is more robust and performs well in segmentation of inshore ships.
Limitations and gaps: Difficulty in segmenting merchant ships due to varying shapes and sizes.

[8] Methodology: Faster R-CNN extended to Mask R-CNN by adding a branch that predicts a segment for each Region of Interest generated by Faster R-CNN.
Application and dataset: Autonomous detection of disruptions in the ICU. Dataset: COCO.
Efficiency: Partially addressed.
Accuracy: Addressed separately for daytime and nighttime.
Research results: During daytime, this approach detects with high precision.
Limitations and gaps: During nighttime, precision falls.

[9] Methodology: Scene-Mask R-CNN is used and the network structure is discussed.
Application and dataset: Nearshore vessel detection. Dataset: 10,000 remote sensing images.
Efficiency: Partially addressed.
Accuracy: Addressed; compared with Faster R-CNN and concluded to be better.
Research results: Successfully suppresses false alarms caused by terrestrial target interference.
Limitations and gaps: Computational efficiency and costs can be further optimized.

[13] Methodology: YOLO-based people counting using a deep learning approach.
Application and dataset: People counting. Dataset: the CNN is retrained, so no specific dataset is mentioned.
Efficiency: Partially addressed; high due to low computation overhead.
Accuracy: Addressed; high because interference can be ignored owing to boundary selection.
Research results: YOLO-PC has higher average confidence values than traditional YOLO.
Limitations and gaps: Does not take abnormal behavior and children into consideration.

[14] Methodology: Optimized YOLO (OYOLO) involving a single convolutional network for localization and classification.
Application and dataset: Object detection in traffic scenes. Datasets: public datasets.
Efficiency: Partially addressed.
Accuracy: Addressed; it is faster than YOLO by 18%.
Research results: OYOLO takes 45 ms to process one image and is consequently quicker than Fast and Faster R-CNN, R-FCN, YOLO and SSD.
Limitations and gaps: Night images require preprocessing.

[15] Methodology: Traditional YOLO.
Application and dataset: Real-time face detection. Dataset: WIDER FACE.
Efficiency: Not addressed.
Accuracy: Addressed; the average detection time increases in proportion to image size.
Research results: Average detection time is reduced.
Limitations and gaps: Shorter detection time on YOLO v3 with reduced miss rate and error rate compared to the discussed traditional method.

[16] Methodology: Traditional YOLO with 3-class object detection, achieved by revamping the last fully connected layer.
Application and dataset: Human-Robot Interaction (HRI). Dataset: 5403 total images obtained from the internet and a local laboratory environment.
Efficiency: Not addressed.
Accuracy: Addressed; different accuracies for different objects.
Research results: Big objects can be visually detected better.
Limitations and gaps: The algorithm misses small and adjacent objects in an image.

[12] Methodology: Improved YOLO network structure (YOLO-R) obtained by adding three passthrough layers.
Application and dataset: Pedestrian detection. Dataset: INRIA.
Efficiency: Partially addressed; comparison of YOLO v2 and YOLO-R.
Accuracy: Addressed; numerical values show better accuracy than YOLO v2.
Research results: Good results in pedestrian detection were obtained.
Limitations and gaps: Combining shallow-layer and deep-layer features could yield better results.

[11] Methodology: A single-stage approach using a 53-layer feature-extracting deep neural network that evaluates the entire image at once.
Application and dataset: Detection of airplanes on the ground.
Efficiency: Not addressed.
Accuracy: Addressed; high precision achieved even for 25 fps video.
Research results: Airplanes were successfully detected even when their contours were obscured by another object.
Limitations and gaps: When the image is extremely small, the probability of detection drops.

[10] Methodology: A mixed detection system consisting of YOLO pedestrian detection and Gaussian mixture model (GMM) foreground detection.
Application and dataset: Pedestrian detection for a transformer substation.
Efficiency: Not addressed.
Accuracy: Drastically increased by using the mixed approach instead of YOLO or GMM alone.
Research results: The mixed detection approach is effective and more robust.
Limitations and gaps: Not addressed.

6 Performance Analysis and Comparison

R-CNN belongs to the category of state-of-the-art CNN-based deep learning object detection techniques; Fast, Faster and Mask R-CNN are modifications of this approach for different applications and requirements. Fast R-CNN is a modification of R-CNN: in R-CNN the region proposals are given as input to the CNN, whereas in Fast R-CNN the input image is fed to the CNN to create a convolutional feature map. This feature map is used for the identification of region proposals, which are then warped into squares by an RoI pooling layer and reshaped into a fixed size to be fed to the fully connected layer.

From the RoI feature vector, a softmax layer is used to predict the class of the proposed region along with the offset values for the bounding boxes. Fast R-CNN is superior to R-CNN on the grounds that the numerous region proposals do not need to be fed to the CNN each time; instead, a feature map is generated from the convolution operation performed only once per image. R-CNN and SPPNet first train the CNN with a softmax classifier and then use the feature vectors to train the bounding-box regressor; thus, R-CNN and SPPNet are not trained end-to-end. Fast R-CNN, on the other hand, improves training and testing speed as well as detection accuracy. The main advantages include: Fast R-CNN trains the extremely deep VGG-16 network 9 times faster than R-CNN and is 213 times faster at test time. Compared to SPPNet, it trains VGG-16 3 times faster, tests 10 times faster, and is more accurate. Faster R-CNN is an extension of Fast R-CNN. In both earlier techniques (R-CNN and Fast R-CNN), region proposals are found using selective search, a time-consuming process that affects network performance. Keeping the rest of the algorithm the same, Faster R-CNN eliminates selective search and uses a separate network to identify region proposals. As a result, the testing time of Faster R-CNN is much lower than that of its predecessors, and it can be used for real-time object detection. The learning efficiency on the training dataset makes Faster R-CNN models more suitable for identifying classified moving objects, and it can be deemed superior to ordinary CNN algorithms due to its high accuracy in labeling correct classes during validation and testing. Figure 2 shows the testing speed of different object detection techniques; the X-axis refers to the speed and the Y-axis to the various techniques under survey.


Fig. 2 R-CNN test-time speed
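The softmax classification head and bounding-box offset head described above can be sketched in a few lines of PyTorch. This is a simplified illustration of the two parallel heads operating on an RoI feature vector, with assumed dimensions, not the full Fast R-CNN implementation:

```python
# Sketch of Fast R-CNN's two output heads on an RoI feature vector (PyTorch).
import torch
import torch.nn as nn

FEAT_DIM, NUM_CLASSES = 4096, 20     # assumed sizes (VGG-16 fc7, PASCAL VOC)

cls_head = nn.Linear(FEAT_DIM, NUM_CLASSES + 1)   # classes + background
bbox_head = nn.Linear(FEAT_DIM, NUM_CLASSES * 4)  # per-class box offsets

roi_features = torch.randn(128, FEAT_DIM)         # 128 pooled RoI vectors
class_probs = torch.softmax(cls_head(roi_features), dim=1)
bbox_offsets = bbox_head(roi_features)            # (dx, dy, dw, dh) per class

print(class_probs.shape, bbox_offsets.shape)      # (128, 21), (128, 80)
```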

Mask R-CNN is based entirely on the architecture of Faster R-CNN, with two major additions. First, a more accurate RoI Align module replaces the RoI Pooling module. Second, an additional branch is inserted after the RoI Align module: it accepts the RoI Align output and feeds it to two convolution layers, whose final output is the mask itself. Mask R-CNN takes the CNN feature map as input and outputs a matrix with 1s for pixels that belong to the object and 0s elsewhere.

The YOLO architecture is more like an FCNN (fully convolutional neural network). Region-proposal classification networks (such as Fast R-CNN) need to run the prediction algorithm multiple times for different regions of an image, which YOLO avoids: YOLO passes the (n × n) image once through the FCNN and outputs an (m × m) grid of predictions. YOLO achieves a mean average precision (mAP) of 63.4 at 45 frames per second. In comparison, the state-of-the-art model Faster R-CNN (VGG-16) achieves a mAP of 73.2 but runs at a maximum of only 7 frames per second, roughly a six-fold reduction in speed. Table 3 shows the comparative performance analysis of techniques widely used in real-time object detection.

Table 3 Performance analysis

Detection framework            | mAP (average precision) | Frames per second
Faster R-CNN (VGG-16)          | 73.2                    | 7
Faster R-CNN (ResNet)          | 76.4                    | 5
YOLO                           | 63.4                    | 45
YOLO v2 (416 × 416 image size) | 76.8                    | 67
YOLO v2 (480 × 480 image size) | 77.8                    | 59
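For readers who want to reproduce such comparisons, pretrained region-based detectors are available in torchvision. The sketch below runs a pretrained Faster R-CNN on one image; note that the pretrained-weights argument varies across torchvision versions, and the image path is a placeholder:

```python
# Sketch: inference with a pretrained Faster R-CNN from torchvision.
# The exact pretrained-weights argument varies by torchvision version.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

img = to_tensor(Image.open("street.jpg").convert("RGB"))  # placeholder path
with torch.no_grad():
    out = model([img])[0]            # dict with 'boxes', 'labels', 'scores'

keep = out["scores"] > 0.5           # simple confidence threshold
print(out["boxes"][keep], out["labels"][keep])
```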


7 Conclusion and Future Scope

In this paper, we have reviewed the techniques used for object detection, particularly the different versions of R-CNN. In a short span of three years, the field of object detection has seen numerous algorithms, each overcoming the limitations of the previous one. We observed that the basic R-CNN algorithm was too slow, as the proposed regions in each image overlapped and the CNN computation had to be run again and again. This was overcome by a new version, Fast R-CNN. The technique was then sped up by Faster R-CNN, which replaced the selective search algorithm with a separate network for generating region proposals. Pixel-level segmentation was then added by Mask R-CNN. Finally, YOLO, the fastest algorithm for object detection, was proposed, significantly reducing the time taken for detection. These algorithms have been successfully used in a number of applications, such as face detection, inshore ship detection and airplane detection on the ground, as listed in the table. Each of the discussed algorithms has some advantages as well as some limitations depending on its application. Future work consists of overcoming the gaps currently observed in the techniques, including the lower precision of Mask R-CNN at night, the detection of small objects by YOLO, and the difficulty of segmentation when object sizes vary widely, as in the case of inshore ship detection.

References

1. Wang X, Shrivastava A, Gupta A (2017) A-Fast-RCNN: hard positive generation via adversary for object detection. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), pp 3039-3048
2. Li J, Liang X, Shen S, Xu T, Feng J, Yan S (2018) Scale-aware Fast R-CNN for pedestrian detection. IEEE Transactions on Multimedia, pp 985-996
3. Yang F, Choi W, Lin Y (2016) Exploit all the layers: fast and accurate CNN object detector with scale dependent pooling and cascaded rejection classifiers. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 2129-2137
4. Pham M-T, Lefèvre S (2018) Buried object detection from B-scan ground penetrating radar data using Faster-RCNN. In: IGARSS 2018, IEEE international geoscience and remote sensing symposium, pp 6804-6807
5. Fan Q, Brown L, Smith J (2016) A closer look at Faster R-CNN for vehicle detection. In: 2016 IEEE intelligent vehicles symposium (IV), pp 124-129
6. Jiang H, Learned-Miller E (2017) Face detection with the Faster R-CNN. In: 2017 12th IEEE international conference on automatic face and gesture recognition (FG 2017), pp 650-657
7. Nie S, Jiang Z, Zhang H, Cai B, Yao Y (2018) Inshore ship detection based on Mask R-CNN. In: IGARSS 2018, IEEE international geoscience and remote sensing symposium, pp 693-696
8. Malhotra K, Davoudi A, Siegel S, Bihorac A, Rashidi P (2018) Autonomous detection of disruptions in the intensive care unit using deep Mask R-CNN. In: 2018 IEEE/CVF conference on computer vision and pattern recognition workshops (CVPRW), pp 1944-19442
9. Zhang Y, You Y, Wang R, Liu F, Liu J (2018) Nearshore vessel detection based on Scene-mask R-CNN in remote sensing image. In: 2018 international conference on network infrastructure and digital content (IC-NIDC), pp 76-80
10. Peng Q, Luo W, Hong G, Feng M, Xia Y, Yu L, Hao X, Wang X, Li M (2016) Pedestrian detection for transformer substation based on Gaussian mixture model and YOLO. In: 2016 8th international conference on intelligent human-machine systems and cybernetics (IHMSC), vol 02, pp 562-565
11. Kharchenko V, Chyrka I (2018) Detection of airplanes on the ground using YOLO neural network. In: 2018 IEEE 17th international conference on mathematical methods in electromagnetic theory (MMET), pp 294-297
12. Lan W, Dang J, Wang Y, Wang S (2018) Pedestrian detection based on YOLO network model. In: 2018 IEEE international conference on mechatronics and automation (ICMA), pp 1547-1551
13. Ren P, Fang W, Djahel S (2017) A novel YOLO-based real-time people counting approach. In: 2017 international smart cities conference (ISC2), pp 1-2
14. Tao J, Wang H, Zhang X, Li X, Yang H (2017) An object detection system based on YOLO in traffic scene. In: 6th international conference on computer science and network technology (ICCSNT), pp 315-319
15. Yang W, Jiachun Z (2018) Real-time face detection based on YOLO. In: 2018 1st IEEE international conference on knowledge innovation and invention (ICKII), pp 221-224
16. Zhou J, Feng L, Chellali R, Zhu H (2018) Detecting and tracking objects in HRI: YOLO networks for the NAO "I see you" function. In: 2018 27th IEEE international symposium on robot and human interactive communication (RO-MAN), pp 479-482

Chapter 24

Study on Recent Methods of Secure Provable Multi-block Level Data Possession in Distributed Cloud Servers Using Cloud-MapReduce Method B. Rajani, V. Purna Chandra Rao and E. V. N. Jyothi

1 Introduction

Cloud computing has been developing quickly and is the cutting edge of information technology (IT). Cloud service providers offer on-demand self-service, ubiquitous network access, location-independent resource pooling, rapid resource elasticity, usage-based pricing and transference of risk. Cloud computing is changing the way organizations use information technology. Because of the advantages the cloud offers, among them storage as a service, many clients are motivated to outsource their sensitive data to the cloud. One major aspect of moving to cloud storage is that data are centralized or redistributed to the cloud. From the clients' perspective, including both individuals and IT enterprises, storing data remotely in the cloud in a flexible, on-demand manner brings appealing benefits: relief from the burden of storage management; data access independent of location; and avoidance of capital expenditure on hardware, software and personnel for maintenance. While cloud computing makes these benefits more attractive than ever, it also brings new and challenging security threats to clients' outsourced data. Since cloud service providers (CSPs) are separate administrative entities, outsourcing data effectively surrenders the customer's ultimate control over the fate of their data. For these reasons, the correctness of data in the cloud is at risk. First of all, although the infrastructures under the cloud are much more powerful and reliable than personal computing devices, they still face a wide range of both internal and external threats to data integrity.

B. Rajani · V. Purna Chandra Rao · E. V. N. Jyothi (B) Ph.D Scholar, Shri Jagdish Prasad Jhabarmal Tibrewala (JJT), Jhunjhunu, India © Springer Nature Singapore Pte Ltd. 2020 V. K. Gunjan et al. (eds.), Cybernetics, Cognition and Machine Learning Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-15-1632-0_24


For availability and scalability, service providers offer a multi-cloud storage setting to the client. In a multi-cloud setting, the client replicates files into numerous clouds. For multi-cloud storage, DepSky in particular demonstrates that such a model can be implemented to achieve read, write and read/write operations across replicas.
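A DepSky-style multi-cloud setting can be illustrated with a small sketch in which each "cloud" is simulated by a local directory and a read succeeds once a majority of replicas agree. This is a toy model of replication and quorum reads under assumptions of our own, not DepSky's actual protocol:

```python
# Toy sketch of multi-cloud replication with a majority-quorum read.
# Each "cloud" is simulated as a local directory; not DepSky's real protocol.
import os
from collections import Counter

CLOUDS = ["cloud_a", "cloud_b", "cloud_c"]   # hypothetical storage providers

def write_replicas(name: str, data: bytes) -> None:
    """Replicate the file into every cloud."""
    for c in CLOUDS:
        os.makedirs(c, exist_ok=True)
        with open(os.path.join(c, name), "wb") as f:
            f.write(data)

def quorum_read(name: str) -> bytes:
    """Return the value held by a majority of replicas."""
    values = []
    for c in CLOUDS:
        try:
            with open(os.path.join(c, name), "rb") as f:
                values.append(f.read())
        except OSError:
            continue                        # tolerate an unavailable cloud
    value, count = Counter(values).most_common(1)[0]
    if count <= len(CLOUDS) // 2:
        raise RuntimeError("no majority quorum")
    return value

write_replicas("record.txt", b"sensitive payload")
print(quorum_read("record.txt"))
```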

2 Related Works

In [1], Christof Weinhardt et al. provide detailed information about the various characteristics and properties of grid computing and cloud computing, enabling a clear understanding of the business opportunities of the cloud computing paradigm through clear classification and concise discussion. The article provides a clear framework for the business model of clouds and, in view of this framework, classifies and reports current cloud offerings.

In [2], an article by Vincenzo D. Cunsolo et al., a clear presentation of the Cloud@Home paradigm is provided, describing its contribution to state-of-the-art cloud computing. A detailed functional architecture and the core structure of Cloud@Home, together with a demonstration of its feasibility, are presented in the article.

Bousdekis et al. [3] describe how cloud computing is advancing very quickly, with its data centers growing at an unexpectedly fast rate, although there are certain privacy issues with the concept. The technology is dependent on cloud vendors like Microsoft, Google, Amazon and so on. Another efficient model replacing the conventional cloud conceptualization is C3: community cloud computing provides an efficient and replaceable architecture that unites the cloud with paradigms from grid computing, the principles of digital ecosystems, and the self-sustainability of green computing, while retaining the originality of its internet versions.

In [4], Qi Zhang et al. state that cloud computing technology is emerging as a paradigm for undertaking and providing services through the internet. It is a fascinating technology for entrepreneurs due to its time- and labor-efficient protocols; it enables owners to invest minimal capital and to manage investment and resources based on client needs, and it has proved efficient in providing employment opportunities for the IT industry. The technology is still in its early stages of development, and many problems are yet to be tackled and solved. The work includes a survey of cloud computing technology focusing on its principal concepts, architectural principles and implementation procedures. The article aims to give a clear explanation of the challenges and advancements of cloud computing technology and provides a platform for new approaches to implementing it.


In [5], the authors C. N. Höfer and G. Karagiannis describe the several services provided by cloud computing technology and give an insight into their working principles. The work also proposes a tree-structured taxonomy for organizing the characteristics of the technology. This tree-based classification enables the user to clearly understand the various services provided and allows a proper comparative selection against other technologies. The taxonomy provides a universal terminology that allows simple understanding of the technology, making communication faster and easier. Lastly, the taxonomy is verified using existing cloud services as examples.

In [6], Qu et al. examined the effects of reduced interstate speed limits on traffic emissions in Houston, Texas. The authors used TRANSIMS to model the traffic, and the traffic behaviors were imported into three emissions models: CMEM (coupled with TRANSIMS), MOBILE5 and MOBILE6. Emissions of three important pollutants (volatile organic compounds (VOC), NOx and CO) were modeled, and all three emissions models were used to examine the effectiveness of roadway speed-limit reduction as an approach to managing and reducing emissions. The study also demonstrated TRANSIMS's inability to model speed-limit changes precisely, owing to its discrete approach to representing vehicular velocities.

In [7], the authors David W. Chadwick et al. describe 6 months of research conducted in a JISC/EPSRC-funded private cloud project. In the work, they developed a model of a cloud file-storage service in which the user logs in with credentials from a configured trusted identity provider. A user can access a group of accounts at once with the same credentials; once access is gained, data and files can be downloaded or uploaded. One of the special properties is that a user's files and data can be made accessible to selected individuals irrespective of whether the other person is registered with the cloud service. Standard identity-protocol management is applied by the system, along with attribute-based access controls and a delegation service.

In [8], the authors Sanjeev Kumar Pippal and Dharmender Singh Kushwaha state that data management and data sharing have become major challenges faced by IT professionals today. Further problems faced by cloud service professionals are data retrieval, the multi-tenancy of data and its efficient retrieval; availability of cloud services is far more difficult in a heterogeneous computing environment. A simple, accurate, highly efficient and user-friendly database architecture has been designed with an inherent cloud architecture, wherein various organizations can combine to create a cloud without a negative impact on their profits and presence. An ad hoc cloud is the best option for reaching remote areas and providing services such as education.


In [9], Linquan Zhang et al. describe the importance of cloud computing, a technology emerging at great speed that provides easy access to remote and usually inaccessible data. The major hurdle lies in moving data from different geographical locations over time into the cloud for efficient processing; the hard-drive shipping approach is neither flexible nor secure. The work focuses on moving dynamically produced bulk data into the cloud in a timely and cost-effective fashion, for further processing with a MapReduce-like framework. Focusing specifically on a cloud encompassing dispersed data centers, the authors design a low-cost data-migration technique based on two online algorithms, an OLM and an RFHC, which stand for online lazy migration and randomized fixed horizon control, respectively. These algorithms optimize, over time, the choice of data center for data aggregation and processing, along with the data transfer paths.

In [10], Muhammad Baqer Mollah et al. discuss the ease of data access and the communication efficiency of smart devices with one another, both over the internet and through the cloud, across long and short ranges; based on this, the concept of the internet of things (IoT) was developed. Offloading data storage and processing burden to the cloud are some of the benefits obtained by cloud-based IoT smart devices, while working at the network edge provides further advantages over the cloud, such as support for real-time data processing. The article suggests an efficient data-sharing strategy that enables smart devices to share their data securely at the edge of cloud-assisted IoT. The research also proposes a better data-searching protocol that is secure within one's own and shared data in storage.

3 Research Gaps

See Table 1.

4 Conclusion and Future Work

In this paper, we presented a study on secure provable multi-block-level data possession in distributed cloud servers using the Cloud-MapReduce method. We also discussed existing data-possession schemes in cloud computing systems and how stored data can be handled dynamically on the server. During the literature survey, a number of other factors related to cloud computing, such as MapReduce and encryption, were studied. For future work, we suggest conducting a systematic study on securely uploading data to the cloud.

Table 1 Comparative study of some methods of cloud computing with their objectives and algorithms used

Title: A simple, adaptable and efficient heterogeneous multi-tenant database architecture for ad hoc cloud
Authors: Sanjeev Kumar Pippal, Dharmender Singh Kushwaha
Journal: Journal of Cloud Computing: Advances, Systems and Applications 2013, 2:5. https://doi.org/10.1186/2192-113X-2-5
Objectives: Data management and sharing is the challenge faced by most IT professionals today, along with the challenges cloud service providers face in terms of multi-tenancy of data and its efficient retrieval, which becomes increasingly complex in a heterogeneous computing environment. A simple, robust, query-efficient, scalable and space-saving multi-tenant database architecture is suggested, together with an ad hoc cloud architecture in which organizations can collaborate to create a cloud that does not adversely affect their existence or profits.

Title: Moving big data to the cloud: an online cost-minimizing approach
Authors: Linquan Zhang, Chuan Wu, Zongpeng Li, Chuanxiong Guo, Minghua Chen, Francis C. M. Lau
Journal: IEEE Journal on Selected Areas in Communications, Vol. 31, No. 12, December 2013
Objectives: Cloud computing, rapidly emerging as a new computation paradigm, provides agile and scalable resources in a utility-like style, especially for the processing of big data. An essential open issue is how to efficiently move the data, from different geographic locations over time, into a cloud for effective processing. The accepted approach of hard-drive shipping is neither flexible nor secure. This work studies the timely, cost-minimizing upload of massive, dynamically generated, geo-dispersed data into the cloud, for processing with a MapReduce-like framework. Targeting a cloud comprising distinct data centers, the authors present a cost-minimizing data-migration problem and propose two online algorithms, an online lazy migration (OLM) algorithm and a randomized fixed horizon control (RFHC) algorithm, for optimizing at any given time the choice of the data center for data aggregation and processing, as well as the routes for transmitting data there.

Title: Dynamic request splitting for interactive cloud applications
Authors: Mohammad Hajjat, Shankaranarayanan P. N., David A. Maltz, Sanjay Rao
Journal: IEEE Journal on Selected Areas in Communications 31(12):2722-2737, December 2013. https://doi.org/10.1109/jsac.2013.131212
Objectives: Deploying interactive applications in the cloud is a challenge because of the high variability in the performance of cloud services. The paper presents Dealer, a system that helps geo-distributed, interactive and multi-tier applications meet their stringent requirements on response time despite such variability. The approach is motivated by the fact that, at any time, only a few application components of large multi-tier applications experience poor performance. Dealer continuously monitors the performance of individual components and the communication latencies between them to build a global view of the application; in serving any given request, it seeks to minimize user response times by picking the best combination of replicas (potentially located across different data centers).


References

1. Anandasivam A, Blau B, Borissov N, Mein T, Michalk W, Stößer J (2009) Cloud computing: a classification, business models, and research directions. Bus Inf Syst Eng 1(5):391-399
2. Cunsolo VD, Distefano S, Puliafito A, Scarpa M. Cloud@Home: bridging the gap between volunteer and cloud computing. University of Messina, Contrada di Dio, S. Agata, 98166 Messina, Italy
3. Bousdekis A, Magoutas B, Apostolou D, Mentzas G (2015) Review, analysis and synthesis of prognostic-based decision support methods for condition based maintenance. Springer Science+Business Media, New York
4. Zhang Q, Cheng L, Boutaba R (2010) Cloud computing: state-of-the-art and research challenges. The Brazilian Computer Society
5. Höfer CN, Karagiannis G (2011) Cloud computing services: taxonomy and comparison. Published with open access at Springerlink.com
6. Blair G, Kon F, Cirne W, Milojicic D, Ramakrishnan R, Reed D, Silva D (2011) Perspectives on cloud computing: interviews with five leading scientists from the cloud community. J Internet Serv Appl 2(1):3-9
7. Huth CL, Chadwick DW, Claycomb WR, You I (2013) Guest editorial: a brief overview of data leakage and insider threats. Inf Syst Front 15(1):1-4
8. Pippal SK, Kushwaha DS (2013) A simple, adaptable and efficient heterogeneous multi-tenant database architecture for ad hoc cloud. J Cloud Comput: Adv Syst Appl 2:5. https://doi.org/10.1186/2192-113X-2-5
9. Zhang L, Wu C, Li Z, Guo C, Chen M, Lau FCM (2013) Moving big data to the cloud: an online cost-minimizing approach. IEEE J Sel Areas Commun 31(12)
10. Mollah MB, Azad MAK, Vasilakos A (2017) Security and privacy challenges in mobile cloud computing: survey and way ahead. J Netw Comput Appl 84:34-54. https://doi.org/10.1016/j.jnca.2017.02.001

Chapter 25

Insolent Tube Trickle Revealing Classification Sara Begum

1 Introduction

The current plumbing work lacks gas leakage detection technology as well as efficient flow-rate sensors along the pipeline [2]. Sensors placed along the pipeline would allow detection of the gas flow rate and identification of leakage without interfering with the flow of the gas. In case of problems in flow or leakage, internally placed actuators such as solenoid valves would control the system and stop the flow of gas. The current proposal involves an advanced system with an internally placed microcontroller that constantly observes the flow-rate data of multiple sensors, thus keeping a firm grip on the gas flow rate at all times. After collecting the flow-rate data from the various controllers placed within the system, the average flow rate is calculated. Any variation in flow rate among the various sensors indicates a leakage at one point or another, so the problem can be rectified quickly. In case of a large difference between the flow rates, immediate action is required: the microcontroller signals the solenoid valves to close, thereby blocking the gas flow in that region. A final alarm about the leakage is sent to the user for further processing. All these properties help cut down wastage and needless expenditure on gas. On the other hand, if the difference in gas flow detected among the sensors is lower than the threshold, the solenoid-valve data is sent to the cloud, and the system was tested under various scenarios; the results of such testing are presented in this article.

Gas being a nonrenewable resource, proper management of available resources and the avoidance of wastage and unnecessary consumption are of utmost importance to the country. Application of advanced technologies and a sophisticated sensor detection system would enable proper usage of the limited resources and help minimize the gap between the supply and demand of gas [3].

S. Begum (B) Department of Electronics, University of Technology, Jaipur, Rajasthan, India e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2020 V. K. Gunjan et al. (eds.), Cybernetics, Cognition and Machine Learning Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-15-1632-0_25

2 Methodology

As stated earlier, conserving limited resources such as gas is one of the major concerns for any developing country. Management and proper usage of gas play an important role in society [1]. As gas is one of the essentials of life, both the quality of gas and the quantity used are of high importance. Coming to the statistics for a country like India, it has been recorded that the loss of gas due to leakage averages about 30-40% of regular gas usage, especially in the domestic sector. This is not only damage to the country's economy but also a major public-health concern. In 1993-1994, India had irrigation coverage of about 36%. In the current technology of gas flow through pipes, there are three different energy components, as shown in the following list and in the relation sketched after it:

1. Motion-based kinetic energy
2. Elevation-based gravitational potential energy
3. Energy due to pressure [4].
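The three components listed above are exactly the terms of Bernoulli's relation for steady, incompressible pipe flow. As a reference (a standard textbook result, not written out in the chapter), the energy balance per unit volume between two points along the pipe is:

```latex
% Bernoulli's relation between two points along a pipe (standard form)
p_1 + \frac{1}{2}\rho v_1^2 + \rho g h_1
  = p_2 + \frac{1}{2}\rho v_2^2 + \rho g h_2
```

where \(p\) is the pressure, \(\rho\) the fluid density, \(v\) the flow velocity, \(g\) the gravitational acceleration and \(h\) the elevation; a leak between two measurement points shows up as a mismatch in this balance.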

3 Hardware Components

As the title indicates, this section gives a clear explanation of the various parts and the working mechanism involved in the automated gas leakage detection system. The hardware components are dealt with in detail [5].

3.1 Flow Sensor

The flow sensor is the most important component in the automated gas sensing device. As the name suggests, it constantly measures the rate of gas flow in each chamber of the pipeline and provides the major data input to the microcontroller (Fig. 1). This data can be converted to calculate the rate of gas flow. Several kinds of flow sensors are available.

The solenoid valve (Fig. 2) is another important part of this advanced technology. It is an electromechanical device capable of flow-rate regulation and shutoff. It contains a ferromagnetic plunger that blocks the flow on demand; when the rod is energized and attracted, the passage is cleared and flow resumes. The valve used in this setup needs a 12 V DC power supply.


Fig. 1 Gas flow rate sensor

Fig. 2 Solenoid valve

When energized, the valve acts as a closed switch. The maximum flow rate that can be achieved through the valve is 3 L/min, at a pressure of about 3 psi. A relay circuit connects this valve to the output of the microcontroller, and a trigger pulse from the microcontroller activates the valve upon detection of a leakage [6].
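The relay-and-trigger-pulse arrangement can be sketched in software. The chapter's controller is an ATmega328, but for illustration the sketch below drives a relay pin from a Python-capable board such as a Raspberry Pi (which the chapter's references also use); the pin number and timings are assumptions:

```python
# Sketch: driving the solenoid-valve relay from a GPIO pin (Raspberry Pi).
# Pin number and timings are illustrative assumptions.
import time
import RPi.GPIO as GPIO

RELAY_PIN = 17                    # hypothetical BCM pin wired to the relay

GPIO.setmode(GPIO.BCM)
GPIO.setup(RELAY_PIN, GPIO.OUT)

def open_valve():
    """Trigger pulse: energize the relay so the valve passes gas."""
    GPIO.output(RELAY_PIN, GPIO.HIGH)

def close_valve():
    """Remove the base signal: relay de-energizes, normally-closed valve shuts."""
    GPIO.output(RELAY_PIN, GPIO.LOW)

try:
    open_valve()
    time.sleep(5)                 # normal flow for a while
    close_valve()                 # leak detected: shut off the gas
finally:
    GPIO.cleanup()
```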

3.2 GPRS Module

The major purpose of the GPRS module in the setup is to transmit information from the sensors or the microcontroller wirelessly to cloud-based servers for computation. In the current work, the GPRS module is used for data transfer in regions without internet access beyond a certain range. The GPRS module accesses the internet over mobile-network radio waves; it is made ready for communication by inserting a SIM card and connecting an external power supply. Several AT commands are employed to connect to the internet via GPRS with the help of the TCP/IP protocol. A baud rate of 9600 is used for communication between the microcontroller and the GPRS module. To monitor the activity between these two devices, the serial monitor of the Arduino IDE is turned on, which checks communication events such as connection status and data transfer [7].
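A minimal sketch of the AT-command handshake over the 9600-baud serial link might look like the following, using the pyserial package. The serial port, APN and command set (typical of SIMCom-style GPRS modems) are assumptions, not taken from the chapter:

```python
# Sketch: talking to a GPRS modem with AT commands over a 9600-baud link.
# Port name, APN and the SIMCom-style commands are illustrative assumptions.
import time
import serial  # pyserial

ser = serial.Serial("/dev/ttyUSB0", baudrate=9600, timeout=2)

def at(cmd: str) -> str:
    """Send one AT command and return the modem's raw reply."""
    ser.write((cmd + "\r\n").encode())
    time.sleep(0.5)
    return ser.read(256).decode(errors="replace")

print(at("AT"))                                  # basic liveness check
print(at("AT+CREG?"))                            # network registration status
print(at('AT+CSTT="internet"'))                  # set APN (placeholder name)
print(at('AT+CIPSTART="TCP","example.com",80'))  # open a TCP socket
ser.close()
```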


3.3 Microcontroller

Based on the algorithm provided by the user, the microcontroller takes decisions and sends signals to the actuator accordingly; the sensors provide the necessary input. The proposed system involves reading simple analog data and turning a relay on or off. As no complex calculations are involved, a simple 8-bit microcontroller such as the ATmega328 suffices. Furthermore, being a RISC-based controller, it requires little power to operate [8].

4 Construction and Working

This section covers the working mechanism and structural aspects of the gas leakage sensing device [9].

4.1 Placement of Flow Sensor

The first requirement for using this system is permission from the gas board authorities to use the gas supply for domestic and industrial purposes. The flow sensors are placed within the gas flow system at regular intervals, dividing the flow into compartments. The sensors are placed not only in the main pipeline but also in the sub-branches, at regular intervals. This placement allows precise detection of a leakage and its location based on the flow-rate variation among the sensors (Fig. 3). Such an arrangement ensures timely detection and localization of leakage.

Fig. 3 Gas flow sensor placement


Fig. 4 GSM/GPRS module setup

The gas enters through all the pipelines and reaches the desired location, flowing through the various flow-rate measurement sensors [10].

4.2 Flow Rate Monitoring System

The microcontroller controls all the flow-rate sensors at a particular location to make interconnection quick and to enhance the accuracy of leakage detection (Fig. 4). Several such microcontrollers are connected to the main network to control and monitor the gas supply across the whole region. Every sensor reports the total amount of gas passing through it to the microcontroller, which gathers the data from the flow-rate sensors and sends it to the cloud using a wireless GPRS [7] system. The flow-rate values are entered into a sensor cloud for later use if required; this method is termed data logging [11].

4.3 Leak Detection Algorithm

The valves are connected in series with a transistor that makes them work as a controlled element; the valves are of the normally closed type. Figure 4 depicts the circuit model. Under normal conditions, a base signal is applied to the transistor, which keeps the valves open; this allows gas to flow in the pipeline past the sensors and microcontrollers. Under conditions of gas leakage, the base signal is removed from the transistor, the solenoid valve closes, and the gas flow in that region is blocked. This mechanism halts the wastage of gas in the initial stages of a leakage. Figure 6 clearly depicts the condition of the mechanism during the initial stages of a leak and also the condition after the activation of the solenoid valve.


Fig. 5 Gas leakage

The leak is completely stopped, and the information is passed to the authorized team. While the system is in the on state, the microcontroller is fully engaged in flow-rate monitoring. The leak detection algorithm works as follows: a standard threshold value for the flow-rate difference is established in advance, and whenever the flow-rate difference between any two consecutive sensors exceeds this threshold, a leakage is identified by the microcontroller (see the sketch below). The leakage phenomenon is shown in Fig. 5. The flow-rate difference is logged to the cloud through GPRS, as shown in Fig. 4, and upon detection of a leakage, a notification or alert is sent to the concerned authorities [12].
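A minimal sketch of this threshold rule, under assumed readings and an assumed threshold value, is shown below; the real system would obtain the readings from the flow sensors and push alerts through the GPRS link:

```python
# Sketch: flagging a leak when consecutive flow-rate readings diverge.
# Sensor values and the threshold are illustrative assumptions.
LEAK_THRESHOLD = 0.2          # L/min difference considered a leak

def find_leaks(flow_rates):
    """Return indices of sensor pairs whose difference exceeds the threshold.

    flow_rates[i] is the reading of sensor i, ordered along the pipeline.
    """
    leaks = []
    for i in range(len(flow_rates) - 1):
        diff = flow_rates[i] - flow_rates[i + 1]
        if diff > LEAK_THRESHOLD:
            leaks.append(i)   # leak lies between sensor i and sensor i+1
    return leaks

readings = [3.00, 2.98, 2.55, 2.54]       # simulated readings (L/min)
for i in find_leaks(readings):
    print(f"Leak suspected between sensor {i} and sensor {i+1}: "
          f"close valve, log to cloud, alert authorities")
```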

4.4 Integrated Gas Cut-off System

Beyond detecting a leakage and raising an alert, the most important task is to stop the flow of gas at the moment of danger. This is facilitated by an automatic gas shut-off system based on the solenoid valves. Once the sensors detect a gas leakage, the information is sent to the cloud via the GPRS system, which informs the authorized individuals; simultaneously, the system automatically closes the solenoid valves and the gas flow is stopped [8] (Fig. 6) [13].

5 Results and Discussion

The current work develops an advanced technology for the detection and prevention of gas leakage in domestic and commercial supplies. The methodology uses two flow-rate sensors for detecting the gas flow rate and a solenoid valve (12 V) to shut off and open the gas flow. On testing this model, it proved to be successful. Initially, the valves are open, allowing gas to flow from the pipelines. During the flow, the flow rate is recorded by all the sensors constantly, and the gas leakage algorithm is then applied to evaluate the flow rate. In case of any variation in flow rate among the sensors, the information is logged to the cloud and the flow is immediately stopped by closing the valves.


Fig. 6 Gas cut-off system

The cloud is connected via the GPRS system, which sends the leakage information to the concerned authorities so that the problem in the pipeline can be fixed and normal gas flow resumed. Thus, both detecting a leakage and stopping the gas flow to prevent health hazards become easy tasks with this technology. The proposed model proved to be promising [14].

References

1. Kulshrestha M. Efficiency evaluation of urban gas supply utilities in India. Environmental Engineering Division, Department of Civil Engineering, National Institute of Technology, MANIT-Bhopal, India
2. Gas sector in India: overview and focus areas for the future, PanIIT Conclave 2010. https://www.kpmg.de/docs/Gas_sector_in_India.pdf
3. Guidelines for improving gas use efficiency in irrigation, domestic & industrial sectors, Central Gas Commission, Ministry of Gas Resources, Government of India, November 2014. http://wrmin.nic.in/writereaddata/Guidelines_for_improving_gas_use_efficiency.pdf
4. Openshaw A, Vu K. Irrigation leak detection: using flow rate sensors to detect breaks in an irrigation system. http://digitalcommons.calpoly.edu/cgi/viewcontent.cgi?article=1010&context=cpesp
5. Ariyatham T. Gas leak detection. California State University, Northridge. http://scholarworks.calstate.edu/bitstream/handle/10211.3/132789/Ariyatham-Teddy-thesis-2015.pdf?sequence=1
6. Sood R, Kaur M, Lenka H (2013) Design and development of automatic gas flow meter. Int J Comput Sci Eng Appl (IJCSEA) 3(3)
7. Deepika T, Sivasankari A (2015) Smart gas monitoring system using wireless sensor network at home/office. Int Res J Eng Technol 2(4):1305-1314
8. Suresh N, Balaji E, Anto KJ, Jenith J (2014) Raspberry Pi based liquid flow monitoring and control. Int J Res Eng Technol 03:122-125
9. Sadeghioon AM, Metje N, Chapman DN, Anthony CJ (2014) Smart pipes: smart wireless sensor networks for leak detection in gas pipelines. J Sens Actuator Netw 3:64-78

Chapter 26

Evolving Database for New Generation Big Data Applications K. Raja Shekar and B. Bhoomeshwar

1 Introduction

"Big data" is the buzzword of the day. It refers to collections of data sets so large and complex that it becomes hard to process, manage, and analyze the data in a just-in-time manner. The explosive growth in the amount of information created in the world continues to accelerate and amaze us; moreover, big data is ever more complicated to handle productively. An interesting Google Trends tidbit on big data is that India and South Korea are rated with the highest interest, with the USA a distant third. Accordingly, most big data vendors should now focus on India and South Korea [1]. In South Korea, the government will set up a new large server farm to help its industry catch up with the global technology giants. This will be the country's first such center that allows anybody to refine and analyze big data; big data has received an enormous boost in South Korea. Our project in South Korea is one of the alternative frameworks that enable computing, storing, and analyzing big data. To unlock stream processing as a key to big data's potential, we are concerned with the technical barriers and the ideas needed for success.

(1) Big data safety supervision platform

Figure 1 presents the complete structure of the big data safety supervision platform developed using Apache Foundation open-source technology. The central concept is the application of the Hadoop distributed file system (HDFS) as the storage framework for big data, which in turn relies on MapReduce distributed computing technology as the framework for big data processing.

K. Raja Shekar (B) Faculty of Computing, Bahir Dar Institute of Technology, Bahir Dar University, Bahir Dar, Ethiopia B. Bhoomeshwar Department of Computer Science, Faculty of Engineering & Technology, Mettu University, Mettu, Ethiopia © Springer Nature Singapore Pte Ltd. 2020 V. K. Gunjan et al. (eds.), Cybernetics, Cognition and Machine Learning Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-15-1632-0_26


Fig. 1 Big data storage and processing framework

Storage of PB- and ZB-scale data was finally achieved through distributed file-processing technology, and querying and analysis of such data through distributed computing technology. Several other modules are included in the framework: the big data access framework, business intelligence applications, the big data scheduling framework, the traditional data warehouse, the network layer, servers, backup and recovery, data management and so on. The same server, operating system or virtual machine is used for both the big data processing framework and storage, to minimize hardware cost and maximize scalability; this type of architecture can turn a simple PC into a terminal building block. The big data access framework is built upon the big data storage and processing frameworks, all connected by a network layer; it includes several sub-modules such as Sqoop, Pig and Hive. The conditions required for data analysis are attained by organizing the big data and scheduling it using the big data scheduling framework. Query, analysis, statistics and reports are carried out through the scheduling framework as a business intelligence application. Big data safety and protection are add-on benefits provided by the backup, management and recovery framework [2]. All the concepts and principles of big data technology are covered by this framework.

Big data processing is the process of integrating widely heterogeneous data sources with the assistance of suitable tools, storing the results according to standard conventions, and applying an appropriate data-analysis technique to the stored data.
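The map/shuffle/reduce pattern that the platform relies on can be conveyed with a small, self-contained sketch. Real deployments would run this as Hadoop jobs over HDFS blocks, whereas here the phases are simulated in memory over invented toy log records:

```python
# Toy in-memory simulation of the MapReduce pattern used by the platform.
# Input records are invented for illustration; Hadoop would run the same
# map/reduce functions distributed over HDFS blocks.
from collections import defaultdict

records = ["alert", "ok", "alert", "alert", "ok"]   # toy event log

def map_phase(record):
    yield (record, 1)                 # emit a (key, 1) pair per event

def reduce_phase(key, values):
    return key, sum(values)           # aggregate the counts per key

# Shuffle: group intermediate pairs by key, as the framework would.
groups = defaultdict(list)
for rec in records:
    for key, val in map_phase(rec):
        groups[key].append(val)

results = dict(reduce_phase(k, vs) for k, vs in groups.items())
print(results)                        # {'alert': 3, 'ok': 2}
```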


The main goal here is to extract valuable information; all the results can be exhibited to various end-users [3]. With respect to safety regulation, the basic process of big data safety supervision and the traditional method of data processing are nearly the same. Figure 2 shows the complete architecture of the big data platform that enables safety supervision. For the generation of specific regulatory applications, this platform must be combined with rich data sources [4], such as:

• Repositories with government statistics
• Historical weather information and forecasts
• Demographics
• Comments and pictures
• Information about traffic status in large metropolitan areas
• Product reviews and comments
• Videos posted on social network web sites
• Information collected from citizen-science platforms
• Data collected by sensors measuring different environmental conditions such as temperature, precipitation, air humidity and air quality
• DNA sequencing data

Fig. 2 Architecture of the big data platform


2 Big Data Application

The popularity of big data is growing day by day owing to its immense applicability [5]. Philip et al. [5] detail big data and describe the opportunities and hurdles faced by data-intensive applications, covering several areas where big data plays a vital role in both the business sector and research communities. Owing to the fast growth of business sectors, there is a need to develop a new economic system that redefines the roles of producers, consumers and distributors of goods and services. It is no longer possible to depend entirely on experience or pure intuition for decision making, so good data services are essential. In research and science, it is important to retrieve data from various sources, including open repositories and sensor networks, and analyze it. The big data approach is applicable to several domains, such as scientific research, business development and intelligence, engineering and technology, and government projects. A few of the areas where big data is applicable are described below.

(1) Healthcare industry: An important goal for the medical industry is to analyze the zones where a disease originates and from where it may spread and cause an epidemic, so that it can be eradicated. Predicting the origin of a disease is a major task for medical departments and disease control centers, and it can be achieved only with statistical data obtained from various locations. For instance, in 2009 a new flu virus, H1N1, started spreading. Google's early identification of such outbreaks was published in the journal Nature, where the authors explained how the prediction was made, not merely at the national level but down to the level of individual states. This was possible because Google receives more than 3 billion search queries per day and saves the data, giving it a large archive to work with. The organization selected the 50 million search queries most used by Americans and compared them with Centers for Disease Control and Prevention (CDC) data on the spread of flu from 2003 to 2008, the goal being to predict and locate flu-affected zones from the most-searched queries.

3 Big Data Challenges

In the current decade, remotely sensed data, including the data of China, are used in several fields for socioeconomic development, and the data centers have made considerable progress compared with the previous 10 years. Nevertheless, real-time applications still lag behind the demands of the big data era. The major hurdles faced are described below.


The first challenge at CRESDA is the rapid growth in data volume. At present, the center stores remotely sensed data obtained from 13 land-observing satellites [6]. Owing to the growing number of satellites and payloads, in addition to enhanced data resolution, there is sudden growth in the quantity of remote sensing data that must be stored and managed by the ground data-processing system. Furthermore, satellite remote sensing data are preserved and managed permanently, leading to a vast accumulation of data for applications such as archive and query by users. The data at CRESDA increase by 7.2 TB per day, and more than 4.2 TB of products are generated per day; together these amount to an annual increase of about 4.2 PB from currently operating satellites. With the launch of new satellites, such as the five new satellites planned over the next 3 years, the data volume can increase tremendously, with total storage scaling to approximately 199 PB.

The second challenge is the diversity of the collected data and its applications. To deal with a problem, it is important to apply satellite calibration data in addition to the remotely sensed data; other types of data to be included in a study are GIS data, ground object spectrum data and so on. There are several types of data products for tackling different kinds of issues in real-life applications, which adds to the big data challenge.

The third challenge is data speed. The problem-solving protocol has to treat time as the most important parameter: the data must be analyzed on time and the results delivered promptly. In many cases, timely results can save lives and property and prevent a calamity.

The fourth challenge is system extensibility. There are a total of six systems in CRESDA for data analysis, each with its own computing platform, storage facility and data-analysis software. The current aim of the data centers is not only scalability with respect to data growth, but also compatibility of this data across the various management systems; it is also a problem for the existing systems to manage data from newly launched satellites.

4 Techniques for Big Data Processing

A synergistic method has to be adopted for processing big data. It combines techniques from mathematics, statistics and optimization with established approaches such as data mining, signal processing and machine learning, making big data processing an interdisciplinary effort. A detailed description of the various techniques involved is given below.


4.1 Mathematical Analytics Techniques

(a) Mathematical techniques: Various big data problems can be modeled and solved mathematically, for example with factor analysis and correlation analysis. Factor analysis is used to analyze the relations among the various elements constituting big data and thus to reveal vital information. Going a step further, correlation analysis can be applied to identify strong and weak dependencies. These techniques are applicable to fields such as biology, healthcare, engineering and economics.

(b) Statistical methods: These are mathematical tools for collecting, organizing and interpreting data. They are applied to understand causal relationships and correlations and are the most frequently chosen strategy for deriving numerical descriptions. Standard statistical protocols cannot be applied to big data as they are; to use classical techniques on big data, parallelization is employed. Related research areas include statistical computing, statistical learning and data-driven statistical techniques. The economic sector and the healthcare industry make exhaustive use of statistical methods.

(c) Optimization methods: Foundational fields such as physics and economics involve a huge number of quantitative problems, which are solved with optimization methods. A few approaches, including genetic algorithms, simulated annealing and adaptive simulated annealing, are widely used because they are inherently easy to parallelize. Nature-inspired optimization protocols and evolutionary programming have proved appropriate for optimization problems, although they are both storage- and computation-intensive, and several research efforts have tried to scale these techniques. A major need of current big data applications is real-time optimization, specifically in wireless sensor networks (WSNs).
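To make the correlation analysis in (a) concrete, the following minimal sketch computes a Pearson correlation coefficient with NumPy; the two data series and their healthcare interpretation are purely hypothetical.

import numpy as np

# Hypothetical observations: daily patient counts and regional temperature
# (illustrative values only; a real study would use measured data).
patients = np.array([120, 135, 150, 160, 172, 180, 195])
temperature = np.array([30.1, 31.0, 31.8, 32.5, 33.2, 33.9, 34.6])

# The Pearson coefficient quantifies the strength of the linear dependency
# between the two series (+1 strongly positive, -1 strongly negative).
r = np.corrcoef(patients, temperature)[0, 1]
print(f"Pearson correlation: {r:.3f}")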

5 Data Analytics Techniques

5.1 Data Mining

Data mining permits the retrieval of useful information from raw datasets and its visualization in a manner that supports decision making. The most frequently used data mining techniques include classification, regression analysis, clustering, machine learning and outlier detection. Regression analysis is applied to analyze different variables and their dependency on one another. Organizations mostly utilize


this strategy to analyze CRM big data, calculating varied degrees of customer satisfaction and its effect on customer retention. Going further, there may be a need to group similar customers in order to study their shopping patterns, or to classify them on the basis of certain parameters; clustering and classification are applied for this purpose. Finally, outlier detection is used for fraud detection or risk reduction by identifying abnormal patterns or behaviors [7].
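As an illustrative sketch of the clustering and outlier-detection steps just described, the following uses scikit-learn on synthetic customer records; the feature names, data and parameter values are all hypothetical choices, not prescriptions.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Hypothetical CRM features per customer: [annual spend, visits per month].
customers = rng.normal(loc=[500.0, 8.0], scale=[120.0, 2.0], size=(200, 2))

# Group similar customers to study shopping patterns (clustering).
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(customers)

# Flag abnormal behavior for fraud/risk screening (outlier detection);
# IsolationForest marks outliers with the label -1.
flags = IsolationForest(contamination=0.02, random_state=0).fit_predict(customers)

print("cluster sizes:", np.bincount(labels))
print("flagged records:", int((flags == -1).sum()))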

6 Research Methodology

To move beyond the current techniques in machine learning and data analytics, some challenges have to be overcome. NESSI [3] lists the following requirements as important:

1. A strong scientific foundation is needed to develop an apt methodology or design.
2. New efficient and scalable algorithms have to be developed.
3. For proper implementation of the devised solutions, specific development techniques and technological platforms must be identified and developed.
4. Finally, the business value of the solutions must be established, just as much as the data structure and its usability.

Research and development are needed in the following areas:

1. Programming abstractions or scalable high-level models and tools.
2. Solutions for data and computing interoperability issues.
3. Integration of different big data analytics frameworks.
4. Techniques for mining provenance data.
5. Evolution of analytics and information management in relation to cloud-based analytics.
6. Adaptation and evolution of techniques and strategies to enhance efficiency and minimize risks.
7. Design strategies and techniques to deal with privacy and security concerns.
8. Analysis and adaptation of legal and ethical practices to suit the changing viewpoint, impact and effects of technological advances.

The research directions are, however, not limited to the above-mentioned points. The main aim is to transform the cloud from a data management and infrastructure platform into a scalable data analytics platform.

7 Conclusion

We can conclude that realizing the benefits of big data requires an essential understanding of analytics. End-user analytics tools play an essential role, together with the visualization of data,


in enabling management and non-technical staff to make decisions based on information. Within the organization there is also a need for trained data scientists and specialists who work daily on data preparation and who understand the business needs of the organization while contributing their technical skills to its success. The challenges of big data lie within the organization as well as in its environment. Regulatory laws and privacy concerns will affect how data can be used and may limit its usage from a business perspective in order to protect private individuals from breaches of privacy. If overlooked, big data can threaten the privacy of a huge number of individuals; there should be a regulatory framework ensuring that the technology remains within proper boundaries, and a new ethical code can be designed that defines the ethical use of big data technology. Big data technology is extensively used in the commercial field and has generated considerable commercial value, but its application in intelligent safety supervision has only just started. It is therefore vital to unite the technical advantages of big data with the application requirements of safety supervision systems and to recognize the importance of smart safety supervision based on big data, which can open new opportunities for constructing an intelligent safety supervision platform. Safety supervision departments at all levels should grasp this opportunity, through individual training, research and development of key technologies and other measures, to promote the growth of big data safety supervision technology.

References

1. Kusmawan PY, Hong B, Jeon S, Lee J, Kwon J (2014) Computing traffic congestion degree using SNS-based graph structure. In: Proceedings of the 11th ACS/IEEE international conference on computer systems and applications, November 2014, pp 397–404
2. Meng X, Ci X (2013) Big data management: concepts, techniques and challenges. J Comput Res Dev 50(1):146–169
3. Li G, Chen X (2012) Research status and scientific thinking of big data. Bull Chin Acad Sci 27:647–657
4. Zhao G (2013) Big data technology and application practice. Electronics Industry, Beijing, pp 56–58
5. Philip Foster I, Kesselman C, Salisbury C, Tuecke S (2014) The data grid: towards an architecture for the distributed management and analysis of large scientific datasets. J Netw Comput Appl 23(3):187–200
6. Brunet P-M, Montmorry A, Frezouls B (2012) Big data challenges, an insight into the Gaia Hadoop solution. In: 12th international conference on space operations, vol 2, pp 1263–1274
7. Assuncao M, Calheiros R, Bianchi S, Netto M, Buyya R, Big data computing and clouds: trends and future directions. J Parallel Distrib Comput (JPDC) 79(5):3–15

Chapter 27

Connotation Imperative Mining with Regression for Optimization Methodology B Bhoomeshwar and K Raja Shekar

1 Introduction

Data mining, though conceptually simple, is a tedious and significant step in any research effort. It involves retrieving the required pieces of information from a vast data set. There are several rules and techniques for data mining, one of them being association rule mining. This technique analyzes the relations between data items within a large data set and enables the discovery of mainstream patterns such as correlations, connections and associations. It allows the user to process large amounts of data efficiently, without much manual effort, and to extract the useful information within a short period of time. Earlier, several protocols and algorithms were proposed for finding such correlations, and they were linked to many multi-disciplinary domains. Applying the discovered rules to real-world problems depends on the selection of rules; among them, low-interest rules can be skipped and ignored, although ranking association rules can cause additional trouble. The present protocols may produce unsuitable results, specifically when multi-level and cross-level rules are involved.

Regression analysis is a statistical approach for assessing the relationships between data sets, the dependent factors and the independent variables. One has to predict the values of the dependent factors; the variables on which the forecast is based are the independent ones. Regression uses the known data, either directly or through computation, to estimate values beyond the known arrangement of the data.

B. Bhoomeshwar (B) · K. Raja Shekar
Department of Computer Science, Faculty of Engineering & Technology, Mettu University, Mettu, Ethiopia
e-mail: [email protected]
© Springer Nature Singapore Pte Ltd. 2020
V. K. Gunjan et al. (eds.), Cybernetics, Cognition and Machine Learning Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-15-1632-0_27


The protocol then predicts the target value by applying numerical calculations on the archived data collections.

2 Association Rule Mining

One of the most attractive data mining protocols is the association rule mining method. It involves finding interesting patterns, relations and connectivity among different data items. Two important parameters for evaluating the extracted patterns are support and confidence; other parameters include client-defined parameters, which differ among clients. Association rule mining is used in the analysis of showcase or retail data. In market basket analysis, one distinguishes dissimilar purchases and elicits the customer's style: items that are frequently bought together should be arranged near one another, leading to a convenient position for the purchaser. Eventually, the candidate set has to be recognized for the items that are bought habitually. Retailers benefit from association analysis through different kinds of marketing, methods for stock administration, placement of goods and so on. Association rule mining is also used in relational database management systems. The general protocol formats the data as TID-item pairs, where TID denotes the transaction ID and item refers to the various products purchased by the clients. A single transaction ID may have several entries, since a client can buy many commodities according to their requirements. An association rule may be written as follows:

X(buys, computer) => X(buys, Windows OS CD) [support = 1%, confidence = 50%]

where

Support = (number of transactions that contain both Computer and Windows OS CD) / (total number of transactions)

Confidence = (number of transactions that contain both Computer and Windows OS CD) / (number of transactions that contain Computer)

If a rule meets the minimum support and confidence specified by the client, it is retained; this is an excellent alternative to the conventional method. Association rule mining drives applications in fields like banking, telecommunications and human services. Moreover, manufacturing and other processes follow the same strategy, for instance for monitoring and arranging stock.
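The support and confidence measures defined above can be computed directly from a transaction list. The sketch below is illustrative only: the item names and transactions are hypothetical, and while it reproduces the 50% confidence of the example rule, the support differs because the toy data set is tiny.

# Each transaction is the set of items purchased together (illustrative data).
transactions = [
    {"computer", "windows_os_cd", "mouse"},
    {"computer", "mouse"},
    {"computer", "windows_os_cd"},
    {"printer", "paper"},
    {"computer", "printer"},
]

def support(itemset, transactions):
    # Fraction of all transactions that contain every item in the itemset.
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    # support(X union Y) / support(X) for the rule X => Y.
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

rule_x, rule_y = {"computer"}, {"windows_os_cd"}
print("support:", support(rule_x | rule_y, transactions))       # 2/5 = 0.4
print("confidence:", confidence(rule_x, rule_y, transactions))  # 2/4 = 0.5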

3 Related Works

Agrawal and Srikant [1] made a major development in the computer science domain in 1993.


They developed an algorithm, the forebear of the algorithms that first generate the frequent itemsets and then apply confident association rules. The system has two phases: the first generates the regular itemsets and requires the support threshold, while the second acquires confidence, wherein the frequent association rules are derived. It was demonstrated that data mining can also use query languages such as SQL, as opposed to creating specific black-box algorithms; the performance of Apriori was enhanced by the set-oriented characteristics of SETM. In 1994-1995, Agrawal et al. improved the earlier algorithms, focusing on the monotonicity property and on the support and confidence of association rules. A new optimization named direct hashing and pruning (DHP) was introduced in 1995 [1]; this technique focused on controlling the number of candidate itemsets. The DHP algorithm was used to predict and generate large itemsets, and it relies on two important traits: (1) efficient generation of large itemsets and (2) reduction of the transaction span of the database. DHP is technically well suited to candidate set generation for large 2-itemsets; its cost is orders of magnitude lower than that of past methods, owing to the use of hash techniques, which resolves the bottleneck of the earlier operations. In 1996, Agrawal et al. suggested that the finest features of the Apriori and AprioriTid algorithms could be combined to form a hybrid algorithm, called Apriori Hybrid. Scale-up tests showed that Apriori Hybrid scales linearly with the transaction count. The execution time was previously somewhat slow owing to the increased load and surge of items in the database; however, this can be managed by gradually increasing the execution speed.

4 Apriori Algorithm

If the target is to find frequent itemsets, the Apriori algorithm is the right choice, and it also supports deriving association rules over a transactional database. It starts by identifying frequently bought single items and then builds larger itemsets from these, covering all candidate itemsets in the existing database; this methodology is termed the bottom-up approach. It also mines facts that are in complicated form in the database and converts them into a simple, readable form for further analysis. The main guideline of the algorithm is that all subsets of a frequent itemset are frequent, while all supersets of an infrequent itemset are infrequent. Apriori uses a level-wise search in which frequent k-itemsets are used to extend the exploration to itemsets of size k + 1. Finding the frequent itemsets involves two steps, described as follows:


(1) the join operation and (2) the prune operation.

Join operation: This operation relates the frequent itemset for pass k, denoted Lk, and the candidate set, denoted Ck. Ck is formed by joining Lk-1 with itself.

Prune operation: The support count of every candidate in Ck is computed to identify the frequent sets. Since not all members of Ck need be frequent, all members whose count is less than the minimum support value are removed, and the set of frequent members is then reset. If any subset of size k - 1 of a candidate in Ck is absent from Lk-1, the candidate cannot be frequent and is likewise removed. A minimal sketch of candidate generation with these two steps follows.
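This sketch of the candidate generation (join plus prune) is illustrative only; the itemsets in the example are hypothetical, and it is not the authors' implementation.

from itertools import combinations

def apriori_gen(frequent_prev):
    """Generate candidate k-itemsets from frequent (k-1)-itemsets.

    frequent_prev: set of frozensets, each of size k-1.
    """
    k_minus_1 = len(next(iter(frequent_prev)))
    # Join step: L(k-1) joined with itself, keeping unions of size k.
    candidates = {a | b for a in frequent_prev for b in frequent_prev
                  if len(a | b) == k_minus_1 + 1}
    # Prune step: drop any candidate with an infrequent (k-1)-subset,
    # since no superset of an infrequent itemset can be frequent.
    return {c for c in candidates
            if all(frozenset(s) in frequent_prev
                   for s in combinations(c, k_minus_1))}

# Example with hypothetical frequent 2-itemsets:
L2 = {frozenset(p) for p in [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d")]}
# Only {a, b, c} survives; {a, b, d} and {b, c, d} are pruned because
# {a, d} and {c, d} are not in L2.
print(apriori_gen(L2))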

5 Proposed Algorithm

repeat
  k = k + 1
  Ck = apriori-gen(Fk-1)                    {generate candidate itemsets}
  for each transaction t in T do
    Ct = subset(Ck, t)                      {identify all candidates that belong to t}
    for each candidate itemset c in Ct do
      sigma(c) = sigma(c) + 1               {increment support count}
    end for
  end for
  Fk = {c | c in Ck and sigma(c) >= N x minsup}   {extract the frequent k-itemsets by pruning using linear regression}


  read Fk
  sumx = 0; sumxsq = 0; sumy = 0; sumxy = 0
  for each transaction do
    read Fk, Uk
    sumx = sumx + Fk
    sumxsq = sumxsq + Fk * Fk
    sumy = sumy + Uk
    sumxy = sumxy + Fk * Uk
  end for
  denom = n * sumxsq - sumx * sumx
  Fk0 = (sumy * sumxsq - sumx * sumxy) / denom    {intercept}
  Uk = (n * sumxy - sumx * sumy) / denom          {slope}
  write Uk, Fk0
until Fk = empty set
Result = Uk, Fk

Time of execution (in ms): Apriori 278; regression 141 (see also Fig. 2).
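The closing formulas in the pseudocode are the standard least-squares expressions for fitting a line y = a + b x. The following minimal Python sketch performs the same running-sum computation; the (x, y) pairs stand in for the (Fk, Uk) values and are hypothetical.

def linear_fit(points):
    """Least-squares line y = a + b*x from running sums, mirroring the
    denom/intercept/slope formulas in the pseudocode above."""
    n = len(points)
    sumx = sum(x for x, _ in points)
    sumy = sum(y for _, y in points)
    sumxsq = sum(x * x for x, _ in points)
    sumxy = sum(x * y for x, y in points)
    denom = n * sumxsq - sumx * sumx
    intercept = (sumy * sumxsq - sumx * sumxy) / denom
    slope = (n * sumxy - sumx * sumy) / denom
    return intercept, slope

# Illustrative (x, y) pairs lying close to y = 2x + 1:
print(linear_fit([(1, 3.1), (2, 4.9), (3, 7.2), (4, 8.8)]))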

6 Results

The proposed approach was first compared against the standard Apriori algorithm; both implementations were applied to transactions drawn from five groups. Figures 1 and 2 summarize the results.

7 Implementation

The regression-based optimization was simulated in the .NET framework; see Figs. 3, 4 and 5.


Fig. 1 Time execution with different support and confidence

Fig. 2 Time optimization of Apriori with regression

Fig. 3 Regression implemented for optimization

8 Conclusion

Knowledge discovery in databases (KDD) refers to the broad process of finding comprehension or knowledge in data. Data mining has become a crucial step of research in current times, and pattern discovery lies at its heart; its major target is to find co-occurrence relationships. Several algorithms have been developed for association rule mining, which remains an active topic in KDD. The proposed algorithm can be used to locate the relations within itemsets.


Fig. 4 Screenshot of Apriori with regression in framework

Fig. 5 Screenshot of Apriori with regression in framework

The major concerns in mining association rules are accuracy and efficiency. Being a conventional approach, Apriori still has several defects. It begins with an exhaustive examination of the database; many sub-itemsets are formed, frequently yielding candidate itemsets that are not required, and several redundant sub-itemsets are generated at the same time.


This causes repeated scanning of the database. Upon implementation of the developed Apriori approach, it was concluded that the modified approach is efficient and accurate and cuts down the time consumption. The current work processes short segments of the data set at a time rather than the whole database at once, which drastically reduces the time consumed by the Apriori algorithm. Rather than scanning the entire database during each pass of the analysis, it is scanned only once to create a large itemset, which can then be reused during the computations. This reduces the time spent scanning the data set and thereby drastically increases the speed of the overall analysis. The unimportant itemsets formed in the process are removed by computing the minimum support value during each pass; this simple algorithm thus carries out a highly effective pruning. The article combines two algorithms, the Apriori algorithm and a regression algorithm, to cut down the time consumption, and analyzes the generation of frequent itemsets and the number of cycles performed by the Apriori algorithm.

9 Future Scope

This paper described the Apriori computation and discussed certain problems and restrictions in the conventional Apriori estimation, together with techniques based on pre-fixed itemset structures and upgrades to the algorithm. The work completed the interfacing step and the data pruning of the Apriori calculation; the application enables a drastic decrease in elapsed time and increases the efficacy of the search. The proposed Apriori algorithm can further be refined with respect to the support count and the itemsets involved. The protocol involves steps such as investigating perspectives, analyzing the credibility of pre-fixed itemsets and performing the computation.

Reference

1. Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th international conference on VLDB, pp 478–499

Chapter 28

Study on Recent Method of Dynamic Time-Based Encryption (DTBE) and Searchable Access Control Scheme for Personal Health Record in Cloud Computing E. V. N. Jyothi, V. Purna Chandra Rao and B. Rajani

1 Introduction

Cloud computing provides on-demand IT resources with pay-as-you-go pricing via the internet, offering IT infrastructure at low cost. It is used for storing and sharing information in a cloud environment where the computing resources are supplied by a third-party service provider. Generally, a cloud service provider offers various services to clients, such as infrastructure as a service (IaaS), platform as a service (PaaS), software as a service (SaaS) and database as a service (DaaS). These services can be used without local maintenance, and clients can request services such as the storage services offered by Amazon, IBM, Microsoft, Google, Rackspace and others. Cloud can be deployed in four ways: private cloud, public cloud, hybrid cloud and community cloud. A public cloud makes systems and services easily accessible to the general public, and a document uploaded to a public cloud is available to other clients. Owing to this security issue, the document must be encrypted before uploading to the cloud, and data integrity is also required: using data integrity, the client can learn the status of the data and whether it has been modified. Many cryptography techniques have been implemented to provide security over cloud data, although secret key management becomes difficult with them. In the presented system, a survey of privacy-protection research is applied across various public and personal domains, categorized as privacy by policy, privacy by statistics and privacy by cryptography.

The original version of this chapter was revised: the author E. V. N. Jyothi's name has been updated. The correction to this chapter is available at https://doi.org/10.1007/978-981-15-1632-0_33

E. V. N. Jyothi (B) · V. Purna Chandra Rao · B. Rajani
Shri Jagdishprasad Jhabarmal Tibrewala (JJT) University, Jhunjhunu, India
e-mail: [email protected]
© Springer Nature Singapore Pte Ltd. 2020, corrected publication 2020
V. K. Gunjan et al. (eds.), Cybernetics, Cognition and Machine Learning Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-15-1632-0_28


However, the privacy concerns and data sharing requirements on different parts of the medical data may be handled in distinct ways. Data integrity and authenticity are weakened when the data are encrypted using an attribute-based encryption (ABE) scheme, whether KP-ABE or CP-ABE. Defining a fine-grained access structure over cloud data is difficult, and key management becomes hard when there are many access levels spanning various backgrounds, as in the health domain.
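As a minimal sketch of the encrypt-before-upload and integrity requirements described in this introduction, the following uses the Python cryptography package's Fernet recipe plus a SHA-256 digest. The document contents are hypothetical and upload() is a placeholder, not a real API.

import hashlib
from cryptography.fernet import Fernet  # pip install cryptography

# Symmetric key kept by the data owner; never uploaded with the document.
key = Fernet.generate_key()
cipher = Fernet(key)

document = b"patient record: blood pressure 120/80"  # hypothetical content

# Encrypt before uploading so the cloud provider only ever sees ciphertext.
ciphertext = cipher.encrypt(document)

# A digest stored alongside lets the client verify integrity on download.
digest = hashlib.sha256(ciphertext).hexdigest()

# ... upload(ciphertext, digest) to the provider (upload() is hypothetical) ...

# On retrieval: check integrity first, then decrypt locally.
assert hashlib.sha256(ciphertext).hexdigest() == digest
assert cipher.decrypt(ciphertext) == document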

2 Related Works

In [1], Christopher Moretti et al. observed that today's distributed computing systems give very easy access to a large amount of computing power, yet they are not always simple for users who are unfamiliar with such expansive systems. A heavy workload created accidentally by a user can abuse the shared resources and achieve remarkably poor performance. To reduce this issue, they recommend abstraction frameworks that give the end user a high level of data hiding and allow even, resource-efficient execution of data-intensive workloads.

In [2], Mo Zhao and Kai Cao noted that calculation accuracy and time efficiency are contradictory requirements that are hard to reconcile in continuous, heavy-traffic information prediction. To improve the time efficiency of such prediction, they developed a steady traffic information prediction model based on accurate online support vector regression, together with a proposed computing method for the sigmoid kernel based on the cloud model.

In [3], Ruixuan Li et al. offered a lightweight data sharing scheme. It builds on CP-ABE, an access control technology suited to the mobile cloud environment, and changes the structure of the access control tree to make it more appropriate for mobile cloud environments. The lightweight data sharing scheme moves a large segment of the computationally intensive access control tree transformation in CP-ABE from mobile devices to external proxy servers. In addition, to help reduce the user revocation cost, it introduces attribute description fields that realize lazy revocation, a subtle issue in program-based CP-ABE systems.

In [4], Tsang-Long Pao noted that locating the center of a typhoon is especially useful for weather prediction and typhoon examination. The position of the typhoon center can be analyzed from IR satellite cloud images, which show different shapes and sizes at different times. In the initial phase, the typhoon center is highly uncertain; when the typhoon reaches a particular strength, an eye appears at the center. As the typhoon grows stronger, the eye becomes smaller and clearer, and when the typhoon makes landfall, its strength decreases and the eye may become blurred.

In [5], Burak Kantarci combined computational energy efficiency with time-of-use (ToU) awareness and offered a virtualization-based method, explicitly ToU-aware provisioning (ToUP), for a network among data centers over an IP backbone.


In ToUP, upstream user demand traffic between two or more backbone nodes is assigned to the related data centers, while downstream data center requests originate from multiple data centers; traffic between the data centers themselves is likewise considered in the workload allotment among them.

In [6], Frederic F. Leymarie presented the possibilities of a medial scaffold: a hierarchical relationship among the medial axes of a 3D shape, represented in a graph built from combinations of medial curves and medial points. A key feature of the scaffold is that it captures different parts of the shape at different levels in a firmly merged representation. An efficient and precise methodology is presented for computing the medial scaffold, based on propagation along the scaffold itself, starting from the flow and structuring the scaffold as the locus of propagation.

In [7], Sherman W. Marcus introduced the mathematical calculation of electromagnetic waves scattered by clouds of dipoles, which is straightforward when the dipole density is small. For denser clouds, the coupling effects between the dipoles and their effect on the waves must be taken into consideration. This is handled by replacing the randomly distributed dipoles in the cloud with a conducting continuum, whose conductivity is determined by matching the radar cross-section (RCS) of a small piece of the continuum to that of a comparable segment of the dipole cloud. The thinness of this reference dipole cloud removes the need to consider inter-dipole couplings. The reflection and transmission characteristics of the cloud are then obtained using efficient propagation techniques for effective conductors.

In [8], Jin Li examined the use of erasure-resilient codes in peer-to-peer cloud storage. Comparing random linear codes and Reed-Solomon codes, he investigated the best random linear code parameters in terms of information accuracy, computational efficiency and information safety, concluding that erasure-resilient codes in peer-to-peer cloud storage can improve information reliability and help reduce backup server cost. Lacking a homomorphic hashing technique, random linear codes cannot perform well against malicious attacks, so Reed-Solomon codes are recommended as the most proficient choice for the P2P storage cloud.

In [9], Wassim Itani presented privacy as a service (PaaS), which offers a set of techniques to ensure the confidentiality and legal compliance of customer-sensitive data in the cloud. PaaS concentrates on securing the storage and processing of user secret data by using efficient cryptographic co-processors; with these techniques, secure execution is assured in cloud computing, physically and logically shielded from unauthorized data access. The central design objective of PaaS is to expand users' control over the aspects related to the confidentiality of their sensitive data on the cloud network (Table 1).

Table 1 Comparative study of some encryption schemes with their objectives and algorithms used

(1) Title: Cloud computing: different approach and security challenge
Authors: Maneesha Sharma, Amit Kumar Sharma and Himani Bansal
Published in: International Journal of Soft Computing and Engineering (IJSCE), ISSN: 2231-2307, Volume 2, Issue 1, March 2012
Objectives: Cloud computing has generated great interest and rivalry in business and is counted among the top 10 technologies of 2010. As an internet-based organization model, it provides internet-based services, storage and computing for consumers in every marketplace, including finance, healthcare and government. The paper presents an efficient review of a variety of cloud networks and the security challenges that ought to be solved; cloud security has become the most essential issue for cloud providers, and the paper surveys the various security issues related to cloud computing.

(2) Title: Cloud computing data storage security framework relating to data integrity, privacy and trust
Authors: Amit Agarwaly, Preeti Sirohi
Published in: 2015 1st International Conference on Next Generation Computing Technologies (NGCT)
Objectives: Data security is becoming the most vital question in the selection of cloud computing services; data confidentiality, reliability and trust are major security concerns in cloud computing. The projected model has efficient functionality and capacity that underpin data security, integrity and confidentiality; it centers on the encryption and decryption method and promises the cloud user an assurance of data security. The paper discusses only the strengthened security and does not discuss performance.

(3) Title: Privacy as a service: privacy-aware data storage and processing in cloud computing architectures
Authors: A. Kayssi, W. Itani, A. Chehab
Published in: Eighth IEEE International Conference on Dependable, Autonomic and Secure Computing, 2009
Objectives: Privacy as a service (PaaS) offers a set of security protocols to assure the privacy and legal compliance of client data residing on the cloud network. PaaS provides secure storage and processing of users' confidential data by using carefully designed cryptographic co-processors, giving secure execution in cloud computing that is physically and logically protected from unauthorized access to sensitive data. The focal design intention of PaaS is to increase users' control over their sensitive data.



3 Conclusion and Future Work

We presented a study of dynamic time-based encryption and searchable access control schemes for securing sensitive data on a cloud network, and we discussed the existing encryption techniques for providing security for cloud data. During this literature survey, a number of other factors related to encryption techniques and searchable access control systems were studied, such as the attribute-based encryption (ABE) scheme in its KP-ABE and CP-ABE variants, the difficulty of defining a fine-grained access structure over cloud data, and the difficulty of key management when there are many access levels from various backgrounds, as in the health domain. As future work, we suggest a systematic study of how well a given security policy fits our approach to searchable encryption with access control and how the two together influence efficiency.
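To make the notion of a fine-grained access structure concrete, the toy sketch below evaluates a CP-ABE-style access tree against a user's attribute set. It illustrates only the policy logic, not the underlying attribute-based cryptography, and the policy and attribute names are hypothetical.

def satisfies(policy, attributes):
    """Evaluate an access tree of ('AND', ...)/('OR', ...) nodes and
    attribute-string leaves against a set of user attributes."""
    if isinstance(policy, str):          # leaf: a single required attribute
        return policy in attributes
    op, *children = policy
    results = (satisfies(c, attributes) for c in children)
    return all(results) if op == "AND" else any(results)

# Hypothetical policy: (doctor AND cardiology) OR patient_self
policy = ("OR", ("AND", "doctor", "cardiology"), "patient_self")
print(satisfies(policy, {"doctor", "cardiology"}))  # True
print(satisfies(policy, {"nurse"}))                 # False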

References 1. Moretti C, Bulosan J, Thain D, Flynn PJ, All-pairs: an abstraction for data-intensive cloud computing. In: 2008 IEEE international symposium on parallel and distributed processing 2. Zhao M, Cao K, Ho S (2007) Real-time traffic prediction using AOSVR and cloud model. In: IEEE intelligent transportation systems conference 3. Li R, Shen C, He H, Gu X, Xu Z, Xu C-Z (2018) A lightweight secure data sharing scheme for mobile cloud computing. IEEE Trans Cloud Comput 6(2) 4. Pao T-L, Yeh J-H, Liu M-Y, Hsu Y-C (2006) Locating the typhoon center from the IR satellite cloud images. In: 2006 IEEE international conference on systems, man and cybernetics 5. Kantarci B, Mouftah HT (2013) Time of use (ToU)-awareness with inter-data center workload sharing in the cloud backbone. In: 2013 IEEE international conference 6. Leymarie FF, The medial scaffold of 3D unorganized point clouds. IEEE Trans Pattern Anal Mach Intell 29(2):313–330 7. Marcus SW, A model for EM propagation through dense clouds of wire dipoles. In: 2006 IEEE antennas and propagation society international symposium 8. Li J, Huang Q (2006) Erasure resilient codes in peer-to-peer storage cloud. In: 2006 IEEE international conference on acoustics speech and signal processing proceedings 9. Itani W, Kayssi A, Chehab A (2009) Privacy as a service: privacy-aware data storage and processing in cloud computing architectures. In: Eighth IEEE international conference on dependable autonomic and secure computing 10. Shangguan W, Lu Z, Hao Y, Wu P (2007) The research of satellite cloud image recognition base on variational method and texture feature analysis. In: 2nd IEEE conference on industrial electronics and applications. ICIEA 2007 11. von Laszewski G, Wang L, Younge AJ, He X, Cloud computing: a perspective study. New Gener Comput 28(2):137–146 12. Jensen M, Schwenk J, Gruschka N (2009) On technical security issues in cloud computing. In: 2009 IEEE international conference on cloud computing 13. Sharma M, Bansal H, Sharma AK (2012) Cloud computing: different approach & security challenge. Int J Soft Comput Eng (IJSCE) 2(1). ISSN: 2231-2307 14. Sirohi P, Agarwaly A (2015) Cloud computing data storage security framework relating to data integrity, privacy and trust. In: 2015 1st international conference on next generation computing technologies (NGCT)

Chapter 29

Learning of Operational Spending and Its Accompanying Menace and Intimidations Rachana Pembarti and C. S. R. Prabhu

1 Introduction

The development of an organization can be described in terms of two important activities: system analysis and system design. System design implies an idea to start a new business, or to restart or replace an old organization. Prior to developing new ideas, the current strategies and their flaws have to be analyzed thoroughly. System analysis is the method of studying and diagnosing the different principles and concepts on which an organization runs, so as to identify the defects in the present system and modify it for better performance; a system analyst is assigned this kind of work. To proceed with any plan, information from the financial records, such as outstanding purchase orders and stock records, has to be analyzed in detail. One has to identify the sources from which the information can be retrieved, whether the purchasing department, the stock room, and so on. In short, a clear understanding of the existing system, including how information flows through it, is necessary before undertaking any development step. The analyst must know why the organization requires a change in its operations, must examine whether there are hurdles in tracking orders, money or any other issue within the current organization, and must check whether the company lags in handling its inventory records. Before planning the expansion of operations, the analyst checks whether a more efficient system is required for handling them. Collecting all these facts is a prerequisite for determining the value of a computer information system to the system's users. Gathering all this information is termed the system study, and it is more important than any other activity.

R. Pembarti (B) · C. S. R. Prabhu
Department of Computer Science & Engineering, University of Technology, Jaipur, Rajasthan, India
e-mail: [email protected]
© Springer Nature Singapore Pte Ltd. 2020
V. K. Gunjan et al. (eds.), Cybernetics, Cognition and Machine Learning Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-15-1632-0_29


Business analysts are not only involved in solving the problems of the system but also play a major role in planning and implementing business expansion. For example, in a cloth and garment shop, the system study is a promising first step for business development since the business is a totally new concept. The analyst examines with utmost care the requirements of the business and the means of meeting those needs. In unexpected or bad situations, the analyst is the key person to propose the alternative steps, to be taken by employees and managers, for coming out of the situation. In most cases, time is the major concern, as it varies among the alternative approaches to business expansion; apart from time, the cost involved and the profit expected are also very important in deciding on a change to the system. After the decision is taken, a proper plan for implementing the concept has to be designed, covering system design features such as new data capture needs, file specifications, operating protocols and so on [2].

2 Business System Concepts

The word system has gained wide application and has become a common, even fashionable, word for referring to a thing or a process. Domains like education, medicine and information have each become a kind of system; we thus speak of the education system, the medical system, the information system and so on, each being a set of components that function together toward a common goal. Though there are many definitions of a system, the most suitable phrase is an orderly grouping of interdependent components, linked together according to a plan, toward an objective. A component may be a physical part, a managerial task or a subsystem in a multilevel organization, and components can be simple, complex or compound. The components of a system may be arranged as a single PC with its keyboard, printer and memory, or as a series of intelligent terminals connected to a mainframe. In both situations the components are the individual parts of the system, and each must perform at its own level to achieve the common target; an orderly arrangement of the individual components is therefore necessary for the successful development of a system. Three major implications of the system study are the following: the system designed must be efficient enough to reach the objective for which it was proposed; all the components must be interrelated and work together; and higher importance must be given to the objectives of the company than to the objectives of the internal subsystems.

When one part or computer of the company depends on another for its operations, this is called interdependence. Such interconnected systems run according to a predesigned plan wherein each subsystem depends on the output of the preceding system, and so on. Using various remote terminals, an integrated information system is designed to fulfill the requirements of its assigned users and to provide easy


and fast access. There is always interdependence between the subsystems and the users of the organization. From the above, it is clear that no subsystem can function independently: each requires the output of the preceding component. This dependency is also found in the tasks of the system analyst, the programmer and the operations staff [3].

3 Elements of a System

The system analyst always works in an ever-changing and challenging environment, whatever the setting, be it a business, a business application or something else. To characterize a system, some important elements must be considered: control, feedback and environment [4].

4 Control

The control element provides guidance to the system. Its function is to implement a decision-making mechanism that controls the functional pattern, regulating input, processing and output. In organizational terms, management is the team involved in decision making, inflow, outflow and the handling of the various activities. A system's behavior is directly influenced by the operating system and software utilized, and the output instructions in turn define the type of input required to keep the system balanced. The success or failure of the system depends on understanding the characteristics of the individuals controlling the area containing the computer, and help from management is needed to meet the accepted challenge [5].

5 Boundaries and Interface

Boundaries are the defining parameters of a system, which keep its subsystems, processing and components from interfering with one another. In the case of a commercial bank teller, the system is limited to activities such as deposits and withdrawals related to the functions of the customer's savings account; it does not include trust activity or mortgage foreclosures. All systems have their own limits, called boundaries, which bound their area of applicability and controlling efficiency. In an integrated banking system, there is a wide system design for individuals holding both a mortgage and a checking account with the same bank: they can write a check through the system to make a premium payment, which can later be processed by the mortgage loan system.


A new system designed to permit automatic fund transfer proved to be successful: it automatically transfers funds from one's bank account for bill payments and other user facilities, irrespective of location and distance. This shows that it is important to know the boundaries of a given system and to analyze its interaction with other systems in order to produce a successful design.

The five most vital properties of an open system are the following.

Input from outside: Open systems are inbuilt for regulation and adjustment; in a private organization, they are self-adjusting and self-regulating. When functioning properly, an open system reaches a steady state or equilibrium. In a retail firm, for example, this steady state is reached when goods are purchased and stored such that stock is neither exhausted nor excessive. Operating costs are directly influenced by the cost of the goods, and this keeps the firm in a balanced state.

Entropy: Entropy is the tendency of a system to run down over time, and all systems experience it. Open systems counter this effect by seeking new inputs or modifying their processes. If a system shows no reaction to an increase in the cost of its merchandise, the profit curve soon falls, the business becomes unprofitable and may be forced into insolvency, removing the organization entirely.

Process, output and cycles: An open business system produces useful output and operates in cycles, with a continuous flow path.

Differentiation: Open systems have a tendency toward increasing specialization of functions.

Equifinality: The term implies that the same targets can be reached through various courses of action, along several paths. In many systems, there is more consensus on goals than on the paths to reach them [6].

6 Man-Made Information Systems

In general, information reduces uncertainty about the state of a firm; for example, when boating, information about the calmness of the wind assures a peaceful and successful trip to the shore. The information system is the main means of communication between the analyst and the user: it helps convey various instructions and commands, reveals the quality of interaction between decision makers, and can be imagined as a decision center. An information system is a set of interconnected devices, protocols and operating-system functions built around user criteria; it lets the user communicate with the system to control its operations. The vital point of system analysis to remember is that it seeks to improve one or more aspects of the information system.


Many practitioners fail to recognize that an organization has several information systems, each designed for a purpose and serving data flow, communications, decision making and so on. The important kinds of information systems are formal, informal and computer based [7].

7 Systems Models

The models used in system analysis have become more developed and widespread and offer distinct advantages. The system analyst first creates a model of reality, since every computer system ultimately deals with the external world when resolving a problem or query that exists in the real world. For example, a telephone switching system is made up of clients, handsets, dialing pads, conference calls and so on; the analyst's task is to build a model of these before dealing with the functions to be performed by the system. Several business system models are used to demonstrate the benefits of abstracting a complex system into model form; the major ones are schematic, flow, static and dynamic system models [8].

8 Static System Models

This type of model shows a single pairwise relationship, such as activity-time or cost-quantity. A Gantt chart provides a static picture of the activity-time relationship: planned activities are plotted against time, and the date column indicates the total amount of time required to complete each activity. For instance, the stamping department is scheduled to start working on order 25 on Wednesday morning and to complete the job by the same evening; one day is also planned for order number 28, two days for order number 22 and two days (May 10 and 11) for order number 29. The heavy line opposite the stamping department spans six days, a broken line indicates that the department is two days behind schedule, and the arrowhead marks the date on which the chart takes effect.

To solve any issue, awareness of the problem is a must. The rationale for a candidate system is the identification of a need to develop an information system; for example, a supervisor may need to study the system flow in purchasing. This requirement leads to an initial basic survey or investigation to determine whether an alternative system can tackle the problem, and parameters such as the flaws and hurdles of the current system are studied in detail before moving to alternative systems [9].
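As a toy illustration of such a static activity-time model, the following sketch renders a schedule as a text Gantt chart; the order numbers, start days and durations are illustrative only and do not reproduce the chart discussed above.

# (order number, start day, duration in days): an illustrative schedule.
schedule = [(25, 0, 1), (28, 1, 1), (22, 2, 2), (29, 4, 2)]

DAYS = 7
for order, start, duration in schedule:
    # One row per order: '#' marks the scheduled days, '.' the idle days.
    row = ["."] * DAYS
    for d in range(start, start + duration):
        row[d] = "#"
    print(f"order {order:>2}: {''.join(row)}")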


9 Testing Libraries

Test libraries are the checkpoints that ensure system accuracy by repeatedly exercising all the parameters of the system. A test library is a set of rules or data used internally for testing the programs of a system; it is stored on magnetic disc in a machine-readable format so that all participants in the program can use it. For example, a huge research system may contain hundreds of computer programs that share common data and file formats: all process similar transactions, update records and retrieve data to respond to queries or to prepare reports and documents. Because these programs are interdependent and process related transactions, it is feasible to use one common data set to test each program. Not only initial testing but also re-testing must be applied to all the test libraries whenever the programs are modified or updated. These libraries are maintained throughout the life of the system so that reliable data are available irrespective of the changes made to the system [10].
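A modern analogue of such a shared test library is a common fixture reused by many tests. The sketch below, using pytest, is illustrative only: the transaction records and the apply_transactions program under test are hypothetical.

# test_balance.py -- run with `pytest`. The fixture stands in for the shared
# test library: one common data set, defined once (normally in conftest.py)
# and reused by every test module in the suite.
import pytest

@pytest.fixture(scope="module")
def test_library():
    # In practice this would be loaded from the shared, read-only test store.
    return [
        {"account": 1001, "type": "deposit",    "amount": 250.0},
        {"account": 1001, "type": "withdrawal", "amount": 100.0},
    ]

def apply_transactions(transactions, opening=0.0):
    # The program under test: posts each transaction to an opening balance.
    signs = {"deposit": 1, "withdrawal": -1}
    return opening + sum(signs[t["type"]] * t["amount"] for t in transactions)

def test_closing_balance(test_library):
    assert apply_transactions(test_library) == 150.0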

References

1. Suman DR, Wenjun Z (2015) Social multimedia signals: a signal processing approach to social network phenomena. Springer International Publishing, Switzerland. ISBN-13: 978-3319091167
2. Larsen PO, von Ins M (2010) The rate of growth in scientific publication and the decline in coverage provided by Science Citation Index. Scientometrics 84(3):575–603
3. Peng L, Cui G, Zhuang M, Li C (2014) What do seller manipulations of online product reviews mean to consumers? (HKIBS Working Paper Series 070-1314). Hong Kong Institute of Business Studies, Lingnan University, Hong Kong
4. Thomas B (2013) What consumers think about brands on social media, and what businesses need to do about it. Report, Keep Social Honest
5. Liu B (2012) Sentiment analysis and opinion mining. Morgan & Claypool Publishers
6. Yin Z, Rong J, ZhiHua Z (2010) Understanding bag-of-words model: a statistical framework. Int J Mach Learn Cybern
7. Nisha J, Kirubakaran E (2012) M-learning sentiment analysis with data mining techniques. Int J Comput Sci Telecommun 3(8)
8. Richard S, Alex P, Jean YW, Jason C, Christopher DM, Andrew YN, Christopher P (2013) Recursive deep models for semantic compositionality over a sentiment treebank. In: Conference on empirical methods in natural language processing
9. Farah B, Carmine C, Diego R (2007) Sentiment analysis: adjectives and adverbs are better than adjectives alone. In: International conference on weblogs and social media (ICWSM), Boulder, CO, USA
10. Saifee V, Jay T (2013) Applications and challenges for sentiment analysis: a survey. Int J Eng Res Technol (IJERT) 2(2)
11. Hassena RP (2014) Challenges and applications. Int J Appl Innov Eng Manag (IJAIEM) 3(5)
12. Mitchell PM, Mary AM, Beatrice S (1993) Building a large annotated corpus of English: the Penn Treebank. Comput Linguist 19(2)
13. Schukla A (2011) Sentiment analysis of document based on annotation. CoRR arXiv:1111.1648


14. Kasper W, Vela M (2011) Sentiment analysis for hotel reviews. In: Proceedings of the computational linguistics applications conference, Jacharanka
15. Zhang L, Hua K, Wang H, Qian G, Sentiment reviews for mobile device products. In: The 11th international conference on mobile systems

Chapter 30

Learning on Demonstration of Control Classification Expending HVDC Links B. Ravi Teja and Swathi Sharma

1 Introduction

The type of current most commonly used in households and for industrial purposes is alternating current (AC). However, AC becomes more expensive than DC for long transmission lines above 600 km. Compared with a DC line, AC transmission is more complicated because of its frequency and because the power transferred depends on the angle difference between the voltage phasors at the two ends; these factors do not constrain DC transmission. This led to the development of long high-voltage direct current (HVDC) transmission lines [1, 2]. To transfer bulk power over long distances, electric power systems widely use the high-power electronic technology called HVDC. Direct current transmission involves a two-step conversion: at the sending end AC is converted to DC, and at the receiving end DC is converted back to AC. These conversions are conventionally performed by converter stations. A converter can be changed from rectifier to inverter and back to rectifier by simple control actions, which facilitates the reversal of power. The path to HVDC transmission was paved by the invention of the high-voltage mercury valve, and the first commercial HVDC link connecting two AC systems went into operation in 1954 as a submarine cable between the Swedish mainland and the island of Gotland [2]. Globally, this is the most prevalent technology of its kind today. More recently, HVDC systems have been developed using thyristors in the current source converter (CSC) configuration, and further advances in HVDC technology continue to be made.

B. Ravi Teja (B) · S. Sharma Department of Electrical Engineering, University of Technology, Jaipur, Rajasthan, India e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2020 V. K. Gunjan et al. (eds.), Cybernetics, Cognition and Machine Learning Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-15-1632-0_30


2 Literature Review

Many previous works have introduced HVDC technology and its configurations, which include monopolar, bipolar, back-to-back and multi-terminal arrangements [3, 4]. The basic principle of HVDC is the transfer of electric energy from one node to another, or from one AC system to another, in either direction; Fig. 1 depicts this in a simplified schematic form. An HVDC system comprises three blocks: one DC line and a converter station at each end. The converter stations are further divided into components that carry out the conversion from AC to DC and back [5, 6]. The Graetz bridge used in AC/DC or DC/AC conversion has been described in two modes, with and without commutation, and the HVDC lines are presented to represent the circuit. The assumptions made during the analysis are as follows: (i) the converter valves are ideal and have no arc voltage drop; (ii) the voltage and frequency delivered by the AC source are constant; and (iii) the DC voltage and current are ripple-free.

Fig. 1 Transfer of electric energy in HVDC

The operating condition of an AC/DC system containing a DC link is defined by three groups of quantities: the magnitudes of the AC bus voltages, their angles, and the DC variables. The DC variables consist of the firing angles at the converters, the tap positions of the transformers located at the converter stations, the DC voltages at the rectifier and inverter, and the direct current through the line. To ensure sufficient voltage across the valve before firing, the rectifier has a minimum firing angle limit of 5°. To retain room for increasing the voltage for power control, rectifiers typically operate within a range of 15°–30°. Commutation failure can be avoided by maintaining the minimum extinction angle of the inverter [7]; other causes of commutation failure include AC system faults causing voltage dips, phase-angle shifts of the line-to-line voltage, and DC current increases due to faults on the inverter side [8].

For a typical two-terminal HVDC line, the primary functions of the DC controls are (i) control of the power flow across the terminals and (ii) protection of the equipment from excess current and voltage. HVDC is controlled through gate control of the valve firing angle; another method is changing the tap of the converter transformer for AC voltage control.
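The effect of the firing angle on the rectifier's direct voltage follows the standard Graetz-bridge relation Vd = Vd0 cos(alpha) - Rc Id, with Vd0 = (3 sqrt(2)/pi) E_LL and Rc = (3/pi) Xc. The sketch below evaluates it for a few firing angles; the line voltage, reactance and current values are hypothetical, chosen only to show why operating at 15°–30° leaves control headroom.

import math

E_LL = 230e3   # line-to-line AC voltage (V), hypothetical
X_c = 12.0     # commutation reactance (ohm), hypothetical
I_d = 1.5e3    # direct current (A), hypothetical

V_d0 = 3 * math.sqrt(2) / math.pi * E_LL   # ideal no-load direct voltage
R_c = 3 * X_c / math.pi                    # equivalent commutation resistance

for alpha_deg in (5, 15, 30):
    V_d = V_d0 * math.cos(math.radians(alpha_deg)) - R_c * I_d
    print(f"alpha = {alpha_deg:2d} deg -> Vd = {V_d / 1e3:8.1f} kV")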


One work has explained AC/DC converter modeling and proposed a way to incorporate the equations for DC converters and transmission lines directly into the Newton–Raphson AC power flow [9]. The DC equations are expressed in a per-unit form compatible with the per-unit system of the AC equations, which allows the AC and DC equations to be solved simultaneously rather than serially. The current work adopts the assumptions and hypotheses made in deriving the AC/DC converter equations. When an HVDC link is present in the AC system, including the converter-station powers modifies the power balance equations at the AC terminals. The load flow calculation methods for mixed AC/DC systems fall into three categories:

(i) Sequential approach: proposed by Sangavi and Banarjee for the load flow analysis of an integrated AC/DC power system, this approach solves the equations of the DC and AC systems separately.
(ii) Extended variables/unified method: the AC and DC systems are combined and solved simultaneously. This method is more effective than the sequential method for AC/DC power systems owing to its better computing efficiency and convergence.
(iii) Eliminated variable method: the converters are treated as voltage-dependent loads, and the DC variables are eliminated from the power flow equations.

3 Converter Performance Analysis

Figure 1 shows the HVDC converter, i.e., the Graetz bridge. Typically, two six-pulse converters are used, connected through Y-Y and Y-Δ transformers. This arrangement eliminates the lower-order sixth-harmonic multiples on the DC side, resulting in a significant reduction of the harmonic filter requirements. The assumptions of the analysis are as follows [5]:

(a) The supply voltage phases are identical in magnitude and displaced by 120°.
(b) The direct current (Id) is constant and ripple-free.
(c) The transformer leakage reactance is unchanged.
(d) The valves behave as ideal switches.
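To make the converter relations concrete, the following minimal Python sketch evaluates the standard six-pulse Graetz bridge voltage relation, Vd = (3√2/π)·E·cos α − (3/π)·Xc·Id, which holds under assumptions (a)–(d). The function name and the numeric operating point are illustrative assumptions, not values taken from this work.

```python
import math

def graetz_dc_voltage(e_ll_kv, alpha_deg, xc_ohm, id_ka):
    """Average DC output voltage of a six-pulse Graetz bridge.

    e_ll_kv   -- RMS line-to-line AC voltage at the converter (kV)
    alpha_deg -- firing (delay) angle in degrees
    xc_ohm    -- commutating reactance per phase (ohm)
    id_ka     -- direct current (kA), constant and ripple-free per assumption (b)
    """
    vd0 = (3 * math.sqrt(2) / math.pi) * e_ll_kv      # ideal no-load DC voltage
    drop = (3 / math.pi) * xc_ohm * id_ka             # commutation overlap voltage drop
    return vd0 * math.cos(math.radians(alpha_deg)) - drop

# Hypothetical operating point: 230 kV AC side, alpha = 15 deg (the lower end of
# the 15-30 deg rectifier range noted above), Xc = 10 ohm, Id = 1 kA.
print(f"Vd = {graetz_dc_voltage(230, 15, 10, 1):.1f} kV")   # ~290 kV
```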

Control selection basics

The following measures should be taken into consideration when selecting the controls:
(1) Large fluctuations in the direct current due to AC system voltage variations must be prevented.
(2) The direct voltage must be maintained close to the rated value.
(3) The power factors at the sending and receiving ends must be kept as high as possible.


The reasons for maintaining a high power factor are as follows:
(a) to minimize the valve stress;
(b) to reduce the losses and the equipment ratings of the AC system connected to the converter;
(c) to reduce the voltage drops occurring at the AC terminals as the load increases;
(d) to lower the cost of the reactive power supply to the converters.

4 AC/DC Power Flow

There are two different solution approaches for AC/DC power flow: the sequential approach and the unified (simultaneous) solution method. In the sequential method, the AC and DC system equations are solved separately in each iteration until the terminal conditions of the converters are satisfied. Sequential methods are simple and easy to apply owing to their modular programming, and different control specifications are easy to implement and incorporate. In the unified method, the equations of the AC and DC systems are combined, together with the residual equations describing the behavior of the rectifier terminal, into one common set of equations. The unified method is more reliable because of its better computing efficiency and convergence; hence, even though it may be more complicated to program, it is better suited than the sequential method for industrial AC/DC power systems. The Newton–Raphson (N-R) method, the fast-decoupled N-R method and the Gauss–Seidel (G-S) method can be used for solving AC/DC power flow problems just as they are in pure AC systems. The G-S method requires an accelerating factor to improve its iteration process because of its slow convergence rate. The N-R method is more attractive because of its powerful convergence characteristics and is a promising technique for resolving AC/DC power flows.
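The sequential approach described above can be sketched as a simple alternation between the two subproblems. The toy two-bus illustration below is a minimal sketch, not a production solver: solve_ac and solve_dc are hypothetical stand-ins for real AC and DC power-flow routines, and all coefficients are made-up values chosen so that the iteration converges.

```python
# Sequential AC/DC power flow, schematic form: solve the AC and DC equations
# separately in each iteration until the converter terminal conditions hold.

def solve_ac(p_conv):
    """AC subproblem: converter-bus voltage (p.u.) with the converter
    modeled as a fixed load of p_conv (p.u.)."""
    return 1.05 - 0.01 * p_conv                # toy voltage-drop characteristic

def solve_dc(v_ac):
    """DC subproblem: converter power that satisfies the DC line equations
    for a fixed AC terminal voltage."""
    vd = 1.296 * v_ac                          # Vd = 1.35 * V_ac * cos(alpha), cos(alpha) ~ 0.96
    id_ = (vd - 1.25) / 0.10                   # DC line: Vd = Vd_inverter + R_dc * Id
    return vd * id_                            # power drawn into the DC link

p_conv, tol = 0.0, 1e-8
for it in range(100):
    v_ac = solve_ac(p_conv)                    # step 1: AC equations, converter load fixed
    p_new = solve_dc(v_ac)                     # step 2: DC equations, AC voltage fixed
    if abs(p_new - p_conv) < tol:              # converter terminal conditions satisfied
        break
    p_conv = p_new
print(f"converged after {it} iterations: P = {p_conv:.4f} p.u., V = {v_ac:.4f} p.u.")
```

In a unified method, by contrast, the AC mismatch equations and these DC residuals would be stacked into one Jacobian and solved together in a single Newton–Raphson iteration.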

5 Conclusions

Since the first HVDC installation in Gotland, Sweden, in 1954, there has been ever-increasing interest in HVDC technology. The high flexibility of control available with HVDC transmission brings a number of advantages and applications. The current work aims to perform load flow and optimal power flow analysis of an AC/DC system. The article discusses the load flow analysis of different configurations with both single and multiple DC transmission lines. MATLAB simulations are performed for point-to-point DC transmission under different operating modes.


Out of the seven DC variables specified for power transfer through the DC line, any three are used to define a particular mode. Section 4 describes the different modes of operation. If any unspecified variable exceeds its boundary limits, a transition takes place from one mode to another, and the HVDC operating mode shifts to a new mode. During this transition, the variable exceeding its operating limit is fixed at that limit, and a variable that was previously fixed is freed to vary.

1. It is possible to analyze the interconnected AC/DC system's transient behavior during both switching and fault conditions.
2. A generic modeling approach is required for all types of HVDC systems, covering all possible configurations, to minimize the complexity involved in power system analysis.

References

1. Meah K, Ula S (2007) Comparative evaluation of HVDC and HVAC transmission systems. In: Power engineering society general meeting, pp 1–5
2. Kundur P (1994) Power system stability and control. Tata McGraw-Hill, New York, pp 463–579
3. Arrillaga J, Liu YH, Watson NR, Flexible power transmission: the HVDC options. Wiley, England, pp 21–95
4. Setreus J, Bertling L (2008) Introduction to HVDC technology for reliable electrical power systems. In: Probabilistic methods applied to power systems, pp 1–8
5. Tzeng Y, Chen N, Wu R-N (1995) A detailed R-L fed bridge converter model for power flow studies in industrial AC/DC power systems. IEEE Trans Ind Electron 42(5):531–538
6. Kimbark EW (1971) Direct current transmission, vol 1. Wiley Interscience, New York, pp 21–95
7. Engstrom PG (1964) Operation and control of HVDC transmission. IEEE Trans Power Appar Syst 83(1):71–77
8. Zhang L, Dofnas L (2002) A novel method to mitigate commutation failures in HVDC systems. In: Power system technology, vol 1, pp 51–56
9. Braunagel DA, Kraft LA, Whysong JL (1976) Inclusion of DC converter and transmission equations directly in a Newton power flow. IEEE Trans Power Appar Syst 95(1):76–88

Chapter 31

Cloud Computing Security in the Aspect of Blockchain K. Praveen Kumar and Gautam Rampalli

1 Introduction

IoT encompasses all the equipment involved in generating, processing and exchanging large amounts of data while maintaining security and confidentiality. Despite this major focus on security and data protection, IoT remains an attractive target for various cyber-attacks [1] and cyber-crimes. The main properties of IoT-based devices are their light weight and low energy consumption: almost all of a device's energy and computation is used for executing the main application, which keeps operation affordable while retaining privacy. To meet all of the above criteria, the conventional protocols become expensive with respect to energy consumption and processing overhead. Most currently known security setups are fully centralized and are not compatible with IoT because of several factors, such as scaling problems, many-to-one traffic patterns and a single point of failure [2]. IoT-based applications are often unable to provide personalized privacy-protecting services [3] because of the noisy and incomplete data produced by the existing conventional methods. An IoT system needs devices that are lightweight and scalable, with a well-distributed security system and privacy safeguards. The first crypto-currency system [4], Bitcoin, is based on blockchain (BC) technology. This system overcomes all the problems of conventional approaches specified above thanks to its secure, private and distributed nature. The IoT system is a complex system involving the integration of various technical and social domains. In spite of the broad spectrum of research on IoT, its description remains fuzzy [1]. Data protection and content confidentiality have become great engineering challenges in view of the increasing demand for, and diversity of, IoT-based devices.


Even a minor disturbance to the security of IoT-based devices can lead to heavy losses in terms of data confidentiality, authentication attacks, compromised service integrity and so on. Privacy and anonymity must therefore be prioritized, and these parameters must be integrated into the system design to give end users control over their data security. To accomplish these criteria, a new methodology has been proposed for protecting the security and transparency of the data; this protocol can replace the current conventional protocols, which are closed and opaque in contrast to the open and transparent blockchain approach. Blockchain methods and technology have already been adopted by some initiatives [2]. Among the hurdles faced during the development of IoT-based devices are the lack of formalism, of modeling architectures and of languages, which are important for the independent development of this technology across the several disciplines of the semantic web stack. In addition to the usual problems of scalability and complexity, there are further problems, including time latency, the dependence of transactions on confirmation numbers, and IoT concepts related to real-time processing that conflict with blockchain behavior [1]. Bitcoin network transactions are visible to all nodes, which is a drawback for situations requiring controlled environments [3]. It therefore becomes necessary to understand how traditional software protocols can support the new requisites of blockchain-based IoT protocols. To reach the aim of the research, all the critical factors of IoT paradigms were systematically mapped. The work focuses on the following questions: (i) how blockchain-based IoT stands on development processes, and (ii) which principles have to be considered during blockchain-based IoT development. Understanding blockchain-based IoT domains is the major criterion of the current research. It also involves analyzing the latest designs, principles and updated concepts in the construction and development of IoT-based devices. Various important aspects of IoT security, data privacy and confidentiality are discussed briefly. Apart from its academic benefits, the research is of industrial importance for the development of secure database systems: industries can benefit by applying these findings to the advancement and development of new IoT-based systems.

1.1 The Blockchain Overview

A blockchain is an online digital ledger that works universally for the maintenance of a financial system such as Bitcoin. All transactions of the participants are recorded by the blockchain system. The privacy of the blockchain is maintained, and all operations are verified, by cryptography. Each transaction is screened according to various principles, leading to good redundancy in data verification and a reward for the computational work. The use of blockchain technology imparts characteristics such as transparency, democratization, decentralization, security and efficiency to organizations.


It is employed in the assessment of financial services, maintaining consistent process standards and long-range global oversight.
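The core of the ledger idea is that each block commits cryptographically to the one before it. The Python sketch below is a minimal illustration of that hash-linking, assuming made-up transaction data; consensus, mining and networking — all essential in a real system such as Bitcoin — are deliberately omitted.

```python
import hashlib, json, time

def block_hash(block):
    # Deterministic serialization, then SHA-256 over the block contents.
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def new_block(prev_block, transactions):
    return {
        "index": prev_block["index"] + 1,
        "timestamp": time.time(),
        "transactions": transactions,
        "prev_hash": block_hash(prev_block),   # link to the previous block
    }

genesis = {"index": 0, "timestamp": 0, "transactions": [], "prev_hash": "0" * 64}
chain = [genesis]
chain.append(new_block(chain[-1], [{"from": "A", "to": "B", "amount": 5}]))
chain.append(new_block(chain[-1], [{"from": "B", "to": "C", "amount": 2}]))

# Verification: any participant can recompute the links independently, so
# tampering with one recorded transaction invalidates every later block.
ok = all(chain[i]["prev_hash"] == block_hash(chain[i - 1]) for i in range(1, len(chain)))
print("chain valid:", ok)
```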

1.2 The Blockchain Ontology

The blockchain ontology with dynamic extensibility (BLONDIE) is the first effort to standardize this technology. The OWL ontology can be employed, in RDF, on various structures of Ethereum or Bitcoin, and can be further applied to other blockchain technologies. Extensive knowledge can be made available through the OWL representation of BLONDIE [3]. According to Amoozadeh et al. [3], using the actual Bitcoin technology, or an alternative with minor modifications, would be an ideal scenario. Although this is a self-standardized and validated system, it has several limitations, one being that its usage is limited to financial transactions. The interoperability of the various blockchain technologies is the current point of consideration. Devices must be made self-sufficient in communicating and in resolving problems such as updating software, managing bugs and monitoring energy usage.
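To illustrate how an OWL/RDF vocabulary of this kind can describe blockchain structures, the sketch below builds a tiny RDF graph for a block using the Python rdflib library. The namespace, class and property names are hypothetical stand-ins, not the actual BLONDIE vocabulary, and the block data is invented.

```python
from rdflib import Graph, Literal, Namespace, RDF

# Hypothetical namespace standing in for a BLONDIE-like vocabulary; the real
# BLONDIE class and property names may differ.
BC = Namespace("http://example.org/blondie#")

g = Graph()
g.bind("bc", BC)

block = BC["block/4721"]
g.add((block, RDF.type, BC.EthereumBlock))
g.add((block, BC.blockNumber, Literal(4721)))
g.add((block, BC.parentHash, Literal("0xabc123...")))   # link to the parent block
g.add((block, BC.minedBy, Literal("0xMinerAddress")))

# Serialize the graph as Turtle, the usual human-readable RDF syntax.
print(g.serialize(format="turtle"))
```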

2 Research Methodology

Four major steps are involved in the research methodology. This work describes step 2, the systematic mapping, which is shown in detail in Fig. 1, adapted from [5].

2.1 Risks Involved in Cloud Computing and Security Issues

Gartner identified seven important security problems in 2008 that need to be addressed before applying cloud computing technology across organizations; those considered here are listed below.
(1) Location of data: once data is stored in the cloud, its location is difficult for some clients to identify.
(2) Regulatory compliance: clients should choose providers that permit third-party audits.
(3) Segregation of data: data gathered from various locations and organizations is stored at a common location, so there must be an internal mechanism that separates and filters the data of each client.


Fig. 1 Scientific methodology steps

(4) Long-term viability: the client must be able to withdraw the agreement and retrieve all information if the current supplier is bought out or ceases operations.

3 Current Trends and Challenges

This section deals with various trends and challenges faced by blockchain-based IoT protocols. One article [4] argues that any flaw in an IoT-based application may invite risky attacks on the security and confidentiality of the information; attacks may target the authentication of the information or the available networks, for example denial-of-service (DoS) attacks. Privacy of the data is also an important issue. Being natural collectors and distributors of information, IoT-based devices pose a unique and serious challenge to the privacy of individual data. Other challenges include the interactions of clients and users with smart objects and groups. The concentration of data on these platforms is uncontrolled and lacks transparency, and end users are exposed to risks related to data localization, monitoring, manipulation, tracking, profiling and so on.


In related work, the authors dealt with several factors that affect the integrity and adaptability of blockchain. Further analysis must be undertaken of the security properties of proof of work, which is the key mechanism for achieving distributed consensus. Transactions are processed through smart contracts on the Ethereum platform. The authors of one research article [6] studied the safety and security of running transactions through Ethereum on an open distributed network. Several new hurdles have been recognized and reported by different authors [6, 7], highlighting the difficulty of understanding the distributed semantics of the underlying platform. These authors proposed Oyente, a symbolic execution tool built on the formalized operational semantics.

3.1 Risks Involved in Cloud Computing

The six major zones of cloud computing operations that require high security attention are as follows:

• Security of data in transit
• Security of data at rest
• Regulatory issues and legal matters of the cloud
• Robust separation of the data belonging to different customers
• Authentication of users/applications/processes
• Incident response.

3.2 Threats in Cloud Computing

Cloud computing faces the same security hurdles as the systems currently in practice, although these risks may take different forms. The following main problems were studied by the Cloud Security Alliance (2010):

• Attacks by external customers
• Insecure application programming interfaces
• Broken perimeter security model
• Failures in providing security
• Abuse and nefarious use of cloud computing
• Availability and reliability issues
• Legal and regulatory issues
• Integrating provider and customer security systems.


4 Concluding Remarks and Future Work

Several systematic mappings were performed to identify the development processes used and the factors influencing the building of blockchain-based IoT. The aim of the research is to develop a blockchain-based protocol for IoT projects with up-to-date technology, the highest security and a more hurdle-free environment than current protocols provide. Blockchain-based IoT is a new research area that is not yet well explored. The research reveals some frameworks and models for blockchain-based IoT technology that adhere well to various development protocols. The ultimate goal is a blockchain-based IoT protocol that overcomes the usual hurdles and is best suited for application and device construction.

References

1. Das ML (2015) Privacy and security challenges in internet of things. In: Distributed computing and internet technology, pp 33–48
2. Ho G, Leung D, Mishra P, Hosseini A, Song D, Wagner D, Smart locks: lessons for securing commodity internet of things devices. In: Proceedings of the 11th ACM on Asia conference on computer and communications security
3. Amoozadeh M et al (2015) Security vulnerabilities of connected vehicle streams and their impact on cooperative driving. IEEE Commun Mag 53(6):126–132
4. Skarmeta AF, Hernandez-Ramos JL, Moreno M (2014) A decentralized approach for security and privacy challenges in the internet of things. In: 2014 IEEE world forum on internet of things (WF-IoT)
5. Nakamoto S (2008) Bitcoin: a peer-to-peer electronic cash system
6. The Tor Project. https://www.torproject.org/
7. de Montjoye Y-A et al (2014) openPDS: protecting the privacy of metadata through SafeAnswers. PLoS ONE 9(7)

Chapter 32

Analysis of Various Routing Protocol for VANET Akansha Bhati and P. K. Singh

1 Introduction

A mobile ad hoc network (MANET) is an infrastructure-less network that can configure itself to connect mobile devices over wireless channels, providing each device with the information it needs to route traffic correctly. In addition to safety applications, VANET also delivers valuable real-time information to users, such as transportation-system and weather information, e-commerce via mobile devices, internet access, and other multimedia applications. Routing in vehicular networks is a daunting task because of unique network characteristics such as high node mobility, dynamic topology changes, and highly partitioned networks. The performance of a routing protocol depends on several internal factors, such as node mobility, and external factors, such as road topology and signal-blocking obstacles. Vehicle networks can be built to suit road safety in many industrial applications. Wireless systems have enabled many features of our lives and improved our daily productivity. Among wireless systems, inter-vehicle communication (IVC) has several important advantages: reduced access time due to direct connections, better protection, and no service fees. Ad hoc networks operate without infrastructure. The VANET approach, which builds on 802.11 WLAN advances, has drawn considerable attention. Since each car equipped with Wi-Fi hardware acts as a mobile node (host), internet access in VANET has gained momentum in recent years. Its value is increasing, driven mainly by automakers and government agencies as well as the academic community. Vehicle systems can be used to understand traffic bottlenecks and provide better performance. Inter-vehicle communication is another area where wireless communication systems may have a significant impact.


IVC has generated great interest in research and in the automotive market, and will help provide intelligent transportation systems (ITS) as well as driver- and travel-related services. VANET is a dedicated form of mobile ad hoc network (MANET) that carries communication between nearby cars. VANETs are a target of manufacturers who want to build smart comfort applications into cars. When deploying Wi-Fi products, road information (incidents, traffic congestion, etc.) should be obtained in a timely and efficient manner by distributing multi-hop data efficiently and frequently, which makes it harder for VANET networks to plan for the many safety, convenience, and mission-critical applications. Collision warning systems and roadside receivers and readers give the driver the basic information needed to determine the best course of action and to avoid traffic congestion and accidents. The special features associated with VANETs allow the introduction of innovative and ideal services.

1.1 Applications of VANETs

The most relevant application fields are safety and comfort.

1. Comfort applications: these programs enhance traveler convenience, traffic efficiency and/or the optimization of the chosen route to a destination. Examples in this category include traffic information systems, weather information, the location of gas stations or restaurants, toll and price information, and entertainment connectivity such as browsing the internet and downloading songs.
2. Safety applications: this category enhances passenger safety by sharing safety information over IVC. The information is either presented to the driver or used to configure the engine's active safety systems.

2 Background

Ayyappan and Kumar [1]: Wireless sensor networks (WSNs) have become one of the key technologies of modern times, because the internet of things (IoT) and VANET are assisted directly or indirectly by WSNs. The adoption of wireless sensor networks is growing very fast year after year, so protecting wireless sensor nodes is critical. This article investigates several security protocols that could be used in wireless sensor networks.

Ali et al. [2]: A wireless sensor network is a collection of small devices, namely sensor nodes, gateways and software. The nodes use wireless transmission to sense data and transmit it to other nodes. Typically, a WSN consists of two types of nodes: common nodes and gateway nodes; common nodes sense the data, and the gateway node they report to directs this information onward. The internet of things has now expanded to cover all the surrounding electronic products, such as body sensor networks, VANETs, smart electric grids, smartphones, PDAs, self-driving cars, refrigerators and smart toasters, forming a network for communicating and sharing information. WSN sensor nodes have a very limited transmission range and processing speed, low storage capacity and limited battery power. Despite the wide range of applications using WSNs, their resource-constrained nature exposes them to many serious security attacks, such as selective forwarding, jamming, sinkhole, wormhole, Sybil and hello-flood attacks, gray holes, and the most dangerous of all, the black-hole attack. Attackers can easily exploit these vulnerabilities to disrupt WSNs.

Pradhan et al. (2018): Owing to the increase in private transportation, the vehicular ad hoc network (VANET) has been a prominent research field for many years. Because of features such as highly dynamic topology, intermittent connectivity, constrained node mobility and variable network density, VANETs face significant challenges in data transmission between nodes. The focus of this work is to design solutions for various routing-related issues in VANET's dynamic environment, using the routing protocol for low-power and lossy networks (RPL). RPL is a tree-based IPv6 routing standard primarily used in wireless sensor networks (WSNs). Many aspects of the RPL design apply to dynamic network topologies and can be transferred to the vehicular environment. The work provides a comprehensive assessment of simulated RPL performance under highly dynamic VANET conditions, using Cooja, the Contiki-based simulator. Simulation results show that many parameters of the vehicular environment have a significant impact on protocol performance; delay is used as one of the parameters for evaluating efficiency. The results show that RPL plays a very important role in solving the data communication challenges in VANETs.

Tian et al. [3]: This paper focuses on driving-safety routing protocols that collect vehicle status information from VANET-WSN roadside sensors. The tree-based routing protocol for low-power and lossy networks (RPL) naturally matches these routing requirements. However, RPL was designed for static wireless sensor networks, so it needs to be modified to accommodate the highly dynamic topology of VANET-WSN. For the first time, the authors used geographic information (GI) as an RPL metric (GI-RPL) to make RPL timely. They also proposed several strategies for adjusting RPL in VANET-WSN. To demonstrate the performance of GI-RPL, they built simulations in Cooja and compared GI-RPL with other modified RPLs. Simulation results show that GI-RPL achieves a higher packet delivery ratio (PDR), reasonable overhead and lower latency.

Kaur et al. (2016): Each car participating in a VANET becomes a wireless router or node, through which the network extends into a wide-area network. The two main issues in VANET are privacy and security; because VANET is a highly dynamic environment, authentication computations are especially demanding. At the same time, most privacy schemes are vulnerable to Sybil attacks. This paper proposes a lightweight authentication scheme that forms a secure communication system between vehicles and roadside units, as well as among the vehicles themselves. In VANET, the privacy of anonymous legitimate nodes is crucial, and anonymous authentication of these legitimate nodes is also necessary.


sensor networks, VANET, smart electric websites, smartphones, PDAs, self-driving cars, refrigerators, and smart toasters, which can be technology-based. The current network of communication and sharing information. The nodes sensed by the WSN have a very limited transmission range and processing speed, low storage capacity and battery power. Despite the wide range of applications using WSN, the nature of its resources is limited by many serious security attacks, such as selective routing attacks, jamming attacks, pelvic attacks, crowd attacks, cyber-attacks, flood welcome attacks, gray holes, and the most dangerous black-hole attack. Attackers can easily exploit these vulnerabilities to block WSNs. Pradhan et al. (2018) Due to the increase in private transportation, ad hoc network (VANET) has become a prominent research field for many years. Due to features such as high dynamic topology, intermittent connectivity, limited node mobility, and variable network density, VANET networks face significant challenges in data transmission between nodes. The focus of this work is to design solutions for various routing-related issues in VANET’s dynamic environment. They used low power and loss network routing protocol (RPL) to solve this problem. RPL is a tree-based IPV6 routing standard primarily used in wireless sensor networks (WSNs). Many aspects of the RPL design apply to dynamic network topologies and can be transferred to the vehicle environment. It provides a comprehensive assessment of RPL simulation performance under high dynamic VANET. Cooja uses a Contiki-based simulator to simulate the model. Simulation results show that many parameters of the vehicle environment have a significant impact on protocol performance. They use delay as one of the parameters for evaluating efficiency in RPL. The results show that RPL performance is very important in solving the data communication challenges in VANET networks. Tian et al. [3] This paper focuses on driving safety routing protocols that collect vehicle status information from VANET-WSN roadside sensors. Routing protocols (tree-based routing protocols) for low power and loss networks (RPLs) naturally adapt to their routing requirements. However, RPL is designed for fixed wireless sensor networks, so RPL needs to be modified to accommodate the high dynamic topology of VANET-WSN. For the first time, they used geographic information (GI) as a measure of RPL (GI-RPL) to achieve RPL in a timely manner. They also proposed some strategies for RPL adjustments in VANET-WSN. To demonstrate the performance of GI-RPL, they used Cooja to build simulations and compare them to other modified RPLs. Simulation results show that GI-RPL has a higher percentage of package delivery (PDR), reasonable overhead, and lower latency. Kaur et al. (2016) Each car participating in VANET turns it into a wireless router or node through which the network is converted to a wide area network. The two main issues with VANET are privacy and security. Because there is a very powerful environment in VANET certification calculations, it is even more. At the same time, most privacy plans are vulnerable to Sybil attacks. This paper proposes a lightweight authentication scheme that forms a secure communication system with vehicles and roadside units, as well as with other vehicles inside the vehicle. In VANET, the privacy of anonymous legitimate nodes is crucial, and anonymous authentication of these legitimate nodes is also necessary. Unfortunately, many privacy schemes


are sensitive to Sybil attacks. Preventing and detecting Sybil attacks in a privacyfriendly VANET environment is a huge challenge. The timestamp method is used in this method, and the computational rate for verification in the traffic congestion area is also significantly reduced. The vehicle unit will be protected by keeping the actual condition of the vehicle confidential. Remya (2015) In today’s situation, crime is common in every sector of society. However, it has been observed that in most cases, cars are involved in some way. In order to make roads safer and improve overall road safety, many of the technologies that make up the ITS network have been integrated. An example of this field is the “automotive network” (VANET), a sub-category of “mobile network” (MANET). At VANET, cars act as mobile hotspots, in addition to a fixed infrastructure, and can communicate with each other to create a separate dynamic mobile network. This article focuses on road monitoring and security system models by integrating different wireless network technologies/devices such as smartphones, WLANs, and MANETs. This article also discusses an unsafe environment that addresses the need for an effective monitoring system and how the model approaches the solution. They used MATLAB tools to simulate and validate the proposed model. Successful implementation of any such system will help control crime rates and may help provide the necessary evidence. Saggi (2015) Advances in wireless communications have enabled researchers to conceive and develop the concept of a vehicle network, also known as a vehiclespecific network (VANET). In the Sybil attack, the WSN destroys stability through malicious nodes, which create numerous fraudulent identities that facilitate the disabling of network protocols. In this paper, a new technique is proposed to detect and isolate Sybil’s attack on vehicles, thereby improving network proficiency. It works in two phases. In the first phase, the RSU registers the contract by specifying its credentials. On successful verification, the second phase begins and the vehicle identification is assigned. Multiple identities generated by Sybil attacks are very harmful to the network and may be misused to propagate error messages over the network. The simulation results show that the proposed detection technique increases the detection probability and reduces the Sybil attack rate. Mathematical Analysis of Proposed Algorithm Integrate the real-time model to the grid algorithm. Inputs: Grid size (N) Number of lanes: Z; Direction of departure and destination of each zone. Given: Each pass contains the cells of the museum and is very selectable from each other. Each unit needs time, and there is one unit time. Output: Z no. of paths which are SHORTEST possible without clash. Z(t) is time domain for AI ´ = Mean(z); Z(t) ´ + δt) = Z(t) ´ + R−1 (Z) * ▼z * z(t + δt) Z(t


How the Shortest-Path Calculation Is Processed by the Real-Time Information Model

• The grid algorithm computes the shortest route.
• It takes the origin and the destination as inputs.
• It then includes the factors that affect the shortest route.
• It obtains the shortest path under the real-time information model, which is more suitable for the user (a minimal sketch of such a grid search follows).
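The following minimal A* sketch on a 4-connected grid illustrates the process just listed: an open list ordered by f = g + h, a closed set, a uniform one-unit cost per cell, and path reconstruction by backtracking through parents. The grid layout and obstacle positions are toy values, not data from this work.

```python
import heapq

def astar(grid, start, goal):
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan heuristic
    open_list = [(h(start), 0, start)]           # entries ordered by f = g + h
    parent, g_cost, closed = {start: None}, {start: 0}, set()
    while open_list:
        _, g, node = heapq.heappop(open_list)
        if node == goal:                         # reconstruct by backtracking parents
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        if node in closed:
            continue
        closed.add(node)
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1                       # one unit of time per cell
                if ng < g_cost.get((nr, nc), float("inf")):
                    g_cost[(nr, nc)] = ng
                    parent[(nr, nc)] = node
                    heapq.heappush(open_list, (ng + h((nr, nc)), ng, (nr, nc)))
    return None                                  # no clash-free path exists

grid = [[0, 0, 0, 0],     # 0 = free cell, 1 = obstacle
        [1, 1, 0, 1],
        [0, 0, 0, 0]]
print(astar(grid, (0, 0), (2, 0)))
```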

3 Result with Simulation

The proposed studies combine the A* algorithm and an improved TEEN algorithm. The combination is implemented in MATLAB, and the results are displayed in both graphical and numeric format.

TEEN
• TEEN (threshold-sensitive energy-efficient sensor network protocol) is designed for reactive networks.
• The TEEN protocol suits simple sensing applications.
• It is an energy-saving protocol compared with traditional sensor network protocols.

Functioning: at each cluster change, every node in the cluster takes a turn as cluster head (CH) for an interval, and in addition to the attributes, the cluster head broadcasts hard and soft thresholds to its members.

A* process: the path is rendered from the closed list. Once the goal is reached, the path is reconstructed by backtracking through the closed list, following each node's parent from the goal back to the start node.
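The hard/soft threshold rule that the cluster head broadcasts can be sketched as below: a member node transmits only when its reading crosses the hard threshold, and afterwards only when the reading changes by at least the soft threshold. The threshold values are hypothetical; CH rotation and radio details are omitted.

```python
HARD_T = 40.0   # hard threshold broadcast by the cluster head
SOFT_T = 1.5    # soft threshold broadcast by the cluster head

def teen_node(readings):
    """Yield only the readings a TEEN member node would actually transmit."""
    last_sent = None
    for v in readings:
        if v >= HARD_T and (last_sent is None or abs(v - last_sent) >= SOFT_T):
            last_sent = v
            yield v                     # transmit to the cluster head
        # otherwise stay silent and save energy

print(list(teen_node([38.0, 41.0, 41.5, 43.2, 42.9, 45.0])))
# -> [41.0, 43.2, 45.0]; small fluctuations below SOFT_T are suppressed
```

This suppression of redundant transmissions is what makes TEEN energy-saving relative to protocols that report periodically.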

4 Simulation and Result

The figures below show the data displayed after running MATLAB. Note that the minimum-path result is reported in plain grid units. The graphical user interface has many components; only a small part of the real-time run, executed automatically in the selected area, is shown. A circular robot represents a single car over the whole day, which reduces simulation time. The ultimate goal, however, is to add the key automation features automatically along a minimal path (Figs. 1, 2, 3, 4, 5 and 6).


Fig. 1 Main GUI of proposed work

5 Comparison

See Table 1.

6 Conclusion and Future Work

In our research, all the constraints were fixed in advance, providing a road model with traffic and speed obstacles. The method can serve as a reference in urban planning: by monitoring the construction of roads near buildings such as schools, hospitals, traffic lights and residential areas, the risk of traffic congestion can be reduced. For researchers and students of computer science and related fields, the way the features of our application are applied will be helpful in their similar work and learning. In addition, our application is useful for people interested in the technologies and tools it involves, such as GIS, traffic analysis and web development. As described above, some issues need further investigation and resolution: an improved algorithm is needed to shorten the long response times. In the future, we plan to maintain the application regularly, improve the procedure, and make our research available to the public.


Fig. 2 Calculation of shortest path for VANET

All testing was done in an artificial environment. The interface has only been tested on laptops, iPhones and iPads, not on other mobile systems. The following observations were obtained from the tests shown in the results. Both the number of obstacles and the weight assigned to each obstacle affect the shortest-time result. The obstacles in our study model how cars are driven; although this keeps the computation time low, actual traffic is more complicated than our analysis. Beyond these factors, others, such as the number of vehicles and pedestrians, are also important for traffic and speed but are difficult to incorporate into the application. In reality, road traffic is the result of lively behavior and interaction among many road users: it is impossible to count exactly how many people or cars are on the street, dynamic data is difficult to collect, and traffic varies with time of day and weather, which makes it hard to estimate in advance. A truly shortest-time method is therefore unattainable; instead, this work introduced a new approach to finding effective routes based on practical considerations. With large amounts of local data, researchers can select the factors and their weights more accurately, improving the validity and appropriateness of the results.


Fig. 3 Completion of the shortest path (the shortest-path result shown in a pop-up box)

Fig. 4 Differential values of the car's movement over the whole day


Fig. 5 Decision of the car when it drastically changes its movement over a long run-up

Fig. 6 Operating point, also shown in a pop-up box: the speed-time calculation of the sensor node mounted on the car


Table 1 Comparison of the individual algorithms with the proposed integrated system

                        A* star algorithm   TEEN algorithm   Integrated system (proposed)
Jamming information     No                  Yes              Yes
Shortest path           Yes                 No               Yes
Real-time information   No                  Yes              Yes
Calculation speed       Very fast           Slow             Very fast
Platform                MATLAB              MATLAB           MATLAB

References

1. Ayyappan B, Kumar PM (2017) Security protocols in WSN: a survey. In: 2017 third international conference on science technology engineering & management (ICONSTEM), Chennai, pp 301–304. https://doi.org/10.1109/iconstem.2017.8261297
2. Ali S, Khan MA, Ahmad J, Malik AW, ur Rehman A (2018) Detection and prevention of Black Hole attacks in IOT & WSN. In: 2018 third international conference on fog and mobile edge computing (FMEC), Barcelona, pp 217–226. https://doi.org/10.1109/fmec.2018.8364068
3. Tian B, Hou KM, Shi H, Liu X, Chen Y, Chanet J-P (2013) Application of modified RPL under VANET-WSN communication architecture. In: 2013 international conference, Blaise Pascal University, Clermont-Ferrand, France. IEEE. https://doi.org/10.1109/iccis.2013.387
4. Dak AY, Yahya S, Kassim M (2012) A literature survey on security challenges in VANETs. Int J Comput Theory Eng 4(6)
5. Pathre A, Agrawal C, Jain A (2013) A novel defense scheme against DDOS attack in VANET. IEEE
6. Castleman KR (2009) Digital image processing. Prentice-Hall, Englewood Cliffs, New Jersey
7. Kushwaha D, Shukla PK, Baraskar R (2014) A survey on sybil attack in vehicular ad-hoc network. Int J Comput Appl (0975–8887) 98(15)
8. Chadha D, Reena (2015) Vehicular ad hoc network (VANETs): a review. Int J Innov Res Comput Commun Eng 3(3)
9. Dudgeon DE, Mersereau RM (2014) Multidimensional digital signal processing. Prentice-Hall, Englewood Cliffs, New Jersey
10. Eze EC, Zhang S, Liu E (2014) Vehicular ad hoc networks (VANETs): current state, challenges, potentials and way forward. In: Proceedings of the 20th international conference on automation & computing, Cranfield University, Bedfordshire, UK, 12–13 September 2014. IEEE
11. Piran MJ, Rama Murthy G, Praveen Babu G, Ahvar E (2011) Total GPS-free localization protocol for vehicular ad hoc and sensor networks (VASNET). In: 2011 third international conference on computational intelligence, modelling & simulation. IEEE. https://doi.org/10.1109/cimsim.2011.77
12. Oppenheim AV, Willsky AS, Young IT (2007) Systems and signals. Prentice-Hall, Englewood Cliffs, New Jersey
13. Qin R, Li Z, Wang Y, Lu X, Zhang W, Wang G (2014) An integrated network of roadside sensors and vehicles for driving safety: concept, design and experiments. IEEE
14. Malebary S, Xu W (2015) A survey on jamming in VANET. Int J Sci Res Innov Technol 2(1):143


15. Deshpande SG (2013) Classification of security attack in vehicular adhoc network: a survey. 2(2):371. www.ijettcs.org
16. Raghuwanshi V, Jain S (2015) Denial of service attack in VANET: a survey. Int J Eng Trends Technol (IJETT) 28(1):15
17. Vinh Hoa LA, Cavalli A (2014) Security attacks and solutions in vehicular ad hoc networks: a survey. Int J AdHoc Netw Syst (IJANS) 4(2). https://doi.org/10.5121/ijans.2014.4201

Correction to: Study on Recent Method of Dynamic Time-Based Encryption (DTBE) and Searchable Access Control Scheme for Personal Health Record in Cloud Computing E. V. N. Jyothi, V. Purna Chandra Rao and B. Rajani

Correction to: Chapter 28 in: V. K. Gunjan et al. (eds.), Cybernetics, Cognition and Machine Learning Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-15-1632-0_28

In the original version of the book, the chapter author name E. V. N. Jyothi was inadvertently published with the spelling error "E. V. N. Jyoti"; this has now been corrected. The chapter and the book have been updated with the change.

The updated version of this chapter can be found at https://doi.org/10.1007/978-981-15-1632-0_28
