Towards the Integration of IoT, Cloud and Big Data: Services, Applications and Standards (Studies in Big Data, 137) 9819960339, 9789819960330

This book discusses integration of internet of things (IoT), cloud computing, and big data. It presents a unique platfor


English Pages 166 [164] Year 2023


Author / Uploaded
Vinay Rishiwal (editor)
Pramod Kumar (editor)
Anuradha Tomar (editor)
Priyan Malarvizhi Kumar (editor)

Table of contents :
Preface
Contents
Editors and Contributors
Introduction to Big Data Analytics
1 Introduction to Big Data
2 The Distinction Between Small and Big Data
3 Classification of Big Data
4 Characteristics of Big Data
5 Who’s Generating Big Data?
6 Why Is Big Data Important?
7 Challenges in Big-Data
8 Big Data Applications
9 How Big Data Analysis Differs from Business Intelligence Analysis?
9.1 Business Intelligence
9.2 Big Data
9.3 Differences Between Business Intelligence (BI) and Big Data
10 The Analytical Lifestyle of Big Data
10.1 Phase 1: Discovery
10.2 Phase 2: Data Preparation
10.3 Phase 3: Model Planning
10.4 Phase 4: Model Building
10.5 Phase 5: Communicate Results
10.6 Phase 6: Operationalize
11 Big Data Analysis Necessitates a Set of Skills
12 Big Data Domain
13 Introduction to Big Data Analytics
14 Overview of the Hadoop Ecosystem
14.1 HDFS
14.2 YARN
14.3 MapReduce
14.4 Spark
15 Overview of Big Data Analysis and Its Need
16 Use Cases of Big Data Analytics
17 Challenges in Analyzing Big Data
18 Big Data Quality Dimensions
19 Conclusion
References
DCD_PREDICT: Using Big Data on Prediction for Chest Diseases by Applying Machine Learning Algorithms
1 Introduction
1.1 Introduction
1.2 Background
1.3 Objective
2 Literature Survey
2.1 Summary
3 System Design
3.1 Existing System
3.2 Identification of Common Risks
3.3 Types of Heart Diseases
3.4 Problem Statement
3.5 Scope
3.6 Proposed System
4 Methodology
4.1 Supervised Learning
4.2 Symptom-Based Questionnaire
4.3 Dataset Training and Testing
5 Process and Analysis
5.1 General Process
5.2 Use Case Diagram
5.3 Data Flow Diagram
5.4 System Flow
6 Implementation and Results
6.1 Details of Algorithms
6.2 Data Set and Its Parameters
6.3 Dataset Attributes
6.4 Execution and Screenshots
7 Conclusion and Future Scope
7.1 Conclusion
7.2 Future Scope
References
Design of Energy Efficient IoMT Electrocardiogram (ECG) Machine on 28 nm FPGA
1 Introduction
2 Background
3 Environmental Settings for Energy Efficient IoMT ECG Machine
4 Power Analysis of IoMT ECG Machine
5 Conclusion
References
Automatic Smart Irrigation Method for Agriculture Data
1 Introduction
2 Motivation
3 Contribution of the Chapter
4 Organization and Roadmap of the Article
5 Related Works
6 About the Dataset and Features
7 Methodology and Applied Algorithms
7.1 Data Processing
7.2 Machine Learning
8 Result and Analysis
9 Challenges in Proposed Work
10 Conclusion and Future Work
References
Artificial Intelligence Based Plant Disease Detection
1 Introduction
2 Motivation
3 Contribution of the Chapter
4 Organization of the Chapter
5 Literature Survey
6 Issues and Challenges
7 Methodology
7.1 Advantages of Using Convolution Neural Network (CNN)
7.2 Flow of the Models
7.3 Image Preprocessing
8 Performance Metrics
9 Result Analysis
10 Conclusion and Future Scope
References
IoT Equipped Intelligent Distributed Framework for Smart Healthcare Systems
1 Introduction
1.1 Internet of Things (IoT)
1.2 Smart Healthcare
1.3 DDBMS
1.4 Artificial Intelligence (AI)
1.5 Blockchain Technology
2 Security Issues in Smart Healthcare Systems
2.1 Communication Media
2.2 Topology Issues
2.3 Scalability
2.4 Mobility and Energy Constraints
2.5 Memory Constraints
2.6 Multi-protocol Network
2.7 Tamper Devices
3 Existing Healthcare Systems
4 Proposed Model
5 Results and Discussion
6 Conclusions
References
Adaptive Particle Swarm Optimization for Energy Minimization in Cloud: A Success History Based Approach
1 Introduction
2 Background and Related Work
3 Proposed Approach
4 Results and Discussions
5 Conclusion and Future Work
Appendix
References
Field Monitoring and Automation in Agriculture Using Internet of Things (IoT)
1 Introduction
2 Related Works
3 IoT Technologies for Field Monitoring in Agriculture
3.1 Drones in Agriculture
3.2 Remote Sensing in Agriculture
3.3 Computer Imaging in Agriculture
4 Proposed Automated System Model for Agriculture
4.1 Proposed System Block Diagram
5 Work Flow of System Model
5.1 Field Quality Analysis
5.2 Irrigation System
5.3 System Design
5.4 Irrigation System
6 Hardware Setup for Proposed System Model
7 Android Mobile Application for Monitoring the Work Flow
8 Getting Alerts for Motor On/Off via Mobile Application
9 Conclusion
References


Studies in Big Data 137

Vinay Rishiwal Pramod Kumar Anuradha Tomar Priyan Malarvizhi Kumar Editors

Towards the Integration of IoT, Cloud and Big Data Services, Applications and Standards

Studies in Big Data Volume 137

Series Editor Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland

The series “Studies in Big Data” (SBD) publishes new developments and advances in the various areas of Big Data- quickly and with a high quality. The intent is to cover the theory, research, development, and applications of Big Data, as embedded in the fields of engineering, computer science, physics, economics and life sciences. The books of the series refer to the analysis and understanding of large, complex, and/or distributed data sets generated from recent digital sources coming from sensors or other physical instruments as well as simulations, crowd sourcing, social networks or other internet transactions, such as emails or video click streams and other. The series contains monographs, lecture notes and edited volumes in Big Data spanning the areas of computational intelligence including neural networks, evolutionary computation, soft computing, fuzzy systems, as well as artificial intelligence, data mining, modern statistics and Operations research, as well as self-organizing systems. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution, which enable both wide and rapid dissemination of research output. The books of this series are reviewed in a single blind peer review process. Indexed by SCOPUS, EI Compendex, SCIMAGO and zbMATH. All books published in the series are submitted for consideration in Web of Science.


Editors
Vinay Rishiwal, Department of CSIT, Faculty of Engineering and Technology, M.J.P. Rohilkhand University, Bareilly, India
Pramod Kumar, Glocal University, Saharanpur, Uttar Pradesh, India
Anuradha Tomar, Netaji Subhas University of Technology, New Delhi, India
Priyan Malarvizhi Kumar, Department of Data Science, University of North Texas, Denton, TX, USA

ISSN 2197-6503 ISSN 2197-6511 (electronic) Studies in Big Data ISBN 978-981-99-6033-0 ISBN 978-981-99-6034-7 (eBook) https://doi.org/10.1007/978-981-99-6034-7 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore Paper in this product is recyclable.

Preface

The rapid advancement of technology has led to the emergence of the Internet of Things (IoT), Cloud Computing, and Big Data as transformative forces in various industries. As these technologies continue to evolve, there is a growing need for their integration to unlock their full potential and enable the development of innovative services, applications, and standards. The integration of these three domains presents numerous challenges and opportunities. One of the key challenges is the efficient and secure management of the massive data generated by IoT devices, as well as the seamless integration of IoT devices with cloud-based infrastructure. This requires the development of scalable and robust architectures, protocols, and standards that enable interoperability, data sharing, and resource allocation across heterogeneous systems. Moreover, the integration of IoT, Cloud, and Big Data enables the creation of innovative services and applications. To achieve successful integration, the establishment of common standards is crucial. To summarise, it is the right time to explore the integration of IoT, Cloud, and Big Data, which holds immense potential to transform industries, enhance services, and enable data-driven decision-making. However, addressing the challenges related to data management, interoperability, and security is vital for successful integration. Moreover, the establishment of standards is crucial to facilitate seamless communication and collaboration between different systems. By leveraging the combined power of IoT, Cloud, and Big Data, organizations can unlock new possibilities and drive digital transformation in the era of interconnected and data-driven ecosystems. This book consists of eight chapters. The first chapter covers introduction to Big Data analysis and its need, skills required for Big Data analysis, characteristics of Big data analysis, an overview of the Hadoop ecosystem, and some use cases of Big Data analysis. 
The aim of the second chapter is to study and compare three of the most common classification methods, Support Vector Machines, K-Nearest Neighbours, and Artificial Neural Networks, for heart disease prediction using the standard Cleveland cardiology data set. The objective of the third chapter is to reduce the energy consumption of the ECG machine. The authors of chapter four propose a system that automatically supplies water to farms according to their crop; it measures the water level of the soil and helps decide whether to turn the water supply on or off. Further, chapter five uses deep convolutional network algorithms for leaf image classification to provide accurate results. Chapter six uses the concept of Blockchain to ensure the security of patients' medical records. Chapter seven offers SHA-PSO, a PSO-based meta-heuristic technique that schedules workloads among Virtual Machines (VMs) to minimize energy consumption. The authors of chapter eight propose the design of a field monitoring device using IoT in agriculture.

Vinay Rishiwal, Bareilly, India
Pramod Kumar, Saharanpur, India
Anuradha Tomar, New Delhi, India
Priyan Malarvizhi Kumar, University of North Texas, USA

Contents

Introduction to Big Data Analytics
Nitin Arora, Anupam Singh, Vivek Shahare, and Goutam Datta

DCD_PREDICT: Using Big Data on Prediction for Chest Diseases by Applying Machine Learning Algorithms
Umesh Kulkarni, Sushopti Gawade, Hemant Palivela, and Vikrant Agaskar

Design of Energy Efficient IoMT Electrocardiogram (ECG) Machine on 28 nm FPGA
Pankaj Singh, Bishwajeet Pandey, Neema Bhandari, Shilpi Bisht, Neeraj Bisht, and Sandeep K. Budhani

Automatic Smart Irrigation Method for Agriculture Data
Rashmi Chaudhry, Vinay Rishiwal, Preeti Yadav, Kaustubh Ranjan Singh, and Mano Yadav

Artificial Intelligence Based Plant Disease Detection
Vinay Rishiwal, Rashmi Chaudhry, Mano Yadav, Kaustubh Ranjan Singh, and Preeti Yadav

IoT Equipped Intelligent Distributed Framework for Smart Healthcare Systems
Sita Rani, Meetali Chauhan, Aman Kataria, and Alex Khang

Adaptive Particle Swarm Optimization for Energy Minimization in Cloud: A Success History Based Approach
Vijay Kumar Sharma, Swati Sharma, Mukesh Rawat, and Ravi Prakash

Field Monitoring and Automation in Agriculture Using Internet of Things (IoT)
Ashendra Kumar Saxena, Rakesh Kumar Dwivedi, and Danilla Parygin

Editors and Contributors

About the Editors

Dr. Vinay Rishiwal, Ph.D., is a Professor in the Department of Computer Science and Information Technology, Faculty of Engineering and Technology, MJP Rohilkhand University, Bareilly, Uttar Pradesh, India. He obtained his B.Tech. degree in Computer Science and Engineering in 2000 from M.J.P. Rohilkhand University (SRMSCET), India, and received his Ph.D. in Computer Science and Engineering from Gautam Buddha Technical University, Lucknow, India, in 2011. He has 23 years of experience in academia. He is a senior member of IEEE, ACM and has worked as Convener, Student Activities Committee, IEEE Uttar Pradesh Section, India. He has published more than 90 research papers in various journals and conferences of international repute. He also has 20 patents to his credit. He has been General/Conference Chair of four international conferences, namely ICACCA, IoT-SIU, MARC 2020 and ICAREMIT. He has received many best paper, research, and orator awards at various platforms. Dr. Rishiwal has visited many countries for academic purposes and has worked on many projects of CST, UP Government, MHRD and UGC. His current research interests include Wireless Sensor Networks, IoT, Cloud Computing, Social Networks and Blockchain Technology.

Prof. (Dr.) Pramod Kumar is an accomplished academic leader with over 24 years of experience in the field. He currently serves as the Dean of Academics at Glocal University in Saharanpur, UP, where he has been since September 2022. Prior to this, he held the position of Dean of Computer Science and Engineering at Krishna Engineering College in Ghaziabad and served as the Director of Tula's Institute in Dehradun, Uttarakhand. Prof. Pramod Kumar holds a Ph.D. in Computer Science and Engineering, which he earned in 2011, as well as an M.Tech. in CSE from 2006. He is a Senior Member of IEEE and an Ex-Joint Secretary of the IEEE U.P. section.
Through his research, he has made significant contributions to the fields of Computer Networks, IoT, and Machine Learning. He is the author or co-author of more than 70 research papers and has edited four books. He has also supervised and co-supervised several M.Tech. and Ph.D. students.

Dr. Anuradha Tomar is currently working as an Assistant Professor in the Instrumentation & Control Engineering Division of Netaji Subhas University of Technology, Delhi, India. Dr. Tomar completed her postdoctoral research in EES at Eindhoven University of Technology (TU/e), the Netherlands. She received her B.E. degree in Electronics Instrumentation & Control with Honours in 2007 from the University of Rajasthan, India. In 2009, she completed her M.Tech. degree with Honours in Power Systems at the National Institute of Technology Hamirpur. She received her Ph.D. in Electrical Engineering from the Indian Institute of Technology Delhi (IITD), India. Dr. Tomar has committed her research efforts to the development of sustainable, energy-efficient solutions for the empowerment of society and humankind. Her areas of research interest are the Operation & Control of Microgrids, Photovoltaic Systems, Renewable Energy based Rural Electrification, Congestion Management in LV Distribution Systems, Artificial Intelligence & Machine Learning Applications in Power Systems, Energy Conservation, and Automation.

Dr. Priyan Malarvizhi Kumar is presently employed as an Assistant Professor at the University of North Texas in the United States. Before taking up this role, he served as an Assistant Professor in the Department of Computer and Information Science at Gannon University, USA. Prior to his tenure at Gannon University, he held the position of Assistant Professor in the Computer Science and Engineering Department at Kyung Hee University in South Korea. Additionally, he gained valuable experience as a Postdoctoral Research Fellow at Middlesex University in London, UK. Dr. Kumar earned his Ph.D. degree from Vellore Institute of Technology University.
His academic journey also includes a Bachelor of Engineering degree from Anna University and a Master of Engineering degree from Vellore Institute of Technology University. Dr. Kumar’s current research focuses on areas such as Big Data Analytics, Internet of Things (IoT), Internet of Everything (IoE), and Internet of Vehicles (IoV) in the context of healthcare. He has authored and co-authored papers published in international journals and conferences, including those indexed by the Science Citation Index (SCI). He maintains a lifetime membership with the International Society for Infectious Disease, the Computer Society of India, and is an active member of the Vellore Institute of Technology Alumni Association.

Contributors

Vikrant Agaskar, Vidyavardhani College of Engineering and Technology, Vasai-Virar, Maharashtra, India
Nitin Arora, Electronics and Computer Discipline, Indian Institute of Technology, Roorkee, India
Neema Bhandari, Birla Institute of Applied Sciences, Bhimtal, Uttarakhand, India
Neeraj Bisht, Birla Institute of Applied Sciences, Bhimtal, Uttarakhand, India
Shilpi Bisht, Birla Institute of Applied Sciences, Bhimtal, Uttarakhand, India
Sandeep K. Budhani, Graphic Era Hill University, Bhimtal, Uttarakhand, India
Rashmi Chaudhry, Netaji Subhas University of Technology, Delhi, India
Meetali Chauhan, Department of Computer Science and Engineering, Guru Nanak Dev Engineering College, Ludhiana, Punjab, India
Goutam Datta, School of Computer Science, University of Petroleum and Energy Studies, Dehradun, India
Rakesh Kumar Dwivedi, CCSIT, Teerthanker Mahaveer University, Moradabad, UP, India
Sushopti Gawade, Pillai College of Engineering, Panvel, India
Aman Kataria, Amity Institute of Defence Technology, Amity University, Noida, India
Alex Khang, GRITEx and VUST, Ho Chi Minh City, Vietnam
Umesh Kulkarni, Vidyalankar Institute of Technology, Wadala, Mumbai, Maharashtra, India
Hemant Palivela, Manager-AI, Accenture Solutions, Mumbai, Maharashtra, India
Bishwajeet Pandey, Gyancity Lab, Gurgaon, India
Danilla Parygin, Volgograd State Technical University, Volgograd, Russia
Ravi Prakash, CSED, Motilal Nehru National Institute of Technology, Allahabad, India
Sita Rani, Department of Computer Science and Engineering, Guru Nanak Dev Engineering College, Ludhiana, Punjab, India
Mukesh Rawat, CSED, Meerut Institute of Engineering and Technology, Meerut, India
Vinay Rishiwal, MJP Rohilkhand University, Bareilly, India
Ashendra Kumar Saxena, CCSIT, Teerthanker Mahaveer University, Moradabad, UP, India
Vivek Shahare, Department of Computer Science and Engineering, Indian Institute of Technology, Dharwad, India
Swati Sharma, IT, Meerut Institute of Engineering and Technology, Meerut, India
Vijay Kumar Sharma, CSED, Meerut Institute of Engineering and Technology, Meerut, India
Anupam Singh, Department of Computer Science and Engineering, Graphic Era Deemed to be University, Dehradun, India
Kaustubh Ranjan Singh, Delhi Technological University, Delhi, India
Pankaj Singh, Birla Institute of Applied Sciences, Bhimtal, Uttarakhand, India
Mano Yadav, Bareilly College, Bareilly, India
Preeti Yadav, MJP Rohilkhand University, Bareilly, India

Introduction to Big Data Analytics

Nitin Arora, Anupam Singh, Vivek Shahare, and Goutam Datta

Abstract Nowadays, a high volume of data (tabular data, text files, images, videos, audio, logs, etc.) is generated at high velocity by social media and networks, scientific instruments, mobile devices, and sensor technology and networks. In these types of data, data quality is usually not guaranteed. This data can be structured or unstructured, necessitating a cost-effective, innovative method of data processing to improve understanding and decision-making. This chapter introduces Big Data analysis and the need for it, the skills required for Big Data analysis, the characteristics of Big Data, an overview of the Hadoop ecosystem, and some use cases of Big Data analysis.

Keywords Big data · Hadoop ecosystem · Big data analysis · Business intelligence analysis · Big data domain · Big data quality dimensions

N. Arora (corresponding author)
Electronics and Computer Discipline, Indian Institute of Technology, Roorkee, India

A. Singh
Department of Computer Science and Engineering, Graphic Era Deemed to be University, Dehradun, India

V. Shahare
Department of Computer Science and Engineering, Indian Institute of Technology, Dharwad, India

G. Datta
School of Computer Science, University of Petroleum and Energy Studies, Dehradun, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Rishiwal et al. (eds.), Towards the Integration of IoT, Cloud and Big Data, Studies in Big Data 137, https://doi.org/10.1007/978-981-99-6034-7_1


Table 1 Characteristics of data

Characteristic | Small data | Big data
Volume | Less than 1 TB | Greater than 1 TB
Velocity | Controlled and steady data flow | Enormous data flowing in shorter time frames
Variety | Structured and semi-structured, e.g., Excel, table data, JSON | Wide variety of data, i.e., tabular data, text files, images, videos, audio, logs, etc.
Veracity | It contains more quality data | The quality of data is rarely guaranteed
Value | Business intelligence, analysis, and reporting | Complex data mining, predictions, pattern finding, etc.
Time variance | Data represents the business value for the history of data as well as incremental | At times, history data becomes irrelevant for analyzing business insights
Infrastructure | More defined resources allocation | The load on the system varies a lot

1 Introduction to Big Data

Big Data is a phrase that relates to a collection of vast and complex data sets that are challenging to store and analyze using standard data processing methods. Big data refers to data assets with a large volume, great velocity, and great diversity that necessitate cost-effective, creative data processing to improve insight and decision-making [1].

2 The Distinction Between Small and Big Data

There are several distinctions between small data and big data. These distinctions include volume, velocity, variety, veracity, value, time variance, and infrastructure [2]. Table 1 summarizes all the differences.

3 Classification of Big Data

Big data is classified as [3]:
– Structured Data: Structured data has a well-defined format. It can be readily stored in tabular form in relational databases such as MySQL and Oracle.
– Semi-Structured Data: Semi-structured data has some structure but cannot be recorded in a tabular format in relational databases. XML files, JSON documents, e-mail messages, and so forth are examples.
– Unstructured Data: Unstructured data has no structure and cannot be saved in tabular form in relational databases. Examples include video, audio, text, and machine-produced data.
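To make the three classes concrete, the sketch below labels a text payload by how much structure it carries. It is a toy heuristic, not a method from the chapter: the function name classify_payload and the rules (valid JSON counts as semi-structured, a consistent comma-separated grid counts as structured, everything else as unstructured) are assumptions chosen for illustration only.

```python
import csv
import io
import json


def classify_payload(text: str) -> str:
    """Toy heuristic (hypothetical helper): label a payload by its structure."""
    stripped = text.strip()

    # Semi-structured: the payload parses as a JSON object or array.
    try:
        parsed = json.loads(stripped)
        if isinstance(parsed, (dict, list)):
            return "semi-structured"
    except json.JSONDecodeError:
        pass

    # Structured: at least two comma-separated rows that all have the same
    # number of columns (more than one), like a relational table export.
    rows = [row for row in csv.reader(io.StringIO(stripped)) if row]
    if (
        len(rows) >= 2
        and len(rows[0]) > 1
        and all(len(row) == len(rows[0]) for row in rows)
    ):
        return "structured"

    # Anything else (free text, binary dumps, ...) is treated as unstructured.
    return "unstructured"


if __name__ == "__main__":
    print(classify_payload("id,name\n1,Asha\n2,Ravi"))
    print(classify_payload('{"id": 1, "tags": ["a", "b"]}'))
    print(classify_payload("The meeting went well and we agreed to ship."))
```

Real systems usually know the class from the source (a database table, an API response, a media file) rather than guessing from content; the heuristic only illustrates the distinction.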

4 Characteristics of Big Data

Big data has many characteristics. Some of them are [4]:
– Volume: The term "volume" refers to the quantity of data, which increases rapidly every day. Humans, technology, and their interactions on social media create enormous amounts of data.
– Variety: Because so many sources contribute to Big Data, the types of data they generate are diverse. The data might be structured, semi-structured, or unstructured. Many different forms of data can be generated or collected by a single application, and all of these forms must be connected to extract knowledge.
– Velocity: Velocity refers to the stream of data that arrives continuously from different sources, such as social media sites, so that the repository fills with new data at the same rate. Capturing this stream promptly for further processing is a challenge.
– Veracity: The term "veracity" refers to how far the data can be trusted. Inconsistency and incompleteness create uncertainty in the data supplied. Many forms of big data, such as Twitter postings with hashtags, abbreviations, typos, and colloquial speech, have less controlled quality and accuracy.
– Value: Access to massive data is of little use unless it can be turned into something actionable.
– Variability: Variability is not the same as variety. A coffee shop may offer six different coffee blends (variety), but if the same blend tastes different every day, that is variability. The same is true for data; if the meaning changes frequently, it can significantly affect the homogeneity of the data.
– Visualization: Using charts and graphs to represent vast quantities of data is far more effective than using spreadsheets.
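The veracity point above, that incompleteness creates uncertainty, can be quantified in a simple way. The following is a minimal sketch, assuming records are dictionaries and that a missing value is represented by None or an empty string; the function name completeness and the sample field names are hypothetical and do not come from the chapter.

```python
from typing import Any, Mapping, Sequence


def completeness(records: Sequence[Mapping[str, Any]], fields: Sequence[str]) -> float:
    """Fraction of expected field values that are present and non-empty."""
    if not records or not fields:
        return 0.0
    filled = sum(
        1
        for record in records
        for field in fields
        # Assumption: None and "" both mean "value missing".
        if record.get(field) not in (None, "")
    )
    return filled / (len(records) * len(fields))


if __name__ == "__main__":
    posts = [
        {"user": "a", "text": "great coffee #blend", "lang": "en"},
        {"user": "b", "text": "", "lang": None},
    ]
    score = completeness(posts, ["user", "text", "lang"])
    print(f"completeness: {score:.2f}")
```

A completeness ratio covers only one aspect of veracity; consistency and accuracy need their own checks.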

5 Who’s Generating Big Data?

The capacity to acquire data no longer limits development and creativity. However, the capacity to organize, analyze, summarise, display, and find information from acquired data in a timely and scalable manner is critical.


6 Why Is Big Data Important?

Companies gain a more complete understanding of their business, customers, products, and competitors when big data is gathered, processed, and analyzed properly and efficiently. This results in greater efficiency, more sales, lower costs, better customer service, and better goods and services. For example, sensors embedded in manufactured products provide a stream of telemetry. Retailers frequently know who buys their products; to determine who did not buy and why, which is knowledge they do not have today, businesses can leverage social media and blog data from their e-commerce sites. Extensive historical call center data can be used more rapidly to enhance customer engagement and satisfaction. Social media content can be used to assess consumer sentiment about a company and its customers better and more quickly, and to improve goods, services, and customer interactions [5].

7 Challenges in Big-Data

Big data is enormous in size, and it can be structured or unstructured. This raises many challenges, which are discussed below [6].
– Volume: Thanks to new data sources that are developing, the volume of data, particularly machine-generated data, is expanding, as is the rate at which it expands each year. For example, the world’s data storage capacity was 800,000 petabytes (PB) in 2000 and was anticipated to reach 35 zettabytes by 2020.
– Variety and the use of many data sets: Unstructured data makes up more than 80% of today’s data. Most of this data is too vast to manage effectively.
– Velocity: As organizations realize the benefits of analytics, they face a problem: they want the data sooner, or in other words, they want real-time analytics.
– Veracity, Data Quality, Data Availability
– Data Discovery: Finding high-quality data among the massive amounts of data available on the Internet is a significant problem.
– Relevance and Quality: It is difficult to determine the quality of data sets and their relevance to specific requirements.
– Personally Identifiable Information: A lot of this data is about people. This necessitates, in part, efficient industrial processes. “It partly asks for efficient government monitoring. Partly, perhaps even entirely, it necessitates a severe rethinking of what privacy truly entails.”
– Process Challenges: Finding the appropriate analysis model may take a lot of time and effort; thus, the ability to cycle quickly and ‘fail fast’ through many (perhaps throwaway) models is crucial.
– Management Challenges: Sensitive data, such as personal information, is found in many warehouses. Accessing such data raises legal and ethical problems. As a result, the data must be secured, access restricted, and audited.
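The volume figures quoted in the list above (800,000 PB in 2000 and 35 zettabytes by 2020) are easier to compare once they share a unit. The short calculation below is a sketch under two assumptions: decimal SI prefixes (1 ZB = 1,000,000 PB) and smooth compound growth over the 20 years. The function name growth_summary is hypothetical.

```python
PB_PER_ZB = 1_000_000  # assumption: decimal SI prefixes, 1 ZB = 10**6 PB


def growth_summary(start_pb: float, end_zb: float, years: int):
    """Return (start in ZB, overall growth factor, implied annual growth rate)."""
    start_zb = start_pb / PB_PER_ZB
    factor = end_zb / start_zb
    # Compound annual growth rate implied by the overall factor.
    annual_rate = factor ** (1 / years) - 1
    return start_zb, factor, annual_rate


if __name__ == "__main__":
    start_zb, factor, annual_rate = growth_summary(800_000, 35, 20)
    print(f"start: {start_zb:.1f} ZB")
    print(f"overall growth: {factor:.2f}x")
    print(f"implied annual growth: {annual_rate:.1%}")
```

Under these assumptions the starting point is 0.8 ZB, so the projection corresponds to a growth factor of roughly 44 over two decades, or a little over 20% per year compounded.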


8 Big Data Applications

There is a great deal of data in today’s environment, and big businesses use it to expand their operations [7] in a variety of circumstances, such as those outlined below:
– Customer Spending Habits and Shopping Patterns: Management teams at large retail stores keep data on customer spending habits, purchasing behavior, and customers’ most loved products. The production or stocking rate of a product is set according to how often it is searched for or sold. Banking companies use information about their customers’ purchasing habits to offer customers who want to buy a particular product a discount or cashback when they pay with the bank’s credit or debit card. In this way, they send the appropriate offer to the right individual at the right time [8].
– Recommendation: Large retailers provide custom recommendations based on spending and buying patterns. E-commerce platforms offer product suggestions: they keep track of the products customers are interested in and propose products based on that data [9].
– Smart Traffic System: Data on the state of traffic on various roads is obtained from cameras stationed alongside the road and at the city’s entry and exit points, and from GPS devices installed in cars. This information is analyzed, and the least time-consuming or jam-free routes are recommended. In this way, big data analysis can create an intelligent traffic system in the city. Another advantage is that fuel usage may be lowered [10].
– Auto Driving Car: Thanks to big data analysis, a car can be driven without human intervention. Cameras and sensors installed at various places around the vehicle gather information on the size of neighbouring cars, obstacles, their distance from the vehicle, and other things. Numerous computations are made on these data, including what steering angle to apply, what speed to use, and when to halt. These calculations allow the driving actions to be performed automatically [11].
– Media and Entertainment Sector: Companies that offer media and entertainment services, including Spotify, Amazon Prime, and Netflix, analyze subscriber data. Information about the videos and music users consume and the amount of time they spend on the website is collected and assessed to develop the next business strategy.
– Education Sector: Online education is strongly affected by the use of big data. A company that provides online or offline courses can advertise its course online to someone who is looking for a YouTube tutorial video on the same topic [12].
– IoT: Manufacturing companies install IoT sensors in equipment to collect operational data. By analyzing this data, it is possible to anticipate how long a machine will run without issue before it needs repair, allowing the firm to take action before the equipment develops several problems or fails. As a result, the cost of replacing the entire equipment can be reduced. Big data is also making a significant impact in healthcare [13]. Using a big data platform, patient experiences are collected and used by clinicians to improve treatment. An IoT gadget can


N. Arora et al.

detect a sign of a potentially fatal disease in the human body and help prevent it by enabling treatment in advance. IoT sensors installed near patients and newborn infants continuously monitor health parameters such as heart rate and blood pressure. When any parameter exceeds the safe limit, an alarm is transmitted to a doctor, who can take action remotely.
– Energy Sector: Every 15 min, a smart electric meter reads the power consumed and sends it to a server, where the data is evaluated and the time of day when the city’s power load is lowest can be determined. Using this information, a manufacturing company or a householder may be advised to run heavy machines at night, when the power load is lower, resulting in lower electricity bills.
– Secure Air Traffic System: Sensors are installed at numerous locations on the aircraft (for example, on the propellers). These sensors keep track of variables such as temperature, humidity, and flying speed. Based on analysis of this data, the environmental parameters are set and adjusted during flight. Studying the flight’s machine-generated data also makes it possible to estimate how long a machine will perform flawlessly after being replaced or repaired [14].
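The smart-meter analysis described under the Energy Sector item can be sketched as a simple aggregation. This is only an illustration: the reading format (hour of day, kWh per 15-minute interval) and the sample values are assumptions, not data from the chapter.

```python
from collections import defaultdict

def lowest_load_hour(readings):
    """readings: iterable of (hour_of_day, kwh) pairs, one per 15-minute interval.

    Returns the hour with the lowest average load and all hourly averages.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for hour, kwh in readings:
        totals[hour] += kwh
        counts[hour] += 1
    averages = {hour: totals[hour] / counts[hour] for hour in totals}
    return min(averages, key=averages.get), averages

# Made-up readings for three hours of one day (four 15-minute readings each).
sample = [(2, 0.5), (2, 0.5), (2, 0.25), (2, 0.25),
          (14, 1.0), (14, 1.5), (14, 1.0), (14, 0.5),
          (20, 2.0), (20, 2.0), (20, 1.5), (20, 2.5)]
hour, averages = lowest_load_hour(sample)
print(hour, averages)
```

In a real deployment the same aggregation would run over readings from many meters on a distributed platform; the single-process version only shows the logic.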

9 How Does Big Data Analysis Differ from Business Intelligence Analysis?

9.1 Business Intelligence

Business intelligence (BI) is the analysis of data to improve decision-making and gain a competitive advantage. The term refers to a group of tools that offer quick access to data-driven insights into an organization’s growth and development. Open-source BI tools include BIRT, JasperReports, KNIME, etc.

9.2 Big Data

Big data refers to massive, varied amounts of structured and unstructured data that are generated and transmitted quickly from various sources and that keep growing at a high rate. Big data rests on three fundamental pillars: the volume of data, the velocity (the speed at which data is created), and the variety or scope of the data points. The data may be structured, semi-structured, or unstructured. Tools such as Hadoop, Apache Spark, and Cassandra are available to deal with all of these types of data.


9.3 Differences Between Business Intelligence (BI) and Big Data

– BI aims to help firms make better decisions. It supports the delivery of credible information by extracting data directly from the data source. In contrast, the main aim of big data is to capture, process, and analyze structured and unstructured data to improve customer outcomes.
– Location intelligence and what-if analysis are some applications of BI. Big data, on the other hand, is better described by its characteristics: Variety, Volume, Variability, Veracity, and Velocity.
– Big data solutions can handle both historical data and data generated in real time, whereas business intelligence handles only historical data sets.

10 The Analytical Life Cycle of Big Data

The cycle is iterative, reflecting how a project actually proceeds. Figure 1 shows the different phases involved in the analytical life cycle of big data. A step-by-step approach is needed to organize the actions and procedures involved in collecting, processing, analyzing, and repurposing data so that the specific requirements of big data analysis are met.

10.1 Phase 1: Discovery

– The data science team researches and learns about the issue.
– It creates a sense of context and understanding.
– It researches which data sources necessary for the project will be available.
– The team builds an initial hypothesis, which is tested later with data.

10.2 Phase 2: Data Preparation

– Data must be examined, pre-processed, and conditioned before modeling and analysis.
– Data preparation involves transforming the data and loading it into an analytical sandbox where it can be worked on.
– Data preparation tasks may be repeated many times and in no fixed order.
– Several technologies are used at this stage, including Hadoop, Alpine Miner, OpenRefine, and others.


Fig. 1 Analytical life cycle of big data

10.3 Phase 3: Model Planning

– After executing the model, the team must compare the results with the established success and failure criteria.
– The data science team produces data sets for training, testing, and production during this phase.
– The team builds and executes the models based on the work done during this phase.

10.4 Phase 4: Model Building

– The team creates data sets for testing, training, and production.
– The team also determines whether its present tools are adequate for running the models or whether a more robust environment is necessary.
– Open-source tools used in this phase include R, PL/R, and WEKA.
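Creating separate data sets for training, testing, and production can be sketched as a shuffled split. The 70/20/10 proportions, the fixed seed, and the function name below are assumptions chosen for illustration; the chapter does not prescribe them.

```python
import random

def split_dataset(records, train=0.7, test=0.2, seed=42):
    """Shuffle records reproducibly and split them into three disjoint parts.

    Whatever is left after the train and test shares becomes the third part.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_test = round(len(shuffled) * test)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_test],
            shuffled[n_train + n_test:])

train_set, test_set, production_set = split_dataset(range(100))
print(len(train_set), len(test_set), len(production_set))
```

Keeping the three parts disjoint is the point of the exercise: a model evaluated on records it was trained on would look better than it is.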


10.5 Phase 5: Communicate Results

– After executing the model, the team must assess the findings against the success and failure criteria.
– The team decides how best to inform the various team members and stakeholders of the results and conclusions, taking caveats and assumptions into account.
– The business value should be quantified, and a narrative should be established to summarize and explain the findings to stakeholders.

10.6 Phase 6: Operationalize

– The team conveys the benefits of the project to a broader audience.
– It creates a pilot project to deploy the work in a controlled fashion before extending it to the whole organization.
– With this approach, the team can test the model’s capabilities and constraints in a real-world setting before deploying it fully.
– The team delivers final reports, briefings, and code.
– Tools used in this phase include Octave, WEKA, and SQL.

11 Big Data Analysis Necessitates a Set of Skills

– Problem-solving abilities go a long way in the age of big data. Because so much of it is unstructured, big data is itself regarded as a hard problem. Someone who enjoys solving problems is the best candidate for working with big data; their ingenuity and originality will help them develop a better solution to the issues they discover.
– SQL serves as a foundation in the big data era. SQL is a data-centric language. Even when working with newer big data technologies such as NoSQL, knowing SQL helps a programmer deal with high-dimensional data sets.
– Familiarity with as many big data tools and technologies as possible, including R, SAS, Scala, Hadoop, Linux, MATLAB, SQL, Excel, SPSS, etc., is often preferred. The demand for professionals with strong programming and statistical knowledge has surged.


12 Big Data Domain

Connected things that constantly deliver data to a system generate data that may be structured, semi-structured, or unstructured. The best example is mobile devices: telecom operators receive a massive amount of data from them over each cellular network and analyze it. Bioinformatics, the Internet of Things, cyber-physical systems, and social media are just a few of the fields that use big data to study trends and behavior for their own purposes. Modern search engines, such as Google, are built on big data and use information retrieval techniques and logic to obtain information. Furthermore, one may argue that the World Wide Web is the most important domain of big data.

13 Introduction to Big Data Analytics

Big data analytics has become a first-class citizen of daily life. It is a process of continual discovery that uses practical analytic tools to find correlations, hidden patterns, and other insights in big data, which includes data of any source, structure, and size. Insights can be discovered more quickly and efficiently, leading to immediate business decisions that can decide a winner [15].

The rise of big data, which began in the 1990s, prompted the development of big data analytics. At the advent of the computer age, corporations used enormous spreadsheets to analyze information and look for trends. New data sources boosted the volume of data generated in the late 1990s and early 2000s. With the widespread use of mobile devices and search engines, more data was generated than any organization could handle. Speed was another factor: the faster data was generated, the more had to be processed. Gartner defined this phenomenon as the “3Vs” of data in 2005: volume, velocity, and variety. Anyone able to tame the vast amounts of raw and unstructured data could unlock a treasure trove of previously unseen facts about business operations, consumer behavior, population changes, and natural phenomena.

Conventional data warehouses and relational databases were incapable of completing the task, so innovation was required, and Hadoop came into existence. Yahoo engineers created it in 2006 and released it as an Apache open source project in 2007. Thanks to this distributed processing framework, big data applications could now run on a clustered platform. Distributed processing is the critical distinction between traditional and big data analytics. At first, only big corporations such as Facebook and Google undertook big data analysis. Then, in the 2010s, banks, retailers, healthcare organizations, and manufacturers saw the value of big data analytics as well.

Initially, big organizations with on-premises data stores were best suited to gathering and analyzing large data sets. However, Amazon Web Services (AWS), Microsoft Azure, and many other cloud platform providers now make it easy for any company to use a big data analytics platform. The option to set up Hadoop clusters in the cloud allowed any


company, irrespective of its size, to start and run just what it needed on demand. This provides flexibility in the usage of clusters. A big data analytics environment is a critical component of the adaptability that today’s businesses need to succeed [16].

14 Overview of the Hadoop Ecosystem

The Hadoop ecosystem is a platform or framework for addressing big data problems. It can be thought of as a package containing various services for storing, ingesting, analyzing, and maintaining data. Hadoop is a platform for storing big data in a distributed environment so that it can be analyzed in parallel. Hadoop consists primarily of two parts. The first is the Hadoop distributed file system (HDFS), which allows data in several formats to be stored across a cluster of nodes. The second is Yet Another Resource Negotiator (YARN), which Hadoop uses to manage cluster resources. It allows the concurrent processing of data stored throughout HDFS [17]. Figure 2 shows the Hadoop ecosystem and the various components that make it up.

14.1 HDFS

HDFS provides an abstraction: logically, it is a single unit for storing big data, while, as in virtualization, the actual data is distributed among numerous nodes. HDFS has a master–worker architecture. The primary (master) node is the Name-node, while the worker nodes are the Data-nodes. The Name-node holds metadata about data stored in

Fig. 2 Hadoop ecosystem [18]


Data-nodes, such as which data block is saved on which Data-node and how many replicas of each data block are retained. The Data-nodes are where the actual data is kept.
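The kind of metadata the Name-node keeps can be illustrated with a toy mapping from block identifiers to the Data-nodes that hold their replicas. This is not how HDFS is implemented; the round-robin placement, the block and node names, and the function name are assumptions made for the sketch, and real HDFS uses a more elaborate placement policy.

```python
from itertools import cycle

def place_blocks(block_ids, data_nodes, replication=3):
    """Return {block_id: [data_node, ...]} using simple round-robin placement."""
    if replication > len(data_nodes):
        raise ValueError("replication factor exceeds the number of Data-nodes")
    start_positions = cycle(range(len(data_nodes)))
    metadata = {}
    for block in block_ids:
        start = next(start_positions)
        # Each replica goes to a different node, wrapping around the node list.
        metadata[block] = [data_nodes[(start + i) % len(data_nodes)]
                           for i in range(replication)]
    return metadata

# Hypothetical cluster: three blocks of one file, four Data-nodes, two replicas each.
name_node = place_blocks(["blk_0", "blk_1", "blk_2"],
                         ["dn1", "dn2", "dn3", "dn4"], replication=2)
for block, nodes in name_node.items():
    print(block, "->", nodes)
```

The dictionary plays the role of the Name-node: it answers "where is this block and how many copies exist" without holding any of the data itself.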

14.2 YARN

Yet Another Resource Negotiator (YARN) handles all data processing duties, which consist mainly of allocating resources and scheduling tasks. The Resource Manager and the Node Manager are the two primary components of YARN. The Resource Manager plays the role of a controller node: it accepts processing requests and forwards them to the corresponding Node Managers, which are responsible for the actual processing. A Node Manager is installed on every Data-node and is in charge of completing the tasks on that Data-node.

14.3 MapReduce

A MapReduce job splits the input data into fragments that are processed by the map tasks in parallel. The framework sorts the output of the map tasks before passing it to the reduce tasks. The job’s input and output are stored in HDFS. The framework handles task scheduling, monitoring, and re-execution of failed tasks. The MapReduce (MR) framework and HDFS run on the same nodes; hence the compute and storage nodes are usually the same. This configuration enables the framework to schedule tasks on the nodes where the data already resides, resulting in high aggregate bandwidth throughout the cluster. The MapReduce framework consists of a single master Resource Manager, one worker Node Manager per cluster node, and one MR AppMaster per application. Processing proceeds in four steps: map, shuffle, sort, and reduce.
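The four steps can be sketched in a single process with the classic word-count example. This is a didactic sketch only: it does not use Hadoop’s API, it runs sequentially rather than in parallel, and all function names are illustrative.

```python
from collections import defaultdict

def map_phase(fragment):
    """Map: emit a (word, 1) pair for every word in one input fragment."""
    return [(word.lower(), 1) for word in fragment.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)

def word_count(fragments):
    mapped = [pair for fragment in fragments for pair in map_phase(fragment)]
    grouped = shuffle(mapped)
    # Sort: keys are handed to the reducer in sorted order.
    return [reduce_phase(key, grouped[key]) for key in sorted(grouped)]

result = word_count(["big data big", "data lake"])
print(result)  # [('big', 2), ('data', 2), ('lake', 1)]
```

In a cluster, each fragment would be mapped on the node that stores it, and the shuffle would move the grouped pairs across the network to the reducers.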

14.4 Spark

Spark is a fast, general-purpose cluster computing platform. It can handle a wide range of data types and, more importantly, it is a free and open-source data processing engine. It exposes development APIs that let data professionals run streaming, machine learning, or SQL workloads that require repeated access to data sets. Spark can handle both stream and batch processing.


15 Overview of Big Data Analysis and Its Need

Big data analytics is the efficient processing of large amounts of data using suitable technologies. It is mainly used for decision-making, which requires both individual intellectual capabilities and collective knowledge. Businesses usually store their historical business data in the hope of extracting meaningful results and new insights that help the business grow. As a result, big data analysis needs technical innovation and data science expertise. Models for big data analysis have been investigated and used to design a general conceptual architecture that makes things more transparent. The following are examples of the need for big data analytics:

1. Business decisions: Online retail companies like Amazon make decisions based on past ‘Prime Day’ sales and consider repeating the best-selling items in the next sale.
2. Insight into data and business: A company operating in multiple locations can use its sales data to find out which location had the highest sales in the last financial year.
3. Interpretation of outcomes: Values for the near future can be estimated using pattern-based analysis.
4. Descriptive: Graphical representation of data can show business behavior.
5. Predictive analytics: By applying mathematical and scientific techniques to historical data, future data can be predicted from appropriate variables to a certain confidence level.
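The predictive analytics item can be illustrated with the simplest possible model, a least-squares straight line fitted to historical values. The sales figures are invented, the function name is illustrative, and the sketch leaves out the confidence level mentioned above, which would require an additional estimate of the prediction error.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical yearly sales (in arbitrary units) for four past years.
years = [1, 2, 3, 4]
sales = [10.0, 12.0, 14.0, 16.0]
slope, intercept = fit_line(years, sales)
forecast = slope * 5 + intercept  # extrapolate to year 5
print(slope, intercept, forecast)
```

With these made-up figures the fitted line has slope 2 and intercept 8, so the extrapolated value for year 5 is 18.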

16 Use Cases of Big Data Analytics

As per industry standards, big data is broadly characterized by three Vs, which are as follows:

Volume: The term “volume” refers to the quantity of data, which increases rapidly every day. Humans, technology, and their interactions on social media create enormous amounts of data.
Velocity: Velocity refers to the stream of data that arrives continuously from different social media sites; the repository fills with new data at the same rate. Capturing this stream of data promptly for further processing is a challenge.
Variety: Data comes from a variety of sources. The repository stores this data in different file formats: spreadsheets, text files, e-mails, image files, video files, etc.

Some of the use cases of big data are as follows:

1. Fraud detection in financial organizations: Credit and debit card fraud involving millions of people has recently made headlines. Several consumers discovered fraudulent activity associated with their accounts. With big data and

machine learning, this could have been minimized. Based on machine learning analysis, banks can learn about a customer’s typical activities and transactions. If they notice any suspicious conduct, they can quickly block the customer’s card or account and notify the customer. Banks have begun to use big data to study market and consumer behavior, but more work still needs to be done.
2. Big data in health care: Healthcare businesses are using big data to enhance profitability and save lives. Healthcare firms, hospitals, and researchers collect massive volumes of data. However, none of this information is helpful on its own. Only when the data is analyzed does it become useful for highlighting trends and threats in patterns and for constructing prediction models. This data can also be used for classification purposes, for example, with COVID-19 data as presented in [19, 20].
3. Big data in the telecom sector: Telecom operators use big data analytics to gain a more comprehensive view of their operations and consumers and to accelerate innovation initiatives.
4. Big data in the oil and gas sector: This sector has been using big data to find new ways to innovate for the last few years. Data sensors have long been used in the oil and gas industry to track and monitor the performance of wells, equipment, and operations. Oil and gas corporations have used this information to track well activity, develop Earth models to discover new oil sources, and perform other value-added operations.
5. Log data analytics in business: Many commercial big data applications rely on log data as a foundation. Log management and analysis tools existed long before big data. However, as business activity and transactions rise exponentially, storing, processing, and presenting log data efficiently and cost-effectively can become a significant burden. In this context, big data analytics plays a significant role, because industries have found synergy between log data search and big data analytics.
6. Big data analytics in recruitment: In the rush to place applicants as rapidly as possible in a competitive climate, recruiters frequently feel that they lack the proper tools. Recruiters nowadays use a new technique that mines an internal database of candidates’ overall profiles, such as educational background, certifications completed, job title applied for, skill sets, and years of experience. The mined result is then matched and compared with the performance and salaries of previously recruited candidates and with overall past recruitment experience. The traditional approach of matching keywords against the job description is no longer efficient in today’s scenario, where big data analytics has significantly changed the paradigm in different industry verticals. Figure 3 shows the steps involved in the recruitment process using big data analytics.
7. Big data analytics in natural language processing (NLP): In NLP, the computer processes language data before feeding it to the model for training [21]. Various linguistic features are considered during processing. NLP has many important use cases in different industry verticals. Sentiment analysis of customers is one of the essential applications of NLP and is used by several companies. They analyze customers’ sentiment by capturing continuous streaming data, where customers’ feedback on any particular product is


Fig. 3 Big data analytics in recruitment

positive, negative, or neutral. The company subsequently analyzes these textual sentiment documents to improve its product further. One of the essential use cases in the banking sector is a chatbot that takes over much of a customer service officer’s job. The chatbot processes all textual data in real time, matches it against a huge existing NLP database (corpus), and then tries to respond to the user’s query. Another critical use case of NLP is the machine translation (MT) system. Machine translation translates a source language into a target language: the source is one language, e.g., English, and the target is another, e.g., Hindi. A system that translates from one language to another is called a bilingual MT system. Neural-network-based translation is known as neural machine translation (NMT). NMT uses the latest NLP models as its language model. Because NMT relies on a deep neural network, a massive parallel corpus is needed to train the model. Model performance can be measured with automatic metrics such as BLEU, METEOR, etc. Researchers have been studying the performance evaluation of MT/NMT systems with various automatic metrics and comparing the outcomes computed by the different metrics.
8. Blockchains are not efficient for storing large files: Large files are stored inefficiently on blockchains [12]. Storing vast volumes of data on a public blockchain is expensive and time-consuming. Storing data on-chain is not a very scalable or efficient option for anything other than primary ledger data and associated hashes. Costs may add up to thousands of dollars per terabyte stored on the chain, plus further costs each time the data is accessed. On-chain storage also consumes time, on the order of minutes per megabyte, which SLAs cannot afford. As a result, blockchains are almost entirely reliant on off-chain storage.
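The division of labour described in the last item, bulky data off-chain and only ledger data and hashes on-chain, can be sketched as follows. The in-memory dictionary and list are hypothetical stand-ins for an external storage service and a blockchain ledger; only the SHA-256 hashing from Python’s standard library is real.

```python
import hashlib

off_chain_store = {}  # hypothetical stand-in for external (off-chain) storage
ledger = []           # hypothetical stand-in for the on-chain ledger

def store_off_chain(payload: bytes) -> str:
    """Keep the payload off-chain and record only its hash and size on-chain."""
    digest = hashlib.sha256(payload).hexdigest()
    off_chain_store[digest] = payload
    ledger.append({"hash": digest, "size": len(payload)})
    return digest

def verify(digest: str) -> bool:
    """Check that the off-chain payload still matches the recorded hash."""
    payload = off_chain_store.get(digest)
    return payload is not None and hashlib.sha256(payload).hexdigest() == digest

ref = store_off_chain(b"large sensor archive ...")
print(ledger[0]["size"], verify(ref))
```

The ledger entry stays a few dozen bytes regardless of payload size, while any later change to the off-chain copy is detectable because its hash no longer matches.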


17 Challenges in Analyzing Big Data

The fundamental issue is that most firms cannot keep up with the available data and data sources. Big data has created several challenges in collecting and storing large volumes of accurate streaming data for correct analysis. Much of the big data technology in use is outdated, and sometimes even the tools cannot provide a satisfactory solution. Hence, organizations need to upgrade or replace their existing systems. Some of the significant challenges in analyzing big data are as follows:

– Lack of data science skills: There is a substantial shortage of relevant skills in the data science community, and narrowing this gap is a considerable challenge. Educating people to use big data analytics is also an issue. In addition, many other technical issues need addressing, so it will take a long time to close this gap.
– Lack of proper data visualization: Interesting and relevant data is often disregarded when it is mixed with ordinary or irrelevant findings. In other cases, team members and even seasoned data scientists fail to present data in a meaningful and visually appealing manner due to a lack of skill. Consequently, they may ignore or miss the most relevant and meaningful data.
– Lack of proper data transformation: Extracting insight or value from data demands proper transformation. Since the data is very large and its formats are not fixed, proper and correct transformation is a big challenge for data engineers, who are responsible for converting the data into an analytics-ready form that the analytics team can use. During this process, data engineers often have only rudimentary, code-heavy technologies to rely on. Hence, transforming data as required sometimes becomes a significant challenge.
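The data transformation challenge can be made concrete with a small sketch that converts records arriving in inconsistent formats into one analytics-ready schema. The field names, the two accepted date formats, and the sample records are all assumptions for illustration.

```python
from datetime import datetime

def first_present(record, *keys, default=None):
    """Return the value of the first key that is present and non-empty."""
    for key in keys:
        if record.get(key) not in (None, ""):
            return record[key]
    return default

def parse_date(value):
    """Try a few known date formats; return an ISO date string or None."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except (TypeError, ValueError):
            continue
    return None

def to_analytics_row(raw):
    """Normalize one raw record (hypothetical field names) into a fixed schema."""
    return {
        "customer_id": str(first_present(raw, "customer_id", "custId",
                                         default="unknown")).strip(),
        "amount": float(first_present(raw, "amount", "amt", default=0)),
        "date": parse_date(first_present(raw, "date", "timestamp")),
    }

raw_records = [
    {"customer_id": 7, "amount": 5, "date": "2023-01-15"},
    {"custId": " 42 ", "amt": "19.90", "timestamp": "03/02/2023"},
    {"amt": ""},
]
rows = [to_analytics_row(record) for record in raw_records]
print(rows)
```

Even this tiny example needs a rule for every field and every format variant, which hints at why hand-written transformation code becomes a burden at big data scale.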

18 Big Data Quality Dimensions

The study of data quality in big data systems is still in its infancy. Most research on big data quality acknowledges the relevance of standard dimensions in measuring it. Some critical quality dimensions of big data are accessibility, confidentiality, redundancy, volume, etc. Table 2 lists the most critical quality dimensions of big data and their purpose.
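As a rough illustration of how one such dimension might be quantified, Table 2 describes the Volume dimension as the proportion of values present in the examined data object. The sketch below takes one possible reading of that description, the share of non-missing field values in a set of records; the interpretation, the field names, and the sample rows are assumptions.

```python
def proportion_present(records, fields):
    """Share of (record, field) cells that hold a non-missing value."""
    total = len(records) * len(fields)
    if total == 0:
        return 0.0
    present = sum(1 for record in records for field in fields
                  if record.get(field) not in (None, ""))
    return present / total

# Hypothetical records with some missing city values.
rows = [
    {"id": 1, "city": "Delhi"},
    {"id": 2, "city": None},
    {"id": 3},
    {"id": 4, "city": "Pune"},
]
score = proportion_present(rows, ["id", "city"])
print(score)  # 0.75
```

Here six of the eight expected values are present, giving a score of 0.75; a threshold on such a score could be used to accept or reject a data set for analysis.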

19 Conclusion

Big data analytics plays a vital role in today’s world. All businesses hold vast amounts of data, which can be used to support their future growth with the help of big data analytics and its tools. Big data analytics helps a company predict future trends from past data using the Hadoop ecosystem, which eventually


Table 2 Critical key quality dimensions of Big data and their purpose

Big data key quality dimension | Purpose
Accessibility | Accessibility and availability are the ability of a person to get data from his physical status and available technology
Confidentiality | This quality factor determines if the correct data is in the hands of the correct people. Is the information safe?
Pedigree | This dimension aids in determining the data’s source, allowing any inconsistencies to be rectified in the source rather than elsewhere
Readability | This dimension, also known as clarity, simplicity, ease of understanding, interpretability, and comprehensibility, relates to the consumers’ ability to grasp/understand data
Redundancy | Redundancy, minimality, compactness, and conciseness refer to the capacity to portray a reality of interest with the least amount of information resources
Volume | The proportion of values present in the examined Data Object concerning the source from which it is derived is provided by this quality dimension

enhances the organization’s profitability. The discussion presented in this chapter gives a clear insight into big data analysis and the critical differences between big data analysis and business intelligence analysis. It explains the analytical life cycle of big data. The skills required for big data analysis are highlighted, and the big data domain is described. We have also discussed how big data analytics can be exploited for decision-making in different areas, such as recruitment/HR, the oil and gas sector, healthcare, sentiment analysis, and so on. Some significant challenges in big data analytics, such as the lack of proper skills and issues during the data transformation process, as well as big data quality dimensions, are also discussed.

References

1. Lazer, D., Radford, J.: Data ex machina: introduction to big data. Ann. Rev. Sociol. 43, 19–39 (2017)
2. Kitchin, R., Lauriault, T.P.: Small data in the era of big data. GeoJournal 80(4), 463–475 (2015)
3. Fernández, A., del Río, S., Chawla, N.V., Herrera, F.: An insight into imbalanced big data classification: outcomes and challenges. Complex Intell. Syst. 3(2), 105–120 (2017)
4. Géczy, P.: Big data characteristics. Macro Theme Rev. 3(6), 94–104 (2014)
5. Ansari, S., Mohanlal, R., Poncela, J., Ansari, A., Mohanlal, K.: Importance of big data. In: Handbook of Research on Trends and Future Directions in Big Data and Web Intelligence, pp. 1–19. IGI Global (2015)
6. Fan, J., Han, F., Liu, H.: Challenges of big data analysis. Natl. Sci. Rev. 1(2), 293–314 (2014)
7. Al Nuaimi, E., Al Neyadi, H., Mohamed, N., Al-Jaroodi, J.: Applications of big data to smart cities. J. Internet Serv. Appl. 6(1), 1–15 (2015)
8. Aloysius, J.A., Hoehle, H., Goodarzi, S., Venkatesh, V.: Big data initiatives in retail environments: linking service process perceptions to shopping outcomes. Ann. Oper. Res. 270(1), 25–51 (2018)
9. Verma, J.P., Patel, B., Patel, A.: Big data analysis: recommendation system with hadoop framework. In: 2015 IEEE International Conference on Computational Intelligence & Communication Technology, pp. 92–97. IEEE (2015)
10. Rizwan, P., Suresh, K., Babu, M.R.: Real-time smart traffic management system for smart cities by using Internet of things and big data. In: 2016 International Conference on Emerging Technological Trends (ICETT), pp. 1–7. IEEE (2016)
11. Fathi, F., Abghour, N., Ouzzif, M.: From big data to better behavior in self-driving cars. In: Proceedings of the 2018 2nd International Conference on Cloud and Big Data Computing, pp. 42–46 (2018)
12. Daniel, B.K.: Big data in higher education: the big picture. In: Big Data and Learning Analytics in Higher Education, pp. 19–28. Springer (2017)
13. Mahapatra, S., Singh, A.: Application of IoT-based smart devices in health care using fog computing. In: Fog Data Analytics for IoT Applications, pp. 263–278. Springer (2020)
14. Singh, A., Mahapatra, S.: Network-based applications of multimedia big data computing in iot environment. In: Multimedia Big Data Computing for IoT Applications, pp. 435–452. Springer (2020)
15. Kannan, S., Karuppusamy, S., Nedunchezhian, A., Venkateshan, P., Wang, P., Bojja, N., Kejariwal, A.: Chapter 3 - Big data analytics for social media. In: Buyya, R., Calheiros, R.N., Dastjerdi, A.V. (eds.) Big Data, pp. 63–94. Morgan Kaufmann (2016)
16. Tsai, C.W., Lai, C.F., Chao, H.C., Vasilakos, A.V.: Big data analytics: a survey. J. Big Data 2(1), 1–32 (2015)
17. Landset, S., Khoshgoftaar, T.M., Richter, A.N., Hasanin, T.: A survey of open source tools for machine learning with big data in the hadoop ecosystem. J. Big Data 2(1), 1–36 (2015)
18. Monteith, J.Y., McGregor, J.D., Ingram, J.E.: Hadoop and its evolving ecosystem. In: 5th International Workshop on Software Ecosystems (IWSECO 2013), vol. 50, p. 74. Citeseer (2013)
19. Goyal, L., Arora, N.: Deep transfer learning approach for detection of covid-19 from chest x-ray images. Int. J. Comput. Appl. 975, 8887 (2020)
20. Kakde, A., Sharma, D., Arora, N.: Optimal classification of covid-19: a transfer learning approach. Int. J. Comput. Appl. 176(20), 25–31 (2020)
21. Datta, G., Joshi, N., Gupta, K.: Empirical analysis of performance of MT systems and its metrics for English to Bengali: a black box-based approach. In: Intelligent Systems, Technologies and Applications, pp. 357–371. Springer (2021)
22. Sharma, A., Tiwari, S., Arora, N., Sharma, S.C.: Introduction to blockchain. In: Blockchain Applications in IoT Ecosystem, pp. 1–14. Springer (2021)

DCD_PREDICT: Using Big Data on Prediction for Chest Diseases by Applying Machine Learning Algorithms

Umesh Kulkarni, Sushopti Gawade, Hemant Palivela, and Vikrant Agaskar

Abstract Technology based on machine learning algorithms is frequently used in the medical domain to predict disorders. By providing some reference guidelines, it assists in the real-world diagnosis of diseases. The DCD-PREDICT system employs machine learning to make predictive diagnoses of diseases of the chest, including lung cancer, asthma, COPD, pneumonia, and tuberculosis. A questionnaire will be provided to each participant (self-administered and physician-administered). Sensitivity, specificity, and positive and negative predictive values will be computed for each question, and the combined patient scores will be contrasted with those of controls. It will be determined how closely the physician- and self-administered questionnaires agree. This enables medical professionals to perform better differential diagnosis earlier, lowering errors and delivering timely treatment. Heart disease is one of the main causes of death. Because practitioners often lack the necessary knowledge, expertise, or experience regarding the signs of heart failure, the disease is challenging to diagnose. Therefore, computer-based prediction of cardiac illness may be crucial both as an early diagnosis that allows appropriate action to be taken and as a perspective on recovery. By choosing the right data mining classification algorithm, the early stages of the disease and its recurrence can be accurately predicted. The aim of this study was to compare three of the most common classification methods, Support Vector Machines (SVM), K-Nearest Neighbors (KNN), and Artificial Neural Networks (ANN), for heart disease prediction using the standard Cleveland cardiology data set.

U. Kulkarni (B)
Vidyalankar Institute of Technology, Wadala, Mumbai, Maharashtra, India

S. Gawade
Pillai College of Engineering, Panvel, India

H. Palivela
Manager-AI, Accenture Solutions, Mumbai, Maharashtra, India

V. Agaskar
Vidyavardhani College of Engineering and Technology, Vasai Road, Vasai-Virar, Maharashtra, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Rishiwal et al. (eds.), Towards the Integration of IoT, Cloud and Big Data, Studies in Big Data 137, https://doi.org/10.1007/978-981-99-6034-7_2


U. Kulkarni et al.

Keywords Prediction · Classification (SVM · KNN) · Machine learning · Artificial neural network · Heart diseases

1 Introduction

1.1 Introduction

Numerous disorders of the chest affect people. Diseases such as asthma, COPD, pneumonia, and tuberculosis have symptoms that reveal their presence. These symptoms, which can occur in a number of settings while people go about their regular lives, include shortness of breath, chest symptoms, and coughs from the throat and chest, among others. To identify which chest ailment a person is experiencing, we plan to use these symptoms and how they present in various everyday contexts, such as running or waking up. We do this with a symptom-based questionnaire. The purpose is to help with the initial diagnosis of chest problems and to help distinguish between the various diseases. Our methodology uses symptom-based questionnaires and weighted scores for their questions. The project is designed to fit seamlessly into the regular routine of any nearby doctor’s office, nursing home, or hospital, and to program the computer to recognize and predict the illness the patient is suffering from. Training is carried out using example datasets that include questionnaire-style symptoms. Test datasets are available from the UCI repository, the California Health and Human Services (CHHS) data set, and the National Institute of Tuberculosis and Respiratory Diseases.

1.2 Background

Heart disease has a high worldwide mortality rate. Predicting and diagnosing heart disease has become a difficult task for doctors and hospitals both in India and abroad. The Heart Disease Prediction System is a system that aids in the prediction of heart disease, specifically cardiovascular conditions such as myocardial infarction. In this field, data mining and machine learning algorithms are critical. Researchers are accelerating their work to develop a graphical user interface and machine learning algorithms that can assist doctors in making decisions regarding the prediction and diagnosis of heart disease. This project’s main output is a prediction of a patient’s heart disease using machine learning algorithms. A comparative study is carried out, with performance measured for each machine learning method.

DCD_PREDICT: Using Big Data on Prediction for Chest Diseases …


1.3 Objective

The primary aim of this project is to use machine learning algorithms to predict the presence of heart disease in a person. Using existing data, analysis is done to determine which characteristics in various types of individuals indicate vulnerability to heart disease. ML algorithms are applied to the data to calculate the probability of a person having heart disease in the future. The data is centered on various functional parameters of the heart. Our project focuses on reducing effort and time and increasing efficiency and accuracy in prediction.

2 Literature Survey

Machine learning is a fast-growing field, and we aim to use its potential to create this artificial intelligence system. Having wide application, the system is intended for use by doctors and patients once all its elements are implemented. Doctors typically reach a diagnosis through a large number of tests, which take a long time to process, and the conclusion can suffer from a lack of expertise or experience [1]. It is difficult to extract important data in the form of knowledge; hence it is crucial to use techniques such as data mining and machine learning. Extracting important information from medical data repositories relies on methods such as classification, clustering, regression, and prediction [2]. The primary focus of the paper is to examine data mining classification techniques for predicting heart disease in its early stages. Likewise, with computer-based prediction it becomes easier to predict heart disease at an early stage [3]. KNN (K-Nearest Neighbors), ANN (Artificial Neural Network) and SVM (Support Vector Machine) are some of the techniques typically involved, and a comparative study for our proposed project and for prediction is carried out using the Cleveland heart disease dataset [4].

2.1 Summary

Heart disorders can be detected early, and treatment options exist. Using the method described above, we may determine whether a patient has heart disease based on their symptoms. In this instance, SVM and random forest classifiers provide the most accurate predictions. Because abundant data is lacking, we cannot predict the different types of heart disorders accurately, but we can identify heart disease with a respectable accuracy of roughly 80 to 85%. When sufficient data is available, it will be possible to design more accurate methods for disease diagnosis.

Three data mining modeling techniques are used to construct a model of the prediction system for chest diseases. The process retrieves hidden information from a historical record of chest diseases. The models are built and accessed using the DMX query language and functions. A test dataset is used to train and validate the models. Methods such as the Lift Chart and Classification Matrix are used to measure how well the models work. All three models are capable of extracting patterns in response to the predictable state. Neural Networks and Decision Trees appear to be the best models for predicting individuals with chest disease. The goals are evaluated against the trained models. Each of the three models has its own strengths in terms of ease of model interpretation, access to detailed information, and accuracy in answering complex queries. This system can be improved and extended further. It may also incorporate additional data mining techniques, such as Association Rules and Time Series. Continuous data can be used instead of only categorical data. Another direction is to mine the vast amount of unstructured data in healthcare databases using text mining.

3 System Design

3.1 Existing System

A large number of people suffer from chest-related diseases, and many die from them. This is often because the conditions are diagnosed long after they begin, when they have become difficult to treat. In addition, they are often mistaken for one another: a patient with asthma may be told he has COPD and vice versa. This has adverse effects because the wrong treatment is given to the patient. There is therefore a need for an easy-to-use system to aid doctors in preliminary decision making, and a need to empower the patient with a tool that helps him understand his condition better and take appropriate measures by talking to the right doctor.

Existing work focuses mainly on Knowledge Discovery in Databases (KDD), the primary proposal by which mashup candidates are identified by querying a repository of open services. In this methodology, software is developed in a personalized way and can be used to produce new software based on service integration methods. KDS defines service integration qualification by discovering the different phases of web service specifications. The process intersects the fields of data mashup and service mashup. This idea of obtaining information from web service offerings is comparable to the well-established KDD approaches. The representations of data integration and service mashup are discussed in this work, as are cutting-edge techniques for the fundamental KDS domains of comparison processing, grouping, filtering, etc.


3.2 Identification of Common Risks

Risk factors for heart disease include:

1. High blood pressure.
2. Abnormal blood lipids.
3. Use of tobacco.
4. Obesity.
5. Physical inactivity.
6. Diabetes.
7. Age.
8. Gender.
9. Family history.

Data mining is one method of automatically extracting useful knowledge from huge data sets [3].

3.3 Types of Heart Diseases

Heart diseases identified are [2]:

1. Coronary heart disease.
2. Cardiomyopathy.
3. Cardiovascular disease.
4. Ischemic heart disease.
5. Heart failure.
6. Hypertensive heart disease.
7. Inflammatory heart disease.
8. Valvular heart disease.

One of the simplest ways to empower patients as well as doctors for early diagnosis is therefore a simple symptom-based questionnaire. The questionnaire is a simple tool that lists the different symptoms faced by the patient, such as chest congestion, wheezing, cough from the throat, cough from the chest, and shortness of breath. By comparing these symptoms across a wide variety of everyday scenarios in the lives of patients with those of people who have no chest condition and no related symptoms, we can estimate the percentage chance of a particular chest disease and indicate which of a set of chest diseases the patient might be suffering from. To build such a scalable tool, we need indicators that have been well researched by medical researchers. There are a number of reputable published questionnaires for the identification of asthma, the diagnosis of COPD, and so on. We aim to combine these questionnaires into a single tool and adjust the weights assigned to the questions by training the machine with both patient and control data. The questions take either yes or no inputs or a spectrum of inputs from 1 to 4 that indicates the extent of the symptom. When the patient enters the answers, each question carries an initial weight that is used to calculate the percentage chance of the disease. This weight changes as we train the machine with more patient and control data for both Western and Indian conditions. As we get more training data, the diagnosis of the system becomes more and more precise. To test the working of the system, the UCI and CHHS datasets will be used extensively.
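The weighted scoring described above can be illustrated with a minimal sketch. The question texts, the weights, and the rule that maps a graded answer of 1 to 4 onto the range 0 to 1 are all assumptions made for illustration; the chapter does not specify them.

```python
from dataclasses import dataclass


@dataclass
class Question:
    text: str
    weight: float         # hypothetical initial weight, later adjusted by training
    graded: bool = False  # True: answer is 1..4; False: answer is yes/no (1/0)


def answer_fraction(question, answer):
    """Map an answer onto [0, 1] so both question types are comparable."""
    if question.graded:
        if not 1 <= answer <= 4:
            raise ValueError("graded answers must be between 1 and 4")
        return (answer - 1) / 3  # assumption: 1 means the symptom is absent
    if answer not in (0, 1):
        raise ValueError("yes/no answers must be 0 or 1")
    return float(answer)


def disease_percentage(questions, answers):
    """Weighted score as a percentage of the maximum attainable score."""
    total_weight = sum(q.weight for q in questions)
    score = sum(q.weight * answer_fraction(q, a) for q, a in zip(questions, answers))
    return 100.0 * score / total_weight


# Hypothetical asthma questions and weights, for illustration only.
asthma_questions = [
    Question("Do you wheeze when you wake up?", weight=3.0),
    Question("How severe is your shortness of breath when running?", weight=2.0, graded=True),
    Question("Do you cough from the chest at night?", weight=1.0),
]
print(disease_percentage(asthma_questions, [1, 4, 0]))
```

Normalizing by the total weight keeps the result between 0 and 100 even after training changes the individual weights.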

3.4 Problem Statement

Current systems rely on a large amount of medical data taken from tests that determine the nature of the chest disease. These tests are costly, do not scale, and require advanced medical professionals. To overcome the problems of existing systems, users of the proposed system do not need to search for data in various repositories with special features. Users only need to give the information that the system asks for. Users can simply type a combination of queries, and the prediction is made based on analysis of the user’s behavior. Over the years, medical researchers have compiled this medical data into symptom-based questionnaires, which are used to identify these conditions.

3.5 Scope

The objective of the project is to identify the primary symptom of chest diseases, which is the distinguishing factor among them. Our project uses symptom-based forms and weighted scores for their entries. The project is intended to be incorporated into the everyday activities of any nearby doctor, nursing home, or medical clinic.

3.6 Proposed System

Currently, systems rely on a large amount of medical data taken from tests that determine the nature of the chest disease. These tests are expensive, do not scale, and require advanced medical professionals. The proposed system aims to overcome these problems of the existing system. In the proposed system, such data is stored in various repositories with special features. The user needs to provide only the information that the system asks for. Users can simply type a combination of queries, and the prediction is made based on analysis of the user’s behavior. Over time, medical researchers have synthesized this medical data into symptom-based questionnaires that people can use to detect these diseases. However, when used in small clinical studies with sparse patient and control data, these questionnaires have drawbacks. To validate and deploy them for the general public, a machine learning system that makes use of a large amount of patient and control data is needed. To identify reliably and quickly which chest condition the patient has, we intend to combine a number of these symptom-based questionnaires with information from actual case studies. Two categories of data are required: patient data (patients with chest ailments and their symptoms) and control data from healthy groups without chest issues. Combining these datasets yields weighted scores for each question on the form, which allow us to recognize which kind of chest disease the patient has. To train the machine, our new system intends to use supervised machine learning algorithms and built-in Python libraries.

1. Questionnaire Generation and Machine Training
At the beginning, we generate a global questionnaire based on the different questionnaires from medical researchers. These questions have standard weightage scores assigned at the beginning. Once the questionnaire has been established with its standardized scores, the machine is trained using TensorFlow on the patient data and control data associated with these symptoms, which changes the weightage of the scores.

2. Patient Input
The patient answers the questions using a simple yes or no, or a multiple-choice value from 1 to 4 for the extent of the symptom, and also chooses between symptoms occurring in daily situations.

3. Disease Probability Calculation
From the answers entered, the percentage chance of the illness is calculated and shown to the user.

4. Graph Generation
Based on the inputs and the probability, the system also generates comparison graphs with respect to other diseases and other patients.
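To make the idea of adjusting question weights from patient and control data concrete, the sketch below fits one weight per question with logistic regression trained by plain gradient descent. This is a standard-library stand-in chosen for illustration, not the TensorFlow training the chapter refers to, and the answer rows are made up: question 0 is answered "yes" more often by patients, while question 1 is answered the same way by both groups.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(weights, bias, answers):
    """Estimated probability that a respondent belongs to the patient group."""
    return sigmoid(bias + sum(w * a for w, a in zip(weights, answers)))


def train_question_weights(rows, labels, epochs=2000, learning_rate=0.5):
    """Fit one weight per question plus a bias by batch gradient descent."""
    n_questions = len(rows[0])
    weights = [0.0] * n_questions
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_questions
        grad_b = 0.0
        for answers, label in zip(rows, labels):
            error = predict_probability(weights, bias, answers) - label
            for j, a in enumerate(answers):
                grad_w[j] += error * a
            grad_b += error
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias


# Hypothetical yes/no answers (1 = yes) to two questions.
patient_rows = [[1, 1], [1, 0], [1, 1], [1, 0], [0, 1], [0, 0]]
control_rows = [[0, 1], [0, 0], [0, 1], [0, 0], [1, 1], [1, 0]]
rows = patient_rows + control_rows
labels = [1] * len(patient_rows) + [0] * len(control_rows)

weights, bias = train_question_weights(rows, labels)
print(weights, bias)
```

On this toy data the learned weight for question 0 is clearly positive and the weight for question 1 stays near zero, which is the behavior one would want from a question that does not separate patients from controls.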

4 Methodology

4.1 Supervised Learning

The main classification methods fall into two groups:

(i) Supervised Learning Model.
(ii) Unsupervised Learning Model.

Here we focus on the supervised methodology, mainly on the following models:

(i) Support Vector Machine (SVM)
(ii) K-Nearest Neighbors (KNN)
(iii) Artificial Neural Network (ANN).

(a) Support Vector Machine (SVM) [16]
The Support Vector Machine (SVM) is a supervised learning model described in terms of finite-dimensional vector spaces, where each dimension represents a specific property of an object. SVM has been shown to work well for high-dimensional problems. Owing to its computational efficiency on large datasets, SVM is often used in document classification, sentiment analysis, and prediction-based tasks [16].

(b) K-Nearest Neighbors (KNN) [16]
K-Nearest Neighbors (KNN), another supervised learning model, classifies test data directly from the training samples. In KNN, the majority vote of an object’s nearest neighbors determines its class. Distance metrics, which can be as simple as Euclidean distance, are used to find those neighbors and so predict the class of a new sample. In the working steps of KNN, k (the number of nearest neighbors) is chosen first. The test data is then given a class label based on the result of the majority vote [16].

(c) Artificial Neural Network (ANN)
The Artificial Neural Network (ANN) is a supervised learning technique with three layers: input, hidden, and output. The connections between the input units, the hidden units, and the output units are determined by the relevance of the weight assigned to each input unit. In general, importance increases with increasing weight. An ANN can use both linear and sigmoid transfer (activation) functions. ANNs can be trained to handle large volumes of data with few inputs. The most popular learning algorithm for multi-layer feed-forward ANNs is backpropagation. For an ANN, the data records should be divided into three sub-datasets for training, validation, and testing.
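The KNN steps described above (choose k, measure Euclidean distance, take a majority vote) can be sketched in a few lines. The two features (age and maximum heart rate) and the training points are hypothetical values chosen for illustration, not records from any dataset used in this chapter.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical (age, maximum heart rate) samples.
train_points = [(45, 170), (50, 165), (48, 172), (62, 120), (65, 115), (60, 125)]
train_labels = ["no disease"] * 3 + ["disease"] * 3

print(knn_predict(train_points, train_labels, (63, 118)))
print(knn_predict(train_points, train_labels, (47, 168)))
```

In practice the features would usually be rescaled first, since Euclidean distance is dominated by whichever attribute has the largest numeric range.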

4.2 Symptom-Based Questionnaire

Symptom-based questionnaires are required for the following chest-related diseases:

• Asthma.
• COPD.
• Pneumonia.
• Tuberculosis.


4.3 Dataset Training and Testing

• The required datasets can be obtained from the UCI database, the CHHS database, and the National Institute of Tuberculosis and Respiratory Diseases (India).
• An ML training framework such as TensorFlow may be used to train the system on the selected dataset.
• A cloud ML service such as Azure ML or Amazon ML may be used to verify and double-check the training.

Working of the system:

• The user chooses one of the diseases.
• Data is collected based on the survey.
• From the input, a percentage chance of the illness occurring is calculated.
• Graphs are generated indicating the relationship with other diseases.

5 Process and Analysis

5.1 General Process

The Agile process model was employed. Agile marketing is an approach modeled on the practices of software developers that aims to increase speed and transparency in marketing. Agile is typically a time-boxed, iterative approach to software delivery that builds software incrementally from the start of the project rather than delivering it all at once at the end. Agile methodologies usually work by breaking projects down into small pieces of user functionality known as user stories, prioritizing them, and then delivering them continuously in short iterations of about two weeks.

(i) Probability Generation: Based on the input given by the user, this block estimates the probability of chest disease and the category to which it belongs, as shown in Fig. 1.
(ii) Graph Calculation: Depending on the category, this block is expected to produce a definite path by which the method of treatment can be worked out, as shown in Fig. 1.


Fig. 1 General Process Diagram [self-prepared as per required for project]

5.2 Use Case Diagram

The general process is as follows:

(i) User Choice: The user’s choice is obtained, from which the class of chest disease to be analyzed is identified, as shown in Fig. 2.
(ii) User Input Data: The user provides data that is used for further calculation, as shown at data level 0 in Fig. 3.
(iii) User Input Data: The user provides data that is used for further calculation, as shown at data level 1 in Fig. 4, where a processing and training phase is carried out that in turn helps prepare a probability output for the graph calculation model.

5.3 Data Flow Diagram

See Figs. 3 and 4.

5.4 System Flow

The working of the flow is shown in Fig. 5.

1. Start.
2. Collect general information of the patient.


Fig. 2 Use Case Diagram [self-prepared as per required for project]

Fig. 3 Level 0

3. Enter the type of disease the patient is suffering from, such as:
   • Heart Disease.
   • Tuberculosis.
   • Asthma.
4. Draw an expert conclusion from the information collected through the questionnaire.


Fig. 4 Level 1 [self-prepared as per required for project]

5. The expert conclusion is matched against the information in the data set to determine the likelihood that a particular condition exists.
6. If a decision cannot be made from the patient data and the data set, in other words if the data does not match, the patient is re-examined with more questionnaires.
7. If a decision is made, that is, if a match is found, the probability of the occurrence of a particular disease is confirmed.

6 Implementation and Results

6.1 Details of Algorithms

The project is designed to obtain the highest accuracy. The two methods chosen for the project, Support Vector Machine and Random Forest, have the highest accuracy of the numerous prediction and classification algorithms available.


Fig. 5 System flow [self-prepared as per required for project]

Support Vector Machine

Support Vector Machines work by representing the data points geometrically in space and constructing a geometrical shape that separates the data points into different groups for classification. SVM offers good performance without the need for extensive optimization. It is one of the earliest and most well-known machine learning methods for classification. It is a supervised machine learning technique that is frequently used for regression as well as classification.
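For a linear SVM, the separating shape is a hyperplane, and classification amounts to checking on which side of it a point falls. The sketch below assumes a hand-picked hyperplane (it is not learned from data) and shows how the decision value, the predicted class, and the distance of each point to the hyperplane are computed; the points closest to the hyperplane play the role of support vectors.

```python
import math


def decision_value(weights, bias, point):
    """Signed score w.x + b; its sign gives the side of the hyperplane."""
    return sum(w * x for w, x in zip(weights, point)) + bias


def classify(weights, bias, point):
    return 1 if decision_value(weights, bias, point) >= 0 else -1


def distance_to_hyperplane(weights, bias, point):
    norm = math.sqrt(sum(w * w for w in weights))
    return abs(decision_value(weights, bias, point)) / norm


# Hypothetical, hand-picked hyperplane x1 + x2 - 5 = 0 (not learned from data).
weights, bias = [1.0, 1.0], -5.0
points = [(1.0, 1.0), (2.0, 2.0), (4.0, 4.0), (3.0, 3.5)]

predicted = [classify(weights, bias, p) for p in points]
closest = min(points, key=lambda p: distance_to_hyperplane(weights, bias, p))
print(predicted, closest)
```

Training an SVM means choosing the weights and bias so that this closest distance (the margin) is as large as possible.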


Fig. 6 Support vector machine [free image from google pages]

The foundation of the SVM method is finding a hyperplane, a well-defined geometrical object that separates the data points in space. An example of an SVM is shown in Fig. 6. The data points that establish the location of the hyperplane are known as support vectors; these are the data points closest to the hyperplane in space. The support vectors can therefore be thought of as the most important components of a support vector machine.

Working of Random Forest

• Each tree is trained on a distinct subset of the training data (around 2/3), sampled with replacement.
• Error and variable importance are estimated using the remaining training data (out-of-bag, OOB).
• For classification, the class is determined by the number of votes from the trees; for regression, the average of the results is used.

Advantages of Random Forests

• Tree pruning is not necessary.
• Accuracy and variable importance are generated automatically.
• Overfitting is not a major concern.
• Not highly sensitive to outliers in the training data.
• Parameters are simple to set.
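The three working steps of Random Forest listed above can be sketched without any tree-building code. The helper names below are our own, and the tree predictions passed to the voting functions are placeholders; the sketch only shows the sampling with replacement, the out-of-bag rows it leaves behind, and how tree outputs are combined.

```python
import random
from collections import Counter


def bootstrap_indices(n_rows, rng):
    """Draw n_rows indices with replacement (one tree's training sample)."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def out_of_bag(n_rows, sampled):
    """Rows never drawn for this tree; usable for its error estimate."""
    drawn = set(sampled)
    return [i for i in range(n_rows) if i not in drawn]


def majority_vote(tree_predictions):
    """Classification: the class with the most tree votes wins."""
    return Counter(tree_predictions).most_common(1)[0][0]


def average_prediction(tree_predictions):
    """Regression: the forest output is the mean of the tree outputs."""
    return sum(tree_predictions) / len(tree_predictions)


rng = random.Random(0)
n_rows = 3000
sample = bootstrap_indices(n_rows, rng)
oob_fraction = len(out_of_bag(n_rows, sample)) / n_rows
print(oob_fraction)
print(majority_vote([1, 0, 1]), average_prediction([1.0, 2.0, 3.0]))
```

Sampling with replacement leaves roughly one third of the rows out of each tree's sample, which matches the "around 2/3" figure given above for the rows each tree sees.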

6.2 Data Set and Its Parameters

The dataset has been imported from the University of California, Irvine (UCI) machine learning repository. The dataset is multivariate, with a total of 75 attributes. The attributes are a mix of categorical, numeric, binary and continuous. Of these 75 attributes, 11 major attributes are considered for this problem. The total number of instances is 303. There are 5 labels: 0 for no disease and 1–4 for the progression of the disease. Multiple datasets are available from different sources, but the Cleveland dataset is used here because it has fewer missing values and is therefore considered more accurate. It has 0.2% missing values and is used for classification problems.

6.3 Dataset Attributes

The 11 main attributes which help machines to learn are:

1. Age: age in years.
2. Thal
   • 3: Normal.
   • 6: Fixed defect.
   • 7: Reversible defect.
3. Number of major vessels: 0–3, colored by fluoroscopy.
4. Exercise-induced angina
   • 1: yes.
   • 0: no.
5. Maximum heart rate achieved.
6. Sex
   • 1: male.
   • 0: female.
7. Rest ECG: resting electrocardiographic results
   • Value 0: Normal.
   • Value 1: Having ST-T wave abnormality.
   • Value 2: Showing probable or definite left ventricular hypertrophy.
8. Diagnosis of heart disease: value range between 0 and 4.

Anomaly in Dataset

During analysis, the dataset was found to be highly imbalanced. Apart from label 0, the entries for labels 1–4 are under-sampled, as shown in Fig. 7. There is not enough data for each of these labels separately to predict the presence of heart disease effectively, which would result in low accuracy and precision. For example, if a dataset consists of 100 instances, of which 98 have label 1 and only 2 have label 2, the data for label 2 is under-sampled: a classifier that predicts label 1 for every input would be 98% accurate, yet it would never classify label 2 correctly.

Fig. 7 Classification of data according to Diagnosis [self-prepared as per experiment result of project]

To deal with this problem, synthetic minority oversampling can be applied by adding instances of the under-sampled labels, or the majority class can be under-sampled by removing some of its entries. With the kind of data available, under-sampling and oversampling will not affect the accuracy of the classification much. Another way to solve the issue is to use only two labels, i.e. a positive or a negative diagnosis. This balances the dataset and increases the accuracy and precision of classification.
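Both points above, the misleading 98% accuracy and the effect of merging labels 1–4 into a single positive class, can be checked with a short sketch. The five-label counts used in the second half are made up for illustration and are not the actual Cleveland label counts.

```python
from collections import Counter


def accuracy(true_labels, predicted_labels):
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


def to_binary(label):
    """Collapse severity labels 1-4 into a single positive class."""
    return 1 if label > 0 else 0


# The 98-versus-2 example from the text.
true_labels = [1] * 98 + [2] * 2
always_majority = [1] * len(true_labels)
print(accuracy(true_labels, always_majority))

# Made-up five-label counts, for illustration only (not the Cleveland figures).
severity = [0] * 50 + [1] * 20 + [2] * 12 + [3] * 12 + [4] * 6
binary_counts = Counter(to_binary(s) for s in severity)
print(Counter(severity))
print(binary_counts)
```

The always-majority classifier reaches high accuracy while never finding label 2, and the binary relabelling turns four thin classes into one class comparable in size to label 0.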

6.4 Execution and Screenshots

For machine learning algorithms to be effective, meaningful information must be extracted from the raw dataset. This has been accomplished using the Orange software. Orange is an open-source toolkit for data mining, machine learning, and visualization. It can be used as a Python library and includes a visual programming front-end for exploratory data analysis and interactive data visualization. Widgets, the components of Orange, cover everything from simple data visualization, subset selection, and preprocessing to the empirical evaluation of learning algorithms and predictive modeling. Fig. 8 shows a sample of the Orange software and the interfaces it offers to the user. Orange is supported on macOS, Windows and Linux and can also be installed from the Python Package Index repository.

Execution process of the application

Step 1: Start the application using the “firstpage.py” file, as shown in Fig. 9.
Step 2: Click on the “Click Here To Continue” tab. After Step 1, pressing the continue button displays the next GUI, as shown in Fig. 10.
Step 3: Enter the patient details. As execution continues, the GUI collects the personal information that is used for disease prediction, as shown in Fig. 11.
Step 4: Click on the “Predict!” button. The Random Forest and Support Vector Machine algorithms run in the background, and the GUI displays the accuracy percentage together with a Yes/No result for heart disease, as shown in Fig. 12.

Here, the application begins. This screen guides us to the rest of the application, where we can enter the patient’s details. The Patient Details page then appears; this page collects the actual patient information.

Fig. 8 Orange data mining tool interface


Fig. 9 First page of graphical user interface [free image from google pages]

Step 3: Enter the patient details. In this step, information about important and crucial parameters of the patient’s health condition is collected. It includes the patient’s age, sex, chest pain, maximum heart rate attained, exercise-induced angina, ST depression, the slope of the peak ST segment, the number of major blood vessels, thalassemia, and the resting ECG.

Step 4: Click on the “Predict!” button. The Results page appears with the diagnosis. The prediction is made based on the patient information above. Even though heart disease now occurs at a very young age, age is an important factor because blood vessels age as well. Not every chest pain means heart disease, so the type of chest pain is important: some chest pains are gastric, some are muscular, and some are due to angina. A normal heart rate is taken as 76, so the maximum heart rate is considered an important input for diagnosis. Heart disease mainly affects the blood vessels, so data about the affected vessels is crucial. In thalassemia, the amount of oxygen-carrying protein is reduced, which affects breathing. The ECG provides information about immediate fluctuations in the heart. All this information is processed in this step and the prediction is made.


Fig. 10 Second page of graphical user interface [self-prepared as per experimental result of the project]

Graphical results obtained after execution are shown in Figs. 13, 14 and 15. Fig. 13 shows the ROC curve for the SVM algorithm, Fig. 14 shows the ROC curve for the Random Forest algorithm, and Fig. 15 compares the two algorithms used for prediction of heart disease. The True Positive Rate is the proportion of people with the disease who are identified as having it. The False Positive Rate is the proportion of people without the disease who are identified as having it. The dotted line represents a non-discriminatory test, i.e. a test with zero ability to distinguish between the two groups. A test with perfect discrimination has a ROC curve that passes through the upper left corner. Accordingly, the closer the ROC curve is to the upper left corner, the higher the overall accuracy of the test.
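The ROC curves above were produced with Orange; the sketch below only illustrates, on made-up labels and classifier scores, how the True Positive Rate and False Positive Rate are obtained at each threshold and how the area under the curve summarizes closeness to the upper left corner. The function names are our own.

```python
def roc_points(true_labels, scores):
    """Return (FPR, TPR) pairs obtained by lowering the decision threshold."""
    positives = sum(true_labels)
    negatives = len(true_labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(true_labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(true_labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under the ROC curve."""
    return sum(
        (x2 - x1) * (y1 + y2) / 2
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )


# Made-up labels (1 = disease) and classifier scores, for illustration only.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(labels, scores)
auc = area_under_curve(curve)
print(curve)
print(auc)
```

An area of 1.0 corresponds to a curve through the upper left corner, while the dotted diagonal of a non-discriminatory test has an area of 0.5.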


Fig. 11 Details of a patient with heart disease [self-prepared as per experiment result of project]

7 Conclusion and Future Scope

7.1 Conclusion

The accuracy obtained for the two methods depends entirely on the data set used. The prediction also covers two questions:

1. Detecting the presence of heart attack.
2. How severe is the condition of the heart attack?

The accuracy for detecting the presence of heart attack is 84.61% for the support vector machine and 86.81% for random forest; the accuracy for the severity of the heart attack is 60.44% for SVM and 59.34% for random forest. Heart diseases, when diagnosed early, can be managed in various ways. Using the above approach, we can predict the presence of heart disease from the various symptoms of a patient. The two classifiers used, the support vector machine and the random forest classifier, give the most reliable predictions in this case. Because data is not available in abundance, we cannot predict heart diseases of different kinds accurately, but heart disease can be diagnosed with a fair accuracy of about 80–85%. More accurate systems for the diagnosis of diseases can be developed in future, when enough data is available.

Fig. 12 Final page of graphical user interface [free google image of Heart]

7.2 Future Scope

In this project the main focus is on sample data. Work with real-time data is necessary to draw meaningful conclusions. For simplicity, a questionnaire based on our own knowledge was used. Some sample databases were also identified, but because of differences in weather, environmental conditions, working conditions, diet, and behavior, it was difficult to match them to Indian conditions.


Fig. 13 ROC curve for SVM using orange software [self-prepared as per experiment result of project]

Fig. 14 ROC curve for random forest using orange software [self-prepared as per experiment result of project]


Fig. 15 ROC curves comparing random forest and support vector machine using Orange software [self-prepared from the experimental results of the project]


Design of Energy Efficient IoMT Electrocardiogram (ECG) Machine on 28 nm FPGA Pankaj Singh, Bishwajeet Pandey, Neema Bhandari, Shilpi Bisht, Neeraj Bisht, and Sandeep K. Budhani

Abstract An energy-efficient IoMT Electrocardiogram (ECG) machine is proposed in this article. An ECG machine is used to check the heart's rhythm and electrical activity. By analyzing the ECG signal wave, a cardiologist can assess various heart conditions such as heart attack, coronary heart disease, cardiomyopathy, or arrhythmia. An IoMT ECG machine can measure the heart's activity and send the reports to a cloud server. A cardiologist can access a patient's ECG report from the cloud server and give a prescription and care instructions to the patient. An IoMT ECG machine can thus provide affordable healthcare to a patient at a remote location. The IoMT ECG machine is implemented on a 28 nm Artix-7 low voltage FPGA. The main objective of this article is to reduce the energy consumption of the ECG machine. Energy efficiency is achieved by scaling capacitance, voltage, and frequency, and by changing the I/O standard. A total reduction in power consumption of 81.27% is achieved by reducing the capacitance from 150 to 1 pF, the voltage from 0.93 to 0.87 V, and the frequency from 5 to 3 GHz, and by changing the I/O standard from LVCMOS_18 to LVCMOS_12. Keywords Affordable healthcare · I/O standard · Capacitance · Voltage · Frequency · LVCMOS_18 · LVCMOS_12

P. Singh (B) · N. Bhandari · S. Bisht · N. Bisht
Birla Institute of Applied Sciences, Bhimtal, Uttarakhand, India
e-mail: [email protected]
B. Pandey
Gyancity Lab, Guragaon, India
S. K. Budhani
Graphic Era Hill University, Bhimtal, Uttarakhand, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. V. Rishiwal et al. (eds.), Towards the Integration of IoT, Cloud and Big Data, Studies in Big Data 137, https://doi.org/10.1007/978-981-99-6034-7_3

1 Introduction An Internet of Things (IoT) device can communicate with similar or different types of devices or with a remote server. An IoT system comprises such devices. An IoT system can be homogeneous or heterogeneous; that is, devices


P. Singh et al.

in an IoT system can be similar to or different from each other. The Internet of Medical Things (IoMT) is an extension of the IoT. An IoMT system comprises connected medical devices. The devices in an IoMT system can vary from a simple blood pressure monitor to a complex CT scan machine capable of scanning the entire human body. Mobile IoMT devices can provide in-depth insight into a user's health. An IoMT device capable of tracking heart rate can help a doctor assess the heart condition of a patient who has arrived in an emergency room. A fitness band equipped with a fall detection sensor can help elderly people in an emergency by contacting a nearby hospital or family members. What makes an IoMT device different from other medical devices is that it can talk to other devices. It does so through an interface that connects it to other devices. To implement such an interface, developers can choose from a range of connectivity options such as Ethernet, Wi-Fi, Cellular, Wi-Max, Zigbee, RFID, NFC, LoRa, and Sigfox. Ethernet, Wi-Fi, Cellular, and Wi-Max are best suited for large IoMT devices, whereas RFID, NFC, LoRa, and Sigfox can be used in small battery-operated devices. With individuals' growing concern for their health, demand for mobile IoMT devices is rising. These mobile IoMT devices are battery operated and often need to be charged. Repeated charging increases the total cost of ownership and degrades the health of the battery. If the battery no longer performs satisfactorily, it may need to be replaced, which further adds to the total cost of ownership. The need for repeated charging and battery replacement also poses environmental hazards. Therefore, it is necessary to lower the energy consumption of these devices.
Energy is the power that a device draws multiplied by the time for which it operates. For a fixed operating time, a reduction in power gives a proportional reduction in energy, so in this article the terms energy and power are used interchangeably. In this article, the researchers propose an energy-efficient IoMT ECG machine that can connect to a cloud system. An ECG machine is used by a medical professional to assess various heart conditions of a patient by measuring the electrical signals of the heart. The medical professional assesses the heart condition by examining the ECG signal wave. An ECG signal comprises the P, Q, R, S, and T waves. The P wave represents atrial depolarization, the QRS complex represents ventricular depolarization, and the T wave represents ventricular repolarization. The ECG signal wave is shown in Fig. 1. The ECG machine measures the electrical signal with the help of electrodes attached to the patient's body. These electrodes are capable of sensing the electrical pulses. Usually, twelve electrodes are used to capture the electrical pulse. A water-based gel is applied to the electrodes to ensure maximum contact with the patient's body and better signal reception. The electrodes work as a front end to capture the ECG signal. An analog-to-digital converter converts the analog signal captured by the electrodes into a digital signal. The signal sensed by the electrodes contains noise. To remove noise from the ECG signal, a high-pass Finite Impulse Response (FIR) filter and a low-pass FIR filter are used. A clock is used for varying the operating frequency of the ECG machine. The proposed ECG machine is


Fig. 1 ECG signal wave

assigned an IPv6 address, which distinguishes it from a conventional ECG machine. The low power consumption of the machine enables it to run on a battery. Reducing the power consumption of an ECG machine contributes to the objectives of a green and affordable healthcare system. A reduction in power consumption also supports the reliability and longevity of the ECG machine. Figure 2 shows the block diagram of a conventional ECG machine and Fig. 3 shows the block diagram of the proposed IoMT ECG machine. To implement the IoMT ECG machine, the researchers have used a 28 nm Artix-7 low voltage Field Programmable Gate Array (FPGA) in the cpg236 package. The benefit of this FPGA is its lower supply voltage, which reduces power consumption. With the help of an FPGA, rapid design of the medical device is achieved. An FPGA has configurable logic blocks (CLBs), interconnects that connect the CLBs, and I/O cells that provide input and output. Due to the flexibility of the FPGA, it could be

Fig. 2 Block diagram of conventional ECG machine


Fig. 3 Block diagram of proposed IoMT ECG machine

Table 1 Utilization table
Resource | Utilization | Available | Percentage of utilization
Lookup table | 597 | 10400 | 5.74
Flip flop | 395 | 20800 | 1.90
IO | 27 | 106 | 25.47
Global buffer | 1 | 32 | 3.13

used to implement any functional logic. The design implementation uses only 597 lookup tables, 395 flip-flops, 27 I/Os, and 1 global buffer. Table 1 shows the resource utilization.

2 Background Much work has been done to make the components of the ECG machine energy efficient, and several algorithms have been proposed to analyze the ECG signal while reducing power consumption. This section presents a survey of the previous work. Kim [19] discussed the power consumption challenges of an ECG telemetry device. An ECG patch is used for receiving the electrical signal from the patient's body. An ECG patch can be fitted with Bluetooth Low Energy and consists of an analog front end, an analog-to-digital converter, and a digital signal processor [4]. Zeng [37] designed a low-power ECG signal detection node that communicates with a base station. The remote node continuously monitors for heart arrhythmia and transfers the signal to the base station if an anomaly is detected. To reduce power consumption, the node computes the Euclidean distance between a measured ECG signal and a reference ECG signal. The effect of voltage scaling on the power consumption of an ECG processor was discussed by Abdallah [1]. Ma [28] proposed a secure ECG data transmission technique based on encryption. By encrypting the ECG data and reducing the power consumption of transmission the overall power consumption


of the ECG machine is reduced. Abdallah [2] proposed an ECG processor using statistical error compensation and reduced the power consumption of the ECG processor by 28%. Chen [8] proposed an on-body ECG that uses classification methods to determine whether an ECG is normal or abnormal; a trade-off between the performance and the power consumption of the ECG machine can be observed. Chen [9] proposed an injectable ECG with reduced power consumption that maintains the integrity of the ECG results. An adaptive-resolution-based low-power wireless biosensor was presented by Chen [7]. The power consumption of an ECG machine can be substantially reduced by using a suitable I/O standard; the effects of various I/O standards on power consumption are shown by Kumar [24]. Zou [38] designed a power-efficient ASIC ECG for detecting the R peak. Kosari [21] proposed a 130 nm CMOS-based analog front end for ECG that consumes 68 nW at 0.5 V and provides a high-quality signal with less noise. An analog front end is attached to the patient for ECG signal reception. Luo [27] proposed a single-node Bluetooth-enabled wireless ECG that uses digital compressed sensing to reduce power consumption. To remove noise from the signal acquired from the electrode, a high-pass FIR filter or a low-pass FIR filter can be used [18]. A Bluetooth Low Energy based three-electrode ECG signal acquisition system that transfers the collected data to a smartphone is proposed by Hadizadeh [14]. The ECG signal acquisition system can run for 20 days on a 150 mAh battery. Kirti [20] proposed an energy-efficient pre-processing block for the ECG signal. The pre-processing block uses a high-pass and a low-pass FIR filter. Tekeste [35] proposed various methods for designing an energy-efficient ECG processor. A best-mother-wavelet-based signal compression technique for reducing energy consumption was proposed by Kunabeva [26].
A multi-lead power-efficient ECG machine, consisting of a clothing patch that operates five electrodes to capture the ECG signal and an acquisition device, is proposed by Wang [36]. An ECG machine with fewer leads is easier to operate and supports patient mobility better than a 12-lead ECG machine. A wireless wearable electrode for ECG signal acquisition is developed by Hsieh [15]. The patient can wear wireless electrodes more easily than wired electrodes. The signal acquired by the electrodes is sent to the ECG machine wirelessly. Wireless electrodes are powered by a battery, so the power consumption of the electrode must be minimized for a longer operating time. Most of the work discussed in this section addresses the power efficiency of individual components of an ECG machine. To the best of the researchers' knowledge, no one has presented the power efficiency of the entire ECG machine based on capacitance, voltage, and frequency scaling and on the I/O standard.
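The Euclidean-distance comparison attributed to Zeng [37] in this section can be illustrated with a small sketch. It is not the cited authors' implementation: the function names, the sample windows, and the threshold are all hypothetical, and the rule "transmit only when the distance exceeds a threshold" is an assumption made here to show how such a check could avoid unnecessary radio transmissions.

```python
import math

def euclidean_distance(measured, reference):
    """Euclidean distance between two equal-length sample windows."""
    if len(measured) != len(reference):
        raise ValueError("windows must have the same length")
    return math.sqrt(sum((m - r) ** 2 for m, r in zip(measured, reference)))

def should_transmit(measured, reference, threshold):
    """Assumed rule: send the beat only if it deviates from the reference."""
    return euclidean_distance(measured, reference) > threshold

# Made-up sample windows in arbitrary units; these are not real ECG data.
reference_beat = [0.0, 0.1, 1.0, 0.1, 0.0]
normal_beat = [0.0, 0.12, 0.98, 0.1, 0.01]
odd_beat = [0.0, 0.5, 0.4, 0.6, 0.0]
THRESHOLD = 0.3  # assumed value, chosen only for this illustration

print(should_transmit(normal_beat, reference_beat, THRESHOLD))  # False
print(should_transmit(odd_beat, reference_beat, THRESHOLD))     # True
```

In a sketch like this the radio stays idle for beats that resemble the reference, which is where the power saving would come from; a real node would need a clinically validated reference and threshold.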


3 Environmental Settings for Energy Efficient IoMT ECG Machine The power consumption of a device can be substantially reduced if a proper approach is used during the synthesis phase. In the synthesis phase, power efficiency at the register transfer level (RTL) is achieved by scaling the factors that contribute to power consumption. The power consumption of a CMOS-based device is the sum of its dynamic power, static power, and short-circuit power, as given by Eq. 1. In this article, the researchers investigate techniques to reduce the total power dissipation of the ECG machine by scaling the factors that contribute to the dynamic power consumption. In Eq. 1, $D_p$ represents the dynamic power consumption, $S_p$ the static power consumption, and $SC_p$ the short-circuit power consumption.

$$P = D_p + S_p + SC_p \tag{1}$$

The dynamic power consumption is the product of the capacitance, the square of the supply voltage, and the frequency, as shown in Eq. 2. Here $C$ is the capacitance (the charge stored per unit of applied voltage), $V$ is the supply voltage, and $f$ is the operating frequency.

$$D_p = C V^2 f \tag{2}$$
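To make the scaling behaviour of Eq. 2 concrete, the following sketch evaluates the formula for a single switched capacitance and compares operating points. It is only an idealised illustration: the function names are hypothetical, and the result does not reproduce the measured totals reported in this chapter, which also include static power and the design's switching activity.

```python
def dynamic_power(capacitance_f, voltage_v, frequency_hz):
    """Dynamic power in watts from Eq. 2: D_p = C * V^2 * f."""
    return capacitance_f * voltage_v ** 2 * frequency_hz

def relative_reduction(before, after):
    """Fractional reduction from `before` to `after` (0.25 means 25 % lower)."""
    return (before - after) / before

# Operating points taken from the ranges studied in this chapter.
base = dynamic_power(150e-12, 0.93, 5e9)
lower_voltage = dynamic_power(150e-12, 0.87, 5e9)
lower_frequency = dynamic_power(150e-12, 0.93, 3e9)

# Voltage enters quadratically, frequency and capacitance linearly.
print(f"0.93 V -> 0.87 V: {relative_reduction(base, lower_voltage):.3f}")
print(f"5 GHz -> 3 GHz:   {relative_reduction(base, lower_frequency):.3f}")
```

The quadratic voltage term is why even a small supply-voltage step is worth examining, while the linear frequency term gives a 40 % reduction in this idealised model when the clock goes from 5 to 3 GHz.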

To analyze the power consumption, we have considered capacitance, voltage, and frequency. Scaling down the voltage gives a substantial reduction in total power consumption [13, 22, 31, 34]. The effect of capacitance scaling on power consumption is demonstrated by Pandey [30], Singh [33], Bansal [5], and Kaur [17]. A reduction in power consumption can also be achieved by scaling frequency [11, 23, 25, 32]. Low voltage complementary metal oxide semiconductor (LVCMOS) I/O standards can substantially reduce the power consumption of the input line, output line, input port, and output port [3, 10, 12, 16]. In this article the researchers have used two different I/O standards, namely LVCMOS_12 and LVCMOS_18. It is observed that the LVCMOS_12 I/O standard consumes considerably less power than LVCMOS_18. The environmental settings that the researchers have used for analyzing power consumption are listed in Table 2.

Table 2 Environmental settings
Variable | Values
Capacitance | 150 pF, 100 pF, 50 pF, 1 pF
Voltage | 0.93 V, 0.90 V, 0.87 V
Frequency | 5 GHz, 4 GHz, 3 GHz
I/O Standard | LVCMOS_18, LVCMOS_12
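The settings in Table 2 define a small parameter grid. As an illustration only (the variable names are hypothetical and the chapter does not describe any script), the grid can be enumerated as follows: three voltages and three frequencies give nine voltage/frequency operating points, and each point is measured for four capacitances and two I/O standards.

```python
from itertools import product

# Values copied from Table 2.
CAPACITANCES_PF = [150, 100, 50, 1]
VOLTAGES_V = [0.93, 0.90, 0.87]
FREQUENCIES_GHZ = [5, 4, 3]
IO_STANDARDS = ["LVCMOS_18", "LVCMOS_12"]

# One operating point per voltage/frequency pair.
operating_points = list(product(VOLTAGES_V, FREQUENCIES_GHZ))

# One power reading per full combination of all four variables.
readings = list(product(VOLTAGES_V, FREQUENCIES_GHZ, CAPACITANCES_PF, IO_STANDARDS))

print(len(operating_points))  # 9
print(len(readings))          # 72
```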


4 Power Analysis of IoMT ECG Machine To curtail the power dissipation of the ECG machine, the researchers have tried different configurations of capacitance, voltage, frequency, and I/O standard. A total of 9 configurations are used to assess the power consumption. The supply voltage is initially set to 0.93 V and scaled down by 0.03 V at each step to 0.87 V; the frequency is set to 5 GHz and scaled down by 1 GHz at each step to 3 GHz; similarly, the capacitance is set to 150 pF and scaled down by 50 pF at each step. Scaling by 50 pF would make the minimum capacitance zero, but 0 pF would mean a perfect open circuit (no accumulation of electrical charge on the capacitor plates). Therefore, the minimum value for capacitance is set to 1 pF. Power consumption readings of the ECG machine at 0.93 V and 5 GHz for 150 pF, 100 pF, 50 pF, and 1 pF with LVCMOS_18 and LVCMOS_12 are listed in Table 3. At 0.93 V and 5 GHz with the LVCMOS_18 I/O standard, power consumption falls by 67.73% when capacitance is reduced from 150 to 1 pF; with the LVCMOS_12 I/O standard the corresponding reduction is 42.97%. The reduction in power dissipation when capacitance is reduced from 150 to 1 pF is therefore higher for LVCMOS_18 than for LVCMOS_12. At 150 pF, the total power consumption of LVCMOS_12 is lower than that of LVCMOS_18 by about 47.2% (612 mW versus 1159 mW in Table 3). At 1 pF, it is lower by only 6.68%. Therefore, the difference in power consumption between LVCMOS_12 and LVCMOS_18 shrinks as capacitance is reduced. At 5 GHz and 1 pF, reducing the supply voltage by 0.03 V (from 0.93 V to 0.90 V) lowers power consumption by 4.27% with LVCMOS_18 and by 5.15% with LVCMOS_12. The same can be verified from the readings of Tables 3 and 4.
When voltage is reduced from 0.90 V to 0.87 V at 5 GHz and 1 pF, power consumption falls by only 4.18% with LVCMOS_18 and 4.83% with LVCMOS_12. A total reduction of 8.28% with LVCMOS_18 and 9.74% with LVCMOS_12 is observed at 1 pF and 5 GHz when voltage is reduced from 0.93 V to 0.87 V. The same can be verified from the readings of Tables 3 and 5. Tables 6, 7, 8, 9, 10 and 11 give the power consumption at reduced voltage and frequency with the LVCMOS_18 and LVCMOS_12 I/O standards. The results of these tables are summarized later.

Table 3 Power consumption of ECG machine at 0.93 V, 5 GHz
Capacitance (pF) | LVCMOS_18 I/O power (mW) | LVCMOS_18 total power (mW) | LVCMOS_12 I/O power (mW) | LVCMOS_12 total power (mW)
150 | 853 | 1159 | 286 | 612
100 | 591 | 895 | 197 | 524
50 | 328 | 632 | 109 | 435
1 | 70 | 374 | 22 | 349
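The capacitance-scaling percentages quoted for Table 3 follow from simple arithmetic on the total-power columns. The sketch below recomputes them; it is added purely as an illustration, and the dictionary and function names are hypothetical.

```python
# Total power in mW from Table 3 (0.93 V, 5 GHz), keyed by capacitance in pF.
TABLE3_TOTAL_MW = {
    "LVCMOS_18": {150: 1159, 100: 895, 50: 632, 1: 374},
    "LVCMOS_12": {150: 612, 100: 524, 50: 435, 1: 349},
}

def percent_reduction(before, after):
    """Percentage by which `after` is lower than `before`."""
    return 100.0 * (before - after) / before

# Effect of scaling capacitance from 150 pF down to 1 pF for each standard.
for standard, totals in TABLE3_TOTAL_MW.items():
    drop = percent_reduction(totals[150], totals[1])
    print(f"{standard}: {drop:.2f} % lower at 1 pF than at 150 pF")

# Gap between the two I/O standards at 1 pF.
gap_at_1pf = percent_reduction(
    TABLE3_TOTAL_MW["LVCMOS_18"][1], TABLE3_TOTAL_MW["LVCMOS_12"][1]
)
print(f"LVCMOS_12 vs LVCMOS_18 at 1 pF: {gap_at_1pf:.2f} % lower")
```

With the Table 3 values this gives roughly 67.73 % and 42.97 % for the two standards and about 6.68 % for the gap at 1 pF, matching the figures in the text.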


Table 4 Power consumption of ECG machine at 0.90 V, 5 GHz
Capacitance (pF) | LVCMOS_18 I/O power (mW) | LVCMOS_18 total power (mW) | LVCMOS_12 I/O power (mW) | LVCMOS_12 total power (mW)
150 | 853 | 1143 | 286 | 595
100 | 590 | 879 | 197 | 507
50 | 328 | 616 | 109 | 418
1 | 70 | 358 | 22 | 331

Table 5 Power consumption of ECG machine at 0.87 V, 5 GHz
Capacitance (pF) | LVCMOS_18 I/O power (mW) | LVCMOS_18 total power (mW) | LVCMOS_12 I/O power (mW) | LVCMOS_12 total power (mW)
150 | 853 | 1127 | 286 | 578
100 | 590 | 864 | 197 | 490
50 | 327 | 601 | 109 | 402
1 | 70 | 343 | 22 | 315
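The voltage-scaling percentages discussed for 5 GHz and 1 pF can be checked against the 1 pF rows of Tables 3, 4 and 5. The sketch below is an illustration with hypothetical names. It truncates to two decimals instead of rounding, because the quoted figures (for example 4.27 % for 374 mW to 358 mW) appear to have been truncated; that is an inference from the numbers, not something the chapter states.

```python
import math

# Total power in mW at 5 GHz and 1 pF, read from Tables 3, 4 and 5.
# Each entry is (supply voltage in V, total power in mW).
TOTAL_MW_BY_VOLTAGE = {
    "LVCMOS_18": [(0.93, 374), (0.90, 358), (0.87, 343)],
    "LVCMOS_12": [(0.93, 349), (0.90, 331), (0.87, 315)],
}

def truncated_percent_reduction(before, after, digits=2):
    """Percentage drop from `before` to `after`, truncated to `digits` decimals."""
    scale = 10 ** digits
    return math.floor(100.0 * (before - after) / before * scale) / scale

def voltage_step_reductions(points):
    """Return the reduction for each consecutive voltage step and the overall one."""
    powers = [power for _, power in points]
    steps = [truncated_percent_reduction(a, b) for a, b in zip(powers, powers[1:])]
    overall = truncated_percent_reduction(powers[0], powers[-1])
    return steps, overall

for standard, points in TOTAL_MW_BY_VOLTAGE.items():
    steps, overall = voltage_step_reductions(points)
    print(standard, "per step:", steps, "overall:", overall)
```

Note that the two per-step percentages do not add up exactly to the overall figure, because each step is expressed relative to a different starting power.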

Table 6 Power consumption of ECG machine at 0.93 V, 4 GHz
Capacitance (pF) | LVCMOS_18 I/O power (mW) | LVCMOS_18 total power (mW) | LVCMOS_12 I/O power (mW) | LVCMOS_12 total power (mW)
150 | 683 | 940 | 229 | 503
100 | 472 | 730 | 158 | 432
50 | 262 | 519 | 87 | 362
1 | 56 | 313 | 18 | 292

Table 7 Power consumption of ECG machine at 0.90 V, 4 GHz
Capacitance (pF) | LVCMOS_18 I/O power (mW) | LVCMOS_18 total power (mW) | LVCMOS_12 I/O power (mW) | LVCMOS_12 total power (mW)
150 | 683 | 928 | 229 | 489
100 | 472 | 717 | 158 | 419
50 | 262 | 506 | 87 | 348
1 | 56 | 300 | 18 | 278

Table 8 Power consumption of ECG machine at 0.87 V, 4 GHz
Capacitance (pF) | LVCMOS_18 I/O power (mW) | LVCMOS_18 total power (mW) | LVCMOS_12 I/O power (mW) | LVCMOS_12 total power (mW)
150 | 683 | 915 | 229 | 476
100 | 472 | 704 | 158 | 405
50 | 262 | 494 | 87 | 334
1 | 56 | 287 | 18 | 265


Table 9 Power consumption of ECG machine at 0.93 V, 3 GHz
Capacitance (pF) | LVCMOS_18 I/O power (mW) | LVCMOS_18 total power (mW) | LVCMOS_12 I/O power (mW) | LVCMOS_12 total power (mW)
150 | 517 | 729 | 173 | 398
100 | 358 | 569 | 120 | 344
50 | 199 | 410 | 66 | 290
1 | 42 | 253 | 14 | 238

Table 10 Power consumption of ECG machine at 0.90 V, 3 GHz
Capacitance (pF) | LVCMOS_18 I/O power (mW) | LVCMOS_18 total power (mW) | LVCMOS_12 I/O power (mW) | LVCMOS_12 total power (mW)
150 | 517 | 719 | 173 | 387
100 | 358 | 559 | 120 | 333
50 | 199 | 400 | 66 | 280
1 | 42 | 243 | 14 | 227

Table 11 Power consumption of ECG machine at 0.87 V, 3 GHz
Capacitance (pF) | LVCMOS_18 I/O power (mW) | LVCMOS_18 total power (mW) | LVCMOS_12 I/O power (mW) | LVCMOS_12 total power (mW)
150 | 517 | 709 | 173 | 377
100 | 358 | 550 | 120 | 323
50 | 198 | 390 | 66 | 269
1 | 42 | 234 | 14 | 217

The following observations are made from the experimental results.
1. Maximum power consumption (1159 mW) is at 0.93 V, 5 GHz, 150 pF with the LVCMOS_18 I/O standard, and minimum power consumption (217 mW) is at 0.87 V, 3 GHz, 1 pF with LVCMOS_12.
2. A reduction in capacitance reduces both I/O power and total power consumption.
3. A reduction in voltage at constant capacitance and frequency leaves the I/O power consumption essentially unchanged but reduces total power consumption for both LVCMOS_18 and LVCMOS_12.
4. A reduction in voltage and frequency together reduces both I/O power and total power consumption.
5. The power consumption of LVCMOS_12 is lower than that of LVCMOS_18 in every configuration.
6. The power consumption ratio of LVCMOS_18 to LVCMOS_12 is higher at higher capacitance (e.g., 150 pF).


Table 12 Comparison with existing work
Author | Technology (nm) | Frequency | Voltage (V) | Reduction in power consumption (%)
This work | 28 | 3 GHz | 0.87 | 81.27
Zeng et al. [37] | – | 2.4 GHz | 1 | 57
Abdallah et al. [1] | 45 | 0.6 MHz | 0.34 | 28
Ma et al. [28] | – | – | – | 40
Chen et al. [9] | 65 | 10 kHz | 0.6 | 40
Kumar et al. [24] | 40 | 512 kHz | 1 | 77.26
Zou et al. [38] | 65 | 550 kHz | 0.7 | 50
Wang et al. [36] | – | 250 Hz | 2.1 | 37.6

7. The power consumption ratio of LVCMOS_18 to LVCMOS_12 is lower at lower capacitance (e.g., 1 pF).
8. The power consumption ratio of LVCMOS_18 to LVCMOS_12 varies with capacitance.
9. There is a reduction of 81.27% in power dissipation when capacitance is lowered from 150 to 1 pF, voltage is reduced from 0.93 V to 0.87 V, frequency is reduced from 5 to 3 GHz, and the I/O standard is changed from LVCMOS_18 to LVCMOS_12.
The results of this article are compared with the existing work in Table 12.
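The observations about the LVCMOS_18 to LVCMOS_12 ratio and the overall reduction can be reproduced from the tabulated totals. The sketch below is illustrative only, with hypothetical variable names; it uses the total-power readings at 0.93 V and 5 GHz from Table 3 and the best-case reading of 217 mW from Table 11.

```python
# Total power in mW at 0.93 V and 5 GHz, copied from Table 3 (keys are pF).
LVCMOS_18_MW = {150: 1159, 100: 895, 50: 632, 1: 374}
LVCMOS_12_MW = {150: 612, 100: 524, 50: 435, 1: 349}

# Ratio of LVCMOS_18 to LVCMOS_12 total power at each capacitance.
ratios = {c: LVCMOS_18_MW[c] / LVCMOS_12_MW[c] for c in LVCMOS_18_MW}
for capacitance_pf, ratio in ratios.items():
    print(f"{capacitance_pf:>3} pF: ratio = {ratio:.2f}")

# Overall reduction between the worst and best operating points.
WORST_CASE_MW = 1159  # 0.93 V, 5 GHz, 150 pF, LVCMOS_18 (Table 3)
BEST_CASE_MW = 217    # 0.87 V, 3 GHz, 1 pF, LVCMOS_12 (Table 11)
overall_reduction = 100.0 * (WORST_CASE_MW - BEST_CASE_MW) / WORST_CASE_MW
print(f"overall reduction = {overall_reduction:.3f} %")
```

The ratio falls steadily from about 1.89 at 150 pF to about 1.07 at 1 pF, and the overall reduction evaluates to roughly 81.277 %, which corresponds to the 81.27 % reported above once truncated to two decimals.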

5 Conclusion The annual energy consumption of the IoMT ECG machine at 0.93 V, 150 pF, 5 GHz with the LVCMOS_18 I/O standard is 10.152 kWh if the machine is operated continuously, 24 h a day for 365 days. Similarly, the annual energy consumption of the machine at 0.87 V, 1 pF, 3 GHz with the LVCMOS_12 I/O standard is 1.900 kWh under the same continuous operation. This is a saving of 8.252 kWh per IoMT ECG machine per year. According to the Ministry of Health and Family Welfare (2018, July 24), there are a total of 65,630 government public health centers (PHC), community health centers (CHC), sub-district health centers (SDHC), and district hospitals (DH) in India. If every one of these centers has one ECG machine on average, then a total of 541.578 MWh of energy can be saved annually. The average tariff of 1 W is 5 rupees per hour in India for non-domestic power consumption. If the government can save 541.578 MWh in a year by using the proposed ECG machine, then 1.89 billion rupees can be saved by the government. The energy saved by using an energy-efficient IoMT ECG machine can be used in other sectors such as agriculture, manufacturing, or transportation.
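The annual energy figures above are a unit conversion from the measured power in milliwatts. The sketch below shows that arithmetic; the function and variable names are hypothetical, continuous operation for 24 h on 365 days is taken from the text, and the fleet size of 65,630 machines (one per listed health centre) is an assumption of this sketch.

```python
HOURS_PER_YEAR = 24 * 365  # continuous operation, as assumed in the text

def annual_energy_kwh(power_mw):
    """Energy in kWh used in one year by a load drawing `power_mw` milliwatts."""
    return power_mw * HOURS_PER_YEAR / 1_000_000  # mW * h -> kWh

baseline_kwh = annual_energy_kwh(1159)  # 0.93 V, 150 pF, 5 GHz, LVCMOS_18
proposed_kwh = annual_energy_kwh(217)   # 0.87 V, 1 pF, 3 GHz, LVCMOS_12
saving_kwh = baseline_kwh - proposed_kwh

def fleet_saving_mwh(machine_count, saving_per_machine_kwh):
    """Yearly saving in MWh for `machine_count` identical machines."""
    return machine_count * saving_per_machine_kwh / 1000  # kWh -> MWh

print(f"baseline: {baseline_kwh:.5f} kWh per year")
print(f"proposed: {proposed_kwh:.5f} kWh per year")
print(f"saving:   {saving_kwh:.5f} kWh per year")

# Assumed fleet: one machine per centre, per-machine saving rounded to 8.252 kWh.
fleet_mwh = fleet_saving_mwh(65_630, round(saving_kwh, 3))
print(f"fleet saving: {fleet_mwh:.3f} MWh per year")
```

Under these assumptions the per-machine values come to about 10.153, 1.901 and 8.252 kWh, and the fleet saving to about 541.58 MWh per year; doubling `machine_count` doubles the fleet figure.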


An IoMT ECG machine can meet the need for efficient and accessible healthcare for elderly people, persons with disabilities, and people in remote areas. The power efficiency of the proposed IoMT ECG machine is achieved by reducing its power consumption by 81.27%. The reduced power consumption of the IoMT ECG machine thus contributes to the objectives of a green healthcare system. A reduction in power consumption also reduces the total cost of ownership.

References
1. Abdallah, R.A., Shanbhag, N.R.: A 14.5 fJ/cycle/k-gate, 0.33 V ECG processor in 45 nm CMOS using statistical error compensation. In: Proceedings of the IEEE 2012 Custom Integrated Circuits Conference, pp. 1–4 (2012). https://doi.org/10.1109/CICC.2012.6330670
2. Abdallah, R.A., Shanbhag, N.R.: An energy-efficient ECG processor in 45-nm CMOS using statistical error compensation. IEEE J. Solid-State Circuits 48(11), 2882–2893 (2013). https://doi.org/10.1109/JSSC.2013.2280055
3. Aggarwal, A., Pandey, B., Dabbas, S., Agarwal, A., Saurabh, S.: LVCMOS-based low-power thermal-aware energy-proficient vedic multiplier design on different FPGAs. In: Muttoo, S.K. (ed.) System and Architecture, pp. 115–122. Springer (2018). https://doi.org/10.1007/978-981-10-8533-8_12
4. Altini, M., Polito, S., Penders, J., Kim, H., Van Helleputte, N., Kim, S., Yazicioglu, F.: An ECG patch combining a customized ultra-low-power ECG SOC with Bluetooth low energy for long term ambulatory monitoring. In: Proceedings of the 2nd Conference on Wireless Health, pp. 1–2 (2011). https://doi.org/10.1145/2077546.2077564
5. Bansal, M., Bansal, N., Saini, R., Kalra, L., Mohan Singh, P., Pandey, B., Akbar Hussain, D.M.: FPGA based low power ROM design using capacitance scaling. Adv. Mater. Res. (2015). https://doi.org/10.4028/www.scientific.net/AMR.1082.471
6. Bui, N.T., Vo, T.H., Kim, B.-G., Oh, J.: Design of a solar-powered portable ECG device with optimal power consumption and high accuracy measurement. Appl. Sci. 9(10), 2129 (2019). https://doi.org/10.3390/app9102129
7. Chen, S.L.: A power-efficient adaptive fuzzy resolution control system for wireless body sensor networks. IEEE Access 3, 743–751 (2015). https://doi.org/10.1109/ACCESS.2015.2437897
8. Chen, T., Mazomenos, E.B., Maharatna, K., Dasmahapatra, S., Niranjan, M.: Design of a low-power on-body ECG classifier for remote cardiovascular monitoring systems. IEEE J. Emerg. Sel. Top. Circuits Syst. 3(1), 75–85 (2013). https://doi.org/10.1109/JETCAS.2013.2242772
9. Chen, Y.P., Jeon, D., Lee, Y., Kim, Y., Foo, Z., Lee, I., Langhals, N.B., Kruger, G., Oral, H., Berenfeld, O., Zhang, Z., Blaauw, D., Sylvester, D.: An injectable 64 nW ECG mixed-signal SOC in 65 nm for arrhythmia monitoring. IEEE J. Solid-State Circuits 50(1), 375–390 (2015). https://doi.org/10.1109/JSSC.2014.2364036
10. Das, B., Kiyani, A., Kumar, V., Abdullah, M.F.L., Pandey, B.: Power optimization of Pseudo Noise based optical transmitter using LVCMOS IO standard. Power Gener. Syst. Renew. Energy Technol. (PGSRET) 2015, 1–7 (2015). https://doi.org/10.1109/PGSRET.2015.7312252
11. Das, T., Pandey, B., Rahman, M.A., Kumar, T., Siddiquee, T.: Capacitance and frequency scaling based energy efficient image inverter design on FPGA. Int. Conf. Commun. Comput. Vis. (ICCCV) 2013, 1–5 (2013). https://doi.org/10.1109/ICCCV.2013.6906736
12. Gupta, G., Kaur, A., Pandey, B.: LVCMOS based green data flip flop design on FPGA. Ninth Int. Conf. Adv. Comput. (ICoAC) 2017, 41–45 (2017). https://doi.org/10.1109/ICoAC.2017.8441192
13. Gupta, T., Verma, G., Kaur, A., Pandey, B., Singh, A., Kaur, T.: Energy efficient counter design using voltage scaling on FPGA. Fifth Int. Conf. Commun. Syst. Netw. Technol. 2015, 816–819 (2015). https://doi.org/10.1109/CSNT.2015.131


14. Hadizadeh, E., Rabbani, R., Azizi, Z., Barekatain, M., Hakhamaneshi, K., Khoram, E., Fotowat-Ahmady, A.: Ultra low-power system for remote ECG monitoring. arXiv:1903.08835 [eess] (2019). http://arxiv.org/abs/1903.08835
15. Hsieh, H.Y., Luo, C.H., Tai, C.-C.: Wireless potential difference electrocardiogram constituted by two electrode-pairs wearing comfort. J. Instrum. 15(08), P08011–P08011 (2020). https://doi.org/10.1088/1748-0221/15/08/P08011
16. Kalra, L., Bansal, N., Saini, R., Bansal, M., Pandey, B.: LVCMOS I/O standard based environment friendly low power ROM design on FPGA. In: 2015 2nd International Conference on Computing for Sustainable Global Development (INDIACom), pp. 1824–1829 (2015)
17. Kaur, I., Rohilla, L., Nagpal, A., Pandey, B., Sharma, S.: Different configuration of low-power memory design using capacitance scaling on 28-nm field-programmable gate array. In: Muttoo, S.K. (ed.) System and Architecture, pp. 151–161. Springer (2018). https://doi.org/10.1007/978-981-10-8533-8_15
18. Kher, R.: Signal processing techniques for removing noise from ECG signals. J. Biomed. Eng. 1, 1–9 (2019)
19. Kim, N.J., Hong, J.H., Lee, T.-S.: A study on power consumption and transmission rate in ECG signal processing in mobile environment. In: Magjarevic, R., Nagel, J.H. (eds.) World Congress on Medical Physics and Biomedical Engineering 2006, pp. 4107–4110. Springer (2007). https://doi.org/10.1007/978-3-540-36841-0_1041
20. Kirti, Sohal, H., Jain, S.: FPGA implementation of power-efficient ECG pre-processing block. Int. J. Recent. Technol. Eng. (IJRTE) 8(1), 2899 (2019)
21. Kosari, A., Breiholz, J., Liu, N., Calhoun, B., Wentzloff, D.: A 0.5 V 68 nW ECG monitoring analog front-end for arrhythmia diagnosis. J. Low Power Electron. Appl. 8(3), 2 (2018)
22. Kumar, A., Pandey, B., Akbar Hussain, D.M., Atiqur Rahman, M., Jain, V., Bahanasse, A.: Low voltage complementary metal oxide semiconductor based energy efficient UART design on spartan-6 FPGA. In: 2019 11th International Conference on Computational Intelligence and Communication Networks (CICN), pp. 84–87 (2019). https://doi.org/10.1109/CICN.2019.8902356
23. Kumar, A., Pandey, B., Akbar Hussain, D.M., Atiqur Rahman, M., Jain, V., Bahanasse, A.: Frequency scaling and high speed transceiver logic based low power UART design on 45nm FPGA. In: 2019 11th International Conference on Computational Intelligence and Communication Networks (CICN), pp. 88–92 (2019). https://doi.org/10.1109/CICN.2019.8902375
24. Kumar, T., Memon, A.K., Musavi, S.H.A., Khan, F., Kumar, R.: FPGA based energy efficient ECG machine design using different IO standard. In: 2015 2nd International Conference on Computing for Sustainable Global Development (INDIACom), pp. 1541–1545 (2015)
25. Kumar, T., Pandey, B., Das, T., Thakur, S.K., Chowdhry, B.S.: Frequency, voltage and temperature sensor design for fire detection in VLSI circuit on FPGA. In: Shaikh, F.K., Chowdhry, B.S., Zeadally, S., Hussain, D.M.A., Memon, A.A., Uqaili, M.A. (eds.) Communication Technologies, Information Security and Sustainable Development, pp. 121–133. Springer International Publishing (2014). https://doi.org/10.1007/978-3-319-10987-9_12
26. Kunabeva, R., Manjunatha, P., Narendra, V.G.: Adaptive best mother wavelet based compressive sensing algorithm for energy efficient ECG signal compression in WBAN node. Int. J. Innov. Technol. Explor. Eng. 8(10), 685–692 (2019). https://doi.org/10.35940/ijitee.J8801.0881019
27. Luo, K., Cai, Z., Du, K., Zou, F., Zhang, X., Li, J.: A digital compressed sensing-based energy-efficient single-spot bluetooth ECG node. J. Healthc. Eng. (2018). https://doi.org/10.1155/2018/2687389
28. Ma, T., Shrestha, P.L., Hempel, M., Peng, D., Sharif, H., Chen, H.-H.: Assurance of energy efficiency and data security for ECG transmission in BASNs. IEEE Trans. Biomed. Eng. 59(4), 1041–1048 (2012). https://doi.org/10.1109/TBME.2011.2182196
29. Ministry of Health and Family Welfare: Hospitals in the Country (2018). https://pib.gov.in/PressReleasePage.aspx?PRID=1539877


30. Pandey, B., Kumar, T., Das, T., Yadav, R., Pandey, O.J.: Capacitance scaling based energy efficient FIR filter for digital signal processing. Int. Conf. Reliab. Optim. Inf. Technol. (ICROIT) 2014, 448–451 (2014). https://doi.org/10.1109/ICROIT.2014.6798382 31. Pandey, B., Rahman, M.A., Hussain, D.M.A., Das, A.S.B.: Leakage power reduction with various IO standards and dynamic voltage scaling in vedic multiplier on virtex-6 FPGA. Indian J. Sci. Technol. (2016). https://doi.org/10.17485/ijst/2016/v9i25/96633. 32. Pandey, B., Sharan, P., Dhirani, L.L., Hussain, D.M.A.: Role of scaling of frequency and toggle rate in POD IO standards based energy efficient ALU design on ultra scale FPGA. In: 2018 10th International Conference on Computational Intelligence and Communication Networks (CICN), pp. 50–53 (2018). https://doi.org/10.1109/CICN.2018.8864933. 33. Singh, P.R., Pandey, B., Kumar, T., Das, T., Pandey, O.J.: Output load capacitance based low power implementation of UART on FPGA. Int. Conf. Comput. Commun. Inform. 2014, 1–4 (2014). https://doi.org/10.1109/ICCCI.2014.6921826 34. Singh, S., Agarwal, M., Agrawal, N., Kumar, A., Pandey, B.: Simulation and verification of voltage and capacitance scalable 32-bit Wi-Fi ah channel enable alu design on 40nm FPGA. Int. Conf. Comput. Intell. Commun. Netw. (CICN) 2015, 1363–1366 (2015). https://doi.org/ 10.1109/CICN.2015.264 35. Tekeste Habte, T., Saleh, H., Mohammad, B., Ismail, M.: Introduction to ultra-low power ECG processor. In: Tekeste Habte, T., Saleh, H., Mohammad, B., Ismail, M. (eds.) Ultra Low Power ECG Processing System for IoT Devices, pp. 1–6. Springer International Publishing (2019). https://doi.org/10.1007/978-3-319-97016-5_1 36. Wang, L.H., Zhang, W., Guan, M.H., Jiang, S.Y., Fan, M.H., Abu, P., Chen, C.A., Chen, S.L.: A low-power high-data-transmission multi-lead ECG acquisition sensor system. Sens. (Basel, Switzerland) 19(22), 4996 (2019). https://doi.org/10.3390/s19224996 37. 
Zeng, M., Chung, I.-Y., Lee, J.-A., Lee, J.-G.: An on-node intelligence-based energy efficient ECG monitoring system. ICTC 2011, 401–405 (2011). https://doi.org/10.1109/ICTC.2011.608 2626 38. Zou, Y., Han, J., Xuan, S., Huang, S., Weng, X., Fang, D., Zeng, X.: An energy-efficient design for ECG recording and R-peak detection based on wavelet transform. IEEE Trans. Circuits Syst. II Express Briefs 62(2), 119–123 (2015). https://doi.org/10.1109/TCSII.2014.2368619

Automatic Smart Irrigation Method for Agriculture Data
Rashmi Chaudhry, Vinay Rishiwal, Preeti Yadav, Kaustubh Ranjan Singh, and Mano Yadav

Abstract As technology and population grow, there is a pressing search for methods that are both sustainable and economical. Rural parts of the Indian subcontinent, which are rich in farmland, still irrigate their crops manually. In this chapter, we propose a system that automatically supplies water to farms according to the crop: it measures the moisture level of the soil and helps decide whether to turn the water supply on or off. When implemented in practice, the product will feature real-time sensing and control, self-regulation, and the complete elimination of manual labour. Various machine learning algorithms are used to determine the dryness or moisture content of the soil and to predict the water requirement, or the next watering cycle, in a user-friendly manner. The system delivers uniform and sufficient water to the farm and avoids water wastage.
Keywords Irrigation · Moisture · IoT · Regression · KNN

R. Chaudhry, Netaji Subhas University of Technology, Delhi, India
V. Rishiwal (B) · P. Yadav, MJP Rohilkhand University, Bareilly, India
K. R. Singh, Delhi Technological University, Delhi, India
M. Yadav, Bareilly College Bareilly, Bareilly, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. V. Rishiwal et al. (eds.), Towards the Integration of IoT, Cloud and Big Data, Studies in Big Data 137, https://doi.org/10.1007/978-981-99-6034-7_4


R. Chaudhry et al.

1 Introduction
Agriculture is the backbone of the Indian economy, and 58% of the rural population reportedly depends on income from agriculture [1]. Traditional farming methods cannot meet the country's demand, because productivity is affected by many natural and man-made factors such as rain, weeds, overwatering and inadequate use of pesticides. The agriculture sector therefore needs innovative techniques such as machine learning, artificial intelligence and IoT [2]. The monsoon can no longer be considered a reliable source of irrigation, because forecasts are sometimes inaccurate, and heavy rain can also harm large-scale food production. Hence, a proper mechanism for watering farms is the need of the day [3]. Figure 1 shows the technology enablers for smart agriculture. Precision agriculture may reduce the burden on farmers and increase production [4].
Irrigation is essential to a productive garden or farm. The days of watering by hand, or of relying on a friend to water while you are on holiday or away on business, are long gone. Data mining can be built into an embedded system for automatic irrigation control so that farmers can water their crops regularly without manual intervention, and a proper implementation of such a mechanism may reduce the overall cost of farming [5]. The proposed system is an automatic irrigation device that switches the pump motor ON and OFF according to the moisture content detected in the soil. We use data mining algorithms to determine the dryness of the soil, to predict the water requirement for the next watering cycle, and to predict whether the pump should be on or off.
Irrigation needs vary with the crop and over time, and the water requirement of a crop changes significantly across its growth stages. Automating the irrigation system is essential because over- or under-irrigation harms the harvest and the quality of the yield. Effective irrigation ensures sustainable use of water and helps to replenish groundwater. The chapter is written so that any reader who wants to automate an irrigation system can understand and use the proposed approach. The benefits and challenges of automatic irrigation systems are discussed later in the chapter.

2 Motivation
As the population increases day by day while resources remain limited, the need for large-scale crop production has grown. The conventional methods adopted by farmers are not sufficient to meet this need [6]. This is where a proper irrigation system comes in. Any artificial method used to supply adequate water to agricultural fields falls under the term irrigation system [7].


Fig. 1 Technology enablers in smart agriculture

Greenhouse plants have benefitted most from smart irrigation, because human-operated management in a greenhouse is costly and prone to error [8]. Watering farms by hand has long been the practice in rural areas. ICT can ease the sustainable management of crop watering. Wireless sensor networks (WSN), combined with information and communication technology, offer wide scope for processing and analysing data and information [9].


Advances in data mining, machine learning and IoT can provide a solution to this and eliminate the manual intervention of farmers in this repetitive task. Rising demand for food and its shrinking supply make this significant for food production technology, and meeting the growing and changing demand for food is a major issue for human societies. Where water is scarce and falling groundwater reduces the water available in the soil, farmers can use smart irrigation. Some states are prone to drought, and under extreme climate conditions smart irrigation can play a vital role in crop yields. It also supports key farm management decisions with data-driven insight. Thus, smart irrigation can save water and enable people to make full use of the resources [10]. Figure 2 shows how the processes in smart irrigation take place using machine learning techniques. Lastly, the role of water in human life draws researchers towards smart irrigation. Thus, the need for smart irrigation arises for the following reasons:
• To save time and effort.
• To develop a system less affected by human errors.
• Use of less complex and quick infrastructure.
• Adequate use of resources.
• Saving the most precious element on the earth: water.
• Improving and enhancing the crop yields.
• Protecting the soil from over- or under-moisturization.

Fig. 2 Smart irrigation using machine learning

3 Contribution of the Chapter
In this chapter, we present a more advanced, automated and economically profitable model for smart irrigation in farms. The Random Forest algorithm is used to predict the parameters optimally, and K-NN and Naïve Bayes are used to validate the effectiveness of the system. A system that is automatic and mobile can help at every stage of irrigation, from the day the seeds are sown until the day the crop is harvested. Based on this discussion, the major contributions of the chapter are:
• Identification of the necessity for ML techniques in irrigation.
• A comprehensive and systematic review of the literature on machine learning and IoT techniques for smart irrigation, exploring the various parameters of irrigation.
• Description of various sensors used in smart irrigation.
• Development of an automatic and mobile system that helps farmers with irrigation and reduces the wastage of water.
• Future directions for the use of machine learning in irrigation.

4 Organization and Roadmap of the Article
The article considers the contributions of machine learning to smart irrigation and is organized as follows. Section 2 describes the motivation for and need of smart irrigation. Section 3 discusses the contribution of the chapter. Section 4 gives the roadmap of the article. Section 5 provides a detailed discussion of various ML and IoT techniques for smart irrigation. Section 6 describes the data set. Section 7 focuses on the methodology and the applied machine learning techniques. Section 8 validates the work through results and analysis. Section 9 covers the challenges faced in implementing the proposed work. Finally, Sect. 10 concludes the chapter with its future perspective.


5 Related Works
Ramya et al. [1] proposed a method that trains ensemble methods on collected real-world data. The ensemble machine learning methods are used to identify optimal decisions from the gathered data. The method reduces the need for human intervention by replacing traditional agricultural techniques with new ones. The irrigation technique also helps to save and properly manage water, and it enhances productivity. The developed model is low in cost, fully functional, and has an accuracy of 90%.
Thakur et al. [4] proposed a technique for estimating the water requirement of an agricultural field, which prevents the wastage of water. Farmers obtain the information through the cloud. The technique can also detect intrusions. The system uses soil moisture and intrusion detection as its primary parameters to irrigate the field automatically.
Munir et al. [6] proposed a methodology for smart irrigation organized in three modules. The first module deploys sensor nodes for parameters such as moisture, humidity, temperature and light, using DHT22, BH1750 and HL-69 sensors. The second module applies machine learning algorithms, with KNN used to train the model. The third module handles sending and receiving data through IoT servers. The model was implemented in Anaconda. It exploits KNN for decision-making and can improve latency.
Risheh et al. [8] focus on developing a smart and efficient irrigation system for greenhouse plants using machine learning and the Internet of Things. The method predicts soil moisture by installing four sensors at different layers of the soil. Experiments were performed on data sets collected from the soil of different fields. The performance of ANN was compared with that of SVR, and ANN performed better. The authors also proposed transfer learning to harness the processing power of neural networks, to make them compatible with IoT devices, and to improve performance when only a small amount of training data is available. The proposed IoT-enabled architecture is a complete solution for efficient irrigation in agriculture.
Abuzanouneh et al. [9] proposed an IoT and machine learning based technique, named IoTML-SIS, that senses the parameters of agricultural land and waters the field accordingly. Sensors are deployed for the considered parameters: humidity, moisture, light and temperature. The data are processed at a central cloud server, and decisions are made on the processed data. To classify the irrigation requirement of the land, the artificial algae algorithm is combined with the least squares support vector machine (LS-SVM); the artificial algae algorithm also tunes the parameters of the LS-SVM. The method reaches a highest accuracy of 97.5%.


Shivaprasad et al. [10] proposed a recommendation system enabled by the Internet of Things and trained with machine learning techniques. The system can use water efficiently for irrigation. Through IoT technology, sensors deployed under the layers of soil observe the environment for the specified task. The data gathered from these sensors are transferred to a cloud server for further analysis, where machine learning techniques are applied to make the results and decisions more efficient. The most attractive feature of the method is its feedback mechanism, which increases flexibility and durability. The methodology delivers near-best results in the area of smart irrigation.
Sami et al. [11] proposed a system to mitigate the unreliability of sensor nodes, which sometimes transmit incorrect data because of external factors. To overcome this problem, the authors proposed an LSTM-based neural network approach for smart irrigation. Real data from various agricultural fields were used to validate the system, with humidity, moisture and temperature as the prime parameters. The results show that the proposed system achieves high accuracy.
Ahmed et al. [12] developed a smart irrigation system that uses IoT technology to automate water management in agricultural fields. The methodology is organized in three steps: collecting data through sensing devices, sending the data to the cloud, and producing results that benefit the user. On the user side, an Android app is provided, through which users can control irrigation inputs by voice so that they are used in moderation. The system was tested and performed well on different metrics, namely latency and scalability.
Togneri et al. [13] present a methodology for predicting the level of moisture in soil. The approach was tested on real agricultural land in Brazil. LightGBM was found to be the best among linear regression, LSTM, random forest and StenGNN. A fully data-driven approach is provided for estimating irrigation water need.
Rodic et al. [14] proposed a novel LoRa-based system for sensing soil humidity that consumes very little power and is cost-effective. The use of a deep learning technique improves the performance of the system.
Vij et al. [15] proposed a system to overcome soil erosion, unnecessary watering and crop-related issues. They used wireless sensor networks for this purpose, and machine learning algorithms help to improve the efficiency of the overall system. Table 1 summarizes various machine learning techniques for smart irrigation.
Avanijal Agri Automation is helping farmers move to smart irrigation methods. Avanijal has proposed an automated system that leverages IoT and wireless technology to control irrigation motors and valves in the field. This low-cost system comprises a controller connected to an app, wireless sensor nodes (SNs) embedded in the ground, and repeaters that relay communication between the controller and the SNs.

Table 1 Various machine learning techniques for smart irrigation
Author | Parameter | Device used | Crop type | Crop region | Technology | Experimental/simulation-based | Significance
Ramya et al. [1] 2019 | Air temperature, soil temperature, humidity, soil moisture, UV radiation | DHT11, DS18B20, GUVA-S12SD, VH400, ESP32 | Agriculture farm | – | IoT, Support Vector Regression | Experimental | Monitoring the irrigation process through web interface
Thakur et al. [4] 2019 | Soil moisture | PIR sensor (HC-SR501) | General field | National Institute of Technology, Hamirpur, Himachal Pradesh | IoT | Experimental | Automatic irrigation of the land and identification of intrusion detection
Munir et al. [6] 2021 | Soil, humidity, temperature, light | AM2302 DHT11, HL-69, BH1750 FVI | General field | – | IoT, K-NN | Experimental | Improved latency
Risheh et al. [8] 2020 | Soil moisture | CherryPy server, LoRa P2P networks, LoRa gateway | Greenhouse | 2 different soil samples | IoT, Artificial Neural Network, Support Vector Regression | Experimental | A complete solution for smart irrigation
Abuzanouneh et al. [9] 2022 | Temperature, soil moisture, humidity | SIM808, HL-69, AM2302 DHT11, BH1750 FVI | – | – | Artificial Algae Algorithm, Least Square-Support Vector Machine | Simulation | An appropriate technique for smart irrigation
Shivaprasad et al. [10] 2022 | Soil moisture, temperature, air temperature, humidity | EC-1258, DS18B20, DHT11, DHT11 | – | Own dataset; the crop dataset from NIT Raipur | IoT and ML |  | Feedback enabled system for smart irrigation
Sami et al. [11] 2022 | Temperature, humidity | LM35, DHT-22, lab designed moisture sensor | Lemon | Gadhap, Sindh Province of Pakistan | Long-Short-Term-Memory-based Neural Network | Experimental | Prediction of key values related to the SIS system such as temperature, humidity and soil moisture along with their maintenance
Ahmed et al. [12] 2020 | Moisture, temperature, humidity, weather parameters | Pi 3 | – | – | IoT, Naïve Bayes, Decision Tree | Experimental | Identification of dependency between response and characteristics of the system
Togneri et al. [13] 2022 | Soil moisture | – | Soybeans, wheat, white oats, black oats, yellow maize, brown maize, pinto beans, coffee | Brazil | LightGBM, K-fold | Experimental | Full data-driven approaches for irrigation water need estimation
Rodic et al. [14] 2022 | Soil humidity, RSSI, SNR, soil temperature, a timestamp | I2C | – | – | Recurrent Neural Network, Support Vector Machine | Simulation | Shows relativity between soil humidity and RSSI value
Vij et al. [15] 2020 | Soil moisture, temperature, humidity, gas | Arduino Uno R3, Pi 3, DHT11, MQ2 | – | – | Support Vector Regression, Random Forest Regressor | Simulation | Developed a mobile and farming friendly system


Fig. 3 Smart irrigation technology

Farmers can set up their irrigation schedule on the app and monitor operations remotely afterwards, automating procedures that were previously done by hand. Avanijal also helps farmers adopt precision irrigation, in which the irrigation shown on the app is based on time, the volume of water available and soil moisture. The app helps farmers irrigate according to different soil types and their needs, and it conveys information continuously. Figure 3 shows the technologies enabling smart irrigation.

6 About the Dataset and Features
We found a dataset on Kaggle that describes a cotton crop. Its features are Moisture, Temperature and Pump (whether the pump should be switched ON or OFF). We also worked on a dataset for a rice crop, passed it through our model, and checked the model's performance on it. The data need to be accessible on a live website that presents real-time analytics and the time-stamped irrigation pattern, so that corrective action can easily be taken when any irregularity occurs. Line charts show the temperature and moisture readings against their time stamps, which makes the system easy to understand and quick to operate.
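A dataset with the three features named above can be read and split for training and testing roughly as follows. This is a minimal sketch under stated assumptions, not the chapter's code: the column names `moisture`, `temperature` and `pump`, the sample values, the 25% test fraction and the helper names `load_rows` and `train_test_split` are all hypothetical, since the chapter does not give the file layout of the Kaggle dataset.

```python
import csv
import io
import random

# Hypothetical sample rows for illustration only; the chapter does not list
# the actual file name, column names, units or values of the Kaggle dataset.
SAMPLE_CSV = """moisture,temperature,pump
12,31,1
18,29,1
22,33,1
25,30,1
60,22,0
65,21,0
72,19,0
80,22,0
"""


def load_rows(text):
    """Parse CSV text into ([moisture, temperature], pump) pairs."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        features = [float(record["moisture"]), float(record["temperature"])]
        rows.append((features, int(record["pump"])))
    return rows


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


rows = load_rows(SAMPLE_CSV)
train, test = train_test_split(rows)
print(len(train), "training rows,", len(test), "test rows")
```

Fixing the random seed keeps the split reproducible, which matters when the scores of several algorithms are compared on the same test rows.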

7 Methodology and Applied Algorithms
We apply machine learning algorithms such as KNN and Random Forest to determine how dry the soil is and compare their results. The data set includes the type of crop (whether it is cotton or mustard) together with the moisture level and temperature. The methodology and flow of the work are shown in Fig. 4. The main goal of this chapter is to build an intelligent irrigation system that acts on the moisture of the soil and helps decide whether to turn the water supply on or off. Irrigating the plants automatically in this way reduces human work in irrigation and saves water and the environment.
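The on/off decision described above can be sketched as a simple threshold rule. This is a minimal illustration and not the chapter's implementation: the function names, the moisture scale (percent) and the two thresholds `dry_below` and `wet_above` are assumptions. The gap between the two thresholds (hysteresis) is an addition made here so that the pump does not switch rapidly when readings hover around a single cut-off.

```python
def pump_decision(moisture_pct, pump_is_on, dry_below=30.0, wet_above=45.0):
    """Return True if the pump should run for this soil-moisture reading.

    dry_below and wet_above are assumed example thresholds in percent.
    """
    if moisture_pct < dry_below:
        return True           # soil is dry: switch (or keep) the pump on
    if moisture_pct > wet_above:
        return False          # soil is wet enough: switch (or keep) it off
    return pump_is_on         # in between: keep the current state


def simulate(readings, pump_is_on=False):
    """Apply the rule to a sequence of readings and record each pump state."""
    states = []
    for value in readings:
        pump_is_on = pump_decision(value, pump_is_on)
        states.append(pump_is_on)
    return states


print(simulate([50, 28, 35, 44, 46, 40]))
# prints [False, True, True, True, False, False]
```

In a learned system, the fixed thresholds would be replaced by the prediction of a trained classifier, while the surrounding control loop could stay the same.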

7.1 Data Processing
After collecting the data from the model, we apply machine learning algorithms to determine how dry the soil is, and we compare two of them: KNN and Naïve Bayes.
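Since Naïve Bayes is named here without further detail, the following sketch shows one common variant, Gaussian Naïve Bayes, applied to two numeric features. It is an illustration under assumptions: the chapter does not say which Naïve Bayes variant was used, and the toy readings, the labels `dry` and `wet`, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate a prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # The small constant keeps the variance positive for constant features.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(X), means, variances)
    return model


def predict_gaussian_nb(model, features):
    """Return the class with the highest log posterior (up to a constant)."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for x, m, var in zip(features, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical [moisture, temperature] readings, for illustration only.
X = [[12, 31], [18, 29], [22, 33], [15, 30],
     [65, 21], [72, 19], [80, 22], [68, 20]]
y = ["dry", "dry", "dry", "dry", "wet", "wet", "wet", "wet"]

nb_model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(nb_model, [20, 28]))
print(predict_gaussian_nb(nb_model, [75, 21]))
```

The "naïve" part is the assumption that the features are independent within each class, which is why the per-feature log densities are simply added.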

7.2 Machine Learning
• KNN (K-Nearest Neighbors) Algorithm
– The model for KNN is the entire training dataset (TD). When a prediction is required for an unseen data instance, KNN searches the TD for the k most similar instances.
– The prediction attribute of the most similar instances is summarized and returned as the prediction for the unseen instance.
– The similarity measure depends on the type of data. For real-valued data, the Euclidean distance can be used. For other types of data, such as categorical or binary data, the Hamming distance can be used.
– In regression problems, the average of the predicted attribute may be returned. In classification, the most prevalent class may be returned.
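The KNN steps listed above can be sketched in a few lines. This is a minimal illustration rather than the chapter's implementation: the readings and labels (`d`, `m`, `w` for dry, moist and wet soil) are hypothetical, `k=3` is an assumed default, and a Hamming distance (the count of mismatching positions) is included only to show the alternative measure for categorical or binary data.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two real-valued feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hamming(a, b):
    """Number of positions at which two categorical/binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train_X, train_y, query, k=3, distance=euclidean, regression=False):
    """Predict from the k training instances closest to the query."""
    neighbours = sorted(zip(train_X, train_y),
                        key=lambda pair: distance(pair[0], query))[:k]
    values = [value for _, value in neighbours]
    if regression:
        return sum(values) / len(values)        # average of the neighbours
    return Counter(values).most_common(1)[0][0]  # most prevalent class


# Hypothetical [moisture, temperature] readings, for illustration only.
train_X = [[10, 32], [15, 30], [20, 31],
           [40, 26], [45, 25], [50, 24],
           [75, 20], [80, 19], [85, 21]]
train_y = ["d", "d", "d", "m", "m", "m", "w", "w", "w"]

print(knn_predict(train_X, train_y, [18, 30]))  # prints d
print(knn_predict(train_X, train_y, [48, 25]))  # prints m
print(knn_predict(train_X, train_y, [78, 20]))  # prints w
```

Because the whole training set is the model, there is no training step; all the work happens at prediction time, when the distances are computed and sorted.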


Fig. 4 Proposed methodology

• Linear Regression (LR) Algorithm
– LR is a linear approach to modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variables).
– The case of one explanatory variable is called simple LR; for more than one, the process is called multiple LR. This term is distinct from multivariate LR, where multiple correlated dependent variables are predicted rather than a single scalar variable.
– In LR, the relationships are modelled using linear predictor functions whose unknown model parameters are estimated from the data. Such models are called linear models.
– Most commonly, the conditional mean of the response given the values of the explanatory variables is assumed to be an affine function of those values; less commonly, the conditional median or some other quantile is used. Like all forms of regression analysis, LR focuses on the conditional probability distribution of the response given the values of the predictors, rather than on the joint probability distribution of all the variables, which is the domain of multivariate analysis.
So far, we have predicted the dryness of the soil using the K nearest neighbour algorithm and linear regression. To estimate the quantity of water required and to predict the next watering cycle, a linear regression algorithm is implemented, and for pump on/off prediction an Artificial Neural Network algorithm is implemented.
• Random Forest (RF)
– RF is a popular ML algorithm that belongs to the supervised learning methods. It is used for both regression and classification problems.
– It is built on the concept of ensemble learning, a process of combining multiple classifiers to solve a complex problem and improve the performance of the model.
– RF is a classifier that contains a number of decision trees built on various subsets of the given dataset and takes the average to improve predictive accuracy on that dataset.
– Instead of relying on a single decision tree, it uses multiple trees for decision-making and decides by a majority vote among them.
– A greater number of trees in the forest leads to higher accuracy and helps prevent overfitting.
– Another important characteristic of the RF algorithm is that it can handle datasets containing continuous variables.
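The idea of training several trees on different subsets of the data and letting them vote can be sketched as follows. This is a deliberately simplified stand-in and not a full Random Forest: each "tree" is a one-split decision stump trained on a bootstrap sample, there is no random feature selection, and the data, the number of trees and all function names are assumptions made for illustration.

```python
import random
from collections import Counter


def majority(labels):
    """Return the most common label in a list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows):
    """Find the single (feature, threshold) split with the fewest errors."""
    best = None
    for f in range(len(rows[0][0])):
        for candidate, _ in rows:
            t = candidate[f]
            left = [label for x, label in rows if x[f] <= t]
            right = [label for x, label in rows if x[f] > t]
            if not left or not right:
                continue
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(v != l_lab for v in left) + sum(v != r_lab for v in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # every sampled row is identical, so no split exists
        lab = majority([label for _, label in rows])
        return (0, float("inf"), lab, lab)
    return best[1:]


def stump_predict(stump, x):
    """Classify x with one stump: (feature, threshold, left label, right label)."""
    f, t, l_lab, r_lab = stump
    return l_lab if x[f] <= t else r_lab


def fit_forest(rows, n_trees=15, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(rows) for _ in rows]) for _ in range(n_trees)]


def forest_predict(forest, x):
    """Majority vote over the predictions of all stumps."""
    return majority([stump_predict(s, x) for s in forest])


# Hypothetical ([moisture, temperature], pump) rows; 1 = pump on, 0 = pump off.
data = [([12, 31], 1), ([18, 29], 1), ([22, 33], 1), ([25, 30], 1),
        ([60, 22], 0), ([65, 21], 0), ([72, 19], 0), ([80, 22], 0)]

forest = fit_forest(data)
print(forest_predict(forest, [10, 32]), forest_predict(forest, [90, 18]))
```

An individual stump may place its threshold poorly because of the rows that happened to land in its sample; the vote across many stumps smooths out such cases, which is the intuition behind the ensemble.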

8 Result and Analysis
Figures 5 and 6 plot moisture against time for the training and test data sets, respectively. Table 2 shows the ML model scores for the LR, KNN and RF algorithms. The proposed automatic farm monitoring system is a sustainable solution to many existing and unforeseen problems, such as hunger caused by food shortages and economic crisis. ML algorithms such as Random Forest and K Nearest Neighbour provide classification and quantitative predictions of soil type, crop type and the amount of irrigation required by the crops. The comparative study of the algorithms indicates that RF provides an accuracy of 99.1%. The KNN algorithm is used to predict the dryness of the soil, categorizing the soil as dry (d), moist (m), or wet (w); it achieved an accuracy of up to 97.8%. A smart farming method could assist everyone from large-scale agricultural producers to small-scale farmers and even home garden owners. Communication among the devices ensures that all operations run smoothly.

Fig. 5 Time versus moisture (training set)

Fig. 6 Time versus moisture (test set)


Table 2 ML model scores
Model | Score
Linear regression | 0.7300429947845801
K nearest neighbour | 0.9782608695652174
Random forest | 0.999760004799904
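The chapter does not define how the scores in Table 2 were computed. A common convention, assumed here, is classification accuracy for classifiers and the coefficient of determination (R²) for regression models. The sketch below shows both, together with a least-squares line fit of the kind linear regression performs; the function names and the sample numbers are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - b * mean_x, b


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical labels: three of four soil classes predicted correctly.
print(accuracy(["d", "m", "w", "d"], ["d", "m", "d", "d"]))  # prints 0.75

# Hypothetical points that lie exactly on y = 1 + 2x.
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # prints 1.0 2.0
print(r2_score([3, 5, 7, 9], [a + b * x for x in [1, 2, 3, 4]]))  # prints 1.0
```

Under the accuracy reading, 45 correct predictions out of 46 would give 0.9782608695652174, the same value reported for K nearest neighbour, although the chapter does not state the size of its test set.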

9 Challenges in Proposed Work
• Extreme weather conditions can become a potential obstacle for the model.
• Prediction accuracy depends on the system being properly connected.
• The algorithms need to be trained on potentially vast amounts of data, including data from different regions of the continent.

10 Conclusion and Future Work
This chapter presents a more advanced, automated and economically profitable model for irrigation in farms. The Random Forest algorithm was found to be optimal for predicting the parameters. Combined with various sensing devices and a cloud-based implementation, this machine learning algorithm can greatly aid farmers struggling in the fields. A system that is automatic and mobile can assist at each phase of irrigation, from the day the seeds are sown until the day the crop is harvested. The use of tools for systematizing the predictions using collaborative learning is recommended as future work.
Acknowledgements This research received support/partial support from the Council of Science & Technology (CST)-DST, Uttar Pradesh, India, as mentioned in their letter number CST/D-1189, dated 29.08.2022. The authors extend their sincere gratitude to the CST-UP for their valuable support in conducting this research. Additionally, we would like to express our appreciation to the Editor and reviewers for their insightful suggestions and expertise, which significantly contributed to the advancement of this research and the enhancement of the manuscript.

References
1. Ramya, S., Swetha, A., Doraipandian, M.: IoT framework for smart irrigation using machine learning technique. J. Comput. Sci. 16(3), 355–363 (2020). https://doi.org/10.3844/jcssp.2020.355.363
2. Patel, G.S., Rai, A., Narayan Das, N., Singh, R.P. (eds.): Smart Agriculture: Emerging Pedagogies of Deep Learning, Machine Learning and Internet of Things, 1st edn. CRC Press (2021). https://doi.org/10.1201/b22627
3. Singh, R., Deshwal, A., Kumar, K.: Implementation of smart irrigation system using intelligent systems and machine learning approaches. In: Data Science and Innovations for Intelligent Systems, pp. 299–318. CRC Press (2021)
4. Thakur, D., Kumar, Y., Singh, V.: Smart irrigation and intrusions detection in agricultural fields using I.o.T. Procedia Comput. Sci. 167, 154–162 (2020). https://doi.org/10.1016/j.procs.2020.03.193
5. Durai, S.K.S., Divya Shamili, M.: Smart farming using machine learning and deep learning techniques. Decis. Anal. J. 3, 100041 (2022). ISSN 2772-6622. https://doi.org/10.1016/j.dajour.2022.100041
6. Safdar Munir, M., Bajwa, I.S., Ashraf, A., Anwar, W., Rashid, R.: Intelligent and smart irrigation system using edge computing and IoT, 6691571 (2021)
7. Janani, M., Jebakumar, R.: A study on smart irrigation using machine learning. Cell Cellular Life Sci. J. 4(2), 1–8 (2019)
8. Risheh, A., Jalili, A., Nazerfard, E.: Smart Irrigation IoT solution using transfer learning for neural networks. In: 2020 10th International Conference on Computer and Knowledge Engineering (ICCKE), pp. 342–349. IEEE (2020)
9. Mohammad Abuzanouneh, K.I., Al-Wesabi, F.N., Abdulrahman Albraikan, A., Al Duhayyim, M., Al-Shabi, M., Mustafa Hilal, A., Hamza, M.A., Zamani, A.S., Muthulakshmi, K.: Design of machine learning based smart irrigation system for precision agriculture. Comput. Mater. Contin. Tech. Sci. Press (2021). https://doi.org/10.32604/cmc.2022.022648
10. Shivaprasad, K.M., Madhu Chandra, G., Vidya, J.: Stainable automated CROP irrigation design system based on IOT and machine learning. Int. J. Mech. Eng. 7(3) (2022). ISSN 0974-5823
11. Sami, M., Khan, S.Q., Khurram, M., Farooq, M.U., Anjum, R., Aziz, S., Qureshi, R., Sadak, F.: A deep learning-based sensor modeling for smart irrigation system. Agronomy 12, 212 (2022). https://doi.org/10.3390/agronomy12010212
12. Abiodun Abioye, E., Hensel, O., Esau, T.J., Elijah, O., Zainal Abidin, M.S., Sylvester Ayobami, A., Yerima, O., Nasirahmadi, A.: Precision irrigation management using machine learning and digital farming solutions. Agric. Eng. 4(1), 70–103 (2022). https://doi.org/10.3390/agriengineering4010006
13. Togneri, R., Felipe dos Santos, D., Camponogara, G., Nagano, H., Custódio, G., Prati, R., Fernandes, S., Kamienski, C.: Soil moisture forecast for smart irrigation: the primetime for machine learning. Expert. Syst. Appl. 117653 (2022). ISSN 0957-4174. https://doi.org/10.1016/j.eswa.2022.117653
14. Dujić Rodić, L., Županović, T., Perković, T., Šolić, P., Rodrigues, J.J.P.C.: Machine learning and soil humidity sensing: signal strength approach. ACM Trans. Internet Technol. 22(2), Article 39, 1–21 (2022). https://doi.org/10.1145/3418207
15. Vij, A., Vijendra, S., Jain, A., Bajaj, S., Bassi, A., Sharma, A.: Procedia Comput. Sci. 167, 1250–1257 (2020). ISSN 1877-0509. https://doi.org/10.1016/j.procs.2020.03.440

Artificial Intelligence Based Plant Disease Detection

Vinay Rishiwal, Rashmi Chaudhry, Mano Yadav, Kaustubh Ranjan Singh, and Preeti Yadav

Abstract Plant disease is a key problem for farmers, leading to lower income and reduced yield. Pest-affected crops also lower a country's agricultural production. The traditional approach, in which farmers and experts detect and recognize plant diseases with the naked eye, is time consuming, expensive and error prone. In this chapter, we therefore use deep convolutional neural networks for leaf image classification. A CNN model is used to distinguish healthy leaves from diseased ones. The developed model can identify seven types of plant disease present in the leaf, along with healthy leaves. The dataset used in the study was collected in a controlled environment and consists of 8,685 leaf images. These images are used as input for training and validating the CNN model, which demonstrates good performance in classifying the plant diseases.

1 Introduction

Plant disease detection is a vital problem because the earnings of a large section of the country depend on agriculture. Fruits and vegetables are key elements of farming. With conventional methods, farmers and experts wait for symptoms to appear, by which stage the disease has already caused considerable harm. If treatment and water are not given on time, the loss becomes unrecoverable, resulting in farmers' poverty. Synthetic pesticides are used in large quantities in farming, as most farmers depend on them. Applying machine learning to plant disease detection offers wide scope for research [1]. Deep learning mechanisms for plant disease detection have provided various solutions for dealing with anomalies [2]. Several machine learning based techniques have been useful at different stages of farming, such as irrigation and seed selection, while considering soil quality and water quantity along with weather conditions. Drones are now also used for monitoring crop growth and detecting disease. In the same direction, this chapter proposes an automated system based on convolutional neural networks (CNNs), which are used for image recognition and solve the problem quickly and accurately. CNN was selected over other methods because its architecture is designed for processing images [3] as structured arrays. A CNN is a type of deep neural network (DNN) that can classify images in computer vision tasks and has also proved useful for classifying text in natural language processing. By developing a disease detection model with a CNN, a low-cost and more accurate solution is provided for plant disease identification [4]. The goal of a neural network is to identify and solve problems in a way similar to the human brain, although some of these problems are highly complex. Today, projects based on neural networks have millions of neural units and even more connections among them. As brain research advances, neural network models are also improving. The model discussed here is useful for early detection of leaf diseases with more accurate classification [5].

V. Rishiwal · P. Yadav
MJP Rohilkhand University, Bareilly, India
R. Chaudhry (B)
Netaji Subhas University of Technology, Delhi, India
M. Yadav
Bareilly College, Bareilly, India
K. R. Singh
Delhi Technological University, Delhi, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Rishiwal et al. (eds.), Towards the Integration of IoT, Cloud and Big Data, Studies in Big Data 137, https://doi.org/10.1007/978-981-99-6034-7_5

2 Motivation

India ranks second in the world in agriculture, and its economy depends heavily on it. Approximately 200 acres of land in India is used for the production of various crops, which Indians consume daily as meals [6]. Most crops do not deliver the expected yield because of diseases in leaves, fruits and other plant parts, which cannot be ignored as they may cause major productivity loss [7]. Dealing with the challenges of plant disease detection has therefore become a broad area for researchers. Proper methodologies for automating plant disease detection are the need of the present era [8]. Plant diseases threaten world food security and can have catastrophic consequences for small-scale farmers whose livelihood depends solely on healthy crops. Timely and accurate detection of plant diseases is one of the vital components of precision agriculture. It also helps reduce the wastage of funds and other resources used during the plant cycle [9]. Continuous monitoring of leaves with such algorithms helps detect early signs of disease and prevent it with proper care.


Monitoring also takes into account the changing environment along with the symptoms, and involves complex analysis of that data [10]. The authors of [11] designed an automated system that identifies plant diseases from crop appearance and visual symptoms, which benefits untrained gardeners as well as trained professionals in diagnosis. Various methods for detecting plant disease have been identified [12]. Applying machine learning techniques improves accuracy and reduces complexity [13]. In machine learning and cognitive science, neural networks are a computational approach, adopted in computer science and several other fields, in which a large collection of neural units (artificial neurons) mimics the way a biological brain solves problems with neurons connected by axons [14].

3 Contribution of the Chapter

In this chapter, we present a more advanced, automated and economical model for detecting plant diseases. With a CNN, it is possible to distinguish images of healthy and diseased plant leaves. The major contributions of the chapter are:

• Identification of the necessity of ML techniques in plant disease detection.
• A comprehensive literature review of machine learning based approaches to plant disease detection.
• Description of various methods and ML algorithms for detecting disease in plants.
• Development of an ML-based approach to detect diseases in plants.
• Future directions for increasing machine learning efficiency in plant disease detection.

4 Organization of the Chapter

The chapter is organized as follows. Section 2 discusses the motivation for applying machine learning to plant disease detection. Section 3 lists the contributions of the chapter. Section 4 gives the roadmap of the chapter. Section 5 presents a detailed discussion of several ML techniques, their methods and comparisons for plant disease detection. Section 6 covers the issues and challenges in this area. Section 7 describes the methodology and the applied machine learning techniques, along with the flow of the model and image preprocessing. Section 8 briefly describes the performance metric considered. Section 9 validates the work through results and analysis. Finally, Sect. 10 concludes the chapter and outlines its future scope. Figure 1 shows the roadmap of the article.


Fig. 1 Roadmap of the article


5 Literature Survey

Bashish et al. [15] proposed a framework for identifying diseases in plant leaves and stems. Manual observation by experts is not always possible because it can be costly, especially in developing countries. Image processing techniques can be reliable, faster and cheaper than the traditional approach, and they have been observed to be highly accurate as well. The authors used the k-means technique to cluster the images. The model was trained with a neural network and achieved an accuracy of 93.00%.

Gavhale et al. [16] presented an approach to protect plants from various diseases due to pesticides, microorganisms and bacteria. The authors address the problem by analyzing plant images with classification algorithms. The k-means approach is used to segment the data. The proposed framework involves four steps: image preprocessing, segmentation using k-means, feature extraction through the statistical gray-level co-occurrence matrix (GLCM), and finally classification with an SVM. The approach supports accurate recommendations on pesticide use to prevent plant diseases.

Jadhav et al. [17] discuss a method for measuring the severity of a plant disease so that the plant can be treated in time, which may save plants and crops from being destroyed. The approach also automates the quantification of the infected area. This automated technique is implemented with the k-means algorithm, using soybean leaves as the sample data set. The approach was compared with the traditional manual approach and found to outperform it in reliability and precision.

Gaikwad et al. [18] proposed automatic disease detection in the pomegranate leaf. The diagnosis is made at an early stage so that the plant can be protected from damage. The method can diagnose several diseases of the pomegranate plant, such as rotten fruit, bacterial blight, and spots such as Cercospora and Alternaria. Classification techniques have been used to segment the images.

Waghmare et al. [19] presented a technique for detecting disease in grape leaves. The leaf is segmented and analyzed with high-pass filters to identify the infected area. Fractal-based texture features are extracted and classified with an SVM. The approach focuses on two diseases of the grape plant, black rot and downy mildew, and achieves an accuracy of 96.6%.

Qin et al. [20] proposed a diagnostic approach for disease detection in alfalfa leaves through pattern recognition and image processing algorithms. K-median, K-means and fuzzy C-means were used to segment the data. For classifying the images, approaches such as regression trees, linear discriminant analysis, logistic regression and naive Bayes were applied. A total of 129 texture features were extracted. Finally, models were built with three machine learning approaches: random forest, support vector machine and k-nearest neighbour. The results show that the proposed approach achieves very high accuracy.


Liu et al. [21] discussed a method for detecting diseases in apple leaves based on image processing and a deep convolutional neural network (DCNN). A large data set was generated from 13,689 images of apple leaves from different apple trees, and the model was trained using the DCNN. It achieves a very high accuracy of 97.62% and a fast convergence rate, and it demonstrates the scope of deep learning in image processing.

Ramesh et al. [22] used the random forest technique to separate diseased leaves from healthy ones. The histogram of oriented gradients (HOG) was used for feature extraction. The proposed approach achieves an accuracy of 70.14%.

Pantazi et al. [23] proposed a methodology for detecting plant disease using one-class classifiers. Vine leaves were used as samples to train the model. The approach was tested on 46 leaves of different plants and achieves an accuracy of 95.6%.

Yogeshwari et al. [24] presented an approach for detecting plant disease using filtering and enhancement techniques along with machine learning. A 2D AADF filter was used to reduce noise, and the AMA technique was used for enhancement. Fuzzy C-means was applied to segment the input images, PCA was applied to reduce dimensionality, and a deep convolutional approach was used to classify the data. This approach provides a very high accuracy of 97.43%.

Geetha et al. [25] proposed an approach for diagnosing disease in tomato plants. It follows four steps: preprocessing, clustering, feature extraction and finally classification. Preprocessing removes noise from the data, and clustering is needed to detect the infected region on the leaf. KNN is applied to resolve the issue of regression and classification. The method can detect disease in tomato plants on the basis of color and texture.

Bhise et al. [26] proposed an approach for smart farming and disease detection in crops using the Internet of Things and machine learning techniques. The proposed system uses various sensors to measure parameters such as temperature, humidity and moisture, which may enhance the crop output. The Plant Village dataset of 54,306 images of 13 types of plants was used.

Xian et al. [27] proposed a classification-based approach for diagnosing diseases in tomato plant leaves. The ELM, SVM and decision tree techniques were implemented with image processing. The proposed approach achieves an accuracy of 84.94%.

Alatawi et al. [28] presented a VGG-16 model based on the CNN technique. This approach allows farmers to take timely decisions to prevent damage once a plant has been infected. A collection of 15,915 plant leaf images from the Plant Village dataset was used. The proposed approach achieves 95.2% accuracy.

Badiger et al. [29] applied k-means and SVM together with image processing to detect plant illnesses. The texture of the leaves was analyzed with PNN, hyperspectral imaging and RGB imaging techniques. Several datasets were used to classify different types of diseases such as leaf spot, anthracnose, Alternaria, Cercospora and bacterial blight, along with leaves of healthy plants. The proposed approach achieves an accuracy of 96%.

Harakannanavar et al. [30] proposed a k-means clustering based approach to diagnose diseases in tomato plants. Histogram equalization was applied to improve the quality of the data samples. SVM, KNN and CNN were used to classify the data and achieve accuracies of 88%, 97% and 99.6% respectively.

Ramesh et al. [31] use the random forest technique to separate healthy leaves from infected ones. The histogram of oriented gradients was employed to extract the features. The approach was compared with Gaussian naive Bayes, SVM, logistic regression and linear discriminant analysis on 160 papaya leaf samples.

Singh et al. [32] proposed a CNN based model for plant disease detection. Samples of apple, potato, corn, rice and tomato leaves were collected, and a CNN was applied to extract features from these samples. A Bayesian optimized support vector machine is used to classify the data. HOG is used to preprocess the data, and binary PSO is used for selecting hybrid features. The accuracy of the proposed approach is 96.1%.

Table 1 compares various machine learning techniques for plant disease detection. It lists the techniques applied to different kinds of crops and the machine learning methods that can be used to improve the accuracy of plant disease diagnosis.

6 Issues and Challenges

Farmers experience the following issues during the lifecycle of a crop:

• Vague detection of various infections for heterogeneous diseases.
• Sensitivity to the region of interest (ROI).
• Complexity in real-life applications.
• Overlapping in stems.
• Identification of accurate segments for the area of interest.
• Hybrid symptom detection.
• Homogeneous symptom visibility.

7 Methodology

A convolutional neural network (CNN or ConvNet) is a deep learning mechanism that takes an input image, assigns importance (learnable weights) to different elements of the image, and distinguishes them from one another. Compared with other classification methods, the preprocessing required by a ConvNet is small. In basic methods the filters are engineered by hand, whereas ConvNets learn these filters/features through adequate training. Figure 2 shows the working of the CNN model.

Table 1 Machine learning techniques for plant disease detection

| Author | Year | Method | ML technique | Comparison algorithms | Dataset | Performance result |
|---|---|---|---|---|---|---|
| Bashish et al. [15] | 2010 | Clustering, classification | K-means, neural network | - | A set of leaf images taken from the Al-Ghor area in Jordan | Accuracy 93% |
| Gavhale et al. [16] | 2014 | Classification | SVM | SVMRBF, SVMPOLY | Self-generated set of citrus leaves | SVMRBF 96%, SVMPOLY 95% |
| Jadhav et al. [17] | 2015 | Clustering | K-means | - | Soybean leaf | Bacterial leaf blight (DS = 23.10%, Grade = 5), Septoria brown leaf spot (DS = 26.20%, Grade = 7), Bean leaf pod mottle (DS = 44.16%, Grade = 7) |
| Gaikwad et al. [18] | 2016 | Classification | SVM | - | Pomegranate leaf | 80% |
| Waghmare et al. [19] | 2016 | Classification | SVM | - | Grape leaf | 96.6% |
| Qin et al. [20] | 2016 | Clustering, classification | K-means, fuzzy C-means, K-median, SVM, K-NN | - | Alfalfa leaf dataset | - |
| Liu et al. [21] | 2017 | Classification | Deep CNN | K-NN, naive Bayes, SVM, BPNN | Dataset of 13,689 images of diseased apple leaves | Accuracy 97.62% |
| Ramesh et al. [22] | 2018 | Classification | Random forest | Logistic regression, SVM, k-NN, naive Bayes | Real leaf images | Accuracy 70.14% |
| Pantazi et al. [23] | 2019 | Classification | SVM | - | Vine leaf | Accuracy 95.6% |
| Yogeshwari et al. [24] | 2020 | Filtering and enhancement technique | Deep convolutional neural network | - | Leaf image data | High accuracy 97.43% |
| Geetha et al. [25] | 2020 | Classification, regression | k-NN | - | Leaf images of various tomato plants | - |
| Bhise et al. [26] | 2022 | Classification | CNN | - | Plant Village dataset of 54,306 images of 13 types of plants | - |
| Xian et al. [27] | 2021 | Classification | ELM | SVM, decision tree | Dataset of tomato leaves from Kaggle | Accuracy 84.94% |
| Alatawi et al. [28] | 2022 | Classification | CNN | - | 15,915 plant leaf images from the Plant Village dataset | High accuracy 95.2% with a loss of 0.4418 |
| Badiger et al. [29] | 2022 | Clustering | K-means, SVM | Hyperspectral imaging, texture analysis, RGB imaging and PNN | Different types of diseases such as leaf spot, anthracnose, Alternaria, Cercospora, bacterial blights, and leaves of healthy plants | Accuracy 96% |
| Harakannanavar et al. [30] | 2022 | Classification | SVM, CNN, k-NN | SVM, CNN, k-NN | Tomato leaves | SVM 88%, k-NN 97%, CNN 99.6% |
| Ramesh et al. [31] | 2022 | Classification | Random forest | Linear discriminant analysis, Gaussian naive Bayes, SVM, logistic regression | 160 images of papaya leaves | Accuracy 70.14% |
| Singh et al. [32] | 2022 | Feature extraction | CNN, binary PSO | Bayesian optimized support vector machine, hybrid feature-based random forest classifier | Corn, apple, tomato, rice plants, and potato leaves | Accuracy 96.1% |

Fig. 2 Working of CNN model

Step 1: (a) Convolution Operation

The first phase of our approach is the convolution operation. Feature detectors, which act as the filters of the neural network, are examined here. The feature maps they produce are also discussed, along with how the filter parameters are tuned so that patterns are recognized as precisely as possible.

(b) ReLU Layer

The Rectified Linear Unit (ReLU) layer forms the second part of the architecture. It applies the function max(0, x) to every value of the feature map, replacing negative values with zero. This introduces non-linearity at little computational cost, which matters because the computation cost grows as the CNN becomes larger and more ReLU layers are added. Figure 3 depicts the ReLU layer function.

Step 2: Pooling

The pooling layer reduces the dimensions of the feature map, which in turn reduces the number of parameters to be learnt and the amount of computation. It summarizes the features present in each region of the feature map produced by the convolution layer. Figure 4 shows the functionality of max pooling.
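The convolution and ReLU operations of Step 1 can be illustrated with a minimal sketch in plain Python. The 4 × 4 patch and the 2 × 2 edge kernel below are hypothetical values chosen for illustration, not filters learnt by the chapter's model. As in most CNN libraries, the kernel is applied without flipping (cross-correlation).

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide the kernel over the image (stride 1, no padding) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map: Matrix) -> Matrix:
    """Replace every negative activation with zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# Hypothetical 4x4 grayscale patch with a vertical edge in the middle.
patch = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
# Hand-picked 2x2 feature detector that responds to dark-to-bright transitions.
edge_kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d_valid(patch, edge_kernel)  # strong response at the edge column
activated = relu(feature_map)                   # negative responses would be zeroed
```

With the opposite kernel ([[1, -1], [1, -1]]) the edge response becomes negative and ReLU suppresses it, which is how a detector is made sensitive to one edge direction only.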


Fig. 3 ReLU layer function

Fig. 4 Max pooling
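The max pooling operation of Step 2 can be sketched as follows. The 2 × 2 window with stride 2 is an assumption (a common default), and the feature map values are made up for illustration.

```python
from typing import List

Matrix = List[List[float]]


def max_pool2d(feature_map: Matrix, size: int = 2, stride: int = 2) -> Matrix:
    """Keep only the largest value inside each size x size window."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            max(
                feature_map[r * stride + i][c * stride + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Hypothetical 4x4 feature map; pooling halves each dimension.
fmap = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
]
pooled = max_pool2d(fmap)
```

Each output value summarizes one 2 × 2 region, so the 4 × 4 map shrinks to 2 × 2 while the strongest activations are kept.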

Step 3: Flattening

In the flattening step, all 2D arrays of the pooled feature map are converted into a single linear vector. The resulting flattened vector is given as input to the fully connected layer in order to classify the images, as shown in Fig. 5.

Fig. 5 Flattening

Step 4: Full Connection

In this step, everything is put together. A fully connected layer is a type of feed-forward neural network. The last few layers of the model are fully connected, and their input is the output of the preceding convolution and pooling layers. The softmax step can be seen as a generalization of the logistic function: it takes a score vector $x \in \mathbb{R}^n$ as input and, at the end of the architecture, outputs a probability vector $p \in \mathbb{R}^n$, defined as:

$$p = \begin{pmatrix} p_1 \\ \vdots \\ p_n \end{pmatrix} \quad \text{where} \quad p_i = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}} \tag{1}$$
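A minimal sketch of the softmax function in Eq. (1) is given below. The score vector is a made-up example. Subtracting the largest score before exponentiating is a standard numerical-stability step that leaves the result unchanged; it is an addition of this sketch and is not described in the chapter.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Turn a score vector x into probabilities p_i = exp(x_i) / sum_j exp(x_j)."""
    shift = max(scores)  # avoids overflow for large scores; the ratio is unaffected
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three classes.
probs = softmax([2.0, 1.0, 0.1])
```

The outputs are positive, sum to one, and preserve the ordering of the scores, so the class with the largest score receives the largest probability.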

Convolution layer's parameter compatibility: With $I$ the length of the input volume, $F$ the filter length, $P_{\text{start}}$ and $P_{\text{end}}$ the zero padding applied at the start and at the end, and $S$ the stride, the output size $O$ of the feature map along that dimension is:

$$O = \frac{I - F + P_{\text{start}} + P_{\text{end}}}{S} + 1 \tag{2}$$
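Equation (2) can be checked with a small helper. The sketch assumes floor division when the stride does not divide the span evenly, which is how common frameworks behave but is not stated in the chapter. The example sizes reuse the 256-pixel images and 3 × 3 kernels mentioned later in the chapter; the padding values are assumptions.

```python
def conv_output_size(i: int, f: int, s: int = 1, p_start: int = 0, p_end: int = 0) -> int:
    """Output length O = (I - F + P_start + P_end) / S + 1 along one dimension."""
    span = i - f + p_start + p_end
    if span < 0:
        raise ValueError("filter is larger than the padded input")
    return span // s + 1  # floor division is an assumption of this sketch


same_padding = conv_output_size(256, 3, s=1, p_start=1, p_end=1)  # size preserved
no_padding = conv_output_size(256, 3)                             # shrinks by F - 1
pooled = conv_output_size(256, 2, s=2)                            # 2x2 pooling, stride 2
```

One pixel of padding on each side keeps a 3 × 3 convolution size-preserving, while a 2 × 2 pooling window with stride 2 halves the dimension.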

Receptive field: The receptive field at layer $k$ is the area, denoted $R_k \times R_k$, of the input that each pixel of the $k$-th activation map can see. Calling $F_j$ the filter size of layer $j$ and $S_i$ the stride of layer $i$, with the convention $S_0 = 1$, the receptive field at layer $k$ is computed as:

$$R_k = 1 + \sum_{j=1}^{k} (F_j - 1) \prod_{i=0}^{j-1} S_i \tag{3}$$
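Equation (3) can be evaluated layer by layer by keeping a running product of the strides. The layer stacks used below are hypothetical examples, not the chapter's architecture.

```python
from typing import Sequence


def receptive_field(filter_sizes: Sequence[int], strides: Sequence[int]) -> int:
    """R_k = 1 + sum_{j=1..k} (F_j - 1) * prod_{i=0..j-1} S_i, with S_0 = 1."""
    if len(filter_sizes) != len(strides):
        raise ValueError("one stride is required per layer")
    r = 1
    jump = 1  # running product S_0 * S_1 * ... * S_{j-1}, starting from S_0 = 1
    for f, s in zip(filter_sizes, strides):
        r += (f - 1) * jump
        jump *= s
    return r


two_convs = receptive_field([3, 3], [1, 1])               # two stacked 3x3 convolutions
conv_pool_conv = receptive_field([3, 2, 3], [1, 2, 1])    # conv, 2x2 pool (stride 2), conv
```

Two stacked 3 × 3 convolutions see a 5 × 5 input region, and inserting a stride-2 pooling layer makes every later layer widen the receptive field twice as fast.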


7.1 Advantages of Using Convolution Neural Network (CNN)

• Without explicit preprocessing, a CNN can learn filters automatically, which helps in processing the input data and extracting useful information from it.
• A CNN captures spatial features of the image, that is, the arrangement of pixels and the relationships between them. This makes it possible to identify an object and its location relative to other objects in the image.
• A CNN also uses parameter sharing, in which a single filter is applied to different parts of the input to produce a feature map.

Figure 6 provides the summary obtained for the CNN model on the given dataset.
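The effect of parameter sharing can be made concrete by counting weights. The sketch below compares a convolution layer with 32 filters of size 3 × 3 (the configuration described in Sect. 7.2) against a fully connected layer with 32 units applied to a 256 × 256 image (the image size given in Sect. 7.3). Three input channels (RGB) are an assumption, and the dense layer is a hypothetical alternative used only for comparison.

```python
def conv_params(kernel: int, in_channels: int, filters: int) -> int:
    """Weights plus one bias per filter; the same filter is reused at every position."""
    return (kernel * kernel * in_channels + 1) * filters


def dense_params(inputs: int, units: int) -> int:
    """Weights plus one bias per unit; every input pixel gets its own weight."""
    return (inputs + 1) * units


conv_count = conv_params(kernel=3, in_channels=3, filters=32)
dense_count = dense_params(inputs=256 * 256 * 3, units=32)
```

Under these assumptions the convolution layer needs fewer than a thousand parameters, whereas the dense layer needs more than six million, because the convolution filter does not depend on the image size.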

7.2 Flow of the Models

• After loading the dataset, we converted each image to an array using the img_to_array function from TensorFlow.
• Using Scikit-Learn's LabelBinarizer, each image label was converted to a binary vector. The fitted instance was then saved using the pickle module in Python.
• We scaled the pixel values from [0, 255] to [0, 1] as a further preprocessing step.
• We split the dataset into two parts: a training set (80%) and a test set (20%).
• We augmented the images by creating a generator object that performs rotations, shifts, flips, crops and shears on the image dataset.
• We began by constructing a CONV => RELU => POOL block. The CONV layer has 32 filters with a 3 × 3 kernel and ReLU (Rectified Linear Unit) activation. We use batch normalization, max pooling and 25% dropout (a rate of 0.25).
• We then created two sets of (CONV => RELU) * 2 => POOL blocks, followed by a single set of FC => RELU layers.
• We used the Keras Adam optimizer for our model and specified the number of epochs to train for, which is 25 in this project.
• Finally, graphs were created for training and validation accuracy as well as training and validation loss.

Figure 7 shows the basic flow of the model.
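The preprocessing steps in the list above (label encoding, pixel scaling and the 80/20 split) can be sketched in plain Python. This is a simplified stand-in written for illustration: the helper names are hypothetical, and the chapter itself uses Scikit-Learn's LabelBinarizer and TensorFlow utilities, whose exact calls are not shown in the source.

```python
import random
from typing import List, Sequence, Tuple


def scale_pixels(image: List[List[int]]) -> List[List[float]]:
    """Map pixel intensities from [0, 255] to [0, 1]."""
    return [[px / 255.0 for px in row] for row in image]


def one_hot(labels: Sequence[str]) -> Tuple[List[str], List[List[int]]]:
    """Encode each label as a binary vector, one position per class (sorted by name)."""
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}
    encoded = [[1 if index[lab] == i else 0 for i in range(len(classes))] for lab in labels]
    return classes, encoded


def split_train_test(items: Sequence, test_fraction: float = 0.2, seed: int = 42):
    """Shuffle reproducibly, then hold out test_fraction of the items for testing."""
    order = list(range(len(items)))
    random.Random(seed).shuffle(order)
    n_test = int(round(len(items) * test_fraction))
    test = [items[i] for i in order[:n_test]]
    train = [items[i] for i in order[n_test:]]
    return train, test


pixels = scale_pixels([[0, 255], [51, 102]])
classes, encoded = one_hot(["potato_healthy", "potato_early_blight", "potato_healthy"])
train, test = split_train_test(list(range(10)))  # 8 training items, 2 test items
```

Fixing the random seed keeps the split reproducible between runs, which makes accuracy figures comparable when the network parameters are tuned.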

7.3 Image Preprocessing

Image augmentation is crucial in building a successful image classifier. Although a data set may comprise anywhere from a few hundred to a few thousand training samples, that variety may not be enough to build a reliable model. Image augmentation methods include rescaling the image, rotating it by an angle, and shifting it horizontally or vertically. These additions expand the relevant data available in the database.

Fig. 6 Summary obtained for CNN model on the given dataset

Each image in the Plant Village database is 256 by 256 pixels in size. The Keras deep learning library is used for data processing and image augmentation. The following augmentation options are used during training:

• Rotation: Randomly rotates a training image over a range of angles.
• Brightness: Supplies images with varying brightness during the training phase, which helps the model adapt to changes in illumination.
• Shear: Varies the shearing angle.
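The idea behind these augmentation options can be illustrated with a few plain-Python transforms on a tiny grayscale image. This is a simplified sketch under stated assumptions: rotation is restricted to 90 degrees, a horizontal flip is included because Sect. 7.2 mentions flips, and shear is omitted. The chapter's generator object performs these transforms with random parameters, and its exact configuration is not given in the source.

```python
from typing import List

Image = List[List[int]]


def adjust_brightness(image: Image, factor: float) -> Image:
    """Scale every pixel by factor and clip the result to the valid range [0, 255]."""
    return [[min(255, max(0, round(px * factor))) for px in row] for row in image]


def rotate90(image: Image) -> Image:
    """Rotate the image by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def flip_horizontal(image: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


# Hypothetical 2x2 image; each transform yields an extra training sample.
original = [[100, 200], [30, 40]]
augmented = [
    adjust_brightness(original, 1.5),
    adjust_brightness(original, 0.5),
    rotate90(original),
    flip_horizontal(original),
]
```

Every transform keeps the disease label of the original leaf image, so the training set grows without any new labelling effort.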


Fig. 7 Basic flow of the model

8 Performance Metrics

Accuracy is a typical performance metric for classification algorithms. It is defined as the number of correct predictions divided by the total number of predictions made. To improve accuracy, we adjust the number of epochs. The findings in this section are based on training with the complete dataset, which includes both original and augmented images. Results obtained when the network is trained with only the original images are not examined, because convolutional networks are known to learn features better when trained on larger data sets. After fine-tuning the network parameters, an overall accuracy of 97.00% was attained.

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \tag{4}$$

The correct predictions (true positives, TP, and true negatives, TN) are in the numerator, while all predictions made by the algorithm, right as well as wrong, are in the denominator; FP and FN denote false positives and false negatives. Figure 8 shows the epoch counts during the process.
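Equation (4) translates directly into code. The confusion counts in the example are hypothetical numbers chosen so that the result matches the 97% reported in the chapter; they are not the chapter's actual counts.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Accuracy = (TP + TN) / (TP + FP + FN + TN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one prediction is required")
    return (tp + tn) / total


# Hypothetical counts for 200 test images: 194 correct, 6 wrong.
score = accuracy(tp=90, tn=104, fp=4, fn=2)
```

Accuracy treats both kinds of error alike, so it is most informative when the classes are reasonably balanced.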

9 Result Analysis

The accuracy after every epoch is shown in Fig. 9. After execution of the algorithm, an overall accuracy of 97.00% was achieved. Figures 10 and 11 show the accuracy and the loss, respectively, against the number of epochs.

Example test cases:

(1) For the image in Fig. 12, our model predicted with a probability of 0.9976 that the leaf has the disease potato_early_blight.


Fig. 8 Epoch count

Fig. 9 Accuracy after every epoch


Fig. 10 No of EPOCHS versus ACCURACY

Fig. 11 No of EPOCHS versus LOSS

(2) For the image in Fig. 13, our model predicted with a probability of 0.97 that the leaf is potato_healthy.
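A prediction such as the ones above is obtained by taking the class with the highest softmax probability. The sketch below shows this decoding step. Only potato_early_blight, potato_healthy and the value 0.9976 come from the chapter; the third class name and the remaining probabilities are hypothetical, chosen so that the vector sums to one.

```python
from typing import List, Tuple


def decode_prediction(probabilities: List[float], class_names: List[str]) -> Tuple[str, float]:
    """Return the most probable class name together with its probability."""
    if len(probabilities) != len(class_names):
        raise ValueError("one probability is required per class")
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return class_names[best], probabilities[best]


# Hypothetical three-class output; the trained model distinguishes more classes.
class_names = ["potato_early_blight", "potato_healthy", "potato_late_blight"]
label, confidence = decode_prediction([0.9976, 0.0014, 0.0010], class_names)
```

The order of class_names must match the order used when the labels were encoded for training, which is why the fitted label encoder is saved alongside the model.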

Fig. 12 Potato leaf with disease

10 Conclusion and Future Scope

Protecting plants during the farming season is a very challenging task. It depends largely on deep know-how and best practices for the crop being sown, and on the timely identification of factors such as pests, pathogens, weeds, temperature and humidity. In this chapter, a CNN based methodology has been used to detect plant diseases from images of healthy and diseased plant leaves. After fine-tuning the network parameters, the model gave an accuracy of 97%. In future, the system can be extended to real-time video input that gives quick and accurate results, increasing the chances of detecting plant diseases and curing them at the right time. An intelligent system that addresses recognized diseases is another element that may be added. According to studies, treating plant diseases can boost yields by roughly 55%. We will also add more images to our database and try to identify more plant diseases with good accuracy.


Fig. 13 Healthy potato leaf

References 1. Lee, S.H., Chan, C.S., Mayo, S.J., Remagnino, P.: How deep learning extracts and learns leaf features for plant classification. Pattern Recognit. 71, 1–13 (2017). ISSN 0031–3203. https:// doi.org/10.1016/j.patcog.2017.05.015 2. Fuentes, A., Yoon, S., Park, D.: Deep Learning-Based Techniques for Plant Diseases Recognition in Real-Field Scenarios (2020). https://doi.org/10.1007/978-3-030-40605-9_1 3. Hassan, S.M., Maji, A.K., Jasiński, M., Leonowicz, Z., Jasińska, E.: Identification of plant-leaf diseases using CNN and transfer-learning approach. Electron. 10(12), 1388 (2021). https://doi. org/10.3390/electronics10121388 4. Srivastava, P., Mishra, K., Awasthi, V., Sahu, V., Kumar, P.: Plant disease detection using convolutional neural network. Int. J. Adv. Res. 09, 691–698 (2021). https://doi.org/10.21474/ IJAR01/12346 5. Muthukannan, K., Latha, P., Selvi, R., Nisha, P.: Classification of diseased plant leaves using neural network algorithms. ARPN J. Eng. Appl. Sciences. 10, 1913–1919 (2015) 6. Pujari, J., Yakkundimath, R., Byadgi, A.: Automatic fungal disease detection based on wavelet feature extraction and PCA analysis in commercial crops. Int. J. Image, Graph. Signal Processing. 1, 24–31 (2013). https://doi.org/10.5815/ijigsp.2014.01.04 7. Pinstrup-Andersen.: The future world food situation and the role of plant diseases (2001). https://doi.org/10.1094/PHI-I-2001-0425-01

Artificial Intelligence Based Plant Disease Detection

95

8. Anami, B.S., Pujari, J.D., Yakkundimath, R.: Identification and classification of normal and affected agriculture/horticulture produce based on combined color and texture feature extraction. Int. J. Comput. Appl. Eng. Sci. 1 (2011) 9. Strange R.N., Scott, P.R.: Plant disease: a threat to global food security. 43, 83–116 (2005). https://doi.org/10.1146/annurev.phyto.43.113004.133839 10. Chen, C.H., Pau, L. F., Wang, P. S. P.: The Handbook of Pattern Recognition and Computer Vision, 2nd edn, pp. 207–248. World Scientific Publishing Corporation (1998) 11. Chandy, K.T.: Important Fungal Diseases: Plant Disease Control. Booklet No. 342, PDCS.4 12. Ying, G., Miao, L., Yuan, Y., Zelin, H.: A study on the method of image preprocessing for recognition of crop diseases. Int. Conf. Adv. Comput. Control. (2008) 13. Applalanaidu, M.V., Kumaravelan, G.: A review of machine learning approaches in plant leaf disease detection and classification. In: 2021 Third International Conference on Intelligent Communication Technologies and Virtual Mobile Networks (ICICV), pp. 716−724 (2021). https://doi.org/10.1109/ICICV50876.2021.9388488 14. Huang, K.Y.: Application of artificial neural network for detecting Phalaenopsis seedling diseases using color and texture features. Comput. Electron. Agric. 57, 3–11 (2007) 15. Al Bashish, D., Braik, M., Bani-Ahmad, S.: A framework for detection and classification of plant leaf and stem diseases. In: International Conference on Signal and Image Processing (2010) 16. Gavhale, K.R., Gawande, U., Hajari, K. O.: Unhealthy region of citrus leaf detection using image processing techniques. In: International Conference on Convergence of Technology I2CT, pp. 1–6 (2014) 17. Jadhav, S.B., Patil, S. B.: Grading of soybean leaf disease based on segmented image using K-means clustering. Int. J. Adv. Res. Electron. Commun. Eng. (IJARECE). 4(6) (2015) 18. 
Gaikwad, S., Karande, K.J.: Image processing approach for grading and identification of diseases on pomegranate fruit. Int. J. Comput. Sci. Inf. Technol. (IJCSIT) 7(2), 519–522 (2016) 19. Waghmare, H., Kokare, R., Dandawate, Y.: Detection and classification of diseases of grape plant using opposite colour local binary pattern feature and machine learning for automated decision support system. In: 3rd International Conference on Signal Processing and Integrated Networks (SPIN) (2016) 20. Qin, F., Liu, D.X., Sun, B.D., Ruan, L. Ma, Z., Wang.: Identification of alfalfa leaf diseases using image recognition technology. PLoS ONE 11 (2016) 21. Liu, B., Zhang, Y., He, D.J., Li, Y.: Identification of apple leaf diseases based on deep convolutional neural networks. Symmetry 10(11) (2017) 22. Maniyath, S.R., Vinod, V., Niveditha, M., Pooja, R., Prasad, N., Shashank N., Hebbar, R.: Plant Disease Detection Using Machine Learning, pp. 41–45 (2018). https://doi.org/10.1109/ ICDI3C.2018.00017 23. Pantazi, X.E., Moshou, D., Tamouridou, A.A.: Automated leaf disease detection in different crop species through image features analysis and one class classifiers. Comput. Electron. Agric. 156, 96–104 (2019) 24. Yogeshwari, M., Thailambal, G.: Automatic feature extraction and detection of plant leaf disease using GLCM features and convolutional neural networks. Mater. Today: Proc. (2021) 25. Geetha, G., Samundeswari, S., Saranya, G., Meenakshi, K., Nithya, M.: Plant leaf disease classification and detection system using machine learning. J. Phys.: Conf. Ser. 1712, 012012 (2020) 26. Bhise, N., Kathet, S., Jaiswar, S., Adgaonkar, A.: Smart farming and plant disease detection using IoT and ML. Int. Res. J. Eng. Technol. (IRJET). 07(07), e-ISSN: 2395–0056 (2020) 27. Xian, T.S., Ngadiran, R.: Plant diseases classification using machine learning. J. Phys.: Conf. Ser. 1962, 012024 (2021) 28. 
Alatawi, A.A., Alomani, S.M., Alhawiti, N.I., Ayaz, M.: Plant disease detection using AI based VGG-16 model. Int. J. Adv. Comput. Sci. Appl. 13(4) 29. Badiger, M., Kumara, V., Shetty, S.C.N., Poojary, S.: Leaf and skin disease detection using image processing. Glob. Transit. Proc. 3(1), 272–278 (2022). ISSN 2666–285X. https://doi.org/10.1016/j.gltp.2022.03.010


30. Harakannanavar, S.S., Rudagi, J.M., Puranikmath, V.I., Siddiqua, A., Pramodhini, R.: Plant leaf disease detection using computer vision and machine learning algorithms. Glob. Transit. Proc. 3(1), 305–310 (2022). ISSN 2666–285X. https://doi.org/10.1016/j.gltp.2022.03.016 31. Ramesh, S., Hebbar, R., Niveditha, M., Pooja, R., Bhat, N.P., Shashank, N., Vinod, P.V.: Plant disease detection using machine learning. In: 2018 International Conference on Design Innovations for 3Cs Compute Communicate Control, vol. 8. IEEE (2018). https://doi.org/10.1109/ICDI3C.2018.00017 32. Singh, A.K., Sreenivasu, S.V.N., Mahalaxmi, U.S.B.K., Sharma, H., Patil, D.D., Asenso, E.: Hybrid feature-based disease detection in plant leaf using convolutional neural network, Bayesian optimized SVM, and random forest classifier. Article ID 2845320 (2022). https://doi.org/10.1155/2022/2845320

IoT Equipped Intelligent Distributed Framework for Smart Healthcare Systems

Sita Rani, Meetali Chauhan, Aman Kataria, and Alex Khang

Abstract Nowadays, a fundamental aim of the healthcare sector is to incorporate different technologies to observe and track the clinical parameters of patients in day-to-day life. Remote patient monitoring applications are becoming popular because they facilitate economical healthcare services. The management of the data gathered through these applications also requires due attention. Cloud-facilitated healthcare applications offer a variety of solutions to store patients’ records and deliver the required data to all stakeholders as needed, but they are affected by security issues, longer response times, and interruptions to the continuous availability of the system. To overcome these challenges, an intelligent IoT-based distributed framework to deploy remote healthcare services is proposed in this chapter. In the proposed model, the various entities of the system are interconnected using IoT, and a Distributed Database Management System (DDBMS) is used to provide secure and fast data availability to patients and healthcare workers. Blockchain is used to ensure the security of the patients’ medical records. The proposed model comprises intelligent analysis of the clinical records fetched from the DDBMS secured with Blockchain. The proposed model is tested with true clinical data and the results are discussed in detail. Finally, the future scope of the work is presented in the Conclusions section.

Keywords Blockchain · Distributed database management system (DDBMS) · IoT · Response time · Security · Smart healthcare

S. Rani (B) · M. Chauhan
Department of Computer Science and Engineering, Guru Nanak Dev Engineering College, Ludhiana 141006, Punjab, India
e-mail: [email protected]

A. Kataria
Amity Institute of Defence Technology, Amity University, Noida, India

A. Khang
GRITEx and VUST, Ho Chi Minh City, Vietnam

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Rishiwal et al. (eds.), Towards the Integration of IoT, Cloud and Big Data, Studies in Big Data 137, https://doi.org/10.1007/978-981-99-6034-7_6


1 Introduction

Health is the most essential aspect of living. Modern society suffers from a number of problems, such as multiple organ failure and various chronic diseases, due to stress and anxiety. Society therefore requires appropriate facilities, resources, and services from hospitals, such as medications, doctors, and nurses, on time [1]. With the sharp rise in chronic diseases and the Covid-19 pandemic, a sudden increase can be seen in the usage of smart healthcare systems. Smart healthcare systems play a major role in providing efficient healthcare services to patients, and they have reduced the need for patients to be physically present at hospitals [2–4]. E-healthcare systems provide high-quality care by delivering online medical services to patients at home. Earlier, there was a communication gap between patients and doctors due to the unavailability of doctors in emergencies. Advanced communication systems and Internet of Things (IoT) technology have now closed this gap by providing an effective communication paradigm. IoT is an appropriate solution to such problems in healthcare systems. It is used to collect patients’ data, which doctors analyze to provide medication and medical treatment remotely. This automation in smart healthcare monitoring has reduced the risk to patients’ lives by providing timely medical help in emergencies. A monitoring system includes sensors to collect data, an IoT gateway to distribute data, and cloud-based storage to hold patients’ data for examination by doctors. IoT acts as a chain that collects all the information communicated from smart devices via the internet. Patients can receive their health records using mobile healthcare apps [5].

1.1 Internet of Things (IoT)

IoT is a concept that reflects the connectivity of devices, services, and systems in terms of machine-to-machine and man-to-machine communication. This helps to achieve automation in a wide range of areas, such as smart cities, healthcare, traffic management, logistics, and waste management. IoT has given a new outlook to the modern healthcare system by facilitating human life with healthcare apps, fitness programs, remote health monitoring, emergency help, etc. In collaboration with medical devices, such as sensors and imaging devices, service providers can give the best guidance to patients. Thus, IoT-based healthcare services are expected to reduce medical costs for patients. Recently evolved IoT-based wireless technologies have helped to prevent and diagnose chronic diseases and provide real-time monitoring. Databases and servers play a vital role in maintaining medical records and providing facilities to patients in the hour of need. Table 1 describes the IoT-integrated advanced technologies that are useful in the domain of healthcare [6].


Table 1 IoT-integrated technologies and their benefits in the domain of healthcare

Technology | Description
Big data | Data is stored which provides a quick review easily in healthcare systems when required; Helps to maintain clinical records, bills, medical history of patients
Cloud computing | Useful to store on demand data and specifies the content via internet; This helps to visualize the data resources which helps the doctors to work effectively
Smart sensors | This device provides accurate results and monitors all medical parameters; This device controls various aspects such as blood pressure, oxygen levels, temperature and sugar levels
Software | It is used to get associated with patients’ data, medical tests and reports; It reduces the communication gap between doctors and patients
Artificial intelligence | This concept evaluates, predict, analyze and helps in decision making; This concept with help of algorithms predict and controls diseases
Actuators | This device helps to maintain accuracy in calculated parameters; This device helps to control system and makes it act according to the requirements
Virtual Reality | It provides digital information and improves patients safety; It provides real-time information with integration of humans with electronic systems

A generalized IoT-based automatic framework developed for healthcare systems is shown in Fig. 1. It describes the results predicted using IoT-integrated technologies [7–9]. A major role played by IoT in healthcare is the detection of silent symptoms in patients: early diagnosis can prevent severe illness and save patients from untimely death. Currently, IoT-based health applications focus on the medical treatment of diseases and on monitoring patients’ health by analyzing various parameters using smart devices. The healthcare system is gradually shifting to remote healthcare by providing e-health services at home. Figure 2 shows the IoT-based healthcare application framework to facilitate the residents. Many applications exist for patient monitoring. In addition, various technologies such as Wireless Body Area Network (WBAN), Wireless Local Area Network (WLAN), Radio-frequency Identification (RFID), and Wireless Personal Area Network (WPAN) assist in automatic identification and data capture [10].


Fig. 1 IoT-based automatic design framework

1.2 Smart Healthcare

With traditional healthcare facilities, it was difficult to diagnose and treat patients in emergencies, which could cause mental trauma, cardiovascular disorders, anxiety, and depression in patients and their family members. With the launch of smart healthcare applications and services, a wide variety of online facilities became available to people at home. Figure 3 highlights some of the smart healthcare services that have made life easier through secure services, such as online appointments with doctors, storage of patients’ medical records, and consultation for patients in emergencies. The whole healthcare system is connected via wireless technology, and the database is maintained using cloud computing. Table 2 shows various smart applications launched with the aim of being available to people at all times. These applications provide people with various health services. Users can analyze various parameters themselves on a daily basis and adopt exercise and healthy eating habits accordingly. Various remedies can be followed without consulting doctors, simply by adopting a healthy lifestyle [6, 11].


Fig. 2 IoT-based healthcare application framework

1.3 DDBMS

IoT-based distributed technology plays a vital role in smart healthcare systems. A distributed healthcare system has various interconnected medical resources with sensors and actuators, such as ECG, BP machine, EMG, and glucometer. With the sharp rise in the usage of IoT, the distributed database management system (DDBMS) has made a major contribution to providing efficient healthcare services to patients. The distributed healthcare system records various parameters, such as blood pressure (BP) and the sugar levels of diabetic patients, via health monitoring devices. Doctors can suggest effective remedies and medical prescriptions to patients in emergencies. The purpose of the DDBMS is to extend healthcare services from the various departments of hospitals to patients at home. The DDBMS, in collaboration with IoT-based healthcare, is dedicated to providing various services using a variety of healthcare applications together with other important resources, which include laptops, mobile phones, sensors, actuators, medical equipment, e-patients, patients’ records stored using cloud computing, the medical staff of doctors and nurses, and the database of patients’ records. Figure 4 shows a collaborative healthcare system comprising IoT and DDBMS [12].

Fig. 3 Smart healthcare services

Table 2 Healthcare applications and services

Application | Services
Health assistant | Body temperature, fat, weight, BP, glucose level check
Calorie counter | Calories count from the food eaten
Pedometer | Steps taken and calories burnt
Period tracker | Record of menstrual cycle in women
Google Fit | Running, cycling and walking activities
Water your body | Water drinking habits and alerts
Heart rate Monitor | Heart beat
Smart watch | Number of steps taken, BP, heart rate, calories burnt
On track Diabetes | Blood glucose level
Finger print Thermometer | Body temperature

Fig. 4 Distributed healthcare system
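The idea of spreading patient records over several database nodes can be illustrated with a small sketch. It is not the chapter's implementation: the node names, the replica count, and the hash-based placement rule are all assumptions chosen only to show how a record can remain reachable when one node is unavailable.

```python
import hashlib

# Hypothetical node names and replica count, chosen for illustration only.
NODES = ["hospital-db", "clinic-db", "lab-db", "home-gateway-db"]
REPLICAS = 2


def nodes_for(patient_id, nodes=NODES, replicas=REPLICAS):
    """Pick `replicas` consecutive nodes, starting from a hash of the patient id."""
    digest = hashlib.sha256(patient_id.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replicas)]


class DistributedStore:
    """In-memory stand-in for a set of database nodes."""

    def __init__(self, nodes=NODES):
        self.data = {name: {} for name in nodes}

    def put(self, patient_id, record):
        # Write the record to every replica chosen for this patient.
        for name in nodes_for(patient_id, list(self.data)):
            self.data[name].setdefault(patient_id, []).append(record)

    def get(self, patient_id, down=()):
        # Read from the first replica that is not marked as unavailable.
        for name in nodes_for(patient_id, list(self.data)):
            if name not in down:
                return self.data[name].get(patient_id, [])
        raise LookupError("no replica reachable")
```

With two replicas per record, a read still succeeds when the first chosen node is listed in `down`. A real DDBMS would add consistency control and access checks, which this sketch omits.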

1.4 Artificial Intelligence (AI)

Artificial Intelligence (AI) is a concept with wide scope in smart development. It refers to the ability of algorithms in machines to analyze results without human intervention. The introduction of AI with smart devices has taken healthcare systems to a new level, and with these advances the health status of people has also improved. AI-embedded machines comprising sensors help to monitor and diagnose the symptoms of diseases at earlier stages. The whole system acts like a robotic nurse that consistently cares for the patient and monitors and records the patient’s health condition. AI algorithms carry out tasks that require intelligence in areas such as image analysis, speech recognition, pattern recognition, and decision making [13–17]. This helps to suggest the best course of treatment. AI algorithms make it easier to make accurate predictions in laboratories, such as blood group detection and disease prediction. So, AI is a preferable choice when it comes to decision making [18]. AI combines various technologies, including natural language processing (NLP), machine learning, neural networks, and robotic systems. AI is used in multiple fields of smart healthcare, such as cancer treatment surgeries, neurology, and cardiology. Figure 5 provides an outlook of an AI-based healthcare system. An AI-based healthcare system requires a balanced approach, which can be achieved through support from all domains, such as doctors, nurses, labs, radiologists, pharmacy, and emergency medical services.

Fig. 5 Smart healthcare using AI

1.5 Blockchain Technology

Blockchain in healthcare links patients’ medical records, doctors, hospitals, nurses, medical staff, and health communities for the welfare of patients [19]. Blockchain is a framework used to provide a secure data exchange and management process. The basic idea of blockchain is to share data via a peer-to-peer network. This helps to communicate data to authenticated users, who can modify or delete data records accordingly. In the healthcare sector, sensitive information must be secured from third-party users to maintain privacy and security. Because of such sensitive content, blockchain has been incorporated in the healthcare sector to handle security. In addition, blockchain has been widely used to resolve the problems of central administration of the database securely [20]. Blockchain has eliminated the need for a central authority to govern or manage authentication, relying instead on trust and transparency. Privacy and security are further enhanced using cryptographic hash functions. Some blockchain applications in healthcare are the compilation of visitors’ details, patients’ records, records of lab results, and treatment details of patients. All such details are secured by the blockchain process, which also covers ambulatory services and data assistance. A commonly observed problem with medical data is the duplication or mismatch of details during the analysis of patients’ records. Such issues are tackled using a hash function in the blockchain process, which uses a hashed ledger instead of a primary key. In addition, blockchain is based on certification, so claims can be verified automatically whenever required. Blockchain has reduced data compilation effort as well as cost. It has also reduced wastage and the chances of fraud through the digitization of complicated datasets [21].
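A minimal sketch can make the role of the hash function concrete. It is an illustration under stated assumptions, not the chapter's ledger: the `Ledger` class, its field names, and the use of SHA-256 over a JSON serialization are all hypothetical choices. The sketch shows two properties mentioned above: a duplicate record is detected by its content hash rather than by a primary key, and any later change to a stored record breaks the chain of hashes.

```python
import hashlib
import json


def record_hash(record):
    """Return a SHA-256 digest of a record serialized with sorted keys."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class Ledger:
    """Append-only list of blocks, each linked to the previous block's hash."""

    def __init__(self):
        self.blocks = []
        self.seen = set()  # content hashes of records already stored

    def append(self, record):
        """Add a record; return False if an identical record already exists."""
        digest = record_hash(record)
        if digest in self.seen:
            return False
        prev = self.blocks[-1]["block_hash"] if self.blocks else "0" * 64
        block_hash = hashlib.sha256((prev + digest).encode("utf-8")).hexdigest()
        self.blocks.append({"record": record, "record_hash": digest,
                            "prev_hash": prev, "block_hash": block_hash})
        self.seen.add(digest)
        return True

    def is_valid(self):
        """Recompute every hash and check the links between blocks."""
        prev = "0" * 64
        for block in self.blocks:
            digest = record_hash(block["record"])
            expected = hashlib.sha256((prev + digest).encode("utf-8")).hexdigest()
            if (block["prev_hash"] != prev or block["record_hash"] != digest
                    or block["block_hash"] != expected):
                return False
            prev = block["block_hash"]
        return True
```

Appending the same lab result twice returns `False` the second time, and editing a stored record in place makes `is_valid()` return `False`. A production blockchain adds consensus among peers and digital signatures, which are outside the scope of this sketch.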

2 Security Issues in Smart Healthcare Systems

There are many security issues in the deployment of smart healthcare systems, discussed below.

2.1 Communication Media

Healthcare devices are connected to global as well as local networks via a wide range of wireless links, such as Bluetooth, GSM, Wi-Fi, and Zigbee. However, wireless networks make traditional security schemes less appropriate. It is therefore very difficult to manage security protocols that can handle both wired and wireless technologies equally [22].

2.2 Topology Issues

IoT-integrated healthcare devices are connected over various types of networks for data collection, storage, and computation. A problem occurs when devices drop out of the network due to a failure. This causes dynamic network topology issues [23].

2.3 Scalability

As the number of IoT devices increases, they need to be integrated into a global information network. Designing a security scheme that scales without compromising security-related requirements therefore becomes a tough task [5, 24].


2.4 Mobility and Energy Constraints

IoT-based devices are dynamic in nature and run on batteries. As different networks have different configurations, more efficient security algorithms are required to support mobility [25].

2.5 Memory Constraints

Most IoT-based devices have limited memory, which is one of the major issues faced in the storage of data and the functioning of the devices.

2.6 Multi-protocol Network

IoT devices communicate with other devices over the local network using network protocols. In addition, some IoT devices communicate with IoT service providers via the IP network. Security experts face difficulties in devising sound security solutions for communication over multiple protocols.

2.7 Tamper Devices

An attacker might tamper with a device physically and extract its cryptographic secrets. The attacker might then modify, delete, or replace the content with malicious data [5].
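One common way to detect that a reading was modified after it left the device is a keyed message authentication code. The sketch below uses HMAC-SHA256 from the Python standard library. It is an illustrative assumption, not a mechanism described in the chapter, and the device name, field names, and key are hypothetical.

```python
import hashlib
import hmac
import json


def sign_reading(reading, key):
    """Compute an HMAC-SHA256 tag over a reading serialized with sorted keys."""
    payload = json.dumps(reading, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_reading(reading, tag, key):
    """Return True only if the tag matches the reading under the shared key."""
    expected = sign_reading(reading, key)
    return hmac.compare_digest(expected, tag)  # constant-time comparison


# Hypothetical shared key and reading, for illustration only.
key = b"demo-key-not-for-production"
reading = {"device": "glucometer-1", "glucose_mg_dl": 104}
tag = sign_reading(reading, key)
```

A receiver that holds the same key accepts the original reading and rejects one whose value was altered. This check does not help once an attacker has extracted the key from the device itself, which is exactly the physical tampering risk described above; protecting the key typically requires hardware support.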

3 Existing Healthcare Systems

Data management plays a very important role in administering different types of services in smart healthcare systems. The efficient storage of patient health data facilitates disease diagnosis, vaccine scheduling, and the deployment of other health services. In the current smart healthcare framework, patient health data is maintained in electronic form using cloud architecture. Over time, different authors have proposed different secure techniques and frameworks to store patients’ data. In [26], the authors presented a novel approach for sharing patient data in a secure way. Using this technique, patients’ data is organized and shared through a semi-trusted server. Every attribute of each patient record is saved and transmitted in encrypted form. The scalability of the medical data is the main feature of this method. In [27], another conflux approach using an encryption mechanism and a digital signature is proposed for the secure transfer of medical records. In [28], the authors discussed the disadvantages of the approach presented in [27] and introduced a new method to overcome the issues. The authors in [29–31] proposed a number of authentication mechanisms and data transfer protocols to secure the storage and transfer of medical records on different types of machines and mobile devices. Many authors have proposed different privacy-preserving techniques to secure electronic medical records in distributed smart healthcare systems. Although researchers have put considerable effort into securing patients’ medical data in the smart medical systems adopted by various hospitals, these measures can also make fast access to records inconvenient in emergency situations. Emergency care providers and doctors may face hurdles in providing first aid and other medical services. To resolve these challenges, the medical industry, researchers, and academicians have introduced many smart gadgets/devices to monitor individuals’ health and store health parameters [32, 33]. However, these devices are vulnerable to data theft and failures. To address various security issues in existing healthcare systems, researchers have proposed many blockchain models. More secure solutions are proposed by storing the hash tables for cloud data in the blockchain nodes [34–37]. Another, more secure technique is presented by the authors in [38]; this model is proposed for doctors and patients to access medical records. In [39], the authors proposed a multi-workflow-based system to manage various processes, such as clinical trials and complicated surgeries. To store patients’ medical data more securely and manage personal information efficiently, a novel platform framework is proposed by the authors in [40]. Remote patient monitoring and wearable gadgets are highly prone to data theft.
Identifying malfunctioning devices in a network is also a tedious job. Many authors have focused their work on these challenges. In [41], the authors proposed a novel model to address the privacy challenges in remote monitoring systems. A few other systems have also been proposed for secure smart-contract-based remote patient monitoring [20, 42, 43]. The model presented in this chapter stores patient health records in a blockchain-based network. This secure system is proposed for IoT-based patient health monitoring. The smart contract is supported by Artificial Intelligence to locate malfunctioning nodes. We have used five different attributes to characterize the proposed model. The main features of the various existing solutions and their comparison with the proposed system are summarized in Table 3.

4 Proposed Model

The proposed framework is implemented in four layers, i.e., hospital, remote IoT-integrated medical nodes, distributed medical records, and AI-based smart contract, as shown in Fig. 6.


Table 3 Comparative analysis of the existing work with proposed work

Reference no. | Year | IoT-facilitated remote monitoring | Medical records | AI-based smart contract | Distributed record storage | Locate malfunctioning IoT-device
[39] | 2020 | ✗ | ✓ | ✓ | ✓ | ✗
[41] | 2018 | ✓ | ✓ | ✓ | ✓ | ✗
[42] | 2019 | ✓ | ✓ | ✓ | ✓ | ✗
[20] | 2020 | ✗ | ✓ | ✓ | ✓ | ✗
[43] | 2020 | ✗ | ✓ | ✓ | ✓ | ✗
[44] | 2020 | ✗ | ✓ | ✓ | ✓ | ✗
[45] | 2020 | ✗ | ✓ | ✓ | ✓ | ✗
Proposed work | - | ✓ | ✓ | ✓ | ✓ | ✓

Fig. 6 Proposed model: framework

• Hospital: This layer acts as an information warehouse. It keeps complete patient information using various attributes; a few important ones are patient id, patient name, disease history, and medicines prescribed.
• Distributed Medical Records: This layer is integrated with the hospital layer. Medical records stored in the hospital layer are distributed across different nodes and secured using the blockchain network.
• AI-based Smart Contract: This layer sits between the hospital and distributed medical records layers, between the IoT nodes and the hospital layer, and between the IoT nodes and the distributed medical records layer. It aids decision making, breach detection, and checking for malicious data.
• Remote IoT-Integrated Medical Nodes: These nodes sense the various health parameters of patients and transfer the sensed data securely to blockchain-protected distributed databases.

The working of the proposed model is depicted in the flowchart shown in Fig. 7.
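The interaction between the layers can be sketched in a few lines. This is a simplified illustration and not the authors' smart contract: a rule-based range check stands in for the AI component, and the class names, parameter names, and plausible ranges are hypothetical. The sketch shows a reading travelling from an IoT node through the contract check into record storage, with out-of-range readings marking the sending device as a possible malfunctioning node.

```python
from dataclasses import dataclass, field

# Assumed plausible ranges; illustrative values only, not clinical guidance.
PLAUSIBLE = {"heart_rate": (20, 250), "body_temp_c": (30.0, 45.0)}


@dataclass
class Reading:
    device_id: str
    patient_id: str
    parameter: str
    value: float


@dataclass
class SmartContractCheck:
    """Rule-based stand-in for the AI-based smart contract layer."""
    registered: set
    suspect: set = field(default_factory=set)

    def accept(self, r):
        # Reject readings from unknown devices or for unknown parameters.
        if r.device_id not in self.registered or r.parameter not in PLAUSIBLE:
            return False
        low, high = PLAUSIBLE[r.parameter]
        if not (low <= r.value <= high):
            self.suspect.add(r.device_id)  # possible malfunctioning node
            return False
        return True


@dataclass
class Hospital:
    """Hospital layer; `records` stands in for the distributed records layer."""
    contract: SmartContractCheck
    records: dict = field(default_factory=dict)

    def submit(self, r):
        if not self.contract.accept(r):
            return False
        self.records.setdefault(r.patient_id, []).append(r)
        return True
```

A reading of 72 for `heart_rate` from a registered device is stored, while a reading of 900 is rejected and its device is added to `suspect`. In the proposed model this decision would be made by the AI-supported smart contract on the blockchain rather than by fixed thresholds.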

5 Results and Discussion

This section focuses on the results obtained with the proposed model. The proposed model is evaluated using the time taken by transactions, throughput, and latency. To gather results, a total of five nodes were deployed, i.e., four IoT devices to sense data and one hospital node. The blockchain that secures distributed medical records on the hospital node was deployed using the Ethereum platform. Our work is deployed on the blockchain using an AI-supported smart contract. The results obtained for the deployed IoT devices, i.e., transaction processing time and average delay, are shown in Tables 4 and 5, respectively. These are the two parameters considered to evaluate the deployed model. As shown in Figs. 8 and 9, the transaction processing time and average delay are similar across all four IoT-based medical devices at each transaction count, while both grow as the number of transactions increases from 50 to 200. These results indicate that the proposed model scales with the number of smart medical devices as well as the number of transactions executed. The proposed model ensures blockchain-based secure storage and access of distributed medical records in a smart hospital framework.
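Throughput is listed as an evaluation metric but is not tabulated. As a rough worked example, it can be derived from the processing times reported in Table 4, assuming throughput is defined as the number of transactions divided by the total processing time. That definition is an assumption made here, not one stated in the chapter.

```python
# Values copied from Table 4: processing time in seconds per device.
transactions = [50, 100, 150, 200]
processing_time_s = {
    "Device 1": [20, 31, 55, 82],
    "Device 2": [22, 41, 62, 84],
    "Device 3": [18, 37, 58, 73],
    "Device 4": [23, 45, 67, 91],
}


def throughput(n, seconds):
    """Transactions per second, under the assumed definition n / seconds."""
    return n / seconds


for device, times in processing_time_s.items():
    rates = [round(throughput(n, t), 2) for n, t in zip(transactions, times)]
    print(device, rates)
```

Under this definition, the derived throughput stays between roughly 2.2 and 3.2 transactions per second for every device and load level in Table 4, which is consistent with processing time growing approximately in proportion to the number of transactions.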

6 Conclusions

Smart distributed healthcare systems are rapidly becoming popular. They fulfil the medical needs of modern society in a more transparent, secure, and convenient way. Along with these features, the proposed blockchain model also protects the healthcare system from a single point of failure using an AI-based smart contract technique deployed at the hospital layer. The medical records of the patients are stored in distributed databases. The proposed framework is tested in a real-time environment for two important parameters, i.e., transaction processing time and average delay. The proposed system can be further enhanced by incorporating more efficient AI algorithms to improve the processing time in the blockchain network and minimize average delays.

Fig. 7 Proposed model: working

Table 4 Transaction processing time for IoT-based medical devices

Values are processing times in seconds.

Number of transactions | Device 1 | Device 2 | Device 3 | Device 4
50 | 20 | 22 | 18 | 23
100 | 31 | 41 | 37 | 45
150 | 55 | 62 | 58 | 67
200 | 82 | 84 | 73 | 91

Table 5 Average delay in transaction processing for IoT-based medical nodes

Values are average delays in seconds.

Number of transactions | Device 1 | Device 2 | Device 3 | Device 4
50 | 0.8 | 0.9 | 0.7 | 0.8
100 | 2.1 | 2 | 1.9 | 1.7
150 | 2.4 | 1.8 | 2.6 | 2.1
200 | 3.6 | 3.2 | 3.3 | 3

Fig. 8 Transaction processing time: medical IoT nodes


Fig. 9 Average delay: medical IoT nodes

References

1. Baker, S.B., Xiang, W., Atkinson, I.: Internet of things for smart healthcare: technologies, challenges, and opportunities. IEEE Access 5, 26521–26544 (2017)
2. Mohammed, A.A., Burhanuddin, M., Talib, M.S., Hameed, M.E., Ali, M.F.: A review on IoT-based healthcare monitoring systems for patient in remote environments. Eur. J. Med. Clin. Med. 7(3), 2227–2235 (2020)
3. Puri, V., Kataria, A., Sharma, V.: Artificial intelligence-powered decentralized framework for Internet of Things in Healthcare 4.0. Trans. Emerg. Telecommun. Technol., e4245 (2021)
4. Rani, S. et al.: Threats and corrective measures for IoT security with observance of cybercrime: a survey. Wirel. Commun. Mob. Comput. 2021 (2021)
5. Saha, G., Singh, R., Saini, S.: A survey paper on the impact of Internet of Things. In: 3rd International Conference on Electronics, Communication and Aerospace Technology (ICECA), pp. 331–334. IEEE (2019)
6. Islam, S.R., Kwak, D., Kabir, M.H., Hossain, M., Kwak, K.-S.: The internet of things for health care: a comprehensive survey. IEEE Access 3, 678–708 (2015)
7. Javaid, M., Khan, I.H.: Internet of Things (IoT) enabled healthcare helps to take the challenges of COVID-19 Pandemic. J. Oral Biol. Craniofacial Res. 11(2), 209–214 (2021)
8. Rani, S., Gupta, O.: Empirical analysis and performance evaluation of various GPU implementations of Protein BLAST. Int. J. Comput. Appl. 151(7) (2016)
9. Gupta, O., Rani, S., Pant, D.C.: Impact of parallel computing on bioinformatics algorithms. In: Proceedings 5th IEEE International Conference on Advanced Computing and Communication Technologies, pp. 206–209 (2011)
10. Bedón-Molina, J., Lopez, M.J., Derpich, I.S.: A home-based smart health model. Adv. Mech. Eng. 12(6), 1–16 (2020)
11. Rani, S., Kaur, S.: Cluster analysis method for multiple sequence alignment. Int. J. Comput. Appl. 43(14), 19–25 (2012)
12. Birje, M.N., Hanji, S.S.: Internet of things based distributed healthcare systems: a review. J. Data Inf. Manag. 2, 149–165 (2020)
13. Arora, V., Leekha, R.S., Lee, K., Kataria, A.: Facilitating user authorization from imbalanced data logs of credit cards using artificial intelligence. Mob. Inf. Syst. 2020 (2020)
14. Bilal, M., Kumari, B., Rani, S.: An artificial intelligence supported E-commerce model to improve the export of Indian handloom and handicraft products in the world. In: Proceedings of the International Conference on Innovative Computing and Communication (2021). Available at SSRN 3842663
15. Kataria, A., Ghosh, S., Karar, V.: Development of Artificial Intelligence Based Technique for Minimization of Errors and Response time in Head Tracking for Head Worn Systems, EIED (2020)
16. Gupta, O., Rani, S.: Bioinformatics applications and tools: an overview. CiiT-Int. J. Biom. Bioinform. 3(3), 107–110 (2010)
17. Sudevan, S., Barwani, B., Al Maani, E., Rani, S., Sivaraman, A.K.: Impact of blended learning during Covid-19 in sultanate of Oman. J Ann. Rom. Soc. Cell Biol., 14978–14987 (2021)
18. Mohanta, B., Das, P., Patnaik, S.: Healthcare 5.0: a paradigm shift in digital healthcare system using artificial intelligence, IOT and 5G communication. In: 2019 International Conference on Applied Machine Learning (ICAML), pp. 191–196. IEEE (2019)
19. Wang, S., et al.: Blockchain-powered parallel healthcare systems based on the ACP approach. IEEE Trans. Comput. Soc. Syst. 5(4), 942–950 (2018)
20. Abugabah, A., Nizam, N., Alzubi, A.A.: Decentralized telemedicine framework for a smart healthcare ecosystem. IEEE Access 8, 166575–166588 (2020)
21. Chakraborty, S., Aich, S., Kim, H.-C.: A secure healthcare system design framework using blockchain technology. In: 2019 21st International Conference on Advanced Communication Technology (ICACT), pp. 260–264. IEEE (2019)
22. Arya, V., Rani, S., Choudhary, N.: Enhanced bio-inspired trust and reputation model for wireless sensor networks. In: Proceedings of Second Doctoral Symposium on Computational Intelligence, pp. 569–579. Springer (2022)
23. Patel, D., Srinivasan, K., Chang, C.Y., Gupta, T., Kataria, A.: Network anomaly detection inside consumer networks—a hybrid approach. Electronics 9(6), 923 (2020)
24. Gupta, O., Rani, S.: Accelerating molecular sequence analysis using distributed computing environment. Int. J. Sci. Eng. Res.–IJSER (2013)
25. Kataria, A., Ghosh, S., Karar, V.: Data Prediction of Optical Head Tracking using Self Healing Neural Model for Head Mounted Display, NISCAIR-CSIR (2018)
26. Li, M., Yu, S., Zheng, Y., Ren, K., Lou, W.: Scalable and secure sharing of personal health records in cloud computing using attribute-based encryption. IEEE Trans. Parallel Distrib. Syst. 24(1), 131–143 (2012)
27. Liu, J., Huang, X., Liu, J.K.J.: Secure sharing of personal health records in cloud computing: ciphertext-policy attribute-based signcryption. Futur. Gener. Comput. Syst. 52, 67–76 (2015)
28. Dattatraya, K.N., Rao, K.R.: Hybrid based cluster head selection for maximizing network lifetime and energy efficiency in WSN. J. King Saud Univ.-Comput. Inf. Sci. (2019)
29. Mohit, P., Amin, R., Karati, A., Biswas, G., Khan, M.K.: A standard mutual authentication protocol for cloud computing based health care system. J. Med. Syst. 41(4), 50 (2017)
30. Chiou, S.-Y., Ying, Z., Liu, J.J.: Improvement of a privacy authentication scheme based on cloud for medical environment. J. Med. Syst. 40(4), 101 (2016)
31. Kumar, V., Jangirala, S., Ahmad, M.J.: An efficient mutual authentication framework for healthcare system in cloud computing. J. Med. Syst. 42(8), 1–25 (2018)
32. Wang, X., Gui, Q., Liu, B., Jin, Z., Chen, Y.: Enabling smart personalized healthcare: a hybrid mobile-cloud approach for ECG telemonitoring. Int. J. Biomed. Health Inform. 18(3), 739–745 (2013)
33. Asghari, P., Rahmani, A.M., Haj Seyyed Javadi, H.: A medical monitoring scheme and health-medical service composition model in cloud-based IoT platform. Trans. Emerg. Telecommun. Technol. 30(6), e3637 (2019)
34. Liang, X., et al.: Towards blockchain empowered trusted and accountable data sharing and collaboration in mobile healthcare applications. EAI Endorsed Trans. Pervasive Health Technol. 4(15) (2018)

114

S. Rani et al.

35. Al Omar, A., Bhuiyan, M.Z.A., Basu, A., Kiyomoto, S., Rahman, M.S.: Privacy-friendly platform for healthcare data in cloud based on blockchain environment. Futur. Gener. Comput. Syst. 95, 511–521 (2019) 36. Kaur, G., Kaur, R., Rani, S.: Cloud computing-a new trend in IT era. Int. J. Sci. Technol. Manag., 1–6 (2015) 37. Rani, S., Kataria, A., Chauhan, M.: Fog computing in Industry 4.0: applications and challenges—a research roadmap. In: Energy Conservation Solutions for Fog-Edge Computing Paradigms, pp. 173–190. Springer (2022) 38. Ramani, V., Kumar, T., Bracken, A., Liyanage, M., Ylianttila, M.: Secure and efficient data accessibility in blockchain based healthcare systems. In: 2018 IEEE Global Communications Conference (GLOBECOM), pp. 206–212. IEEE (2018) 39. Khatoon, A.J.E.: A blockchain-based smart contract system for healthcare management. Electronics 9(1), 94 (2020) 40. Al Omar, A., Rahman, M.S., Basu, A., Kiyomoto, S.: Medibchain: a blockchain based privacy preserving platform for healthcare data. In: International Conference on Security, Privacy and Anonymity in Computation, Communication and Storage, pp. 534–543. Springer (2017) 41. Griggs, K.N., Ossipova, O., Kohlios, C.P., Baccarini, A.N., Howson, E.A., Hayajneh, T.: Healthcare blockchain system using smart contracts for secure automated remote patient monitoring. J. Med. Syst. 42(7), 1–7 (2018) 42. Kazmi, H.S.Z., Nazeer, F., Mubarak, S., Hameed, S., Basharat, A., Javaid, N.: Trusted remote patient monitoring using blockchain-based smart contracts. In: International Conference on Broadband and Wireless Computing, Communication and Applications, pp. 765–776. Springer (2019) 43. Ali, M.S., Vecchio, M., Putra, G.D., Kanhere, S.S., Antonelli, F.: A decentralized peer-to-peer remote health monitoring system. Sensors 20(6), 1656 (2020) 44. Tanwar, S., Parekh, K., Evans, R.: Blockchain-based electronic healthcare record system for healthcare 4.0 applications. J. Inf. Secur. Appl. 50, 102407 (2020) 45. 
Chen, L., Lee, W.-K., Chang, C.-C., Choo, K.-K.R., Zhang, N.: Blockchain based searchable encryption for electronic health record sharing. Futur. Gener. Comput. Syst. 95, 420–429 (2019)

Adaptive Particle Swarm Optimization for Energy Minimization in Cloud: A Success History Based Approach

Vijay Kumar Sharma, Swati Sharma, Mukesh Rawat, and Ravi Prakash

Abstract Large-scale virtualized data centers have been built to meet the growing processing needs of modern service applications and the shift to cloud computing. Cloud data centers consume large amounts of energy, which leads to high emissions and high electricity costs. Cloud providers can improve resource utilization and reduce energy consumption through live migration and dynamic VM consolidation. Consolidating resources too aggressively may degrade performance; therefore, the energy-performance trade-off must be managed so that service quality is preserved. VM placement must be optimized in real time as application workloads become more sophisticated and dynamic. To understand the problem, we survey and compare online deterministic algorithms for VM migration and adaptive VM consolidation. By analyzing past VM resource usage, we provide adaptive algorithms for dynamic consolidation. The proposed algorithm reduced energy consumption while meeting the service level agreement (SLA). Extensive simulations using real-world workload traces from over a thousand PlanetLab VMs demonstrate the technique's effectiveness. This chapter proposes SHA-PSO, a PSO-based meta-heuristic technique that schedules workloads among virtual machines (VMs) to minimize energy. The algorithm stores the best dynamic PSO solutions of each generation. After enough archived data has been gathered, PSO parameters from the archive are used to distribute the incoming workload among VMs. The proposed technique outperformed existing scheduling approaches on CloudSim, a Java-based cloud simulator.

V. K. Sharma (corresponding author) · M. Rawat
CSED, Meerut Institute of Engineering and Technology, Meerut, India
S. Sharma
IT, Meerut Institute of Engineering and Technology, Meerut, India
R. Prakash
CSED, Motilal Nehru National Institute of Technology, Allahabad, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Rishiwal et al. (eds.), Towards the Integration of IoT, Cloud and Big Data, Studies in Big Data 137, https://doi.org/10.1007/978-981-99-6034-7_7


Keywords CloudSim · Virtual machine · Physical machine · Particle swarm optimization

1 Introduction

For businesses, non-profits, and individuals that need to run computationally or data-intensive applications, cloud computing offers a flexible and cost-effective set of resources. Users request cloud-based services, including processing power, memory, storage space, and networking, from cloud service providers (CSPs) such as Amazon EC2, Google AppEngine, and Microsoft Azure. In return for payment, CSPs provide consumers with the requested resources in the form of a virtual machine (VM), which mimics a physical computer. An efficient VM resource allocation should reduce the energy consumption of the physical machines (PMs) required to execute users' applications [1–7] while still providing scalable services to meet a wide range of user needs [3, 4].

The primary focus of this research is an energy-aware resource allocation strategy that assigns VMs to PMs so as to minimize energy costs, a basic issue in cloud computing systems [8]. One way to make a cloud system more power efficient is to design PMs that consume power in direct proportion to the VM loads they handle [8]. Many methods have been developed to achieve PM energy proportionality [9, 10], such as the use of high-quality power supplies and voltage regulation modules. Even when cloud systems are equipped with energy-proportional PMs, however, their energy consumption is far from ideal because of inefficient allocation of VMs to PMs [11–13]. Both the PMs and the VMs in cloud computing environments vary in the resources they use and in the costs associated with running them [9], so allocating many VMs to PMs that are expensive to power is wasteful [12]. Because of its importance in developing green cloud systems, the energy-aware resource allocation problem has been researched extensively, and many solutions have been developed [10–16].
Despite the promise of these methods, we identify two areas for improvement. First, the majority of current methods presuppose a centralized resource manager that keeps track of all PMs and VMs and allocates VMs to available PMs centrally [14–21]. Centralization can deliver good system performance, but the manager is a single point of failure, which makes the cloud system vulnerable [22, 23]. Second, because VMs arrive and depart frequently in cloud systems, VM live migration is required for resource consolidation [24]. Performance in cloud computing systems also depends on the migration cost (e.g., network traffic cost) incurred when a VM is relocated from one PM to another. Many methods [13, 15, 24] move VMs across PMs but do not account for the cost of VM migration.

Scheduling tasks across several virtual machines in a cloud is an NP-complete problem. The problem becomes harder as the number of cloud users or the volume of scheduled work grows. It is important to consider a number of
potential influences when scheduling cloud-based jobs. Furthermore, the scheduling goal may be a single objective or several objectives. Makespan (to avoid SLA violations, from the users' viewpoint), energy (from the cloud service providers' perspective), and cost (from the users' perspective) are the quality characteristics that must be considered when scheduling workloads on the cloud. Jobs submitted to cloud data centers may be of varied types and sizes that change dynamically. Because of this scale and complexity, the scheduling system must adapt dynamically as well; a predetermined algorithm would not provide desirable results for such ever-changing jobs. With this in mind, we propose a dynamic scheduling method for cloud data centers to reduce their long-term energy footprint. Metaheuristics are well suited to problems of this kind, and our approach uses a PSO variant.

Particle swarm optimization (PSO) is a popular swarm intelligence approach that was initially developed by Kennedy and Eberhart in 1995 [1]. PSO mimics the swarm behavior of flocking birds and schooling fish to guide particles toward globally optimal solutions [1]. Owing to its simplicity, PSO has developed rapidly, with several successful applications to real-world optimization problems [1]. Like other evolutionary computing techniques, PSO is a population-based iterative process, so it may be computationally inefficient because of the high number of function evaluations (FEs) it requires. In addition, when dealing with complex multimodal problems [1–15], standard PSO may quickly become trapped in local optima. These issues limit the practical use of PSO. Consequently, improving convergence speed and preventing PSO from becoming stuck in local optima have been the two most compelling areas of PSO research.
Because the algorithm is easy to modify, researchers have proposed many PSO variants to achieve these two objectives (referred to as "diversity" and "convergence rate"). Of the three most important and promising techniques, two are the control of algorithm parameters and the combination of PSO with auxiliary search operators [4, 5]; the third is modification of the swarm topology.

2 Background and Related Work

Heuristic methods may be employed when conventional methods take too long or do not provide satisfactory results. The goal of a heuristic algorithm is to find a workable answer to a problem as quickly as possible. Heuristic algorithms are problem-specific: they solve a narrow class of problems quickly and well but perform poorly on others. Most heuristic algorithms are also greedy, so they become stuck in local optima, fail to find the global optimum, and therefore cannot handle complicated optimization problems. Multiple heuristic approaches are used to solve cloud-related problems. In this chapter, we review a range of energy-saving heuristic and meta-heuristic algorithms.


Economically, the auction-based approach has been the standard way for CSPs to offer VM resources to their customers [25–30]. Users initiate the auction by submitting requests detailing the quantity and quality of VM resources they want. The CSP then decides which VM resources each user receives. Nejad et al. [25] developed a truthful mechanism based on Vickrey-Clarke-Groves (VCG) to maximize revenue while motivating users to reveal their genuine private information. Unfortunately, the VCG mechanism is computationally intractable in large cloud systems [26] because revenue maximization is an NP-hard combinatorial optimization problem. As a result, CSPs prefer approximate truthful mechanisms with reasonable computational costs [27]. To accommodate the ever-changing nature of real workloads, Mashayekhy et al. [28] and Zhang et al. [29] proposed online truthful mechanisms that are invoked whenever a user submits a request. Zhang et al. [30] enhanced the online process with a more versatile bidding language that allows a user to be satisfied regardless of whether the resources he or she needs are available at the moment.

The auction model is not the only way to increase profits in cloud computing; MA-based negotiation can be used as well. One such technique for negotiating over shared resources was proposed by An et al. [31]. To facilitate efficient and equitable resource trading among self-interested users, Zhao et al. [32] presented an MA-based protocol. In Sim's [33] agent-based cloud computing architecture, agents are constructed to facilitate service discovery and service composition. Chen et al. [34] build on [33] to find the optimal throughput for a cloud system.
It is crucial for cloud computing systems to guarantee the quality of service (QoS) specified in service level agreements between CSPs and their customers [14]. Timely responses to user requests [35, 36] and sufficient network bandwidth [37] are two hallmarks of good QoS. Kumar and Verma [35] devised a genetic algorithm (GA) to speed up responses, in which a "chromosome" represents an allocation of VMs to PMs. First, the GA produces a random population of candidate chromosomes and ranks them according to their fitness values (i.e., objective values). Second, parent chromosomes with high fitness values (for example, those that yield high QoS) are selected and crossed over to produce offspring. Alicherry and Lakshman [36] focused on minimizing response latency through VM placement and presented a method based on integer programming. In addition, several network-aware VM placement strategies place communication-intensive VMs together on the same PMs in order to provide the necessary network bandwidth for users [37]. While effective in improving social welfare, none of these mechanisms takes into account the energy required to run users' applications.

Jena et al. [9] proposed a task scheduling method named Task Scheduling using the Clonal Selection Algorithm (TSCSA) [18]. When the number of servers and users in a network fluctuates often, TSCSA can handle the scheduling of work among several machines. In the proposed framework,
cloud resources are made available to the user through an interface. The task-scheduling module, which distributes user tasks to the available processing resources, conserves time and energy. The CSA-based multi-objective strategy performs well in the cloud setting owing to its ability to use system resources effectively to reduce makespan and energy consumption. Compared with other scheduling algorithms, such as the maximum applications scheduling algorithm (which aims to maximize the number of scheduled tasks) and the random scheduling algorithm (which assigns tasks to machines at random), the results show that TSCSA strikes the best balance among the objectives [18].

To schedule time-sensitive workflow tasks in an energy-efficient manner, Safari and Khorsand [13] built on the DVFS approach [10–12]. They demonstrated that adjusting the frequency according to current host utilization, and the voltage according to the frequency, lowers energy consumption [18]. In modeling energy use, the authors ignored static energy and considered only dynamic energy. For this purpose, the host employs a range of voltage levels, thereby lowering the system's operating frequency. The method partitions the workflow tasks and assigns them to the most efficient processors to save power. The authors evaluated the proposed method in CloudSim. Experimental results showed that the method is superior in terms of execution time, resource usage, energy consumption, and the number of SLA violations.

Wu and Wang [13] developed a method termed mEDA, which determines the processing permutation of tasks and the Voltage Supply Levels (VSLs).
mEDA targets the scheduling of a single, high-priority, parallel application, with the aim of saving resources and reducing execution time. The authors analyzed the CPU's total energy usage across both its active and idle states. They divided the population into three groups for extensive searching and then developed dedicated operators to cut down on runtime and power use without sacrificing performance.

Hu et al. [14] proposed an approach for distributed systems rather than for the cloud. The algorithm is known as RSMECC. When a processing unit consumes excessive energy, the method reduces the number of parallel processes executing on that unit. The biggest drawback of this technique is that it switches between many contexts, which uses a lot of CPU time. While the algorithm outperforms previous statistical approaches to reducing energy use, it is impractical given the large number of context switches that can occur in a production cloud setting.

Xiao et al. [15] focus on data and its intra-cloud transfer. Their approach calculates how much energy is needed to deploy a virtual machine (VM) and how much is needed to migrate it. Building on this VM deployment, they proposed priority-based scheduling of the associated workflows in the Minimal Data-Access Energy Path (MDEP) method. The authors report that the algorithm outperforms other scheduling methods. According to the article, the
technique uses less energy when processing large amounts of data that must be shared across several data centers located in different countries.

In [16], Sharma et al. combined several approaches. A genetic algorithm first generates an initial dataset of energy consumption versus CPU utilization. Operations are then scheduled on a processing unit inside the rack using the backpropagation technique of an artificial neural network. The algorithm reduced both time and cost.

The popularity of meta-heuristic techniques has increased in recent years as a result of their effectiveness on large and difficult problems. They offer a practical alternative to solving NP-hard problems exactly. Meta-heuristic methods use heuristics to search efficiently for near-optimal solutions, and because they are less greedy than simple heuristics they may find better solutions. In general, their performance is satisfactory across a wide range of problems even though they are non-deterministic, approximate, and problem-independent. Several meta-heuristic algorithms can quickly find a near-optimal solution to the NP-complete scheduling problem in the cloud. Task scheduling is considered NP-complete because the number of possible solutions is large and identifying the optimal one takes a long time.

In [17], Sharma and Garg proposed HIGA (Harmony-Inspired Genetic Algorithm), a hybrid technique that combines two well-known meta-heuristic algorithms, GA and harmony search (HS). The method combines the exploration power of GA with the exploitation capability of HS to achieve a high rate of convergence. It can also escape when the search becomes trapped in a local optimum. In addition to reducing cycle time, it helps to eliminate unnecessary repetition.

3 Proposed Approach

The proposed method combines a dynamic parameter strategy with Particle Swarm Optimization. The PSO's adaptability comes from the fact that its parameters (acceleration coefficients, inertia weight, and scaling factors) may be changed within a generation, with the best-performing values saved in an archive. After collecting and archiving 50 such records, we schedule the jobs on the VMs by running PSO again with the best set of parameters drawn from the archive. This choice ensures that the computation uses parameter values that have already proven successful, in contrast to standard PSO, in which the values determined by each individual's "personal best" (pBest) are given more weight. Figure 1 is a graphical representation of the algorithm. As shown in Fig. 1, the best solution found during Generation 1 and the PSO parameters used are both saved in the archive. The cycle shown is repeated up to fifty times to generate PSO parameters for subsequent generations. At the end of each generation, the PSO
settings and optimal solution are stored in the archive. Once the archive is complete, PSO is used to divide the workload across the virtual machines based on the best parameters stored in the archive. Because this computation is done beforehand, we can avoid the stagnation issue and keep the rate of convergence high. Faster convergence also results in fewer Service Level Agreement (SLA) violations.

Fig. 1 Pictorial representation of our proposed algorithm

Equations 1 and 2 give the mathematical framework for the dynamic PSO.

$$V_i = r_{\omega} V_i + r_{a1}\, r_{r1}\, (P_i - X_i) + r_{a2}\, r_{r2}\, (G - X_i) \tag{1}$$

where $r_{\omega}$ is the randomly chosen inertia weight, $r_{a1}$ and $r_{a2}$ are randomly chosen acceleration coefficients, $r_{r1}$ and $r_{r2}$ are randomly chosen scaling factors, $V_i$ is the velocity of the ith particle (solution), $X_i$ is the position of the ith particle, $P_i$ is the immediate best of the ith particle in the current generation, and $G$ is the global best in the current generation. The inertia weight, acceleration coefficients, and scaling factors are drawn at random with the rand() function in each generation. Equation 2 is used to determine a particle's new position once its velocity in the current population has been computed. The new position, obtained by adding the velocity from Eq. 1 to the previous position, replaces the previous position.

$$X_i = X_i + V_i \tag{2}$$
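As a minimal sketch of Eqs. 1 and 2, the update for one particle can be written as below. The function name `pso_step` and the sampling ranges for the inertia weight and acceleration coefficients are illustrative assumptions; the chapter only states that these values are drawn at random.

```python
import random

def pso_step(x, v, p_best, g_best, rng):
    """Apply Eq. 1 (velocity) and Eq. 2 (position) to one particle.

    x, v, p_best, g_best are lists of equal length. The sampling ranges
    below are assumptions made for illustration, not values from the chapter.
    """
    r_w = rng.uniform(0.4, 0.9)    # inertia weight (assumed range)
    r_a1 = rng.uniform(1.5, 2.5)   # acceleration coefficient 1 (assumed range)
    r_a2 = rng.uniform(1.5, 2.5)   # acceleration coefficient 2 (assumed range)
    r_r1 = rng.random()            # scaling factor 1 in [0, 1)
    r_r2 = rng.random()            # scaling factor 2 in [0, 1)

    # Eq. 1: V_i = r_w*V_i + r_a1*r_r1*(P_i - X_i) + r_a2*r_r2*(G - X_i)
    new_v = [r_w * vi + r_a1 * r_r1 * (pi - xi) + r_a2 * r_r2 * (gi - xi)
             for xi, vi, pi, gi in zip(x, v, p_best, g_best)]
    # Eq. 2: X_i = X_i + V_i
    new_x = [xi + vi for xi, vi in zip(x, new_v)]
    return new_x, new_v, (r_w, r_a1, r_a2, r_r1, r_r2)
```

A particle that already sits on both its own best and the global best, with zero velocity, stays where it is, because every term of Eq. 1 vanishes.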

As the pseudocode of Algorithm 1 shows, once the maximum number of iterations has been completed, we save the most recent values, which are the best PSO parameters found, in the archive. The procedure is repeated over the candidate parameter combinations, and the resulting parameter values are stored. Because this activity is computationally expensive, it is preferable to perform it offline. The best entry in the archive is subsequently used to schedule real workloads in the cloud data center.

In Algorithm 1, $in_i$, $acc_i$, and $sf_i$ denote the inertia, acceleration, and scaling-factor coefficients for the ith PSO iteration, respectively. A flowchart of Algorithm 1 is shown in Fig. 2.


Fig. 2 Flowchart showing the suggested method
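The archive-then-reuse procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the objective `energy` is a hypothetical stand-in (a sphere function) for the data-center energy measured in CloudSim, and the parameter ranges, swarm size, and iteration count are invented for the example. Only the archive size of 50 comes from the text.

```python
import random

def energy(position):
    # Hypothetical stand-in objective; the chapter's real objective is the
    # energy consumed by the simulated data center.
    return sum(c * c for c in position)

def run_pso(params, rng, dim=4, swarm=10, iters=30):
    """Run a basic PSO with fixed (inertia, acceleration, scaling) parameters."""
    inertia, acc, sf = params
    xs = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(swarm)]
    vs = [[0.0] * dim for _ in range(swarm)]
    p_best = [x[:] for x in xs]
    g_best = min(p_best, key=energy)[:]
    for _ in range(iters):
        for i in range(swarm):
            for d in range(dim):
                # sf bounds the random scaling draw (an assumption of this sketch)
                vs[i][d] = (inertia * vs[i][d]
                            + acc * sf * rng.random() * (p_best[i][d] - xs[i][d])
                            + acc * sf * rng.random() * (g_best[d] - xs[i][d]))
                xs[i][d] += vs[i][d]
            if energy(xs[i]) < energy(p_best[i]):
                p_best[i] = xs[i][:]
        g_best = min(p_best + [g_best], key=energy)[:]
    return energy(g_best)

def build_archive(rng, size=50):
    """Offline phase: try random parameter sets and record how each performed."""
    archive = []
    for _ in range(size):
        params = (rng.uniform(0.4, 0.9),   # in_i  (assumed range)
                  rng.uniform(1.0, 2.0),   # acc_i (assumed range)
                  rng.uniform(0.5, 1.0))   # sf_i  (assumed range)
        archive.append((run_pso(params, rng), params))
    return archive

rng = random.Random(42)
archive = build_archive(rng)
best_fitness, best_params = min(archive)   # lowest objective value wins
final_fitness = run_pso(best_params, rng)  # online phase reuses the archived parameters
```

Separating `build_archive` from the final `run_pso` call mirrors the offline/online split in the text: the expensive parameter search runs once, and scheduling reuses its best entry.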


4 Results and Discussions

The method has been benchmarked against various heuristic and meta-heuristic approaches, and it achieved the lowest energy consumption among them. The benchmarking platform is CloudSim, a widely used cloud simulator. The simulation was run on a system with the following configuration:

OS: Windows 10 Pro
CPU: Intel Core i7, 4.2 GHz
RAM: 16 GB, 2400 MHz
Language and version: Java Development Kit 8
Simulator: CloudSim
Integrated development environment: Eclipse Luna

The proposed algorithm, SHA-PSO, has the lowest scheduling costs compared with the alternatives. The same number of hosts (800) is used for all algorithms under study. When scheduling a job using the investigated methods, a minimum of 1052 virtual machines (VMs) is required. Each algorithm was simulated for a total of 86,400 s. Table 1 also reports the average SLA violation rate; a rate of less than 15% indicates a satisfactory SLA.

Table 1 provides a comprehensive comparison of SHA-PSO with commonly used heuristic and meta-heuristic methods. Values in bold denote the lowest figures found for the two groups. As expected, SHA-PSO has lower energy requirements than the competing state-of-the-art algorithms. For both VM migrations and SLA violations, a value of 0 indicates that no action has been taken. This is the case for schedulers that are not energy-aware, which dispatch VMs regardless of whether the hosts are currently busy. Time to completion is prioritized in both energy-aware and non-energy-aware schedulers in order to limit SLA violations. The table shows that conventional approaches such as FCFS, Round-robin, and DVFS have a significant energy footprint.

Data center energy usage as a function of the method used to schedule scientific workflows in the cloud is depicted in Fig. 3. Figure 4 visualizes the number of VM migrations that occurred as a result of running the different scientific workflow scheduling algorithms in the cloud data center. The Pareto charts in Figs. 3, 4, and 5 show the distribution of the data, with a cumulative line plotted against a secondary axis in percentage terms.


Table 1 Simulation results showing energy consumption along with other essential parameters

Algorithm name   Energy consumption   Number of VM migrations   Average SLA violation (%)
DVFS             805.55               0                         0.00
IQR_MC           119.24               24,224                    10.31
IQR_MMT          119.35               26,427                    10.62
IQR_MU           118.17               48,112                    14.75
IQR_RS           119.37               23,883                    11.24
LR_MC            117.44               16,447                    10.95
LR_MMT           119.37               17,902                    9.42
LR_MU            116.84               29,112                    12.71
LRR_MC           119.86               17,788                    10.55
LRR_MMT          118.45               19,168                    10.58
LRR_MU           118.18               32,145                    14.20
LRR_RS           119.20               18,240                    11.21
LR_RS            119.10               16,125                    9.42
MAD_MC           116.42               23,611                    11.02
MAD_MMT          117.89               25,929                    11.45
MAD_MU           115.04               46,132                    14.28
MAD_RS           117.27               24,143                    10.49
NPA              2411.93              0                         0.00
THR_MC           121.58               24,794                    10.69
THR_MMT          120.33               26,562                    10.59
THR_MU           119.42               45,856                    14.67
THR_RS           120.38               25,561                    10.78
PSO              104.63               32,413                    11.37
SHA_PSO          82.54                14,370                    8.17
SHA_DE           99.24                41,944                    10.71
GA               100.77               37,493                    13.12
ACO              104.37               33,125                    12.28
FCFS             2411.97              0                         0.00
HEFT             118.98               19,276                    12.65
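To make the comparison in Table 1 concrete, the relative energy reduction of SHA_PSO against a few of the listed baselines can be computed as sketched below. The energy values are copied from Table 1; the helper names `reduction_pct` and `savings_vs` are illustrative and not part of the chapter.

```python
# Energy values copied from Table 1 (units as reported there).
TABLE1_ENERGY = {
    "SHA_PSO": 82.54,
    "PSO": 104.63,
    "SHA_DE": 99.24,
    "LR_MC": 117.44,
    "MAD_MU": 115.04,
}

def reduction_pct(baseline, proposed):
    """Relative reduction of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline

def savings_vs(proposed_name, table):
    """Percentage energy reduction of one algorithm against every other entry."""
    proposed = table[proposed_name]
    return {name: round(reduction_pct(value, proposed), 2)
            for name, value in table.items() if name != proposed_name}

savings = savings_vs("SHA_PSO", TABLE1_ENERGY)
for name, pct in sorted(savings.items(), key=lambda kv: kv[1]):
    print(f"SHA_PSO uses {pct}% less energy than {name}")
```

On these figures SHA_PSO consumes roughly 21% less energy than plain PSO and roughly 17% less than SHA_DE.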

Fig. 3 A plot of energy consumption of various algorithms used in scientific workflow scheduling inside cloud data center

Fig. 4 A plot of VM migrations incurred due to application of various algorithms used in scientific workflow scheduling inside cloud data center

Fig. 5 A histogram of service level agreement (SLA) breaches that have occurred because of the usage of different methods for scheduling scientific workflows inside a cloud data center

5 Conclusion and Future Work

This study presents a hybrid of two techniques: a dynamic PSO and an archive of the best-performing PSO parameter settings. Energy minimization in the cloud is an NP-complete problem, and many heuristic and meta-heuristic approaches to solving it are possible. Reducing the energy consumed in such a setting has enormous potential benefits. We anticipate that more effective combinations of meta-heuristic algorithms will be discovered in the future.

Appendix

Abbreviations

SHA-PSO   Success History based Adaptive Particle Swarm Optimization
PSO       Particle Swarm Optimization
DVFS      Dynamic Voltage and Frequency Scaling
IQR_MC    Inter Quartile Range with Maximum Correlation
IQR_MMT   Inter Quartile Range with Minimum Migration Time
IQR_MU    Inter Quartile Range with Minimum Utilization
IQR_RS    Inter Quartile Range with Random Selection
LR_MC     Local Regression with Maximum Correlation
LR_MMT    Local Regression with Minimum Migration Time
LR_MU     Local Regression with Minimum Utilization
LRR_MC    Local Regression Robust with Maximum Correlation
LRR_MMT   Local Regression Robust with Minimum Migration Time
LRR_MU    Local Regression Robust with Minimum Utilization
LRR_RS    Local Regression Robust with Random Selection
LR_RS     Local Regression with Random Selection
MAD_MC    Median Absolute Deviation with Maximum Correlation
MAD_MMT   Median Absolute Deviation with Minimum Migration Time
MAD_MU    Median Absolute Deviation with Minimum Utilization
MAD_RS    Median Absolute Deviation with Random Selection
NPA       Non Power Aware
THR_MC    Static Threshold with Maximum Correlation
THR_MMT   Static Threshold with Minimum Migration Time
THR_MU    Static Threshold with Minimum Utilization
THR_RS    Static Threshold with Random Selection
SHA_DE    Success History based Adaptive Differential Evolution
GA        Genetic Algorithm
ACO       Ant Colony Optimization
FCFS      First Come First Serve
HEFT      Heterogeneous Earliest Finish Time

References

1. Kennedy, J., Eberhart, R.: Particle swarm optimization. In: Proceedings of ICNN’95 - International Conference on Neural Networks, vol. 4, pp. 1942–1948. IEEE (1995)
2. Mthunzi, S.N., Benkhelifa, E., Bosakowski, T., Guegan, C.G., Barhamgi, M.: Cloud computing security taxonomy: from an atomistic to a holistic view. Future Gener. Comput. Syst. 107, 620–644 (2020)
3. Wang, M., Zhang, Q.: Optimized data storage algorithm of IoT based on cloud computing in distributed system. Comput. Commun. 157, 124–131 (2020)
4. Abdel-Basset, M., El-Shahat, D., Deb, K., Abouhawwash, M.: Energy-aware whale optimization algorithm for real-time task scheduling in multiprocessor systems. Appl. Soft Comput. 93, 106349 (2020)
5. Hussain, M., Wei, L.-F., Lakhan, A., Wali, S., Ali, S., Hussain, A.: Energy and performance-efficient task scheduling in heterogeneous virtualized cloud computing. Sustain. Comput. Inform. Syst. 30, 100517 (2021)
6. Khan, A.A., Zakarya, M., Rahman, I.U., Khan, R., Buyya, R.: HeporCloud: an energy and performance efficient resource orchestrator for hybrid heterogeneous cloud computing environments. J. Netw. Comput. Appl. 173, 102869 (2021)
7. Sharma, B., Prakash, R., Tiwari, S., Mishra, K.K.: A variant of environmental adaptation method with real parameter encoding and its application in economic load dispatch problem. Appl. Intell. 47(2), 409–429 (2017)
8. Shukla, R., Hazela, B., Shukla, S., Prakash, R., Mishra, K.K.: Variant of differential evolution algorithm. In: Advances in Computer and Computational Sciences, pp. 601–608. Springer, Singapore (2017)
9. Prakash, R., Kumar, S., Kumar, C., Mishra, K.K.: Musical password based biometric authentication. In: 2016 International Conference on Computing, Communication and Automation (ICCCA), pp. 1016–1019. IEEE (2016)
10. Gonzalez, R., Gordon, B.M., Horowitz, M.A.: Supply and threshold voltage scaling for low power CMOS. IEEE J. Solid-State Circ. 32(8), 1210–1216 (1997)
11. Semeraro, G., Magklis, G., Balasubramonian, R., Albonesi, D.H., Dwarkadas, S., Scott, M.L.: Energy-efficient processor design using multiple clock domains with dynamic voltage and frequency scaling. In: Proceedings Eighth International Symposium on High Performance Computer Architecture, pp. 29–40. IEEE (2002)
12. Le Sueur, E., Heiser, G.: Dynamic voltage and frequency scaling: the laws of diminishing returns. In: Proceedings of the 2010 International Conference on Power Aware Computing and Systems, pp. 1–8 (2010)
13. Wu, C., Wang, L.: A multi-model estimation of distribution algorithm for energy efficient scheduling under cloud computing system. J. Parallel Distrib. Comput. 117, 63–72 (2018)
14. Hu, Y., Li, J., He, L.: A reformed task scheduling algorithm for heterogeneous distributed systems with energy consumption constraints. Neural Comput. Appl. 32(10), 5681–5693 (2020)
15. Xiao, P., Zhi-Gang, H., Zhang, Y.-P.: An energy-aware heuristic scheduling for data-intensive workflows in virtualized datacenters. J. Comput. Sci. Technol. 28(6), 948–961 (2013)
16. Sharma, M., Garg, R.: An artificial neural network based approach for energy efficient task scheduling in cloud data centers. Sustain. Comput. Inform. Syst. 26, 100373 (2020)
17. Sharma, M., Garg, R.: HIGA: harmony-inspired genetic algorithm for rack-aware energy-efficient task scheduling in cloud data centers. Eng. Sci. Technol. Int. J. 23(1), 211–224 (2020)
18. Ghafari, R., Hassani Kabutarkhani, F., Mansouri, N.: Task scheduling algorithms for energy optimization in cloud environment: a comprehensive review. Cluster Comput. 1–59 (2022)
19. Jeba, J.A., Roy, S., Rashid, M.O., TanjilaAtik, S., Whaiduzzaman, M.: Towards green cloud computing an algorithmic approach for energy minimization in cloud data centers. In: Research Anthology on Architectures, Frameworks, and Integration Strategies for Distributed and Cloud Computing, pp. 846–872. IGI Global (2021)
20. Guo, S., Zeng, D., Lin, G., Luo, J.: When green energy meets cloud radio access network: joint optimization towards brown energy minimization. Mobile Netw. Appl. 24(3), 962–970 (2019)
21. Kak, S.M., Agarwal, P., Afshar Alam, M.: Energy minimization in a cloud computing environment. In: Intelligent Systems, pp. 397–405. Springer, Singapore (2021)
22. Deng, B., Jiang, C., Guo, S.: Energy minimization of resource allocation in cloud-based satellite communication networks. IEEE Commun. Lett. 23(12), 2353–2356 (2019)
23. Yang, Z., Pan, C., Hou, J., Shikh-Bahaei, M.: Efficient resource allocation for mobile-edge computing networks with NOMA: completion time and energy minimization. IEEE Trans. Commun. 67(11), 7771–7784 (2019)
24. Kak, S.M., Agarwal, P., Afshar Alam, M.: Energy minimization in a sustainably developed environment using cloud computing. In: Smart Technologies for Energy and Environmental Sustainability, pp. 39–52. Springer, Cham (2022)
25. Pirozmand, P., Hosseinabadi, A.A.R., Farrokhzad, M., Sadeghilalimi, M., Mirkamali, S., Slowik, A.: Multi-objective hybrid genetic algorithm for task scheduling problem in cloud computing. Neural Comput. Appl. 33(19), 13075–13088 (2021)
26. Zaman, S., Grosu, D.: Combinatorial auction-based allocation of virtual machine instances in clouds. J. Parallel Distrib. Comput. 73(4), 495–508 (2013)
27. Nejad, M.M., Mashayekhy, L., Grosu, D.: Truthful greedy mechanisms for dynamic virtual machine provisioning and allocation in clouds. IEEE Trans. Parallel Distrib. Syst. 26(2), 594–603 (2014)
28. Mashayekhy, L., Nejad, M.M., Grosu, D., Vasilakos, A.V.: Incentive-compatible online mechanisms for resource provisioning and allocation in clouds. In: 2014 IEEE 7th International Conference on Cloud Computing, pp. 312–319. IEEE (2014)
29. Zhang, L., Li, Z., Wu, C.: Dynamic resource provisioning in cloud computing: a randomized auction approach. In: IEEE INFOCOM 2014 - IEEE Conference on Computer Communications, pp. 433–441. IEEE (2014)
30. Zhang, H., Jiang, H., Li, B., Liu, F., Vasilakos, A.V., Liu, J.: A framework for truthful online auctions in cloud computing with heterogeneous user demands. IEEE Trans. Comput. 65(3), 805–818 (2015)
31. An, B., Lesser, V., Irwin, D., Zink, M.: Automated negotiation with decommitment for dynamic resource allocation in cloud computing. In: Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems: Vol. 1, pp. 981–988 (2010)
32. Zhao, H., Liu, X., Li, X.: Towards efficient and fair resource trading in community-based cloud computing. J. Parallel Distrib. Comput. 74(11), 3087–3097 (2014)
33. Sim, K.M.: Agent-based cloud computing. IEEE Trans. Serv. Comput. 5(4), 564–577 (2011)
34. Chen, C., Zhu, X., Bao, W., Chen, L., Sim, K.M.: An agent-based emergent task allocation algorithm in clouds. In: 2013 IEEE 10th International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, pp. 1490–1497. IEEE (2013)
35. Shukla, D.K., Kumar, D., Kushwaha, D.S.: Task scheduling to reduce energy consumption and makespan of cloud computing using NSGA-II. Mater. Today: Proc. (2021)
36. Alicherry, M., Lakshman, T.V.: Optimizing data access latencies in cloud systems by intelligent virtual machine placement. In: 2013 Proceedings IEEE INFOCOM, pp. 647–655. IEEE (2013)
37. Alicherry, M., Lakshman, T.V.: Network aware resource allocation in distributed clouds. In: 2012 Proceedings IEEE INFOCOM, pp. 963–971. IEEE (2012)

Field Monitoring and Automation in Agriculture Using Internet of Things (IoT)

Ashendra Kumar Saxena, Rakesh Kumar Dwivedi, and Danilla Parygin

Abstract Agriculture is essential to the overall economic development of a nation, and the Internet of Things (IoT) plays an essential role in smart agriculture. In this project, an automated agriculture system is developed to optimize the water and fertilizer usage of crops. The project also includes detection of animals and protection of trees. The aim of the experiment is to find better ways of controlling an irrigation system through automation. This chapter proposes the design of a field monitoring device using IoT in agriculture. The system monitors field parameters such as soil moisture, temperature and humidity, and it also controls irrigation, so the user does not need to check the water level manually. It displays all the data sensed by the sensors. The automated model provides information that helps improve crop yields while saving water. The device measures the amount of moisture in the soil and, if it is below a predefined threshold, releases water through the irrigation pipes; it also records the moisture level automatically. The proposed device further gives suggestions on plant growth based on the information collected by the sensors.

Keywords Internet of Things · Sensors · Wireless · Smart Farming · Smart Irrigation System

A. K. Saxena (B) · R. K. Dwivedi
CCSIT, Teerthanker Mahaveer University, Moradabad, UP, India
e-mail: [email protected]

R. K. Dwivedi
e-mail: [email protected]

D. Parygin
Volgograd State Technical University, Volgograd, Russia

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Rishiwal et al. (eds.), Towards the Integration of IoT, Cloud and Big Data, Studies in Big Data 137, https://doi.org/10.1007/978-981-99-6034-7_8


A. K. Saxena et al.

1 Introduction

India’s population is continuously increasing and, within a few years, food supply will become a very serious problem. Agricultural development is therefore necessary. Today, water scarcity and lack of rain are the main worries of farmers. The main objective is an automatic irrigation system that consumes less of the farmer’s time, money and power. Conventional irrigation techniques require hard manual work, but with an automatic crop monitoring and irrigation system, human effort is minimized [1]. Earlier, Indian farmers relied on manual observation, which resulted in either over-watering or overuse of pesticides, both of which directly harm crop productivity and the fertility of the land [2].

Here, a system is proposed that provides smart monitoring and automatic irrigation control. Sensing devices connect the system to the cloud, and the sensed data is monitored via a mobile application. Along with crop monitoring, there is an urgent need for a good irrigation system: lack of rain and unpredictability in the environment make proper utilization of water essential. Sensors therefore check the soil moisture and the surrounding temperature to determine the water level, and the motor is switched on or off according to that level. The technology increases production and reduces manpower. The result is a low-cost system that efficiently gathers the sensed data the user needs to turn the motor on and off.

One efficient irrigation method is drip irrigation, where water drips slowly to the roots of the plants. We use a Smart Drip Irrigation and Monitoring System that handles two conditions, under-irrigation and over-irrigation, while monitoring the field. It can be operated in two modes:

i. Power Supply Mode: used when the weather is bad and the battery has no stored energy.
ii. Solar Energy Mode: uses a solar panel to run the system. The energy collected by the solar panel is stored in a rechargeable battery for later use.

The working mode can be switched according to need. Solar Energy Mode conserves electricity by reducing the use of grid power and conserves water by reducing water losses; it is the best utilization of solar power. If no solar energy is available, Power Supply Mode is turned on.

The system not only provides automated irrigation but also offers additional features. The user is provided with a user interface, an Android application that can be easily operated on mobile phones. It supports data visualization, data understanding and predictive analysis. It also lets farmers look ahead to more advanced data in the future. The system uses a Wi-Fi module that transmits the data to the cloud and notifies the user of the field condition. On receiving this alert, the user can take measures to improve the condition of the field.
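The control logic described above (switch the motor according to the sensed moisture level, and fall back to Power Supply Mode when no solar energy is available) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Reading` class, the function names, and all threshold values are assumptions introduced here.

```python
from dataclasses import dataclass

# All thresholds are illustrative assumptions, not values from the chapter.
MOISTURE_LOW = 30.0   # percent; below this the field counts as under-irrigated
MOISTURE_HIGH = 60.0  # percent; above this the field counts as over-irrigated


@dataclass
class Reading:
    """One hypothetical sensor snapshot."""
    soil_moisture: float   # percent
    battery_level: float   # percent of stored solar energy
    solar_available: bool  # True when the panel is currently producing power


def motor_command(reading: Reading, motor_on: bool) -> bool:
    """Return the new motor state for one sensor reading."""
    if reading.soil_moisture < MOISTURE_LOW:
        return True    # under-irrigation: start watering
    if reading.soil_moisture > MOISTURE_HIGH:
        return False   # over-irrigation: stop watering
    return motor_on    # in between: keep the current state


def power_mode(reading: Reading, min_battery: float = 10.0) -> str:
    """Use solar mode unless neither sunlight nor stored energy is available."""
    if reading.solar_available or reading.battery_level > min_battery:
        return "solar"
    return "power_supply"
```

Keeping the motor state unchanged between the two thresholds avoids rapid on/off switching when the moisture reading hovers near a single cut-off value.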

2 Related Works

This section reviews the use of the Internet of Things in various applications, including different approaches to control and monitoring, as summarized below:

| S. no. | Authors | Title | Findings |
|---|---|---|---|
| 1 | Vineela, Mrs. T.; NagaHarini, J.; Kiranmai, Ch.; Harshitha, G.; AdiLakshmi, B. | “IoT Based Agriculture Monitoring and Smart Irrigation System Using Raspberry Pi” | Developed a system in which sensors and a smart irrigation system are linked through wireless communication technology. The proposed system is low in cost and uses efficient methods to measure soil moisture, humidity and temperature in order to control the water level |
| 2 | Suma, Dr. N.; Sandra Rhea Samson; Saranya, S.; Shanmugapriya, G.; Subhashri, R. | “IoT Based Smart Agriculture Monitoring System” | Proposed a system with GPS-based monitoring, moisture and temperature sensing, and intruder scaring. Wi-Fi and a camera are used for good intruder detection |
| 3 | Dr. M. Suchithra; Asuwini T.; Charumathi M. C.; Ritu N. Lal | “Sensor Data Validation” | Uses many sensors that sense temperature, humidity, moisture and fertility in the farm. The actual and saved data are compared and sent to the Wi-Fi module, and from there to a mobile phone or laptop via the cloud. Notifications are sent to the farmers |
| 4 | M. Jagadesh; Rajamanickam, S.; Saran, S. P.; Shiridi Sai, S.; Suresh, M. | “Wireless Sensor Network Based Agricultural Monitoring System” | Proposed a system where sensors are used to monitor the field. Soil moisture is determined by a soil moisture sensor, atmospheric humidity by a humidity sensor, temperature by a temperature sensor, and the water level by a water level sensor. An Arduino with Zigbee is used for data transmission and a Raspberry Pi for processing the data |
| 5 | Sreekantha, Dr. D. K.; Kavya, A. M. | “Agricultural Crop Monitoring using IoT - A Study” | Developed an IoT system where sensors are used to monitor the field. Soil moisture is determined by a soil moisture sensor, atmospheric humidity by a humidity sensor, temperature by a temperature sensor, and the water level by a water level sensor. An Arduino with Zigbee is used for data transmission and a Raspberry Pi for processing the data |
| 6 | Mohanraj I.; Kirthika Ashokumar; Naren J. | “Field Monitoring and Automation using IoT in Agriculture Domain” | Created an e-Agriculture application based on KM-Knowledge base and Monitoring modules. To gain more profits, farmers need good information throughout the farming process |
| 7 | Karan Kansara; Vishal Zaveri; Shreyans Shah; Sandip Delwadkar; Kaushal Jani | “Sensor based Automated Irrigation System with IOT: A Technical Review” | Developed a smart irrigation system that saves substantial time, money and water |
| 8 | M. K. Gayatri; J. Jayasakthi; Dr. G. S. Anandhamala | “Providing Smart Agriculture Solutions to Farmers for Better Yielding Using IoT” | Developed a system where the data is measured through sensors. The sensor data is transmitted to the processor and Bluetooth Low Energy (BLE). Real-time data is collected through the cloud; the main objective is a system that consumes less water and energy |
| 9 | Dwarkani M., Chetan; Ram R., Ganesh; S., Jagannathan; Priyatharshini, R. | “Smart Farming System Using Sensors for Agricultural Task Automation” | Developed a smart farming system in which sensors and a smart irrigator system are linked through wireless communication technology. It measures physical parameters, giving farmers a clear picture of the field |
| 10 | Tang Kha Duy, Nguyen | “Automated Monitoring and Control System for Shrimp Farms Based on Embedded System and Wireless Sensor Network” | Proposed a solution that improves the accuracy of monitoring environmental conditions and reduces human work. The component is capable of collecting, analyzing and presenting data on a Graphical User Interface (GUI). The proposed system saves money by reducing both hired labor and electricity usage |
| 11 | Ji-hua, Meng; Bing-fang, Wu; Li Qiang-zi | “A Global Crop Growth Monitoring System Based on Remote Sensing” | Developed an IoT system where sensors are used to monitor the field. Soil moisture is determined by a soil moisture sensor, atmospheric humidity by a humidity sensor, temperature by a temperature sensor, and the water level by a water level sensor. An Arduino with Zigbee is used for data transmission and a Raspberry Pi for processing the data |
| 12 | Chetan Dwarkani M.; Ganesh Ram R.; Jagannathan S.; R. Priyatharshini | “Smart Farming System Using Sensors for Agricultural Task Automation” | Proposed a methodology for smart farming and a smart irrigator system that connects the sensors via wireless communication technology |
| 13 | Moghaddam M.; Entekhabi D.; Goykhman Y.; Li K.; Liu M.; Mahajan A.; Nayyar A.; Shuman D.; Teneketzis D. | “Wireless soil moisture smart sensor web using physics-based optimal control: Concept and initial demonstrations” | Developed a system based on smart wireless sensor web technology that measures soil moisture using sensors. The system has a soil moisture sensor and an XBee Pro module |
| 14 | Suprita Patil; M. Vijayalashmi; Rakesh Tapaskar | “Solar Energy Monitoring System using IoT” | Proposed a system for online display of the power usage of solar energy as a renewable energy source. Monitoring is done through a Raspberry Pi using the Flask framework. The smart monitoring displays daily usage of renewable energy |
| 15 | G. Parameswaran; K. Sivaprasath | “Arduino Based Smart Drip Irrigation System Using Internet of Things” | Proposed a project that helps farmers irrigate farmland efficiently with an automated drip irrigation system based on soil humidity |
| 16 | P. Divya Vani; K. Raghavendra Rao | “Measurement and Monitoring of Soil Moisture using Cloud IoT and Android System” | Data on the moisture content of red and black soils were uploaded to AT&T’s M2X Cloud and accessed online, and the humidity levels were monitored on the screen of a mobile phone or laptop |


3 IoT Technologies for Field Monitoring in Agriculture

IoT smart agriculture products are designed to help monitor crop fields using sensors and by automating irrigation systems. As a result, farmers and associated producers can easily monitor field conditions from anywhere. The following subsections look at several uses of IoT in agriculture through various IoT solutions:

3.1 Drones in Agriculture

Drones are unmanned aerial vehicles (also known as UAVs), which are used for surveillance in various industries. Until now, they were mainly used by companies working in industrial sectors such as mining and construction, by the army, and by hobbyists. Now, drone technology is increasingly available for use in various areas of agriculture as well. Although the technology is still nascent in India, many companies are working to make it easily available to Indian farmers and ready to be used to increase efficiency in agricultural production. Many drone-based agricultural projects are under way in India. Consider the following real-life scenarios:

On 26th January 2022, the Government of India released a certification scheme for agricultural drones, which can now carry a payload that does not include chemicals or other liquids used in spraying drones. Such liquids may be sprayed by following the applicable rules and regulations.

On 23rd January 2022, to promote the use of drones for agricultural purposes and reduce the labor burden on farmers, the Government of India offered a 100% subsidy or Rs. 10 lakhs, whichever is less, up to March 2023 to the Farm Machinery Training and Testing Institutes, ICAR Institutes, Krishi Vigyan Kendras and State Agriculture Universities. In addition, a contingency fund of Rs. 6000/ha will be set up for hiring drones from Custom Hiring Centers (CHC). The subsidy and the contingency funds will help farmers access and adopt this extensive technology at a reasonable cost.

On 16th November 2020, the Indian government granted the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT) permission to use drones for agricultural research activities. With this move, the government hopes to encourage budding researchers and entrepreneurs to look at budget-friendly drone solutions for more than 6.6 lakh Indian villages. Although the use will be conditional, it is a progressive step. Amber Dubey, Joint Secretary, Ministry of Civil Aviation, stressed that drones are ready to play a major role in agriculture, especially in areas including precision farming, improvement in crop yield, and locust control.

3.1.1 Benefits of Using Drones in Agriculture

The use of drone technology in agriculture is here to stay. According to recent research, the global drone market within agriculture will grow at a 35.9% CAGR and reach $5.7 billion by 2025. This emerging technology can help reduce time and increase the efficiency of farmers. The use of drones in the agricultural sector is only expected to rise as the industry matures, so it is good to know how to use this technology sensibly.

3.1.2 Soil and Field Analysis

For efficient field planning, agricultural drones can be used for soil and field analysis. They can carry sensors to evaluate the moisture content of the soil, terrain conditions, soil conditions, soil erosion, nutrient content, and soil fertility.

3.1.3 Crop Monitoring

Crop monitoring is the supervision of crop progress from the time seeds are sown to the time of harvest. This includes applying fertilizers at the right time, checking for pest attacks, and monitoring the effect of weather conditions. Crop monitoring is the main way a farmer can ensure a good harvest, especially when dealing with seasonal crops. Any errors at this stage can result in crop failure. Crop monitoring also helps in understanding and planning for the next farming season. Drones can support effective crop monitoring by inspecting the field with infrared cameras, and based on their real-time information, farmers can take active measures to improve the condition of plants in the field.

3.1.4 Plantation

Drones can help in planting trees and crops, work that was previously done by farmers. This technology will not only save labor but will also help in saving fuel.

3.1.5 Livestock Management

Drones can be used to monitor and manage large herds of livestock, as their sensors include high-resolution infrared cameras that can detect a sick animal so that action can be taken quickly. Thus, the impact of drones on precision dairy farming is soon to become the new normal.

3.1.6 Crop Spraying

Agri-drones can be used to spray chemicals, as they have reservoirs that can be filled with fertilizers and pesticides for spraying on crops in very little time compared with traditional methods. Drone technology can therefore usher in a new era for precision agriculture.

3.1.7 Check Crop Health

Farming is a large-scale activity that takes place over acres of land. Constant surveys are necessary to monitor the health of the soil and of the crop that has been planted. Done manually, this may take days, and even then there is room for human error. Drones can do the same job much faster. With infrared mapping, drones can gather information about the health of both the soil and the crop.

3.1.8 Avoid Overuse of Chemicals

Drones can prove especially effective in reducing the overuse of pesticides, insecticides, and other chemicals. These chemicals do help to protect the crop, but their overuse can be harmful. Drones can detect minute signs of pest attacks and provide accurate data on the degree and range of the attack. This helps farmers calculate the amount of chemicals required to protect the crops rather than harm them.

3.1.9 Prepare for Weather Glitches

Weather conditions can prove to be a farmer’s best friend and worst enemy. Since they cannot be accurately predicted, it becomes very difficult to prepare for any change in patterns. Drones can be used to detect upcoming weather conditions. Storm drones are already being used to make better predictions, and farmers can use this information to be better prepared. Early notice of storms or lack of rain can be used to plan which crop to plant so that it best suits the season, and how to take care of planted crops at a later stage.

3.1.10 Monitor Growth

Even when everything is going according to plan, crops need to be surveyed and monitored to ensure that the right amount of yield will be available at harvest time. This is also important for future planning, whether it is about determining the right price for the open market or harvesting cyclical crops. Drones can provide accurate data about every stage of crop growth and report any variations before they become a crisis. Multispectral images can also provide accurate information about subtle differences between healthy and unhealthy crops that may be missed by the naked eye. For example, stressed crops reflect less near-infrared light than healthy crops. This difference usually cannot be detected by the human eye, but drones can provide this information at an early stage.
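The observation that stressed crops reflect less near-infrared light is commonly quantified with the Normalized Difference Vegetation Index, NDVI = (NIR - Red) / (NIR + Red). The chapter does not name this index; it is swapped in here as a standard way to express the idea. The sketch below assumes per-pixel reflectance values between 0 and 1, and the stress threshold of 0.4 is an illustrative assumption, not a value from the text.

```python
from typing import List, Tuple


def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index for one pixel."""
    total = nir + red
    if total == 0:
        return 0.0  # no signal in either band; treat as neutral
    return (nir - red) / total


def flag_stressed(pixels: List[Tuple[float, float]],
                  threshold: float = 0.4) -> List[bool]:
    """Mark pixels whose NDVI falls below a (hypothetical) stress threshold.

    Each pixel is a (nir, red) reflectance pair.
    """
    return [ndvi(nir, red) < threshold for nir, red in pixels]
```

A healthy canopy pixel with high near-infrared and low red reflectance yields an NDVI close to 1, while a stressed pixel with weaker near-infrared reflectance scores lower and is flagged.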

3.1.11 Geofencing

The thermal cameras installed on drones can easily detect animals or humans. Drones can therefore guard the fields against external damage caused by animals, especially at night. For safety, the drones are operated by trained drone pilots.
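One way to turn a thermal detection into a geofence alert is to test whether the detected position lies inside the field boundary. The sketch below uses the standard ray-casting point-in-polygon test. It assumes the boundary is given as a list of (x, y) vertices in a local planar coordinate system and that each detection is a single point; neither the function name nor the data layout comes from the chapter.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def inside_fence(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: True if the point lies inside the polygon."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right toggles state
    return inside
```

For example, with a square field `[(0, 0), (10, 0), (10, 10), (0, 10)]`, a detection at `(5, 5)` is inside the fence and one at `(15, 5)` is outside.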

3.2 Remote Sensing in Agriculture

Remote sensing in agriculture refers to acquiring data about agricultural material or produce without touching it. It has become a very valuable tool in the monitoring, evaluation and management of agricultural resources. The technique has many applications in fields such as geology, surveying, forestry, photography and many more. However, agriculture has more uses for remote sensing than most sectors. The most common platform used for remote sensing is an aircraft or a satellite. In addition, recent advances in technology have led to the development of sophisticated drones with high-end cameras and sensors. A drone can travel at high speed while the necessary information is obtained from the farm. Furthermore, drones can take close-up aerial footage that shows detailed information about the plants. For instance, a drone can fly very low to take a shot, while planes cannot. That does not render those platforms useless in the agricultural sector, however: satellites and aircraft can collect information over a wide area, which can be very helpful in agriculture. A good example is the study of weather patterns.

3.2.1 Applications of Remote Sensing in Agriculture

Some applications of remote sensing are:

i. Assessment of crop progress or damage: Remote sensing technology is widely used in agriculture to assess crop progress or damage, particularly on large farms. All data is saved in storage that researchers and farmers use to check the crops.
ii. Crop identification: Remote sensing has made crop identification straightforward. For example, crops under observation may show unfamiliar characteristics. Data is collected from such instances and sent to the laboratory for further study, where new aspects, such as the crop culture, are studied to obtain accurate results.
iii. Crop yield estimation and modelling: Experts and farmers can easily predict the yield that will be produced from given resources, and the amount of time needed when other resources are used. Remote sensing data allows farmers to model and develop better strategies that are likely to improve the yield.
iv. Pest and disease infestation identification: Remote sensing plays a significant role in the management of pests in agriculture. It helps identify diseases and pests so that the right control mechanisms can be devised. Information collected over time is compared to match diseases using their symptoms.
v. Soil moisture estimation: Remote sensing techniques make measuring soil moisture easy; the moisture content is determined from the sensed data. Information drawn from such measurements is analyzed to decide the right crops to grow.
vi. Crop production forecasting: Farmers can predict their production over an estimated area in a given season. Whatever the season, they can easily obtain data on their projected returns. In addition, experts are able to predict the quality of the yield under any conditions.
vii. Planting and harvesting date identification: In the past, planting and harvesting dates were estimated by guesswork. With remote sensing, there is enough data to predict the exact harvest date. Thanks to this predictive feature, farmers can prepare in time. The technology does this by studying weather patterns and considering other factors such as soil.
viii. Soil mapping: Soil mapping is very important for producing high yields. Better soil mapping results in high-quality yields. Remote sensing provides important information that can be used to match crops with the most suitable soils.
ix. Irrigation management: Since remote sensing provides details of the soil moisture, farmers can manage irrigation with ease. With this information, one can also identify areas that have not received proper amounts of water.
x. Drought monitoring: Drought monitoring involves measuring changes in temperature, precipitation, and surface and groundwater supplies, among other factors. Drought patterns are analyzed and the information is transmitted to farmers for awareness. Rain patterns are monitored closely enough to tell the difference between two rainfalls.
xi. Crop condition assessment: Assessing the health of each crop is easy to carry out using remote sensing. If a crop is stressed, remote sensors collect information and send it to the personnel in charge. Experts use this data to determine the quality of the crops.
xii. Crop area estimation: Manually estimating the farmland on which crops are planted is a very arduous task. With remote sensing techniques, such details can be worked out in a short time.
xiii. Land mapping: Land mapping can be used in many different ways in agriculture. Remote sensing collects information that helps to map land for agricultural purposes, for instance landscaping or crop growth. Because different soils are suited to different purposes, land mapping is very important.
xiv. Precision farming: Remote sensing has helped with the development of precision farming over the years. Precision farming is important because it produces healthy crops. Furthermore, the crops grow within the given period, so optimum harvests are realized.
xv. Climate change monitoring: Climate change can affect agriculture in both positive and negative ways. Farmers can benefit by using remote sensing techniques to track climatic changes and use them to their advantage. For instance, they plant crops that will perform well depending on the climate. Furthermore, they can devise ways to develop drought-resistant crops. Experts observe the climate to determine crops that will perform better in the affected areas.

3.3 Computer Imaging in Agriculture

Computer imaging involves the use of sensor cameras installed at various corners of the farm, or drones equipped with cameras, to capture images that then undergo digital image processing. Digital image processing is the basic theory of processing


A. K. Saxena et al.

an input image using computer algorithms. Farming is one of the oldest trades in the world, yet advancing technology is turning agriculture into an exact science. Artificial Intelligence, Machine Learning, and Computer Vision programs are currently used to assess field conditions and soil moisture, detect crop disease, and predict weather and crop yields. Computer Vision in agriculture helps farmers make better-informed decisions and gather large quantities of data that were not accessible even a few years ago. Satellite imaging as well as UAV images help to examine large tracts of land and improve farming practices. Let us look at how the most recent technologies make farming more productive, cost-effective, and less labor-intensive.

Growing crop yields: By making use of AI and Computer Vision models, farmers can monitor crop growth in near-real time.

4 Proposed Automated System Model for Agriculture

There are five major components in the proposed system model (shown in Fig. 1):

• Microcontroller chip
• Sensor
• Relay
• Wi-Fi
• Actuator

Fig. 1 Proposed block diagram of system model (blocks: Sensor, Power Supply, Microcontroller Chip, Wi-Fi, Cloud, Mobile Application, Relay, Actuator)

Field Monitoring and Automation in Agriculture Using Internet …


4.1 Proposed System Block Diagram

The sensors collect data from the environment, which is processed by a microcontroller. The processed data is then uploaded to the cloud over Wi-Fi, where it can be accessed from anywhere via an Android app. A relay works as the switch for the motor, turning it on and off.

Hardware Used:

(a) Sensor:

The first piece of hardware that comes into use is the sensor, which is the key part of the project. It senses the data, which is then uploaded to the cloud so that it can be accessed from anywhere via any device: a mobile phone, a laptop, or any machine with an internet connection. Sensors can also be grouped in order to monitor a device thoroughly.

A soil moisture sensor (shown in Fig. 2) is a device used to measure the amount of water present in soil. The sensor works by measuring the electrical resistance between two or more electrodes inserted into the soil. The resistance of the soil depends on its moisture content (wetter soil conducts better and so shows lower resistance); by measuring the resistance, the sensor can therefore determine the moisture level of the soil. Soil moisture sensors are commonly used in agriculture and horticulture to ensure that plants receive the proper amount of water. By monitoring soil moisture levels, farmers and gardeners can avoid overwatering or underwatering their plants, which can lead to poor growth, disease, and other problems. Various types of soil moisture sensor are available, including capacitance-based sensors, time domain reflectometry (TDR) sensors, and gypsum block sensors. Each type has its own advantages and disadvantages, depending on factors such as the soil type, the application, and the desired level of accuracy.
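As a rough illustration of how a resistive probe reading might be turned into a moisture level, the sketch below maps a raw analog reading onto a 0 to 100% scale by linear interpolation between two calibration points. The 10-bit reading range and the calibration constants `DRY_READING` and `WET_READING` are hypothetical values chosen for the example, not figures from this chapter; a real probe would need to be calibrated in the target soil.

```python
# Hypothetical calibration points for a resistive probe read through a 10-bit ADC.
DRY_READING = 1023   # assumed raw value in completely dry soil (high resistance)
WET_READING = 300    # assumed raw value in saturated soil (low resistance)


def moisture_percent(raw: float, dry: float = DRY_READING, wet: float = WET_READING) -> float:
    """Map a raw ADC reading to a moisture percentage by linear interpolation."""
    if dry == wet:
        raise ValueError("calibration points must differ")
    percent = (dry - raw) / (dry - wet) * 100.0
    # Clamp the result, since readings outside the calibration range are possible.
    return max(0.0, min(100.0, percent))
```

A linear mapping is the simplest choice; the actual response of a given probe may well be non-linear, in which case a lookup table fitted to measured samples would be more appropriate.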


Fig. 2 Soil moisture sensor

(b) Microcontroller Chip:

A microcontroller chip, also known as a microcontroller unit (MCU), is a small computer on a single integrated circuit (IC). It contains one or more processor cores (CPUs) along with memory and programmable input/output (I/O) peripherals. Microcontrollers are commonly used in embedded systems to control devices or processes. They are designed for low-power, low-cost use, which makes them suitable for a wide range of applications, including automotive, medical, industrial, and consumer electronics. They typically have limited memory and processing power compared with general-purpose computers, but they are optimized for real-time processing and control tasks. Many different microcontroller architectures are available, the most popular being 8-bit, 16-bit, and 32-bit designs. Common microcontroller manufacturers include Atmel, Microchip, Texas Instruments, and STMicroelectronics.

Arduino Uno (shown in Fig. 3) is an open-source microcontroller board based on the Atmel AVR microcontroller. It is a popular platform for creating electronic projects and prototypes due to its ease of use and the wide availability of compatible sensors, actuators, and other components. The board has digital input/output pins and analog inputs, and can be programmed using the Arduino software. It is often used in applications such as robotics, home automation, and Internet of Things (IoT) projects.

Fig. 3 Arduino Uno

(c) Relay: A relay (shown in Fig. 4) is an electromagnetically operated switch. The state of the switch indicates whether the device is on or off.

(d) Actuators: An actuator (shown in Fig. 5) is a component of a machine that is responsible for moving and controlling a mechanism.

Fig. 4 Relay

Fig. 5 Actuator

5 Work Flow of System Model

The proposed workflow (shown in Fig. 6) is divided into two parts:

• Field quality analysis
• Irrigation system

Fig. 6 Proposed workflow methodology of system model

5.1 Field Quality Analysis

Figure 7 shows the field quality analysis. The system starts and a crop is selected. To assess the selected crop, the actual sensed data is compared with the ideal data stored in the database. If the ideal value equals the actual value, a success alert is received; otherwise, a quality mismatch alert is generated and the system stops.
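The comparison step can be sketched as follows. This is only an illustration under stated assumptions: the `FieldParams` record, its three fields, the alert strings, and the `tolerance` parameter are hypothetical names introduced for the example, and the chapter does not specify how the ideal values are stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldParams:
    """Hypothetical record of the monitored field parameters."""
    soil_moisture: float   # percent
    temperature: float     # degrees Celsius
    humidity: float        # percent relative humidity


def quality_check(actual: FieldParams, ideal: FieldParams, tolerance: float = 0.0) -> str:
    """Compare actual readings with the ideal values and return an alert string."""
    mismatched = [
        name
        for name in ("soil_moisture", "temperature", "humidity")
        if abs(getattr(actual, name) - getattr(ideal, name)) > tolerance
    ]
    if not mismatched:
        return "SUCCESS"
    return "QUALITY MISMATCH: " + ", ".join(mismatched)
```

With `tolerance=0.0` the check reproduces the strict equality described in the text. Because sensor readings are noisy, a small non-zero tolerance is likely to be more practical, although that is a design suggestion rather than something stated in the chapter.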

Fig. 7 Proposed flow chart for field quality analysis


Fig. 8 Proposed flow chart for irrigation system

5.2 Irrigation System

In the second part, the field is irrigated (shown in Fig. 8). The system starts and the sensor value is fetched from the database. This value is then checked against the irrigation requirement: if irrigation is required, the motor is switched on. Otherwise, the system checks whether the motor is on or off. If it is on, the motor is switched off; if it is already off, a delay is introduced before the check is repeated.
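One pass through this flow chart can be written as a small pure function. The sketch below assumes that "irrigation is required" means the moisture reading is below a threshold; the chapter does not give the criterion, so both the `threshold` parameter and the action labels are hypothetical.

```python
from typing import Tuple


def irrigation_step(moisture: float, threshold: float, motor_on: bool) -> Tuple[bool, str]:
    """Run one pass of the irrigation flow chart.

    Returns the new motor state and a label for the action taken.
    """
    if moisture < threshold:      # assumed criterion for "irrigation required"
        return True, "motor on"
    if motor_on:                  # enough water, but the motor is still running
        return False, "motor off"
    return False, "wait"          # already off: delay, then recheck
```

Keeping the decision separate from the hardware calls makes the logic easy to test; the caller would drive the relay according to the returned state and sleep when the action is `"wait"`.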

5.3 System Design

The system model (shown in Fig. 9) is designed so that the energy generated by the solar panel is stored in a rechargeable battery, which can be used with solar energy and with extra battery power. The Arduino Uno is powered by the rechargeable battery; the data sensed by the sensors is stored in the database, uploaded to the cloud, and accessed by the user via the mobile application. The system design comprises the following parts:

• Battery: A device with two electrochemical cells that supplies power to other devices. Here, it is a rechargeable battery that can store power and be used to charge other devices.
• Solar panel: A device that takes sunlight as input and generates electricity, converting light energy into electrical energy. A 6 V solar panel is used to charge a 3.4 V battery.
• NodeMCU 1.0: An open-source platform comprising both software and hardware, designed for ease of use. Here, it reads inputs from the sensors and drives outputs such as turning on an LED or activating a motor.
• Sensors: The sensor is the key part of the project. It senses the data, which is then uploaded to the cloud so that it can be accessed from anywhere via any device: a mobile phone, a laptop, or any machine with an internet connection. Sensors can also be grouped in order to monitor a device thoroughly.

Fig. 9 System design of proposed system model
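To make the data path from board to cloud to mobile application more concrete, the sketch below shows one way a single reading could be serialised and restored. The chapter does not name the cloud service or its schema, so the `Reading` record and its field names are hypothetical; only the 1/0 motor convention comes from the text.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Reading:
    """Hypothetical record for one set of sensed values."""
    soil_moisture: float  # percent
    temperature: float    # degrees Celsius
    humidity: float       # percent relative humidity
    motor: int            # 1 = motor on, 0 = motor off


def to_payload(reading: Reading) -> str:
    """Serialise one reading as a JSON string ready to be sent to a cloud database."""
    return json.dumps(asdict(reading), sort_keys=True)


def from_payload(payload: str) -> Reading:
    """Rebuild a reading on the client side, for example in the mobile application."""
    return Reading(**json.loads(payload))
```

The transport itself (HTTP request, database client, or similar) is deliberately left out, since it depends on the cloud platform actually used.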


5.4 Irrigation System

• This is the second section of the system, in which the water level of the field is determined using the soil moisture sensor.
• Irrigation is carried out according to the amount of water required by the crop fields, and the motor is turned on or off accordingly.
• The data is collected by the sensor, which reports the actual moisture value of the soil; this value in turn operates the motor.
• Because the irrigation system is automatic, water is not used unnecessarily. The system helps provide the required amount of water to the fields at the correct time.

6 Hardware Setup for Proposed System Model

The hardware setup (shown in Fig. 10) involves a NodeMCU, a DHT11 sensor, a soil moisture sensor, a relay, and a water motor. The sensors are connected to the board; the lower terminal of the relay is connected to the board, and its NO and COM terminals are connected to the motor wire. The board is powered by a USB cable and the motor is powered from the electricity supply. Two soil samples, (i) dry soil and (ii) wet soil, are used to measure soil moisture with the soil moisture sensor.

Fig. 10 Hardware setup for proposed system model


Fig. 11 Home screen of mobile application

7 Android Mobile Application for Monitoring the Work Flow

The sensed data is shown in the mobile application (shown in Fig. 11). All values relating to the field can easily be monitored via the application. The user can also monitor whether the motor is on or off from the app: if the value shown is 1, the motor is on, and if the value shown is 0, the motor is off. The home page presents information about soil moisture, temperature, humidity, and the irrigation system. The major advantages of this Android mobile application are:

• The sensed data is displayed.
• The data is refreshed every 10 s.
• It gives the user real-time information.
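The refresh behaviour described above can be sketched as a simple polling loop. The application itself is an Android app; Python is used here only to illustrate the logic. The `fetch` callable stands in for whatever call retrieves the latest values from the cloud, and `motor_label`, `poll`, and the label strings are hypothetical names introduced for the example.

```python
import time
from typing import Callable, Dict, Iterator

REFRESH_SECONDS = 10  # refresh interval stated in the text


def motor_label(value: int) -> str:
    """Translate the stored motor flag into the text shown to the user."""
    if value == 1:
        return "Motor ON"
    if value == 0:
        return "Motor OFF"
    raise ValueError(f"unexpected motor flag: {value!r}")


def poll(fetch: Callable[[], Dict[str, float]],
         cycles: int,
         sleep: Callable[[float], None] = time.sleep) -> Iterator[Dict[str, float]]:
    """Yield `cycles` fresh readings, pausing REFRESH_SECONDS between fetches."""
    for i in range(cycles):
        yield fetch()
        if i < cycles - 1:
            sleep(REFRESH_SECONDS)
```

Passing `sleep` as a parameter lets the loop be exercised without actually waiting, which is convenient when checking the refresh logic in isolation.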

8 Getting Alerts for Motor On/Off via Mobile Application

The user can receive alerts about the operation of the motor via the Android application. This feature of the app can help reduce water wastage.


Motor off: This is the condition in which the irrigation system column holds the value ‘0’. It shows that the motor is off, giving the user real-time information about the water supply.

Motor on: This is the condition in which the irrigation system column holds the value ‘1’. It shows that the motor is on, giving the user real-time information about the water supply.

Crop Data: Crop data (shown in Fig. 12) consists of the data of crops that is taken as sample input to the system. The user can go through this data for reference.

Fig. 12 Crop data

Proposed Algorithms Used for Data Analysis

Algorithm for Sensing Soil Moisture

Step 1: Start
Step 2: Initialise variables soil_moisture, output_value.
Step 3: Set soil_sensor pin to output.
Step 4: Set baud rate to 9600.
Step 5: Provide delay of 2000 ms.
Step 6: Read the output values.
Step 7: Print the moisture value.
Step 8: Provide delay of 1000 ms.
Step 9: Go to step 6.
Step 10: End

Algorithm for Sensing Temperature and Humidity

Step 1: Start
Step 2: Initialise variables dht, humidity, temperature.
Step 3: Set baud rate to 9600.
Step 4: Set output pin to D5.
Step 5: Set Low to D5.
Step 6: Provide delay of 2000 ms.
Step 7: Read the humidity and temperature values.
Step 8: Print the humidity and temperature values.
Step 9: Convert the temperature to Fahrenheit.
Step 10: Provide delay of 1000 ms.
Step 11: Go to step 7.
Step 12: End
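The read, print, convert, and delay cycle of the temperature and humidity algorithm can be sketched in a hardware-independent way as below. This is a simulation of the control flow only, not the firmware used in the chapter: `read` is a hypothetical stand-in for the sensor driver, `print` stands in for serial output, and the `iterations` bound is added so that the otherwise endless loop terminates.

```python
import time
from typing import Callable, List, Tuple


def celsius_to_fahrenheit(celsius: float) -> float:
    """Step 9: convert a Celsius temperature to Fahrenheit."""
    return celsius * 9.0 / 5.0 + 32.0


def sense_loop(read: Callable[[], Tuple[float, float]],
               iterations: int,
               sleep: Callable[[float], None] = time.sleep) -> List[Tuple[float, float, float]]:
    """Follow Steps 6 to 11 for a bounded number of iterations.

    `read` returns (humidity, celsius). The result lists, per iteration,
    (humidity, celsius, fahrenheit).
    """
    results = []
    sleep(2.0)                                        # Step 6: initial 2000 ms delay
    for _ in range(iterations):
        humidity, celsius = read()                    # Step 7: read the sensor
        print(f"Humidity: {humidity:.1f} %  Temperature: {celsius:.1f} C")  # Step 8
        fahrenheit = celsius_to_fahrenheit(celsius)   # Step 9
        results.append((humidity, celsius, fahrenheit))
        sleep(1.0)                                    # Step 10: 1000 ms delay
    return results                                    # Step 11 is the loop itself
```

The soil moisture algorithm has the same shape, with a single analog reading in place of the humidity and temperature pair and no unit conversion.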

9 Conclusion

The main focus of this work is field monitoring and automation in agriculture using IoT, to give the user a good picture of the crop field and good control over the irrigation system. The system monitors field parameters such as soil moisture, temperature, and humidity, which lets the user carry out a quality check of the crop fields. It also controls irrigation. Earlier systems worked either as an automatic irrigation system or as a field monitoring system. They were battery operated, were not affordable, and stopped working during power cuts. Solar energy monitoring systems were also developed to analyse renewable energy usage, but they only monitored solar energy usage and provided analysis for further expansion. In contrast, the “Field Monitoring and Automatic Irrigation System using IoT” serves as both an automatic irrigation system and a field monitoring system, and it can operate in either power supply mode or solar power mode. Using solar energy makes the system affordable, and it solves the problem of the regular power cuts that are common in villages.

The system helps in monitoring the crop field parameters that determine the quality of the field. Once the parameter values are known, the user can work on the aspects in which the field is lacking, so as to make the crop field suitable for growth. The fields are monitored on a regular basis. Along with monitoring, an automatic irrigation system helps provide an appropriate amount of water to the fields. The motor is switched on and off automatically according to the field conditions: when there is enough water in the fields the motor is switched off, and otherwise it is on. Automatic irrigation reduces unnecessary use of water and supports farmers in irrigating their crop fields well.

The Field Monitoring and Automatic Irrigation System using IoT is easily accessible via a mobile application and is easy to use. The application displays all the data sensed by the sensors. The major advantage of a mobile app is that the user can access the field information and data from anywhere, using only an application downloaded on a phone. The mobile app also includes crop data that helps the user learn about the crops. This information helps in achieving more cultivable and productive fields. SMS alerts are sent to the user in case the crop field quality is mismatched, and notification alerts give the user real-time information about the crop fields.

References

1. Vineela, T., NagaHarini, J., Kiranmai, C., Harshitha, G., AdiLakshmi, B.: IoT based agriculture monitoring and smart irrigation system using Raspberry Pi. Int. Res. J. Eng. Technol. (IRJET) 05(01), (2018)
2. Suma, N., Sandra, S.R., Saranya, S., Shanmugapriya, G., Subhashri, R.: IOT based smart agriculture monitoring system. Int. J. Recent Innov. Trends Comput. Commun. 5(02), (2018)