Data Analytics and Learning: Proceedings of DAL 2022 (Lecture Notes in Networks and Systems, 779) 9819963451, 9789819963454

This book presents new theories and working models in the areas of data analytics and learning. The papers included in t


Table of contents:
Preface
Contents
Editors and Contributors
Two-Stage Word Spotting Scheme for Historical Handwritten Devanagari Documents
1 Introduction
2 Related Works
3 The Proposed Methodology
3.1 Preprocessing and Word Segmentation
3.2 Feature Extraction
3.3 Word Spotting Using Combined Hybrid Approach
4 Experiments and Results
5 Conclusion
References
3D Object Detection in Point Cloud Using Key Point Detection Network
1 Introduction
2 Related Works
3 Proposed Method
4 Experiment
5 Results
6 Conclusions
References
IRIS and Face-Based Multimodal Biometrics Systems
1 Introduction
2 Literature Survey
2.1 Iris Biometrics Systems
2.2 Face Biometrics Systems
2.3 Face-Iris Multimodal Biometrics Systems
3 Methodology
3.1 Iris Biometrics System
3.2 Face Biometrics System
3.3 Multimodal Biometrics System Using Face and Iris
4 Experimental Results and Analysis
5 Conclusion
References
A Survey on the Detection of Diseases in Plants Using the Computer Vision-Based Model
1 Introduction
2 Key Issues and Challenges in the Field of Disease Analysis
3 Literature Survey
4 Conclusion
References
An Approach to Conserve Wildlife Habitat by Predicting Forest Fire Using Machine Learning Technique
1 Introduction
2 Literature Survey
2.1 Supervised and Ensemble Machine Learning Algorithm
2.2 Data Mining Techniques to Predict Forest Fires
2.3 Data Mining Techniques to Forecast Smoulder Surface of Forest Fires
2.4 Parallel SVM Based on MapReduce
2.5 Prediction of Forest Fire Danger Using ANN and Logistic Regression
3 Implementation
3.1 SVM Algorithm
3.2 Decision Tree
3.3 Random Forest
4 Results
5 Conclusion
References
A Method to Detect Phishing Websites Using Distinctive URL Characteristics by Employing Machine Learning Technique
1 Introduction
2 Literature Survey
3 Existing System
4 Proposed System
5 Results
6 Conclusion
References
Aquaculture Monitoring System: A Prescriptive Model
1 Introduction
2 Related Work
3 Methodology
3.1 Experimental Setup and Data Collection
3.2 Prediction of Water Quality Parameters
3.3 Fish Detection
3.4 Depth Estimation
3.5 Size Estimation
4 Results and Discussion
4.1 Water Quality Parameters
4.2 Fish Length Estimation
4.3 Web-Based Application
5 Conclusion
References
Machine Learning-Based Pattern Recognition Models for Image Recognition and Classification
1 Introduction
2 Literature Review
3 Pattern Recognition Models
4 Methodology
5 Performance Measuring
6 Conclusion
References
A Review of Silk Farming Automation Using Artificial Intelligence, Machine Learning, and Cloud-Based Solutions
1 Introduction
2 Research Methodology
3 Results and Discussion
3.1 Egg Counting
3.2 Cocoons Deformity Detection
3.3 Sex Determinations
3.4 Smart Monitoring
3.5 Pebrine Management
4 Conclusion
5 Future Works
References
Comparative Analysis of Generic Outlier Detection Techniques
1 Introduction
2 Techniques Used
2.1 KNN—K-Nearest Neighbour Algorithm
2.2 Isolation Forest Algorithm IFOR
2.3 Gaussian Mixture Model GMM
3 Methodology
3.1 KNN—K-Nearest Neighbour Algorithm
3.2 Isolation Forest Algorithm IFOR
3.3 Gaussian Mixture Model GMM
4 Conclusions
References
Bilingual Visual Script Proof Based on Pre-trained Clustering and Neural Network
1 Introduction
1.1 Scope and Purpose
1.2 Process Overview
2 Description of Stages
2.1 Pre-processing
2.2 Image Segmentation
2.3 Feature Extraction
2.4 Classification and Recognition
3 Analysis and Results
4 Conclusion
5 Future Scope
References
Nitrogen Deficiency and Yield Estimation in Paddy Field
1 Introduction
2 Literature Survey
3 Methodology
3.1 Pre-processing
3.2 Hierarchical Color Segmentation for Region Selection
3.3 Feature Extraction
3.4 SVM Classifier
3.5 Yield Estimation
4 Experimental Result
4.1 Database
4.2 Experimental Setup
5 Conclusion
References
Medical Image Compression Using Huffman Coding for Tiff Images
1 Introduction
2 Materials and Methods
3 Results and Discussion
4 Conclusion
References
Efficient Wavelet Based Denoising Technique Combined with Features of Cyclespinning and BM3D for Grayscale and Color Images
1 Introduction
2 Denoising Methods
2.1 Wavelet Based Denoising
2.2 BM3D Filtering
3 Proposed Method
3.1 Grayscale Image Denoising
3.2 Color Image Denoising
4 Results
4.1 Performance Metrics
4.2 Experimental Results
5 Conclusion
References
PBRAMEC: Prioritized Buffer Based Resource Allocation for Mobile Edge Computing Devices
1 Introduction
2 Related Works
2.1 The Internet of Things (IoT)
2.2 The Deep Deterministic Policy Gradient Algorithm
2.3 Resource Allocation
2.4 Reinforcement Learning
2.5 Artificial Intelligence
2.6 Edge Computing
2.7 Problem Definition and Challenges
2.8 Motivation
2.9 Existing Approaches
3 System Design
3.1 Architecture
3.2 PBRAMEC Algorithm
4 Experimental Results and Analysis
5 Conclusion and Future Work
References
Surface Water Quality Analysis Using IoT
1 Introduction
1.1 General Overview
1.2 Introduction to Proposed System
2 Related Works
2.1 Works Related to IoT and Sensors
2.2 Works Related to Machine Learning and IoT
3 System Design
3.1 Components
3.2 Connections and Setup
4 Experimental Design and Methodology
4.1 Methodology
4.2 Algorithm
5 Results and Discussion
6 Conclusion
References
Children Facial Growth Pattern Analysis Using Deep Convolutional Neural Networks
1 Introduction
2 Proposed Model
2.1 Face Alignment and Pre-processing
2.2 Convolutional Neural Network (DCNN)-Based Feature Extraction
2.3 Distance Measure
3 Experimental Results and Analysis
3.1 Longitudinal Face Image Dataset
3.2 Experimental Setup
3.3 Evaluation
4 Conclusion
References
Classification of Forged Logo Images
1 Introduction
1.1 Related Works
2 Proposed Model
2.1 Binary Classification Model
2.2 Limitations of Binary Classification
2.3 A Hierarchical Approach Using Multi Classification Model
2.4 Dataset
2.5 Experimentation
2.6 Experimentation Analysis
3 Comparative Analysis
4 Conclusion
References
Detection, Classification and Counting of Moving Vehicles from Videos
1 Introduction
2 Related Work
3 Proposed Methodology
3.1 Pre-processing
3.2 Vehicle Detection
3.3 Feature Extraction
3.4 Classification
3.5 Vehicle Counting
4 Experimentation
4.1 Datasets
4.2 Results
5 Conclusion
References
Face Recognition Using Sketch Images
1 Introduction
2 Related Work
3 Proposed Methodology
3.1 Pre-processing
3.2 Feature Extraction and Classification
4 Experimentation
4.1 Datasets
4.2 Results
5 Conclusion
References


Lecture Notes in Networks and Systems 779

D. S. Guru N. Vinay Kumar Mohammed Javed   Editors

Data Analytics and Learning Proceedings of DAL 2022

Lecture Notes in Networks and Systems Volume 779

Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

Advisory Editors
Fernando Gomide, Department of Computer Engineering and Automation—DCA, School of Electrical and Computer Engineering—FEEC, University of Campinas—UNICAMP, São Paulo, Brazil
Okyay Kaynak, Department of Electrical and Electronic Engineering, Bogazici University, Istanbul, Türkiye
Derong Liu, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, USA; Institute of Automation, Chinese Academy of Sciences, Beijing, China
Witold Pedrycz, Department of Electrical and Computer Engineering, University of Alberta, Alberta, Canada; Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Marios M. Polycarpou, Department of Electrical and Computer Engineering, KIOS Research Center for Intelligent Systems and Networks, University of Cyprus, Nicosia, Cyprus
Imre J. Rudas, Óbuda University, Budapest, Hungary
Jun Wang, Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong

The series “Lecture Notes in Networks and Systems” publishes the latest developments in Networks and Systems—quickly, informally and with high quality. Original research reported in proceedings and post-proceedings represents the core of LNNS. Volumes published in LNNS embrace all aspects and subfields of, as well as new challenges in, Networks and Systems. The series contains proceedings and edited volumes in systems and networks, spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor Networks, Control Systems, Energy Systems, Automotive Systems, Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems, Robotics, Social Systems, Economic Systems and other. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution and exposure which enable both a wide and rapid dissemination of research output. The series covers the theory, applications, and perspectives on the state of the art and future developments relevant to systems and networks, decision making, control, complex processes and related areas, as embedded in the fields of interdisciplinary and applied sciences, engineering, computer science, physics, economics, social, and life sciences, as well as the paradigms and methodologies behind them. Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science. For proposals from Asia please contact Aninda Bose ([email protected]).

D. S. Guru · N. Vinay Kumar · Mohammed Javed Editors

Data Analytics and Learning Proceedings of DAL 2022

Editors D. S. Guru Department of Studies in Computer Science University of Mysore Mysore, India

N. Vinay Kumar NTT Data Services Bengaluru, Karnataka, India

Mohammed Javed IIIT Allahabad Allahabad, India

ISSN 2367-3370 ISSN 2367-3389 (electronic) Lecture Notes in Networks and Systems ISBN 978-981-99-6345-4 ISBN 978-981-99-6346-1 (eBook) https://doi.org/10.1007/978-981-99-6346-1 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore Paper in this product is recyclable.

Preface

It is with deep satisfaction that we write this message for the proceedings of the "Second International Conference on Data Analytics and Machine Learning 2022 (DAL'22)," held on December 30 and 31, 2022, at Moodbidri, Karnataka, India, with the central theme "Data Analytics, Learning and its Applications." Our research experience in related areas over the last decade inspired us to conduct DAL 2022. This conference was planned to provide a platform where researchers from both academia and industry can discuss and exchange their research ideas and build better future research plans, particularly in the fields of data analytics and machine learning. Soon after we announced the call for original research papers, there was a tremendous response from researchers. There were 90 papers submitted, out of which we could accommodate only 22 based on the reports of the reviewers. Each paper was blindly reviewed by at least two experts from the related areas. The overall acceptance rate is about 25%. The conference is aimed at Data Analytics, Machine Learning, and Computer Vision, and for all these areas we received a number of papers reflecting their right combinations. We hope that the readers will appreciate and enjoy the papers published in the proceedings.

We could make this conference a successful one, even though it was launched at relatively short notice, because of the good response from the research community and the effort put in by the reviewers to support us with timely reviews. The authors of all the papers submitted deserve our acknowledgment. The proceedings are published and indexed by Springer-LNNS, which is known for bringing out this type of proceedings; special thanks to them. We would also like to acknowledge the help of Microsoft CMT in the submission, review, and proceedings creation processes. We are very pleased to express our sincere thanks to Springer, especially Shalini Selvam, Aninda Bose, and the editorial staff, for their support in publishing the proceedings of DAL 2022.

Mysore, India

D. S. Guru General Chair, DAL’22

Bengaluru, India

N. Vinay Kumar Program Chair, DAL’22

Allahabad, India

Mohammed Javed Program Chair, DAL’22

Contents

Two-Stage Word Spotting Scheme for Historical Handwritten Devanagari Documents . . . 1
S. N. Sushma and B. Sharada

3D Object Detection in Point Cloud Using Key Point Detection Network . . . 19
Pankaj Kumar Saini, Md Meraz, and Mohammed Javed

IRIS and Face-Based Multimodal Biometrics Systems . . . 31
Vaishnavi V. Kulkarni, Sanjeevakumar M. Hatture, Rashmi P. Karchi, Rashmi Saini, Shantala S. Hiremath, and Mrutyunjaya S. Hiremath

A Survey on the Detection of Diseases in Plants Using the Computer Vision-Based Model . . . 49
Sowbhagya Takappa Pujeri and M. T. Somashekara

An Approach to Conserve Wildlife Habitat by Predicting Forest Fire Using Machine Learning Technique . . . 57
N. Bhavatarini, S. Santhosh, N. Balaji, and Deepa Kumari

A Method to Detect Phishing Websites Using Distinctive URL Characteristics by Employing Machine Learning Technique . . . 67
Deepa Kumari, N. Bhavatarini, N. Balaji, and Prashanth Kumar

Aquaculture Monitoring System: A Prescriptive Model . . . 77
Pushkar Bhat, M. D. Vasanth Pai, S. Shreesha, M. M. Manohara Pai, and Radhika M. Pai

Machine Learning-Based Pattern Recognition Models for Image Recognition and Classification . . . 89
G. R Madhuri, Basavaraj N Jagadale, N. Salma, G. M. Akshata, Ajaykumar Gupta, and T. S. Chandrakantha

A Review of Silk Farming Automation Using Artificial Intelligence, Machine Learning, and Cloud-Based Solutions . . . 101
Chandrakala G. Raju, Somdyuti Sarkar, Varun Canamedi, J. Parameshwaranaik, and Sukhabrata Sarkar

Comparative Analysis of Generic Outlier Detection Techniques . . . 117
Kini T. Vasudev, M. M. Manohara Pai, and Radhika M. Pai

Bilingual Visual Script Proof Based on Pre-trained Clustering and Neural Network . . . 127
Sufola Das Chagas Silva E Araujo, V. S. Malemath, Uttam U. Deshpande, and Gaurang Patkar

Nitrogen Deficiency and Yield Estimation in Paddy Field . . . 137
Sharanamma M. Hugar and Mohammed Abdul Waheed

Medical Image Compression Using Huffman Coding for Tiff Images . . . 151
Aziz Makandar and Rekha Biradar

Efficient Wavelet Based Denoising Technique Combined with Features of Cyclespinning and BM3D for Grayscale and Color Images . . . 163
Aziz Makandar and Shilpa Kaman

PBRAMEC: Prioritized Buffer Based Resource Allocation for Mobile Edge Computing Devices . . . 177
Shabareesh Hegde, M. T. Shravan, and Adwitiya Mukhopadhyay

Surface Water Quality Analysis Using IoT . . . 189
R. Nimishamba, Madhu R. Seervi, and Adwitiya Mukhopadhyay

Children Facial Growth Pattern Analysis Using Deep Convolutional Neural Networks . . . 201
R. Sumithra and D. S. Guru

Classification of Forged Logo Images . . . 215
C. G. Kruthika, N. Vinay Kumar, J. Divyashree, and D. S. Guru

Detection, Classification and Counting of Moving Vehicles from Videos . . . 231
Alfina Sunny and N. Manohar

Face Recognition Using Sketch Images . . . 243
T. Darshana, N. Manohar, and M. Chandrajith

Editors and Contributors

About the Editors

D. S. Guru received his B.Sc., M.Sc., and Ph.D. degrees in Computer Science and Technology from the University of Mysore, Mysore, India, in 1991, 1993, and 2000, respectively. He is currently a professor in the Department of Studies in Computer Science, University of Mysore, India. He was a fellow of BOYSCAST and a visiting research scientist at Michigan State University. He has authored 80+ journal papers and 270+ peer-reviewed conference papers at international and national levels. His research interests cover image retrieval, object recognition, shape analysis, sign language recognition, biometrics, and symbolic data analysis.

N. Vinay Kumar received his B.Sc., M.S., and Ph.D. degrees in Computer Science and Technology from the University of Mysore, Mysore, India, in 2009, 2012, and 2019, respectively. He is currently a senior data scientist at NTT Data, Bangalore, India. He was a DST-INSPIRE fellow and a DST-SERB awardee for the year 2017. He has authored 25+ journal and peer-reviewed conference papers at international levels. His research interests cover machine learning, computer vision, data analytics, object recognition, biometrics, and symbolic data analysis.

Mohammed Javed has been working as an assistant professor in the Department of IT, IIIT Allahabad, since 2018, and has 7 years of research experience in the areas of image processing, pattern recognition, and compression of images and videos. He obtained his Ph.D. degree from the University of Mysore. He has published more than 50 research papers. He is the recipient of an SERB-DST young scientist travel grant for DCC2020, Utah, USA. He has also received Grant-in-Aid from SERB-DST, CSIR, DRDO, and ISRO several times for conducting different international events.


Contributors G. M. Akshata Department of PG Studies and Research in Electronics, Kuvempu University, Shimoga District, Karnataka, India N. Balaji NMAM Institute of Technology, Nitte (Deemed to be University), Udupi, India Pushkar Bhat Department of Information and Communication Technology, Manipal Institute of Technology, Manipal, India N. Bhavatarini School of Computer Science and Engineering, REVA University, Bengaluru, India Rekha Biradar Department of Computer Science KSAWU, Vijayapur, India Varun Canamedi Department of Medical Electronics Engineering, B.M.S. College of Engineering, Bengaluru, India M. Chandrajith Department of Computer Applications, Maharaja Institute of Technology, Mysore, India T. S. Chandrakantha Department of PG Studies and Research in Electronics, Kuvempu University, Shimoga District, Karnataka, India T. Darshana Department of Computer Science, School of Computing, Amrita Vishwa Vidyapeetham, Mysore, India Sufola Das Chagas Silva E Araujo PCCE, Goa University, Verna, India Uttam U. Deshpande VTU University, KLE, Belagavi, India J. Divyashree Dr. B R Ambedakar PG Centre, Suvarnagangotri, Chamarajanagara, India Ajaykumar Gupta Department of PG Studies and Research in Electronics, Kuvempu University, Shimoga District, Karnataka, India D. S. Guru Department of Studies in Computer Science, University of Mysore, Mysore, Karnataka, India Sanjeevakumar M. Hatture Nagarjuna College of Engineering, Bengaluru, Karnataka, India Shabareesh Hegde Department of Computer Sciences Amrita School of Computing, Amrita Vishwa, Vidyapeetham, India Mrutyunjaya S. Hiremath eMath Technology, Pvt. Ltd., Bangalore, Karnataka, India Shantala S. Hiremath Sony India Software Centre, Pvt Ltd., Bangalore, Karnataka, India


Sharanamma M. Hugar Department of CSE, Sharnbasva University, Kalaburagi, India Basavaraj N Jagadale Department of PG Studies and Research in Electronics, Kuvempu University, Shimoga District, Karnataka, India Mohammed Javed Department of Information Technology, Indian Institute of Information Technology Allahabad, Prayagraj, India Shilpa Kaman Department of Computer Science, KSAWU Vijayapura, Vijayapura, India Rashmi P. Karchi Nagarjuna College of Engineering, Bengaluru, Karnataka, India C. G. Kruthika Nitte Meenakshi Institute of Technology, Bangalore, India Vaishnavi V. Kulkarni Alva’s Institute of Engineering and Technology, Moodbidri, Karnataka, India Prashanth Kumar Alvas Institute of Engineering and Technology, Moodabidri, India Deepa Kumari NMAM Institute of Technology, Nitte (Deemed to be University), Udupi, India G. R Madhuri Department of PG Studies and Research in Electronics, Kuvempu University, Shimoga District, Karnataka, India Aziz Makandar Department of Computer Science, KSAWU Vijayapura, Vijayapura, India V. S. Malemath VTU University, KLE, Belagavi, India N. Manohar Department of Computer Science, School of Computing, Amrita Vishwa Vidyapeetham, Mysore, India M. M. Manohara Pai Department of Information and Communication Technology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, India Md Meraz Department of Information Technology, Indian Institute of Information Technology Allahabad, Prayagraj, India Adwitiya Mukhopadhyay Department of Computer Sciences Amrita School of Computing, Amrita Vishwa, Vidyapeetham, India; Department of Computer Science, Amrita School of Computing, Mysuru Campus, Mysuru, India R. Nimishamba Department of Computer Science, Amrita School of Computing, Mysuru Campus, Mysuru, India Radhika M. Pai Department of Data Science and Computer Application, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, India;


Department of Information and Communication Technology, Manipal Institute of Technology, Manipal, India J. Parameshwaranaik Central Sericulture Research and Training Institute, Ministry of Textiles, Government of India, Berhampur, India Gaurang Patkar Don Bosco College of Engineering, Margao, Goa, India Sowbhagya Takappa Pujeri Department of Computer Science and Applications, Bangalore University, Bengaluru, India Chandrakala G. Raju Department of Information Science and Engineering, B.M.S. College of Engineering, Bengaluru, India Pankaj Kumar Saini Department of Information Technology, Indian Institute of Information Technology Allahabad, Prayagraj, India Rashmi Saini G. B. Pant Institute of Engineering and Technology, Pauri Garhwal, Uttarakhand, India N. Salma Department of PG Studies and Research in Electronics, Kuvempu University, Shimoga District, Karnataka, India S. Santhosh NMAM Institute of Technology, Nitte (Deemed to be University), Udupi, India Somdyuti Sarkar Department of Information Science and Engineering, B.M.S. College of Engineering, Bengaluru, India Sukhabrata Sarkar Central Sericulture Research and Training Institute, Ministry of Textiles, Government of India, Berhampur, India Madhu R. Seervi Department of Computer Science, Amrita School of Computing, Mysuru Campus, Mysuru, India B. Sharada Department of Studies in Computer Science, University of Mysore, Mysuru, India M. T. Shravan Department of Computer Sciences Amrita School of Computing, Amrita Vishwa, Vidyapeetham, India S. Shreesha Department of Information and Communication Technology, Manipal Institute of Technology, Manipal, India M. T. Somashekara Department of Computer Science and Applications, Bangalore University, Bengaluru, India R. Sumithra Department of Studies in Computer Science, University of Mysore, Mysore, Karnataka, India Alfina Sunny Department of Computer Science, Amrita School of Arts and Sciences, Mysore, India


S. N. Sushma Department of Studies in Computer Science, University of Mysore, Mysuru, India M. D. Vasanth Pai Department of Information and Communication Technology, Manipal Institute of Technology, Manipal, India Kini T. Vasudev Department of CSE, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, India N. Vinay Kumar NTT Data Services, Bangalore, India Mohammed Abdul Waheed Department of CSE, VTU CPGS, RO, Kalaburagi, India

Two-Stage Word Spotting Scheme for Historical Handwritten Devanagari Documents S. N. Sushma and B. Sharada

S. N. Sushma (B) · B. Sharada
Department of Studies in Computer Science, University of Mysore, Mysuru, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
D. S. Guru et al. (eds.), Data Analytics and Learning, Lecture Notes in Networks and Systems 779, https://doi.org/10.1007/978-981-99-6346-1_1

1 Introduction

Enormous collections of historical manuscripts are present in various national archives, and they contain an abundance of valuable information. Retrieving this information is of great value to historians and scholars, and it poses a challenging task in the field of document image analysis [1, 2]. Hence, historical manuscripts need to be digitized. However, digitization alone is not adequate for researchers. A complete transcription of handwritten documents is very difficult due to the noise in historical documents, handwriting variations, multiple font styles, and the low quality of documents caused by poor physical safeguarding. OCR works accurately for printed documents, whereas accurate recognition of handwritten documents is very difficult [3]. Word spotting is an effective content-based retrieval method that extracts all occurrences of a given query word from a document, and it can be used instead of a complete transcription. Depending upon the representation of the input query word, word spotting methodology is classified into the Query-By-String (QBS) approach, where the input is given in the form of a text string, which resembles handwriting recognition methods that involve a large amount of training, and the Query-By-Example (QBE) approach, where the input is a word image, which resembles an image matching method and does not need a learning process [4].

Feature representation of words is a significant design decision in a word spotting system. Generally, low-level and high-level features are used for representing the word images of the documents. Entropy, profiles, Zernike moments, pixel density, number of pixels, convex hull-oriented features, and Histograms of Oriented Gradients (HOG) features are some of the low-level features used for word representation. Endpoints, concavities, diagonal as well as horizontal strokes, edges, and structural and gradient features are some of the intermediate-level features used for word representation. Ascenders, descenders, loops, word length, perceptual features such as edges, diagonal and horizontal strokes, loop count, concavity, character model zoning information, and directional element features are the high-level features used for word representation [5–9]. In order to extract features, word spotting systems are classified into segmentation-based methods and segmentation-free methods. In the segmentation-based approach [10–12], word-level segmentation of the document image is performed. In the segmentation-free approach, the document image is separated into patches and the query word is compared with all the patches present in the document.

From the literature, it is quite evident that most of the existing work in this field is limited to English, Latin and Chinese words. In the recent past, only limited work has been carried out on word spotting for Indic scripts. Plenty of digital collections of documents exist in non-oriental scripts, such as Indic scripts like Devanagari, which leads us to develop an efficient word spotting method for historical handwritten Devanagari documents. Devanagari is a basic script used for writing Indian languages like Hindi, Sanskrit, Marathi, Konkani, Nepali and Sindhi. The Devanagari script contains 13 vowels, 34 consonants and 14 modifiers. There are three zones in a Devanagari word: the lower zone, the middle zone and the upper zone. Words are written by writing the constituent characters from left to right, joined by the header line Shirorekha. The Devanagari script has gained national importance, as Hindi is the Indian national language. Therefore, considering the above-mentioned concerns, we present our work on a two-stage word spotting scheme for historical handwritten Devanagari documents. Testing and validation are performed on historical handwritten Devanagari documents of the Oriental Research Institute [ORI] @ Mysuru. In this paper, Sect. 2 gives a brief review of related works on word spotting. Section 3 presents a comprehensive explanation of the proposed approach. Section 4 presents the experiments and results. Lastly, the conclusions as well as future work are given in Sect. 5.

2 Related Works

Rath and Manmatha [1] presented the idea of segmenting a document into words and comparing these word images against the query word image to create likeness classes, for which the user can provide the ASCII equivalents. Can and Duygulu [13] presented a line segmentation-based word spotting method for historical manuscripts. Frinken et al. [14] used a BLSTM neural network for word spotting, followed by the CTC token passing algorithm. Picone [15] proposed a model-oriented methodology where each order is first mapped to a semi-continuous HMM (SCHMM), and an improved version of DTW is used for matching. An HMM-oriented spotting scheme was proposed by Frinken et al. [16], where character n-gram language models are used, and a bidirectional long short-term memory neural network methodology for word spotting was proposed by Frinken et al. [17]. A line-based segmentation for comparing words in historical manuscripts is proposed by Can and Duygulu [13], where word matching is based on matching of segmented lines. For online Chinese handwritten documents, Zhang et al. [18, 19] proposed word spotting performed with candidate scoring that depends on a semi-CRF model and a character confidence based method on the N-best list. De Stefano et al. [20] presented an innovative learning-free method for word spotting using graph representation. Srihari et al. [21] proposed a word spotting method for handwritten Arabic documents with three main elements: a word segmenter, a shape-based matcher for words and a search interface. Kumar et al. [22] proposed a word spotting method using a dynamic background model. Sudholt et al. [23] proposed a word spotting method which uses a descriptor learning pipeline. A pivot-based word search for spotting in archive manuscripts is proposed by Czuni et al. [23]. For handwritten document images, a probabilistic method for word retrieval is proposed by Cao et al. [24].

Numerous efforts have been made towards segmentation-free word spotting. Gatos et al. [25] proposed a segmentation-free approach for word spotting where the salient regions are identified, document image descriptors are allotted to block regions, and word matching is performed only on the regions of interest. Ryu et al. [8] proposed a segmentation-free word spotting approach where tags can be used for research on specific issues such as handling text in different colours. Ryu et al. [26] proposed a method of bag-of-features HMMs for segmentation-free word spotting in handwritten documents. Llados et al. [27] proposed a segmentation-free word spotting method for handwritten documents where a global filtering module is used to define several candidate zones and a refining filtering module then provides good retrieval results. Jayadevan [28] presented a combined approach for recognizing handwritten legal amounts written in Hindi and Marathi with better recognition accuracy. With this motivation, we have developed a two-stage word spotting scheme for historical handwritten Devanagari documents. The experimental results are promising when compared to conventional methods.

3 The Proposed Methodology

In the proposed methodology, a given query word image is compared with all the segmented handwritten word images of the source documents, and the related words are highlighted in the documents. Initially, a set of processes is performed on the handwritten document: preprocessing, word segmentation of the document images into candidate word images, and extraction of a unique feature set for word representation using two methods. Finally, the query word image is compared with all candidate word images and, based on the similarity matching, related word images are spotted in the handwritten documents. The overview of the proposed methodology is shown in Fig. 1.

Fig. 1 An overview of the combined approach of the word spotting system (input handwritten document, pre-processing, word segmentation, word database, feature extraction by the GSC method and by the BOL method, generation of rank lists for the query word, word selection by the hybrid method, and spotted words)

3.1 Preprocessing and Word Segmentation

Processing historical handwritten Devanagari documents is quite challenging, as these documents are usually degraded and contain a large amount of noise. The preprocessing steps applied to the input handwritten documents are contrast enhancement and binarization, skew detection and correction, noise reduction, and thinning. Initially, the document images are converted to grayscale images, and irregular illumination in the document images is regulated using the Local Contrast Normalization (LCN) technique. Skew is detected and corrected using the projection profile method. The document image is then filtered using the difference of Gaussians (DoG) filtering method, and noise present close to the boundaries of the document image is removed using morphological erosion with a disk as the structuring element. Finally, thinning is performed, followed by a size normalization process [25]. Word segmentation of the Devanagari document is performed using the connected component analysis technique [8, 26, 27]. A sample historical handwritten Devanagari manuscript image and the corresponding thinned document image are shown in Fig. 2.

Fig. 2 a Sample of historical handwritten Devanagari manuscript image and b thinned document image
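The preprocessing and segmentation stages can be prototyped with standard image-processing libraries. The following is only a minimal sketch using OpenCV and NumPy: it approximates local contrast normalization with CLAHE, omits skew correction, thinning and size normalization, and uses illustrative parameter values (kernel sizes, area threshold) that are not taken from this work.

```python
# Sketch of the preprocessing and word-segmentation pipeline (assumed parameters).
import cv2
import numpy as np

def preprocess(page_bgr):
    gray = cv2.cvtColor(page_bgr, cv2.COLOR_BGR2GRAY)
    # Local contrast normalization approximated here with CLAHE
    gray = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8)).apply(gray)
    # Difference-of-Gaussians filtering to suppress slowly varying background
    dog = cv2.GaussianBlur(gray, (3, 3), 0).astype(np.float32) - \
          cv2.GaussianBlur(gray, (9, 9), 0).astype(np.float32)
    dog = cv2.normalize(dog, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    # Binarization (Otsu) followed by morphological erosion with a disk element
    _, binary = cv2.threshold(dog, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    disk = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
    return cv2.erode(binary, disk)

def segment_words(binary, min_area=50):
    # Connected-component analysis; each sufficiently large component is a word candidate
    n, labels, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)
    boxes = []
    for i in range(1, n):  # label 0 is the background
        x, y, w, h, area = stats[i]
        if area >= min_area:
            boxes.append((x, y, w, h))
    return boxes
```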

3.2 Feature Extraction

Feature extraction is a dimensionality reduction process in which the given input data are represented by a set of unique features. The input data are characterized in terms of a group of distinctive features, which are used to compare the similarity between word images, based on which related words are retrieved [28]. The segmented words are represented using a set of unique features obtained with two approaches.

3.2.1 Feature Extraction Using Gradient, Structural and Concavity Features

In the first stage, a histogram-based feature representation of binarized word images, known as Gradient, Structural and Concavity (GSC) features, is used for representing handwritten Devanagari words [29]. Each segmented handwritten binary word image I is divided vertically into 'n' sections and horizontally into 'm' sections such that every section has an equal number of foreground pixels. From each subsection, three types of features are extracted. Gradient features are extracted using the Sobel gradient operator, which yields 12-bit gradient features. Structural features identify the occurrence of corners, vertical and horizontal lines, and diagonal lines, determined by 12 rules, which results in 12-bit structural features. Finally, 8-bit concavity features are extracted, which capture topological and geometrical properties, including the track of bays, the occurrence of holes, vertical strokes and horizontal strokes. This word image division results in a binary feature vector of n × m × 32 dimensions. Experimentally, it is noticed that the GSC features with a 4 × 8 division give the best performance for handwritten word representation.

The similarity between the query word image and the segmented word images stored in the database, represented by Ω (the set of all n-dimensional binary vectors), is computed using a correlation measure. Let S_{ij} (i, j = 0 or 1) be the number of positions at which the bit-level value is i in the first pattern and j in the second pattern, giving the four possible counts S_{00}, S_{01}, S_{10} and S_{11}. Let X and Y be the two bit-level feature vectors of the two words to be compared; the dissimilarity between these two words can then be computed using the correlation equation.

F_{GSC} = \begin{bmatrix} f_{11} & f_{12} & \cdots & f_{1m} \\ f_{21} & f_{22} & \cdots & f_{2m} \\ \vdots & \vdots & & \vdots \\ f_{n1} & f_{n2} & \cdots & f_{nm} \end{bmatrix}   (1)

where F_{GSC} is the feature matrix of all segmented candidate word images in the database. A given query image Q = [f_{11} f_{12} ... f_{1n}] is compared with all segmented word images using the following correlation equation:

D(X, Y) = \frac{1}{2} - \frac{S_{11} S_{00} - S_{10} S_{01}}{2\sqrt{(S_{10} + S_{11})(S_{01} + S_{00})(S_{11} + S_{01})(S_{00} + S_{10})}}   (2)

A ranked list of all segmented word images is generated in sorted form. Let

S_{GSC} = \{W_{G1}, W_{G2}, W_{G3}, \ldots, W_{Gn}\}   (3)

be the group of identical words obtained from the GSC method.
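As a concrete illustration of the matching step of Eq. (2), the correlation dissimilarity between two binary feature vectors can be computed as below. This is only a sketch: the GSC extraction itself (the 4 × 8 zoning producing n × m × 32 bits) is not shown, and the function names are illustrative.

```python
# Correlation dissimilarity of Eq. (2) between two binary GSC feature vectors.
import numpy as np

def correlation_dissimilarity(x, y, eps=1e-12):
    x = np.asarray(x, dtype=bool)
    y = np.asarray(y, dtype=bool)
    s11 = int(np.sum(x & y))      # both bits set
    s00 = int(np.sum(~x & ~y))    # both bits clear
    s10 = int(np.sum(x & ~y))
    s01 = int(np.sum(~x & y))
    denom = np.sqrt(float((s10 + s11) * (s01 + s00) * (s11 + s01) * (s00 + s10))) + eps
    return 0.5 - (s11 * s00 - s10 * s01) / (2.0 * denom)

def rank_by_gsc(query_vec, candidate_vecs):
    # Smaller dissimilarity means more similar; returns candidate indices, best first
    scores = [correlation_dissimilarity(query_vec, c) for c in candidate_vecs]
    return np.argsort(scores)
```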

3.2.2 Feature Extraction Using Bag of Low-Level (BOL) Features

In the second method, a bag of low-level features is considered for word image representation, such as background-to-ink transitions, the projection profile, the upper projection profile, the lower projection profile, the distance between the upper and lower profiles, the number of foreground pixels, the centre of gravity (C.G.) of the column obtained from the locations of the foreground pixels, and the transition at the centre of gravity [30, 31]. The features of all segmented candidate images in the database are represented using the standard matrix form F_{BOL}:

F_{BOL} = \begin{bmatrix} f_{11} & f_{12} & \cdots & f_{1m} \\ f_{21} & f_{22} & \cdots & f_{2m} \\ \vdots & \vdots & & \vdots \\ f_{n1} & f_{n2} & \cdots & f_{nm} \end{bmatrix}   (4)

The given query image Q = [f_{11} f_{12} ... f_{1n}] is compared with all segmented word images using the Euclidean distance:

D_E = \sqrt{\sum_{i=1}^{n} (Q_i - F_i)^2}   (5)

where Q and F are the query word image and the candidate word image, and D_E is the similarity measure using the Euclidean distance. The Euclidean distance metric is used for ranking all segmented candidate words from which features have been extracted; the ranked list of words is created by computing the similarity between the query word image and all candidate word images, and finally the list is sorted. Let

S_{BOL} = \{W_{B1}, W_{B2}, W_{B3}, \ldots, W_{Bn}\}   (6)

be the group of identical words obtained from the bag of low-level features method.
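A minimal sketch of column-profile features of the kind listed above, together with Euclidean ranking as in Eq. (5), is given below. The exact feature set and normalization used in this work may differ; word images are assumed to be size-normalized so that all feature vectors have equal length, and the function names are illustrative.

```python
# Sketch of per-column low-level features and Euclidean ranking (Eq. (5)).
import numpy as np

def bol_features(word_binary):
    """word_binary: 2D 0/1 array of a size-normalized word image (1 = ink)."""
    feats = []
    for col in word_binary.T:                              # one feature group per column
        ink = np.flatnonzero(col)
        transitions = int(np.count_nonzero(np.diff(col)))  # ink/background transitions
        count = len(ink)                                   # projection profile value
        upper = int(ink[0]) if count else 0                # upper profile
        lower = int(ink[-1]) if count else 0               # lower profile
        cog = float(ink.mean()) if count else 0.0          # centre of gravity of the column
        feats.extend([transitions, count, upper, lower, lower - upper, cog])
    return np.asarray(feats, dtype=float)

def rank_by_bol(query_img, candidate_imgs):
    q = bol_features(query_img)
    dists = [np.linalg.norm(q - bol_features(c)) for c in candidate_imgs]  # Eq. (5)
    return np.argsort(dists)            # candidate indices, most similar first
```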

3.3 Word Spotting Using Combined Hybrid Approach

For a given query word, a normalized rank list is generated by the gradient, structural and concavity (GSC) method, and the list of related words corresponding to their positions in the document is generated, i.e., S_GSC = {W_G1, W_G2, W_G3, ..., W_Gn}. Similarly, a normalized rank list is generated from the bag of low-level features, and the corresponding list of words is generated, i.e., S_BOL = {W_B1, W_B2, W_B3, ..., W_Bn}. Finally, to generate the final matched word list using the combined hybrid approach, the intersection between S_GSC and S_BOL is computed:

S_H = S_{GSC} \cap S_{BOL}   (7)

where S_H indicates the final list of highly related words.

S. N. Sushma and B. Sharada

4 Experiments and Results Datasets are very much required for testing and validating the accuracy and efficiency of the designed system. As there is no benchmarking dataset existing for handwritten Devanagari documents, we have used two types of historical datasets, which are collected from Oriental Research Institute, University of Mysore, Mysuru, Karnataka for our experimentation. Sample of word images from Dataset 1 and Dataset 2 are shown in Fig. 3. First dataset consists of scanned handwritten Devanagari documents “Geethabhasam” of the great Indian philosopher, Shankaracharya that comprises of a set of 30 pages with 8584 segmented words. Second dataset consists of scanned handwritten Devanagari documents “Padaprathama shakaprarambha” that comprises of a set of 19 pages with 5723 segmented words. The fraction of spotted words that are related to the given query word indicates the Precision values. The fraction of relevant words that is effectively spotted indicates the Recall values and the harmonic mean of precision and recall represents the Fmeasure. P=

|{relevant instances} ∩ {retrieved instances}| |{retrieved instances}|

(8)

R=

|{{relevant instances} ∩ {retrieved instances} |{relevant instances}|

(9)

F =2×

P×R P+R

(10)

where, P indicates Precision, R indicates Recall and F indicates F-measure. The Precision value at an exact Recall value where P = R represents R-Precision index where the precision value is determined for the ‘k’ topmost retrieved words which shows how efficiently the related words are spotted for first ‘k’ top positions from the generated rank list and P @ k is represented as: } { |{relevant instances} ∩ , k,retrieved instances | { } P@k = | , k , retrieved instances |

Fig. 3 Example of word images from dataset 1 and dataset 2

(11)

Two-Stage Word Spotting Scheme for Historical Handwritten …

9

The Mean Average Precision index (MAP) is average of the precision value attained after each relevant word is spotted and is represented as: ∑n MAP =

k=1 (P@k × rel(k)) |{relevant instances}|

(12)

where, rel(k) indicates 1 if the word at the rank list is related, otherwise it indicates 0. For experimentation from all the handwritten word classes in the segmented word database, we select a subset of subset of 10 classes each from Dataset 1 and Dataset 2 (a class = a group of the similar word instances). The number of instances per class differs from 3 to 20, and each sample from all classes was used as a query word. For evaluation, the proposed word spotting approaches are applied to spot top rank 5–15 related words from the database for a given query word image and corresponding Precision, Recall and F-Measure are calculated. Complete results are evaluated by calculating the mean Average Precision (mAP) and mean Average Recall (mAR) over all the classes. Evaluation is performed using word spotting method by gradient, structural and cavity (GSC) features, by bag of low-level features and by combined approach these features. For dataset 1, the results are tabulated in Tables 1, 2 and 3. For dataset 2, the results are tabulated in Tables 4, 5 and 6 and representation of qualitative results using combined two-stage approach are shown in Fig. 4. The comparison of mean average between the proposed word spotting approaches is tabulated in Table 7, and corresponding graphical representation is shown in Fig. 5. Form the experimentation, we observe that the word spotting of handwritten Devanagari historical document images using two-stage combined hybrid approach of gradient, structural and cavity (GSC) features and bag of low-level features achieves significant results when compared to the individual approach.

5 Conclusion Matching and identifying the occurrences of words in historical handwritten Devanagari documents is one of the most difficult tasks due to its processing complexity. Usually word spotting system follows one type of word image representation, which is used for word matching. In this paper, a unique two-stage word spotting approach in historical handwritten Devanagari documents using GSC features and Bag of lowlevel features is proposed. Using two representations of word images and merging at result level attains highest precision. Wide experimentation was conducted by the historical handwritten Devanagari documents, which exhibits that two-stage combined hybrid approach efficiently outperforms than the conventional individual approach in terms of accuracy and results. The proposed method is well appropriate for word spotting of handwritten Devanagari documents. Further, it can be extended for the processing several South Indian languages.

Table 1 Class-wise results in terms of precision, recall and F-measure obtained by gradient, structural and concavity (GSC) features for dataset 1 (columns: class number 1–10; precision, recall and F-measure at ranks 10, 15 and 20)

Table 2 Class-wise results in terms of precision, recall and F-measure obtained by bag of low-level features for dataset 1 (columns: class number 1–10; precision, recall and F-measure at ranks 10, 15 and 20)

Table 3 Class-wise results in terms of precision, recall and F-measure obtained by two-stage hybrid approach for dataset 1 (columns: class number 1–10; precision, recall and F-measure at ranks 10, 15 and 20)

Table 4 Class-wise results in terms of precision, recall and F-measure obtained by gradient, structural and concavity (GSC) features for dataset 2 (columns: class number 1–10; precision, recall and F-measure at ranks 10, 15 and 20)

Table 5 Class-wise results in terms of precision, recall and F-measure obtained by bag of low-level features for dataset 2 (columns: class number 1–10; precision, recall and F-measure at ranks 10, 15 and 20)

Table 6 Class-wise results in terms of precision, recall and F-measure obtained by two-stage hybrid approach for dataset 2 (columns: class number 1–10; precision, recall and F-measure at ranks 10, 15 and 20)


Fig. 4 Representation of results for the query word “dharma” [dataset1] and “karm” [dataset2] using two-stage hybrid approach

Table 7 Mean Average Precision (MAP) results of proposed word spotting methods for handwritten Devanagari documents

Methodology                                   MAP dataset 1   MAP dataset 2
GSC method                                    55              56
Bag of low-level method                       54              55
Proposed two-stage combined hybrid method     65              71

Fig. 5 Comparison of proposed word spotting methods for handwritten Devanagari document

Acknowledgements We express thanks to Oriental Research Institute (ORI), Mysuru, Karnataka, India for providing the historical handwritten Devanagari datasets.


References

1. Rath TM, Manmatha R (2007) Word spotting for historical documents. Int J Doc Anal Recognit 9(2–4):139–152. https://doi.org/10.1007/s10032-006-0027-8
2. Rath TM, Manmatha R, Layrenko V (2004) A search engine for historical manuscripts. In: Proceedings of the 27th annual international ACM SIGIR conference on research and development in information retrieval
3. Bhardwaj, Kompalli S, Setlur S, Govindaraju V (2008) An OCR based approach to word spotting in Devanagari documents. In: Proceedings of the 15th SPIE—document recognition and retrieval, p 6815
4. Zagoris K, Pratikakis I, Gatos B (2017) Unsupervised word spotting in historical handwritten document images using document-oriented local features. IEEE Trans Image Process 26(8):4032–4041
5. Malanker AA, Patel MM (2020) Handwritten Devanagari script recognition: a survey. IOSR J Electr Electron Eng 9(2):80–87
6. Garg NK, Kaur L, Jindal M (2013) Recognition of offline handwritten hindi text using SVM. Int J Image Process (IJIP) 7(4):395–401
7. Mondal T, Ragot N, Ramel JY, Pal U (2014) A flexible sequence matching technique for word spotting in degraded documents. In: International conference on frontiers in handwriting recognition, pp 210–215
8. Ryu JW, Koo HI, Cho N (2014) Language-independent text-line extraction algorithm for handwritten documents. IEEE Signal Process Lett 21(9):1115–1119
9. Serrano JR, Perronnin F (2009) Handwritten wordspotting using hidden markov models and universal vocabularies. Pattern Recognit 42(9):2106–2116
10. Rothacker L, Fink G (2015) Segmentation-free query-by-string word spotting with bag-of-features HMMs. In: Proceedings of 13th international conference on document analysis and recognition, pp 661–665
11. Almazaan J, Gordo A, Fornes A, Valveny E (2014) A segmentation free word spotting with exemplar SVMs. Pattern Recognit 47(12):3967–3978
12. Zhang X, Tan C (2013) Segmentation-free word spotting for handwritten documents based on heat kernel signature. In: Proceedings of 12th international conference on document analysis and recognition (ICDAR), pp 827–831
13. Can EF, Duygulu P (2011) A line-based representation for matching words in historical manuscripts. Pattern Recognit Lett 32:1126–1138
14. Frinken V, Fischer A, Baumgartner M, Bunke H (2014) Word spotting for self-training of BLSTM NN based handwriting recognition systems. Pattern Recognit 47:1073–1082
15. Picone J (1990) Continuous speech recognition using hidden Markov models. In: Proceedings of IEEE ASSP magazine, pp 26–41
16. Fischer A, Frinken V, Bunke H, Suen CY (2013) Improving HMM-based word spotting with character language models. In: Proceedings of the international conference on document analysis and recognition, ICDAR, pp 506–510. https://doi.org/10.1109/ICDAR.2013.107
17. Frinken V, Fischer A, Martínez-Hinarejos CD (2013) Handwriting recognition in historical documents using very large vocabularies. In: ACM international conference proceeding series, pp 67–72. https://doi.org/10.1145/2501115.2501116
18. Zhang TY, Suen CY (1984) A fast parallel algorithm for thinning digital patterns. Commun ACM 27:236–239
19. Zhang H, Wang D-H, Liu CL (2010) Word spotting from online Chinese handwritten documents using one-vs-all trained character classifier. In: Proceedings of 12th ICFHR, pp 271–276
20. De Stefano C, Fontanella F, Marrocco C, Scotto di Freca A (2014) A GA-based feature selection approach with an application to handwritten character recognition. Pattern Recognit Lett 35:130–141
21. Zhang B, Srihari SN, Huang C (2004) Word image retrieval using binary features. Doc Recognit Retr XI 5296:45–53
22. Kumar G, Shi Z, Setlur S, Govindaraju V, Ramachandrula S (2012) Word spotting framework using dynamic background model. In: Proceedings of international conference on frontiers in handwriting recognition, pp 582–587
23. Czuni L, Kiss P, Gal M, Lipovits A (2013) Local feature based word spotting in handwritten archive documents. In: Proceedings of 11th international workshop on content-based multimedia indexing (CBMI), pp 179–184
24. Cao H, Bhardwaj A, Govindaraju V (2009) A probabilistic method for word retrieval in handwritten document images. Pattern Recognit 42(12):3374–3382
25. Papandreou, Gatos B, Louloudis G, Stamatopoulos N (2013) ICDAR 2013 document image skew estimation contest. In: Proceedings of international conference of document analysis and recognition, pp 1444–1448
26. Ryu J, Koo HI, Cho NI (2015) Word segmentation method for handwritten documents based on structured learning. IEEE Signal Process Lett 22(8):1161–1165
27. Llados J, Rusinol M, Fornes A, Fernandez D, Dutta A (2012) On the influence of word representations for handwritten word spotting in historical documents. Pattern Recognit Artif Intell 26(5):1–25
28. Jayadevan R (2012) Automatic processing of handwritten bank cheque images—a survey. Int J Doc Anal Recognit 15(4):99–110
29. Jayadevan R, Kolhe SR, Patil PM, Pal U (2011) Database development and recognition of handwritten Devanagari legal amount words. In: Proceedings of international conference on document analysis and recognition (ICDAR), pp 304–308
30. Mohamad MA, Arif M, Nasien D, Hassan H, Haron H (2015) A review on feature extraction and feature selection for handwritten character recognition. Int J Adv Comput Sci Appl 6(2):204
31. Rothacker L, Vajda S, Fink G (2012) Bag-of-features representations for offline handwriting recognition applied to Arabic script. In: Proceedings of the international conference on frontiers in handwriting recognition

3D Object Detection in Point Cloud Using Key Point Detection Network Pankaj Kumar Saini, Md Meraz , and Mohammed Javed

1 Introduction We live in a 3D world, and to collect information about the 3D objects around us we use a 3D point cloud represented by millions of points. In autonomous vehicles, this point cloud is very useful because it gives information about the world around the car in 3D, which helps the vehicle see surrounding objects, detect them, and perform decision-making. For autonomous driving, LiDAR data is a widely accepted 3D point cloud. Early forms of LiDAR, or 3D laser scanning, were successfully used in the 1970s for submarine detection from aircraft. Nowadays, environmental research would be impossible to envision without remote sensing RADAR and LiDAR. The device fires laser light pulses at surfaces, some as fast as 160,000 pulses per second [1]. The time it takes for each pulse to return is measured by a sensor on the device. Since light moves at a constant and predictable speed, the LiDAR sensor can determine the distance between the target and the device with high precision. Nowadays, a LiDAR device is mounted on the roof of the vehicle to capture the surroundings in 3D. Using this 3D data, many novel object detection techniques have been proposed in the literature. The Viola-Jones detector [2], HOG [3], and other feature extractors were used to create early object detection models. These models were slow, imprecise, and under-performed on unfamiliar datasets. Later, CNNs and deep learning for image classification were reintroduced, changing the landscape of visual perception. Deep learning-based object detection is now very popular in applications such as self-driving cars, person identification, and robotics. These techniques rely on the extracted features to get the best accuracy from the model.


Fig. 1 Illustrating the same point cloud with bounding boxes and their labels using different representations. Green and blue colours show the point cloud data, and the bounding boxes are shown in pink, where a is the 3D representation and b is the BEV representation

Both two-stage and one-stage detectors depend on these extracted features. For example, some models focus on developing features related to the region proposal for better refinement in the second stage [4, 5]. Also, many methods try to extract more discriminative multi-modal features by fusing 3D point clouds and RGB images [6, 7]. Real-time usage of 3D object detectors, especially for autonomous driving, requires high efficiency on top of high accuracy. Some models work in real time, directly processing the image data to detect objects in 3D, while other models use a processed point cloud or LiDAR data for much better accuracy; however, such processing-based models are computationally very expensive and cannot be used in real time. A few researchers have tried to use a BEV image (as shown in Fig. 1) generated from LiDAR data to obtain real-time performance. In this research work, we are motivated to use BEV images for detecting objects. In this paper, an improved 3D object detection framework with a one-stage detector is proposed, which can be used for detecting objects in a 3D point cloud. As seen in Fig. 3, the framework can be separated into two components: ResNet and KFPN. In the proposed method, ResNet is used for object classification and KFPN is used for the detection of key points in an image. The proposed method is tested on the KITTI dataset, and the results are compared with the base model RTM3D [8]. Our main contributions include (i) giving a BEV image as input to the model, with 3 channels embedded with intensity, height, and density in the form of heat maps, (ii) using an adaptive learning rate so that the model can learn more precisely, and (iii) predicting 8 key points, which is fewer than the base model RTM3D [8].


2 Related Works In general, there are two categories of 3D detectors: (i) single-stage detectors, in which features are learned directly from the input and bounding boxes and confidences are regressed in one pass, and (ii) two-stage detectors, in which feature learning occurs in two stages, with the second stage using region-proposal-aligned features to refine the first-stage predictions. Because of the refinement in the second stage, these detectors take more time for detection but benefit in accuracy. Two-stage detectors more often attain higher precision, whereas single-stage detectors, because of their simpler network structures, are faster [9]. The motivation behind using a single-stage detector [10, 11] is that in recent years single-stage detector accuracy has been gradually catching up with two-stage detector accuracy [4, 12].

Single-stage algorithms improve computational efficiency by using a compact representation and a fully convolutional network to handle the point cloud. For feature extraction, VoxelNet [13] uses a voxel feature encoding layer for efficient feature learning. PointPillar [14] separates a point cloud into pillars to learn point features. Point-GNN [15] uses a graph neural network, while 3DSSD [11] improves classification by using extracted features and point-based sampling. SECOND [16] tweaks sparse convolutions to extract features from sparse voxels more efficiently. By extracting features from complete point clouds to supervise the features acquired from incomplete point clouds, Associate-3Ddet [17] encourages the model to infer from incomplete points. To augment the features, SA-SSD [18] uses an auxiliary network, in parallel to the backbone network, to regress box centres and semantic classes. For enhanced post-processing, CIA-SSD [19] integrates IoU-aware confidence rectification with DI-NMS; it uses a lightweight BEV network for the extraction of robust spatial-semantic characteristics, inspired by [20]. SESS [21] employs a semi-supervised technique to lessen reliance on manual annotations for indoor scene detection. SE-SSD [22] employs two identical SSDs as teacher and student, which train each other and achieve the highest level of accuracy.

The two-stage approaches reuse the point cloud at full resolution in the second stage to give accurate results, in contrast to single-stage techniques that try to detect accurately in the first stage. For second-stage refinement, PointRCNN employs PointNet [23] to use semantic information and raw points from region proposals. PV-RCNN [4] learns important features inside the region proposal using both voxel-based and point-based (raw points) networks. To refine the predicted confidences, CLOCs PVC [24] integrates semantic features from points and images, whereas 3D-CVF [25] derives semantics from multi-view images in both stages and mixes them with point features. To prevent ambiguity and improve feature representation, PART-A2 [12] uses a voxel-based network to extract region-proposal features. Similarly, STD [26] uses voxelization to translate region-proposal semantic data into compact representations, reducing the number of anchors in order to boost performance.


After reviewing single- and two-stage approaches, we selected a single-stage detector and the BEV representation for our work because this reduces point cloud processing time and increases efficiency.

3 Proposed Method The proposed work in this paper is inspired by RTM3D [8]. The model consists of a ResNet with a feature pyramid network. ResNet is short for Residual Network, which solves the problem of vanishing or exploding gradients by introducing the concept of residual blocks [27]. The network also includes skip connections that skip some layers in between. The following subsections describe the input to the model, the key point detection network, the feature pyramid network, and the different losses used by the model. Input Data: The model accepts a point cloud image, a heat-map-style BEV picture made up of three channels: the first channel embeds the point cloud's intensity into the heat map, the second channel embeds the point cloud's height (Z-axis value), and the third channel embeds the

Fig. 2 a First channel of the image embedded with intensity value in the form of BEV, b Second channel of the image embedded with height or Z-axis value in the form of BEV, c Third channel of the image embedded with density value in the form of BEV


point cloud’s density values, as seen in Fig. 2. The step-by-step procedure for BEV map generation is given in Algorithm 1. Algorithm 1 Algorithm for BEVmap generation Input: P : PointCloud, B : Boundary Output: BEVmap : 3 channel heat maps in the form of BEV image Data: P ← [P1 , P2 , P3 .....PN ], B ← [minX, maxX, minY, maxY, minZ, maxZ] Function GetBEVmap(P,B): Set height Set width for p in P do p[0] ← discr eti zation p[1] ← discr eti zation + width / 2 end Sort with −z, y, x unique_indices ← unique indices in P unique_counts ← unique counts in P P ← P with unique_indices max_height ← B[max Z ] - B[min Z ] height_map ← P[2]/max_height intensit y_map ← P[3] densit y_map ← min(1, log(unique_counts + 1)/ log(64)) B E V map[0] = intensit y_map B E V map[1] = height_map B E V map[2] = densit y_map return B E V map B E V map = GetBEVmap(P,B)

Key point Detection Network: The model we use is equipped with a key point detection network. The network creates the perspective points and vertices of the 3D bounding box using a heat map with three channels denoting height, intensity, and density, as displayed in Fig. 3. The backbone, the key point feature pyramid network, and the detection heads make up the architecture. Its anchor-free strategy helps it obtain a fast detection speed. The network details are given below. Backbone: For faster speed and good accuracy, we use ResNet18, which takes a 3-channel heat-map image as input. ResNet is designed for image classification and has a maximum downsampling factor of 32. Key point Feature Pyramid: Unlike multi-scale 2D boxes, which are detected in different pyramid layers, key points do not differ in size across images. Therefore, we do not use the Feature Pyramid Network; instead we use the Key point Feature Pyramid Network, introduced in [8], to detect scale-invariant key points.
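For intuition, anchor-free key point heads are commonly decoded by keeping the local maxima of the predicted centre heat map; the following sketch shows one standard way to do this (an illustration, not necessarily the exact decoding used in this work):

```python
import torch
import torch.nn.functional as F

def extract_keypoints(heatmap, k=50):
    """Pick the top-k local maxima from a predicted centre heat map of shape
    (batch, classes, H, W). A 3x3 max-pool keeps only local peaks, a common
    decoding trick for anchor-free key point detectors."""
    peaks = F.max_pool2d(heatmap, kernel_size=3, stride=1, padding=1)
    heatmap = heatmap * (peaks == heatmap).float()        # suppress non-peak cells
    b, c, h, w = heatmap.shape
    scores, idx = torch.topk(heatmap.view(b, c, -1), k)   # top-k peaks per class
    ys = torch.div(idx, w, rounding_mode="floor")
    xs = idx % w
    return scores, xs, ys
```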


Fig. 3 The proposed Key point Detection architecture, where the model takes BEV image as an input and outputs primary centre heat map, centre-offsets, and directions to estimate the bounding box in 3D

Feature Pyramid Network: The FPN is designed with both precision and speed in mind. It replaces the feature extractor of detectors such as Faster R-CNN and produces higher-quality multi-scale feature map layers than a traditional feature pyramid.

4 Experiment We have used the KITTI dataset [28] to train our 3D object detector network. KITTI assesses 3D object detection and BEV object detection for pedestrians, cars, and cyclists. Each class is tested at three degrees of difficulty: easy, medium, and hard, determined by the 2D bounding-box height, occlusion, and truncation of the object. The model is trained on 6000 training samples provided in the KITTI dataset and validated on the remaining 1480 training samples. A total of 7481 training examples were used for training and validation purposes, and a test set containing 7518 samples was used for evaluating the results. For better results and accuracy, we have used three different types of losses, which work together to give the total loss. Focal Loss: Focal loss was created for single-stage object detection, where the background and foreground classes are extremely imbalanced during training (e.g., 1:10000). Within a KITTI point cloud, our network frequently gives far more negative predictions than positive ones; there are only a handful of ground truths, each producing only 4-6 positives. As a result, the gap between the background and foreground classes widens.


To fix this problem, the authors of RetinaNet [29] introduced an effective single-stage loss known as focal loss, which we also employ in our model. The classification loss is expressed as follows:

$FL(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)$

where $\alpha_t$ and $\gamma$ are the focal loss parameters and $p_t$ is the model's predicted probability. In our training, we use $\alpha = 0.25$ and $\gamma = 2$.

L1 Loss: The L1 loss function, also called Least Absolute Deviations (LAD), minimises the sum of the absolute differences between the actual and predicted values:

$L1 = \sum_{i=1}^{n} \left| y_{true} - y_{predicted} \right|$
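A minimal PyTorch-style sketch of these two losses is given below; the tensor shapes, clamping, and reduction are illustrative assumptions, not the exact implementation used in this work:

```python
import torch

def focal_loss(pred_prob, target, alpha=0.25, gamma=2.0):
    """Binary focal loss on predicted probabilities (alpha = 0.25, gamma = 2 as above)."""
    target = target.float()
    p_t = pred_prob * target + (1.0 - pred_prob) * (1.0 - target)
    alpha_t = alpha * target + (1.0 - alpha) * (1.0 - target)
    return (-alpha_t * (1.0 - p_t) ** gamma * torch.log(p_t.clamp(min=1e-6))).mean()

def l1_loss(y_pred, y_true):
    """Least Absolute Deviations (LAD): sum of absolute differences."""
    return torch.abs(y_true - y_pred).sum()
```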

Balanced L1 loss: The object detection task uses the balanced L1 loss function. Localization and classification are tackled jointly with a multi-task loss [30], defined as

$L_{p,u,t^u,v} = L_{cls}(p, u) + \lambda [u \ge 1] L_{loc}(t^u, v)$

Classification and localization are represented by the objective functions $L_{cls}$ and $L_{loc}$. In $L_{cls}$, $p$ and $u$ are the prediction and target, $t^u$ is the regression result for class $u$, and $v$ is the regression target. $\lambda$ is used to tune the loss weights during multi-task learning [31]. The main idea of the balanced L1 loss is to promote the important regression gradients, i.e., the gradients from inliers (accurate samples), thereby rebalancing the involved samples and tasks and achieving more balanced training of classification, overall performance, and precise localization. The balanced L1 loss is derived from the conventional smooth L1 loss, which sets an inflection point to separate inliers from outliers and clips the large gradients of outliers at a maximum value of 1.0 [31]. The localization loss $L_{loc}$ uses the balanced L1 loss and is defined as

$L_{loc} = \sum_{i \in \{x, y, w, h\}} L_b(t_i^u - v_i)$

where $L_b$ is the balanced L1 loss. For inliers, $\alpha$ is employed to boost the gradient, while the gradient of outliers is unaffected. $\gamma$ tunes the upper bound of the regression errors and helps the objective functions balance the tasks involved. After integrating the gradient equation above, we obtain the balanced L1 loss:


$L_b(x) = \frac{\alpha}{b}(b|x| + 1)\ln(b|x| + 1) - \alpha|x|, \quad \text{if } |x| \le 1$

$L_b(x) = \gamma|x| + C, \quad \text{otherwise}$

where $\alpha \ln(b + 1) = \gamma$ constrains the parameters $\gamma$, $\alpha$, and $b$. The default values of $\alpha$ and $\gamma$ are 0.5 and 1.5, respectively.

We have used an adaptive learning rate for 300 epochs, which is reduced automatically; this schedule is advised to improve the model's accuracy. The effects on BEV and 3D detection were measured in terms of AP. Fine-tuning the learning rate is important for training, and an exponentially decaying learning rate is a widely used strategy: He et al. [27] decrease the rate by a factor of 0.1 every 30 epochs ("step decay"), and Szegedy et al. [32] reduce it by a factor of 0.94 every two epochs. As an alternative, Loshchilov et al. suggest a cosine annealing method [33], a simplified form of which reduces the learning rate from its initial value to 0 by applying the cosine function. If T is the total number of batches, then for each batch t the learning rate $\eta_t$ is computed as

$\eta_t = \frac{1}{2}\left(1 + \cos\left(\frac{t\pi}{T}\right)\right)\eta$

where $\eta$ is the initial learning rate. This scheduling is termed "cosine" decay.
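As a minimal illustration of this cosine schedule (a sketch; applying it per batch rather than per epoch is an assumption):

```python
import math

def cosine_lr(t, total_batches, initial_lr):
    """Cosine decay: the learning rate falls from initial_lr to 0 over T batches."""
    return 0.5 * (1.0 + math.cos(t * math.pi / total_batches)) * initial_lr
```

In PyTorch, torch.optim.lr_scheduler.CosineAnnealingLR with eta_min=0 implements essentially the same decay.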

Our network was built with the popular PyTorch deep learning framework on a machine with an Intel Xeon G-6248 CPU (40 cores, 2.5 GHz) and NVidia V100 GPUs. We padded the original image to 604 × 604 for both training and testing. We project the ground-truth 3D bounding boxes onto the images and also use data augmentation by flipping the bounding boxes and images.

5 Results We trained our model on the KITTI dataset using 6000 samples, and the different training losses are shown in Fig. 4. The average (overall) loss, the centre-offset loss (the error in how far an object is from the centre of the image), the dimension loss (the error in an object's predicted dimensions), the direction loss (the error in an object's predicted direction), the heat-map centre loss (the error in an object's predicted centre), and the z-coordinate loss (the error in an object's predicted height) all decrease as the number of steps increases. The experimental results are also compared with the base model, as shown in Table 1.


Fig. 4 Graph showing different losses while training on 6000 samples; the Y-axis is the loss value and the X-axis is the step value

Table 1 Result analysis of 3D detection of the Car class using our model and the base model RTM3D [8]

Method   Input type   Time     Easy (%)   Medium (%)   Hard (%)
RTM3D    Image        0.05 s   14.41      10.34        8.77
Ours     BEV image    0.04 s   55.52      46.70        44.27

For 3D detection of the Car class, accuracy on easy samples is improved by a factor of 2.85, on medium samples by a factor of 3.51, and on hard samples by a factor of 4.04. Our results are displayed in Fig. 5: front-view images with predicted bounding boxes and BEV images with bounding boxes are visualized. As can be seen, our model is able to predict cars, pedestrians, and cyclists easily. The results show that our model can handle situations that are tough to handle, such as truncated and crowded objects. The BEV and front-view images of Fig. 5 also demonstrate that the proposed technique performs accurate localization in different scenarios. We do not process the raw point cloud, which helps increase the overall speed of our model. We have also compared our model and the base model on BEV detection, as shown in Table 2. For easy samples, the accuracy is increased by a factor of 2.92, for medium samples by a factor of 3.76, and for hard samples by a factor of 4.20. Overall, our model improves accuracy in comparison with the base model, but it still needs considerable improvement to reach state-of-the-art results.


Fig. 5 Sample experimental results, where a and b show the front view taken with the camera, and c and d show the corresponding front view and BEV with predicted bounding boxes. In the images, bounding boxes are shown in red; in the BEV, they are shown in cyan, with red indicating the front of the bounding box

Table 2 Result analysis of BEV detection for the Car class using our model and the base model RTM3D [8]

Method   Easy (%)   Medium (%)   Hard (%)
RTM3D    19.17      14.20        11.99
Ours     75.21      67.70        62.36

6 Conclusions In this work, we provide a 3D object detection method that is faster and more precise. The 3D object detection challenge has been reformulated as a key point detection problem. Many existing 3D object detection algorithms, such as those based on BEV and front-view


representations, convert point cloud data into 2D representations, losing most of the local information. We have tailored the key point detection network for 3D detection so that it can produce the key points of 3D boxes and other prior knowledge about objects from bird's-eye-view images containing intensity, height, and density values. Our model uses three different loss functions for optimized training, together with an adaptive learning rate, which helps the model converge easily. Stable and accurate 3D bounding boxes are generated by our approach. With all these changes, we obtain improved accuracy over the base model RTM3D, but we are not yet able to compete with the benchmarks. For future work, we are working towards results comparable to the benchmarks. Acknowledgements The authors acknowledge the Central Computing Facility (CCF) of IIIT Allahabad, as well as the resources supplied by the PARAM Shivay Facility at IIT-BHU Varanasi as part of the National Supercomputer Mission. The Ministry of Education, Government of India, has also provided financial assistance for this research.

References 1. GISGeography (2022) A complete guide to LIDAR: light detection and ranging. https:// gisgeography.com/lidar-light-detection-and-ranging/ 2. Li Q, Niaz U, Merialdo B (2012) An improved algorithm on Viola-Jones object detector:1–6. ISBN: 978-1-4673-2368-0 3. Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR’05), vol 1, pp 886–893 4. Shi S et al (2021) PV-RCNN++: point-voxel feature set abstraction with local vector representation for 3D object detection 5. Chen Y, Liu S, Shen X, Jia J (2019) Fast point R-CNN. Preprint at arXiv: 1908.02990 6. Ku J, Mozifian M, Lee J, Harakeh A, Waslander SL (2017) Joint 3D proposal generation and object detection from view aggregation. Preprint at arXiv: 1712.02294 7. Chen X, Ma H, Wan J, Li B, Xia T (2016) Multi-View 3D Object Detection Network for Autonomous Driving. Preprint at arXiv: 1611.07759 8. Li P, Zhao H, Liu P, Cao F (2020) RTM3D: real-time monocular 3D detection from object keypoints for autonomous driving in computer vision. In: , Proceedings, part III of the ECCV 2020: 16th european conference, Glasgow, UK, 23–28 Aug 2020. Springer-Verlag, Glasgow, United Kingdom, pp 644–660. https://doi.org/10.1007/978-3-030-58580-8_38. ISBN: 978-3030-58579-2 9. Zheng W, Tang W, Jiang L, Fu C (2021) SE-SSD: self-ensembling single-stage object detector from point cloud. Preprint at arXiv: 2104.09804 10. He C, Zeng H, Huang J, Hua XS, Zhang L (2020) Structure aware single-stage 3D object detection from point cloud:11870–11879 11. Yang Z, Sun Y, Liu S, Jia J (2020) 3DSSD: point-based 3D single stage object detector. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR) 12. Shi S, Wang Z, Wang X, Li H (2019) Part-A2 net: 3D part-aware and aggregation neural network for object detection from point cloud. Preprint at arXiv: 1907.03670 13. Zhou Y, Tuzel O (2018) VoxelNet: end-to-end learning for point cloud based 3D object detection. In: 2018 IEEE/CVF conference on computer vision and pattern recognition, pp 4490–4499


14. Lang AH et al (2018) PointPillars: fast encoders for object detection from point clouds. Preprint at arXiv: 1812.05784 15. Shi W, Rajkumar R (2020) Point-GNN: graph neural network for 3D object detection in a point cloud. Preprint at arXiv: 2003.01251 16. Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis. https://link.springer.com/article/10.1023/B:VISI.0000029664.99615.94 17. Du L et al (2020) Associate-3Ddet: perceptual-to-conceptual association for 3D point cloud object detection:13326–13335 18. Lihua W, Jo KH (2021) Fast and accurate 3D object detection for lidar-camera-based autonomous vehicles using one shared voxel-based back-bone. IEEE Access:1–1 19. Zimmer W, Grabler M, Knoll A (2022) Real-time and robust 3D object detection within roadside LiDARs using domain adaptation 20. Tarvainen A, Valpola H (2017) Weight-averaged consistency targets improve semi-supervised deep learning results. Preprint at arXiv: 1703.01780 21. Zhao N, Chua TS, Lee G (2020) SESS: self-ensembling semi-supervised 3D object detection:11076–11084 22. Zheng W, Tang W, Jiang L, Fu C (2021) SE-SSD: self-ensembling single-stage object detector from point cloud. Preprint at arXiv: 2104.09804 23. Qi CR, Su H, Mo K, Guibas LJ (2016) PointNet: deep learning on point sets for 3D classification and segmentation. Preprint at arXiv: 1612.00593 24. Pang S, Morris D, Radha H (2020) CLOCs: camera-LiDAR object candidates fusion for 3D object detection. Preprint at arXiv: 2009.00784 25. Chen W, Li P, Zhao H (2022) MSL3D: 3D object detection from monocular, stereo and point cloud for autonomous driving. Neurocomputing 494:23–32. ISSN: 0925-2312. https://www. sciencedirect.com/science/article/pii/S0925231222004593 26. Yang Z, Sun Y, Liu S, Shen X, Jia J (2019) STD: sparse-to-dense 3D object detector for point cloud. Preprint at arXiv: 1907.10471 27. He K, Zhang X, Ren S, Sun J (2015) Deep residual learning for image recognition. Preprint at arXiv: 1512.03385 28. Geiger A, Lenz P, Urtasun R (2012) Are we ready for autonomous driving? The KITTI vision benchmark suite. In: 2012 IEEE conference on computer vision and pattern recognition, pp 3354–3361 29. Lin T, Goyal P, Girshick RB, He K, Dolláar P (2017) Focal loss for dense object detection. Preprint at arXiv: 1708.02002 30. Girshick RB (2015) Fast R-CNN. Preprint at arXiv: 1504.08083 31. Pang J et al (2019) Libra R-CNN: towards balanced learning for object detection. Preprint at arXiv: 1904.02701 32. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2015) Rethinking the inception architecture for computer vision. Preprint at arXiv: 1512.00567 33. Sun J, Yang Y, Xun G, Zhang A (2021) A stagewise hyperparameter scheduler to improve generalization. In: Proceedings of the 27th ACM SIGKDD conference on knowledge discovery and data mining, pp 1530–1540 (Association for Computing Machinery, Virtual Event, Singapore). ISBN: 9781450383325. https://doi.org/10.1145/3447548.3467287

IRIS and Face-Based Multimodal Biometrics Systems Vaishnavi V. Kulkarni, Sanjeevakumar M. Hatture, Rashmi P. Karchi, Rashmi Saini, Shantala S. Hiremath, and Mrutyunjaya S. Hiremath

1 Introduction Biometrics enable identity and authentication security, since each person has unique features and characteristics. Credit cards, ID cards, and passports can be stolen, lost, forgotten, or damaged. A biometrics system involves several stages: developing a sound system requires feature extraction, classification, and identification. Dust on the camera lens or a moving or rotating subject can cause issues. Machine learning automatically extracts features, categorizes data, and finds patterns without explicit programming. Deep learning has also been applied to iris identification, and numerous studies have used it for segmentation, identification, and classification. Biometrics comprise physical (face, hand, iris, and fingerprint) and behavioural (keystroke, signature, voice, and gait) characteristics that cannot be forgotten or forged. Multimodal biometrics systems examine multiple traits, increasing recognition and authentication performance. Public transit, airport security, the military, banking, commercial applications, IT firms, universities, and national and state governments use face and iris recognition [1].

V. V. Kulkarni (B) Alva’s Institute of Engineering and Technology, Moodbidri, Karnataka 574225, India e-mail: [email protected] S. M. Hatture · R. P. Karchi Nagarjuna College of Engineering, Bengaluru, Karnataka 562164, India R. Saini G. B. Pant Institute of Engineering and Technology, Pauri Garhwal, Uttarakhand 246194, India S. S. Hiremath Sony India Software Centre, Pvt Ltd., Bangalore, Karnataka 560103, India M. S. Hiremath eMath Technology, Pvt. Ltd., Bangalore, Karnataka 560072, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 D. S. Guru et al. (eds.), Data Analytics and Learning, Lecture Notes in Networks and Systems 779, https://doi.org/10.1007/978-981-99-6346-1_3


Fig. 1 Different levels of fusion

Remote, touchless facial capture is not invasive, but the face alone cannot distinguish identical twins, whereas each person has distinct left and right iris patterns. Environmental factors also affect the face recognition rate. Face recognition here uses neural networks and transfer learning with VGG16 and ResNet-50, while Daugman's segmentation and normalization approaches give high identification rates for the iris. Deep learning algorithms can quickly verify iris and face features. Figure 1 shows the levels of fusion in multimodal biometrics. Unimodal biometrics use one source of information for authentication and suffer from noise, non-universality, intra-class variance, inter-class variance, and spoof attacks. Multimodal biometrics solve these issues by using multiple traits to improve recognition. Facial and iris biometrics can be captured in a single photograph; remote face capture is not invasive, and even twins have distinct left and right iris patterns. The proposed research intends to build a multimodal biometrics system that combines face and iris biometrics, extracting iris and face features for person recognition via transfer learning.

2 Literature Survey Many techniques and methodologies have contributed to iris recognition, face recognition, and the fusion of both traits. The biometric recognition work reported by various researchers in research papers, articles, white papers, and patents is summarized in the following subsections.

2.1 Iris Biometrics Systems Iris recognition comprises multiple steps. First, an iris scanner takes a picture of the iris, then HE, median, and gamma correction algorithms are employed for preprocessing. CNN, PNN, GLCM, HOG, HD, BGM, and 2-D-Gabor Filter are feature


extraction methods. Localization and segmentation use the Hough Transform and the Canny edge detector. Normalization typically uses Daugman's rubber sheet model, and WED, Jaccard, Dice, Hamming, Euclidean, and Mahalanobis distances are used for template matching. Reported accuracies are generally in the range of 85-90%.

2.2 Face Biometrics Systems Face recognition uses frequency domain filters, contrast-limited adaptive histogram equalization (CLAHE), adaptive histogram equalization, and scaling. Face detection features can be extracted using LBPH, PCA, LDA, Eigenfaces, and Viola-Jones. Transfer learning models and CNNs are employed. The AR, VTU-BEC-DB, LFW, AT&T, ORL, PASCAL, YALE, FERET, UBIRIS, FDDB, UFDD, FRGCV2, and YouTube Faces datasets are available. SVM, cascade classifiers, Euclidean distance, KNN, fuzzy logic, and CNNs are used for classification.

2.3 Face-Iris Multimodal Biometrics Systems Dimensionality reduction methods include PCA and SRKDA, and KPCA is used to extract features. Matching uses Hamming, Euclidean, Mahalanobis, Jaccard, Dice coefficient, WED, and other distance measures. Feature-, score-, and decision-level fusion are most common. Table 1 summarizes the surveyed multimodal systems.

3 Methodology This section describes the multimodal iris-face biometrics system and its mechanism. Transfer learning "transfers" the expertise of a pre-trained neural network model to a new, related problem. Some hidden layers may be frozen during transfer training to accelerate computation. Many approaches exist to employ transfer learning; one can use a layer of the ConvNet model to extract image features and train an SVM classifier. Figure 2 shows the flow. Transfer learning fine-tunes a neural network with the target dataset and saves time compared to training from scratch. This paper uses transfer learning twice: in one case, the pre-trained model's transferability is tested; in the other, it is used to transfer knowledge and abilities to the new target task. The transferability of the pre-trained model on the VISA dataset is pre-tested using a trained SVM classifier. Choosing a suitable pre-trained model is key to applying transfer learning. The training dataset's features were extracted using ResNet-50's chosen feature layer and the pre-trained VGG16 model. Using 1200 training samples, a multi-class SVM classifier was trained (300 left eyes, 300 right eyes, and 600 faces). After


Table 1 Multimodal biometrics systems using iris-face traits

Zahidur Rahman et al. [2]
  Methodologies/feature extraction: Kernel Principal Component Analysis (KPCA)
  Fusion level: Score and feature-level fusion; sum, max, min, weighted sum rules
  Dataset used: FERET and CASIA V3
  Classifier and accuracy: Ant Lion-Convolutional Neural Network (AL-CNN), 97.7%

Ammour et al. [3]
  Methodologies/feature extraction: PCA, Libor Masek method, 1-D Log Gabor filter
  Fusion level: Score and decision-level fusion; weighted sum rule
  Dataset used: YALE and CASIA V1.0; ORL and CASIA
  Classifier and accuracy: Euclidean distance; YALE and CASIA V1.0 = 97.78%, ORL and CASIA V1.0 = 100%

Ammour et al. [4]
  Methodologies/feature extraction: 2-D Log Gabor filter, SRKDA, Viola-Jones algorithm
  Fusion level: Feature, score, and decision-level fusion; OR rule
  Dataset used: CASIA iris database
  Classifier and accuracy: Euclidean distance; EER is 0.24%

Bouzouina and Hamami [5]
  Methodologies/feature extraction: 2-D Log Gabor filter, SRKDA, Viola-Jones algorithm
  Fusion level: Score and decision-level fusion; sum, max, min, weighted sum rules
  Dataset used: CASIA iris database
  Classifier and accuracy: Euclidean distance; FAR = 0.06% at GAR = 99.5%

Ammour et al. [6]
  Methodologies/feature extraction: DCT, PCA, Gabor filter, Zernike moment, genetic algorithm
  Fusion level: Feature-level fusion
  Dataset used: CASIA-IrisV3-Interval
  Classifier and accuracy: Support Vector Machine (SVM), 98.8%

Eskandari and Toygar [7]
  Methodologies/feature extraction: 1-D Gabor wavelets, Hough, snake, distance regularized level set (DRLS), DCT, and PCA
  Fusion level: Score-level fusion; min-max rule
  Dataset used: ORL and CASIA-V3-Interval
  Classifier and accuracy: Hamming distance, 98%

Kagawade and Angadi [8]
  Methodologies/feature extraction: LBP, LDA, Mean-Variance Normalization (MVN)
  Fusion level: Score-level weighted sum-based rule; min-max normalization
  Dataset used: ORL, FERET, CASIA and UBIRIS
  Classifier and accuracy: Nearest neighbor classifier, 97.0%


Fig. 2 The fundamental idea behind transfer learning

retrieving test features from the test sample sets, the trained classifier's accuracy was tested. The SVM classifier's accuracy indicates how transferable the model is for fusion.

3.1 Iris Biometrics System The Iris recognition architecture based on the transfer learning approach is proposed and shown in Fig. 3. Figure 4 shows a few examples of Iris images from the VISA database. Iris Segmentation/Localization. Iris segmentation removes non-essential parts of an eye image, such as eyelids and eyelashes, for feature extraction. Locating the pupil and iris in an eye image is iris localization. The system will not detect the person if the iris is not found. Segmentation helps determine inner and exterior boundaries. Daugman’s Integro-Differential operator, active contour models, and Hough Transform [9] are segmentation approaches. The proposed method leverages Daugman’s Integro-Differential Operator (DIO) for its iris recognition performance

Fig. 3 The proposed architecture for iris recognition


Fig. 4 Sample Iris images of the VISA database from a right eye and b left eye

and the first derivative information to compute faster. Figure 5 shows a segmented iris input image and output. The DIO segments the arcs of the upper and lower eyelids, the circular iris, and the pupil regions. Daugman's operator searches the input image pixel by pixel, as defined in Eq. 1:

$\max_{(r, x_0, y_0)} \left| G_\sigma(r) * \frac{\partial}{\partial r} \oint_{(r, x_0, y_0)} \frac{I(x, y)}{2\pi r} \, ds \right|$   (1)

Fig. 5 a Input image and b iris segmentation


Fig. 6 Daugman’s rubber sheet model

where I(x, y) is the eye image, r is the radius, $G_\sigma(r)$ is a Gaussian smoothing function with scale $\sigma$, * denotes convolution, and s is the contour of the circle given by the parameters $(r, x_0, y_0)$; the operator searches iteratively for a maximum of this contour integral. It searches for the circular path by varying the circular contour's radius and centre x and y positions so as to maximise the change in pixel values, thereby determining the correct locations of the iris boundaries and eyelids [10]. Normalization: Normalization transforms the iris into a rectangle. It resizes segmented iris images so that features can be retrieved. Daugman's Rubber Sheet Model (shown in Fig. 6), Wilde's image registration technique, and virtual circles are common normalization methods. The rubber sheet model eliminates discrepancies between iris images. Daugman presented the rubber sheet model to normalize the segmented iris region by remapping all Cartesian points into a polar coordinate (r, θ) system; the non-concentric polar representation is normalized to a rectangular block of a specific size [11]. In the normalization by Daugman's Rubber Sheet Model, the iris region is remapped from Cartesian coordinates (x, y) to the polar coordinates (r, θ) as given by Eq. 2:

$I(x(r, \theta), y(r, \theta)) \rightarrow I(r, \theta)$   (2)

where

$x(r, \theta) = (1 - r)\,x_p(\theta) + r\,x_1(\theta)$   (3)

$y(r, \theta) = (1 - r)\,y_p(\theta) + r\,y_1(\theta)$   (4)

where I(x, y) is the iris image, (x, y) are the Cartesian coordinates, (r, θ) are the corresponding polar coordinates, and $x_p, y_p$ and $x_1, y_1$ are the pupil and iris boundary coordinates along the direction of angle θ, as given by Eqs. 3 and 4. The sample input iris image and the output produced by Daugman's rubber sheet model are shown in Fig. 7.
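A minimal NumPy sketch of the rubber-sheet remapping of Eqs. 2-4 is given below, assuming concentric pupil and iris circles for simplicity (the general model allows non-concentric boundaries); the output resolution is an assumption:

```python
import numpy as np

def rubber_sheet_normalize(image, pupil_center, pupil_radius, iris_radius,
                           radial_res=64, angular_res=256):
    """Remap the annular iris region to a fixed-size rectangular block in
    (r, theta) coordinates, following Daugman's rubber sheet model."""
    cx, cy = pupil_center
    out = np.zeros((radial_res, angular_res), dtype=image.dtype)
    for j, theta in enumerate(np.linspace(0, 2 * np.pi, angular_res, endpoint=False)):
        # Boundary points on the pupil (inner) and iris (outer) circles
        xp, yp = cx + pupil_radius * np.cos(theta), cy + pupil_radius * np.sin(theta)
        xi, yi = cx + iris_radius * np.cos(theta), cy + iris_radius * np.sin(theta)
        for i, r in enumerate(np.linspace(0, 1, radial_res)):
            # Eqs. 3 and 4: linear interpolation between the two boundaries
            x = (1 - r) * xp + r * xi
            y = (1 - r) * yp + r * yi
            yy = int(np.clip(round(y), 0, image.shape[0] - 1))
            xx = int(np.clip(round(x), 0, image.shape[1] - 1))
            out[i, j] = image[yy, xx]
    return out
```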


Fig. 7 a Input iris image and b rubber sheet model

Feature Extraction by ResNet-50 Model. In ResNet-50, the block multipliers indicate the number of blocks in each stage, and the input layer accepts a 3-channel picture. The model contains a variable-input convolution block and a constant-input identity block. Figure 8 shows the ResNet-50 model used in the experiment and the network's basic structure. ResNet-50 has 16 processing blocks built from two kinds of shortcut modules. The first is the identity block, whose shortcut path contains no convolutional layer (the input has the same dimension as the output). The second is the convolutional block, whose shortcut path has a convolutional layer (used when the input dimension differs from the output). These modules have layered bottleneck architectures (1 × 1, 3 × 3, 1 × 1 convolutional layers). The VISA iris data is divided into training and testing datasets. The ResNet-50 DCNN extracts features from the training dataset's iris pictures, and the extracted features are stored in a knowledge base. The knowledge base feature vectors train a multi-class SVM classifier. Query/

Fig. 8 ResNet architecture for proposed model


test photos are identified using a multi-class SVM classifier. The proposed system’s efficacy is evaluated using the confusion matrix created by the multi-class SVM classifier [12].
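As a rough illustration of this feature-extraction-plus-SVM pipeline, the following sketch uses torchvision and scikit-learn; the chosen layer (the 1000-d output), the SVM kernel, and the variable names are assumptions rather than the authors' exact setup:

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from sklearn.svm import SVC

# Pre-trained ResNet-50 used as a fixed feature extractor; its 1000-d output
# plays the role of the fc1000 feature vector mentioned later in the paper.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
backbone.eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_features(pil_images):
    """Return an (N, 1000) feature matrix for a list of PIL images."""
    batch = torch.stack([preprocess(img) for img in pil_images])
    return backbone(batch).numpy()

# Hypothetical training step on the VISA iris split:
# X_train = extract_features(train_images)
# svm = SVC(kernel="linear").fit(X_train, train_labels)   # multi-class SVM
# predictions = svm.predict(extract_features(test_images))
```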

3.2 Face Biometrics System Figure 9 depicts the proposed face recognition architecture based on transfer learning. Figure 10 shows a few examples of face images from the VISA database. Face Detection: The AdaBoost algorithm for face detection is effective because it has a low false positive rate, can recognize faces in real time, and is versatile [13, 14]. This method is appealing since it can recognize additional objects with minor changes. Figure 11 shows a segmented input image and output. The network receives an input image of fixed shape and size (224, 224, 3). The first two layers have 64 channels, 3 × 3 filters, and the same padding. After a max-pooling layer of stride (2, 2), the following two layers are convolution layers with 128 filters of size (3, 3). Then another max-pooling layer of stride (2, 2) is applied, followed by two convolution layers of filter size (3, 3) with 256 filters. After that, two sets of three convolution layers follow, each with 512 filters of the same size and padding. The image is then passed to a two-layer convolution stack. The convolution and max-pooling filters are 3 × 3, and some layers use 1 × 1 convolutions to alter the number of input channels. One pixel of padding is applied after each convolution layer to preserve the image's spatial information. Figure 12 depicts the training-and-testing model. Facial Feature Extraction by VGGNet-16 Model: The VISA face data is divided into training and testing datasets. Features are extracted from the training dataset's face images using the VGGNet-16 DCNN, and the extracted features are stored in a knowledge base. The knowledge base feature vectors train a multi-class SVM classifier. Query/test photos are identified

Fig. 9 The proposed architecture for face recognition


Fig. 10 Sample face images of the VISA database

Fig. 11 a Input image and b face detection

using a multi-class SVM classifier. The proposed system's effectiveness is evaluated using the confusion matrix created by the multi-class SVM classifier when predicting persons.


Fig. 12 VGGNet architecture for proposed model

3.3 Multimodal Biometrics System Using Face and Iris The features extracted from the face and iris are fused at the feature level using the average rule, and score-level fusion is performed using the maximum rule to improve recognition accuracy. Feature-level fusion: Figures 8 and 12 show the pre-trained ResNet-50 and VGGNet-16 models. ResNet-50 employs 7 × 7 kernels in its first convolutional layer to filter images for global context, whereas VGGNet-16 uses 3 × 3 convolution kernels to filter images and acquire local, high-resolution information. Because both networks are good at learning new properties and generalizing them, we combine them to obtain both global and local information; this improves performance by better distinguishing individuals and strengthening the system. In a fully convolutional network, fc1000 corresponds to a 224 × 224 input window, and candidate region features can be extracted with one forward computation. Due to the 224 × 224 network input, we can detect a person in a 28 × 28 area. The face and iris deep convolution features are classified using a multi-class SVM [15]. Figure 13 shows how the iris and face features are merged using the average rule and classified using a multi-class SVM classifier. The features extracted from the face and iris traits are fused by the average method using Eq. 5:

$A_F = (I_F + F_F)/2$   (5)

Fig. 13 Feature-level fusion: iris features IF = {I1, I2, ..., I1000} and face features FF = {F1, F2, ..., F1000} are combined into an average feature vector
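The average-rule fusion of Eq. 5 amounts to a single element-wise operation; a minimal sketch (the vector names are assumptions):

```python
import numpy as np

def fuse_features(iris_features, face_features):
    """Feature-level fusion by the average rule (Eq. 5): both inputs are 1000-d
    vectors, from ResNet-50 (iris) and VGGNet-16 (face) respectively."""
    return (np.asarray(iris_features) + np.asarray(face_features)) / 2.0
```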


The proposed hybrid multimodal biometrics system is shown in Fig. 14. Score-Level Fusion: Multi-biometrics systems often use score-level fusion. This approach first computes the recognition scores for each unimodal system; the recognition scores are then combined into a multimodal decision to improve system performance. Figure 15 shows how the scores from all three modal systems are merged to improve the system's overall performance [16]. The multi-class SVM generates the iris and face score vectors, the fusion rule is applied to the generated score vectors, and the decision values are sorted by label. Figure 19 shows the decision score.

Fig. 14 Proposed hybrid multimodal biometrics system: unimodal face and iris pipelines (VGGNet-16 and ResNet-50 features with SVM classifiers), feature-level fusion with the average rule, and score-level fusion with the maximum rule leading to the final prediction

Fig. 15 Score-level fusion using the maximum rule: iris scores IS1-IS30 and face scores FS1-FS30 are combined as max(ISi, FSi) to identify the person


First, the face, iris, and feature-fused classification score vectors are computed independently and normalized by min-max normalization according to Eq. 6. The second phase combines the iris, face, and fused-feature scores using Eq. 7, and a decision is made using the maximum score of the fused system:

$SC_i = \frac{SC_i - minsc_i}{maxsc_i - minsc_i}$   (6)

$F_{score} = \max(SC_{I_i}, SC_{F_i}, SC_{Fu_i})$   (7)

where $SC_i$ represents the normalized score of sample i for the face, iris, or feature-fused system, and $minsc_i$ and $maxsc_i$ are the minimum and maximum values in the score vector of sample i. $SC_{Fu_i}$, $SC_{F_i}$, and $SC_{I_i}$ are the score values of the feature-fused, face, and iris systems for sample i, respectively.
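A minimal sketch of Eqs. 6 and 7 follows; the array shapes and the arg-max decision are illustrative assumptions:

```python
import numpy as np

def min_max_normalize(scores):
    """Normalize a score vector to [0, 1] (Eq. 6)."""
    scores = np.asarray(scores, dtype=float)
    return (scores - scores.min()) / (scores.max() - scores.min())

def score_level_fusion(iris_scores, face_scores, fused_feature_scores):
    """Combine the three normalized score vectors with the maximum rule (Eq. 7)
    and return the index of the predicted identity plus the fused scores."""
    fused = np.maximum.reduce([
        min_max_normalize(iris_scores),
        min_max_normalize(face_scores),
        min_max_normalize(fused_feature_scores),
    ])
    return int(np.argmax(fused)), fused
```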

4 Experimental Results and Analysis The VISA database is used for experimentation. The VISA database was established with a simple image acquisition setup and biometric sensors, and it contains both iris and face pictures. Each person has three right-eye and three left-eye photos in the database; the subjects range in age from 10 to 90, with 67 men and 23 women. The 640 × 480 JPEG images were taken indoors and outdoors. Figure 16 shows iris database images and Fig. 17 shows the face database. The experiments use 100 VISA records [8]. A confusion matrix is used to evaluate system performance: it shows the total observations in each cell, with rows indicating the true class and columns the predicted class. Diagonal cells correspond to correctly classified observations, while off-diagonal cells correspond to misclassified ones:

$\text{Specificity} = \frac{TN}{TN + FP}$   (8)

$\text{Sensitivity} = \frac{TP}{TP + FN}$   (9)

$\text{Precision} = \frac{TP}{TP + FP}$   (10)

$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$   (11)
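A small sketch of how these per-class metrics can be computed from a confusion matrix (treating each class one-vs-rest is an assumption):

```python
import numpy as np

def metrics_from_confusion(cm):
    """Per-class specificity, sensitivity, precision, and accuracy (Eqs. 8-11)
    from a square confusion matrix `cm` (rows = true class, columns = predicted)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    tn = cm.sum() - (tp + fp + fn)
    return {
        "specificity": tn / (tn + fp),
        "sensitivity": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }
```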

The proposed system is evaluated on the VISA dataset to assess the classification precision obtained from the test and training data at the "Softmax" layer, and these features are then classified at the FC layer using a separate classification model (SVM). All metrics


Fig. 16 Sample iris images of the VISA database

(Sensitivity, Accuracy, Specificity, and Precision) were evaluated using Eqs. 8-11. Four experiments were used to evaluate recognition performance: iris, face, feature-level fusion, and score-level fusion. The design used 80% of the database for training and 20% for testing (i.e., each person has ten images, eight for training and two for testing). This experiment uses 480 training images and 120 test images. The training images are further divided into 336 training shots (seven per person) and 144 validation photos (70% training, 10% validation; one image per person). The recognition rates for feature-level and score-level fusion are 93.334% and 96.7354%, respectively. The receiver operating characteristic (ROC) curve illustrates the classification model's performance at each level; Fig. 18 displays the true positive and false positive rates. Figure 19 compares iris, face, and multimodal biometrics. The performance comparison of unimodal and multimodal biometrics systems is shown in Table 2. A total of 100 persons are considered, with six images of every person for iris and face. The accuracy achieved for iris recognition is 86.181% and for face recognition 89.938%, while the accuracy for feature-level and score-level fusion is 93.334% and 96.735%, respectively.


Fig. 17 Sample face images of the VISA database without pre-processing

Fig. 18 ROC curve for iris, face, feature, and score-based hybrid methods


Fig. 19 Performance comparison of iris, face, and multimodal biometrics systems (accuracy: Iris 86.181%, Face 89.938%, Feature-level fusion 93.334%, Score-level fusion 96.735%)

Table 2 Performance comparison of unimodal and multimodal biometrics systems

Modality     Traits        Fusion level    Accuracy (%)   Sensitivity (%)   Specificity (%)   Precision (%)
Unimodal     Iris          -               86.181         84.53             88.57             86.35
Unimodal     Face          -               89.938         84.53             88.57             86.35
Multimodal   Iris + Face   Feature level   93.334         90.38             92.65             96.65
Multimodal   Iris + Face   Score level     96.735         95.49             94.35             97.75

5 Conclusion DCNNs are becoming key in many fields because they process data automatically: they acquire features on their own, without manual feature extraction. This article has shown iris feature extraction and classification. Biometrics for user authentication continues to evolve; face biometrics simplify online identity verification, and iris patterns are distinctive and stable. Recognition improves when face and iris data are combined. A multi-class SVM classified images using VGG16 and ResNet-50 features, and the average and maximum rules were used to combine features and scores. The experiment used a 2.50 GHz (up to 2.70 GHz) Intel Core i5-7200U CPU and 8 GB RAM. Score-level fusion is faster than feature-level fusion (0.30 vs. 0.62 s), and score-level fusion with the maximum fusion rule is more accurate than feature-level fusion. The proposed techniques achieve 93.334% accuracy with feature fusion and 96.7355% with score fusion. The stated approaches can also be applied to other datasets.

References 1. Hatture SM, Kulkarni VV (2022) Face and ıris based multimodal biometric systems: ıssueschallenges and open-research problems. J Emerg Technol Innov Res (JETIR) 9(6):32–42 2. Zahidur Rahman Md, Hasan Hafizur Md, Rahman, Mujibur Rahman Majumdar Md (2019) Distinguishing a person by face and iris using fusion approach. In: International conference on sustainable technologies for ındustry 4.0 (STI) (Dhaka), 24–25 December 2019


3. Ammour B, Bouden T, Boubchir L (2018) Face-ıris multimodal biometric system based on hybrid level fusion. In: 2018 41st ınternational conference on telecommunications and signal processing (TSP), pp 1–5 4. Ammour B, Bouden T, Boubchir L (2018) Face-ıris multimodal biometric system using multiresolution log-gabor filter with spectral regression kernel discriminant analysis. The Institution of Engineering and Technology IET Biometrics 7(5):482–489 5. Bouzouina Y, Hamami L (2017) Multimodal biometric: ıris and face recognition based on feature selection of iris with GA and scores level fusion with SVM. In: 2017 2nd ınternational conference on bio-engineering for smart technologies (BioSMART), pp 1–7 6. Ammour B, Bouden T, Amira-Bird S (2017) Multimodal biometric identification system based on the face and iris. In: 2017 5th ınternational conference on electrical engineering—boumerdes (ICEE-B), pp 1–6 7. Eskandari M, Toygar Ö (2014) Fusion of the face and iris biometrics using local and global feature extraction methods. Signal Image Video Process 8 8. Kagawade VC, Angadi SA (2021) VISA: a multimodal database of face and ıris traits. multimedia tools and applications 9. Patil M, Gowda S (2017) An approach for secure ıdentification and authentication for biometrics using ıris. In: International conference on current trends in computer, electrical, electronics and communication (CTCEEC), pp 421–424 10. Angadi SA, Kagawade VC (2017) Iris recognition: a symbolic data modeling approach using Savitzky-Golay filter energy features. In: International conference on smart technologies for smart nation (SmartTechCon), pp 334–339 11. Wang Z, Li C, Shao H, Sun J (2015) Eye recognition with mixed convolutional and residual network (MiCoRe-Net). IEEE Access 14(8) 12. Hatture SM, Kulkarni VV (2022) Iris biometric based person ıdentification using deep learning technique. Int Res J Eng Technol (IRJET) 09(07):1786–1790 13. Angadi S, Hatture S (2019) Face recognition through symbolic modeling of face graphs and texture. Int J Pattern Recognit Artif Intell 14. Angadi SA, Kagawade VC (2018) Face recognition through symbolic data modeling of the local directional gradient. Springer Nature Singapore Pte Ltd.; Shetty NR et al (eds) Emerging research in computing, ınformation 15. Al-zanganawi W, Kurnaz S (2020) Human biometrics detection and recognition system using SVM and genetic algorithm ıris as an example. In: 4th ınternational symposium on multidisciplinary studies and ınnovative technologies (ISMSIT), pp 1–4 16. Kaur P, Sood K (2021) Facial-iris automatic multimodal biometric identification system usingCNN method. J Emerg Technol Innov Res (JETIR) 8(4)

A Survey on the Detection of Diseases in Plants Using the Computer Vision-Based Model Sowbhagya Takappa Pujeri and M. T. Somashekara

1 Introduction Agriculture has evolved into much more than just feeding the ever-growing population [1]. Rapid population growth has resulted in high demand for agricultural yields, and the agriculture sector is struggling to cater to this ever-growing demand. Due to irregular rainfall, drought, or other climatic conditions, agricultural crops frequently suffer from a variety of plant diseases. These diseases affect not only the production quantity but also the quality of the crop yield. This results in catastrophic or chronic losses, because of which Indian farmers struggle to recover from bank loans and, in extreme cases, attempt suicide. The traditional way of manually inspecting visual quality cannot be systematically defined, as it is often unpredictable and inconsistent. Identifying diseases in the field of agriculture is a booming area of research. Detection of diseases at an early stage helps farmers reduce the usage of pesticides, and the extent to which a disease has spread can be determined using image segmentation. Image segmentation is the process of splitting an image into different segments based on similarity [1]. There are various techniques to detect pathologies; a few diseases do not show any traits, or become visible only when it is too late to take action [2]. In such situations, disease detection and classification of agricultural yields can be done using sophisticated machine-based approaches. This paper reviews existing reported techniques that are useful in the detection of diseases [3].



2 Key Issues and Challenges in the Field of Disease Analysis Diseases are identified automatically by considering input image data from different input sources. In this survey paper, various research papers are taken into consideration for the identification and classification of leaf diseases, and the following key issues are discussed:
- Quality of the images taken
- Background of the images captured
- Light conditions at which the image was captured
- Large datasets need to be gathered
- Noise has to be removed from images
- Image segmentation has to be done to spot the disease on leaves
- More than one disease affecting a leaf
- Trained data images have to be prepared
- Classification of healthy and unhealthy leaf images
- Different parameters such as size, texture and colour of the leaves have to be taken into consideration
- Periodical observations have to be carried out
- Appropriate disease identification
Image processing with machine learning techniques that can detect the diseases accurately is suggested in the reviewed papers.

3 Literature Survey The authors Anilkumar M. G., Karibasaveshwara T. G., Pavan H. K. and Sainath Urankar D. focus on the early detection of diseases such as mahali roga, stem bleeding and yellow spots on the leaves, and provide remedies for the same using a Convolutional Neural Network. Experimentation was conducted on a captured dataset of 620 images of affected and healthy leaves. The steps followed are image pre-processing, feature extraction, training and classification. The overall accuracy obtained is estimated to be 88.46% [4].

The authors S. Siddesha, S. K. Niranjan and V. N. Manjunath Aradhya use colour histograms and colour moments as features with a K-NN classifier. The experiment was carried out on a dataset comprising 800 images from 4 classes, using 2 colour features and 4 distance measures with K-NN. A classification accuracy of 98.13% is achieved for 20 per cent training with K = 3 and the Euclidean distance measure on colour histogram features [5].

The authors Gittaly Dhingra, Vinay Kumar and Hem Datt Joshi describe an agricultural application which makes use of computer vision-based models in order to identify and classify plant leaf disease. This study considers the major diseases affecting plants [6].


The authors Shitala Prasad, Sateesh K. Peddoju and Debashis Ghosh describe a mobile-based client-server system design. To detect leaf disease, the Gabor Wavelet Transform (GWT) is used. Once the leaf is captured, pre-processing is done on the mobile device. Leaf image analysis was done using the k-means unsupervised algorithm, and feature extraction using the GW transform. The authors used their own created dataset, and propose that in future, leaves captured against complex backgrounds under different light conditions can be processed efficiently [7].

The authors Shanwen Zhang, Zhuhong You and Xiaowei Wu describe the segmentation of leaves, which helps in feature extraction. Neighbouring pixels are grouped into homogeneous regions based on particular features such as brightness, texture and colour, and superpixel clustering is used to achieve this segmentation. Further, the authors suggest that the Expectation Maximization (EM) algorithm is probably a good approach for colour segmentation [8].

The author Dhanuja K. C. (2020) describes the use of the K-Nearest Neighbour (KNN) algorithm to find out the diseases in areca nut. The texture factor was considered for the grading of the areca nuts. For training and testing the model, a total of 144 areca nut samples were used, which included 49 positive samples, 46 poor samples and 49 unhealthy samples [9].

The authors Ajit Danti and Suresh propose a technique for segmenting and classifying raw areca nuts. In this paper, the novel method proposed for the classification of areca nut into two classes is based on the colours red and green [10].

The authors Manpreet Sandhu and Pratik Hadawale have developed an automated system for detecting traits using leaf image classification. Here, the images are automatically captured with the help of an unmanned aerial vehicle (UAV) equipped with a camera, and the texture factor is considered. The presence of spots and rotting areas in the leaf is automatically detected by using a machine learning algorithm [11].

The authors Patil Supriya, Mule Gitanjali and Labade Vidya describe the identification of diseases in pomegranate and also suggest solutions for this condition. The proposed system includes pre-processing of images, segmentation, feature extraction from the image and classification. Images are resized during early pre-processing, colour segmentation is carried out, colour morphology and textures (Gabor filter) are used for extracting the features, and a minimum distance classifier is used [12].

The authors Swathy Ann Sam et al. have used algorithms such as SVM, KNN, CNN and decision trees for the identification of traits in leaves. In this work, a captured image is uploaded to the developed system; the algorithm tests the uploaded image sample, checks whether it is affected by any disease, and prints the identified disease (86%) [13].

The authors Keyvan Asefpour Vakilian and Jafar Massah, in the paper "An Artificial Neural Network Approach to Identify Fungal Diseases of Cucumber (Cucumis sativus L.) Plants Using Digital Image Processing", describe the usage of an Artificial Neural Network (ANN) model with three layers to detect 2 different types of fungi in cucumber plant leaves [14].

The authors Weizheng, S., Yachun, W., Zhanliang, C., and Hongda, W. describe the segmentation of an image on the leaf region. The Otsu method is used.


HSI colour system, the H component was used to segment diseased spots and to minimize the disturbance caused by illumination changes and leaf veins. The Sobel operator was then used to segment and examine the affected spots, and finally the diseases were graded by calculating the ratio of diseased spot area to leaf area. The researchers suggest that this technique for grading plant leaf spot diseases is fast and accurate [15]. The authors Mohammed Brahimi, Kamel Boukhalfa and Abdelouahab Moussaoui, in the paper "Deep Learning for Tomato Diseases: Classification and Symptoms Visualization", use a deep learning approach in which a Convolutional Neural Network (CNN) performs disease classification. Pre-trained models are used in the experiments, and occlusion techniques are used to localize the diseased regions, which helps humans understand the diseases. The datasets are taken from Goodfellow, Bengio, etc. [16]. The author Kuo-Yi Huang considered 6 geometric features (area, perimeter, principal and secondary axis length, axis number and compactness of the areca nut image) and 3 colour features (the mean grey level of the areca nut image in the R, G and B bands), applying neural networks and image processing techniques to detect and classify the quality of areca nuts [17]. The authors H. Al-Hiary, S. Bani-Ahmad, M. Reyalat, M. Braik and Z. AlRahamneh, in the paper "Fast and Accurate Detection and Classification of Plant Diseases", group pixels into K classes based on a few features; when more than one disease has affected a leaf, more than one cluster is present. As a future enhancement, they note that further research is needed to increase detection accuracy [18]. The authors Anandhakrishnan M. G. et al. performed image pre-processing, noise removal, intensity normalization, feature extraction, reflection removal and masking of image portions. The refined images were then used to train a deep CNN model for image classification, with the TensorFlow library used for numerical computation; a dataset for training the neural network was collected manually from the field [19]. The authors Ashish Nage and V. R. Raut describe image processing for detecting diseases affecting plants. They developed an Android application that helps farmers identify plant disease by uploading a leaf image to the system; colour is the factor considered, and a Convolutional Neural Network algorithm is used [20] (Table 1).

4 Conclusion
In this paper, a brief glimpse of various image processing techniques such as pre-processing, noise removal, feature extraction, clustering and segmentation is presented.


Table 1 A comparative study

Sl. No. | Authors | Problem addressed | Techniques used | Factors considered | Dataset | Accuracy (%)
1 | Anilkumar et al. | Early detection of diseases such as mahali or kole roga, yellow leaf spot and stem bleeding | Convolutional Neural Network (CNN) | Yellow spots | Dataset of 620 diseased and healthy leaf images | 88.46
2 | Siddesha et al. | To detect disease | KNN classifier with Euclidean, Manhattan and Minkowski distance measures | Colour histogram and colour moments | 800 images of 4 different classes; 2 colour features and 4 distance measures with KNN | 98.13
3 | Gittaly Dhingra et al. | To identify and classify plant leaf disease | PNN classifier | Colour and texture | Images acquired from digital cameras | –
4 | Shitala Prasad et al. | To detect leaf disease | Gabor Wavelet Transform (GWT) and GLCM; KNN classifier | Texture, colour | Created own dataset | 93
5 | Shanwen Zhang et al. | To detect leaf diseases | Superpixel clustering, Expectation Maximization (EM) algorithm | Shape and colour | Images acquired from digital cameras | 85.7
6 | Dhanuja K. C. | To detect disease in areca nut | K-NN (K nearest neighbour) | Texture | 144 areca nut samples: 49 good, 46 poor and 49 negative samples | 92
7 | Ajit Danti et al. | Classification of areca nut | KNN classifier | Colour and texture | Images acquired from digital cameras | –
8 | Manpreet Sandhu et al. | Automated system for detecting disease using leaf image classification | Machine learning algorithms | Texture | Leaf images captured automatically by a UAV with camera | –
9 | Patil Supriya et al. | Detection of diseases in pomegranate | Minimum distance classifier for classification | Colour morphology and texture features | Images acquired from digital cameras | –
10 | Swathy Ann Sam et al. | Detection of disease | SVM, KNN, decision tree, CNN | Spots | Images acquired from digital cameras | 86
11 | Keyvan Asefpour Vakilian et al. | Identify fungal diseases of cucumber (Cucumis sativus L.) | Artificial Neural Network (ANN) | Textural | Images acquired from digital cameras | –
12 | Weizheng et al. | Detection of leaf spot diseases in plant leaves | Otsu method, HSI colour system; segmentation of disease spot regions using the Sobel operator | Colour | Images acquired from digital cameras | –
13 | Mohammed Brahimi et al. | Tomato disease detection | Convolutional Neural Network (CNN); occlusion techniques used to localize the disease regions | Spots, colour and blights | Datasets taken from Goodfellow, Bengio, etc. | 99.185
14 | Kuo-Yi Huang | Detecting and classifying the quality of areca nuts | Neural networks and image processing techniques | 6 geometric features (area, perimeter, principal axis length, secondary axis length, axis number, compactness) and 3 colour features (mean grey level on the R, G and B bands) | Images acquired from digital cameras | –
15 | Al-Hiary et al. | Detection and classification of plant diseases | CCM model and NN classifier | Colour and texture | Images taken from a digital camera | 94
16 | Anandhakrishnan et al. | Plant leaf disease | Deep learning and Convolutional Neural Network (CNN) | – | Dataset acquired manually from the field | –
17 | Ashish Nage et al. | Detection of plant disease by uploading a leaf image | Convolutional Neural Network (CNN) | Colour | Images acquired from digital cameras | –


This paper throws light on the approaches that have been used to identify the heterogeneous diseases affecting plant leaves, so as to reduce the burden on farmers, who would otherwise have to travel to seek expert advice. By carrying out a comparative study of all the above methods on the basis of various factors and parameters, a better method can be developed for detecting diseases that affect different parts of a plant, especially the leaves, in an efficient and effective way.


References

1. Shedthi BS, Shetty S, Siddappa M (2017) Implementation and comparison of K-means and fuzzy C-means algorithms for agricultural data. In: 2017 international conference on inventive communication and computational technologies (ICICCT), pp 105–108. https://doi.org/10.1109/ICICCT.2017.7975168
2. Barbedo J (2013) Digital image processing techniques for detecting, quantifying and classifying plant diseases. Springerplus 2:660. https://doi.org/10.1186/2193-1801-2-660
3. Joshi B, Kumar A, Kashyap S, Nagdi N, Vinayak S, Verma D (2021) Smart plant disease detection system. ISSN: 2456-2319
4. Anilkumar MG, Karibasaveshwara TG, Pavan HK, Sainath Urankar D (2021) Detection of diseases in arecanut using convolutional neural networks
5. Siddesha S, Niranjan SK, Manjunath Aradhya VN (2018) Color features and KNN in classification of raw arecanut images. In: 2018 second international conference on green computing and internet of things (ICGCIoT), pp 504–509. https://doi.org/10.1109/ICGCIoT.2018.8753075
6. Dhingra G, Kumar V, Joshi HD (2017) Study of digital image processing techniques for leaf disease detection and classification. Springer
7. Prasad S, Peddoju SK, Ghosh D (2015) Multi-resolution mobile vision system for plant leaf disease diagnosis. Springer, London, pp 379–388
8. Zhang S, You Z, Wu X (2017) Plant disease leaf image segmentation based on superpixel clustering and EM algorithm. Springer
9. Dhanuja KC (2020) Areca nut disease detection using image processing technology. Int J Eng Res V9. https://doi.org/10.17577/IJERTV9IS080352
10. Mallaiah S, Danti A, Narasimhamurthy S (2014) Classification of diseased arecanut based on texture features. Int J Comput Appl
11. Sandhu M, Hadawale P, Momin S, Khachane A (2020) Plant disease detection using ML and UAV. Int Res J Eng Technol V7
12. Bhange M, Hingoliwala HA (2015) Smart farming: pomegranate disease detection using image processing. Procedia Comput Sci 58:280–288. ISSN 1877-0509
13. Sam SA, Varghese SE, Murali P, John SJ, Pratap A (2020) Time saving malady expert system in plant leaf using CNN 13(3)
14. Vakilian KA, Massah J (2013) An artificial neural network approach to identify fungal diseases of cucumber (Cucumis sativus L.) plants using digital image processing 46(13):1580–1588. Taylor and Francis
15. Weizheng S, Yachun W, Zhanliang C, Hongda W (2008) Grading method of leaf spot disease based on image processing. In: 2008 international conference on computer science and software engineering, vol 6, pp 491–494
16. Brahimi M, Boukhalfa K, Moussaoui A (2017) Deep learning for tomato diseases: classification and symptoms visualization 31(4):299–315. Taylor & Francis
17. Huang K-Y (2012) Detection and classification of areca nuts with machine vision. Comput Math Appl 64(5):739–746. ISSN 0898-1221
18. Al-Hiary H, Bani-Ahmad S, Reyalat M, Braik M, AlRahamneh Z (2011) Fast and accurate detection and classification of plant diseases. Int J Comput Appl 17(1):31–38
19. Hanson AMGJ, Joy A, Francis J (2017) Plant leaf disease detection using deep learning and convolutional neural network. Int J Eng Sci Comput 7(3)
20. Nage A, Raut VR (2019) Detection and identification of plant leaf diseases based on python. Int J Eng Res Technol (IJERT) 08(05)

An Approach to Conserve Wildlife Habitat by Predicting Forest Fire Using Machine Learning Technique N. Bhavatarini, S. Santhosh, N. Balaji, and Deepa Kumari

1 Introduction
Forest fires are a source of concern because they cause significant damage to the environment, property, and human lives. As a result, it is critical to spot a fire early on. One of the many causes of forest fires is global warming, caused by an increase in the world's average temperature; lightning, thunderstorms, and human error are among the other causes. Wildfires in India have devastated a median of 1.2 million acres of forest each year. In India, temperatures have been warmer than typical in the first 3 months of the year, and March 2021 was the third warmest in the last 100 years. Physical models and mathematical models are among the tools available today for predicting the spread of fires. To specify and anticipate fire growth in many places, these models draw on a range of knowledge from forest fire simulations and laboratory trials. Simulation techniques have recently been employed to anticipate forest fires; however, they have run into certain challenges, such as input data accuracy and simulation tool execution time. Machine learning can be categorized into two types, namely, unsupervised and supervised. Supervised machine learning methods include regression, ANN, SVM, and decision trees. In unsupervised learning, the data attributes are not labelled; as a result, the algorithm must infer the labels.

N. Bhavatarini School of Computer Science and Engineering, REVA University, Bengaluru, India e-mail: [email protected] S. Santhosh (B) · N. Balaji · D. Kumari NMAM Institute of Technology, Nitte (Deemed to be University), Udupi, India e-mail: [email protected] D. Kumari e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 D. S. Guru et al. (eds.), Data Analytics and Learning, Lecture Notes in Networks and Systems 779, https://doi.org/10.1007/978-981-99-6346-1_5


The algorithm must figure out the structure of the data set as well as the relationships between the attributes.

2 Literature Survey
2.1 Supervised and Ensemble Machine Learning Algorithm
This work evaluates the data values in the dataset and predicts each feature's effect on forest fires. Several methods are used for tackling the classification and regression problems. A full year's dataset was collected for evaluation. Parameters such as temperature, wind speed, humidity, burnt forest region, and outside rain are considered for the test. In addition, the input data file includes X and Y axes, unique park coordinates, Fine Fuel Moisture Code (FFMC), Drought Code (DC), Duff Moisture Code (DMC), and Initial Spread Index (ISI). Figure 1 exhibits a sample (first 10 rows) of the UCI forest fire dataset. In this paper, two graphs are constructed: the variance of each feature with respect to fire and the variance of each feature with respect to each month. The variance calculated for rain is 0.09, which is close to 0; hence rain does not affect the model. From the input dataset, it was derived that forest fires occurred in 270 records and did not occur in 247. A bar graph displays the frequency of forest fires each month: fires were frequent in August and September, rare in May and April, and random in the other months. The normalization techniques used are Principal Component Analysis (PCA), feature (min–max) scaling, and label encoding. The classification rate is predicted by machine learning algorithms such as KNN, SVM, bagging, boosting, and random forest. When PCA was applied, logistic regression had the best classification rate of 68% among the machine learning techniques; when PCA was not applied, the gradient boosting mechanism had the highest classification rate of 68%.
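A minimal sketch of this pipeline — min–max scaling, an optional PCA step, and a comparison of classifiers on the UCI forest-fire data — is given below; the column names and the fire/no-fire labelling rule (burned area > 0) are assumptions for illustration, not the authors' code.

```python
# Illustrative sketch: scaling + PCA + classifier comparison on the UCI forest fires data.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier

df = pd.read_csv("forestfires.csv")                     # assumed UCI file name
X = df[["temp", "RH", "wind", "rain", "FFMC", "DMC", "DC", "ISI"]]
y = (df["area"] > 0).astype(int)                        # 1 = fire occurred (assumed rule)

X_scaled = MinMaxScaler().fit_transform(X)              # feature (min-max) scaling
X_pca = PCA(n_components=5).fit_transform(X_scaled)     # optional PCA step

X_train, X_test, y_train, y_test = train_test_split(X_pca, y, random_state=1)
for model in (LogisticRegression(max_iter=1000), GradientBoostingClassifier()):
    model.fit(X_train, y_train)
    print(type(model).__name__, model.score(X_test, y_test))
```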

Fig. 1 Sample dataset


2.2 Data Mining Techniques to Predict Forest Fires
This study aims to determine how to predict fires in Slovenia using various data mining techniques. It uses forest structure data from a GIS (geographical information system) and weather prediction models such as ALADIN and MODIS. The collected data is grouped into three categories: (i) GIS data, (ii) MODIS data, and (iii) ALADIN data. The GIS data captures the geographic area, including cities, highways, railways, forest lands, distance from the road, etc. MODIS and ALADIN are weather prediction models that capture temperature, evaporation, sun energy, humidity, transpiration, etc. The captured data were analyzed using data mining techniques such as random forest, logistic regression, decision trees, boosting ensemble methods, and bagging. Table 3 in Fig. 2 presents the results of the experiments performed with the various mining algorithms on the collected dataset. Accuracy, precision, and recall are the metrics considered for the evaluation, and kappa statistics are computed for each dataset. The results show that bagging of decision trees offers the best performance compared to all other mining mechanisms in terms of accuracy, precision, and kappa statistics for each dataset.

2.3 Data Mining Techniques to Forecast Smoulder Surface of Forest Fires
This work uses various available algorithms such as SVM and Random Forests. Four groups of input parameters are considered for the evaluation: weather attributes, spatial, temporal, and Fire Weather Index (FWI) components. Real-time data was collected from the northeast region of Portugal. The FWI is used for rating fire hazards and includes six components:

1. Fine Fuel Moisture Code
2. Duff Moisture Code
3. Initial Spread Index
4. Drought Code
5. Buildup Index
6. FWI

The proposed solution needs four direct weather inputs: temperature, rain, wind speed, and humidity. The solution is based on Support Vector Machines (SVM) and can detect small-scale fires. The approach shows that all weather conditions affect the model, with the outside temperature being the most important feature, followed by the cumulative rain.
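A sketch of the kind of SVM-based regression described above, using only the four direct weather inputs, might look as follows; the file name and the log transform of the burned-area target are assumptions made for the example.

```python
# Sketch of an SVM regression on the four direct weather inputs to estimate burned area.
import numpy as np
import pandas as pd
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

df = pd.read_csv("forestfires.csv")                 # assumed UCI file name
X = df[["temp", "RH", "wind", "rain"]]
y = np.log1p(df["area"])                            # compress the heavily skewed target (assumption)

svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=3.0, gamma="scale"))
scores = cross_val_score(svr, X, y, cv=5, scoring="neg_mean_absolute_error")
print("MAE (log scale):", -scores.mean())
```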


Fig. 2 Performances in terms of accuracy, precision, and kappa

2.4 Parallel SVM Based on MapReduce
The model uses a support vector machine for classification and regression. Computation and storage requirements grow as the number of training vectors increases, so parallel SVM is studied to increase computing speed. The large dataset is first handled using MapReduce, an efficient distributed evaluation model for large data mining problems. MapReduce is implemented in software tools such as Hadoop and Twister, which are open-source MapReduce frameworks. Iterative MapReduce processing is not supported by MapReduce in Hadoop, whereas Twister supports both iterative and non-iterative MapReduce and combine tasks. Many SVM implementations have been developed, such as libSVM, lightSVM, LS-SVM, and so on. LibSVM is taken as a significant and efficient


SVM implementation and is widely applied in practice due to its excellent properties. Using parallelization, the training samples are divided into subsections and each subsection is trained with a libSVM model. The non-support vectors are filtered out by the sub-SVMs, and the support vectors of each sub-SVM are taken as the input of the next layer of sub-SVMs. The global SVM model is obtained through iteration, which shows that parallel SVM along with MapReduce reduces the computation time.
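The cascade idea described above — train sub-SVMs on partitions and pass only their support vectors to the next layer — can be sketched on a single machine as follows; this illustrates the principle only and is not the MapReduce (Hadoop/Twister) implementation.

```python
# Single-machine sketch of the cascade/parallel SVM idea: each partition is trained
# separately, only its support vectors survive, and a final SVM is trained on the pool.
import numpy as np
from sklearn.svm import SVC

def cascade_svm(X, y, n_partitions=4, **svm_params):
    idx = np.array_split(np.random.permutation(len(X)), n_partitions)
    sv_X, sv_y = [], []
    for part in idx:                              # "map" step: train sub-SVMs
        sub = SVC(**svm_params).fit(X[part], y[part])
        sv_X.append(X[part][sub.support_])        # keep only the support vectors
        sv_y.append(y[part][sub.support_])
    X_sv, y_sv = np.vstack(sv_X), np.concatenate(sv_y)
    return SVC(**svm_params).fit(X_sv, y_sv)      # "reduce" step: global model

# Usage (X, y assumed to be a labelled numpy training set):
# model = cascade_svm(X, y, n_partitions=8, kernel="rbf", C=1.0)
```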

2.5 Prediction of Forest Fire Danger Using ANN and Logistic Regression
The Galician region of northwestern Spain is considered for the evaluation. Imagery produced by the MODIS model is used to estimate the Land Surface Temperature (LST). Since higher temperature is associated with lower humidity, which makes plants more likely to catch fire, the LST parameter is one of the most influential parameters. The dataset used for the evaluation considers eight-day land surface temperature, fire occurrence, and the year's fire history. ANN and logistic regression techniques are used to evaluate forest fire hazard from remote sensing data and collected fire history data. Remote sensing inputs are collected for Land Surface Temperature and the EVI (Enhanced Vegetation Index). Multiple combinations of input datasets are evaluated with the logistic regression technique; in the ANN, all varieties of variables are tested, and the results of the two methods are compared. The results indicate higher accuracy and recall for the ANN compared to logistic regression. This evaluation helps in producing fire danger maps that can help avert fires (Fig. 3).

Fig. 3 Architecture diagram for the proposed model


3 Implementation
We can either give manual input of the data or use the existing dataset retrieved from the popular website Kaggle. The user's dataset is saved in the system's database, and the stored data is pre-processed by the system. The model is created using a variety of machine learning methods and trained on the pre-processed data. Each model is evaluated, and the most accurate algorithm is chosen; the final model forecasts the outcomes. Using datasets on wild animals retrieved from the forest department gives a broad insight into how fires impact wildlife and the extinction of particular animals. The output is in the form of graphs: scatterplots, correlation graphs, and distribution graphs. We use various algorithms to obtain the appropriate result.
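A minimal sketch of this workflow — load the dataset, pre-process it, train several candidate models, and keep the most accurate one — is shown below; the file name, feature columns, and candidate models are assumptions for illustration.

```python
# Sketch: train several candidate models and select the most accurate.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

df = pd.read_csv("forestfires.csv").dropna()        # assumed dataset file
X = StandardScaler().fit_transform(df[["temp", "RH", "wind", "rain"]])
y = (df["area"] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

candidates = {"SVM": SVC(), "DecisionTree": DecisionTreeClassifier(),
              "RandomForest": RandomForestClassifier(n_estimators=200)}
scores = {name: m.fit(X_train, y_train).score(X_test, y_test)
          for name, m in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "-> selected:", best)
```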

3.1 SVM Algorithm
This method generates a decision boundary that splits n-dimensional space into classes. These classes help us to categorize new data points immediately in the future. Using this algorithm, a specific line can be drawn between the burning and non-burning forest areas, which helps in stopping the fire from spreading (Fig. 4).

Fig. 4 SVM


3.2 Decision Tree
Multiple-variable analysis is done using decision trees. They enable the prediction, explanation, description, and classification of an outcome. The algorithm divides the dataset into smaller, increasingly homogeneous subsets, and it is used to classify and determine the link between the various input parameters.

3.3 Random Forest
For classification and regression, random forest employs an ensemble learning paradigm. In a random forest, the trees are run in parallel. From the original data, it generates n bootstrap samples. When growing each tree, it uses bagging and random feature selection to decorrelate the trees, so that the ensemble prediction of the forest is more accurate than that of any single tree. From those candidate variables, it then selects the optimal split (Fig. 5).

4 Results
The result is in the form of a graph indicating the heat map of a particular region, which can be cross-referenced to an exact pinpoint in a forest area. Figure 6 shows the density of the forest area with attributes such as wind, temperature, and rain. Figure 7 gives insights into the forest area and determines the exact location of the burnt area.

Fig. 5 Random forest


Fig. 6 Density of the forest area with attributes

Fig. 7 Exact locations of the burnt area

5 Conclusion
This paper presents a forest fire prediction mechanism, and also how fires affect wildlife, based only on meteorological data and forest department data. This research looks into the elements that cause fires to occur more frequently. Meteorological variables (temperature, relative humidity, and wind speed) are considered; extreme temperatures, moderate humidity, and strong wind speeds all contribute to an increased risk of burning. Because a substantial number of animal species perish, and occasionally become extinct, as a result of forest fires, forest department census data can be used to anticipate which species may be endangered. This model's procedure may be as follows: the user enters the location and zip code. We then acquire latitude and longitude


using an API, and use the coordinates as parameters to retrieve the weather conditions for a given day, such as maximum temperature, minimum temperature, humidity, wind speed, and so on. The output is in the form of graphs: scatterplots, correlation graphs, and distribution graphs.

References

1. Anshori M, Alauddin MW, Mahmudy WF, Mari F (2019) Prediction of forest fire using neural network based on extreme learning machines (ELM). IEEE Xplore. ISBN: 978-1-7281-3880-0
2. Wijayanto AK, Sani O, Kartika ND, Herdiyeni Y (2017) Classification model for forest fire hotspot occurrences prediction using ANFIS algorithm. In: IOP conference series: earth and environmental science, vol 54, p 012059
3. Shenoy A, Thillaiarasu N (2022) A survey on different computer vision based human activity recognition for surveillance applications. In: 6th International conference on computing methodologies and communication (ICCMC), Erode, India, pp 1372–1376. https://doi.org/10.1109/ICCMC53470.2022.9753931
4. Cortez P, Morais AJR (2021) A data mining approach to predict forest fires using meteorological data
5. Sun Z, Fox G (2005) Study on parallel SVM based on MapReduce. In: 2021 international conference, vol 2. IEEE, pp 1214–1217
6. Yang S, Lupascu M, Meel KS (2019) Predicting forest fire using remote sensing data and machine learning. Int J Recent Technol Eng (IJRTE) 8(2). ISSN: 2277-3878
7. De M, Labdhi L, Garg B (2020) Predicting forest fire with different data mining techniques. IJSDR. ISSN: 2455-2631
8. Suresh Babu V (2019) Developing forest fire danger index using geo-spatial techniques. Thesis, IIT Hyderabad. Report no: IIIT/TH/2019/21
9. Shenoy A, Suvarna S, Rajgopal KT (2023) Online digital cheque signature verification using deep learning approach. In: 2nd International conference on edge computing and applications (ICECAA), Namakkal, India, pp 866–871. https://doi.org/10.1109/ICECAA58104.2023.10212410
10. Rishickesh R, Shahina A, Khan AN (2019) Predicting forest fires using supervised and ensemble machine learning algorithms. Int J Recent Technol Eng (IJRTE) 8(2). ISSN: 2277-3878
11. Balaji N, Karthik BPH, Bhat B, Praveen B (2021) Data visualization in Splunk and Tableau: a case study demonstration. J Phys Conf Ser
12. Balaji N, Karthik Pai BH, Manjunath K, Venkatesh B, Bhavatarini N, Sreenidhi BK (2022) Cyberbullying in online/e-learning platforms based on social networks. In: World conference on smart trends in systems, security and sustainability—WorldS4. Springer Lecture Notes in Networks and Systems (LNNS)
13. Santhosh S, Shenoy A, Kumar S (2023) Machine learning based ideal job role fit and career recommendation system. In: 7th International conference on computing methodologies and communication (ICCMC), Erode, India, pp 64–67. https://doi.org/10.1109/ICCMC56507.2023.10084315

A Method to Detect Phishing Websites Using Distinctive URL Characteristics by Employing Machine Learning Technique Deepa Kumari, N. Bhavatarini, N. Balaji, and Prashanth Kumar

1 Introduction
Digitalization is happening at a fast pace; as a result, the number of internet users is increasing day by day and a strong dependency has been built between the internet and people. Consequently, many websites are created and it is very difficult to identify the legitimate ones. E-commerce is growing exponentially, and even small firms have their own websites. To reach the maximum number of customers, mobile applications are created for e-commerce websites. Customers are attracted by exciting offers and do not pay much attention to their data security. Taking advantage of this situation, the attacker exploits users by misusing their personal information. This social engineering cyber-attack is known as phishing. In phishing, the user is lured into giving personal information such as credit card and debit card details, personal identification numbers, passwords, usernames, bank details, card verification values, one-time passwords [1], etc. There are many ways a phisher can perform this attack. The most common are through emails and by replicating a legitimate website, that is, creating a website that appears the same as a trusted third-party website, using the same UI emblem, catchline, and legend. The fraudulent website may include click baits, which are spoofed
D. Kumari · N. Balaji (B) NMAM Institute of Technology, Nitte (Deemed to be University), Udupi, India e-mail: [email protected]
D. Kumari e-mail: [email protected]
N. Bhavatarini School of Computer Science and Engineering, REVA University, Bengaluru, India e-mail: [email protected]
P. Kumar Alvas Institute of Engineering and Technology, Moodabidri, India e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 D. S. Guru et al. (eds.), Data Analytics and Learning, Lecture Notes in Networks and Systems 779, https://doi.org/10.1007/978-981-99-6346-1_6


links; when clicked, the users are redirected to the attacker's server and a fake webpage that extracts the user's personal information using an SQL booster in a cloud domain [1]. Thus, one should know how to recognize genuine and fake websites to avoid falling for click baits. The technique introduced in this research work utilizes the uniform resource locator (URL) to identify non-legitimate websites. Uniform resource locators are used to locate any data or resources, and they are human-readable text strings. The proposed method uses the metadata in the URL to determine whether a website is legitimate, because a URL has typical characteristics that can differentiate between websites. First, some features that differentiate legitimate and non-legitimate websites are selected. Then a classifier is trained, cross-validated, and tested with those features. Random forest and SVM algorithms, which are well suited for this purpose, are used; the random forest also has the advantage that it does not over-fit the data. The proposed method has the advantage that there is no need to visit a website to know about its legitimacy: just by inspecting the URL, a conclusion can be made about the legitimacy of the website. Therefore this approach is very safe.

2 Literature Survey
Browser plugins have been used to detect and warn users about malicious pages [2]. A parameter called the spoof index is created; if the index value overshoots a certain limit, the website is considered non-legitimate. The disadvantage of this method is that it produces a high number of false positives. Many anti-virus tools have been created to detect phishing. Toolbars use static signatures to match patterns that are usually present in malicious scripts [3]. The main disadvantage of toolbars is that they only prompt or warn the user with a dialogue box, which users usually ignore, dismiss, or misinterpret. In [4], the logo of the website is used to recognize its legitimacy. An algorithm called SIFT (Scale Invariant Feature Transform) is used to generate key points for the images; these key points are resistant to image distortion, resizing, and rotation. Once the key points are generated, the distances between them are computed and stored in a repository, which is used to decide the legitimacy. The only issue with this technique is that it takes more time and also reports many false negatives. Blacklisting of URLs is mentioned in [5]. In this approach, URL entries that appear in the blacklist are denied access; that is, a user is not allowed to access web pages that appear on the blacklist. The main requirement and constraint of this approach is that the list needs continuous updating. In [6], the Google blacklist is used. The advantage is that it is updated spontaneously and continuously. It also maintains a white list containing a list of the


thousand most popular URLs. Filtering is done using logistic regression in this approach, and page ranking is also used to determine the legitimacy of a URL. In [7], a content-based solution called GoldPhish is proposed. An image of the current website is captured and converted into a machine-readable format with the help of OCR; this is sent as input to the Google search engine, and the page rank is analyzed to determine the legitimacy. CANTINA is a well-known content-based anti-phishing technique [8] that examines the keywords of a website to decide its legitimacy. Keywords are extracted as signature terms to perform a Google search; if the domain name of the query web page appears in the top n search results, the page is considered genuine. The limitation of this approach is that if the keyword extraction is not appropriate, the whole essence of the method is lost. Another content-based method is explored in [9]. In this method, words are extracted and weights are assigned to them based on their co-occurrence. The weights of the term frequency and inverse document frequency are summed and a WHOIS lookup is conducted; based on the domain name, the phishing website and the legitimate one are differentiated. Some phishing attacks include the injection of malware into the victim's system to gain complete control of the victim's platform. The malware is injected through a fake URL on which the victim clicks, and the file containing the malware is downloaded to the victim's system. The approach in [10] combines automatic identification of JavaScript code with anomaly detection. The drawback of this method is that only ten features are considered for classification, whereas there might be other attack classes that are not considered.
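As an illustration of CANTINA-style signature-term extraction [8], the sketch below ranks the visible terms of a page by TF-IDF against a small background corpus and returns the top terms that would form the search query; the corpus, the term count, and the helper name are assumptions, and issuing the actual search is outside the sketch.

```python
# Sketch of signature-term extraction in the spirit of CANTINA [8].
from sklearn.feature_extraction.text import TfidfVectorizer

def signature_terms(page_text, background_corpus, n_terms=5):
    vec = TfidfVectorizer(stop_words="english")
    tfidf = vec.fit_transform(background_corpus + [page_text])
    scores = tfidf[len(background_corpus)].toarray().ravel()   # row for the query page
    terms = vec.get_feature_names_out()
    top = scores.argsort()[::-1][:n_terms]
    return [terms[i] for i in top]

# query = " ".join(signature_terms(html_text, corpus))
# The page's domain is then checked against the top-n search results for that query.
```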

3 Existing System
Legitimacy is classified with reference to known examples of fake data, and classification techniques are implemented for the extraction of phishing data. URL, encryption, domain identity, and security attributes are used for the identification of spurious websites. A phishing website is detected when the user performs transactions through an online website, using a data mining algorithm. Secured transaction-processing applications are used by most e-commerce enterprises. The URLs of illegal or fake websites can be added by the admin, and the system scans for phishing websites using an algorithm. Using machine learning techniques, new keywords can be added to detect fake websites. In the existing systems, all the data related to websites is stored in one place, which is not effective; also, new fake websites cannot be detected if they are not blacklisted. Since phishing attacks and data hacking are increasing rapidly, there is a need to find a solution for recognizing phishing URLs. The unique characteristics of phishing websites are to be recognized before training the model of phishing


websites. To overcome the disadvantages of these existing systems, we propose an effective solution for secured access to websites.

4 Proposed System
To identify phishing website attacks, it is required to identify the features of illegal websites. Based on the unique features of phishing websites, the discrepancy between authorized websites and spurious web pages can be detected. Phishing website features are classified based on several factors such as page styles, source code, security and encryption, URLs, page contents, web address bar, domain identity, and the social human factor. The first step towards selecting features of phishing websites involves domain name features and URLs. These features are then used for attribute extraction, which focuses on several indicators such as a long URL address, redirection using the symbol '//', an IP address in the URL, and URLs containing the symbol '@'. Legitimate and phishing websites are distinguished by inspecting these features using several rules; this process is called feature extraction. After feature extraction, the features are used to train models with ML techniques and libraries. Several models are trained, the accuracy of each model is tested, and the ML model with the best accuracy is selected as the best fit for the proposed system. We tested the correctness of two different ML algorithms, the support vector machine (SVM) and the random forest (RF) classifier; among these, the random forest classifier is the most accurate. The advantage of the proposed method is that it detects illegal websites, including new ones that have not been blacklisted. scikit-learn algorithms are used for the data mining process. The different interfaces, modules, components, and architecture of the proposed system are covered by the design (Fig. 1). The entire architecture of the recommended system can be explained as follows:
1. Input URL—The URL that is to be checked is taken and tokenized.
2. Feature Extraction—Unique characteristics of phishing websites are detected from phishing URLs. The features given importance are explained as follows (a short sketch of the lexical checks is given after this list).
– Lexical Features—Textual features of the URL are known as lexical features. URLs are human-readable text strings and can be parsed in a standard way using client programs.
– Page-Based Features—Popularity features show the demand for a web page among internet users; malicious sites are usually less popular than legitimate ones. The traffic rank analysis is obtained from alexa.com.


Fig. 1 Block diagram of the proposed model

– Host-Based Features—Host-based features are based on the observation that malicious sites are consistently registered in disreputable hosting regions.
3. Evaluation—The most accurate machine learning algorithm is chosen based on its accuracy rate. The correctness of both the RF and SVM algorithms is computed, and the random forest algorithm performs better.
The following steps are performed when a URL is given: the URL is divided into tokens, words are identified, and the analysis of the URL is categorized into 3 classes.
1. Lexical features—Features are obtained by analyzing the URL itself, not the website appearance; statistical properties of the URL string are included.
2. Host-based features—The host name part of the URL is accessed to obtain host-based features.
3. Popularity-based features—Popularity-based features show users' interest in various web pages and help to analyze how frequently different internet users access a web page.
Once the features are extracted, they are written to a CSV file in which each column represents one of the analyzed features (Fig. 2).
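A minimal sketch (not the authors' implementation) of a few of the lexical checks listed above — URL length, the '@' symbol, '//' redirection beyond the scheme, and an IP address in place of a host name — is shown below; the thresholds are illustrative assumptions.

```python
# Sketch of a few lexical URL features; each result row could be written to the CSV.
import re
from urllib.parse import urlparse

def lexical_features(url):
    host = urlparse(url).netloc
    return {
        "url_length": len(url),
        "long_url": int(len(url) > 75),                    # assumed threshold
        "has_at_symbol": int("@" in url),
        "double_slash_redirect": int(url.rfind("//") > 7), # '//' past the scheme
        "ip_in_host": int(bool(re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}(:\d+)?", host))),
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
    }

print(lexical_features("http://192.168.10.5//secure-login@example.com/verify"))
```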



Fig. 2 Flowchart of attribute extrication

and 4. The best classifier selection process is to be done. correctness and exactness are to be obtained for evaluation. RF classifier shows the best fit with higher accuracy than support vector machine.

5 Results
The proposed system is tested for different URLs. A confusion matrix is created for both SVM and random forest. Precision, recall, and F1 score are calculated using the data represented in Figs. 3 and 4. From the graph in Fig. 5, which is plotted with the size of the URL and the total count of URLs as parameters, it is clear



Fig. 3 Confusion matrix obtained from RF

Fig. 4 Confusion matrix obtained from Support vector machine

that, by considering the size of the URL, legitimate and phished websites can be differentiated. Sample phished and legitimate outputs produced by the implemented module are shown in Figs. 5, 6 and 7. Figure 8 presents the scores obtained when the different algorithms are simulated using Python, and Fig. 9 shows a graphical representation of the different algorithms and their score values.
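The RF-versus-SVM comparison described above could be reproduced along these lines; the feature file and column names are assumptions for illustration, not the authors' code.

```python
# Sketch: fit RF and SVM on the extracted URL features and compare them.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix

df = pd.read_csv("url_features.csv")              # assumed output of the feature-extraction stage
X, y = df.drop(columns=["label"]), df["label"]    # label: 1 = phishing, 0 = legitimate (assumed)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=7)

for model in (RandomForestClassifier(n_estimators=300), SVC(kernel="rbf")):
    pred = model.fit(X_train, y_train).predict(X_test)
    print(type(model).__name__,
          "accuracy =", accuracy_score(y_test, pred),
          "\n", confusion_matrix(y_test, pred))
```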

Fig. 5 Graph represents the URL versus Number of URLs based on length

Fig. 6 Represents the output of an original website



Fig. 7 Represents the result of a phished website
Fig. 8 Represents rank obtained from various methods

Fig. 9 Represents rank obtained from various methods




6 Conclusion
Phishing is a cyber-attack in which the critical data of the victim is accessed. The phisher replicates an authorized website, or uses information about views and clicks, or spoofed links, to redirect the victim to an illegal web page where the sensitive data is extracted. Metadata in the URL helps detect a non-legitimate website. Data classification is performed using ML techniques such as SVM and RF. The proposed system aims to verify whether a website is an illegal website or not. As per the experimental analysis performed, the random forest-based classifier outperforms the support vector machine classifier, with a classification accuracy of 96%, while for SVM the classification accuracy is 90% on the given dataset of illegal sites.

References

1. Balaji N, Karthik Pai BH, Bhaskar Bhat B, Praveen B (2021) Data visualization in Splunk and Tableau: a case study demonstration. J Phys Conf Ser
2. Teraguchi N, Mitchell JC (2004) Client-side defense against web-based identity theft. Computer Science Department, Stanford University. http://cryptostanford.edu/SpoofGuard/webspoof.pdf
3. Kojm T (2007) Clam AntiVirus 0.97. Manual internet (2011). http://www.clamav.net/doc/latest/clamdoc.pdf
4. Wang G, Liu H, Becerra S, Wang K, Belongie SJ, Shacham H, Savage S (2011) Verilogo: proactive phishing detection via logo recognition. University of California, San Diego, Department of Computer Science and Engineering
5. Ahmed AA, Abdullah NA (2016) Real time detection of phishing websites. In: 2016 IEEE 7th annual information technology, electronics and mobile communication conference (IEMCON). IEEE, pp 1–6
6. Garera S, Provos N, Chew M, Rubin AD (2007) A framework for detection and measurement of phishing attacks. In: Proceedings of the 2007 ACM workshop on recurring malcode, pp 1–8
7. Dunlop M, Groat S, Shelly D (2010) GoldPhish: using images for content-based phishing analysis. In: 2010 fifth international conference on internet monitoring and protection. IEEE, pp 123–128
8. Zhang Y, Hong JI, Cranor LF (2007) Cantina: a content-based approach to detecting phishing web sites. In: Proceedings of the 16th international conference on world wide web, pp 639–648
9. Tan CL, Chiew KL (2014) Phishing website detection using URL-assisted brand name weighting system. In: 2014 international symposium on intelligent signal processing and communication systems (ISPACS). IEEE, pp 054–059
10. Cova M, Kruegel C, Vigna G (2010) Detection and analysis of drive-by-download attacks and malicious JavaScript code. In: Proceedings of the 19th international conference on world wide web, pp 281–290

Aquaculture Monitoring System: A Prescriptive Model Pushkar Bhat, M. D. Vasanth Pai, S. Shreesha, M. M. Manohara Pai , and Radhika M. Pai

1 Introduction Fish is one of the largest traded commodities in the world and is worth $362 billion globally, and millions depend on aquaculture industry for their livelihood. With over three billion people consuming fish, the demand for fish has pushed natural sources to their limits. Currently, over 85% of marine fish stocks are overfished. As a result, aquaculture is one of the fastest-growing industries in the world, especially in developing countries. Aquaculture is a potentially sustainable alternative to fishing; reduces the pressure on the wild-caught fishing industry, and further helps prevent over-exploitation and overfishing. However, huge costs and lower-than-expected yields dissuade people from farming fish. Lower yields are a result of unhealthy fish due to bad water quality. Dissolved Oxygen (D.O.) and the temperature of water play a key role in the growth of a fish. Fish have a range of temperatures within which they grow the best, and temperature out of this range forces the fishes to grow slower and not achieve maximum size. Fishes are cold-blooded, i.e., they do not regulate their body temperature. Hence, they are directly influenced by the temperature of the water. D.O. is a crucial factor in the growth of fish. A low level of D.O. leads to slower growth rates and smaller sizes. In extreme cases, it leads to mass mortality of fish. An increase in temperature leads to a decrease in dissolved oxygen. Thus, it is necessary to maintain the temperature constant within the required range. Knowing the current values of water quality parameters helps in understanding and assessing the current quality of water and taking necessary actions to ensure that the required level of quality is maintained. This information is not of much use unless it is instantly accessible to fish farmers. Currently, the fish farmers must travel by boat to the fish farms to collect the data. Making multiple daily trips P. Bhat · M. D. Vasanth Pai · S. Shreesha · M. M. Manohara Pai (B) · R. M. Pai Department of Information and Communication Technology, Manipal Institute of Technology, Manipal 576104, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 D. S. Guru et al. (eds.), Data Analytics and Learning, Lecture Notes in Networks and Systems 779, https://doi.org/10.1007/978-981-99-6346-1_7




to the fish farm is not feasible and becomes more difficult and dangerous during the monsoon season. In short, even though data is collected in real time, it is not accessible in real time. This inaccessibility is a major cause of fish farmers' inability to avoid preventable losses. Fish are incredibly sensitive to changes in their environment, and constant minute changes can have an impact on their health. It is possible to nullify these changes with timely measures. Prediction of water quality parameters helps prevent deterioration of water quality by indicating it beforehand and providing an opportunity to stop the deterioration. Using machine learning, it is possible to predict the values of the parameters. Unhealthy fish do not grow as big as healthy ones and grow at a slower rate; therefore, the fish farmer must assess the health of the fish. The size of the fish and its growth rate are excellent indicators of fish health and are currently used to assess it. However, current methods for determining the size of fish involve manually handling the fish, which is inefficient, time-consuming, and can disturb the fish. The contributions of the study are as follows:
• Analyzing the interrelation between the water quality parameters and their effect on fish, and identifying parameters important for the growth of fish.
• Prediction of water quality parameters, with rule-based anomaly detection of the predicted parameters.
• A non-contact method to estimate the size of fish using computer vision.
• A web-based application for easy access to current sensor readings, predicted values, and estimated fish size.
Figure 1 gives an overview of the proposed system. The organization of the paper hereafter is as follows. Section 2 gives a summary of the related works done to date. Section 3 gives the details of the methodology. Section 4 discusses the results of the study. Section 5 presents the conclusion of the study. References used in the study are listed at the end.

Fig. 1 Block diagram of proposed system



2 Related Work
Prediction of water quality parameters is a well-known application of machine learning techniques; a few common methods and their shortcomings are highlighted here. The importance of non-contact measurement of fish size is well understood, and the various methods developed for it are summarized in this section. Attempts have also been made to provide real-time access to water quality data using IoT, and an overview of these attempts is given as well. Several methods have been proposed for the prediction of water quality parameters [1–3, 5, 7, 10–13, 21, 23, 24]; variants of ANN [7, 10, 12, 13, 24], SVM [11, 23], and ANFIS [1, 2] are the most widely used. ANNs suffer from the vanishing gradient problem, making them less suitable for time series forecasting; SVMs are also not optimized for remembering the long-term dependencies that are crucial for time series forecasting. Most of the proposed models predict only one of the necessary parameters, usually D.O. [3, 5, 7, 23] or pH [1], but as mentioned above, the growth of fish depends on more than one factor; therefore, the prediction of all necessary parameters is essential. Even studies that have considered the necessary factors [21] have only proposed a predictive model. The use of stereo vision is a well-known technique for depth estimation [16, 18] and has been used to estimate the length of fish as well [4, 6, 17, 19, 20, 22]. The common method employed in these studies is to place two or more markers on the fish and sum the distances between the markers. This method gives an excellent estimate of the size of the fish and accounts for the bending of the fish to a certain extent. However, the process of identifying markers is tedious: the markers vary from fish to fish, necessitating manual marking of at least the head and tail. With advancements in technology, studies have suggested methods for the automatic identification of these markers using shape analysis; however, such methods have difficulty estimating the size of smaller fish. The use of contours and the convex hull to identify the marking points has also been explored, but this requires high-quality images and therefore higher-quality cameras and better lighting conditions. Both methods still require the fish to be at the optimal angle and perform poorly at imperfect angles. Several systems have been proposed for remote monitoring of aquaculture [8, 9, 15], consisting of sensors and a microcontroller, but none predicts future values of the parameters. Prediction of future values is important as it not only helps fish farmers make better decisions, but also provides an opportunity to take countermeasures to reduce fish kills due to deterioration of water quality. A real-time accessible aquaculture monitoring system with a prescriptive model that also estimates fish size has not yet been proposed. Such a system can help fish farmers avoid financial losses by enabling better decisions.



3 Methodology This section gives a detailed description of the methodology used in the paper. It gives the details of the setup used to collect data. It also provides insights into the preprocessing steps and their reasons. It also gives a brief idea of the theory involved in the study.

3.1 Experimental Setup and Data Collection
For the prediction of water quality parameters, data collected over four months from the backwaters of the river Swarna was used. The data consists of readings of water temperature (°C), air temperature (°C), pH, salinity (ppt), Dissolved Oxygen (DO) (%), air pressure (Pa), conductivity (µS/cm), relative humidity (%RH), and redox potential (mV). For estimation of the size of fish, a parallel stereo vision camera setup is used to collect video footage of fish kept in an aquarium. The stereo vision setup consists of two cameras with CMOS sensors fixed horizontally, with a distance of 8 cm between the two optical centres. The cameras have a 1920 × 1080 pixel resolution, a frame rate of 30 fps, a focal length of 2 mm, and a field of view of 70°.
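For the parallel stereo rig described above (8 cm baseline), depth can in principle be recovered from disparity using the standard relation Z = f·B/d; the sketch below assumes a calibrated setup with the focal length expressed in pixels, and the example numbers are purely illustrative.

```python
# Depth-from-disparity sketch for a rectified, parallel stereo pair (B = 8 cm).
BASELINE_M = 0.08          # 8 cm between optical centres

def depth_from_disparity(x_left, x_right, focal_px):
    """Z = f * B / d; focal_px comes from camera calibration (assumption here)."""
    disparity = x_left - x_right          # in pixels
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    return focal_px * BASELINE_M / disparity

# Example: the same point seen at x = 1040 px (left) and x = 1000 px (right)
# with an assumed focal length of 1100 px lies roughly 2.2 m from the cameras.
print(depth_from_disparity(1040, 1000, focal_px=1100))
```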

3.2 Prediction of Water Quality Parameters
Water temperature, dissolved oxygen (D.O.), pH, and salinity have a direct effect on aquatic life. Fish and other aquatic organisms need oxygen. The level of D.O. will determine the ability of ponds and other water bodies to support aquatic life. If D.O. falls to extremely low levels (…)

Comparative Analysis of Generic Outlier Detection Techniques

c(n) = 2H(n − 1) − 2(n − 1)/n, for n > 2   (4)

where the harmonic number H(i) is calculated as ln(i) plus Euler's constant. The anomaly score s(x, n) is to be understood as follows:
– Points are clearly anomalies if they return an s value near 1.
– Points can be safely regarded as regular instances if their s value is significantly lower than 0.5.
– Additionally, if every point returns an s value of roughly 0.5, the sample as a whole is devoid of any apparent anomalies.
Even though the approach is computationally efficient in typical IFOR, the model has a bias as a result of the contamination parameter and how the branching occurs. There is some variation in the anomaly scores in a graphical distribution as a result



of the drawing of vertical and horizontal lines to divide the plane. Additionally, the contamination parameter supplied when training the model affects the final anomaly score. Figures 6 and 7 show examples of the anomaly score distribution on a two-dimensional dataset for easy visualisation (Fig. 8).
Fig. 6 IFOR anomaly score distribution for a single cluster dataset

Fig. 7 IFOR anomaly score distribution for a double cluster dataset



Fig. 8 Anomalies detected by the IFOR algorithm for the multivariate dataset 2
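A minimal scikit-learn sketch of the Isolation Forest workflow discussed above is given below; the synthetic data and the contamination value (the assumed fraction of outliers, which is exactly the bias discussed in the text) are illustrative assumptions.

```python
# Sketch: fit an Isolation Forest and read off anomaly labels and scores.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(500, 2)),      # normal cluster
               rng.uniform(-6, 6, size=(15, 2))])    # scattered outliers

iso = IsolationForest(n_estimators=100, contamination=0.03, random_state=0).fit(X)
labels = iso.predict(X)            # -1 = anomaly, 1 = normal point
scores = -iso.score_samples(X)     # larger value = more anomalous

print("flagged anomalies:", int((labels == -1).sum()))
```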

3.3 Gaussian Mixture Model GMM
The Gaussian Mixture Model (GMM) approach uses a model to group points into a certain number of clusters and aids in the detection of outliers. It accomplishes this by finding the points that no cluster wants to claim for itself. Using Gaussian mixture models, each data point can be given a probability that it was generated by one of the various Gaussian distributions; because these probabilities are normalised, they can be interpreted as "Which cluster is most likely to be responsible for this data point?" [3]. GMM is used primarily for clustering and is a probabilistic model that represents a blend of various Gaussian distributions over the population data [5] (Fig. 9).

p(X) = Σ_{k=1}^{K} πk G(X | μk, Σk)   (5)


Equation (5) gives the probability of a given point X under the fitted Gaussian mixture (Tables 1, 2 and 3). The number of anomalous points detected by an algorithm does not directly indicate which algorithm is the most efficient; what matters is how accurately the anomaly scores are assigned to each point.
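A minimal sketch of GMM-based outlier detection in the sense of Eq. (5) — fit a mixture, score every point by its log-likelihood under the fitted density, and flag the least likely points — is shown below; the number of components and the 3% cut-off are illustrative assumptions.

```python
# Sketch: flag the points with the lowest likelihood under a fitted Gaussian mixture.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-3, 0.8, size=(400, 2)),
               rng.normal(+3, 0.8, size=(400, 2)),
               rng.uniform(-8, 8, size=(20, 2))])     # injected outliers

gmm = GaussianMixture(n_components=2, random_state=1).fit(X)
log_density = gmm.score_samples(X)                    # log p(X) for each point
threshold = np.percentile(log_density, 3)             # lowest 3% treated as anomalies
anomalies = X[log_density < threshold]
print("points flagged as anomalous:", len(anomalies))
```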

4 Conclusions
kNN is a very fast algorithm. It does not learn anything during a training phase, and no discriminative function is derived from the training data. To put it another way, it does not require any training: it only uses the training dataset it has



Fig. 9 Anomalies detected by the GMM algorithm for the two-dimensional dataset

Table 1 Dataset 1: multivariate, 380 rows × 9 columns

Algorithm | Anomalous points | Normal points
kNN | 16 | 364
IFOR | 19 | 361

Table 2 Dataset 2: multivariate, 1000 rows × 10 columns

Algorithm | Anomalous points | Normal points
kNN | 42 | 958
IFOR | 50 | 950

Table 3 Dataset 3: multivariate, 2000 rows × 10 columns

Algorithm | Anomalous points | Normal points
kNN | 83 | 1917
IFOR | 95 | 1905


stored when making real-time predictions. Because of this, the KNN algorithm is significantly faster than other training-based algorithms. The accuracy of the KNN algorithm is also unaffected by the addition of fresh data, because it does not require training before generating predictions. However, the main problem with kNN is that it does not work well with large datasets: performance suffers because it is extremely expensive to calculate the distance between each existing point and each new point. This applies to higher-dimensional data as well [6]; as the dataset becomes more multivariate, its performance degrades. If a dataset has some hidden, unobservable parameters, GMM should be utilised, because the GMM model can accommodate mixed membership, which is not possible with traditional models. Instead of just a flag, this method gives each point a likelihood that it belongs to a particular cluster. However, it requires many iterations to identify anomalies correctly.



A smaller sample size performs better with the Isolation Forest algorithm. Points that travel further into a tree require more cuts to isolate, so they are less likely to be anomalous; as the sample size grows, the complexity grows as a result. Since there are no distance or density measures used to detect abnormalities, the technique also requires less processing and memory, eliminating the significant computational cost of distance calculation found in all distance-based and density-based methods. However, the final anomaly score depends on the contamination parameter supplied when the model is trained. The contamination parameter controls the threshold of the decision function for whether a scored data point should be regarded as an outlier; it does not affect the model itself [7]. This means that, to make a more accurate prediction, we should know in advance what proportion of the data is aberrant. The model also has a bias because of the way the branching occurs.

References
1. Basora L, Olive X, Thomas D (2019) Recent advances in anomaly detection methods applied to aviation, pp 99–110
2. Basora L, Olive X, Thomas D (2019) Recent advances in anomaly detection methods applied to aviation, pp 4–6
3. Chandola V, Banerjee A, Kumar V (2009) Anomaly detection: a survey. ACM Comput Surv 41:1–58
4. TowardsDataScience article. https://towardsdatascience.com/understanding-anomaly-detection-in-python-using-gaussian-mixture-model-e26e5d06094b. Accessed 5 Jan 2020
5. Lavin A, Ahmad S (2015) Evaluating real-time anomaly detection algorithms—the numenta anomaly benchmark, pp 2–3
6. Miguel M, Álvarez-Carmona A (2021) Semi-supervised anomaly detection algorithms: a comparative summary and future research directions
7. Hansani Z (2017) Robust anomaly detection algorithms for real-time big data: comparison of algorithms

Bilingual Visual Script Proof Based on Pre-trained Clustering and Neural Network Sufola Das Chagas Silva E Araujo, V. S. Malemath, Uttam U. Deshpande, and Gaurang Patkar

1 Introduction 1.1 Scope and Purpose Script identification is a form of optical character recognition, which converts handwritten, typewritten, or printed text collected with the help of a scanner into machine-editable text [1]. This is a field of computer vision, machine learning, and AI. A large amount of work has been carried out in this field, and emphasis on recognizing the script or handwriting has become important. Optical character identification is a procedure that makes a network learn, comprehend, improve, and interpret written or printed characters in their own language, but presents them in the manner determined by the user [1, 2]. The image processing technology of optical character recognition is used to detect any character, whether produced on a computer or typewriter, printed, or handwritten. Much research has been conducted to produce accurate algorithms, keeping in mind execution time and precise results [2]. The purpose is to build an efficient system which inputs a digital picture, then pre-processes the images, and then extracts features by employing three clustering algorithms: K-means, Hierarchical, and Fuzzy C-means. The last stage provides a character prediction in percentage accuracy. S. Das Chagas Silva E Araujo (B) PCCE, Goa University, Verna, India e-mail: [email protected] V. S. Malemath · U. U. Deshpande VTU University, KLE, Belagavi, India e-mail: [email protected] G. Patkar Don Bosco College of Engineering, Margao, Goa, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 D. S. Guru et al. (eds.), Data Analytics and Learning, Lecture Notes in Networks and Systems 779, https://doi.org/10.1007/978-981-99-6346-1_11


1.2 Process Overview For character recognition, there are two acquisition modes, online and offline. In the first case, sensing devices such as tablets or touch screens allow the computer to record the trajectory of the pen tip. In the second case, character patterns may be printed or written on paper. Printed or written documents can be digitized as images through scanners or cameras and then undergo pre-processing, segmentation, and feature extraction; ultimately the handwritten text is classified to recognize the words. In the proposed system, the collected script images are resized and converted into greyscale images for pre-processing. A threshold is selected for the dataset, and the neural network is trained on the collected dataset to get accurate classes of representation. Different classifier algorithms which yield accurate results were integrated and compared. Feature vectors incorporating unique features were extracted from the images involved in training the network. This trained network then further recognizes the different classes of characters. Figure 1 shows a diagram of the hand-scripted character recognition architecture. The processing of the images in the proposed system for identification is described in Fig. 2.

Fig. 1 Hand-scripted character recognition architecture


Fig. 2 Proposed framework for processing of script proof

2 Description of Stages The various stages of processing the image data include pre-processing, region of interest extraction, uniqueness (feature) extraction, grouping, and finally correct identification and categorization [1, 3]. The dataset consists of different English and Devanagari handwritten characters stored as 28 × 28 pixel images; samples are shown in Table 1 [1, 3, 4].

2.1 Pre-processing Firstly, the image is imported from any supported graphics image file format [5]. Grey scaling: grey scaling is the process of transforming an image from another colour system, such as RGB, CMYK, or HSV, to shades of grey [5, 6]; each pixel then ranges from completely black to completely white. Dimension reduction: RGB pictures, for example, have three colour channels and three dimensions, whereas greyscale images have only one [6, 7]. Thresholding: to boost processing speed, the image is converted into a binary image by selecting a suitable threshold so that pixels above the threshold are set to 1 and everything below that value is set to 0. To account for fluctuations in the picture backdrop, the algorithms have progressed from global thresholding to local


Table 1 Script samples and their labels

Sample image | English character | Class label
(image) | A | 0
(image) | B | 1
(image) | C | 2
(image) | D | 3
(image) | E | 4
(image) | F | 5
(image) | G | 6

adaptive thresholding [8]. They now range from quite simple to rather complicated algorithms [9]. Global thresholding selects a single threshold value for the whole document picture, which is frequently based on an estimate of the background level from the image’s intensity histogram [9]. Adaptive (local) thresholding is a method for image processing in which various parts of the picture require different threshold settings. Figure 3 shows the original image and the image output after Adaptive (local) thresholding.
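A minimal sketch of the global-versus-adaptive idea using OpenCV is shown below; this is illustrative only, and the input file name, block size, and offset are assumptions rather than the authors' settings.

import cv2

# Load a script image and convert it to greyscale
img = cv2.imread("script_sample.png")  # assumed file name
grey = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Global thresholding: one threshold for the whole page (Otsu picks it from the intensity histogram)
_, global_bin = cv2.threshold(grey, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Adaptive (local) thresholding: the threshold varies with the local neighbourhood
adaptive_bin = cv2.adaptiveThreshold(grey, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                     cv2.THRESH_BINARY, 31, 10)  # block size 31, offset 10 (assumed)

cv2.imwrite("global.png", global_bin)
cv2.imwrite("adaptive.png", adaptive_bin)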


Fig. 3 Processed script: a original script; b adaptive threshold processed script


2.2 Image Segmentation In general, segmentation is the separation of components from a larger entity. As previously stated, an image for character identification may be a page that can be split into distinct lines, which can subsequently be segmented into words, individual characters, and strokes. K-means computes the centroids and iterates towards an ideal centre [10]. It is also known as a flat clustering algorithm. The steps are as follows: Step 1: Stipulate the number of groups K. Step 2: Initialize the centroids by randomly choosing K data points as centroids [11]. Step 3: Assign each data point to the closest cluster. Step 4: Recompute the centroids and repeat until there is no variation in the centroids. Hierarchical clustering is an unsupervised learning method used to collect unlabeled data into groups. Hierarchical cluster analysis (HCA) groups the data following a bottom-up approach: it considers each image as one cluster and then combines the closest classes into one grouping. This continues iteratively and stops when the whole dataset is grouped into a single cluster. The steps involved are: Step 1: Calculate the proximity matrix, containing the distance for each pair of patterns, and consider each shape as one cluster. Step 2: Find the most similar pair of shapes from the proximity matrix and merge the pair into one group. Step 3: The procedure ends when all scripts of a similar type fall in one group. Fuzzy C-means clustering assigns to each point a possibility score of belonging to a group [12]. The steps involved are: Step 1: Fix the number of groups c and choose a value of m between 1.25 …

… if pH > 5.5 && pH < 6.5 then print "Water can be used for agriculture" end if; if pH > 6.5 && pH ≤ 7 then print "Water can be used for drinking" end if; end while
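The classification rule in the fragment above can be read as a simple threshold check on the measured pH; the following is a minimal, illustrative Python rendering (the extra branches for aquaculture and acidic/alkaline water mentioned later in the results are not reproduced here because their thresholds are not given in this excerpt).

def classify_water(ph: float) -> str:
    # Thresholds taken from the algorithm fragment above
    if 5.5 < ph < 6.5:
        return "Water can be used for agriculture"
    if 6.5 < ph <= 7:
        return "Water can be used for drinking"
    return "Outside the ranges covered by this excerpt"

print(classify_water(6.8))  # -> Water can be used for drinking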

5 Results and Discussion To provide a reference on parameters for each water type, four water samples from various water sources were analyzed. Tap water, soapy water, cold water, and saltwater were the four water types selected. The four water samples were examined concurrently at room temperature. Readings were obtained at 1-hour intervals for a total of 12 hours. Our proposed system works in three parts, i.e., hardware, cloud database, and GUI application. Once the data is collected from the sensor to the cloud, we generate a data file, and that file needs to be added to the GUI interface to get


the proper state of the water quality. Once the file is uploaded to the Python script (K-means algorithm), the script iterates over the data file and converges to one value once the values remain constant after two iterations, as shown in Fig. 3. Once the file is uploaded and the pH value is set, the algorithm classifies the water as drinkable, usable for agriculture, usable for aquaculture, or acidic/alkaline, as shown in Fig. 4. The graph in Fig. 5 shows the reading frequency of the pH sensor. We could obtain the pH level and the temperature of the water at an accuracy rate of 90%. A few times, even if the temperature reduced after some time, the system could not send the updated temperature value to the cloud database.

Fig. 3 Processing of the data

Fig. 4 Water classified result


Fig. 5 pH value variations in seconds

6 Conclusion Individuals can use our technique to evaluate the quality of tap water, and water in lakes, reservoirs, and other bodies of water. Drinking water should have a pH of seven, according to studies. Anything over or below this will have an impact on the individual drinking it. To determine the proper amount of water for testing, we employ water level sensors, temperature sensors, and pH sensors to determine the pH and temperature of the water. Following testing, data from the sensor is transferred to the cloud, where our application displays the pH and temperature of the water and indicates if it is fit for drinking, use in agriculture, or aquaculture. This will contribute to a reduction in the intake of potentially harmful or chemically laced water, making the population healthier and more hygienic. We focus on employing a pH sensor to measure the water’s quality; however, this sensor can also be modified to measure the amount of oxygen present in the water. This project provides the capacity to examine the water quality in a local area. This can be upgraded for usage in the industry. We can also work on creating a mobile application that will enable users to receive real-time notifications on their phones. Acknowledgements We would like to pay our gratitude to our Chancellor, Sri. Mata Amritanandamayi Devi, who is the guiding light and inspiration behind all our works toward societal benefit. We would also like to thank all the staff at Amrita Vishwa Vidyapeetham who have provided us the support and motivation in the completion of this work. This work would not have been possible without the infrastructure and support provided by the Discovery Labs, Department of Computer Science, Amrita Vishwa Vidyapeetham, Mysuru Campus.



Children Facial Growth Pattern Analysis Using Deep Convolutional Neural Networks R. Sumithra and D. S. Guru

1 Introduction Face Recognition Technology (FRT) offers a wide range of applications in law enforcement, security, and commerce. These applications range from the static matching of skilled photographs on passports, credit cards, photo I.D.s, driving licenses, and other documents to the real-time matching of surveillance video/images [1]. Several issues can be addressed while designing a face verification or identification system. Among many, automatic recognition through face aging or age-based recognition from face images has emergent applications. Aging in human faces has been studied in computer vision-related research, which might correctly forecast one’s appearance through time [2]. Developing models that define age growth in faces is a difficult undertaking. The effects of facial aging are most visible in form differences in younger years, and wrinkles and other texture alterations in later years. Face anthropometric research shows that various face regions develop at varying rates, therefore a few facial characteristics change substantially less as age grows than other facial features [3]. Kwon and da Vitoria Lobo [4] proposed a model for identifying facial photographs of infants, young adults, and senior citizens. In that study, they employed face anthropometry techniques to categorize face photos like those of newborns or adults, and they offered ways for analyzing facial wrinkles to further estimate adult faces such as those of young adults and senior citizens. Burt and Perrett [5] generated complicated faces for several age groups by assessing the average shape and texture of human faces from each age group. They found a shift in the apparent age of faces by integrating the differences between R. Sumithra (B) · D. S. Guru Department of Studies in Computer Science, University of Mysore, Manasagangotri, Mysore, Karnataka 570006, India e-mail: [email protected] D. S. Guru e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 D. S. Guru et al. (eds.), Data Analytics and Learning, Lecture Notes in Networks and Systems 779, https://doi.org/10.1007/978-981-99-6346-1_17


such complicated faces and conventional ones. By using a probabilistic eigenspaces approach to conduct face recognition throughout age progression, Ramanathan and Chellappa [6] suggested a Bayesian age-difference classifier. Even though the techniques discussed above present unique strategies to address age progression in faces, most of them overlook the psychophysical data acquired on age progression in their formulation. Face recognition algorithms that use such information would be better suited to manage age progression in faces. For individuals in the relevant age range, a craniofacial growth model can anticipate one's look across decades and accomplish face recognition over age progression. Lanitis et al. [7] created an aging function for human faces based on a parametric model and conducted automatic age progression, across-age verification, age estimation, and other tasks. All these methods were reshaped in 2012, when AlexNet won the ImageNet competition by a considerable margin using a deep learning technique [8]. Convolutional neural networks, for example, employ a cascade of many layers of processing units for feature extraction; they acquire different degrees of abstraction by learning different levels of representations [9]. The emergence of deep convolutional neural networks has significantly improved performance on several face identification and verification systems and has further associated the age estimation process with neural network-based approaches. DeepFace [10] used 4 million facial photos to train nine-layer models that achieved state-of-the-art accuracy on the LFW (standard face dataset) benchmark, for the first time approaching human performance in the uncontrolled condition (DeepFace: 97.35% vs. Human: 97.53%). As a result of this study, researchers' focus has turned to deep-learning-based methodologies, and accuracy has risen drastically to above 99.80% over the last 3 years. As illustrated in Fig. 1, this study intends to incorporate state-of-the-art pre-trained DCNN architectures that were submitted to the ImageNet competition and obtained the top performance in the ILSVRC 2012–16 [11] challenges over the last 4 years. A DCNN learns data representations with varying degrees of feature extraction using several processing layers. Owing to their best performance, the author utilized these pre-trained DCNN architectures for feature extraction. In this work, the author presents a longitudinal study of facial growth rate analysis from age 1 to 12 years on a children's longitudinal face dataset. To achieve this, seven different patches of the face images are computed. From each patch, features are extracted using the top 10 pre-trained Deep Convolutional Neural Networks (which achieved the highest performance on ILSVRC 2012–16). The Euclidean distance measure is used to estimate the distance between the deep features of two successive ages of a child, and an age is considered stable only if the distance is negligibly small. The organization of the paper is as follows: Sect. 2 presents the proposed model, Sect. 3 the experimental results and their analysis, followed by the conclusion in Sect. 4.


Fig. 1 Top-1 validation accuracies for top-scoring single-model architectures (Courtesy [11])

2 Proposed Model Face alignment, preprocessing, and feature extraction are the phases in the proposed model. Figure 2 depicts the overall design of the proposed model.

2.1 Face Alignment and Pre-processing Consider a scanned photograph with a facial image of size n × n; some manual alignments are required. Figure 3a shows how the face portion was cropped using a paint tool, and gridlines were used to manually align the images (Fig. 3b). The picture


Fig. 2 Flow diagram of the proposed model



Fig. 3 Illustration of face alignment a sample scanned child image; b gridlines for an image alignment c aligned image


Fig. 4 Example of seven piecewise face patches of each child ranging from different ages

was then rotated to correct the skewness in the image so that the two eyes are horizontally aligned, and scaled to n' × n' (where n' ≤ n), as seen in Fig. 3c. The aligned face images are then cropped into piecewise row patches and column patches to analyze each facial component, as shown in Fig. 4. The face image I(X, Y) is in 2D spatial coordinates, where X is along the x-axis and Y along the y-axis. As illustrated in Fig. 4, the author considered seven face patches of a child at different ages. The idea behind working on face patches is to analyze which part of the face contributes more to the stability of the face. The piecewise row patches and piecewise column patches of a face image are defined as follows:

Patch 1: I([1, X/2], [1, Y]); Patch 2: I([X/4, 3X/4], [1, Y]); Patch 3: I([X/2, X], [1, Y]);
Patch 4: I([1, X], [1, Y/2]); Patch 5: I([1, X], [Y/4, 3Y/4]); Patch 6: I([1, X], [Y/2, Y]);
Patch 7: I([1, X], [1, Y])
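A minimal NumPy sketch of these seven crops, consistent with the patch definitions above (an illustration, not the authors' code), could look like this:

import numpy as np

def face_patches(img: np.ndarray):
    """Return the seven piecewise patches of an aligned face image (array rows = Y, columns = X)."""
    Y, X = img.shape[:2]
    return [
        img[:, : X // 2],              # Patch 1: x in [1, X/2], all y
        img[:, X // 4 : 3 * X // 4],   # Patch 2: x in [X/4, 3X/4], all y
        img[:, X // 2 :],              # Patch 3: x in [X/2, X], all y
        img[: Y // 2, :],              # Patch 4: all x, y in [1, Y/2]
        img[Y // 4 : 3 * Y // 4, :],   # Patch 5: all x, y in [Y/4, 3Y/4]
        img[Y // 2 :, :],              # Patch 6: all x, y in [Y/2, Y]
        img,                           # Patch 7: full face
    ]

patches = face_patches(np.zeros((1000, 1000), dtype=np.uint8))  # e.g. a 1000 x 1000 aligned image
print([p.shape for p in patches])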

2.2 Convolutional Neural Network (DCNN)-Based Feature Extraction Convolutional Neural Network (CNN) architectures are used to extract the deep features because these architectures are pre-trained with millions of images. The author used the top 10 pre-trained models that obtained the best results in the ILSVRC 2012–16 [11] competitions. In this work, the author used AlexNet [8], GoogLeNet [12], ResNet-18, -50, and -101 [13], VGG-16 and -19 [14], Inception-v3 and Inception-ResNet-v2 [13], and SqueezeNet [15]. A CNN is made up of hidden layers, convolution layers, activation functions (Rectified Linear Unit (ReLU), sigmoid, etc.), batch normalization, a softmax layer, pooling layers (max, average, min, etc.), scaling factors, dropout, and fully connected layers. A brief explanation of each network is provided here. AlexNet [8]: There are eight weighted layers in total, the first five of which are convolutional and the rest fully connected. The kernels of the second, fourth, and fifth convolutional layers are connected only to the preceding layer's kernel maps. Both response normalization layers and the fifth convolutional layer are followed by max-pooling layers. The output of every convolutional and fully connected layer is subjected to the ReLU non-linearity. GoogLeNet [12]: The Inception module plays an important role in the GoogLeNet architecture. The filter sizes in the Inception architecture are limited to 1 × 1, 3 × 3, and 5 × 5, and the Inception design also includes 3 × 3 max pooling. There are 22 layers in the network; the first levels are shallow convolutional layers. An average pooling layer with a 5 × 5 filter size and stride 3 produces 4 × 4 × 512 and 4 × 4 × 528 outputs for the (4a) and (4d) stages, respectively. For dimension reduction and rectified linear activation, a 1 × 1 convolution with 128 filters is used, followed by a 1024-unit fully connected layer with rectified linear activation. Overall, this network contains 57 layers: 56 convolutional layers and one fully connected layer. ResNet-18, -50, and -101 layers [13]: The convolutional layers mostly use 3 × 3 filters and follow two basic design rules: (i) the layers have the same number of filters for the same output feature map size; and (ii) if the feature map size is halved, the number of filters is doubled to maintain the time complexity per layer. Down-sampling is performed directly by convolutional layers with a stride of 2. Batch


normalization (B.N.) is adopted right after each convolution and before activation; the weights are initialized and all plain/residual nets are trained from scratch. The architecture uses Stochastic Gradient Descent (SGD) with a mini-batch size of 256. The first layer is made up of 3 × 3 convolutions. Then, for feature maps of sizes 32, 16, and 8, it utilizes a stack of 6n layers with 3 × 3 convolutions, with 2n layers for each feature map size; the filter numbers are 16, 32, and 64, respectively. A global average pooling, a 10-way fully connected layer, and softmax round out the network, giving 6n + 2 stacked weighted layers. These three architectures have 21, 54, and 105 layers with weights: 20, 53, and 104 are convolutional, each with one fully connected layer. VGG-16 and -19 layers [14]: In this architecture, the picture is passed through a stack of convolutional layers that use filters with an extremely narrow receptive field of 3 × 3. Three fully connected (FC) layers follow the stack of convolutional layers: the first two have 4096 channels each and the third performs the 1000-way classification; the configuration of the fully connected layers is the same across all networks. Rectification non-linearity is present in all hidden layers. It is also worth noting that, except for one, none of these networks has Local Response Normalization (LRN). These two networks comprise 16 and 19 weighted layers, respectively, with 13 and 16 convolutional layers and 3 fully connected layers each. Inception-v3 and Inception-ResNet-v2 [13]: The standard 7 × 7 convolution is factorized into three 3 × 3 convolutions. The inception section of the network contains three traditional inception modules at 35 × 35, each with 288 filters. Using the grid reduction approach, this is reduced to a 17 × 17 grid with 768 filters, followed by five occurrences of the factorized inception modules. The grid reduction technique then reduces this to an 8 × 8 × 1280 grid. There are two inception modules at the coarse 8 × 8 level, with a concatenated output filter bank size of 2048 for each tile. These two architectures have 94 and 239 layers, with 93 and 238 convolutional layers, respectively, each with one fully connected layer. SqueezeNet [15]: The network starts with a single convolution layer (conv1), followed by 8 Fire modules (fire2–9), and a final conv layer (conv10), gradually increasing the number of filters per Fire module from the start to the end. Squeeze and expand layer activations use ReLU. Following the fire9 module, a 50% dropout is applied; the NiN architecture inspired this design choice. This network has 26 convolutional layers and one fully connected layer. Table 1 shows complete information on the networks used, the activation layer used for feature extraction, the feature dimension, and the size of the input image taken during experimentation.
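As a rough illustration of extracting such pre-trained deep features: the paper reports MATLAB-style activation layers such as 'fc6', so the PyTorch/torchvision code below is only an assumed equivalent, and the image path is a placeholder.

import torch
from torchvision import models, transforms
from PIL import Image

# Load a pre-trained AlexNet and keep the layers up to fc6 (4096-D features)
alexnet = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
alexnet.eval()
fc6_extractor = torch.nn.Sequential(alexnet.features, alexnet.avgpool,
                                    torch.nn.Flatten(), *list(alexnet.classifier.children())[:2])

preprocess = transforms.Compose([
    transforms.Resize((227, 227)),  # AlexNet input size listed in Table 1
    transforms.ToTensor(),
])

img = preprocess(Image.open("face_patch.png").convert("RGB")).unsqueeze(0)  # placeholder image
with torch.no_grad():
    feature = fc6_extractor(img)  # 1 x 4096 feature vector for this patch
print(feature.shape)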

2.3 Distance Measure After the features are extracted, the author adopted the Euclidean distance measure between the monotonically increasing two consecutive ages of a child, using the simple Eq. (1).


Table 1 Top Deep Convolutional Neural Networks (DCNN) and the layers used for our feature extraction, with the corresponding feature dimension

Network | Activation layer | Feature dimension | Image input size
AlexNet | 'fc6' | 4096 | 227-by-227
VGG16 | 'fc6' | 4096 | 224-by-224
VGG19 | 'fc6' | 4096 | 224-by-224
ResNet18 | 'fc1000' | 1000 | 224-by-224
ResNet50 | 'fc1000' | 1000 | 224-by-224
ResNet101 | 'fc1000' | 1000 | 224-by-224
GoogLeNet | 'Loss3-classifier' | 1000 | 224-by-224
SqueezeNet | 'ClassificationLayer_predictions' | 1000 | 227-by-227
Inceptionv3 | 'Predictions' | 1000 | 299-by-299
InceptionResNetv2 | 'Predictions' | 1000 | 299-by-299


Fig. 5 Deep architecture of our model

Euclidean(M, N) = \sqrt{\sum_{k=1}^{n} (M_k - N_k)^2}   (1)

where M and N are the DCNN-based feature vectors of two successive ages of a child and n is the number of features (indexed by k); this computation is represented in Fig. 5. The distance measures for all 32 children are computed and averaged, for each patch. To better visualize the obtained results, the author plotted, for each pair of consecutive ages from 1 to 12 years, the distance measure averaged over the 32 children.
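A small sketch of this step, assuming the features have already been extracted into per-age vectors (illustrative, not the authors' code):

import numpy as np

def consecutive_age_distances(features_by_age):
    """features_by_age: list of feature vectors for ages 1..12 of one child."""
    return [float(np.linalg.norm(np.asarray(a) - np.asarray(b)))  # Eq. (1)
            for a, b in zip(features_by_age[:-1], features_by_age[1:])]

# Average the per-child curves over all 32 children for one patch
rng = np.random.RandomState(0)
children = [[rng.rand(4096) for _ in range(12)] for _ in range(32)]   # placeholder features
curves = np.array([consecutive_age_distances(c) for c in children])   # shape (32, 11)
avg_curve = curves.mean(axis=0)   # one value per consecutive-age pair (1,2) ... (11,12)
print(avg_curve.round(3))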

3 Experimental Results and Analysis A dataset collected during this study was used to validate the proposed framework. This section goes into great depth about how the dataset was created and how the experimental findings were analyzed.


3.1 Longitudinal Face Image Dataset The enormous quantity of data necessary to completely comprehend the human face and its growth process is one of the most critical difficulties. Any face recognition system relies heavily on its dataset, which is a prerequisite for designing and testing the system, and the compilation of a relevant dataset is part of any such work. The creation of such a dataset is also a significant addition to this field of study. Among the benchmarking datasets is the FG-NET [16] collection of face photographs, which has 82 participants in total, but only 8 of them have face images at ages less than 10 years. The Face Tracker database only has one image per child, the Cross-Age Celebrity Database (CACD) does not have any subjects under the age of ten, and the In-the-Wild Child Celebrity (ITWCC) database has an average age of 10 years for the first photograph of each subject [17]. Table 2 gives a full description of the datasets that may be used to apply an accurate face recognition system to aging applications. These are some standard benchmarking datasets, but due to the above-said unresolved problems, these datasets are not suitable for our problem. The set of photos in this database was created from photographs of children aged 1–15 years, with one image per one-year interval. Children's photos were gathered from all feasible sources, with the sole requirement that the child face the camera and have a neutral look wherever possible. During the photo gathering, several occasions or sources were examined, such as birthdays, school IDs, yearly school days, and so on. The following is an example of the dataset-gathering procedure: • Approached several schools to collect identity cards of students studying in the first to tenth grades.

Table 2 A comparison between the publicly available datasets and our dataset

Dataset | Number of subjects | Number of images | Images per subject | Age range | Public availability
FG-NET [16] | 82 | 1002 | 6–18 (avg. 12) | 0–69 (avg. 16) | YES
MORPH-1 [17] | 631 | 1690 | 1–6 | 16–69 | YES
MORPH-2 [17] | 13,673 | 55,608 | 1–53 (avg. 4) | 16–77 | YES
ITWCC [18] | 304 | 1705 | 3+ | 5 mos–32 | NO
TITLE | 314 | 3144 | 3–5 | 0–4 | NO
CACD | 2000 | 163,446 | – | 16–62 | YES
PCSO | 18,007 | 147,784 | 5–60 (avg. 8) | 18–83 (avg. 31) | NO
Proposed dataset | 32 | 421 | 12–15 | 1–15 | NO


• Collecting group photos of students. • Gathering images of children aged one to fifteen from their respective parents. The challenges in carrying out such a procedure are numerous: • The number of identity cards obtained is insufficient, and the quality of those obtained is poor. • Recognizing the position of a child in various group photos is difficult and needs a lot of human work. The author then personally contacted the parents of 15-year-old children to obtain images of their children at various ages ranging from 1 to 15 years. Collecting images of a child from the age of 1 year to 15 years at a 1-year interval is a time-consuming process. As a result, the author chose to gather images of children over age spans from [1, 12] to [1, 15], accepting both hard and soft copies of photographs. The majority of the images were scanned from personal collections of children's photographs. Consequently, the quality of the images is determined by the photographer's skills, the imaging equipment used, the photographic paper and printing quality, and the state of the photographs, so the dataset has a wide range of resolution, quality, lighting, perspective, and expressiveness. The total of 421 images covers 32 children, 14 male and 18 female, as shown in Fig. 6. In this study, the author developed a dataset consisting of 421 scanned photos of 32 children's longitudinal face images, with chronological aging photographs taken at a regular interval of one image each year from the age of one to fifteen years. The author utilized an HP LaserJet Professional M1136 MFP scanner to scan the photographs at 1200 dpi resolution, adjusting the scan area width and height to acquire just the facial section. The total time spent creating this database of 32 children's face photographs was nearly 4 months.

3.2 Experimental Setup For the practical analysis, the author took an equal number of images from each child. The smallest number of yearly images available for any child in the created dataset covers 12 years, so only the face images from ages 1 to 12 of all 32 children were used, i.e., 12 face images per child. Therefore, 384 of the 421 longitudinal face images are considered in this work. The author cropped, rotated, and resized the photos to 1000 × 1000 pixels during the experiments. As explained in Sect. 2.1, face patches are computed and processed for the extraction of DCNN features. The results obtained from the top DCNN-based features (named F1, F2, …, F10) for the different patches are shown in Fig. 7.


Fig. 6 Scanned photographs of our dataset

3.3 Evaluation The author built a Ground Truth (GT) for 10 children with the support of 10 human experts to assess the suggested model's efficacy. During the development of the GT, we sought 10 persons over the age of 25, with sufficient intellectual aptitude and maturity to recognize a child's age and to estimate the degree of resemblance between two successive ages; we dubbed them "human experts." We gave pictures of 10 children, ranging in age from 1 to 12 years, to the human experts for examination. Then, kappa coefficient metrics were utilized to determine the model's validity and quality (Cohen 1968). The kappa coefficient, as illustrated in Eq. (2), is a technique for determining the degree of discrepancy between the GT and the generated system outcomes.

Kappa coefficient (K) = \frac{P_o - P_e}{1 - P_e}   (2)

P_o = \sum_{i=1}^{n} p_{ii}   (3)


Fig. 7 The results obtained from the top DCNN-based features for the different patches: a Patch 1; b Patch 2; c Patch 3; d Patch 4; e Patch 5; f Patch 6; g Patch 7


P_e = \sum_{i=1}^{n} p_{i.} \, p_{.i}   (4)

where P_o is the observed relative agreement among the observers, determined using Eq. (3) by adding the principal diagonal elements for each observer, and P_e is the sum of the products of P_{i.} and P_{.i}, where P_{i.} is the sum of row i and P_{.i} is the sum of column i of the observer table, determined using Eq. (4), as shown in Table 3. C in Table 3 represents a child. With P_o = 6 and P_e = 32.28, the kappa coefficient (K) obtained from Eqs. (2)–(4) is 0.85. According to the kappa coefficient, the discrepancy between the observers is 0.15 (15%), hence the agreement

Table 3 In comparison to the ground truth, the proposed model performs well

Child | Proposed model (stable age) | Ground truth (stable age)
C1 | 8 | 8
C2 | 9 | 8
C3 | 9 | 9
C4 | 8 | 8
C5 | 9 | 9
C6 | 6 | 8
C7 | 10 | 10
C8 | 10 | 8
C9 | 8 | 9
C10 | 8 | 8

is 0.85 (85%). The author noticed that the results from patch 4 (the piecewise row centre of the face) and patch 7 (the whole face) appear the same; therefore, the row centre of a face and the full-face image perform similarly. Patch 4, which covers the centre of the facial components, plays a significant role in facial discrimination compared with the other face patches. A gradual decrease in the growth rate of the children from age 3 to 9 can also be noticed in the above graphs. From the acquired results, the stability of the 10 children was evaluated against the GT built from the observations of the 10 human experts, identifying for every child the most commonly occurring stable age across all observers. From this extensive experiment, the author can state that the growth rate has noticeably decreased by age 9; therefore, face images above age 9 may be suitable for use in a face recognition system.
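A small sketch of the kappa computation in Eqs. (2)–(4), assuming the observer agreement counts have been arranged in a square contingency table (illustrative only; the numbers below are placeholders, not the paper's data):

import numpy as np

def cohens_kappa(table: np.ndarray) -> float:
    """table[i, j]: number of children rated as stable-age class i by one rater and j by the other."""
    table = table.astype(float)
    total = table.sum()
    po = np.trace(table) / total                                        # Eq. (3), normalised
    pe = (table.sum(axis=1) * table.sum(axis=0)).sum() / total ** 2     # Eq. (4), normalised
    return (po - pe) / (1 - pe)                                         # Eq. (2)

# Toy 3 x 3 agreement table (placeholder counts)
toy = np.array([[4, 1, 0],
                [1, 2, 0],
                [0, 0, 2]])
print(round(cohens_kappa(toy), 3))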

4 Conclusion This work presents a longitudinal study of facial growth rate analysis on a children's longitudinal face dataset from age 1 to 12 years. During experimentation, seven different patches of the face images are computed. From each patch, features are extracted using pre-trained Deep Convolutional Neural Networks (DCNN), and the distance between two successive ages of a child is calculated using the Euclidean distance measure. The average over the 32 children is estimated for each patch. To measure the goodness of the proposed model, a suitable dataset was created during experimentation. It comprises 384 longitudinal children's face images of 32 subjects, each having precisely 12 face images from age 1 to age 12, a single sample per year. With this experimentation, the author noticed that the growth rate from age 3 to 9 gradually decreases: in the resulting graphs, the children's growth rate pattern flattens smoothly from age 3 to 9; therefore, face images above age 9 may be suitable for use in a face recognition system.


The author also notes that the dataset considered in this work consists of scanned photographs. To address this problem properly, a suitable dataset with images of higher quality and in larger quantity is essential; this work reveals that scanned pictures alone are not sufficient for solving this problem.

References
1. Zhao W, Chellappa R, Phillips PJ, Rosenfeld A (2003) Face recognition: a literature survey. ACM Comput Surv (CSUR) 35(4):399–458
2. Ramanathan N, Chellappa R, Biswas S (2009) Age progression in human faces: a survey. J Vis Lang Comput 15:3349–3361
3. Ramanathan N, Chellappa R (2006) Modeling age progression in young faces. In: 2006 IEEE computer society conference on computer vision and pattern recognition (CVPR'06), vol 1. IEEE, pp 387–394
4. Kwon YH, da Vitoria Lobo N (1999) Age classification from facial images. Comput Vis Image Underst 74:1–21
5. Burt M, Perrett DI (1995) Perception of age in adult Caucasian male faces: computer graphic manipulation of shape and color information. J R Soc 259:137–143
6. Ramanathan N, Chellappa R (2005) Face verification across age progression. In: IEEE conference on computer vision and pattern recognition, vol 2, pp 462–469, San Diego, USA
7. Lanitis A, Taylor CJ, Cootes TF (2002) Toward automatic simulation of aging effects on face images. IEEE Trans Pattern Anal Mach Intell 24(4):442–455
8. Ranjan R, Sankaranarayanan S, Bansal A, Bodla N, Chen J-C, Patel VM, Castillo CD, Chellappa R (2018) Deep learning for understanding faces: machines may be just as good, or better, than humans. IEEE Signal Process Mag 35(1):66–83
9. Taigman Y, Yang M, Ranzato MA, Wolf L (2014) Deep face: closing the gap to human-level performance in face verification. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 1701–1708
10. Canziani A, Paszke A, Culurciello E (2016) An analysis of deep neural network models for practical applications. arXiv:1605.07678
11. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
12. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9
13. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556
14. Iandola FN et al (2016) SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv:1602.07360
15. FG-NET (Face and Gesture Recognition network) (2014). http://www.prima.inrialpes.fr/FGnet/
16. Ricanek K, Tesafaye T (2006) Morph: a longitudinal image database of normal adult age-progression. In: IEEE international conference on automatic face and gesture recognition, March 2006, pp 341–345
17. Ricanek K, Bhardwaj S, Sodomsky M (2015) A review of face recognition against longitudinal child faces. BIOSIG 2015

Classification of Forged Logo Images C. G. Kruthika, N. Vinay Kumar, J. Divyashree, and D. S. Guru

1 Introduction The logo is a picture or a symbol that gives instant brand recognition for a business or organization. Based on the texture features extracted, logo images are classified by assigning a label to an unknown logo. It is important to check a logo for its individuality after designing it; otherwise, similar logos which resemble the design of existing logos may affect the well-being of the organization. In order to avoid such confusion, there should be a system that can test a newly created logo for its individuality. This can be resolved by recognizing the class that the newly designed logo belongs to and verifying it among the pre-defined classes; hence we propose the classification of logo images on the basis of their texture features.

1.1 Related Works In the literature survey, we note the work in [1], where the Speeded Up Robust Features (SURF) methodology is chosen for recurrent segmentation and classification to handle copy-move forgery in documents [2]. The main target of the work is C. G. Kruthika (B) Nitte Meenakshi Institute of Technology, Bangalore, India e-mail: [email protected] N. Vinay Kumar NTT Data Services, Bangalore, India J. Divyashree Dr. B R Ambedakar PG Centre, Suvarnagangotri, Chamarajanagara, India D. S. Guru University of Mysore, Manasagangothri, Mysore, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 D. S. Guru et al. (eds.), Data Analytics and Learning, Lecture Notes in Networks and Systems 779, https://doi.org/10.1007/978-981-99-6346-1_18


to classify colour logo pictures into three predefined categories based on their appearance, using a single feature, more than one feature, and K-NN classification [3]. A model for the classification of logos based on a symbolic representation of features is presented; in every category, the similar-looking model-based pictures are clustered using K-means clustering, and the intra-cluster variations are handled using symbolic interval data [4]. Another work presents a method for multi-label retrieval and classification of logo pictures, for which Convolutional Neural Networks are trained to classify logos [5]. In [6], a new vehicle logo recognition approach is presented using Histograms of Oriented Gradients (HOG) and a Support Vector Machine (SVM). A multiple-descriptor method along with the context-dependent similarity concept is employed to detect logo duplication [7]. Another paper proposes a method for logo recognition using a deep learning pipeline consisting of logo region proposal followed by a Convolutional Neural Network specifically trained for logo classification, even if the logos are not exactly localized [8]. A recognition system for document images uses a piecewise painting algorithm (PPA) and some probability features along with a decision tree for logo recognition [9]. An integration of the conventional block-based and keypoint-based strategies is used to detect copy-move forgery in images [10]. The model of [11] uses the Single Shot Multi-box Detector (SSD) technique as a base network, constructed via transfer learning, to detect Adidas, Coke, and DHL logos from varied media. Classroom events such as drowsiness of a student, steady, discussion, alert, and noisy events are classified using two-level classification, and simple threshold-based classifiers are employed for classification [12]. The proposed model of [13] extracts three features, namely colour, shape, and texture, from logo pictures; these features are then fused to stress the superiority of the feature-level fusion strategy, and symbolic feature selection is then adopted to show the effectiveness of feature sub-setting in classifying the logo pictures. A Discriminative Region Navigation and Augmentation Network (DRNA-Net) is used for logo classification of the Logo-2K+ images [14]. Another paper proposes the baseline method Logo-Yolo, which incorporates Focal loss and CIoU loss into the state-of-the-art YOLOv3 framework for large-scale logo detection on LogoDet-3K [15]. Increasing the speed of copy-move forgery detection and enhancing its accuracy and efficiency is proposed using feature point extraction and morphological operations [16]. Custom architectures and transfer learning approaches of deep learning are used to determine copy-move forgery in image and video forensics [17]. Review work on different forgery image analyses using different techniques from 2017 to 2020 on fraudulent images is analysed to identify the latest tools and technologies used for detection. In the literature, there are only a few works on logo forgery in documents, and until now there is no work on the classification of genuine and forged colour logos. Considering this, we take up the classification of logo images.


2 Proposed Model The proposed system uses two methods for the classification of forged logo images, which are explained in the following subsections.

2.1 Binary Classification Model A binary classifier model is employed to classify colour logo images on the basis of their texture features. The texture of a logo is extracted using two different models, the Gray Level Co-Occurrence Matrix (GLCM) and Local Binary Patterns (LBP). The K Nearest Neighbour (KNN) classifier and the Support Vector Machine (SVM) classifier are applied for the purpose of classification. Experiments on a dataset containing 5025 colour logo images of 2 classes demonstrate the proposed model's performance.

2.1.1 Methodology

The proposed model has 3 stages: pre-processing, feature extraction, and binary classification. Given a set of logo images, the texture features, viz. GLCM and LBP, are extracted during the training phase. In the classification stage, an unknown test logo image is given as a query, its texture features are extracted, and they are fed into the K Nearest Neighbour and Support Vector Machine classifiers to identify the appropriate class for the given unknown logo image. The proposed model of binary classification of logo images based on texture features is given in Fig. 1.

Pre-processing This stage involves remodelling the data into an understandable format; two different steps are used, namely image resizing and greyscale conversion. First, we rescale the dimensions of the logos from P × Q to p × q to maintain consistency in dimensionality. Then, the RGB logo images are transformed into grayscale images. This conversion supports the extraction of texture features.

Feature Extraction The texture of the logo images plays a very important role in logo classification. For texture feature extraction, the grayscale logo images are used. Two different texture features, the Gray Level Co-occurrence Matrix (GLCM) and the Local Binary Pattern (LBP), are used.
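A minimal sketch of these two texture descriptors using scikit-image is shown below; it is illustrative only, and the chosen distances, angles, and LBP radius are assumptions rather than the paper's settings.

import numpy as np
from skimage.feature import graycomatrix, graycoprops, local_binary_pattern

def texture_features(grey: np.ndarray) -> np.ndarray:
    """grey: 2-D uint8 grayscale logo image."""
    # GLCM statistics over one distance and four angles (assumed settings)
    glcm = graycomatrix(grey, distances=[1], angles=[0, np.pi/4, np.pi/2, 3*np.pi/4],
                        levels=256, symmetric=True, normed=True)
    glcm_feats = np.hstack([graycoprops(glcm, prop).ravel()
                            for prop in ("contrast", "homogeneity", "energy", "correlation")])

    # Uniform LBP histogram with radius 1 and 8 neighbours (assumed settings)
    lbp = local_binary_pattern(grey, P=8, R=1, method="uniform")
    lbp_hist, _ = np.histogram(lbp, bins=np.arange(0, 11), density=True)

    return np.hstack([glcm_feats, lbp_hist])

feat = texture_features(np.random.randint(0, 256, (100, 100), dtype=np.uint8))
print(feat.shape)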


Fig. 1 Flow diagram of binary classification

Binary Classification Binary classification involves only two classes; the classifier of the proposed model groups the images into their respective classes on the basis of similarity matching and classification rules. In this work, the K nearest neighbour classifier and the Support vector machine classifier are applied for the purpose of logo classification. Both are supervised learning methods.
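A small sketch of this stage with scikit-learn, assuming the texture feature matrix X and binary labels y (genuine vs. forged) have already been built; the placeholder data, split ratio, and k are illustrative choices, not the paper's settings.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

rng = np.random.RandomState(0)
X = rng.rand(5025, 14)        # placeholder texture features (e.g. GLCM + LBP)
y = rng.randint(0, 2, 5025)   # 0 = genuine, 1 = forged (placeholder labels)

# e.g. 70% training / 30% testing, one of the splits used in the experiments
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.7, random_state=0)

for name, clf in [("KNN", KNeighborsClassifier(n_neighbors=5)), ("SVM", SVC(kernel="rbf"))]:
    y_pred = clf.fit(X_tr, y_tr).predict(X_te)
    print(name,
          round(accuracy_score(y_te, y_pred), 3),
          round(precision_score(y_te, y_pred), 3),
          round(recall_score(y_te, y_pred), 3),
          round(f1_score(y_te, y_pred), 3))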

2.1.2 Dataset

A new dataset named "Logo Dataset" is designed in this work. It comprises logo images of different institutions, industries, banks, sports, brands, etc. that are downloaded from the internet. The first class contains the original logos collected from the internet; the second class contains forged logo images which are created using Paint, Paint 3D, Adobe Photoshop CS6, etc. Class 1: Contains genuine logo images that are downloaded from the internet. Genuine logos: the genuine part of the "Logo Dataset" comprises colour logo images of different institutions, brands, companies, industries, sports, etc. which are downloaded from the internet (Fig. 2).


Fig. 2 Genuine logos

Class 2: Contains forged logo images which are created using Paint, Paint 3D, Adobe Photoshop CS6, etc. Retouch logos: this method is used to improve or repair a logo by making slight additions or alterations. Some selected regions undergo geometric transformations like scaling, rotation, skewing, stretching, and flipping to produce a convincing forged logo image (Fig. 3). Copy and move: in this method, areas of the logo image of any shape and size are copied and pasted to other areas in the same logo image. The purpose is to hide some regions of the image, like unwanted portions, or to add some local features (Fig. 4).

Fig. 3 Retouch logos


Fig. 4 Copy and move logos

Fig. 5 Enhancement logos

Enhancement: In this method, the colour logo images undergo various enhancements, like a change of the background and foreground colours of the logo, a blur of the background, etc. (Fig. 5). Splicing: This method uses a cut-and-paste approach across more than one logo image to create another fake logo image. The borders between the spliced regions can be visually impossible to perceive (Fig. 6).

2.1.3 Experimentation

The logo images of the two classes are kept in a database. The logo images from the database are fed into GLCM and LBP feature extraction models individually.


Fig. 6 Spliced logos

After that, the attributes of the logos are arranged in a matrix, where the rows represent the dataset samples and the columns the attributes. Next, the matrix is split into training and testing datasets. Training logo images are used for representation, and testing logo images are used for testing the classification system. We conducted seven sets of experiments with a varying proportion of training images: 20, 30, 40, 50, 60, 70, and 80%. At the testing stage, the system uses the remaining 80, 70, 60, 50, 40, 30, and 20% of the logo images, respectively, for classifying them into one of the two classes with both SVM and KNN. During experimentation, the classification results are tabulated in a confusion matrix. The effectiveness of the proposed model is judged based on classification measures (accuracy, precision, recall, and F-measure) calculated from the confusion matrix; each measure is evaluated in terms of its minimum, maximum, and average. The training and testing results are tabulated in Tables 1, 2, 3 and 4, respectively, and the best results are selected by the average F-measure.

Table 1 GLCM with KNN classification results

Training and testing (%) | Accuracy | Precision | Recall | F-measure
20–80 | 76.59 | 48.09 | 49.39 | 48.72
30–70 | 76.66 | 50.86 | 49.85 | 50.31
40–60 | 75.96 | 47.02 | 48.85 | 47.98
50–50 | 75.39 | 45.98 | 48.52 | 47.2
60–40 | 75.86 | 50.41 | 49.05 | 49.67
70–30 | 75.97 | 46.94 | 48.72 | 47.78
80–20 | 75.29 | 46.3 | 47.45 | 46.69


Table 2 LBP with KNN classification results

| Training–testing (%) | Accuracy | Precision | Recall | F-measure |
|---|---|---|---|---|
| 20–80 | 75.52 | 49.82 | 50.23 | 50 |
| 30–70 | 75.11 | 49.47 | 49.73 | 49.6 |
| 40–60 | 75.7 | 50.69 | 49.86 | 50.33 |
| 50–50 | 75.53 | 52.14 | 49.98 | 51.01 |
| 60–40 | 76.5 | 53.68 | 50.21 | 51.83 |
| 70–30 | 76.68 | 54.86 | 50.33 | 53.41 |
| 80–20 | 76.42 | 53.64 | 50.85 | 51.79 |

Table 3 GLCM with SVM classification results

| Training–testing (%) | Accuracy | Precision | Recall | F-measure |
|---|---|---|---|---|
| 20–80 | 54.47 | 49.72 | 49.57 | 49.65 |
| 30–70 | 72.89 | 50.53 | 50.36 | 50.45 |
| 40–60 | 57.63 | 50.52 | 50.76 | 50.64 |
| 50–50 | 70.14 | 48.26 | 48.62 | 48.44 |
| 60–40 | 79.69 | 52.51 | 50.09 | 51.27 |
| 70–30 | 79.67 | 51.12 | 50.04 | 50.57 |
| 80–20 | 79.9 | 56.69 | 50.12 | 53.2 |

Table 4 LBP with SVM classification results

| Training–testing (%) | Accuracy | Precision | Recall | F-measure |
|---|---|---|---|---|
| 20–80 | 66.69 | 45.71 | 46.86 | 46.79 |
| 30–70 | 56.84 | 53.39 | 52.22 | 52.8 |
| 40–60 | 73.47 | 49.71 | 49.51 | 49.61 |
| 50–50 | 44.97 | 51.82 | 52.7 | 52.26 |
| 60–40 | 75.02 | 52.49 | 51.37 | 52.92 |
| 70–30 | 39.58 | 51.92 | 52.52 | 52.22 |
| 80–20 | 63.28 | 51.83 | 52.43 | 52.13 |

2.1.4 Experimentation Analysis

In the proposed binary logo image classification model, four different conventional models are used to classify the same logo image database: (a) a model using GLCM texture features with K-nearest neighbour, (b) a model using LBP texture features with K-nearest neighbour, (c) a model using GLCM texture attributes with SVM, and (d) a model using LBP texture attributes with SVM. These models are compared in terms of F-measure to check the robustness of the binary classifier, using the confusion matrices obtained during classification.


Table 5 Best result of different models

| Model | Methods | Train–test (%) | Accuracy | Precision | Recall | F-measure |
|---|---|---|---|---|---|---|
| 1 | GLCM with KNN | 30–70 | 76.66 | 50.86 | 49.85 | 50.31 |
| 2 | LBP with KNN | 70–30 | 76.68 | 54.86 | 50.33 | 53.41 |
| 3 | GLCM with SVM | 80–20 | 79.9 | 56.69 | 50.12 | 53.2 |
| 4 | LBP with SVM | 60–40 | 75.02 | 52.49 | 51.37 | 52.92 |

Validity measures (accuracy, precision, recall, and F-measure) are used to test these four models. Model 1 gives its best results for the 30–70% training–testing split (Table 1), Model 2 for the 70–30% split (Table 2), Model 3 for the 80–20% split (Table 3), and Model 4 for the 60–40% split (Table 4). From Table 5 we can observe that Model 2 is better than Models 1, 3, and 4 in terms of average F-measure. Hence, Model 2 is best suited for classifying large datasets.
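For reference, the sketch below shows how the validity measures used above can be derived from a 2 × 2 confusion matrix; the matrix entries are illustrative values, not results from the paper.

```python
# Deriving accuracy, precision, recall and F-measure from a binary confusion matrix.
import numpy as np

cm = np.array([[70, 10],    # rows: actual genuine / forged (illustrative counts)
               [12, 68]])   # cols: predicted genuine / forged

tn, fp, fn, tp = cm.ravel()
accuracy  = (tp + tn) / cm.sum()
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f_measure = 2 * precision * recall / (precision + recall)
print(accuracy, precision, recall, f_measure)
```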

2.2 Limitations of Binary Classification

In binary classification, the classes of the Logo dataset are imbalanced. The first class contains genuine logos, mainly colour logos of different institutions, companies, sports, brands, and industries downloaded from the internet, while all the other four classes are forged logo images created using Paint, Paint 3D, Adobe Photoshop CS6, etc., i.e. a 1:4 ratio. Because of this imbalance, a hierarchical approach is adopted in which the forged logo images are classified across the four predefined classes; this is explained in the next sections.

2.3 A Hierarchical Approach Using a Multi-classification Model

In Sect. 2.1, we explained the pre-processing and feature extraction steps for the logo dataset, but the classes of the dataset are imbalanced. The first class contains genuine logos (C1) of different institutions, companies, sports, brands, and industries downloaded from the internet, and the other four classes are forged logo images created using Paint, Paint 3D, Adobe Photoshop CS6, etc., viz. Retouch (C2), Copy Move (C3), Enhancement (C4), and Splicing (C5), i.e. a 1:4 ratio. Because of this imbalance, a


hierarchical approach using multi-classification is adopted, in which the forged logo images are classified across the four classes. In this stage, logos that look similar are assigned to their predefined classes.

2.3.1 Methodology

The architecture of the proposed system is shown in Fig. 7. The method uses a two-level hierarchy for classifying forged logos across the four predefined classes C2, C3, C4, and C5. At the first level, the input logo is classified as genuine or forged using the binary classification steps explained in Sect. 2.1. If the input logo belongs to the forged class, a second-level multi-classification then assigns the forged logo to one of the predefined classes. The same GLCM and LBP features are used for feature extraction, and KNN and SVM (one v/s one and one v/s all) are used as classifiers at the second level; a minimal sketch of the two-level decision is given below Fig. 7.

Fig. 7 Hierarchical model
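The following Python sketch illustrates the two-level decision only: a binary model first decides genuine versus forged, and forged samples are passed to a multiclass model over the four forgery classes. The label conventions (1 = genuine, 0 = forged; labels 2–5 for C2–C5), the dummy training data, and the plain SVC classifiers are assumptions for illustration.

```python
# Hedged sketch of the two-level hierarchical classification of logo images.
import numpy as np
from sklearn.svm import SVC

FORGERY_NAMES = {2: "Retouch", 3: "Copy-Move", 4: "Enhancement", 5: "Splicing"}

rng = np.random.default_rng(0)
X = rng.random((300, 24))                  # placeholder texture feature vectors
y_binary = rng.integers(0, 2, size=300)    # assumed labels: 1 = genuine, 0 = forged
y_forgery = rng.integers(2, 6, size=300)   # assumed forgery labels C2-C5

level1 = SVC().fit(X, y_binary)                                        # level 1: genuine vs forged
level2 = SVC().fit(X[y_binary == 0], y_forgery[y_binary == 0])         # level 2: type of forgery

def classify(x):
    """Route one feature vector through the two-level hierarchy."""
    if level1.predict([x])[0] == 1:
        return "Genuine"
    return FORGERY_NAMES[int(level2.predict([x])[0])]

print(classify(X[0]))
```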


2.4 Dataset

For this work, the same dataset explained in Sect. 2.1.2 is used, but here five classes are considered in order to balance the dataset. The “Logo Dataset” contains 5025 logo images in five different classes: the original logos collected from the internet, and the Retouch, Copy and Move, Enhancement, and Splicing classes created using Paint, Paint 3D, Adobe Photoshop CS6, etc. The logo images of the five classes are kept in a database.

2.5 Experimentation

The logo images from the database are fed into the GLCM and LBP feature extraction models individually. In the second level of the hierarchy, multi-classifier techniques are applied to the forged logo images in order to classify them into the predefined classes. Experimentation is conducted on seven sets with the training data distributed as 20, 30, 40, 50, 60, 70, and 80%; for testing, the model uses the remaining 80, 70, 60, 50, 40, 30, and 20%, respectively, and classifies the images into one of the five classes. Observations over 15 trials are noted as minimum, maximum, and average values. We also fed the appended features and labels to multiclass support vector machines of both types (1 v/s 1 and 1 v/s all) using MATLAB's fitcecoc. The efficiency of the proposed model is tested using accuracy, precision, recall, and F-measure, each reported as the minimum, maximum, and average over the 15 iterations of experiments, and the results are judged by the best average F-measure obtained for each training and testing percentage.
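The paper fits its multiclass SVMs with MATLAB's fitcecoc; purely as an illustration, the sketch below uses scikit-learn's one-vs-one and one-vs-rest wrappers as a rough analogue. The features, labels, split ratio, and kernel choice are placeholder assumptions.

```python
# Hedged sketch: multiclass SVM over the forgery classes in one-vs-one and one-vs-rest modes.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC
from sklearn.metrics import f1_score

rng = np.random.default_rng(1)
X = rng.random((400, 24))             # placeholder appended GLCM/LBP feature vectors
y = rng.integers(2, 6, size=400)      # placeholder labels for forgery classes C2-C5

X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.8, stratify=y, random_state=1)

for name, wrapper in [("1 v/s 1", OneVsOneClassifier), ("1 v/s all", OneVsRestClassifier)]:
    model = wrapper(SVC(kernel="rbf")).fit(X_tr, y_tr)
    print(name, "macro F-measure:", f1_score(y_te, model.predict(X_te), average="macro"))
```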

2.6 Experimentation Analysis

In the designed hierarchical multi-classification model, four different conventional models are used to classify the same logo image database. The training and testing percentages of the dataset are varied from 20 to 80%, and the 15 iterations of experiments are noted as minimum, maximum, and average values. The four models compared are: (a) a model using GLCM texture features with K-nearest neighbour, (b) a model using LBP texture features with K-nearest neighbour, (c) a model using GLCM texture features with a multi-class support vector machine (one versus one and one versus all), and (d) a model using LBP texture features with a multi-class support vector machine (one versus one and one versus all). Based on this analysis, the effectiveness of the proposed model in terms


of average F-measure is tested. The same validity measures are used for validating these four models. Model 1 gives its best results for the 80–20% training–testing split (Table 6), Model 2 for the 70–30% split (Table 7), Model 3 for the GLCM with multiclass (one versus one) SVM classification (Table 8), and Model 4 for the LBP with multiclass (one versus one) SVM classification (Table 9). The results shown in Tables 6, 7, 8, and 9 are for the respective models in classifying the logo images. From Table 10 it is clear that Model 3 is the superior multi-classification model compared to Models 1, 2, and 4 in terms of average F-measure and the remaining measures. Hence, Model 3 is best suited for classifying the large dataset.

Table 6 GLCM with KNN classification results

| Training–testing (%) | Accuracy | Precision | Recall | F-measure |
|---|---|---|---|---|
| 20–80 | 34.74 | 25.13 | 25.15 | 25.14 |
| 30–70 | 32.9 | 23.62 | 25.26 | 25.3 |
| 40–60 | 32.72 | 24.93 | 24.98 | 24.94 |
| 50–50 | 34.16 | 24.72 | 24.96 | 24.83 |
| 60–40 | 24.79 | 24.68 | 24.92 | 24.79 |
| 70–30 | 32.88 | 24.92 | 24.91 | 24.91 |
| 80–20 | 32.35 | 22.88 | 24.81 | 25.73 |

Table 7 LBP with KNN classification results

| Training–testing (%) | Accuracy | Precision | Recall | F-measure |
|---|---|---|---|---|
| 20–80 | 35.52 | 25.08 | 25.1 | 25.08 |
| 30–70 | 31.04 | 24.88 | 24.88 | 24.89 |
| 40–60 | 32.27 | 24.84 | 24.83 | 24.98 |
| 50–50 | 31.92 | 25.26 | 25.2 | 25.23 |
| 60–40 | 32.39 | 25.09 | 25.14 | 25.24 |
| 70–30 | 33.34 | 25.02 | 24.98 | 25.26 |
| 80–20 | 30.93 | 25.21 | 25.29 | 25.2 |

Table 8 GLCM with multiclass (1 v/s 1) and (1 v/s all) SVM classification results

| Classification | Accuracy | Precision | Recall | F-measure |
|---|---|---|---|---|
| GLCM with multiclass (1 v/s 1) SVM | 86.13 | 86.24 | 90.46 | 88.3 |
| GLCM with multiclass (1 v/s all) SVM | 81.47 | 82.22 | 87.21 | 83.24 |


Table 9 LBP with multiclass (1 v/s 1) and (1 v/s all) SVM classification results

| Classification | Accuracy | Precision | Recall | F-measure |
|---|---|---|---|---|
| LBP with multiclass (1 v/s 1) SVM | 82.83 | 83.48 | 87.25 | 85.32 |
| LBP with multiclass (1 v/s all) SVM | 81.99 | 83.81 | 86.52 | 85.14 |

Table 10 Results of the proposed models

Multi-classification models

| Model | Methods | Train–test | Accuracy | Precision | Recall | F-measure |
|---|---|---|---|---|---|---|
| 1 | GLCM with KNN | 80–20% | 32.35 | 22.88 | 24.81 | 25.73 |
| 2 | LBP with KNN | 70–30% | 33.34 | 25.02 | 24.98 | 25.26 |
| 3 | GLCM with multiclass SVM (1 v/s 1) | – | 86.13 | 86.24 | 90.5 | 88.3 |
| 4 | LBP with multiclass SVM (1 v/s all) | – | 81.99 | 83.81 | 86.52 | 85.14 |

Binary classification models

| Model | Methods | Train–test | Accuracy | Precision | Recall | F-measure |
|---|---|---|---|---|---|---|
| 1 | GLCM with KNN | 30–70% | 76.66 | 50.86 | 49.85 | 50.3 |
| 2 | LBP with KNN | 70–30% | 76.68 | 54.86 | 50.3 | 53.41 |
| 3 | GLCM with SVM | 80–20% | 79.9 | 56.69 | 50.12 | 53.2 |
| 4 | LBP with SVM | 60–40% | 75.02 | 52.49 | 51.37 | 52.92 |

3 Comparative Analysis

Across the two proposed methods, binary and hierarchical multi-classification, eight different conventional models are used to classify the same logo image database. In binary classification, the best result is obtained with LBP and KNN compared to all other binary models. In multi-classification, the best result is obtained with GLCM and a one-versus-one multiclass SVM classifier, which outperforms the other models; the best results of all the proposed models are given in Table 10. Analysing the results of binary and multi-classification, it is clear that multi-classification outperforms binary classification; the comparative results of the two methods are tabulated in Table 11. Hence, GLCM with a one-versus-one multiclass SVM classifier is the superior model among all the models.


Table 11 Comparative results of the binary and multi-classification methods

| Method | Model | Train–test | Accuracy | Precision | Recall | F-measure |
|---|---|---|---|---|---|---|
| Multi-classification | GLCM with multiclass SVM (1 v/s 1) | – | 86.13 | 86.24 | 90.46 | 88.3 |
| Binary classification | LBP with KNN | 70–30% | 76.68 | 54.86 | 50.33 | 53.41 |

4 Conclusion

In this paper, the main focus was to detect and classify different logo forgeries using binary and hierarchical multi-classification of logo images based on texture features. GLCM and LBP models are used to extract the texture features, and classification is carried out with two classifiers: K-nearest neighbour and support vector machine. A hierarchical approach is applied in the multi-classification method to mitigate the class imbalance affecting binary classification, and in total eight different models are used to classify the logo images under the two methods. Among all the models, GLCM with a one-versus-one multiclass SVM classifier gives the superior result for classifying forged images across the predefined classes.


Detection, Classification and Counting of Moving Vehicles from Videos

Alfina Sunny and N. Manohar

1 Introduction

Multiple vehicle detection and classification are two challenging and important stages of intelligent traffic surveillance and monitoring, traffic density estimation, and related applications. Existing challenges such as lighting changes, shadows and reflections, background noise, motion speed, and collision hinder real-time vehicle recognition and categorization. Vehicle detection aids in the recognition, location, and localization of the many visual instances of vehicles in a video, and the research can be expanded to identify the exact locations of the vehicles and label them. Vehicle classification categorizes the vehicles detected from videos according to several technical parameters. Although existing methods detect and classify vehicles from different traffic videos reasonably well, they are less efficient under changing weather conditions, are limited in the number of vehicle types they detect, and lack comparisons. Vision-based methods are the most common approach to analysing vehicles in images or videos. Deep learning algorithms provide abstract hierarchical representations that enable a Convolutional Neural Network to detect and classify vehicles quickly and accurately, and applying YOLO to images allows training on data from anywhere while promising high precision compared to other methods [1]. In contrast to ordinary traffic videos, our method considers traffic scenarios in different harsh weather conditions such as rain, fog, bright sunlight, and night. Low-quality videos and failures to detect vehicles in these scenarios are likely to affect the robustness of existing vehicle surveillance systems.

A. Sunny (B) · N. Manohar
Department of Computer Science, Amrita School of Arts and Sciences, Mysore, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
D. S. Guru et al. (eds.), Data Analytics and Learning, Lecture Notes in Networks and Systems 779, https://doi.org/10.1007/978-981-99-6346-1_19


Motivated by these challenges, this research combines CNN and computer vision techniques to construct a system that can efficiently recognize and categorize automobiles in videos under various weather conditions and for mixed road users, treating the CNN as a powerful machine learning tool trained on a large collection of diverse data [2]. The CNN is also used for classification, and the labelling of the classes is done automatically by the CNN [3]. A recorded traffic video is processed, the model is trained on an image dataset, and vehicle objects are detected under different weather conditions; background subtraction and object segmentation are performed prior to obtaining the bounding boxes, and after feature extraction, classification and counting of vehicles are carried out for each frame.

2 Related Work The related works emphasize the research works that are already existing in the area of vehicle detection and classification from videos and images. Priyadarshini et al. [4] proposed a method to detect and track Multi Type and Multiple Vehicles (MTMV) in motion using a monocular camera in an input video using Enhanced YOLO V3 as well as enhanced visual background extractor algorithms. This proposed approach is tested on ten different input videos of 30 fps (2,46,345 images) and two benchmarking datasets KITTI and DETRAC considering eight high level and low- level features. The method obtained 96.6% tracking accuracy and 96.8% detection accuracy for a sequence of frames. Zhu et al. [5] proposed a strong vehicle flow detection system that combines virtual detection regions and virtual detection lines with background dictionary learning. They also suggested a robust detection technique for real-time applications by introducing a novel way for initializing background dictionary in order to generate a robust background dictionary. Geng et al. [6] propose a multiple end-to-end vehicle detection neural network that can recognize vehicles on feature maps with varied scales. A completely convolutional network that can be trained from start to finish is the Multiscale Vehicle RCNN (MV-RCNN). This model employs two RPN (Region Proposal Networks) and RCNN-based detectors for various parameters for accurate vehicle detection. The dataset consisted of 16020 images taken from genuine urban surveillance footages, with a precise accuracy of 78.6% when compared to baseline approaches. Their technique can also be used to detect pedestrians in traffic scenarios. Song et al. [7] propose a deep learning-based object detection and tracking system for traffic management and control in highway surveillance video situations. This approach includes surface extraction and segmentation using the Gaussian mixture modelling, flood filling algorithm and morphological expansion, a YOLO V3 network is used for vehicle object detection with reasonable gradient fall and lower loss value. Tracking the ORB characteristics of numerous objects yielded the vehicle trajectory. The system does have an effective object detection rate of 83.46%, as well as solid performance and usability. Chen et al. [8] using the Faster R-CNN and SSD network baselines, and


demonstrate an efficient and successful method for vehicle recognition and classification. To improve the baseline, the proposed work uses the k-means method to cluster automobile scales and aspect ratios in vehicle datasets, detects vehicles on various feature maps based on vehicle size, and concatenate low and high-level feature maps for semantic information. This paper also proposes inception and batch normalization for improving detection results and the effectiveness of the system is evaluated on the Vehicle dataset (JSHD). Shiru and Xu [9] proposed an algorithm to detect front vehicle for better real time performance and efficiency. The image processing operations are completed initially, followed by ROI extraction, ROI positioning, and ROI validation for the identified automobiles. The front vehicles are identified using shadow aspects, vertical edge properties, texture features, and symmetry characteristics. The algorithm has been tested on 700 frame videos at 25 frames per second and has a low rate of misdetection and erroneous detection. Feng et al. [10] present a novel framework for building precise vehicle trajectories from UAV images in mixed traffic settings, thus improving the trajectory data. The solution uses YOLOv3, the ShiTomasi corner feature, and EEMD (Ensemble Empirical Mode Decomposition) for trajectory data noise removal for backdrop registration. On three 24fps aerial movies taken from urban roads, the suggested framework had an average recall of 91.98% for non-motor vehicles and 78.13% for pedestrians. He and Li [11] proposed a novel method using deep learning for multi-source vehicle detection. YOLOv2 is used for accurate detection of vehicle objects, and radars are used to determine the vehicle’s position. This paper also presents a method for converting radar coordinate systems to a video pixel coordinate systems. The efficiency of the system is improved by combining video detection and radar information testing on 18 full-HD videos, which improves the system’s real-time and practical performance. Momin and Mujawar [12] propose a co-training-based vehicle recognition method based on semantic characteristics such as color, date and time, speed, and travel direction. Motion and Haar features are used for vehicle detection and Adaboost algorithm for training where the samples are randomly selected. This method also has an improved accuracy for even crowded scenes therefore improving the accuracy of feature selection. Shobha and Deepu [13] conducted a comprehensive investigation on various vehicle detection, recognition, and tracking methods. The single camera approach and the multi camera approach are both reviewed in this work. A single camera system for vehicle identification includes appearance-based and motion-based models, as well as different recognition, classification, and tracking approaches. The difficulties of using a multiview system have also been discussed. Wang et al. [14] based on stationary footage, provide an overview of vision-based vehicle recognition and tracking approaches. The paper focuses on a rectilinear stationary camera that is set up by the roadside based on a variety of objectives, such as ROI selection, vehicle identification, and shadow elimination. Knowledge-based approaches, motion-based methods, and wavelet-based methods are all covered, as well as an introduction to shadow elimination methods. 
Soumya and Radha [15] proposed a heavy vehicle detection method for enhancing the digital transportation infrastructure integrated with YOLOv4. Heavy vehicles like trucks and buses are considered and to avoid overfitting and improve the ideal speed, transfer learning and mosaic data augmentation approaches are used.


When compared to existing state-of-the-art techniques, the suggested system has a detection accuracy of 96.54%, and its performance is evaluated using the COCO test set and the PASCAL VOC 2007 test set. Liu et al. [16] demonstrated a real-time highway vehicle tracking framework that uses a two-stage deep learning based architecture for vehicle detection and re-identification, as well as a novel deep learning based network called Vehicle Tracking Context to extract the features. Lechgar et al. [17] presented a spatio-temporal deep learning based tracking system in which data augmentation is done for training and testing and YOLO is trained until an acceptable precision with certain validity conditions. This algorithm is applied on images and the LinktheDot algorithm is finally used for tracking the vehicle in aerial context. Bui et al. [18] propose a comprehensive vehicle counting framework by integrating YOLO and DeepSort algorithms along with a distinguished region tracking framework for monitoring the trajectories of the vehicles. It has achieved 85% accuracy on a dataset from the CVPR AI City Challenge 2020 and the proposed architecture works well with a variety of scenarios, including traffic density, multi-direction, video definition, lighting, and weather. Hu et al. [1] propose using a Convolutional Regression Network using Appearance and Motion Features (CRAM) tracker centered on the Convolutional Regression framework to collect appearance and motion features from aerial footage in order to track automobiles using deep neural networks for accurate object tracking. Hou et al. [19] propose a framework as an extension to the Deep SORT tracking method that evaluates the average detection confidence of each track in attempt for the tracking algorithm to engage with incorrect detection results like low confidence true positives and high confidence false positives in complex surroundings. They also presented a vehicle re-identification dataset based on the UA-DETRAC training dataset, and the DETRAC MOT evaluation method was used to analyse the final outcomes of the proposed algorithms. Lou et al. [20] proposed a deep learning model that uses YOLO to detect moving automobiles and track them dynamically using a modified Kalman Filter approach in both day and night light thereby solving the issues of missed or misleading vehicle identification in continuous frames. This method can produce better results in dark nights and rainy days and also when the ground has a strong reflection interference and the method has an average accuracy of 92.11%. Dai et al. [21] present a 90% precise video-based vehicle counting system that is based on object detection, tracking, and trajectory analysis for sophisticated traffic control and dynamic signal timing. Gomaa et al. [22] propose a methodology for autonomous vehicle detection and tracking from aerial movies, where the authors present an integrated approach combining morphological operations and feature point motion analysis. The approach achieves recall, precision, and tracking accuracy around 95.1, 97.5, and 95.2%, respectively, attributed to its quick processing speed. Akshay et al. [23] proposed a combined multiple objects tracking approach from videos by combining Optical flow and Kalman Filter with a promising accuracy for dynamic background. Chandrajit et al. 
[24] applied the Kolmogorov–Smirnov nonparametric statistical test and stochastic neural network-based classification in the spatiotemporal domain, presenting an algorithm for the segmentation and classification of moving objects from surveillance images.


3 Proposed Methodology

The proposed architecture has five phases: pre-processing, vehicle detection and segmentation, feature extraction, classification, and vehicle counting. Figure 1 explains the flow of our proposed system.

3.1 Pre-processing

Pre-processing is an important step in building a Convolutional Neural Network, even though a CNN does not strictly require it and can work well with raw images. On this account, we use the OpenCV package for image reading and edge detection, with the dataset divided into training and test data. TensorFlow and Keras are used as the pre-processing packages for the CNN before training; they accept raw images as input, produce objects as output for training the model, and normalize the features on their own.

Fig. 1 Proposed architecture


A fully convolutional network is designed. For the first layer of the CNN we use a convolutional layer with the Rectified Linear Unit (ReLU) activation function, which is easier to train and frequently results in better performance since not all neurons are activated at the same time, followed by MaxPool2D to down-sample the feature maps along the spatial dimensions, and the Adam optimizer for adaptive optimization. To improve learning between the layers, we apply batch normalization, which normalizes the outputs of the activation functions. A second convolutional layer with a maximum pool size of two and a stride of two is then modelled with the same ReLU activation function. As the next stage of the CNN, the data are flattened into a one-dimensional array and passed to a fully connected layer, followed by a final output layer. The process is run for 25 epochs, and the accuracy and loss of the proposed network are plotted; the binary cross-entropy function is used for training the CNN.
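A minimal Keras sketch of a network along these lines is given below; the input size, filter counts, and dense-layer width are assumptions, since the text does not state them exactly.

```python
# Hedged sketch of the CNN described above (binary cross-entropy, Adam, 25 epochs).
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(64, 64, 3)),                  # assumed input resolution
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D(pool_size=2),                 # spatial down-sampling
    layers.BatchNormalization(),                      # normalise activations between layers
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D(pool_size=2, strides=2),      # pool size 2, stride 2 as in the text
    layers.Flatten(),                                 # 1-D array for the dense layers
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),            # binary output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=25)  # 25 epochs as described
```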

3.2 Vehicle Detection

Detection is the most important phase of the proposed framework. Vehicles of the considered classes are detected from images as well as video frames, using more than 1000 images per class derived from the MIO-TCD dataset. Each frame is separated into background and target vehicles using blurring and Canny edge detection, and, treating the two as separate entities, the vehicles are detected based on their visibility in each frame. Object detection always depends on which features, as a close representation of the object to be detected, are selected from the image to divide the frame into target and background. Here we use predefined YOLO weights for each vehicle class obtained from a publicly available source and subtract the background features from the frames to detect vehicles in each frame of the image and video datasets, which yields the bounding boxes used to locate the vehicles in each frame; a rough sketch of this step is given below.
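The sketch below prototypes the frame-level detection step with OpenCV only, using blurring, Canny edges, and background subtraction followed by bounding boxes; the video path, thresholds, and area filter are placeholder assumptions, and the pretrained YOLO weights mentioned above are not loaded here.

```python
# Hedged sketch: target/background separation and bounding boxes on a traffic video.
import cv2

cap = cv2.VideoCapture("traffic.mp4")              # placeholder path to a traffic video
backsub = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=25)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    edges = cv2.Canny(blurred, 50, 150)            # edge map, usable for target/background split
    fgmask = backsub.apply(frame)                  # moving (foreground) pixels
    _, fgmask = cv2.threshold(fgmask, 200, 255, cv2.THRESH_BINARY)  # drop shadows and noise
    contours, _ = cv2.findContours(fgmask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        if cv2.contourArea(c) > 500:               # ignore very small blobs
            x, y, w, h = cv2.boundingRect(c)
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
cap.release()
```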

3.3 Feature Extraction

A pre-trained CNN model is used for feature extraction in our proposed methodology, with a custom final layer, referred to as the Region of Interest Pooling layer, which extracts the specific features of the specified vehicles. The CNN’s output is then split by a fully connected layer into two outputs: one for class prediction via a softmax layer, and the other for bounding-box prediction via a linear output. This procedure is repeated for each region of interest in a given image. We build a heatmap using the CNN and obtain the bounding boxes from this heatmap, which in turn generates filled rectangles from each box, and


Fig. 2 Blobs from rectangle using Heatmap

Fig. 3 Extracting Features from a traffic image in foggy weather

we use this to vote on the regions that could possibly contain a vehicle; a binary mask is then created to obtain the bounding rectangles from the blobs, as shown in Fig. 2. Figure 3 shows the raw features and normalized features obtained from an image in foggy weather.
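As a rough illustration of this heatmap step, the sketch below accumulates votes from hypothetical detection boxes, thresholds the map into a binary mask, and reads bounding rectangles from the connected blobs (cf. Fig. 2); all coordinates and the vote threshold are made-up values.

```python
# Hedged sketch: heatmap voting -> binary mask -> bounding rectangles of blobs.
import numpy as np
from scipy import ndimage

heatmap = np.zeros((480, 640), dtype=np.float32)
detections = [(100, 50, 180, 120), (110, 60, 190, 130), (400, 200, 470, 260)]  # x1, y1, x2, y2
for x1, y1, x2, y2 in detections:
    heatmap[y1:y2, x1:x2] += 1.0                   # each box votes for its region

mask = heatmap >= 2                                # keep regions supported by >= 2 votes
labels, n_blobs = ndimage.label(mask)              # connected components = candidate vehicles
for blob_id in range(1, n_blobs + 1):
    ys, xs = np.where(labels == blob_id)
    print("vehicle box:", (xs.min(), ys.min(), xs.max(), ys.max()))
```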

3.4 Classification

The most crucial part of our suggested framework is the classification of vehicles under different weather conditions. A CNN, considered as an artificial neural network,


is one of the best classifiers, specialized in extracting patterns and making sense of them when combined with automated feature extraction. A fully convolutional network is designed with two convolutional layers using the same padding, two hidden max-pooling layers, and one flattening layer. The first convolutional layer has 32 filters of kernel size 3 × 3 and a max-pooling layer with a stride of 2, and the second convolutional layer has the same filter count, kernel size, and pooling values. Testing the pre-trained convolutional network on our image and video dataset, the vehicles are classified and labelled as car, bus, truck, or bicycle, and the pedestrians in the frame are also classified, all based on their visibility in each frame, with 25 training epochs.

3.5 Vehicle Counting

Vehicle counting, the final stage of the framework, counts the number of vehicles detected from the video dataset. The number of vehicles on a road grows rapidly over time, and a road’s capacity, that is, the number of vehicles it can accommodate, is closely tied to the number of heavy-duty vehicles passing through. The purpose of our counting system is to help the authorities identify and alert road users as to whether it is safe to use a particular road, whether a flyover is required to maintain the usual traffic, and whether the number of heavy-load vehicles on a road that cannot accommodate them should be controlled.

4 Experimentation

4.1 Datasets

Any framework that requires validation needs a dataset with proper classes and annotations, and identifying a dataset that can yield the desired output is critical. To validate our model, we use the MIOvision Traffic Camera Dataset (MIO-TCD), with different classes including bicycle, bus, car, pedestrian, etc., shown in Fig. 4, and covering different weather conditions including rain, fog, sun, sand, and snow. These images are split into training and test data for training the model and testing our architecture. For the video dataset, we collected 10 videos of 3 minutes each.


Fig. 4 Sample vehicle images from MIO vision traffic camera dataset

4.2 Results

The results obtained from the above experiments give encouraging and promising accuracy for detecting, classifying, and counting vehicles from the MIOvision Traffic Camera Dataset (MIO-TCD) and 10 video clips covering various weather conditions in different traffic scenarios. For the evaluation of our proposed method, we consider three parameters, precision, recall, and accuracy, with respect to the number of epochs. Experimental results of the proposed framework for vehicle detection show an average accuracy of 94.4%, a precision of 0.89, and a recall of 0.86 across the different weather conditions using the YOLO algorithm. Figure 5 shows the accuracy and loss plots over 25 epochs using the binary cross-entropy function. The overall performance analysis of our methodology shows that, for detection, classification, and counting, and depending on the visibility of the vehicles in each frame, we reach an average accuracy of 94.4%.


Fig. 5 Accuracy and loss plot

5 Conclusion

In this study, the detection, classification, and counting of vehicles from real-time traffic videos is formulated as an object detection and classification framework. The detection, classification, and counting procedures are connected in this system, resulting in an approach that is resilient to various traffic circumstances, including changes in weather conditions. The results reveal that our method achieves a high level of accuracy, with an average of 94.4%, by combining Convolutional Neural Networks with YOLO when tested on the MIOvision TCD and traffic camera videos with complex backdrops. Furthermore, the framework is capable of counting vehicles, which will aid intelligent traffic control in the future by limiting the number of heavy and light vehicles allowed on specific routes and highways, thus significantly assisting and sustaining traffic flow during day and night. As future work, we aim to add a stage that tracks the vehicles and extracts the trajectories of the tracked vehicles.

References
1. Hu Z, Yang D, Zhang K, Chen Z (2020) Object tracking in satellite videos based on convolutional regression network with appearance and motion features. IEEE J Sel Top Appl Earth Obs Remote Sens 13:783–793. https://doi.org/10.1109/JSTARS.2020.2971657
2. Manohar N, Kumar YHS, Rani R, Kumar GH (2019) Convolutional neural network with SVM for classification of animal images. In: Sridhar V, Padma M, Rao K (eds) Emerging research in electronics, computer science and technology. Lecture Notes in Electrical Engineering, vol 545. Springer, Singapore. https://doi.org/10.1007/978-981-13-5802-9_48
3. Akshay S, Mytravarun TK, Manohar N, Pranav MA (2020) Satellite image classification for detecting unused landscape using CNN. In: 2020 international conference on electronics and sustainable communication systems (ICESC), pp 215–222. https://doi.org/10.1109/ICESC48915.2020.9155859
4. Sudha D, Priyadarshini J (2020) An intelligent multiple vehicle detection and tracking using modified vibe algorithm and deep learning algorithm. Soft Comput 1–13
5. Zhu H, Wang J, Xie K, Ye J (2018) Detection of vehicle flow in video surveillance. In: 2018 IEEE 3rd international conference on image, vision and computing (ICIVC), pp 528–532. https://doi.org/10.1109/ICIVC.2018.8492794
6. Geng H, Guan J, Pan H, Fu H (2018) Multiple vehicle detection with different scales in urban surveillance video. In: 2018 IEEE fourth international conference on multimedia big data (BigMM), pp 1–4. https://doi.org/10.1109/BigMM.2018.8499095
7. Song H, Liang H, Li H et al (2019) Vision-based vehicle detection and counting system using deep learning in highway scenes. Eur Transp Res Rev 11:51. https://doi.org/10.1186/s12544-019-0390-4
8. Chen L, Ye F, Ruan Y et al (2018) An algorithm for highway vehicle detection based on convolutional neural network. J Image Video Proc 2018:109. https://doi.org/10.1186/s13640-018-0350-2
9. Shiru Q, Xu L (2016) Research on multi-feature front vehicle detection algorithm based on video image. Chin Control Decis Conf (CCDC) 2016:3831–3835. https://doi.org/10.1109/CCDC.2016.7531653
10. Feng R, Fan C, Li Z, Chen X (2020) Mixed road user trajectory extraction from moving aerial videos based on convolution neural network detection. IEEE Access 8:43508–43519. https://doi.org/10.1109/ACCESS.2020.2976890
11. He Y, Li L (2018) A novel multi-source vehicle detection algorithm based on deep learning. In: 2018 14th IEEE international conference on signal processing (ICSP), pp 979–982. https://doi.org/10.1109/ICSP.2018.8652388
12. Momin BF, Mujawar TM (2015) Vehicle detection and attribute-based search of vehicles in video surveillance system. In: 2015 international conference on circuits, power and computing technologies [ICCPCT-2015], pp 1–4. https://doi.org/10.1109/ICCPCT.2015.7159405
13. Shobha BS, Deepu R (2018) A review on video based vehicle detection, recognition and tracking. In: 2018 3rd international conference on computational systems and information technology for sustainable solutions (CSITSS), pp 183–186. https://doi.org/10.1109/CSITSS.2018.8768743
14. Wang G, Xiao D, Gu J (2008) Review on vehicle detection based on video for traffic surveillance. In: 2008 IEEE international conference on automation and logistics, pp 2961–2966. https://doi.org/10.1109/ICAL.2008.4636684
15. Sowmya V, Radha R (2021) Heavy-vehicle detection based on YOLOv4 featuring data augmentation and transfer-learning techniques. J Phys: Conf Ser 1911(1):012029. https://doi.org/10.1088/1742-6596/1911/1/012029
16. Liu X, Dong Y, Deng Z (2020) Deep highway multi-camera vehicle re-id with tracking context. In: 2020 IEEE 4th information technology, networking, electronic and automation control conference (ITNEC), pp 2090–2093. https://doi.org/10.1109/ITNEC48623.2020.9085008
17. Lechgar H, Malaainine MEI, Rhinane H (2020) Spatio-temporal tracking of vehicles using deep learning, applied on aerial videos. In: 2020 IEEE international conference of moroccan geomatics (Morgeo), pp 1–7. https://doi.org/10.1109/Morgeo49228.2020.9121907
18. Bui KN, Yi H, Cho J (2020) A vehicle counts by class framework using distinguished regions tracking at multiple intersections. In: 2020 IEEE/CVF conference on computer vision and pattern recognition workshops (CVPRW), pp 2466–2474. https://doi.org/10.1109/CVPRW50498.2020.00297
19. Hou X, Wang Y, Chau L (2019) Vehicle tracking using deep SORT with low confidence track filtering. In: 2019 16th IEEE international conference on advanced video and signal based surveillance (AVSS), pp 1–6. https://doi.org/10.1109/AVSS.2019.8909903
20. Lou L, Zhang Q, Liu C, Sheng M, Zheng Y, Liu X (2019) Vehicles detection of traffic flow video using deep learning. In: 2019 IEEE 8th data driven control and learning systems conference (DDCLS), pp 1012–1017. https://doi.org/10.1109/DDCLS.2019.8908873


21. Dai Z et al (2019) Video-based vehicle counting framework. IEEE Access 7:64460–64470. https://doi.org/10.1109/ACCESS.2019.2914254
22. Gomaa A, Abdelwahab MM, Abo-Zahhad M (2020) Efficient vehicle detection and tracking strategy in aerial videos by employing morphological operations and feature points motion analysis. Multimed Tools Appl 79:26023–26043. https://doi.org/10.1007/s11042-020-09242-5
23. Akshay S, Thomas S, Prashanth AR (2016) Improved multiple object detection and tracking using KF-OF method. Int J Eng Technol 8(2):1162–1168
24. Chandrajit M, Rani NS, Nageshmurthy M (2021) Robust segmentation and classification of moving objects from surveillance video. IOP Conf Ser: Mater Sci Eng 1085:012009. https://doi.org/10.1088/1757-899X/1085/1/012009

Face Recognition Using Sketch Images

T. Darshana, N. Manohar, and M. Chandrajith

1 Introduction

Facial sketches are widely used by law enforcement agencies to strengthen suspect identification and apprehend those involved in criminal activities. An automated biometric approach to recognizing an individual based on certain physiological characteristics is called face recognition. Generally, the method of identifying a person through biometrics uses physiological factors (such as fingerprints, iris patterns, or faces) or behavioural patterns (such as writing style, tone, or keystroke pattern). Sketches used in forensic investigations are drawn by forensic artists following the verbal description given by eyewitnesses or the victim. In a forensic portrayal, the face description depends on the observer’s recollection, resulting in uncertainty in the facial attributes. This variability of face attributes has been disregarded by the majority of existing sketch retrieval solutions. Based on the literature survey conducted, it was observed that face photo-sketch recognition systems employ single, 2D, and 3D morphable models to vary facial features and thereby generate new images. Using deep learning techniques, they create a single sketch per face, and the combination of all the proposed systems with the leading method yields further performance gains for both viewed and forensic sketches. In today’s world, as technology advances, the number of crimes has also grown, and the sketch produced from the description given by the eyewitness or victim is often the only clue for identifying the culprit. We therefore study the use of a more advanced morphable model that allows more flexibility and more precision in the variation of facial features.

T. Darshana · N. Manohar (B)
Department of Computer Science, School of Computing, Amrita Vishwa Vidyapeetham, Mysore Campus, Mysore, India
e-mail: [email protected]
M. Chandrajith
Department of Computer Applications, Maharaja Institute of Technology, Mysore, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
D. S. Guru et al. (eds.), Data Analytics and Learning, Lecture Notes in Networks and Systems 779, https://doi.org/10.1007/978-981-99-6346-1_20


2 Related Work This project was completed after going through a lot of research. The initial idea of using face recognition to help the Police was introduced by Aamir Khan [1]. Later many researches and upgrades were discussed and published to improve the efficiency and challenges faced by their previous works. Some major ones are as follows: Cheng [2] introduced facial sketch-photo recognition, Hu Han’s matching composite sketches to facial photographs in 2016, Parkhi [3] described a method for the trouble of face pose discrimination by the usage of SVM in 2020 and the recent one by Pentland et al. [3] on a technique on face recognition from different angles and under varying illuminations. There are so many more works referred to and studied in the evolution of this project at different stages. They are briefly mentioned below. Xiaoou, and Tang presented automated facial image retrieval from police mugshot databases. It is crucial for agencies of the law. It will make it easier for detectives to discover or pin down prospective suspects. When a perpetrator’s visual picture is unavailable, the best substitute is typically a pencil drawing backed by an eyewitness’ recall. Aamir Khan [1] investigated the image retrieval of a problematic uneven face sketch. To deal with the modality gap among both sketches and facial photographs, we used many types of drawings made by the sketch maker and projected a fuzzy rule-based layering method. Localizing the face components yields the facial combinations. The proposed technique considerably outperforms the progressive methods. Pentland et al. [4] presented a technique for recognizing and categorizing emotions. The photos were compressed using an image compression technique (FMM). Viola Jones Haar-like object detector is then used to detect the person’s face in the compressed picture. Sheng et al. [5] proposed a unique strategy to use face attribute information for sketch-photo identification. The developed model can rebuild the image and sketch dimensions into a discriminative integrating subspace in a traditional manner. Using a paired deep neural network with face features supplied by eyewitnesses, they compared the system to progressive sketch-photo recognition methods and demonstrated its superiority. Face-matching forensic drawings to pictures is difficult for two basic reasons, according to Sim et al. [6]. (1) Sketches are frequently imperfect and do not adequately depict the subject’s face. Because the gallery images are photographs and the probe images are sketches, we must compare different images. Another key contribution is the large-scale matchup experiment performed with forensic sketches. Chang et al. [7] presented multiple classification algorithms that have already been studied, and in past years, classifiers such as the support vector machine, K nearest neighbor, and artificial and convolution neural networks have been popular. Parkhi [3] involves two contributions: First, they developed a process for assembling large datasets with small label noise while minimizing the amount of manual annotation needed. Several key ideas were developed in order to rank the data presented to annotators in a less sophisticated way. The procedure has been developed as a method for identifying faces, but can also be applied to other object classes or finer-grained tasks. A second analysis was to show that a deep CNN, without any frills but with appropriate training, can obtain results


equivalent to the best in the field. This observation will apply to many other applications. Wan [8] researched a difficult face recognition problem: matching composite drawings to facial pictures. They employed two freely accessible face composite software tools and developed component-based representation (CBR) to bridge the mode gap between composition sketches and facial pictures. Using a typical gallery set of mugshots, the suggested technique outperforms a state-of-the-art COTS face matcher considerably. The suggested technique outperforms when the gallery set is filtered for gender information. Tolba [9] presents a singular face reputation method for the usage of drawn sketches through remodeling a photo image into a sketch, and they lessen the distinction among pictures and comic strip. Wu [10] proposed a FaceNet primarily based face cartoon recognition method. The authentic gallery image photographs have been transferred to sketch style to reduce the discrepancy between face sketches and snapshots. Then the FaceNet was followed to generate a function vector for every face photograph which can be used to degree the face similarity. Experimental outcomes suggest the prevalence of the proposed method. This approach is that the preliminary strive and the properly skilled FaceNet are used. Parkhi [3] researched an adverse model for generating quality facial pictures from sketching. A function level loss was used to achieve adequate similarities of both the synthesized images and the ground fact shots, in addition to the commonly utilized discriminators for ensuring the integrity of the turbines had been employed. Face identification is the usage of the synthesized pictures in a collection set with 10K historical past images showing that the technique is instrumental in enhancing comic strip primarily based face recognition accuracy. Yu et al. [11] provide a paradigm for neural representation-based face caricature creation. The proposed approach divides neural fragments into key layers and uses them to act as goal convolution layers. Their approach makes use of the enhanced three-D Patch fit to seek in training photograph feature map areas inside the first degree. Then, the go-layer price aggregation in item areas is used to locate the first-class in shape with inter-layer uniformity for query neural patches in the second level. Because the technique uses neural patches rather than RGB patches to create images, the outputs should have clear sketch edgeswhile avoiding mosaic results. Mittal et al. [12] offer a unique approach for comparing reinforced drawings and facial photos. To effectively shape the heterogeneous data, the suggested technique performs inductive transfer on the functions obtained utilizing a deep studying framework. When compared to current techniques, the suggested algorithm produces step forward results on the thee-PRIP dataset. Experiments on a large library of 2400 themes further indicate that the suggested approach is adaptable and reaches a rank of forty and an accuracy of 58%. The face popularity challenge is defined as a situation in differentiation space, which fashions differences among facial photographs, according to Huang [13]. They conceptualize face popularity as an elegant problem in unique locations. Variation across identical people’s faces and diversity among appearances of different persons are two instances. 
By altering the interpretation of the decision surface, a similarity measure between faces is derived from examples of differences between faces. The FERET database is used for the experimentation with an SVM classifier. The effectiveness of


each verification and identification scenario was measured. Mittal et al. [12] established a component-based full technique and worldwide methods for face popularity and assessed their overall effectiveness in terms of resilience to pose alterations. Ten features are extracted to represent the face and then aggregated to a single feature and finally fed into SVM classifier. Each pattern detects the entire face, extracts it from the snapshot, and feeds it into the classification algorithm. The structures were studied on a dataset with 8.593 gray face images that covered faces turned around intensively as much as 400. Despite the fact that the global machine utilized a more effective classifier, the component-based totally machine surpassed the direct mapping in all circumstances. This implies that using facial additives as entry functions rather than the entire face sample greatly simplifies the assessment of face popularity. Zaho [14] offered a brand new improvement in components primarily based totally face reputation through incorporating a three-D morphable version into the education process. The three-D face version of everyone and sundry within the database based on face snap pictures of someone and a three-D morphable version was calculated. By rendering the three-D fashions below various poses and lighting fixtures conditions, a massive wide variety of artificial face snap shots is utilized to teach an element primarily based totally on the reputation device. An element primarily based totally on a reputation fee of round ninety-eight is finished for faces turned around up to ±360 in depth. A primary downside of the device became the want for a massive wide variety of education snap shots taken from viewpoints and below distinct lighting fixture conditions. Huang [15] provided a consumer-precise answer that is followed which calls for mastering consumer-precise assist vectors. As referred to before, SVM changed into skilled to differentiate among the populations of within-consumer and among-consumer distinction snap shots respectively. Regardless of visualization or pre-processing, SVM was used to retrieve relevant discriminating facts from the training data. To achieve this goal, they devised trials in which facial pictures are depicted in each Principal Component (PC) and Linear Discriminant (LD) subspace. To solve the multi-elegance category issue for a Kelegance category check, a set of K most effective pairwise coupling classifiers (O-PWC) is built by Phillips [16], each of which is the most dependable and useful for the equivalent beauty inside the sensation of rectangle error cross-entropy. The final choice may have been made by merging the outcomes of those K

3 Proposed Methodology

The proposed architecture for face recognition using sketch images has three phases: pre-processing, feature extraction, and classification. The workflow of the proposed technique is depicted in Fig. 1.


Fig. 1 Proposed architecture

3.1 Pre-processing

The process starts by loading the face image datasets; this is the first phase, pre-processing. This phase enhances the image quality so that the features important for face recognition can be extracted. The background of the original face images is cut out first, leaving only the facial region, so that the detailed aspects of the face can be extracted without background information. Here we employ Otsu thresholding for the binarization of the face images and control the resolution by adjusting the RGB values. The datasets consist of a number of face images with different angles, postures, and levels of detail. Since face images are subject to a variety of conditions, including posture, brightness, expression, and background, normalization is a crucial part of this phase. Normalization of the face image includes adjusting the

248

T. Darshana et al.

pixel intensity for better resolution and brightness, and connecting the coordinates of the eyes, nose, and mouth (detected by direct image processing technique) by rotation and translation of images. These images are our training data or the initial data that we are using to train our model. After the training datasets are fed to machine learning algorithms, the next step is to train the model on how to make predictions. Here our models discover and learn patterns from our Face sketch datasets.
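As a concrete illustration of this phase, the following minimal Python sketch (an assumed implementation, not the authors' exact code) crops the facial region with OpenCV's bundled Haar cascade, binarizes it with OTSU thresholding, and resizes it to a fixed resolution; the 128 × 128 output size and the helper name preprocess_face are illustrative assumptions.

import cv2

# Haar cascade shipped with OpenCV, used here only to isolate the facial region.
FACE_CASCADE = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def preprocess_face(image_path, size=(128, 128)):
    """Crop the face, binarize it with OTSU thresholding, and resize it."""
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Keep only the facial region so that background information is discarded.
    faces = FACE_CASCADE.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        raise ValueError("No face found in " + image_path)
    x, y, w, h = faces[0]
    face = gray[y:y + h, x:x + w]

    # OTSU thresholding selects the binarization threshold automatically.
    _, binary = cv2.threshold(face, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Normalize the resolution so that every sample has the same input size.
    return cv2.resize(binary, size)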

3.2 Feature Extraction and Classification In this step, sketches and their corresponding images are compared, one being the source and the other the target. This step helps to find and transfer the attributes of the sketches. Feature extraction is an important part of image processing. A layer-by-layer model that has been pre-trained outperforms a model that has not been pre-trained. Here the extraction of features is done automatically, since a deep learning approach is used. Features such as the eyes, nose, eyebrows, lips, ears, hair, and the orientation of the face are extracted with this approach. Our datasets are hand-drawn sketches, so they do not follow any predetermined pattern; their appearance depends entirely on the artist who draws them. Feature extraction is a form of dimensionality reduction in which the many pixels of an image are represented compactly in order to capture the distinct elements of a face sketch. It acts as a filtering step in which the most significant parts of the face are identified. Feature extraction is automated with a CNN model. The convolution layer is the foundation of the CNN and carries most of the computational load of the network.

In this paper, we use the CNN model shown in Fig. 2 for classification. Cropped face photos of 35 persons are sorted into two folders: training and testing. We train the CNN model with the photographs in the training folder, which contains 35 classes with 20 images per class, and then test it with the hand-drawn sketches in the testing folder, which contains 10 sketches, to see whether it can recognize the faces in unseen images. In our model there are 2 convolutional hidden layers, 2 max-pooling hidden layers, 1 flattening layer, 1 hidden ANN layer, and an output layer of 35 neurons, to which the input images are eventually mapped. The first convolutional layer applies 32 filters of size 3 × 3, producing an output with 32 feature maps. The spatial dimensions are then reduced by a max-pooling layer with a 2 × 2 filter. The next convolutional layer uses 64 filters of size 3 × 3, followed by another max-pooling layer with a 2 × 2 filter. The Conv2D layer has 128 filters in total. The model is then trained using our 35-class dataset. In our CNN model, a ReLU activation function is employed so that the model does not activate all of the neurons at the same time, reducing the amount of computation needed to run the neural network. The final layer is the output layer, trained for 35 classes. The training process is run for 10 epochs.
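The CNN configuration described above can be expressed as a short Keras sketch (an assumed implementation, not the authors' code): two 3 × 3 convolution layers with 32 and 64 filters, each followed by 2 × 2 max pooling, a flattening layer, one fully connected hidden layer, and a 35-way softmax output. The 128-unit size of the hidden layer and the 128 × 128 grey-scale input are assumptions made for illustration.

from tensorflow.keras import layers, models

def build_model(input_shape=(128, 128, 1), num_classes=35):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), activation="relu"),    # 32 feature maps
        layers.MaxPooling2D((2, 2)),                     # reduce spatial dimensions
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),            # hidden ANN layer (size assumed)
        layers.Dense(num_classes, activation="softmax")  # output layer for 35 classes
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_model()
# model.fit(train_ds, epochs=10)  # 10 epochs, as reported above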


Fig. 2 CNN architecture

4 Experimentation 4.1 Datasets Data needs context in order to be meaningful. Datasets that incorporate different challenges play an important role in testing the efficiency of the proposed system. Here we have 35 separate classes, in which each class contains 20 face pictures for training and 10 sketches for testing. Some of the challenges are that the orientation of the faces must be maintained precisely, since otherwise the machine is misled when finding distinguishing features for classification; the subject's position should be frontal; and the background should be plain. Hand-drawn facial sketch datasets were utilized in our research. A hand-drawn sketch is defined on the boundaries of a full-face picture and exhibits close similarity in proportions and features to a photograph of the same face. Figure 3 shows samples from our dataset.
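To make the dataset organisation concrete, the following sketch shows one possible way of loading such a 35-class collection with Keras, assuming one sub-folder per person; the directory names dataset/train and dataset/test, the image size, and the batch size are illustrative assumptions rather than details from the paper.

import tensorflow as tf

# One sub-folder per person; labels are inferred from the folder names.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset/train",              # face photographs of the 35 subjects
    image_size=(128, 128),
    color_mode="grayscale",
    label_mode="categorical",
    batch_size=32)

test_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset/test",               # hand-drawn sketches of the same subjects
    image_size=(128, 128),
    color_mode="grayscale",
    label_mode="categorical",
    batch_size=32)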

4.2 Results We obtained good accuracy in matching the sketches to regular images using the suggested methods, and the results were quite promising. We tried this method on several datasets with different thresholds and found the optimal threshold. There were several challenges: (a) the orientation of the faces should be maintained precisely, since otherwise the machine is misled when finding distinguishing marks for classification; (b) the subject's position should be frontal; and (c) the background should be plain. The proposed model can cope with most of these challenges. For the evaluation, we considered five parameters: Precision, Recall, Accuracy, F1 Score, and Support, presented as a graphical depiction of the evaluation metrics for the various datasets. The F1 Score, or F-Measure, gives a good indication of performance in identifying sketch images. This confirms that face sketch recognition has no uniformity and is still at a premature stage, with many possibilities and scope to evolve in the future. From the Accuracy and Precision values, it is evident that nothing can be guaranteed in this field and the results are still subject to error. There are only two points where accuracy and precision reach a decent value of 0.94 or above, and there are also a couple of points where both fall below 0.90. This also reflects our inputs, that is, the face sketches: since sketches are highly subjective and not uniform across images, limitations are inevitable.


Fig. 3 Sample datasets

Figure 4 shows the accuracy, precision, and recall of the system for the experiments conducted on the different datasets.
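The reported metrics can be reproduced from the predicted and ground-truth labels of the test sketches; the sketch below uses scikit-learn (an assumed tooling choice), with y_true and y_pred as placeholder names for illustration.

from sklearn.metrics import accuracy_score, classification_report

y_true = [0, 0, 1, 2, 2, 3]   # ground-truth class labels (placeholder values)
y_pred = [0, 1, 1, 2, 2, 3]   # labels predicted by the CNN (placeholder values)

print("Accuracy:", accuracy_score(y_true, y_pred))
# classification_report lists precision, recall, F1 score, and support per class.
print(classification_report(y_true, y_pred, zero_division=0))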

Fig. 4 Evaluation metric line chart


5 Conclusion The experiments have yielded very favorable results. We summarize the main approaches described in more than 30 papers on the recognition of face sketches. It is understood that the face sketch recognition field is young and that there are no uniform methods of evaluating such systems, so it would be imprudent to state explicitly which methods perform better. We encourage further research to focus on adopting common techniques in creating and developing face sketch recognition systems, by implementing a defined assessment process that allows forensic analysts to evaluate face sketch innovations in specific areas. As part of this project, a large number of images were collected. More than 500 face sketches were drawn of people with different colors, postures, and face textures. The images used were variably lit to provide distinct luminance conditions. Several challenges were faced, such as the need to maintain the orientation of the faces precisely (since otherwise the machine is misled when finding distinguishing features for classification), a frontal subject position, and a plain background. To overcome these challenges we included sketch images with different orientations. With the help of a strong dataset, the model was trained to provide matches with high accuracy and precision. Extensive experimentation was conducted to check the efficiency of the proposed system. The images to be matched are uploaded to a location separate from where the comparison and identification are carried out, and the input sketches are placed in a different, defined location. This project also owes much to the people who provided their pictures, the artists who drew the sketches, the library staff who provided books and insights on the subject matter, and my guide. Datasets with additional challenges can be considered in future work.

References
1. Aamir Khan M (2015) A fuzzy rule-based multimodal framework for face sketch-to-photo retrieval. Elsevier
2. Cheng Y (2016) A modified contrastive loss method for face recognition. Elsevier
3. Parkhi OM (2016) Deep face recognition. IEEE
4. Pentland A, Moghaddam B, Starner T (2021) View-based and modular eigenspaces for face recognition. In: Proceedings of IEEE CS conference on computer vision and pattern recognition, pp 84–91
5. Sheng B, Li P, Gao C, Ma K-L (2017) Deep neural representation guided face sketch synthesis. IEEE
6. Sim T, Sukthankar R, Mullin M, Baluja S (2017) Memory-based face recognition for visitor identification. In: AFGR
7. Chang K, Bowyer KW, Sarkar S (2021) Comparison and combination of ear and face images in appearance-based biometrics. IEEE Trans Pattern Anal Mach Intell 25(9)
8. Wan W (2018) FaceNet based face sketch recognition. IEEE
9. Tolba AS (2019) Face recognition: a literature review 12:831–835
10. Wu H (2018) Face recognition based on convolution siamese networks. IEEE
11. Yu S, Han H, Shan S, Dantcheva A, Chen X (2016) Improving face sketch recognition via adversarial sketch-photo transformation. IEEE
12. Mittal P, Vatsa M, Singh R (2019) Composite sketch recognition via deep network: a transfer learning approach. IEEE
13. Huang I (2020) Combined classifiers for invariant face recognition. Springer
14. Zhao W (2020) Robust image based 3D face recognition. University of Maryland. IEEE
15. Huang J (2018) Component-based face recognition with 3D morphable models. Springer, pp 30–35
16. Phillips PJ (2021) Face recognition: a literature survey. IEEE
17. Belhumeur P, Hespanha P, Kriegman D (2021) Recognition using class specific linear projection. IEEE Trans Pattern Anal Mach Intell 19(7):711–720
18. Brunelli R, Poggio T (2019) Face recognition: features versus templates. IEEE Trans Pattern Anal Mach Intell 15(10):1042–1052
19. Grudin MA (2019) A compact multi-level model for the recognition of facial images. Ph.D. thesis, Liverpool John Moores University