
Machine Vision Inspection Systems, Volume 2

Scrivener Publishing
100 Cummings Center, Suite 541J
Beverly, MA 01915-6106

Publishers at Scrivener
Martin Scrivener ([email protected])
Phillip Carmical ([email protected])

Machine Vision Inspection Systems, Volume 2
Machine Learning-Based Approaches

Edited by

Muthukumaran Malarvel

Chitkara University Institute of Engineering and Technology, Chitkara University, Punjab, India

Soumya Ranjan Nayak

Amity School of Engineering and Technology, Amity University Uttar Pradesh, Noida, India

Prasant Kumar Pattnaik

School of Computer Engineering, KIIT Deemed to be University, India

and

Surya Narayan Panda

Chitkara University Institute of Engineering and Technology, Chitkara University, Punjab, India

This edition first published 2021 by John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA and Scrivener Publishing LLC, 100 Cummings Center, Suite 541J, Beverly, MA 01915, USA
© 2021 Scrivener Publishing LLC
For more information about Scrivener publications please visit www.scrivenerpublishing.com.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

Wiley Global Headquarters
111 River Street, Hoboken, NJ 07030, USA

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Limit of Liability/Disclaimer of Warranty
While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials, or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read.

Library of Congress Cataloging-in-Publication Data
ISBN 978-1-119-78609-2

Cover image: Pixabay.Com
Cover design by: Russell Richardson

Set in size of 11pt and Minion Pro by Manila Typesetting Company, Makati, Philippines

Printed in the USA

10 9 8 7 6 5 4 3 2 1

Contents

Preface

1  Machine Learning-Based Virus Type Classification Using Transmission Electron Microscopy Virus Images
   Kalyan Kumar Jena, Sourav Kumar Bhoi, Soumya Ranjan Nayak and Chittaranjan Mallick
   1.1 Introduction
   1.2 Related Works
   1.3 Methodology
   1.4 Results and Discussion
   1.5 Conclusion
   References

2  Capsule Networks for Character Recognition in Low Resource Languages
   C. Abeysinghe, I. Perera and D.A. Meedeniya
   2.1 Introduction
   2.2 Background Study
       2.2.1 Convolutional Neural Networks
       2.2.2 Related Studies on One-Shot Learning
       2.2.3 Character Recognition as a One-Shot Task
   2.3 System Design
       2.3.1 One-Shot Learning Implementation
       2.3.2 Optimization and Learning
       2.3.3 Dataset
       2.3.4 Training Process
   2.4 Experiments and Results
       2.4.1 N-Way Classification
       2.4.2 Within Language Classification
       2.4.3 MNIST Classification
       2.4.4 Sinhala Language Classification
   2.5 Discussion
       2.5.1 Study Contributions
       2.5.2 Challenges and Future Research Directions
       2.5.3 Conclusion
   References

3  An Innovative Extended Method of Optical Pattern Recognition for Medical Images With Firm Accuracy—4f System-Based Medical Optical Pattern Recognition
   Dhivya Priya E.L., D. Jeyabharathi, K.S. Lavanya, S. Thenmozhi, R. Udaiyakumar and A. Sharmila
   3.1 Introduction
       3.1.1 Fourier Optics
   3.2 Optical Signal Processing
       3.2.1 Diffraction of Light
       3.2.2 Biconvex Lens
       3.2.3 4f System
       3.2.4 Literature Survey
   3.3 Extended Medical Optical Pattern Recognition
       3.3.1 Optical Fourier Transform
       3.3.2 Fourier Transform Using a Lens
       3.3.3 Fourier Transform in the Far Field
       3.3.4 Correlator Signal Processing
       3.3.5 Image Formation in 4f System
       3.3.6 Extended Medical Optical Pattern Recognition
   3.4 Initial 4f System
       3.4.1 Extended 4f System
       3.4.2 Setup of 45 Degree
       3.4.3 Database Creation
       3.4.4 Superimposition of Diffracted Pattern
       3.4.5 Image Plane
   3.5 Simulation Output
       3.5.1 MATLAB
       3.5.2 Sample Input Images
       3.5.3 Output Simulation
   3.6 Complications in Real Time Implementation
       3.6.1 Database Creation
       3.6.2 Accuracy
       3.6.3 Optical Setup
   3.7 Future Enhancements
   References

4  Brain Tumor Diagnostic System—A Deep Learning Application
   Kalaiselvi, T. and Padmapriya, S.T.
   4.1 Introduction
       4.1.1 Intelligent Systems
       4.1.2 Applied Mathematics in Machine Learning
       4.1.3 Machine Learning Basics
       4.1.4 Machine Learning Algorithms
   4.2 Deep Learning
       4.2.1 Evolution of Deep Learning
       4.2.2 Deep Networks
       4.2.3 Convolutional Neural Networks
   4.3 Brain Tumor Diagnostic System
       4.3.1 Brain Tumor
       4.3.2 Methodology
       4.3.3 Materials and Metrics
       4.3.4 Results and Discussions
   4.4 Computer-Aided Diagnostic Tool
   4.5 Conclusion and Future Enhancements
   References

5  Machine Learning for Optical Character Recognition System
   Gurwinder Kaur and Tanya Garg
   5.1 Introduction
   5.2 Character Recognition Methods
   5.3 Phases of Recognition System
       5.3.1 Image Acquisition
       5.3.2 Defining ROI
       5.3.3 Pre-Processing
       5.3.4 Character Segmentation
       5.3.5 Skew Detection and Correction
       5.3.6 Binarization
       5.3.7 Noise Removal
       5.3.8 Thinning
       5.3.9 Representation
       5.3.10 Feature Extraction
       5.3.11 Training and Recognition
   5.4 Post-Processing
   5.5 Performance Evaluation
       5.5.1 Recognition Rate
       5.5.2 Rejection Rate
       5.5.3 Error Rate
   5.6 Applications of OCR Systems
   5.7 Conclusion and Future Scope
   References

6  Surface Defect Detection Using SVM-Based Machine Vision System with Optimized Feature
   Ashok Kumar Patel, Venkata Naresh Mandhala, Dinesh Kumar Anguraj and Soumya Ranjan Nayak
   6.1 Introduction
   6.2 Methodology
       6.2.1 Data Collection
       6.2.2 Data Pre-Processing
       6.2.3 Feature Extraction
       6.2.4 Feature Optimization
       6.2.5 Model Development
       6.2.6 Performance Evaluation
   6.3 Conclusion
   References

7  Computational Linguistics-Based Tamil Character Recognition System for Text to Speech Conversion
   Suriya, S., Balaji, M., Gowtham, T.M. and Rahul, Kumar S.
   7.1 Introduction
   7.2 Literature Survey
   7.3 Proposed Approach
   7.4 Design and Analysis
   7.5 Experimental Setup and Implementation
   7.6 Conclusion
   References

8  A Comparative Study of Different Classifiers to Propose a GONN for Breast Cancer Detection
   Ankita Tiwari, Bhawana Sahu, Jagalingam Pushaparaj and Muthukumaran Malarvel
   8.1 Introduction
   8.2 Methodology
       8.2.1 Dataset
       8.2.2 Linear Regression
             8.2.2.1 Correlation
             8.2.2.2 Covariance
       8.2.3 Classification Algorithm
             8.2.3.1 Support Vector Machine
             8.2.3.2 Random Forest Classifier
             8.2.3.3 K-Nearest Neighbor Classifier
             8.2.3.4 Decision Tree Classifier
             8.2.3.5 Multi-Layered Perceptron
   8.3 Results and Discussion
   8.4 Conclusion
   References

9  Mexican Sign-Language Static-Alphabet Recognition Using 3D Affine Invariants
   Guadalupe Carmona-Arroyo, Homero V. Rios-Figueroa and Martha Lorena Avendaño-Garrido
   9.1 Introduction
   9.2 Pattern Recognition
       9.2.1 3D Affine Invariants
   9.3 Experiments
       9.3.1 Participants
       9.3.2 Data Acquisition
       9.3.3 Data Augmentation
       9.3.4 Feature Extraction
       9.3.5 Classification
   9.4 Results
       9.4.1 Experiment 1
       9.4.2 Experiment 2
       9.4.3 Experiment 3
   9.5 Discussion
   9.6 Conclusion
   Acknowledgments
   References

10 Performance of Stepped Bar Plate-Coated Nanolayer of a Box Solar Cooker Control Based on Adaptive Tree Traversal Energy and OSELM
   S. Shanmugan, F.A. Essa, J. Nagaraj and Shilpa Itnal
   10.1 Introduction
   10.2 Experimental Materials and Methodology
        10.2.1 Furious SiO2/TiO2 Nanoparticle Analysis of SSBC Performance Methods
        10.2.2 Introduction for OSELM by Use of Solar Cooker
        10.2.3 Online Sequential Extreme Learning Machine (OSELM) Approach for Solar Cooker
        10.2.4 OSELM Neural Network Adaptive Controller on Novel Design
        10.2.5 Binary Search Tree Analysis of Solar Cooker
        10.2.6 Tree Traversal of the Solar Cooker
        10.2.7 Simulation Model of Solar Cooker Results
        10.2.8 Program
   10.3 Results and Discussion
   10.4 Conclusion
   References

11 Applications to Radiography and Thermography for Inspection
   Inderjeet Singh Sandhu, Chanchal Kaushik and Mansi Chitkara
   11.1 Imaging Technology and Recent Advances
   11.2 Radiography and its Role
   11.3 History and Discovery of X-Rays
   11.4 Interaction of X-Rays With Matter
   11.5 Radiographic Image Quality
   11.6 Applications of Radiography
        11.6.1 Computed Radiography (CR)/Digital Radiography (DR)
        11.6.2 Fluoroscopy
        11.6.3 DEXA
        11.6.4 Computed Tomography
        11.6.5 Industrial Radiography
        11.6.6 Thermography
        11.6.7 Veterinary Imaging
        11.6.8 Destructive Testing
        11.6.9 Night Vision
        11.6.10 Conclusion
   References

12 Prediction and Classification of Breast Cancer Using Discriminative Learning Models and Techniques
   M. Pavithra, R. Rajmohan, T. Ananth Kumar and R. Ramya
   12.1 Breast Cancer Diagnosis
   12.2 Breast Cancer Feature Extraction
   12.3 Machine Learning in Breast Cancer Classification
   12.4 Image Techniques in Breast Cancer Detection
   12.5 Dip-Based Breast Cancer Classification
   12.6 RCNNs in Breast Cancer Prediction
   12.7 Conclusion and Future Work
   References

13 Compressed Medical Image Retrieval Using Data Mining and Optimized Recurrent Neural Network Techniques
   Vamsidhar Enireddy, Karthikeyan C., Rajesh Kumar T. and Ashok Bekkanti
   13.1 Introduction
   13.2 Related Work
        13.2.1 Approaches in Content-Based Image Retrieval (CBIR)
        13.2.2 Medical Image Compression
        13.2.3 Image Retrieval for Compressed Medical Images
        13.2.4 Feature Selection in CBIR
        13.2.5 CBIR Using Neural Network
        13.2.6 Classification of CBIR
   13.3 Methodology
        13.3.1 Huffman Coding
        13.3.2 Haar Wavelet
        13.3.3 Sobel Edge Detector
        13.3.4 Gabor Filter
        13.3.5 Proposed Hybrid CS-PSO Algorithm
   13.4 Results and Discussion
   13.5 Conclusion and Future Enhancement
        13.5.1 Conclusion
        13.5.2 Future Work
   References

14 A Novel Discrete Firefly Algorithm for Constrained Multi-Objective Software Reliability Assessment of Digital Relay
   Madhusudana Rao Nalluri, K. Kannan and Diptendu Sinha Roy
   14.1 Introduction
   14.2 A Brief Review of the Digital Relay Software
   14.3 Formulating the Constrained Multi-Objective Optimization of Software Redundancy Allocation Problem (CMOO-SRAP)
        14.3.1 Mathematical Formulation
   14.4 The Novel Discrete Firefly Algorithm for Constrained Multi-Objective Software Reliability Assessment of Digital Relay
        14.4.1 Basic Firefly Algorithm
        14.4.2 The Modified Discrete Firefly Algorithm
               14.4.2.1 Generating Initial Population
               14.4.2.2 Improving Solutions
               14.4.2.3 Illustrative Example
        14.4.3 Similarity-Based Parent Selection (SBPS)
        14.4.4 Solution Encoding for the CMOO-SRAP for Digital Relay Software
   14.5 Simulation Study and Results
        14.5.1 Simulation Environment
        14.5.2 Simulation Parameters
        14.5.3 Configuration of Solution Vectors for the CMOO-SRAP for Digital Relay
        14.5.4 Results and Discussion
   14.6 Conclusion
   References

Index

Preface

This edited book aims to bring together leading researchers, academic scientists, and research scholars to put forward and share their experiences and research results on all aspects of inspection systems for detection analysis in various machine vision applications. It also provides a premier interdisciplinary platform for educators, practitioners and researchers to present and discuss the most recent innovations, trends, methodologies, applications, and concerns, as well as the practical challenges encountered and solutions adopted in inspection systems based on machine learning approaches to machine vision for real-world and industrial applications.

The book is organized into fourteen chapters.

Chapter 1 deliberates on various dangerous infectious viruses that affect human society, with a detailed analysis of transmission electron microscopy virus images (TEMVIs). In this chapter, several TEMVIs such as Ebola virus (EV), Enterovirus (ENV), Lassa virus (LV), severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), and Zika virus (ZV) are analyzed. The ML-based approach mainly focuses on classification techniques such as Logistic Regression (LR), Neural Network (NN), k-Nearest Neighbors (kNN) and Naive Bayes (NB) for the processing of TEMVIs.

Chapter 2 focuses on identifying and differentiating handwritten characters using deep neural networks. As a solution to the character recognition problem in low resource languages, this chapter proposes a model that replicates the human cognitive ability to learn from small datasets. The proposed solution is a Siamese neural network which combines capsules and convolutional units to get a thorough understanding of the image. Further, this chapter attests that the capsule-based Siamese network can learn abstract knowledge about different characters which can be extended to unforeseen characters.

Chapter 3 presents the growth of optics with the development of lenses in terms of accuracy. The 4f-based optical system is used as a benchmark to develop a firm system for medical applications. Performing transforms with the optical system helps in improving accuracy. The image of the patient placed in the object plane is exposed to optical rays, and the biconvex lens between the object and Fourier plane performs an optical Fourier transform. The system indicates the normal or abnormal condition of the patient and supports high-speed pattern recognition with optical signals.

Chapter 4 studies the brain tumor diagnosis process on digital images using a convolutional neural network (CNN) as part of a deep learning model. To classify brain tumors, eight different CNN models were tested on magnetic resonance imaging (MRI). Additionally, a detailed discussion of machine learning algorithms and deep learning techniques is presented.

Chapter 5 focuses on optical character recognition. A detailed study is presented on handwritten identification and classification techniques and their applications. Furthermore, the chapter discusses their limitations, along with an overview of the precision rate of Artificial Neural Network-based approaches.

Chapter 6 presents an automated process for detecting defects on wood or metal surfaces. Monitoring the quality of raw material plays a crucial role in the production of a quality product. Therefore, this chapter develops a classification model using a multiclass support vector machine to identify defects present in wood.

Chapter 7 focuses on computational linguistics for text recognition and synthesis, speech recognition and synthesis, and conversion between text and speech. The chapter branches out towards a text-to-speech (TTS) system, which converts natural language text into speech, distinguishing itself from systems that render symbolic linguistic representations, such as phonetic transcriptions, into speech. It mainly deals with an intelligible text-to-speech program that allows a visually impaired person, or a person with a reading disability, to familiarize themselves with a language.

Chapter 8 surveys breast cancer among Indian females. The survey revealed that only 66.1% of women diagnosed with cancer survived. Various machine learning algorithms have been adopted in the literature to identify breast cancer tumors. In this chapter, a comparative study of existing classifiers such as support vector clustering (SVC), the decision tree classification algorithm (DTC), K-nearest neighbors (KNN), random forest (RF), and the multilayer perceptron (MLP) is demonstrated on the Wisconsin breast cancer dataset (WBCD) from the UCI Machine Learning Repository.

Chapter 9 focuses on communication for hearing-impaired people. Since most members of this community use sign language, it is extremely valuable to develop automated translators between sign language and spoken languages. This chapter reports the recognition of the Mexican sign-language static alphabet from 3D data acquired with Leap Motion and MS Kinect 1 sensors. The novelty of this research is the use of six 3D affine moment invariants for sign language recognition.

Chapter 10 presents a precise scientific design for a solar cooker. Traditional methods relying on human interference are overly trusted for thermal applications and cannot adapt to a variable source in the environment. In this chapter, a novel solar cooker is discussed, based on adaptive control through an Online Sequential Extreme Learning Machine (OSELM).

Chapter 11 discusses the uses and applications of X-ray images. A detailed study is conducted on radio-diagnosis, nuclear medicine, and radiotherapy, which remain strong pillars for inspection, diagnosis, and treatment delivery systems. Recent advances in artificial intelligence in radiography, such as computed tomography, are also discussed.

Chapter 12 addresses the detection and analysis of breast illnesses in mammography images. The chapter presents the use of overlay convolutional neural networks that allow characteristic extraction from mammography scans, which is thereafter fed into a recurrent neural network. This chapter would assist in tumor localization in cases of breast cancer.

Chapter 13 focuses on compression of medical images such as MRI, ultrasound, and other medical scans. Voluminous data is embedded in medically produced images from various procedures, producing images that need more storage space and are difficult to manage. Therefore, this chapter discusses compression of medical images, along with techniques to classify the compressed images, which are useful in telemedicine.

Chapter 14 presents the computer relay, a special-purpose system designed specifically for sensing anomalies in the power system. Since all modern engineered systems, including modern computer relays, contain increasing proportions of software sophistication, software reliability assessment has become very important. This chapter discusses a constrained multi-objective formulation of the optimal software reliability allocation problem and thereafter develops a customized Discrete Firefly Algorithm (DFA) to solve it, using computer relay software as a case study.

Muthukumaran Malarvel
Soumya Ranjan Nayak
Prasant Kumar Pattnaik
Surya Narayan Panda
November 2020

1
Machine Learning-Based Virus Type Classification Using Transmission Electron Microscopy Virus Images

Kalyan Kumar Jena1*, Sourav Kumar Bhoi1, Soumya Ranjan Nayak2 and Chittaranjan Mallick3

1 Department of Computer Science and Engineering, Parala Maharaja Engineering College, Berhampur, India
2 Amity School of Engineering and Technology, Amity University Uttar Pradesh, Noida, India
3 Department of Mathematics, Parala Maharaja Engineering College, Berhampur, India

Abstract

Viruses are submicroscopic infectious agents with the capability of replicating inside the living cells of the human body. Different dangerous infectious viruses greatly affect human society, along with plants, animals and microorganisms, and make the survival of human society very difficult. In this chapter, a Machine Learning (ML)-based approach is used to analyze several transmission electron microscopy virus images (TEMVIs). In this work, several TEMVIs such as Ebola virus (EV), Entero virus (ENV), Lassa virus (LV), severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and Zika virus (ZV) are analyzed. The ML-based approach mainly focuses on classification techniques such as Logistic Regression (LR), Neural Network (NN), k-Nearest Neighbors (kNN) and Naive Bayes (NB) for the processing of TEMVIs. The performance of these techniques is analyzed using the classification accuracy (CA) parameter. The simulation of this work is carried out using Orange3-3.24.1.

Keywords: ML, TEMVIs, classification techniques, LR, NN, kNN, NB

*Corresponding author: [email protected] Muthukumaran Malarvel, Soumya Ranjan Nayak, Prasant Kumar Pattnaik and Surya Narayan Panda (eds.) Machine Vision Inspection Systems, Volume 2: Machine Learning-Based Approaches, (1–22) © 2021 Scrivener Publishing LLC



1.1 Introduction

ML [1–34] plays an important role in today's era for researchers and scientists carrying out their research work, and is considered one of the most important applications of artificial intelligence. Using ML mechanisms, systems can learn and improve from experience automatically, without explicit programming. The main focus of ML is to develop computer programs that can access data and use it for learning. ML techniques can be mainly classified as unsupervised learning techniques and supervised learning techniques. Unsupervised learning techniques focus on clustering, while supervised learning techniques focus on classification. Hierarchical clustering, distance map, distance matrix, DBSCAN, manifold learning, k-means, Louvain clustering, etc. are some ML-based clustering techniques. ML [1–34] also offers several classification techniques such as LR, NN, kNN, NB, decision tree, random forest, AdaBoost, etc. Clustering techniques group similar objects into a set known as a cluster. Classification techniques are used to categorize a set of data into classes: the algorithm learns from the data input provided to it and then uses this learning to classify new observations. These techniques are mainly used to categorize data into a desired and distinct number of classes, where a label can be assigned to each class. Categorizing a set of data into classes accurately is a very challenging task, and several ML-based classification techniques can be used for this purpose; a small code illustration of the clustering/classification distinction is given at the end of this section.

Viruses [57, 58] are submicroscopic infectious agents with the capability of replicating inside the living cells of the human body. Viruses can be classified as DNA and RNA viruses on the basis of nucleic acid; as cubical, spiral, radial-symmetry and complex viruses on the basis of structure; and as bacteriophage, plant, animal and insect viruses on the basis of host range. Several viruses can be transmitted through the respiratory route, the feco-oral route, sexual contact, blood transfusion, etc. Very dangerous viruses such as SARS-CoV-2, EV, ENV, LV, ZV, dengue virus and Hepatitis C virus have adverse effects which greatly affect human society in the current scenario. In this work, several ML-based classification techniques such as LR, NN, kNN and NB are applied to several TEMVIs such as EV, ENV, LV, SARS-CoV-2 and ZV. The main contributions of this work are stated as follows.

• ML-based approach is used for the processing of several TEMVIs such as EV, ENV, LV, SARS-CoV-2 and ZV.

• ML-based approach focuses on several classification techniques such as LR, NN, kNN and NB for this processing.
• These techniques are compared using the classification accuracy (CA) performance metric.
• This work is carried out using Orange3-3.24.1.

The rest of the chapter is organized as follows. Section 1.2 describes related works, Section 1.3 describes the methodology for the processing of TEMVIs, Section 1.4 presents results and discussion, and Section 1.5 concludes the chapter.
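The distinction between clustering and classification drawn above can be illustrated in a few lines of code. The following is a minimal sketch assuming scikit-learn and toy two-dimensional data; neither the library choice nor the numbers are part of this chapter's own workflow, and they serve only as an illustration.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Toy data: two well-separated blobs of 2D points.
X = np.vstack([rng.normal(0.0, 0.5, size=(20, 2)),
               rng.normal(3.0, 0.5, size=(20, 2))])
y = np.array([0] * 20 + [1] * 20)

# Unsupervised learning: k-means groups the points into clusters
# without ever seeing the labels.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Supervised learning: kNN learns from the labelled data and then
# assigns a class label to a previously unseen point.
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
print(knn.predict([[2.8, 3.1]]))  # -> [1], the blob centred near (3, 3)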

1.2 Related Works

Different works have been introduced by several researchers and scientists for the processing of virus images, as well as other images, for a wide variety of applications in real-world scenarios [1–55]. Some of these works are described as follows. Singh et al. [2] review several ML and image processing techniques for the detection and classification of paddy leaf diseases. Al-Kasassbeh et al. [5] focus on a feature selection mechanism with the help of an ML-based approach for the classification of malware. Yang et al. [6] present a sequence embedding-based ML mechanism for the prediction of human–virus protein–protein interactions. Dey et al. [7] focus on ML-based techniques for sequence-based prediction of viral–host interactions between human proteins and SARS-CoV-2. Karanja et al. [9] use ML-based techniques and image texture features for the analysis of internet of things malware. Muda et al. [14] combine k-means clustering and NB classification for intrusion detection. Trishna et al. [17] use ML-based classifiers such as NB, k-nearest neighbors and random forest to detect Hepatitis A, B, C and E viruses. Kaur [19] focuses on ML-based approaches such as kNN and NB for the detection of credit card fraud. Goyal [20] presents a NB model based on an enhanced kNN classification mechanism for the prediction of breast cancer. Wahid et al. [22] analyze the performance of several ML-based techniques for the classification of microscopic bacteria images. Ito et al. [27] use a convolutional NN mechanism for the detection of virus particles in transmission electron microscopy (TEM) images. Devan et al. [28] apply a transfer learning mechanism to detect herpesvirus capsids in several TEM images.


1.3 Methodology

In this work, ML-based classification techniques [10, 11, 14–16] such as LR, NN, kNN and NB are used to carry out the classification of several TEMVIs such as EV, ENV, LV, SARS-CoV-2 and ZV.

The LR technique is used to predict the probability of a target (dependent) variable. Generally, this target variable has a dichotomous nature: it deals with data coded as 1 for yes or success and 0 for no or failure. A LR model can be used to predict a dependent variable by considering its relationship with one or more existing independent variables. The NN technique deals with a network of functions that understands and translates a data input of one form into another form as the required output. It consists of several layers of neurons, where each layer receives inputs from the previous layers and passes outputs to further layers. This technique can process complex data inputs into a space that computers are able to understand. The kNN technique uses all the available data and classifies new data points on the basis of similarity measures. It takes the k closest training examples in the feature space as input and generates a class membership as output. The NB technique uses Bayes' theorem and assumes that the presence of a particular feature in a class is unrelated to any other feature, so every pair of features is independent. It predicts the membership probability for each class, and the class with the highest probability is considered the most likely class.

In this work, the TEMVIs are first given as input to Orange3-3.24.1 [56]. Afterwards, image embedding is carried out, taking the input TEMVIs and generating embeddings (or skipped TEMVIs) as outputs. Several embedders such as Inception v3, SqueezeNet (local), VGG-16, VGG-19, Painters, DeepLoc and Openface can be used for image embedding; here, SqueezeNet (local) is used. Then, test and score calculation is carried out on the image embeddings by applying the LR, NN, kNN and NB techniques separately to compute CA values. For LR, the regularization type and strength are Ridge (L2) and C = 1, respectively. For NN, the number of neurons in hidden layers, activation function, solver method, regularization and maximal number of iterations are 100, ReLU, Adam, α = 0.0001 and 100, respectively, along with a replicable training mechanism. For kNN, the number of neighbors, metric and weight are 5, Euclidean and uniform, respectively. For test and score calculation, the inputs are data, test data, learner and preprocessor, and the outputs are evaluation results and predictions. Afterwards, a confusion matrix is generated to represent the classification results of each technique. For the confusion matrix, the input is the evaluation results from test and score, and the output is data (or selected data). Figure 1.1 describes the methodology; a code sketch of the same pipeline is given after the figure. The steps involved in this work are as follows.

Steps for TEMVIs Classification
Step 1: Input several categories of TEMVIs such as EV, ENV, LV, SARS-CoV-2 and ZV.
Step 2: Perform image embedding on the input TEMVIs.
Step 3: Carry out test and score calculation on the image embedding data by applying the LR, NN, kNN and NB techniques separately to compute CA values.
Step 4: Create a confusion matrix to represent the classification results of each technique.

Figure 1.1  Methodology (input TEMVIs → image embedding → test and score with LR, NN, kNN and NB → confusion matrix).
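The Orange workflow of Figure 1.1 can be approximated in code. The following is a minimal sketch assuming scikit-learn, with randomly generated stand-in features in place of the SqueezeNet embeddings; the classifier settings follow the chapter, while the data, the fold handling and the random seeds are illustrative assumptions only.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Stand-in for the image embedding step: 30 images (6 per class), each
# represented by a 1,000-dimensional feature vector. In the chapter these
# vectors come from the SqueezeNet (local) embedder inside Orange; random
# numbers are used here only to keep the sketch self-contained.
X = rng.normal(size=(30, 1000))
y = np.repeat(["Ebola", "Entero", "Lassa", "SARS-CoV-2", "Zika"], 6)

# The four classifiers, configured with the hyperparameters stated above.
models = {
    "LR": LogisticRegression(penalty="l2", C=1.0, max_iter=1000),
    "NN": MLPClassifier(hidden_layer_sizes=(100,), activation="relu",
                        solver="adam", alpha=0.0001, max_iter=100,
                        random_state=0),
    "kNN": KNeighborsClassifier(n_neighbors=5, metric="euclidean",
                                weights="uniform"),
    "NB": GaussianNB(),
}

# Plain (unstratified) k-fold is used here because, with only six images
# per class, stratified folds are impossible for NoF > 6.
for nof in (2, 3, 5, 10, 20):
    cv = KFold(n_splits=nof, shuffle=True, random_state=0)
    for name, model in models.items():
        pred = cross_val_predict(model, X, y, cv=cv)
        ca = accuracy_score(y, pred)    # classification accuracy (CA)
        cm = confusion_matrix(y, pred)  # rows: actual, columns: predicted
        print(f"NoF={nof:2d}  {name:3s}  CA={ca:.3f}")

With real embeddings in place of X, the same loop yields CA values analogous to those reported in Table 1.1, though the exact numbers depend on the embedder and on how the folds are assigned.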

1.4 Results and Discussion

This work uses Orange3-3.24.1 [56] for simulation. Several TEMVIs of different sizes are taken from the sources [59–88]. In this work, 30 TEMVIs, with 6 images in each category (EV, ENV, LV, SARS-CoV-2 and ZV), are taken for testing, as shown in Figures 1.2–1.6. The TEMVIs are processed using the ML-based classification techniques LR, NN, kNN and NB. The classification results of these techniques are shown in Figures 1.7–1.10, 1.11–1.14, 1.15–1.18, 1.19–1.22 and 1.23–1.26, corresponding to numbers of folds (NoF) of 2, 3, 5, 10 and 20, respectively; that is, five cases (cases I–V) are considered, one per NoF value. The test and score calculation is carried out using a cross-validation sampling mechanism with the different NoF. The confusion matrix for each classification technique reports actual versus predicted values as numbers of instances: correctly classified instances lie on the diagonal (highlighted in light blue in the original figures) and misclassified instances lie off the diagonal (light red). Table 1.1 reports the CA values obtained by applying the LR, NN, kNN and NB classification techniques.

Figure 1.2  Ebola virus images (1–6) with sizes 331 × 152, 254 × 198, 203 × 248, 189 × 267, 266 × 190, and 259 × 194, respectively.

Figure 1.3  Entero virus images (1–6) with sizes 225 × 225, 250 × 201, 225 × 225, 214 × 235, 191 × 264, and 209 × 190, respectively.

Figure 1.4  Lassa virus images (1–6) with sizes 251 × 201, 180 × 180, 259 × 194, 241 × 209, 262 × 192, and 299 × 168, respectively.

Figure 1.5  SARS-CoV-2 virus images (1–6) with sizes 225 × 225, 256 × 197, 254 × 198, 243 × 207, 249 × 203, and 300 × 168, respectively.

Figure 1.6  Zika virus images (1–6) with sizes 225 × 225, 202 × 250, 225 × 225, 211 × 239, 244 × 207, and 236 × 213, respectively.

Case-I (NoF = 2)

Actual \ Predicted   Ebola   Entero   Lassa   SARS-CoV-2   Zika    Σ
Ebola                    6        0       0            0      0    6
Entero                   0        3       2            0      1    6
Lassa                    0        2       1            2      1    6
SARS-CoV-2               0        0       1            3      2    6
Zika                     0        0       1            1      4    6
Σ                        6        5       5            6      8   30

Figure 1.7  Classification result by applying LR technique.

Actual \ Predicted   Ebola   Entero   Lassa   SARS-CoV-2   Zika    Σ
Ebola                    6        0       0            0      0    6
Entero                   1        3       2            0      0    6
Lassa                    0        2       2            1      1    6
SARS-CoV-2               0        1       2            2      1    6
Zika                     1        0       0            1      4    6
Σ                        8        6       6            4      6   30

Figure 1.8  Classification result by applying NN technique.

Actual \ Predicted   Ebola   Entero   Lassa   SARS-CoV-2   Zika    Σ
Ebola                    5        1       0            0      0    6
Entero                   0        6       0            0      0    6
Lassa                    0        3       1            1      1    6
SARS-CoV-2               0        1       3            2      0    6
Zika                     0        2       3            1      0    6
Σ                        5       13       7            4      1   30

Figure 1.9  Classification result by applying kNN technique.

Actual \ Predicted   Ebola   Entero   Lassa   SARS-CoV-2   Zika    Σ
Ebola                    6        0       0            0      0    6
Entero                   0        5       1            0      0    6
Lassa                    0        2       2            1      1    6
SARS-CoV-2               0        1       2            3      0    6
Zika                     0        1       0            1      4    6
Σ                        6        9       5            5      5   30

Figure 1.10  Classification result by applying NB technique.

Case-II (NoF = 3)

Actual \ Predicted   Ebola   Entero   Lassa   SARS-CoV-2   Zika    Σ
Ebola                    6        0       0            0      0    6
Entero                   0        3       2            0      1    6
Lassa                    0        1       1            2      2    6
SARS-CoV-2               0        0       2            3      1    6
Zika                     0        1       1            1      3    6
Σ                        6        5       6            6      7   30

Figure 1.11  Classification result by applying LR technique.

Actual \ Predicted   Ebola   Entero   Lassa   SARS-CoV-2   Zika    Σ
Ebola                    6        0       0            0      0    6
Entero                   0        5       1            0      0    6
Lassa                    0        2       1            2      1    6
SARS-CoV-2               0        0       4            2      0    6
Zika                     0        1       1            1      3    6
Σ                        6        8       7            5      4   30

Figure 1.12  Classification result by applying NN technique.

Actual \ Predicted   Ebola   Entero   Lassa   SARS-CoV-2   Zika    Σ
Ebola                    6        0       0            0      0    6
Entero                   0        6       0            0      0    6
Lassa                    0        3       1            2      0    6
SARS-CoV-2               0        1       0            5      0    6
Zika                     0        2       1            2      1    6
Σ                        6       12       2            9      1   30

Figure 1.13  Classification result by applying kNN technique.

Actual \ Predicted   Ebola   Entero   Lassa   SARS-CoV-2   Zika    Σ
Ebola                    6        0       0            0      0    6
Entero                   0        6       0            0      0    6
Lassa                    0        2       1            2      1    6
SARS-CoV-2               0        1       0            4      1    6
Zika                     0        1       1            2      2    6
Σ                        6       10       2            8      4   30

Figure 1.14  Classification result by applying NB technique.

Case-III (NoF = 5)

Actual \ Predicted   Ebola   Entero   Lassa   SARS-CoV-2   Zika    Σ
Ebola                    6        0       0            0      0    6
Entero                   0        5       1            0      0    6
Lassa                    0        1       1            2      2    6
SARS-CoV-2               0        0       1            4      1    6
Zika                     0        1       2            0      3    6
Σ                        6        7       5            6      6   30

Figure 1.15  Classification result by applying LR technique.

Actual \ Predicted   Ebola   Entero   Lassa   SARS-CoV-2   Zika    Σ
Ebola                    6        0       0            0      0    6
Entero                   0        6       0            0      0    6
Lassa                    0        1       2            2      1    6
SARS-CoV-2               0        0       2            3      1    6
Zika                     0        1       2            0      3    6
Σ                        6        8       6            5      5   30

Figure 1.16  Classification result by applying NN technique.

Actual \ Predicted   Ebola   Entero   Lassa   SARS-CoV-2   Zika    Σ
Ebola                    6        0       0            0      0    6
Entero                   0        6       0            0      0    6
Lassa                    0        3       1            2      0    6
SARS-CoV-2               0        1       0            4      1    6
Zika                     0        1       2            2      1    6
Σ                        6       11       3            8      2   30

Figure 1.17  Classification result by applying kNN technique.

Actual \ Predicted   Ebola   Entero   Lassa   SARS-CoV-2   Zika    Σ
Ebola                    5        1       0            0      0    6
Entero                   0        6       0            0      0    6
Lassa                    0        1       2            2      1    6
SARS-CoV-2               0        0       1            5      0    6
Zika                     0        1       0            3      2    6
Σ                        5        9       3           10      3   30

Figure 1.18  Classification result by applying NB technique.

Case-IV (NoF = 10)

Actual \ Predicted   Ebola   Entero   Lassa   SARS-CoV-2   Zika    Σ
Ebola                    6        0       0            0      0    6
Entero                   0        3       2            1      0    6
Lassa                    0        1       1            2      2    6
SARS-CoV-2               0        0       1            3      2    6
Zika                     0        1       0            1      4    6
Σ                        6        5       4            7      8   30

Figure 1.19  Classification result by applying LR technique.

Actual \ Predicted   Ebola   Entero   Lassa   SARS-CoV-2   Zika    Σ
Ebola                    6        0       0            0      0    6
Entero                   0        5       1            0      0    6
Lassa                    0        1       2            2      1    6
SARS-CoV-2               0        1       2            3      0    6
Zika                     0        1       0            1      4    6
Σ                        6        8       5            6      5   30

Figure 1.20  Classification result by applying NN technique.

Actual \ Predicted   Ebola   Entero   Lassa   SARS-CoV-2   Zika    Σ
Ebola                    6        0       0            0      0    6
Entero                   0        6       0            0      0    6
Lassa                    1        3       1            1      0    6
SARS-CoV-2               0        1       0            5      0    6
Zika                     0        1       2            2      1    6
Σ                        7       11       3            8      1   30

Figure 1.21  Classification result by applying kNN technique.

Actual \ Predicted   Ebola   Entero   Lassa   SARS-CoV-2   Zika    Σ
Ebola                    6        0       0            0      0    6
Entero                   0        6       0            0      0    6
Lassa                    0        2       2            1      1    6
SARS-CoV-2               0        1       0            5      0    6
Zika                     0        1       0            2      3    6
Σ                        6       10       2            8      4   30

Figure 1.22  Classification result by applying NB technique.

Case-V (NoF = 20)

Actual \ Predicted   Ebola   Entero   Lassa   SARS-CoV-2   Zika    Σ
Ebola                    6        0       0            0      0    6
Entero                   0        4       2            0      0    6
Lassa                    0        1       1            2      2    6
SARS-CoV-2               0        0       1            3      2    6
Zika                     0        1       0            1      4    6
Σ                        6        6       4            6      8   30

Figure 1.23  Classification result by applying LR technique.

Actual \ Predicted   Ebola   Entero   Lassa   SARS-CoV-2   Zika    Σ
Ebola                    6        0       0            0      0    6
Entero                   0        5       1            0      0    6
Lassa                    0        1       2            2      1    6
SARS-CoV-2               0        0       2            3      1    6
Zika                     0        1       0            1      4    6
Σ                        6        7       5            6      6   30

Figure 1.24  Classification result by applying NN technique.

Actual \ Predicted   Ebola   Entero   Lassa   SARS-CoV-2   Zika    Σ
Ebola                    6        0       0            0      0    6
Entero                   0        6       0            0      0    6
Lassa                    1        3       1            1      0    6
SARS-CoV-2               0        1       0            5      0    6
Zika                     0        1       2            2      1    6
Σ                        7       11       3            8      1   30

Figure 1.25  Classification result by applying kNN technique.

Actual \ Predicted   Ebola   Entero   Lassa   SARS-CoV-2   Zika    Σ
Ebola                    6        0       0            0      0    6
Entero                   0        6       0            0      0    6
Lassa                    0        1       3            1      1    6
SARS-CoV-2               0        1       0            5      0    6
Zika                     0        1       0            2      3    6
Σ                        6        9       3            8      4   30

Figure 1.26  Classification result by applying NB technique.

From the analysis of Figures 1.7–1.26 and Table 1.1, it is observed that the NB technique provides better classification results than LR, NN and kNN, with CA values of 0.667, 0.733 and 0.767, when the NoF is 2, 10 and 20, respectively. The NB and kNN techniques provide better classification results, with a CA value of 0.633, than the LR and NN techniques when the NoF is 3. Similarly, the NB and NN techniques provide better classification results, with a CA value of 0.667, than the other techniques when the NoF is 5. The maximum CA value, 0.767, is achieved by the NB technique when the NoF is 20. The CA value thus varies for each technique as the NoF changes. The NB technique provides the best classification results in three cases (cases I, IV and V). In case II, NB and kNN give the same CA value of 0.633, and in case III, NB and NN give the same CA value of 0.667; however, the numbers of instances in the individual cells of the confusion matrices differ between NB and kNN in case II, and between NB and NN in case III. Overall, the NB technique provides better classification results than the other techniques. A short numerical check of how CA follows from the confusion matrices is given after Table 1.1.

Table 1.1  CA (fraction of correctly classified images) of the different classification techniques.

Method   NoF = 2   NoF = 3   NoF = 5   NoF = 10   NoF = 20
LR         0.567     0.533     0.633      0.567      0.600
NN         0.567     0.567     0.667      0.667      0.667
kNN        0.467     0.633     0.600      0.633      0.633
NB         0.667     0.633     0.667      0.733      0.767
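The CA values in Table 1.1 follow directly from the confusion matrices: CA is the sum of the diagonal (correctly classified instances) divided by the total number of instances. A minimal check, using the NB matrix of Figure 1.26 (NoF = 20), is sketched below with NumPy.

import numpy as np

# Confusion matrix from Figure 1.26 (NB, NoF = 20).
# Rows: actual class, columns: predicted class,
# ordered Ebola, Entero, Lassa, SARS-CoV-2, Zika.
cm = np.array([
    [6, 0, 0, 0, 0],
    [0, 6, 0, 0, 0],
    [0, 1, 3, 1, 1],
    [0, 1, 0, 5, 0],
    [0, 1, 0, 2, 3],
])

ca = np.trace(cm) / cm.sum()  # 23 correct out of 30 instances
print(f"CA = {ca:.3f}")       # -> CA = 0.767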


1.5 Conclusion

This chapter focused on the processing of several TEMVIs such as EV, ENV, LV, SARS-CoV-2 and ZV using an ML-based approach. The TEMVIs were analyzed by applying ML-based classification techniques such as LR, NN, kNN and NB, each of which carried out the classification of the TEMVIs. From the analysis of the results, it is concluded that the CA values change for each classification technique when the NoF changes. The maximum CA value is provided by the NB technique when the NoF is 20, and the NB technique provides overall better classification results than the LR, NN and kNN techniques across the different NoF. In future, this work will be extended to analyze the performance of these ML-based classification techniques, along with other classification techniques, on other types of TEMVIs as well as on coronavirus disease-19 (COVID-19) images.

References

1. Ray, U., Chouhan, U., Verma, N., Comparative study of machine learning approaches for classification and prediction of selective caspase-3 antagonist for Zika virus drugs. Neural Comput. Appl., 32, 11311–11328, 2020.
2. Singh, J.P., Pradhan, C., Das, S.C., Image Processing and Machine Learning Techniques to Detect and Classify Paddy Leaf Diseases: A Review, in: Machine Learning and Information Processing, pp. 161–172, Springer, Singapore, 2020.
3. Cao, Z., Identification of the Association between Hepatitis B Virus and Liver Cancer using Machine Learning Approaches based on Amino Acid, in: Proceedings of the 2020 10th International Conference on Bioscience, Biochemistry and Bioinformatics, 2020, January, pp. 56–63.
4. Sambasivam, G. and Opiyo, G.D., A predictive machine learning application in agriculture: Cassava disease detection and classification with imbalanced dataset using convolutional neural networks. Egypt. Inform. J., 2020.
5. Al-Kasassbeh, M., Mohammed, S., Alauthman, M., Almomani, A., Feature Selection Using a Machine Learning to Classify a Malware, in: Handbook of Computer Networks and Cyber Security, pp. 889–904, Springer, Cham, 2020.
6. Yang, X., Yang, S., Li, Q., Wuchty, S., Zhang, Z., Prediction of human-virus protein–protein interactions through a sequence embedding-based machine learning method. Comput. Struct. Biotechnol. J., 18, 153–161, 2020.
7. Dey, L., Chakraborty, S., Mukhopadhyay, A., Machine Learning Techniques for Sequence-based Prediction of Viral–Host Interactions between SARS-CoV-2 and Human Proteins, Biomedical J., 2020.
8. Gibert, D., Mateu, C., Planes, J., The rise of machine learning for detection and classification of malware: Research developments, trends and challenges. J. Netw. Comput. Appl., 153, 1–22, 2020.
9. Karanja, E.M., Masupe, S., Jeffrey, M.G., Analysis of internet of things malware using image texture features and machine learning techniques. Internet Things, 9, 100153, 2020.
10. Sen, P.C., Hajra, M., Ghosh, M., Supervised Classification Algorithms in Machine Learning: A Survey and Review, in: Emerging Technology in Modelling and Graphics, pp. 99–111, Springer, Singapore, 2020.
11. Ahuja, R., Chug, A., Gupta, S., Ahuja, P., Kohli, S., Classification and Clustering Algorithms of Machine Learning with their Applications, in: Nature-Inspired Computation in Data Mining and Machine Learning, pp. 225–248, Springer, Cham, 2020.
12. Di Noia, A., Martino, A., Montanari, P., Rizzi, A., Supervised machine learning techniques and genetic optimization for occupational diseases risk prediction. Soft Comput., 24, 6, 4393–4406, 2020.
13. Firdausi, I., Erwin, A., Nugroho, A.S., Analysis of machine learning techniques used in behavior-based malware detection, in: 2010 Second International Conference on Advances in Computing, Control, and Telecommunication Technologies, 2010, December, IEEE, pp. 201–203.
14. Muda, Z., Yassin, W., Sulaiman, M.N., Udzir, N.I., Intrusion detection based on K-Means clustering and Naïve Bayes classification, in: 2011 7th International Conference on Information Technology in Asia, 2011, July, IEEE, pp. 1–6.
15. Chen, Y., Luo, Y., Huang, W., Hu, D., Zheng, R.Q., Cong, S.Z., Wang, X.Y., Machine-learning-based classification of real-time tissue elastography for hepatic fibrosis in patients with chronic hepatitis B. Comput. Biol. Med., 89, 18–23, 2017.
16. Shruthi, U., Nagaveni, V., Raghavendra, B.K., A review on machine learning classification techniques for plant disease detection, in: 2019 5th International Conference on Advanced Computing & Communication Systems (ICACCS), 2019, March, IEEE, pp. 281–284.
17. Trishna, T.I., Emon, S.U., Ema, R.R., Sajal, G.I.H., Kundu, S., Islam, T., Detection of Hepatitis (A, B, C and E) Viruses Based on Random Forest, K-nearest and Naïve Bayes Classifier, in: 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT), 2019, July, IEEE, pp. 1–7.
18. Mahajan, G., Saini, B., Anand, S., Malware Classification Using Machine Learning Algorithms and Tools, in: 2019 Second International Conference on Advanced Computational and Communication Paradigms (ICACCP), 2019, February, IEEE, pp. 1–8.
19. Kaur, D., Machine Learning Approach for Credit Card Fraud Detection (KNN & Naïve Bayes), in: International Conference on Innovative Computing & Communications (ICICC), 2020.
20. Goyal, S., Naïve Bayes Model Based Improved K-Nearest Neighbor Classifier for Breast Cancer Prediction, in: International Conference on Advanced Informatics for Computing Research, 2019, June, Springer, Singapore, pp. 3–11.
21. Devika, R., Avilala, S.V., Subramaniyaswamy, V., Comparative Study of Classifier for Chronic Kidney Disease prediction using Naive Bayes, KNN and Random Forest, in: 2019 3rd International Conference on Computing Methodologies and Communication (ICCMC), 2019, March, IEEE, pp. 679–684.
22. Wahid, M.F., Hasan, M.J., Alom, M.S., Mahbub, S., Performance Analysis of Machine Learning Techniques for Microscopic Bacteria Image Classification, in: 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT), 2019, July, IEEE, pp. 1–4.
23. Matuszewski, D.J. and Sintorn, I.M., Reducing the U-Net size for practical scenarios: Virus recognition in electron microscopy images. Comput. Methods Programs Biomed., 178, 31–39, 2019.
24. Kumar, D. and Maji, P., An Efficient Method for Automatic Recognition of Virus Particles in TEM Images, in: International Conference on Pattern Recognition and Machine Intelligence, 2019, December, Springer, Cham, pp. 21–31.
25. Steur, N.A. and Mueller, C., Classification of Viral Hemorrhagic Fever Focusing Ebola and Lassa Fever Using Neural Networks. Int. J. Mach. Learn. Comput., 9, 3, 334–343, 2019.
26. Dreiseitl, S. and Ohno-Machado, L., Logistic regression and artificial neural network classification models: A methodology review. J. Biomed. Inf., 35, 5–6, 352–359, 2002.
27. Ito, E., Sato, T., Sano, D., Utagawa, E., Kato, T., Virus particle detection by convolutional neural network in transmission electron microscopy images. Food Environ. Virol., 10, 2, 201–208, 2018.
28. Devan, K.S., Walther, P., von Einem, J., Ropinski, T., Kestler, H.A., Read, C., Detection of herpesvirus capsids in transmission electron microscopy images using transfer learning. Histochem. Cell Biol., 151, 2, 101–114, 2019.
29. Miranda-Saksena, M., Boadle, R.A., Cunningham, A.L., Preparation of Herpes Simplex Virus-Infected Primary Neurons for Transmission Electron Microscopy, in: Herpes Simplex Virus, pp. 343–354, Humana, New York, NY, 2020.
30. Prasad, S., Potdar, V., Cherian, S., Abraham, P., Basu, A., Team, I.N.N., Transmission electron microscopy imaging of SARS-CoV-2. Indian J. Med. Res., 151, 2–3, 241, 2020.
31. Roingeard, P., Raynal, P.I., Eymieux, S., Blanchard, E., Virus detection by transmission electron microscopy: Still useful for diagnosis and a plus for biosafety. Rev. Med. Virol., 29, 1, e2019, 2019.
32. Xie, L., Song, X.J., Liao, Z.F., Wu, B., Yang, J., Zhang, H., Hong, J., Endoplasmic reticulum remodeling induced by Wheat yellow mosaic virus infection studied by transmission electron microscopy. Micron, 120, 80–90, 2019.
33. Thomas, T., Vijayaraghavan, A.P., Emmanuel, S., Machine Learning and Cybersecurity, in: Machine Learning Approaches in Cyber Security Analytics, pp. 37–47, Springer, Singapore, 2020.
34. Mirjalili, S., Faris, H., Aljarah, I., Introduction to Evolutionary Machine Learning Techniques, in: Evolutionary Machine Learning Techniques, pp. 1–7, Springer, Singapore, 2020.
35. Jena, K.K., Mishra, S., Mishra, S., Bhoi, S.K., Stored Grain Pest Identification Using an Unmanned Aerial Vehicle (UAV)-Assisted Pest Detection Model, in: Machine Vision Inspection Systems: Image Processing, Concepts, Methodologies and Applications, vol. 1, pp. 67–83, 2020.
36. Nayak, S.R., Mishra, J., Khandual, A., Palai, G., Fractal dimension of RGB color images. Optik, 162, 196–205, 2018.
37. Nayak, S.R. and Mishra, J., Analysis of Medical Images Using Fractal Geometry, in: Histopathological Image Analysis in Medical Decision Making, pp. 181–201, IGI Global, Hershey, Pennsylvania, 2019.
38. Nayak, S.R., Mishra, J., Palai, G., Analysing Roughness of Surface through Fractal Dimension: A Review. Image Vision Comput., 89, 21–34, 2019.
39. Jena, K.K., Mishra, S., Mishra, S.N., Bhoi, S.K., Nayak, S.R., MRI Brain Tumor Image Analysis Using Fuzzy Rule Based Approach. J. Res. Lepid., 50, 98–112, 2019.
40. Nayak, S.R. and Mishra, J., A modified triangle box-counting with precision in error fit. J. Inf. Optim. Sci., 39, 113–128, 2018.
41. Jena, K.K., Mishra, S., Mishra, S.N., Bhoi, S.K., 2L-ESB: A Two Level Security Scheme for Edge Based Image Steganography. Int. J. Emerg. Technol., 10, 29–38, 2019.
42. Nayak, S.R., Mishra, J., Palai, G., An extended DBC approach by using maximum Euclidian distance for fractal dimension of color images. Optik, 166, 110–115, 2018.
43. Jena, K.K., Mishra, S., Mishra, S., Bhoi, S.K., Unmanned Aerial Vehicle Assisted Bridge Crack Severity Inspection Using Edge Detection Methods, in: 2019 Third International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), IEEE, pp. 284–289, 2019.
44. Nayak, S.R., Ranganath, A., Mishra, J., Analysing Fractal Dimension of Color Images, in: IEEE International Conference on Computational Intelligence and Networks 2015, pp. 156–159, 2019.
45. Jena, K.K., Mishra, S., Mishra, S.N., An Edge Detection Approach for Fractal Image Processing, in: Examining Fractal Image Processing and Analysis, pp. 1–22, IGI Global, Hershey, Pennsylvania, 2020.
46. Nayak, S.R., Mishra, J., Padhy, R., A New Extended Differential Box-Counting Method by Adopting Unequal Partitioning of Grid for Estimation of Fractal Dimension of Grayscale Images, in: Computational Signal Processing and Analysis, pp. 45–57, Springer, Singapore, 2018.
47. Bhoi, S.K., Panda, S.K., Jena, K.K., Mallick, C., Khan, A., A fuzzy approach to identify fish red spot disease, in: Grey Systems: Theory and Application, 2020.
48. Das, S.K., Nayak, S.R., Mishra, J., Fractal Geometry: The Beauty of Computer Graphics. J. Adv. Res. Dyn. Control Syst., 9, 10, 76–82, 2017.
49. Jena, K.K., Mishra, S., Mishra, S.N., An Algorithmic Approach Based on CMS Edge Detection Technique for the Processing of Digital Images, in: Examining Fractal Image Processing and Analysis, pp. 252–272, IGI Global, Hershey, Pennsylvania, 2020.
50. Nayak, S.R., Khandual, A., Mishra, J., Ground truth study on fractal dimension of color images of similar texture. J. Text. Inst., 109, 2018, 1159–1167, 2020.
51. Nayak, S.R., Mishra, J., Jena, P.M., Fractal analysis of image sets using differential box counting techniques. Int. J. Inf. Technol., 10, 39–47, 2018.
52. Nayak, S.R., Mishra, J., Palai, G., A modified approach to estimate fractal dimension of gray scale images. Optik, 161, 136–145, 2018.
53. Jena, K.K., Nayak, S.R., Mishra, S., Mishra, S.N., Vehicle Number Plate Detection: An Edge Image Based Approach, in: 4th Springer International Conference on Advanced Computing and Intelligent Engineering, Advances in Intelligent Systems and Computing, 2019.
54. Nayak, S.R., Mishra, J., Padhy, R., An improved algorithm to estimate the fractal dimension of gray scale images, in: International Conference on Signal Processing, Communication, Power and Embedded System, IEEE, pp. 1109–1114, 2016.
55. Nayak, S.R., Mishra, J., Jena, P.M., Fractal dimension of grayscale images, in: Progress in Computing, Analytics and Networking, pp. 225–234, Springer, Singapore, 2018.
56. https://orange.biolab.si/download/#windows [Accessed on April 11, 2020].
57. https://www.onlinebiologynotes.com/classification-of-virus/ [Accessed on May 14, 2020].
58. https://www.viprbrc.org/brc/home.spg?decorator=vipr [Accessed on May 14, 2020].
59. https://www.researchgate.net/figure/Transmission-electron-microscope-view-of-an-Ebolavirus-virion-The-bar-shows-an_fig1_269095800 [Accessed on May 29, 2020].
60. https://commons.wikimedia.org/wiki/File:Ebola_Virus_TEM_PHIL_1832_lores.jpg [Accessed on May 29, 2020].
61. https://time.com/3502740/ebola-virus-1976/ [Accessed on May 29, 2020].
62. https://en.wikipedia.org/wiki/Ebolavirus [Accessed on May 29, 2020].
63. https://www.wvik.org/post/why-wont-fear-airborne-ebola-go-away0#stream/0 [Accessed on May 29, 2020].
64. https://www.defense.gov/observe/photo-gallery/igphoto/2001104229/ [Accessed on May 29, 2020].
65. https://www.flickr.com/photos/nihgov/27385281096 [Accessed on May 29, 2020].
66. https://www.britannica.com/science/Zika-virus [Accessed on May 29, 2020].
67. https://en.wikipedia.org/wiki/Zika_virus [Accessed on May 29, 2020].
68. https://www.northcountrypublicradio.org/news/npr/495935879/zika-mystery-how-did-a-73-year-old-man-infect-his-son [Accessed on May 29, 2020].
69. https://www.mtu.edu/unscripted/stories/2018/november/be-brief-enveloped.html [Accessed on May 29, 2020].
70. https://www.mpi-magdeburg.mpg.de/3254770/2017-05-15-pm-zika-virus-propagation [Accessed on May 29, 2020].
71. https://www.nih.gov/news-events/nih-research-matters/novel-coronavirus-structure-reveals-targets-vaccines-treatments [Accessed on May 29, 2020].
72. http://www.sci-news.com/medicine/sars-cov-2-natural-origin-08242.html [Accessed on May 29, 2020].
73. https://www.sciencemag.org/news/2020/03/who-launches-global-megatrial-four-most-promising-coronavirus-treatments [Accessed on May 29, 2020].
74. https://www.genengnews.com/news/sars-cov-2-insists-on-making-a-name-for-itself/ [Accessed on May 29, 2020].
75. https://www.niaid.nih.gov/news-events/novel-coronavirus-sarscov2-images [Accessed on May 29, 2020].
76. https://www.soundhealthandlastingwealth.com/health-news/new-insights-into-sars-cov-2-viral-diversity/?utm_source=rss&utm_medium=rss&utm_campaign=new-insights-into-sars-cov-2-viral-diversity [Accessed on May 29, 2020].
77. https://www.flickr.com/photos/nihgov/43683984840 [Accessed on May 29, 2020].
78. https://www.nih.gov/news-events/news-releases/scientists-develop-novel-vaccine-lassa-fever-rabies [Accessed on May 29, 2020].
79. https://www.nytimes.com/2015/05/27/science/lassa-virus-carries-little-risk-to-public-experts-say.html [Accessed on May 29, 2020].
80. http://www.mrcindia.org/journal/issues/441001.pdf [Accessed on May 29, 2020].
81. https://www.dw.com/en/man-severely-ill-with-lassa-fever-being-treated-at-university-hospital-frankfurt/a-19122900 [Accessed on May 29, 2020].
82. https://fineartamerica.com/featured/1-lassa-virus-tem-science-source.html [Accessed on May 29, 2020].
83. https://www.cdc.gov/non-polio-enterovirus/resources-ev68-photos.html [Accessed on May 29, 2020].
84. https://www.researchgate.net/figure/TEM-image-of-Enterovirus-71-EV71-virus-like-particles-The-morphology-of-purified-VLPs_fig1_277783163 [Accessed on May 29, 2020].
85. https://www.nih.gov/news-events/nih-research-matters/enterovirus-infection-linked-acute-flaccid-myelitis [Accessed on May 29, 2020].
86. https://en.wikipedia.org/wiki/Enterovirus_C [Accessed on May 29, 2020].
87. https://www.emptywheel.net/tag/enterovirus-d68/?print=print [Accessed on May 29, 2020].
88. https://simple.wikipedia.org/wiki/Enterovirus [Accessed on May 29, 2020].

2

Capsule Networks for Character Recognition in Low Resource Languages

C. Abeysinghe, I. Perera and D.A. Meedeniya*

Department of Computer Science and Engineering, University of Moratuwa, Moratuwa, Sri Lanka

Abstract

Most existing techniques in handwritten character recognition are not well utilized for low resource languages, due to the lack of labelled data and the need for large datasets when classifying images with deep neural networks. In contrast to recent advances in deep learning-based image classification, human cognition can quickly identify and differentiate characters without much training. As a solution to the character recognition problem in low resource languages, this chapter proposes a model that replicates the human ability to learn from small datasets. The proposed solution is a Siamese neural network that combines capsules and convolutional units to gain a thorough understanding of the image. The presented model takes two images as inputs, processes them and extracts features through the capsule network, and outputs the probability of the two images being similar. This study attests that the capsule-based Siamese network can learn abstract knowledge about different characters that can be extended to unforeseen characters. The proposed model is trained on the Omniglot dataset and achieves up to 94% accuracy for previously unseen alphabets. Further, the model is tested on the Sinhala language alphabet and on the MNIST dataset, which stands for Modified National Institute of Standards and Technology database, both of which are new to the trained model.

Keywords: Character recognition, capsule networks, deep learning, one-shot learning, Sinhala dataset

*Corresponding author: [email protected]



2.1 Introduction

The ability to learn visual concepts from a small number of examples is a distinctive ability of human cognition. For instance, even a child can correctly distinguish between a bicycle and a car after being shown one example. Taking this one step further, if we show them a plane and a ship, which they have never seen before, they can correctly understand that these are two different vehicle types. One could argue that this ability is an application of previous experience and domain knowledge to new situations. How can we reproduce this ability in machines? In this chapter, we propose a method to transfer previously learned knowledge about characters in order to differentiate between new character images.

There are versatile applications of image classification using few training samples [1–3]. Being able to classify images without any previous training possesses great importance in situations like character recognition, signature verification, and robot vision. This paradigm, where only one sample is used to learn and make predictions, is known as one-shot learning [4]. Especially when it comes to low resource languages, currently available deep learning techniques fail due to the lack of large labeled datasets. If a model could perform one-shot learning for an alphabet, using a single image per class as a training sample for classification, that model could make a massive impact on optical character recognition [5].

This chapter uses the Omniglot dataset [6] to train such a one-shot learning model. Omniglot stands for the online encyclopedia of writing systems and languages; it is a dataset of handwritten characters, widely used in similar tasks that need a small number of data samples belonging to many classes. In this research, we extend the dataset by introducing a set of characters from the Sinhala language, which has around 17 million native speakers and is mainly used only in Sri Lanka. Due to the lack of available resources for the language, using novel deep learning-based Optical Character Recognition (OCR) methods is challenging. With the trained model introduced in this chapter, significant character recognition accuracy was achieved for the Sinhala language using a small dataset.

Character detection using one-shot learning has been addressed previously by researchers such as Lake et al. [6], using a generative character model, and Koch et al. [7], using Convolutional Neural Networks (CNN). In this study, we focus on using capsule networks integrated into a Siamese network [8] to learn a generalized abstract function that outputs the similarity of two images. Capsule networks are the latest advancement in the computer vision domain, and they possess several advantages over traditional convolutional layers [9].

Translation invariance, the inability to identify the position of an object relative to another, is one main shortcoming of convolutional layers compared to capsules [10]. Further, the use of global pooling in CNNs causes a loss of valuable information. Hinton et al. [11] have proposed capsule networks as a solution to these problems. In this study, by using a capsule-based network architecture, we achieve performance equal to the deep convolutional Siamese networks proposed in previous literature, but using a smaller number of parameters. The main contributions of the study are to:
• Propose a novel capsule-based Siamese network architecture to perform one-shot learning,
• Improve the energy function of the Siamese network to capture the complex information output by capsules,
• Evaluate and analyse the performance of the model in identifying characters that have not been seen previously,
• Extend the Omniglot dataset by adding new characters from the Sinhala language.
The chapter is structured as follows. Section 2.2 explores related learning techniques. Section 2.3 describes the design and implementation aspects of the proposed solution for the capsule layers-based Siamese network. Section 2.4 evaluates the methodology using several experiments and analyses the results. Section 2.5 discusses the contribution of the proposed solution with respect to existing studies and concludes the chapter.

2.2 Background Study 2.2.1 Convolutional Neural Networks Convolutional neural networks have been commonly used in computer vision research and applications [12] due to their ability to process a large amount of data and extract meaningful and powerful representations from it [13–15]. Before the era of CNNs, computer vision tasks largely relied on handcrafted features and mathematical modeling. There a large number of applications that relies on features Gabor wavelets [16–18], fractal dimensions [19–21], symmetric axis chords [22]. However, when it comes to handwritten character classification for low resource languages, the deep neural network’s this ability becomes more of a limitation, as not much of labeled training data available.

26  Machine Vision Inspection Systems An ideal solution for handwritten character recognition should be based on zero-shot learning, where no previous sample used to classify or oneshot learning, where only one or few samples are used for training [23]. Several attempts have been made to modify different deep neural networks to match requirements of one-shot learning [24–26].

2.2.2 Related Studies on One-Shot Learning

Initial attempts at one-shot learning in the computer vision domain were based on probabilistic approaches. Fei-Fei et al. [4], in 2003, introduced a model to learn visual concepts and then use that knowledge to learn new categories. They used a variational Bayesian framework, in which probabilistic models represent the object categories and a probability density function denotes the prior knowledge. Their model supported learning four visual concepts: human faces, airplanes, motorcycles, and spotted cats. Initially, abstract knowledge is learned by training on many samples belonging to three of the categories; this knowledge is then used to understand the remaining category with the help of a small number of examples (1 to 5 training examples).

Lately, neural networks have come in as a solution to the one-shot learning problem. The two main types of networks used in one-shot learning tasks are memory augmented neural networks [26, 27] and Siamese neural networks [7, 24, 28]. Memory augmented neural networks are similar to recurrent neural networks (RNN), but they have an external memory and try to separate the computation from the memory [29]. Siamese networks have two similar branches, and their outputs are compared to reach a decision on the one-shot task. Most of the time, the Siamese network branches are built on convolutional layers or fully connected layers.

2.2.3 Character Recognition as a One-Shot Task

Lake et al. [6], in 2013, introduced the Omniglot dataset and defined a one-shot learning problem on it as a handwritten character recognition task. Omniglot is a handwritten character dataset similar to the digit dataset named MNIST, which stands for Modified National Institute of Standards and Technology database [30]. However, in contrast to MNIST, Omniglot has 1,623 characters belonging to 50 alphabets, with only 20 samples per character, whereas MNIST has only ten classes and thousands of samples for each class. In order to accurately categorize characters in Omniglot, Lake et al. proposed a one-shot learning approach named Hierarchical Bayesian Program Learning (HBPL) [6]. Their approach is based on decomposing

characters into strokes and determining a structural description for the detected pixels. Here, the strokes in different characters are identified using the knowledge gained from previous characters. However, this method cannot be applied to complex images, since it uses stroke data to determine the class; further, inference under HBPL is difficult because it has a vast parameter space [7]. In the proposed solution with the capsule layers-based Siamese network, we borrow the problem defined by Lake et al. and propose a novel solution that works in a more human-like way. The above-mentioned methods needed some manual feature engineering, but in human cognition the required features are learned along with the process of learning new visual concepts. For example, when we observe a car, we spontaneously decompose it into wheels, body, and internal parts; moreover, to differentiate it from a bicycle, we use those learned features. A similar process can be replicated in machines using capsule neural networks.

Koch et al. [7], in 2015, proposed a model using Siamese neural networks as a solution to the one-shot learning problem. They used the same dataset and approach as Lake et al. [6], but their model used convolutional units in neural networks to achieve an understanding of the image. According to Hinton et al. [11], CNNs are misguided in what they are trying to achieve and far from how human visual perception works; hence, they proposed capsules instead of convolutions. In this chapter, we present a Siamese neural network based on capsule networks to solve the one-shot learning problem. The idea of the capsule was first proposed by Hinton et al. in 2011 and later used in numerous applications [31, 32]. Generally, CNNs aim for viewpoint invariance of the "neuron" activities, so that characters can be recognized irrespective of the viewing angle; this is performed by a single scalar output that recaps the tasks of replicated feature detectors [9]. In contrast to CNNs, capsule networks use local "capsules" that perform computations on their inputs internally and encapsulate the results into an informative output vector [11]. Sabour et al. [9] proposed an algorithm to train capsule networks based on the concept of routing by agreement between capsules. Dynamic routing helps to achieve equivariance, while CNNs can only achieve invariance through the pooling layers.

Table 2.1 summarizes the techniques used in the related studies on one-shot learning. Accordingly, most studies have used capsule-based techniques in recent years; this could be because capsule networks show better generalization with small datasets. In this chapter, we design a Siamese network similar to that of Koch et al., but with useful modifications to accommodate the more complex details captured by capsules.

Table 2.1  Comparison of related studies.

Related work              Bayesian network   Neural network   Siamese neural network   Capsule network
Lake et al. [6]           X
Koch et al. [7]                                               X
Hinton et al. [11]                                                                     X
Bertinetto et al. [24]                       X
Chen et al. [13]                             X
Fei-Fei et al. [4]        X
Liu et al. [15]                              X
Bromley et al. [28]                                           X
Kumar et al. [31]                                                                      X
Zhao et al. [32]                                                                       X
Sabour et al. [9]                                                                      X
Sethy et al. [12]                            X

A Siamese network is a twin network that takes two images as input and feeds them to weight-sharing twin branches. Our contributions in this chapter include exploring the applicability of capsules in Siamese networks, introducing a novel architecture to handle the intricate details of the capsule output, and integrating recent advancements in deep capsule networks [33, 34] into Siamese networks.

2.3 System Design

In this research, we define character image classification as a subproblem of character verification and develop an image verification model that learns a function F, as shown in Equation (2.1):

F: {Xi, Xj} → Pi,j    (2.1)

which gives the probability Pi,j of the two images Xi and Xj belonging to the same category. We expect the model to learn a general representation of images that can be applied to unseen data without any further training.

After fine-tuning the model for the verification task, we use a one-shot learning approach to classify images, as explained in Section 2.3.1.

We propose a Siamese architecture for character verification, comprising a twin capsule network and a fully connected network. This architecture mainly consists of a weight-sharing twin network, a vector difference layer, and a fully connected network, as shown in Figure 2.1. The Siamese network takes two input images, runs them through a feature extraction process and comparison layers, and finally outputs the probability of the images belonging to the same category. The twin network starts with a convolutional layer and has four capsule layers before the fully connected entity capsule layer. The results from the twin network are merged using a vector difference function and input to the fully connected part to predict the final probability. To explain the structure and functionality, we decompose the Siamese network into three main sections: the twin network, the difference layer, and the fully connected network.

Twin network: The twin network consists of two similar networks that share weights between them. The purpose of sharing weights is to obtain the same output from both networks when the same image is fed to them. Since we wanted the twin network to learn how to extract features that help distinguish images, convolutional layers, capsule layers, and deep capsule layers were used; the deep capsule layers-based model gave the best performance. The capsule network consists of four capsule layers. Since we consider relatively simple images with plain backgrounds, having many layers has little effect. The first layer is a convolutional layer with 256 kernels of size 9 × 9 and a stride of 5, to discover basic features in the 2D image. The second, third, and fourth layers are capsule layers with 32 channels of 4-dimensional capsules, where each capsule consists of 4 convolutional units with a 3 × 3 kernel and strides of 2 and 1, respectively. The next capsule layer contains 16 channels of 6-dimensional capsules, each consisting of a convolutional unit with a 3 × 3 kernel and a stride of 2. The sixth layer is a fully connected capsule layer named the entity capsule layer; it contains 20 capsules of 16 dimensions. We use the dynamic routing proposed in Ref. [9] between the final convolutional capsule layer and the entity capsule layer, with three routing iterations.

Vector difference layer: After the twin network identifies and extracts the important features of the two input images, the vector difference layer is used to compare those features and reach a final decision about similarity. Each capsule in the twin network is trained to extract an exact type of property or entity, such as an object or part of an object. Here, the length of the output vector represents the probability of feature detection, while its direction represents the state of the detected feature [11].

Figure 2.1  Siamese network architecture. Two weight-sharing branches (a convolution layer followed by capsule layers and a 20 × 16 entity capsule layer) process the 105 × 105 input images; the branch outputs meet in a vector difference layer followed by fully connected layers of 1024, 256, 32, and 1 neurons.

For example, when an identified feature changes its state by a translation, the detection probability, and hence the vector length, remains the same, while the orientation changes. Due to this property, it is not enough to take a scalar difference using the L1 distance; a more complex vector difference must be taken and analysed. We obtain 20 vectors of dimension 16 after the difference layer and feed them to a fully connected network.

Fully connected network: The fully connected network comprises four fully connected layers with the parameters shown in Figure 2.1. Except for the last fully connected layer, which has a sigmoid activation, the fully connected layers use the Rectified Linear Unit (ReLU) activation [35]. In this study, multiple fully connected layers are used to analyse the complex output of the vector difference layer and obtain an accurate probability.
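A minimal PyTorch sketch of this architecture is given below. It assumes a simplified convolutional capsule layer in which the squashing non-linearity is applied directly to convolution outputs, and it replaces the dynamic routing step into the entity capsule layer with a plain linear map for brevity; the class names (SiameseCapsNet, CapsuleBranch, ConvCapsule) and the padding choices are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

def squash(v, dim=-1):
    # Squashing non-linearity: preserves vector orientation, maps length into [0, 1)
    n2 = (v ** 2).sum(dim=dim, keepdim=True)
    return (n2 / (1.0 + n2)) * v / torch.sqrt(n2 + 1e-8)

class ConvCapsule(nn.Module):
    # Convolutional capsule layer: a convolution producing n_caps * cap_dim feature
    # maps, reshaped into capsule vectors and squashed (routing omitted here).
    def __init__(self, in_ch, n_caps, cap_dim, stride):
        super().__init__()
        self.n_caps, self.cap_dim = n_caps, cap_dim
        self.conv = nn.Conv2d(in_ch, n_caps * cap_dim, 3, stride, padding=1)
    def forward(self, x):
        out = self.conv(x)
        b, _, h, w = out.shape
        out = squash(out.view(b, self.n_caps, self.cap_dim, h, w), dim=2)
        return out.view(b, self.n_caps * self.cap_dim, h, w)

class CapsuleBranch(nn.Module):
    # One branch of the twin network: 256-kernel 9 x 9 conv stem with stride 5,
    # stacked capsule layers, then 20 entity capsules of dimension 16.
    def __init__(self):
        super().__init__()
        self.stem = nn.Conv2d(1, 256, kernel_size=9, stride=5)
        self.caps = nn.Sequential(
            ConvCapsule(256, 32, 4, stride=2),
            ConvCapsule(128, 32, 4, stride=1),
            ConvCapsule(128, 16, 6, stride=2),
        )
        self.entity = nn.LazyLinear(20 * 16)  # stand-in for the routed entity capsules
    def forward(self, x):
        x = F.relu(self.stem(x))
        x = self.caps(x)
        return squash(self.entity(x.flatten(1)).view(-1, 20, 16), dim=2)

class SiameseCapsNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.branch = CapsuleBranch()  # one branch called twice = shared weights
        self.head = nn.Sequential(     # fully connected part of Figure 2.1
            nn.Linear(20 * 16, 1024), nn.ReLU(),
            nn.Linear(1024, 256), nn.ReLU(),
            nn.Linear(256, 32), nn.ReLU(),
            nn.Linear(32, 1), nn.Sigmoid(),
        )
    def forward(self, x1, x2):
        d = self.branch(x1) - self.branch(x2)  # vector difference layer
        return self.head(d.flatten(1))

A forward pass on a pair of 105 × 105 grayscale image batches, SiameseCapsNet()(x1, x2), returns the predicted probability that the two images belong to the same character class.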

2.3.1 One-Shot Learning Implementation

The goal of this study is to classify characters in new alphabets. After fine-tuning the model for the verification job, we expect it to have learned a function general enough to distinguish between any two images. Hence, we can model character classification as a one-shot learning task that uses only one sample to learn or perform a particular task [6]. This study creates a reference set covering all possible classes with only one image each, feeds the verification model with pairs formed from the test image and each image in the reference set, and predicts a class using the similarity scores given by the model. This approach is further extended for accuracy improvement and testing purposes, as explained in Section 2.4.

2.3.2 Optimization and Learning

The proposed methodology learns the optimal model parameters by optimizing a cost function defined over the expected output and the actual result. The binary cross-entropy function [36] is used, as given in Equation (2.2), to quantify the prediction accuracy. Here θ denotes the parameters of the model, and the symbols xi, xj and yi,j represent the input image, the reference image, and the expected output, respectively. The output of the function F increases if the reference and test images match; otherwise, the function tries to decrease the value. The Adam optimizer [37] is used to optimize this cost function.

L(xi, xj, θ) = yi,j · log(F(xi, xj)) + (1 – yi,j) · log(1 – F(xi, xj))    (2.2)
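A minimal training step, assuming the SiameseCapsNet sketch above and an illustrative learning rate (the chapter adjusts the rate manually during training), can be written as follows; torch.nn.BCELoss computes the negated binary cross-entropy of Equation (2.2), so minimizing it maximizes the objective above.

import torch

model = SiameseCapsNet()                                   # from the earlier sketch
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # lr is an assumption
loss_fn = torch.nn.BCELoss()                               # binary cross-entropy, Eq. (2.2)

def train_step(x1, x2, y):
    # x1, x2: (B, 1, 105, 105) image pairs; y: (B, 1) floats, 1 = same class
    optimizer.zero_grad()
    loss = loss_fn(model(x1, x2), y)
    loss.backward()
    optimizer.step()
    return loss.item()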


2.3.3 Dataset

This study focuses on the character domain. Therefore, we use the Omniglot dataset to train the model to learn a discriminative function and the features of the images. The Omniglot dataset consists of 1,623 handwritten characters belonging to 50 alphabets [6]. Each character has 20 samples, each written by one of 20 individuals through the Amazon Mechanical Turk platform. The dataset is divided into a training set with 30 alphabets and a test set with 20 alphabets. For the training sessions, we use data from the training set only and validate using the data in the test set.

2.3.4 Training Process

The learning model is trained on an AWS EC2 instance consisting of four vCPUs and an Nvidia Tesla V100 GPU with 16 GB of memory. We trained our models for up to 500 epochs while manually adjusting the learning rate depending on the convergence. Before model training, the images were paired; for images of the same category the expected prediction is 1, and for others 0. Data fetching is done on the CPU while the pairs are fed to and processed on the GPU, which significantly reduced the training time. Algorithm 1 states the data generation process for model training. The process takes the category list of the characters and the images belonging to each category as inputs, and generates the image couples and the expected output values as outputs. The process starts by generating similar couples. As stated in line 1, the loop goes through each character category and generates the couples belonging to the same category, as given in the get_similar_couples function in line 2. These image couples are added to the output array training_couples in line 3, along with the expected values in line 4: for matching image couples the prediction is one, hence the value 1 is added to the expected values array once for each couple. In lines 5 and 6, the algorithm loops through the category list twice and checks for identical categories in line 7. If the two categories are the same, the process immediately goes to the next iteration of the loop, using the continue keyword in line 8. If the categories differ, the process generates the mismatching image couples from the images of the two categories, as given in line 9, and adds them to the training_couples array in line 10. Since these are false couples, the prediction should be zero; thus, in line 11, the value 0 is added to the expected values array once for each entry of the image_couples array.

After that, in line 12, the output arrays are shuffled before training the model, to generate random training samples. A runnable version of this procedure is sketched after the listing.

Algorithm 1: Data generation
Input: cat_list[], category_images[]
Output: training_couples[], expected_values[]
1. for (category in cat_list[])
2.   image_couples = get_similar_couples(category_images[category])
3.   training_couples.add(image_couples)
4.   expected_values.add([1] * image_couples.length)
5. for (category1 in cat_list[])
6.   for (category2 in cat_list[])
7.     if (category1 == category2)
8.       continue
9.     image_couples = get_different_couples(category_images[category1], category_images[category2])
10.    training_couples.add(image_couples)
11.    expected_values.add([0] * image_couples.length)
12. shuffle(training_couples, expected_values)
13. return training_couples[], expected_values[]
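The following is a runnable Python equivalent of Algorithm 1. The pairing and joint shuffling follow the listing; the sampling inside get_similar_couples and get_different_couples is an assumption, since the chapter does not specify how many couples are drawn per category pair.

import itertools
import random

def get_similar_couples(images):
    # All unordered pairs of images within one category (assumed pairing rule)
    return list(itertools.combinations(images, 2))

def get_different_couples(images1, images2, k=20):
    # k random cross-category pairs (k is an assumed sampling budget)
    return [(random.choice(images1), random.choice(images2)) for _ in range(k)]

def generate_training_data(cat_list, category_images):
    training_couples, expected_values = [], []
    for category in cat_list:                      # lines 1-4: matching couples
        couples = get_similar_couples(category_images[category])
        training_couples.extend(couples)
        expected_values.extend([1] * len(couples))
    for cat1 in cat_list:                          # lines 5-11: mismatching couples
        for cat2 in cat_list:
            if cat1 == cat2:
                continue
            couples = get_different_couples(category_images[cat1],
                                            category_images[cat2])
            training_couples.extend(couples)
            expected_values.extend([0] * len(couples))
    paired = list(zip(training_couples, expected_values))
    random.shuffle(paired)                         # line 12: shuffle jointly
    training_couples, expected_values = map(list, zip(*paired))
    return training_couples, expected_values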

2.4 Experiments and Results

The proposed methodology was experimented with using a few models based on capsule networks, keeping the convolutional Siamese network proposed by Koch et al. as a baseline. As an initial attempt to understand the applicability of capsules in Siamese networks, we integrated the network proposed by Sabour et al. [9] into a Siamese network, which did not give satisfactory results due to its inability to converge properly. Sabour et al. proposed their model for the MNIST dataset, a collection of 28 × 28 images of digits; in our study, however, we scale this model out to the 105 × 105 images of the Omniglot dataset, which makes the learning model highly compute-intensive to train. To mitigate the high computational cost, improvements were made to the previous model based on the ideas proposed in DeepCaps [34]: multiple capsule layers were stacked, and the L1 distance layer was finally replaced with a vector difference layer. Validation accuracies for the different models are reported in Table 2.2. Here, three Siamese networks were tested while keeping the convolutional Siamese network [7] as the base.

Table 2.2  Model validation accuracy.

Model                            Agreement (%)
Convolutional Siamese            94 ± 2%
Sabour et al. Capsule Siamese    78 ± 5%
Deep Capsule Siamese 1           89 ± 3%
Deep Capsule Siamese 2           95 ± 2.5%

The network purely based on Sabour et al. [9] showed poor performance, while Capsule Siamese 1, with deep capsule networks, and Capsule Siamese 2, with deep capsules and the new vector difference layer, show performance on par with the base model. This indicates that the original Siamese network with classical capsule layers is not generalized enough.

2.4.1 N-Way Classification

One expectation of this model is the ability to generalize previous experience and use it to make decisions about completely new, unseen alphabets. Thus, the n-way classification task was designed to evaluate the model in classifying previously unseen characters. Here, we used the 20 alphabets, comprising 659 characters, from the evaluation set of the Omniglot dataset, which were not used in training; this makes the model completely unfamiliar with these characters. In this experiment, we designed the one-shot learning task as deciding the category of a given test image X out of n given categories. For an n-way classification task, we selected n character categories and selected one character category from the same set as the test category. The one-shot task is then prepared with one test image X from the test category and a reference image set {Xn}, one image for each character category. The Siamese network is fed with the (X, Xn) couples and predicts the similarity. The belonging category n* is selected as the category with the maximum similarity, as in Equation (2.3); the argmax function denotes the index n that maximizes F.

n* = argmaxn∈N F(X, Xn)    (2.3)

The model is evaluated by n-way classification, with n varying in the range [1, 40]; the results are depicted in Figure 2.2.
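A sketch of this n-way one-shot evaluation, assuming the SiameseCapsNet model interface used above (the function name and tensor shapes are illustrative), follows Equation (2.3) directly:

import torch

@torch.no_grad()
def n_way_one_shot(model, test_image, reference_images):
    # test_image: (1, 1, 105, 105); reference_images: (N, 1, 105, 105),
    # one reference image per candidate class. Returns the index n* of Eq. (2.3).
    model.eval()
    n = reference_images.size(0)
    scores = model(test_image.expand(n, -1, -1, -1), reference_images)
    return scores.argmax().item()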

Figure 2.2  Omniglot one-shot learning performance of Siamese networks: accuracy (%) against the number of possible classes in one-shot tasks, for the convolutional Siamese network, the capsule layer-based Siamese network, nearest neighbour classification, and random guessing.

According to Figure 2.2, the proposed capsule layer-based Siamese network has classification results on par with Koch et al.'s convolutional Siamese network. However, our model has 2.4 million parameters, which is 40% fewer than the 4 million parameters of Koch et al.'s model. Although the overall performance of the two models is on par, there are certain cases where our model shows superior performance; for instance, the proposed model has a superior capability for identifying minor changes in characters. For the n-way classification task, random guessing is defined statistically: if there are n options and only one is correct, the chance of a prediction being correct is 1/n, and for the repeated experiment the accuracy is the corresponding percentage. Here, the classification accuracy drops as the reference set grows, because the solution space of the classification task becomes larger. The nearest neighbour method degrades steeply, while the Siamese networks show a smaller reduction and a similar level of performance to each other. Figure 2.3 shows the classification results obtained by the different models, namely the 20-way classification task (top), the capsule Siamese network (middle), and the convolutional Siamese network (bottom); it shows sample test images and the corresponding classification results. The capsule-based architecture was able to identify small changes in image structure, as shown in the middle row. Figure 2.3 illustrates a few 20-way classification problems in which the proposed capsule layers-based Siamese network outperforms the convolutional Siamese network. In most of these cases, the convolutional network fails to identify minor changes in the image, such as small line segments and curves; with the detailed features extracted through capsules, such decisions were made easy in the proposed capsule network model.

Figure 2.3  Sample 1 classification results: three test images with the images selected in the 20-way classification task by the capsule Siamese network and by the convolutional Siamese network.

Figure 2.4 depicts a few samples where the proposed capsule network model fails to classify characters correctly. For certain characters, there is a vast difference in writing styles between two people; in such cases, the proposed capsule layers-based Siamese network underperforms compared to the CNN, and the convolutional units successfully identify the character where the capsule network fails. As a solution to the decrease in n-way classification accuracy, we propose n-shot learning instead of one-shot learning. In one-shot learning, we use only one image from each class in the reference set; in n-shot learning, we use n images for each category and select the category with the highest total similarity, as in Equation (2.4), where argmax is the argument maximizing the summation, X denotes the test image, and F(X, Xi,n) states the similarity score.

Figure 2.4  Sample 2 classification results: test images with the images selected in the 20-way classification task by the capsule Siamese network and by the convolutional Siamese network.

n* = argmaxn∈N Σi F(X, Xi,n)    (2.4)
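A corresponding sketch of Equation (2.4), again assuming the model interface used in the earlier sketches, sums the similarity scores over the n reference images of each class:

import torch

@torch.no_grad()
def n_shot_classify(model, test_image, reference_sets):
    # reference_sets: a list with one (n, 1, 105, 105) tensor per class.
    model.eval()
    totals = []
    for refs in reference_sets:
        k = refs.size(0)
        scores = model(test_image.expand(k, -1, -1, -1), refs)
        totals.append(scores.sum())            # inner summation of Eq. (2.4)
    return int(torch.stack(totals).argmax())   # predicted class index n*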

Accuracies obtained with n-shot learning for 2-, 6-, 20- and 28-way classification are illustrated in Figure 2.5. There is no significant improvement for test cases with a small classification set; however, when the classification set is large, n-shot learning can significantly improve performance. For instance, 28-way classification accuracy improves from 78 to 90% when 20 images per class are used in the reference set. The classification accuracy improves as the number of samples compared against increases: for n-way classification with small n, 100% accuracy is achieved with a few samples, while more complex tasks need a greater number of samples.

2.4.2 Within Language Classification

In n-way testing, we use characters from different languages, but the accuracy obtained for an individual language is the main determinant for this research. Language-wise classification accuracy was evaluated by preparing one-shot tasks with characters taken from a single alphabet, and the results are presented in Table 2.3. These results are based on nearest neighbour and 1-shot capsule network classification within individual alphabets. We selected the nearest neighbour method because it is a simple classification method that uses raw pixel values.

Figure 2.5  Omniglot n-shot n-way learning performance: accuracy (%) against the number of samples (shots) used for learning, for 2-, 6-, 20- and 28-way learning.

Table 2.3  Classification accuracies within individual alphabets.

Alphabet               Characters   Nearest neighbor   1-shot capsule network
Aurek-Besh             25           6.40%              84.40%
Angelic                19           6.32%              76.84%
Keble                  25           2.00%              71.20%
Atemayar Qelisayer     25           4.00%              62.80%
Tengwar                24           3.33%              62.08%
ULOG                   25           3.60%              61.60%
Syriac (Serto)         22           6.36%              58.64%
Atlantean              25           2.80%              58.00%
Avesta                 25           5.20%              57.60%
Cyrillic               44           2.05%              57.05%
Sinhala                60           1.00%              56.22%
Ge'ez                  25           1.60%              52.40%
Mongolian              29           4.83%              52.07%
Glagolitic             44           1.82%              50.68%
Manipuri               39           3.08%              50.51%
Malayalam              46           3.26%              45.87%
Tibetan                41           2.93%              45.61%
Sylheti                27           4.07%              40.37%
Gurmukhi               44           2.27%              38.41%
Oriya                  45           1.56%              33.33%
Kannada                40           1.00%              29.25%

Thus, it is evident that language-level classification accuracy is inversely related to the number of characters in the language. Another critical factor that influences accuracy is the structural similarity between characters.

For further analysis, we consider the alphabets with the same number of characters that have shown the highest and lowest classification accuracies. Consider Gurmukhi (38.41% accuracy) and Cyrillic (57.05% accuracy), which have the same number of characters (44) but accuracies differing by 18.64%. This difference could be due to the structural similarity between characters within each alphabet; Figure 2.6 shows the two alphabets. For the same reason, we obtain lower accuracies for within-language classification compared to the mixed-language n-way classification described in Section 2.4.1. Further, in an attempt to boost classification accuracy, we used n-shot learning, keeping 10 images for each character in the alphabet as the reference set and 10 images for averaging the results. In this experiment, we obtained a 7 to 15% accuracy improvement, resulting in a highest accuracy of 94% for the Aurek-Besh alphabet and a lowest accuracy of 40% for the Oriya alphabet.

2.4.3 MNIST Classification

The Omniglot dataset has more than 1,600 character classes but only 20 samples for each category. In contrast, the MNIST dataset has 10 classes and 60,000 training samples in total [30]. Since the proposed model of this study aims to learn abstract knowledge about characters and extend it to identify new characters, by treating MNIST as a whole new alphabet with 10 characters, we can use the proposed capsule layers-based Siamese network model to classify the MNIST dataset. Table 2.4 shows the accuracy values obtained by different MNIST models.

Figure 2.6  Gurmukhi (left) and Cyrillic (right) alphabets.

Table 2.4  Accuracies of different MNIST models.

MNIST model                                               Accuracy
1-layer NN [18]                                           88%
2-layer NN [18]                                           95.3%
Large convolutional NN [25]                               99.5%
Proposed capsule layer-based Siamese network (1-shot)     51%
Proposed capsule layer-based Siamese network (20-shot)    74.5%

Here, large neural networks have achieved more than 90% accuracy, while the proposed capsule layers-based Siamese network model reached 74.5% accuracy with only 20 images per class. The MNIST dataset is a benchmark for image classification algorithms and has been solved to more than 90% accuracy, as summarized in Table 2.4. These methods are based on deep neural networks and use all the 60K characters in the dataset. Although the proposed capsule layers-based Siamese network model shows only 51% accuracy on the MNIST dataset, it uses only one sample for each digit class, while the other models have access to more than 60,000 samples. The proposed solution improves this accuracy by using the same n-shot learning technique: with 20 samples, the accuracy improves by 23.5%, as depicted in Figure 2.7. Thus, the classification accuracy on the MNIST dataset is improved from 51 to 74.5% by using a greater number of samples.

Figure 2.7  MNIST n-shot learning performance: accuracy (%) against the number of samples used for learning (1 to 20).


2.4.4 Sinhala Language Classification

One of the main goals of this research is to evaluate the performance of one-shot learning for the Sinhala language. Using conventional deep learning approaches is not an option for Sinhala character recognition due to a lack of datasets. The Sinhala language has 60 characters, making it a complex alphabet. For each character in the Sinhala alphabet, we added 20 new images to the Omniglot dataset. First, we classified Sinhala characters with a model that was not trained on Sinhala characters and achieved 49% accuracy. After training the model with 5% of the Sinhala dataset, the accuracy improved to 56%. Of the languages used in the experiment, Sinhala has the largest alphabet; compared to some other languages with fewer characters, the model gave better accuracy for Sinhala. This could be due to significant visual structural differences between its characters.

2.5 Discussion

2.5.1 Study Contributions

This chapter has presented a novel architecture for image verification using the Siamese network structure and capsule networks. We improved the energy function used in the Siamese network to extract the complex details output by capsules and obtained performance on par with Siamese networks based on convolutional units [7], using a significantly smaller number of parameters. Another major objective of this study was to replicate the human ability to understand completely new visual concepts using previously learnt knowledge. Capsule-based Siamese networks can learn a well-generalized function that can be effectively extended to previously unseen data. We evaluated this capability using n-way classification with one-shot learning: the results showed more than 80.5% classification accuracy with 20 different characters of which the model had no previous experience. Moreover, the model was evaluated on the MNIST dataset, which is considered a de facto dataset for evaluating image classification models [30]. The proposed capsule layers-based Siamese network showed 51% classification accuracy using only one image for each digit. The latest deep learning models achieve more than 90% accuracy [39], but they use all the 60K images available in the MNIST dataset. The solution proposed by this study improved the one-shot learning accuracies by using the n-shot learning method, that is, using n samples from each image class to do the classification.

In this way, accuracies were improved by 23.5% using 20 samples. As depicted in Figure 2.5, even 28-way learning showed a classification accuracy of 90% with the Omniglot dataset, while the MNIST dataset achieved 74.5% accuracy, as shown in Table 2.4. Further, we extended the Omniglot dataset by adding a new set of characters for the Sinhala language; this contains 600 new handwritten character images for the 60 characters in the alphabet. The proposed model gave 49% accuracy for Sinhala without any training stage, and showed a classification accuracy of 56.22% using only one reference image after training, as shown in Table 2.3. Comparing with the related studies, Koch et al. [7] used a convolutional layer-based Siamese network to solve the one-shot problem defined by the authors of the Omniglot dataset [6], showing an accuracy of 94% for class-independent classification. This is similar to the performance of the proposed capsule layers-based Siamese network model; in contrast, the capsule layers achieve this accuracy with 40% fewer parameters. In an experiment with the MNIST dataset using one-shot learning, Koch et al. achieved 70% accuracy [7] and Vinyals et al. [27] showed 72% accuracy, while the proposed capsule layers-based Siamese network model gave 74.5% accuracy with n-shot learning. The approach in Vinyals et al. [27] is based on memory augmented neural networks (MANN) and has a structure similar to recurrent neural networks with external memory.

2.5.2 Challenges and Future Research Directions

Although the proposed solution has shown more than 50% accuracy, the general threshold for the tested languages, for most of the alphabet types in the Omniglot dataset, it used a small set of images to achieve that accuracy. This limitation could be surpassed by using handcrafted features, which is time-consuming. In the proposed capsule layers-based Siamese network model, the accuracy of within-language classification depends on two factors: the number of characters in the alphabet and the visual difference between characters. Some alphabets have visually similar characters; in such cases, although the number of characters in the alphabet is small, the classification accuracy becomes low. Thus, the system architecture could be improved by representing the image features using transfer learning. Here, features could be extracted from each character image using a pre-trained deep neural network, and those features passed to the Siamese network. This study can be extended by integrating the model into a complete OCR pipeline incorporating a character segmentation and reconstruction

algorithm. It is also possible to analyse the applicability of the proposed model to complex datasets such as ImageNet [40] and COCO [41] by deepening the Siamese network. Additionally, the knowledge learnt from printed character classification can be used to classify handwritten characters. Further, the model classification accuracy could be improved by using printed characters to train the network at the initial stages and then using handwritten characters; this would allow the network to understand the defining attributes of each character, and such a dataset can be generated easily.

2.5.3 Conclusion

Character recognition is a critical module in applications such as document scanning and optical character recognition. With the emergence of deep learning techniques, languages like English have achieved high classification accuracies. However, the applicability of those deep learning methods is constrained in low resource languages because of the lack of well-developed datasets. This study focused on implementing a viable method for the classification of handwritten characters in low resource languages. Due to the restrictions on the size of the available dataset, this problem was modelled as a one-shot learning problem and solved using Siamese networks based on capsule networks. The Siamese network is the de facto type of network used in one-shot learning, but for image-related tasks it still needs a large training dataset. The use of a capsule layers-based Siamese network, which can mitigate the information losses of convolutional neural networks, allowed a Siamese network to be trained with a small number of parameters and a small dataset while obtaining performance on par with a convolutional network. This model was tested with the Omniglot dataset and achieved 30–85% accuracy for different alphabets. Further, the model showed a classification accuracy of 74.5% for the MNIST dataset.

References

1. Vorugunti, C.S., Gorthi, R.K.S., Pulabaigari, V., Online Signature Verification by Few-Shot Separable Convolution Based Deep Learning, in: International Conference on Document Analysis and Recognition (ICDAR), IEEE, pp. 1125–1130, 2019.
2. Wu, Y., Liu, H., Fu, Y., Low-shot face recognition with hybrid classifiers, in: IEEE International Conference on Computer Vision Workshops, pp. 1933–1939, 2017.

3. Gui, L.-Y., Wang, Y.-X., Ramanan, D., Moura, J.M., Few-shot human motion prediction via meta-learning, in: European Conference on Computer Vision (ECCV), pp. 432–450, 2018.
4. Fei-Fei, L., A Bayesian approach to unsupervised one-shot learning of object categories, in: 9th IEEE International Conference on Computer Vision, IEEE, pp. 1134–1141, 2003.
5. Arica, N. and Yarman-Vural, F.T., Optical character recognition for cursive handwriting. IEEE Trans. Pattern Anal. Mach. Intell., 24, 801–813, 2002.
6. Lake, B., Salakhutdinov, R., Gross, J., Tenenbaum, J., One shot learning of simple visual concepts, in: Annual Meeting of the Cognitive Science Society, 2011.
7. Koch, G., Zemel, R., Salakhutdinov, R., Siamese neural networks for one-shot image recognition, in: 32nd International Conference on Machine Learning, Lille, France, pp. 1–8, 2015.
8. Chopra, S., Hadsell, R., Lecun, Y., Learning a similarity metric discriminatively, with application to face verification, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE, pp. 539–546, 2005.
9. Sabour, S., Frosst, N., Hinton, G.E., Dynamic routing between capsules, in: 31st Conference on Neural Information Processing Systems, Long Beach, CA, USA, pp. 3856–3866, 2017.
10. Lehtonen, E. and Laiho, M., CNN using memristors for neighborhood connections, in: 12th International Workshop on Cellular Nanoscale Networks and their Applications, IEEE, pp. 1–4, 2010.
11. Hinton, G.E., Krizhevsky, A., Wang, S.D., Transforming auto-encoders, in: International Conference on Artificial Neural Networks, Springer, pp. 44–51, 2011.
12. Sethy, A., Patra, P.K., Nayak, S.R., Offline Handwritten Numeral Recognition Using Convolution Neural Network, in: Machine Vision Inspection Systems: Image Processing, Concepts, Methodologies and Applications, M. Malarvel, S.R. Nayak, S.N. Panda, P.K. Pattnaik, N. Muangnak (Eds.), cp. 9, pp. 197–212, John Wiley & Sons Inc, New York, United States, 2020.
13. Chen, Y., Jiang, H., Li, C., Jia, X., Ghamisi, P., Deep feature extraction and classification of hyperspectral images based on convolutional neural networks. IEEE Trans. Geosci. Remote Sens., 54, 6232–6251, 2016.
14. Garcia-Gasulla, D., Parés, F., Vilalta, A., Moreno, J., Ayguadé, E., Labarta, J., Cortés, U., Suzumura, T., On the behavior of convolutional nets for feature extraction. J. Artif. Intell. Res., 61, 563–592, 2018.
15. Liu, B., Yu, X., Zhang, P., Yu, A., Fu, Q., Wei, X., Supervised deep feature extraction for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens., 56, 1909–1921, 2017.
16. Sethy, A., Patra, P.K., Nayak, S.R., Nayak, D.R., A Gabor Wavelet Based Approach for Off-Line Recognition of ODIA Handwritten Numerals. Int. J. Eng. Technol., 7, 253–257, 2018.

17. Krüger, V. and Sommer, G., Gabor wavelet networks for object representation, in: Multi-Image Analysis, R. Klette, G. Gimel'farb, T. Huang (Eds.), LNCS, 2032, pp. 115–128, Springer, Berlin, Heidelberg, 2001.
18. Kaushal, A. and Raina, J., Face detection using neural network & Gabor wavelet transform. Int. J. Comput. Sci. Technol., 1, 58–63, 2010.
19. Nayak, S.R., Mishra, J., Palai, G., Analysing roughness of surface through fractal dimension: A review. Image Vision Comput., 89, 21–34, 2019.
20. Nayak, S.R., Mishra, J., Palai, G., A modified approach to estimate fractal dimension of gray scale images. Optik, 161, 136–145, 2018.
21. Nayak, S., Khandual, A., Mishra, J., Ground truth study on fractal dimension of color images of similar texture. J. Text. Inst., 109, 1159–1167, 2018.
22. Sethy, A. and Patra, P.K., Off-line Odia Handwritten Character Recognition: an Axis Constellation Model Based Research. Int. J. Innov. Technol. Explor. Eng., 8, 788–793, 2019.
23. Zhang, J., Zhu, Y., Du, J., Dai, L., Radical analysis network for zero-shot learning in printed Chinese character recognition, in: IEEE International Conference on Multimedia and Expo, IEEE, pp. 1–6, 2018.
24. Bertinetto, L., Henriques, J.F., Valmadre, J., Torr, P., Vedaldi, A., Learning feed-forward one-shot learners, in: 30th International Conference on Neural Information Processing Systems, ACM, pp. 523–531, 2016.
25. Bertinetto, L., Valmadre, J., Henriques, J.F., Vedaldi, A., Torr, P.H., Fully-convolutional siamese networks for object tracking, in: European Conference on Computer Vision, Springer, pp. 850–865, 2016.
26. Santoro, A., Bartunov, S., Botvinick, M., Wierstra, D., Lillicrap, T., One-shot learning with memory-augmented neural networks, arXiv preprint, arXiv:1605.06065, 1–13, 2016.
27. Vinyals, O., Blundell, C., Lillicrap, T., Wierstra, D., Matching networks for one shot learning, in: 30th International Conference on Neural Information Processing Systems, pp. 3630–3638, 2016.
28. Bromley, J., Guyon, I., Lecun, Y., Säckinger, E., Shah, R., Signature verification using a "Siamese" time delay neural network, in: 6th International Conference on Neural Information Processing Systems, pp. 737–744, 1993.
29. Santoro, A., Bartunov, S., Botvinick, M., Wierstra, D., Lillicrap, T., Meta-learning with memory-augmented neural networks, in: International Conference on Machine Learning, pp. 1842–1850, 2016.
30. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P., Gradient-based learning applied to document recognition. Proc. IEEE, 86, 2278–2324, 1998.
31. Kumar, A.D., Novel deep learning model for traffic sign detection using capsule networks, arXiv preprint, arXiv:1805.04424, 1–5, 2018.
32. Zhao, W., Ye, J., Yang, M., Lei, Z., Zhang, S., Zhao, Z., Investigating capsule networks with dynamic routing for text classification, arXiv preprint, arXiv:1804.00538, 1–12, 2018.
33. Lalonde, R. and Bagci, U., Capsules for object segmentation, arXiv preprint, arXiv:1804.04241, 1–9, 2018.

34. Rajasegaran, J., Jayasundara, V., Jayasekara, S., Jayasekara, H., Seneviratne, S., Rodrigo, R., DeepCaps: Going deeper with capsule networks, in: IEEE Conference on Computer Vision and Pattern Recognition, pp. 10725–10733, 2019.
35. Xu, B., Wang, N., Chen, T., Li, M., Empirical evaluation of rectified activations in convolutional network, arXiv preprint, arXiv:1505.00853, 1–5, 2015.
36. MacKay, D.J., Information Theory, Inference and Learning Algorithms, Cambridge University Press, Cambridge, United Kingdom, 2003.
37. Kingma, D.P. and Ba, J., Adam: A method for stochastic optimization, arXiv preprint, arXiv:1412.6980, 1–15, 2014.
38. Jarrett, K., Kavukcuoglu, K., Ranzato, M.A., Lecun, Y., What is the best multi-stage architecture for object recognition?, in: 12th International Conference on Computer Vision, IEEE, pp. 2146–2153, 2009.
39. Ciresan, D.C., Meier, U., Gambardella, L.M., Schmidhuber, J., Convolutional neural network committees for handwritten character classification, in: International Conference on Document Analysis and Recognition, IEEE, pp. 1135–1139, 2011.
40. Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L., ImageNet: A large-scale hierarchical image database, in: IEEE Conference on Computer Vision and Pattern Recognition, IEEE, pp. 248–255, 2009.
41. Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L., Microsoft COCO: Common objects in context, in: European Conference on Computer Vision, Springer, pp. 740–755, 2014.

3

An Innovative Extended Method of Optical Pattern Recognition for Medical Images With Firm Accuracy—4f System-Based Medical Optical Pattern Recognition

Dhivya Priya E.L.1*, D. Jeyabharathi2, K.S. Lavanya1, S. Thenmozhi1, R. Udaiyakumar1 and A. Sharmila3

1Dept. of ECE, Sri Krishna College of Technology, Coimbatore, India
2Dept. of IT, Sri Krishna College of Technology, Coimbatore, India
3Dept. of ECE, Bannari Amman Institute of Technology, Sathyamangalam, Erode, India

Abstract

Optics has grown with the development of the lens, which supports the design of various accuracy-based systems. The demand for accuracy-based systems increases day by day; one such application is the medical field, where the demand for accuracy is high. The idea is to invoke the concept of the lens in the field of medical sciences. The 4f-based optical system is used as a benchmark to develop a firm system for medical applications; this method of performing transforms with the optical system helps improve accuracy. When the image of the patient placed in the object plane is exposed to optical rays, the biconvex lens between the object and the Fourier plane performs an optical Fourier transform. The 4f system is altered at the Fourier plane by adding an LCD projector system, connected to a computer, that is placed before an optical biconvex lens at 45 degrees. The 45-degree setup creates the diffraction pattern of the database image (i.e., the reference image), and the setup is arranged so that the diffraction patterns of the reference image and the input image coincide. The coincident image is then passed through the second biconvex lens to produce a 180-degree out-of-phase image with an exact outline indicating the normal or

*Corresponding author: [email protected]


abnormal condition of the patient. The designed system supports high-speed pattern recognition with optical signals.

Keywords: Optical signal processing, medical applications, Fourier optics, optical Fourier transforms, correlator signal processing, biconvex lens

3.1 Introduction

3.1.1 Fourier Optics

Fourier optics is the study of classical optics using Fourier transforms, in which the wave is regarded as a superposition of plane waves that are not related to any identifiable sources; rather, they are the natural modes of the propagation medium itself. Fourier optics can be viewed as the dual of the Huygens–Fresnel principle, in which the wave is regarded as a superposition of expanding spherical waves that radiate outward from real current sources via a Green's function relationship. Fourier optics forms a significant part of the theory behind image processing techniques, as well as of applications where information needs to be extracted from optical sources, for example in quantum optics. The plane wave spectrum concept is the basic foundation of Fourier optics: the plane wave spectrum is a continuous spectrum of uniform plane waves, with one plane wave component in the spectrum for each tangent point on the far-field phase front, and the amplitude of that plane wave component is the amplitude of the optical field at that tangent point. The range of the optical field in the far field is given by

Range = 2D²/λ    (3.1)

where D is the maximum linear extent of the optical source and λ is the wavelength.
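As a quick numerical check of Equation (3.1), the short Python sketch below evaluates the far-field range for an illustrative source; the specific values of D and the wavelength are assumptions chosen only for the example.

D = 1e-3             # maximum linear extent of the source, in metres (assumed)
wavelength = 500e-9  # wavelength, in metres (assumed)

far_field_range = 2 * D**2 / wavelength
print(far_field_range)  # 4.0 metres; beyond this distance the far-field picture applies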

The plane wave spectrum is often regarded as discrete for certain types of periodic gratings, although in reality the spectra from gratings are continuous as well, since no physical device can have the infinite extent required to produce a true line spectrum. The plane wave spectrum representation of electromagnetic fields presents the theory of the electromagnetic field with emphasis on the plane wave: electromagnetic fields can be represented by plane waves travelling in different directions. The relative simplicity of plane wave solutions of Maxwell's equations allows some of the critical elementary physical and engineering characteristics of the electromagnetic field to be worked out.

Many physical phenomena are found experimentally to share the basic property that their response to several stimuli acting simultaneously is identically equal to the sum of the responses that each component stimulus would produce individually. Such phenomena are called linear, and the property they share is called linearity. Electrical networks composed of resistors, capacitors, and inductors are usually linear over a wide range of inputs. Similarly, the wave equation describing the propagation of light through most media leads us naturally to regard optical imaging operations as linear mappings of "object" light distributions into "image" light distributions. The single property of linearity leads to vast simplifications in the mathematical description of such phenomena and forms the foundation of the mathematical structure referred to as linear systems theory. The great advantage afforded by linearity is the ability to express the response to a complicated stimulus in terms of the responses to certain "elementary" stimuli: if a stimulus is decomposed into a linear combination of elementary stimuli, each of which produces a known response of convenient form, then by virtue of linearity the total response can be found as the corresponding linear combination of the responses to the elementary stimuli.

Spatial coherence is a concept of wave disturbance describing the correlation between periodically transmitted energy (wave signals) at one point and another; it can also be described as a mutual dependence of variable wave quantities at two different points at a given instant of time, presented as a function of separation and formulated as a correlation against the absolute distance between the points. If the illumination used in an optical system exhibits spatial coherence, it is appropriate to describe the light as a spatial distribution of complex-valued field amplitude; when the illumination is completely lacking in spatial coherence, it is appropriate to describe the light as a spatial distribution of real-valued intensity. Attention here is focused on the study of linear systems with complex-valued inputs. A system is defined as a mapping of a set of input functions into a set of output functions. For the case of electrical networks, the inputs and outputs are real-valued functions (voltages or currents) of a one-dimensional independent variable (time); for the case of imaging systems, the inputs and outputs can be real-valued

4f System Medical Optical Pattern Recognition  49 The idea of relative effortlessness of plane wave arrangements of Maxwell’s conditions utilizes a portion of the critical rudimentary physical and designing qualities of the electromagnetic field to be explained. Numerous physical marvels are found tentatively to share the essential property that their reaction to a few upgrades acting all the while is indistinguishably equivalent to the total of the reactions that every part boost would create exclusively. Such marvels are called straight and the property they share is called linearity. Electrical systems made out of resistors, capacitors, and inductors are typically straight over a wide scope of sources of info. Also, the wave condition portraying the spread of light through most media drives us normally to see optical imaging tasks as straight mappings of “object” light circulations into “picture” light disseminations. The single property of linearity prompts tremendous rearrangements in the scientific portrayal of such wonders and speaks to the establishment of a numerical structure which is alluded as straight frameworks hypothesis. The extraordinary preferred position managed by linearity is the capacity to communicate the reaction to an entangled boost regarding the reactions to certain “rudimentary” improvements. In this manner if a boost is deteriorated into a straight mix of rudimentary upgrades, every one of which delivers a known reaction of advantageous structure, at that point by excellence of linearity, the all-out reaction can be found as a relating direct blend of the reactions to the basic improvements. Spatial rationality is an idea of wave aggravation portraying the relationship between occasional transmitted vitality (wave signals) starting with one point then onto the next, it can likewise be said that it is a shared reliance or association of variable wave amounts of two unique focuses in a given moment of time, the intelligence is introduced as an element of separation and planned as connection against a flat out separation between focuses. On the off chance that the enlightenment utilized in an optical framework shows a property called spatial lucidness, the find that it is fitting to depict the light as a spatial dispersion of complex-esteemed field sufficiency. At the point when the brightening is absolutely ailing in spatial intelligence, it is suitable to depict the light as a spatial appropriation of genuine esteemed force. Consideration will be centered on the investigation of direct frameworks with complex-esteemed data sources. A framework is characterized to be a planning of a lot of information capacities into a lot of yield capacities. For the instance of electrical systems, the information sources and yields are genuine esteemed capacities (voltages or flows) of a one-dimensional free factor (time), for the instance of imaging frameworks, the data sources and yields can be genuine esteemed

functions (intensity) or complex-valued functions (field amplitude) of a two-dimensional independent variable (space). If attention is restricted to deterministic (nonrandom) systems, then a specified input must map to a unique output. It is not necessary, however, that each output correspond to a unique input; a variety of input functions can produce no output. Accordingly, we restrict attention at the outset to systems described by many-to-one mappings. A convenient representation of a system is a mathematical operator, S{}, which we imagine to operate on input functions to produce output functions. Thus, if the function g1(x1, y1) represents the input to a system, and g2(x2, y2) represents the corresponding output, then by the definition of S{}, the two functions are related through

g2(x2, y2) = S{g1(x1, y1)}

(3.2)

Without specifying more detailed properties of the operator S{}, it is difficult to state more specific properties of the general system. In the material that follows, we will be concerned primarily, though not exclusively, with a restricted class of systems that are said to be linear. The property of linearity can be verified when designing a system. The assumption of linearity will be found to yield simple and physically meaningful representations of such systems; it will also allow useful relations between inputs and outputs to be developed.

3.2 Optical Signal Processing

Optical Signal Processing (OSP) is a collection of techniques developed by many researchers across the various fields of optical signal processing. OSP can transform the information content of input signals while preserving certain properties of the physical carrier.

3.2.1 Diffraction of Light

When an optical wave is transmitted through an aperture in an opaque screen and travels some distance in free space, its intensity distribution is known as the diffraction pattern. If light were treated as rays, the diffraction pattern would simply be a shadow of the aperture. Because of the wave nature of light, however, the diffraction pattern may deviate slightly or substantially from

the aperture shadow, depending on the distance between the aperture and the observation plane, the wavelength, and the dimensions of the aperture.

3.2.2 Biconvex Lens

A lens is a transmissive optical device that focuses or disperses a light beam by means of refraction. A simple lens consists of a single piece of transparent material, while a compound lens consists of several simple lenses (elements), usually arranged along a common axis. Lenses are made from materials such as glass or plastic, and are ground and polished or molded to the desired shape. A lens can focus light to form an image, unlike a prism, which refracts light without focusing. Devices that similarly focus or disperse waves and radiation other than visible light are also called lenses, for example, microwave lenses, electron lenses, acoustic lenses, or explosive lenses. If the lens is biconvex or plano-convex, a collimated beam of light passing through the lens converges to a spot (a focus) behind the lens. In this case, the lens is called a positive or converging lens. The distance from the lens to the spot is the focal length of the lens, which is usually abbreviated f in diagrams and equations. Figure 3.1 shows the working of a biconvex lens.

3.2.3 4f System

The proposed framework is based on a 4f correlator. The 4f correlator is an imaging system with unity magnification, which can be verified by ray tracing. The 4f correlator system consists of two lenses and three planes. The planes are the object plane, the Fourier plane and the image

Figure 3.1  Biconvex lens.

plane. The lenses used in this system are biconvex spherical lenses. The system includes two subsystems. The first subsystem, between the object plane and the Fourier plane, performs a Fourier transform, and the second subsystem, between the Fourier plane and the image plane, performs an inverse Fourier transform. The optical Fourier transform is performed by the biconvex lenses in the 4f system. The plane-wave components that constitute a wave may also be separated by the use of a lens: a thin spherical lens transforms a plane wave into a paraboloidal wave focused to a point in the lens focal plane. In this way the lens maps each direction into a single point in the focal plane and thus separates the contributions of the different plane waves. Image formation in the 4f system reproduces the distribution of light in the object plane onto the image plane. The image at the output plane is never an exact replica of the object: the image is magnified, and blurring results from imperfect focusing and from diffraction of the optical waves. The 4f system can be used as a spatial filter with a mask placed at the Fourier plane. The mask may be used to modify the components in the Fourier plane; it blocks some components and transmits the others. The spatial filter may be a low-pass filter, a high-pass filter or a vertical-pass filter. The combination of a 4f correlator and surveillance cameras is utilized in this proposed work. A database is created which stores the images of the authorized persons. The conjugate of each of these images is computed and stored in a computer system. For a single camera input image, all the database images must be compared and checked so as to detect the entry of an unauthorized person. This comparison must be done at high speed, so the proposed system is designed with optical principles. Light travels at a speed of about 3 × 10^8 m/s; thus the optical system is combined with the camera to perform high-speed operation.

3.2.4 Literature Survey

Dhivyapriya and Pragash's "Advanced high speed optical pattern recognition" [1] introduces the idea of implementing high-speed pattern recognition for medical applications. This system helps in improving the security of surveillance systems. To overcome security threats in airports, the system helps in finding illegal entries to the airports. The 4f system with a CCTV surveillance camera captures the entry of each person. The entry of each person is verified against the created database. When the input image and the database image coincide, the image plane

indicates the match of images by a perfect correlation peak. The system helps in finding forged entries at high speed with the help of optical signals. The same method is extended for use in medical applications for fast identification of abnormal conditions.

Saleh and Teich's Photonics [4] is a book that covers Fourier optics and the corresponding optical principles needed to implement the 4f system. The essential principles for implementing the system are discussed first. Fourier optics begins with ray optics, also called geometrical optics, in which light is described by rays that travel in different optical media according to a set of geometrical rules. It is concerned with the location and direction of light rays, which point in the direction of the flow of optical energy. Fourier optics provides a description of the propagation of light waves based on harmonic analysis and linear systems. Harmonic analysis is based on the expansion of an arbitrary function of time as a superposition of harmonic functions of time of different frequencies. The Fourier description of the propagation of light in free space is given in terms of spatial frequencies. The optical Fourier transform is performed in the 4f correlator system with the help of lenses; the lenses used in this system are biconvex lenses. These fundamental principles help in designing the 4f system. The 4f system includes two subsystems: the first performs a Fourier transform and the second performs an inverse Fourier transform. This system delivers as output an exact replica of the image at the object plane with a phase shift, which arises from the inverse Fourier transform.

Wu et al.'s "4f amplified in-line compressive holography" [23] explains compressive holography, a combination of compressive sensing and holography. This paper describes an approach to extend the amplification ratio and enhance the axial resolution of in-line compressive holography. The basic principle of 4f amplified in-line compressive holography is first described. Next, the feasibility of reconstructing objects and an analysis of the reconstruction quality are verified. Finally, both simulated and real experiments on multilayer objects with non-overlapping and overlapping patterns are shown to validate the approach. Digital holography is a two-step procedure: recording a hologram on a CCD and recovering the object wavefront by back-propagating the numerically reconstructed hologram to the object plane. While digital holography has significant advantages and widespread applications in acquiring and storing object information, its application is limited by huge storage and bandwidth requirements.

The relatively new framework of compressive sensing can reduce the requirements on storage and hence on the transmission of data. It has drawn broad attention and has been widely applied, as it can achieve accurate N-dimensional information reconstruction from M (≪ N)-dimensional measured values using optimization algorithms.

Odinokov's "Access Control Holographic System Based on Joint Transform Correlator and Image Encoding" [15] presents an architecture based on a joint transform correlator (JTC) in which the carrier key consists of several test holograms. Each of the holograms stores a part of the whole picture stored in the reference holograms. An image-space JTC is used to match the images retrieved from the holograms. Once recorded and retrieved, the images furnish correlation peaks whose positions depend strictly on the mutual displacements of the test and reference holograms. The optical signal transformation is analyzed by means of numerical simulation.

Li et al.'s "Security and encryption optical systems based on a correlator with significant output images" [24] proposes an improved optical security system based on two phase-only computer-generated masks. The two transparencies are placed together in a 4f correlator so that a known output image is obtained. In addition to simple verification, the security system is capable of identifying the type of input mask according to the corresponding output image it generates. The two phase masks are designed with an iterative optimization algorithm with constraints in the input and output domains. A simulation of the resultant images formed by the two phase-only elements is given. Various mask combinations are compared to show that a combination is unique and cannot be duplicated; this uniqueness is an advantage in security systems.

Yuan et al.'s "Optical authentication technique based on interference image hiding system and phase-only correlation" [19] explains how predefined complex images with different amplitudes and the same phase are each encoded into two phase-only masks according to the interference principle. This technique can easily generate different authentication keys for different users, which brings convenience for multi-user applications. In the verification stage, both a sharp correlation peak and a large image representing the identity of a user are produced simultaneously. Accordingly, in addition to simple verification, this technique can also recognize the identity of the user. Moreover, it can effectively prevent the authentication key from being forged by another person.


3.3 Extended Medical Optical Pattern Recognition

3.3.1 Optical Fourier Transform

The propagation of light in free space is described by Fourier analysis. If the complex amplitude of a monochromatic wave of wavelength λ in the z = 0 plane is a function f(x, y) composed of harmonic components of different spatial frequencies, each harmonic component corresponds to a plane wave. The plane wave traveling at angles θx = sin⁻¹(λνx), θy = sin⁻¹(λνy) corresponds to the component with spatial frequencies νx and νy and has an amplitude F(νx, νy), the Fourier transform of f(x, y). Light can therefore be used to compute the Fourier transform of a two-dimensional function f(x, y), simply by fabricating a transparency with amplitude transmittance f(x, y) through which a uniform plane wave of unit magnitude is transmitted. Since each of the plane waves has an infinite extent and therefore overlaps with the other plane waves, however, it is necessary to find a method of separating these waves: at sufficiently long distances, a single plane wave contributes to the total amplitude at every point in the output plane. A lens is used to focus each of the plane waves onto a single point. If the response of the system to each harmonic function is known, the response to an arbitrary input function is readily determined by the use of harmonic analysis at the input and superposition at the output.

3.3.2 Fourier Transform Using a Lens

1. The plane wave with angles θx = λνx and θy = λνy has a complex amplitude U(x, y, 0) = F(νx, νy) exp[−j2π(νx x + νy y)] in the z = 0 plane and U(x, y, d) = H(νx, νy) F(νx, νy) exp[−j2π(νx x + νy y)] in the z = d plane, immediately before crossing the lens, where H(νx, νy) = H0 exp[jπλd(νx² + νy²)] is the transfer function of a distance d of free space and H0 = exp(−jkd).
2. Upon crossing the lens, the complex amplitude is multiplied by the lens phase factor exp[jπ(x² + y²)/λf].
3. Free-space propagation between the lens and the output plane is carried out to determine the complex amplitude in the output plane.
4. The last step is to integrate over all the plane waves. By virtue of the shifting property of the delta function, this integral gives g(x, y).

The intensity of light at the output plane (the back focal plane of the lens) is therefore proportional to the squared absolute value of the Fourier transform of the complex amplitude of the wave at the input plane, regardless of the distance d. The phase factor vanishes if d = f, so that

g(x, y) = h1 F(x/λf, y/λf)        (3.3)
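The focal-plane relation of Eq. (3.3) can be checked numerically: the intensity a lens produces in its back focal plane is proportional to the squared magnitude of the 2D Fourier transform of the input transmittance. Below is a minimal NumPy sketch of this; the grid size and square-aperture dimensions are illustrative assumptions, not values from the chapter.

```python
import numpy as np

# Grid and a simple square-aperture transmittance f(x, y)
N = 512                                              # samples per side (assumed)
f = np.zeros((N, N))
f[N//2-20:N//2+20, N//2-20:N//2+20] = 1.0            # open square aperture

# Per Eq. (3.3), the field in the back focal plane is ~ F(x/(lambda*f), y/(lambda*f)):
# numerically, the centered 2D FFT of the transmittance.
F = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(f)))

# The observable quantity is the intensity = |Fourier transform|^2.
intensity = np.abs(F) ** 2

# The diffraction pattern of a square aperture is a 2D sinc^2 profile;
# its central lobe carries most of the energy.
print("peak / total energy:", intensity.max() / intensity.sum())
```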

3.3.3 Fourier Transform in the Far Field

In the Fraunhofer approximation, the complex amplitude of a wave of wavelength λ in the z = d plane is proportional to the Fourier transform of the complex amplitude f(x, y) in the z = 0 plane, evaluated at the spatial frequencies νx = x/λd and νy = y/λd. The approximation is valid if f(x, y) is confined to a circle of radius b satisfying b²/λd ≪ 1, and at points in the output plane within a circle of radius a satisfying a²/λd ≪ 1.

3.3.4 Correlator Signal Processing

The correlator signal processing takes two two-dimensional functions, x(p, q) and y(p, q). The correlation between these two functions is expressed as



rx,y(l, m) = ΣΣ x(p, q) · y*(p − l, q − m)        (3.4)

The above equation can be rewritten as



rx,y(l, m) = ΣΣ x(p, q) · y*(−(l − p), −(m − q))        (3.5)

The convolution sum is given by



rx,y(l, m) = x(p, q) ⊛ y*(−l, −m)        (3.6)

The Fourier transform of the above convolution sum is given as



F[rx,y(l, m)] = F[x(p, q) ⊛ y*(−l, −m)]        (3.7)

On rearranging the above equation, the right-hand side becomes

X(u, v) · Y*(u, v)        (3.8)
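Equations (3.4)–(3.8) are the discrete correlation theorem: correlating x with y in the signal domain is multiplication of X by the conjugate of Y in the Fourier domain. The following NumPy sketch verifies the equivalence on circular (wrap-around) correlation; the array sizes and random data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
y = rng.standard_normal((8, 8))

# Direct circular cross-correlation: r(l, m) = sum_{p,q} x(p, q) * conj(y(p-l, q-m))
r_direct = np.zeros((8, 8), dtype=complex)
for l in range(8):
    for m in range(8):
        r_direct[l, m] = np.sum(x * np.conj(np.roll(np.roll(y, l, 0), m, 1)))

# Fourier-domain route, Eqs. (3.7)-(3.8): R(u, v) = X(u, v) . Y*(u, v)
R = np.fft.fft2(x) * np.conj(np.fft.fft2(y))
r_fft = np.fft.ifft2(R)

print(np.allclose(r_direct, r_fft))   # True: both routes agree
```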

3.3.5 Image Formation in 4f System

An ideal imaging system is an optical system that reproduces the distribution of light in the object plane onto the image plane. In practice the image is magnified, and there is blur resulting from imperfect focusing and from the diffraction of optical waves. Consider the two-lens imaging system in Figure 3.2. This system is called a 4f system; it serves as a focused imaging system with unity magnification, as can be easily verified by ray tracing. Wave propagation through this system is analyzed by cascading two Fourier-transforming subsystems. The first subsystem, between the object plane and the Fourier plane, performs a Fourier transform, and the second subsystem, between the Fourier plane and the image plane, performs an inverse Fourier transform. Accordingly, in the absence of an aperture the image is a perfect replica of the object. Let f(x, y) be the complex amplitude transmittance of a transparency placed in the object plane and illuminated by a plane wave exp(−jkz) traveling in the z direction, as shown in Figure 3.2, and let g(x, y) be the complex amplitude in the image plane. The first lens system analyzes f(x, y) into its spatial Fourier transform and separates its Fourier components, so that each point in the Fourier plane corresponds to a single spatial frequency. These components are then recombined by the second lens system and the object distribution is perfectly reconstructed. This 4f correlator system can also be used as a spatial filter: the Fourier plane can be equipped with a mask which filters by blocking some components and transmitting others.
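The whole 4f chain (forward transform, mask at the Fourier plane, inverse transform) maps directly onto two FFTs with an elementwise multiplication between them. Below is a hedged NumPy sketch of such a spatial filter; the helper name four_f_filter, the toy object and the circular low-pass cutoff radius are all illustrative assumptions.

```python
import numpy as np

def four_f_filter(obj, mask):
    """Simulate a 4f system: FFT (lens 1) -> mask at Fourier plane -> IFFT (lens 2)."""
    spectrum = np.fft.fftshift(np.fft.fft2(obj))   # field at the Fourier plane
    filtered = spectrum * mask                     # mask blocks/transmits components
    return np.fft.ifft2(np.fft.ifftshift(filtered))

N = 256
y, x = np.indices((N, N)) - N // 2
obj = ((np.abs(x) < 40) & (np.abs(y) < 40)).astype(float)   # toy "object" transparency

lowpass = ((x**2 + y**2) <= 20**2).astype(float)            # assumed cutoff radius
img = four_f_filter(obj, lowpass)

# Low-pass filtering removes fine detail: the edges of the square come out blurred.
print("residual imaginary part:", np.abs(img.imag).max())
```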

Figure 3.2  Two lens imaging system (object plane → lens 1 → Fourier plane → lens 2 → image plane).


3.3.6 Extended Medical Optical Pattern Recognition

The extended medical optical pattern recognition uses the 4f system, as in Figure 3.3, as a benchmark. The proposed system introduces a change at the Fourier plane: an LCD projector, placed before a biconvex lens 3 and connected to the computer which stores the database images. The base 4f system retains biconvex lenses 1 and 2 to perform the optical Fourier and optical inverse Fourier transforms. The input image of the patient is placed in the object plane, and an optical Fourier transform creates a diffracted pattern of the input image on the Fourier plane. The database is then searched for a perfect match. As the proposed system is designed for medical applications, a high rate of accuracy is demanded; this demand is satisfied with the help of optical signals, for which the accuracy and speed of detection are high. The diffracted patterns of the input image and the database image are made to fall on the same Fourier plane. This helps in finding the degree to which the images are correlated with each other. The image in the Fourier plane is again passed through biconvex lens 2 so as to perform the optical inverse Fourier transform. The result of the inverse Fourier transform is a 180° out-of-phase pattern-recognized image, which helps in indicating the severity of the disease. The categorization of the output at the image plane helps in finding the severity of the disease. This high-speed optical pattern recognition system helps in detecting both the early stage and the worst stage

Figure 3.3  Extended 4f system (object plane → lens 1 → Fourier plane → lens 2 → image plane, with lens 3 and an LCD projector feeding the Fourier plane).

of the disease at very high speed using optical principles. Thus the system can be utilized for medical applications.

3.4 Initial 4f System

The image of the patient suffering from a disease is placed in the object plane. The image in the object plane is restricted to the part of the body affected by the disease. An optical Fourier transform is performed on the input image and its diffracted pattern is made to fall on the Fourier plane.

3.4.1 Extended 4f System

The base 4f system is extended in such a way that it suits medical applications. The 4f system with three planes and two biconvex lenses is altered so as to recognize a perfect pattern at the image plane. The Fourier plane of the 4f system is altered by introducing an LCD projector connected to a computer which stores the database reference images. The database images are projected through the newly inserted biconvex lens 3 placed before the LCD projector. Thus the optical Fourier transform is performed on the reference image by projecting its diffracted pattern onto the Fourier plane.

3.4.2 Setup at 45 Degrees

The newly inserted setup of biconvex lens 3, LCD projector and computer is placed at 45° to the 4f system. This 45° setup is used because, when the diffracted image is projected at 90°, the pattern cannot be superimposed with the diffracted pattern of the input image. The aim is to make the diffracted patterns of the input and reference images superimpose at the Fourier plane. This superimposition yields an accurately pattern-recognized image at the output image plane.

3.4.3 Database Creation

A set of reference medical images is used for creating a database. The selection of the reference images is a key consideration and should be made only on the advice of medical experts. The database can contain folders for different types of diseases. At the initial setup of the extended 4f system, the database has to be selected depending

on the diseased input image at the object plane. This database selection paves the way for producing a high rate of accuracy in pattern recognition.

3.4.4 Superimposition of Diffracted Patterns

The diffracted patterns of the input patient image and the database image are superimposed at the Fourier plane. This superimposition is made possible by the 45° setup. The correlation between the images is observed from the way they superimpose at the Fourier plane. When the images are perfectly superimposed, the correlation is one, indicating the normal health condition of the person. If not, the abnormality is detected from the uncorrelated image at the image plane. For accurate analysis, the superimposition at the Fourier plane plays a vital role.

3.4.5 Image Plane

The image plane is the output plane where the correlation between the images is detected. The biconvex lens 2 before the image plane performs the inverse optical Fourier transform. The diffracted pattern at the Fourier plane is carried to this biconvex lens by the optical rays. As a result, a 180° out-of-phase, perfectly pattern-recognized image is projected on the image plane. With the help of this image, the results are categorized by severity: the higher the correlation peak, the lower the severity of the disease.

• Maximum correlation peak: least severity
• Mid-rate correlation: precautionary steps
• Under-rated correlation peak (relative to the mid-rate correlation): continuous monitoring
• Low correlation: immediate treatment

Table 3.1 gives a detailed analysis of the system with the reference values at the image plane.

3.5 Simulation Output

3.5.1 MATLAB

MATLAB is an abbreviation of MATrix LABoratory. This platform supports technical computing at a high rate of performance. It helps in integrating

computation, visualization and programming, as the data are represented in mathematical notation. Since scientifically proven models are purely mathematical, those models can be simulated and verified with MATLAB. An advantage is the diverse support for newly designed scientific models; new packages can also be added depending on the design requirements.

3.5.2 Sample Input Images

The extended 4f system supports fast detection of abnormalities in the given input patient image. As an initial step, the designed system was verified on brain tumor images available on the internet [25]. The sample input image is shown in Figure 3.4.

3.5.3 Output Simulation

Figure 3.4 is selected as the input patient image to be placed at the object plane. Optical rays are passed through this input image and, when the rays hit the biconvex lens, a diffracted pattern is projected on the

Figure 3.4  Image with detected tumor cells.

Fourier plane. Figure 3.4 has to be compared with the reference image in the database. The reference image is taken from the internet for this initial stage of simulation with MATLAB: Figure 3.5 is taken as the reference image for detecting brain tumor cells [26]. This choice has to be reanalyzed for real-time simulation, because the reference image is central to this extended work; suggestions from medical experts should be sought when selecting it. The creation of the database is explained under the future scope. For verifying the working of the extended optical 4f setup, Figures 3.4 and 3.5 are used. The diffracted pattern of the input patient image at the Fourier plane has to be superimposed with the diffracted pattern of the reference image from the database. The perfect match for the input patient image has to be selected from the database; once selected, its diffracted pattern is projected onto the Fourier plane with the help of biconvex lens 3. The diffracted patterns of the input and reference images are superimposed at the Fourier plane at an angle of 45°. The second half of the extended 4f system detects the correlation peak between the input and reference images: the superimposed diffracted pattern is passed through biconvex lens 2, which, by the principles of optics, converts the diffracted pattern into a 180° out-of-phase pattern-recognized output at the image plane.

Figure 3.5  Reference image for detecting brain tumor cells.

Figure 3.6 shows the uncorrelated peak at the image plane for Figures 3.4 and 3.5. This lack of correlation indicates that the severity of the tumor is high. The detection of correlation between the images is very fast because optical rays are used. The pattern-recognized output at the image plane is used for finding the severity of the disease, as given in Table 3.1. For the outputs simulated with Figures 3.4 and 3.5, the severity of the disease is very high, as the correlation between them is very low. Outputs were also generated with the reference image itself as the input image at the object plane, so as to check the accuracy of the extended 4f system. When the diffracted pattern at the Fourier plane is analyzed for this case, perfect superimposition is detected, indicating perfect correlation between the images. It is also inferred that the severity of disease in this case is zero, confirming the correct working of the extended 4f system. The correlation between the images is represented in Figure 3.7.

Figure 3.6  Uncorrelated peak indicating the severity of the tumor.

Table 3.1  Categorization of output values.

Correlation peak    Reference value    Action taken
Maximum             1                  Least severity/no treatment
Mid-rate            0.5                Precautionary steps
Under rated         0.4                Continuous monitoring
Minimum             0.2                Immediate treatment
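The mapping in Table 3.1 is a simple thresholding of the normalized correlation peak. A small Python sketch of that decision rule follows; the boundary values between the table's listed reference points are illustrative assumptions, since the chapter gives only the representative values.

```python
def categorize(peak: float) -> str:
    """Map a normalized correlation peak (0..1) to an action per Table 3.1."""
    # Boundaries between the table's reference values (1, 0.5, 0.4, 0.2)
    # are assumed; the chapter lists only the representative values.
    if peak >= 0.75:
        return "Least severity / no treatment"
    if peak >= 0.45:
        return "Precautionary steps"
    if peak >= 0.3:
        return "Continuous monitoring"
    return "Immediate treatment"

for p in (1.0, 0.5, 0.4, 0.2):
    print(p, "->", categorize(p))
```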


Figure 3.7  Correlation peak indicating the normal condition of the patient.

3.6 Complications in Real Time Implementation

The outputs generated in the MATLAB simulation environment were verified with a single input image and a single reference image, which does not suit real-time applications. Real-time implementation demands various prerequisites.

3.6.1 Database Creation

Database creation is the first criterion for the implementation of the extended 4f system. A database for medical applications must include the various types of abnormalities a patient may encounter in day-to-day life. The database can be created for a specific abnormality or for a group of abnormalities; for example, it can be created so that it supports bone fractures of the leg and hand, or the types of tumor cells. The selection of the reference image from the database can be automatic or manual. The reference need not always be a single image: to improve the accuracy of the results, multiple comparisons can be made with different sets of reference images. Database creation depends entirely on the suggestions of medical experts, as the main aim of this implementation is to simplify their work.


3.6.2 Accuracy

The designed 4f system aims at high accuracy with a high rate of performance. Optical signals support both accuracy and high performance: since optical signals travel at the velocity of light, the higher rate of performance is achieved, and the accuracy is likewise achieved with the optical signals.

3.6.3 Optical Setup

A biconvex lens is used in the designed extended 4f system. Simulation in MATLAB requires the equations of the biconvex lens to perform the optical Fourier transform. As MATLAB supports adding extra packages, a biconvex lens package can be added and called wherever required.

3.7 Future Enhancements

The future scope of the extended 4f system is to automate the idea in the simulation environment and to analyze the outputs for different sets of database reference images. The system can also be extended for continuous monitoring of the health conditions of the patient. This system is preferable because of the higher rate of performance offered by optical signals. As only the images are exposed to the optical rays, human cells are protected.

References

1. Dhivyapriya, E.L. and Pragash, N.N., Advanced high speed optical pattern recognition for surveillance systems. IEEE Digital Library, https://www.researchgate.net/publication/320746649.
2. Dhivyapriya, R.A., Premalatha, S., Divakaran, J., A novel step for the enhancement of security in airports using optical ideology. Int. J. Future Revolut. Comput. Sci. Commun. Eng., 4, 5, 204–207, 2018.
3. Nagime, A.V. and Patange, A.D., Smart CCTV camera surveillance system. Int. J. Sci. Res. (IJSR), 2319–7064, 2016.
4. Saleh, B.E.A. and Teich, M.C., chapter 4, in: Photonics, Introduction to Fourier Optics.
5. Clemente, P., Durán, V., Tajahuerce, E., Andres, P., Climent, V., Lancis, J., Compressive holography with a single-pixel detector. Opt. Lett., 38, 14, 2524–2527, 2013.

6. Cull, C.F., Wikner, D.A., Mait, J.N., Mattheiss, M., Brady, D.J., Millimeter-wave compressive holography. Appl. Opt., 49, 19, 67–82, 2010.
7. Dausmann, G., Menz, I., Gnadig, K., Yang, Z., Copy proof machine readable holograms for security application. Proc. SPIE, 2659, 198–201, 1996.
8. Rueda, E., Rios, C., Barrera, J.F., Henao, R., Torroba, R., Experimental multiplexing approach via code key rotations under a joint transform correlator scheme. Opt. Commun., 284, 2500–2504, 2011.
9. Rueda, E., Barrera, J.F., Henao, R., Torroba, R., Optical encryption with a reference wave in a joint transform correlator architecture. Opt. Commun., 282, 3243–3249, 2009.
10. Hahn, J., Lim, S., Choi, K., Horisaki, R., Brady, D.J., Video-rate compressive holographic microscopic tomography. Opt. Express, 19, 8, 7289–7298, 2011.
11. Javidi, B., Bernard, L., Towghi, N., Noise performance of phase encryption compared to XOR encryption. Opt. Eng., 38, 9–19, 1999.
12. Bauer, J., Podbielska, H., Suchwalko, A., Mazurkiewicz, J., Optical correlators for recognition of human face thermal images. Proc. SPIE, 5954, 59540–59548, 2005.
13. Marim, M., Angelini, E., Olivo-Marin, J.C., Atlan, M., Off-axis compressed holographic microscopy in low-light conditions. Opt. Lett., 36, 1, 79–81, 2011.
14. Muravsky, L.I. and Fitio, V.M., Identification of a random binary phase mask and its fragments with a joint transform correlator. Proc. SPIE, 3238, 87–96, 1997.
15. Odinokov, S.B., Access control holographic system based on joint transform correlator and image encoding. Opt. Mem. Neural Networks (Inf. Opt.), 17, 3, 220–231, 2008.
16. Refregier, P. and Javidi, B., Optical image encryption based on input plane and Fourier plane random encoding. Opt. Lett., 20, 767–769, 2009.
17. Rivenson, Y., Stern, A., Javidi, B., Improved depth resolution by single-exposure in-line compressive holography. Appl. Opt., 52, 1, A223–A231, 2013.
18. Scholl, M.S., Architecture for object identification: Incorporating an optical correlator and digital processing for display and recording of optical data. Opt. Eng., 34, 3, 887–895, 1995.
19. Yuan, S., Zhang, T., Zhou, X., Liu, X., Liu, M., Optical authentication technique based on interference image hiding system and phase-only correlation. Opt. Commun., 304, 129–135, 2013.
20. Tian, L., Liu, Y., Barbastathis, G., Improved axial resolution of digital holography via compressive reconstruction, in: Digital Holography and Three-Dimensional Imaging. Opt. Soc. Am. (OSA), DW4C.3, 2012.
21. Wen, C., Xudong, C., Stern, A., Javidi, B., Phase-modulated optical system with sparse representation for information encoding and authentication. IEEE Photonics J., 5, 2, 84–88, 2013.
22. Williams, L., Nehmetallah, G., Banerjee, P.P., Digital tomographic compressive holographic reconstruction of three-dimensional objects in transmissive and reflective geometries. Appl. Opt., 52, 8, 1702–1710, 2013.

23. Wu, X., Yu, Y., Zhou, W., Asundi, A., 4f amplified in-line compressive holography. Opt. Express, 22, 17, 19860–19864, 2014.
24. Li, Y., Kreske, K., Rosen, J., Security and encryption optical systems based on a correlator with significant output images. Appl. Opt., 39, 29, 5295–5301, 2000.
25. https://www.semanticscholar.org/paper/Detection-and-GTV-Definition-of-Brain-Tumors-in-MRI-AbdoelrahmanHassanA.-Garelnabi/e74f8ba6b502caf03794f844b7a994cf2b5eafa8/figure/0
26. https://www.researchgate.net/figure/Axial-T1W-MRI-of-the-brain-Though-the-entire-image-occupies-hard-disk-space-for-storage_fig3_225059460

4

Brain Tumor Diagnostic System—A Deep Learning Application

Kalaiselvi, T.* and Padmapriya, S.T.

Department of Computer Science and Applications, The Gandhigram Rural Institute (Deemed to be University), Dindigul, Tamilnadu, India

Abstract

Deep learning is one of the emerging fields in machine vision and its applications. The convolutional neural network (CNN) is a class of deep learning model mainly used for digital image analysis and processing. Hence, we have made an ablation study on CNNs for the brain tumor diagnostic process. Eight such CNN models were developed and tested using magnetic resonance imaging (MRI) human head scans collected from brain tumor challenge websites and MRI centers. The proposed CNN models achieved about 90–99% accuracy in the brain tumor classification process. In this book chapter, the applied mathematics underlying machine learning, machine learning algorithms, deep learning techniques and the proposed CNN models are discussed.

Keywords:  CNN, MRI, brain tumor, classification, accuracy

4.1 Introduction

4.1.1 Intelligent Systems

Intelligent systems have the ability to determine, justify and interpret relationships. These intelligent systems retrieve information from their previous experience. They have the ability to solve problems and are able to classify, generalize and adapt to new situations [1]. The area of computer science that emphasizes the creation of intelligent machines that imitate human behavior is called Artificial Intelligence (AI).

*Corresponding author: [email protected]



Figure 4.1  Computer vision techniques.

The various computer vision techniques that come under the category of AI are shown in Figure 4.1. Machine learning is one of the current applications of AI, and it is based on the idea that we should enable the machine to learn by itself when given access to data [2]. Deep learning is one of the subfields of machine learning. Machine learning is similar to human vision, whereas deep learning is similar to animal vision. CNN is an emerging deep learning technique for image analysis. Processing digital images to highlight conspicuous diseases, in order to provide support to radiologists or medical professionals, is done by computer-aided diagnosis (CAD) systems.

4.1.2 Applied Mathematics in Machine Learning

Mathematical concepts such as statistics, probability, linear algebra and numerical computation play a vital role in machine learning [3]. To explore the various concepts of machine learning, a good mathematical understanding is necessary in order to grasp the inner workings of these algorithms. Applied mathematics is used to understand parameter settings and validation strategies, to identify the overfitting and underfitting caused by the bias-variance tradeoff, and to estimate the uncertainty in machine learning algorithms.

Linear algebra is the part of mathematics that covers vectors, matrices and linear transforms. It acts as a foundation of machine learning [4]. Some examples of linear algebra in machine learning are datasets, data files, images and photographs, one-hot encoding, linear regression,

regularization, principal component analysis, singular value decomposition, latent semantic analysis, recommender systems, deep learning, etc.

Probability theory acts as a foundation of many machine learning algorithms. Probability tells us the possibility of different outcomes [4]. A sample space (S) is the set of all possible outcomes; for example, the sample space for the flip of a coin is {heads, tails}. A random variable is a variable that takes values from the sample space randomly. For example, if x denotes the outcome of a coin flip, then x = heads or x = tails. In order to describe the likelihood of a random variable, a probability distribution is specified [4].

X ~ P(X)

(4.1)

This indicates that X is a random variable drawn from a probability distribution P(X). A Probability Mass Function (PMF) is used to describe discrete random variables: it maps each value in the variable's sample space to its probability [5]. An example of a discrete distribution is the Bernoulli distribution, which gives the probability for a random variable that takes one of two values (for example, heads/tails, true/false, rain/no rain, etc.). A Probability Density Function (PDF) is used to describe continuous distributions [6]: it maps the infinite sample space to relative likelihood values. An example of a continuous distribution is the Gaussian distribution; the Gaussian or normal distribution is parameterized by its mean and variance. A distribution over multiple random variables is called a joint probability distribution. The marginal probability distribution of x is obtained by summing the joint distribution over y:

P(x) = Σy P(x,y)

(4.2)

Here the sum rule is used to marginalize over the random variable y. The conditional probability distribution of x given y is represented as

P(x|y) = P(x,y) ÷ P(y)

(4.3)

The probability of x is conditioned on observing y. Bayes' rule is represented as

P(x|y) = P(y|x) · P(x) ÷ P(y)        (4.4)

This rule is very important in machine learning [7].

Let us consider two variables x and y. Here x and y are said to be independent if and only if



P(x,y) = P(x).P(y)

(4.5)

Similarly, x and y are said to be conditionally independent given another variable z, if and only if

P(x,y|z) = P(x|z).P(y|z)

(4.6)
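These rules, Eqs. (4.2)–(4.6), can be made concrete with a tiny joint distribution. The NumPy sketch below uses an invented 2×2 joint table (the numbers are purely illustrative) and checks marginalization, conditioning, Bayes' rule and the independence test:

```python
import numpy as np

# Illustrative joint distribution P(x, y) over x in {0,1}, y in {0,1}
P_xy = np.array([[0.3, 0.1],
                 [0.2, 0.4]])          # rows: x, columns: y; entries sum to 1

# Eq. (4.2): marginal P(x) by summing over y (the sum rule)
P_x = P_xy.sum(axis=1)
P_y = P_xy.sum(axis=0)

# Eq. (4.3): conditional P(x|y) = P(x, y) / P(y)
P_x_given_y = P_xy / P_y               # broadcasting divides each column by P(y)

# Eq. (4.4): Bayes' rule, P(x|y) = P(y|x) . P(x) / P(y)
P_y_given_x = P_xy / P_x[:, None]
bayes = P_y_given_x * P_x[:, None] / P_y

print(np.allclose(P_x_given_y, bayes))                                 # True
print("independent?", np.allclose(P_xy, np.outer(P_x, P_y)))           # Eq. (4.5)
```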

4.1.3 Machine Learning Basics

Machine learning is one of the major subfields of Artificial Intelligence (AI). Machine learning deals with understanding the structure of data, and its aim is to fit those data into models that can be utilized by the appropriate users [8]. Even though machine learning belongs to the computer science field, it is a little different from traditional computational approaches. Machine learning techniques have been adopted by many applications such as facial recognition technology, computer vision, natural language processing, etc. Machine learning is a field which undergoes continuous development. Machine learning can be classified into supervised, unsupervised and reinforcement learning. Some of the common algorithms used in machine learning include the kNN (k-Nearest Neighbor) algorithm, decision tree learning and deep learning.

Types of machine learning algorithms: The types of machine learning algorithms are shown in Figure 4.2. Supervised learning comprises a dependent variable (target) and independent variables (a set of predictors). Using these variables, a function is generated to map the inputs to the desired outputs [9]. The training process continues until the model achieves a good level of accuracy on the training data. Some examples of supervised learning include regression, decision tree, random forest, kNN, logistic regression, etc. In unsupervised learning, there is no target variable to predict; some examples of unsupervised learning are the apriori algorithm, k-means, etc. In reinforcement learning, the machine is exposed to an environment where it trains itself continually using a trial-and-error method [10]. Here, the machine learns from past experience and tries to acquire

Figure 4.2  Types of machine learning algorithms (supervised learning: decision tree, random forest, kNN, logistic regression; unsupervised learning: apriori algorithm, k-means algorithm; reinforcement learning: Markov decision process).

the knowledge needed to make accurate decisions. An example of reinforcement learning is the Markov decision process.

4.1.4 Machine Learning Algorithms

Some of the common types of machine learning algorithms are listed in Figure 4.2. Linear regression establishes a relationship between independent and dependent variables by fitting a best line [11]. The best-fit line is known as the regression line and is represented by the linear equation

Y = AX + B

(4.7)

where Y is the dependent variable, A is the slope, X is the independent variable and B is an intercept.

Logistic regression is used to predict discrete values (for example, binary values like 0/1, yes/no, true/false) based on a given set of independent variables. Since logistic regression predicts a probability, its output values lie between 0 and 1.

A decision tree is a kind of supervised learning algorithm that is mostly used for classification problems. Decision trees can be employed for both categorical and continuous dependent variables. Here, the population is split into two or more homogeneous sets; this splitting is done on the most significant attributes or independent variables so as to make the groups as distinct as possible. A small numerical fit of the regression line of Eq. (4.7) is sketched below.
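As a concrete instance of Eq. (4.7), the slope A and intercept B of the regression line can be recovered from data by least squares. A minimal NumPy sketch, in which the synthetic data and noise level are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.linspace(0, 10, 50)
Y = 2.5 * X + 1.0 + rng.normal(0, 0.5, X.size)   # data generated around Y = AX + B

# Least-squares estimate of A (slope) and B (intercept) for Eq. (4.7)
A, B = np.polyfit(X, Y, deg=1)
print(f"estimated slope A = {A:.2f}, intercept B = {B:.2f}")  # close to 2.5 and 1.0
```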

Support Vector Machine (SVM) is used for classification. Here each data item is plotted as a point in n-dimensional space, where n denotes the number of features and the value of each feature is the value of a particular coordinate.

Naïve Bayes is one of the techniques used for classification and is based on Bayes' theorem. The Naïve Bayes classifier is based on the assumption that the presence of a particular feature in a class is unrelated to the presence of any other feature. For very large datasets, a Naïve Bayes model is easily built. Bayes' theorem is used for calculating the posterior probability using the formula

P(c|x) = [P(x|c) · P(c)] ÷ P(x)

(4.8)

where P(c|x) denotes the posterior probability of the target class given the predictors (attributes), P(c) denotes the prior probability of the class, P(x|c) denotes the likelihood, which is the probability of the predictors given the class, and P(x) denotes the prior probability of the predictors.

The kNN algorithm is used for both classification and regression problems. In this algorithm, all the available cases are stored and new cases are classified according to the majority vote of their k nearest neighbors. The kNN algorithm is considered computationally expensive, and variables can become biased by higher-range variables if the data are not normalized, so the algorithm relies on a preprocessing stage. A small sketch of the majority-vote rule is given at the end of this subsection.

In order to solve the clustering problem, an unsupervised algorithm named the k-means algorithm is used. The procedure of the k-means algorithm is to classify a given dataset using a certain number of clusters. Data points inside a cluster are homogeneous in nature, and heterogeneous with respect to other clusters.

An ensemble of decision trees is known as a random forest. In order to classify a new object, each tree votes for a class, and the classification having the most votes is chosen by the forest. An ensemble of learning algorithms that combines the predictions of several base estimators in order to improve robustness over a single estimator is known as boosting [12]. When dealing with plenty of data and seeking predictions with high predictive power, gradient boosting machines (GBM) are used. Some of the well-known gradient boosting implementations are XGBoost, LightGBM, CatBoost, etc.
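The core of kNN as described above is just a distance sort plus a majority vote. A self-contained Python sketch, in which the toy points, labels and k = 3 are illustrative assumptions:

```python
import numpy as np
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    dists = np.linalg.norm(train_X - query, axis=1)   # Euclidean distances
    nearest = np.argsort(dists)[:k]                   # indices of the k closest cases
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

train_X = np.array([[0.0, 0.0], [0.1, 0.2], [0.9, 1.0], [1.0, 0.8], [0.95, 0.9]])
train_y = np.array(["class A", "class A", "class B", "class B", "class B"])
print(knn_predict(train_X, train_y, np.array([0.8, 0.9])))   # -> "class B"
```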


4.2 Deep Learning

4.2.1 Evolution of Deep Learning

Deep learning is a branch of machine learning. It deploys algorithms for processing data, imitating the thinking process and developing abstractions. Deep learning uses several layers of algorithms for processing data, understanding human speech and visually recognizing objects [13]. Here, the information is passed through each layer, where the output of the previous layer acts as the input for the next layer. The first layer in the network is termed the input layer, the middle layers are termed hidden layers and the last layer is termed the output layer. Feature extraction is another aspect of deep learning, which uses an algorithm to automatically construct meaningful features of the data for learning, training and understanding. Deep learning has advanced the state of the art in many artificial intelligence tasks such as machine translation, object detection, speech recognition, etc. Compared to other machine learning algorithms, deep learning performs very well because of the way it is designed. The variation between machine learning and deep learning lies in their performance as the data scale increases: when the data are small, deep learning algorithms do not perform well, since deep learning requires a huge amount of data to learn well.

In 1943, Warren McCulloch and Walter Pitts created a model based on the neural networks of the human brain. They defined a term called threshold logic, a combination of mathematics and algorithms, to mimic the thought process [14]. During the steady development of deep learning there were two significant breaks. The basics of the back propagation model were developed by Henry J. Kelley in 1960, and Stuart Dreyfus developed a very simple version based only on the chain rule in 1962. The concept of back propagation thus existed in the early 1960s, but it did not become practically useful until 1985. In 1965, efforts were made to develop deep learning algorithms: Alexey Grigoryevich Ivakhnenko and Valentin Grigorevich Lapa used polynomial activation functions for their models, which were analyzed statistically [15]. In the 1970s, lack of funding limited both deep learning and artificial intelligence research. Kunihiko Fukushima designed neural networks with multiple pooling and convolutional layers; in 1979 he developed an artificial neural network called the Neocognitron, which used a multilayered and hierarchical design.


4.2.2 Deep Networks

Feed forward neural networks, or multilayer perceptrons, are composed of many sigmoid neurons and are capable of handling nonlinearly separable data [16]. The layers present between the input and output layers are called hidden layers. The network is called feed forward because the information moves only in the forward direction, from the input nodes, through the hidden layers (one or many) and finally through the output nodes, as shown in Figure 4.3.

Back propagation is short for "backward propagation of errors" and is one of the standard training methods [17]. The algorithm finds the minimum of the error function with respect to the weights using an approach called the delta rule, or gradient descent; the solution to the learning problem is the set of weights that minimizes the error function. Back propagation proceeds as shown in Figure 4.4:

1. Initialize the weights with random values and propagate forward.
2. If there are errors, propagate them backwards in order to reduce them.

These steps are repeated until the error is minimized. A one-neuron sketch of this rule is given after Figure 4.4.

Figure 4.3  Feed forward neural networks (input layer → hidden layers → output layer).

Figure 4.4  Back propagation neural networks (the difference between the actual and predicted output is propagated backwards through the weights).
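To make the delta rule concrete, here is a minimal single sigmoid neuron trained by gradient descent on a toy task (learning the logical OR function); the learning rate and epoch count are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy dataset: logical OR
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0.0, 1.0, 1.0, 1.0])

rng = np.random.default_rng(0)
w = rng.normal(size=2)      # step 1: random initial weights
b = 0.0
lr = 1.0                    # learning rate (assumed)

for epoch in range(2000):
    y = sigmoid(X @ w + b)              # forward pass
    err = y - t                         # difference between predicted and actual
    # Backward pass (delta rule): gradient of the squared error w.r.t. the weights
    delta = err * y * (1.0 - y)
    w -= lr * X.T @ delta               # step 2: adjust weights to reduce the error
    b -= lr * delta.sum()

print(np.round(sigmoid(X @ w + b)))     # approximately [0, 1, 1, 1]
```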

4.2.3 Convolutional Neural Networks

A Convolutional Neural Network (CNN) is a kind of deep feed forward network. CNNs are much easier to train than networks with full connectivity between adjacent layers, and they have been widely adopted by the computer vision community. A CNN is aimed at processing data that come in the form of arrays. The initial stages of a CNN comprise two kinds of layers: convolutional layers and pooling layers. In the convolutional layer the units are organized as feature maps. There is a set of weights termed a filter bank, and every unit is linked to local patches of the preceding layer through the filter bank. The role of the convolutional layer is to detect local conjunctions of features from the preceding layer, whereas the role of the pooling layer is to merge semantically similar features. In image classification, a CNN learns in such a way that it can distinguish edges from the raw pixels in its first layer, then applies those edges to detect simple shapes in the second layer, and then applies these shapes to derive higher-level characteristics. The final layer is a classifier that uses these higher-level characteristics. A CNN consists of an input layer, an output layer and numerous hidden layers; the architecture of the proposed CNN is shown in Figure 4.5.

The convolutional layer mimics a human neuron's response to visual stimuli. Here the convolution operation is applied to the input, and the result is conveyed to the next layer. This layer consists of many neurons, which are

Figure 4.5  Proposed CNN architecture: a 150×150 input image passes through five convolution blocks (Conv1–Conv5 with 32, 32, 64, 64 and 64 filters of size 3×3, each followed by batch normalization with momentum 0.99 and 2×2 pooling), then fully connected layers with ReLU and a sigmoid output for prediction.


termed filters or maps, with sizes comparable to the input image dimensions. The sigmoid and ReLU activation functions are shown in Figure 4.6. The sigmoid function is an activation function that scales values between 0 and 1 using a threshold value; the sigmoid curve is given by

f(x) = 1/(1 + e^(−x))        (4.9)

The above equation represents the sigmoid function. When the weighted sum is substituted for x, the values are scaled between 0 and 1. By the nature of the exponent, the value never reaches zero and never exceeds 1: the sigmoid function scales large negative numbers towards 0 and large positive numbers towards 1.

ReLU is a nonlinear choice for the activation layer [19], and such a layer follows every convolutional layer. Researchers have found that ReLU layers work better because the network trains more quickly; ReLU also helps relieve the vanishing gradient problem. The ReLU layer uses the function f(x) = max(0, x), so all negative activations are changed to zero.

For multiscale analysis, the pooling layer reduces the input size. Some of the most popularly used pooling operators are average

Figure 4.6  Sigmoid and ReLU activation functions (a neuron's inputs X1, X2, X3 are combined and passed through an activation function to produce the output; the plots show the sigmoid and the rectified linear unit).

pooling and max pooling. The max pooling operator calculates the maximum value within a small spatial block. The pooling operation down-samples the input image efficiently by reducing the number of model parameters. The size of the output is regulated by three hyperparameters: zero padding, depth and stride. Stride is defined as the number of pixels the filter skips as it slides over the image. Depth is defined in terms of the input image and is basically the number of filters employed; using these depth filters, blobs, edges and corners can be detected. Zero padding is the padding of zeros around the input's borders to preserve its size. One or more fully connected layers must follow the final pooling layer; a fully connected layer is attached to every neuron of the preceding layer, and typically these layers are applied as the final layers of the network.

Batch normalization (BN) is the normalization of the activations, i.e., of the output values of a convolution. When BN is employed, training is not influenced by the scale of the parameters during weight propagation. A dropout layer is also used after the final convolution block; dropout is a regularization technique employed to avoid overfitting.
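The activation and pooling operations just described are easy to state directly in NumPy. A small sketch of Eq. (4.9), ReLU and 2×2 max pooling follows; the input array is illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))       # Eq. (4.9): squashes values into (0, 1)

def relu(x):
    return np.maximum(0.0, x)             # negative activations become zero

def max_pool_2x2(a):
    """2x2 max pooling with stride 2 on an even-sized 2D array."""
    h, w = a.shape
    return a.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

a = np.array([[1., -2., 3., 0.],
              [4., 5., -6., 7.],
              [-1., 0., 2., 2.],
              [3., 1., 0., -4.]])

print(relu(a))           # negatives zeroed
print(max_pool_2x2(a))   # [[5., 7.], [3., 2.]]
print(sigmoid(0.0))      # 0.5
```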

4.3 Brain Tumor Diagnostic System

4.3.1 Brain Tumor

Gliomas are among the most common primary brain malignancies. Gliomas can be categorized into high-grade glioma/glioblastoma (HGG) and low-grade glioma (LGG) based on the pathological evolution of the tumor; glioblastoma is the most aggressive human brain tumor [18]. The sub-regions of a glioma comprise peritumoral edema, a necrotic core, and an enhancing and a non-enhancing tumor core. In order to portray the phenotype and heterogeneity of gliomas, multimodal MRI is used. MRI scans are classified as T1-weighted, contrast-enhanced T1-weighted, T2-weighted and Fluid Attenuated Inversion Recovery (FLAIR) images.

4.3.2 Methodology

In our proposed work, we have constructed eight different CNN models, named proposed convolutional neural networks (pCNN) pCNN1 to pCNN8.

The pCNN1 model was developed as our first trial. This model consists of two convolutional layers, each containing a convolution and a pooling. We

have used 32 filters with 3 × 3 kernels in the first convolution layer, and max pooling is done with 2 × 2 strides. As in the first layer, the second convolution layer contains 32 filters with 3 × 3 kernels, and max pooling is done with 2 × 2 strides. The number of epochs defined in this model is 5. We have used the ReLU and sigmoid activation functions in this model: ReLU is used to discard the negative values, and sigmoid is used for binary prediction. The Adam optimizer is used for compiling the CNN model. Here, the 150 × 150 dimensional images are reduced to 64 × 64 dimensions during training and testing.

The pCNN2 model consists of five convolutional layers, each containing a convolution and a pooling layer. The first convolution layer is set to 32 filters with 3 × 3 kernels, and the first pooling is done with 2 × 2 strides. The second convolution layer has 32 filters with 3 × 3 kernels, and the second pooling is done with 2 × 2 strides. The third convolution layer has 64 filters with 3 × 3 kernels, and the third pooling is done with 2 × 2 strides. The fourth convolution layer has 64 filters with 3 × 3 kernels, and the fourth pooling is done with 2 × 2 strides. The fifth layer consists of 64 filters with 3 × 3 kernels in its convolution, and 2 × 2 strides are used for the fifth pooling. Before the fully connected layer, a dropout layer is used to avoid overfitting; the dropout is set to 0.5. We used the ReLU and sigmoid activation functions as in the first model (pCNN1), and the Adam optimizer for compiling the CNN model. We used 30 epochs to train the system with this model.

The pCNN3 model is a five-layer model with a stopping criterion and dropout. It contains five convolution layers with the same filter configuration as pCNN2 (32, 32, 64, 64 and 64 filters of 3 × 3, each with 2 × 2 pooling). Before the fully connected layer, a dropout layer set to 0.5 is used to avoid overfitting. We used the ReLU and sigmoid activation functions and the Adam optimizer as before. Here we again defined the number of epochs as 30, along with an early stopping criterion: if there is no change in the accuracy or loss while running epochs, training stops early. In this model, training stopped at epoch 13. This gradually decreases the time to execute the CNN model without any loss of prediction accuracy.
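The five-layer pCNN design described above maps naturally onto a Keras model. Below is a hedged sketch reconstructed from the description, not the authors' exact code; the dense-layer width, early-stopping patience and input shape details are assumptions, and the use_batchnorm flag anticipates the batch-normalized variants (momentum 0.99, per Figure 4.5) described next.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_pcnn(use_batchnorm=False, input_shape=(150, 150, 1)):
    """Five-layer pCNN sketch: 32, 32, 64, 64, 64 filters of 3x3, 2x2 max pooling."""
    model = keras.Sequential()
    model.add(keras.Input(shape=input_shape))
    for filters in (32, 32, 64, 64, 64):
        model.add(layers.Conv2D(filters, (3, 3), activation="relu", padding="same"))
        if use_batchnorm:                          # BN variants use momentum 0.99
            model.add(layers.BatchNormalization(momentum=0.99))
        model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Flatten())
    model.add(layers.Dropout(0.5))                 # dropout before the dense layers
    model.add(layers.Dense(64, activation="relu")) # assumed hidden width
    model.add(layers.Dense(1, activation="sigmoid"))  # binary prediction
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# Early stopping criterion as in pCNN3: halt when the monitored quantity
# stops improving (the patience value is an assumption).
early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=3,
                                           restore_best_weights=True)
# model = build_pcnn()
# model.fit(train_images, train_labels, epochs=30,
#           validation_split=0.2, callbacks=[early_stop])
```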

The pCNN4 model also contains five convolution layers with the same configuration (32, 32, 64, 64 and 64 filters, 3 × 3 kernels, 2 × 2 max pooling), ReLU and sigmoid activations and the Adam optimizer, but without a dropout layer. The number of epochs is again 30 with the early stopping criterion; training stopped at epoch 13, again saving execution time without any loss in prediction accuracy.

The pCNN5 model uses the same five convolution-pooling layers. In all five convolution layers, a batch normalization (BN) layer with momentum set to 0.99 is added, and a dropout layer (rate 0.5) is placed before the fully connected layer to avoid overfitting. ReLU and sigmoid activations and the Adam optimizer are used as in the previous models. The number of epochs is 30 with early stopping; training stopped at epoch 20.

The pCNN6 model keeps the BN layer (momentum 0.99) in all five convolution layers but omits the dropout layer; all other settings are unchanged. With 30 epochs and early stopping, training stopped at epoch 20.

The pCNN7 model adds the BN layer (momentum 0.99) only in the first convolution layer; BN is not used in the other layers, and there is no dropout. All other settings are unchanged. With 30 epochs and early stopping, training stopped at epoch 20.

The pCNN8 model adds the BN layer (momentum 0.99) in the first three convolution layers and not in the remaining two, again without dropout. All other settings are unchanged. With 30 epochs and early stopping, training stopped at epoch 20.
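To make the architecture concrete, the following is a minimal sketch of a pCNN6-style model in Keras (the Keras/TensorFlow stack mentioned in Section 4.3.3), not the authors' exact code; the "same" padding, the dense layer width and the early stopping patience are assumptions added so that five 2 × 2 poolings remain feasible on 64 × 64 inputs.

import tensorflow as tf
from tensorflow.keras import layers, models, callbacks

def build_pcnn6_like(input_shape=(64, 64, 1)):
    model = models.Sequential([layers.Input(shape=input_shape)])
    # Five convolution blocks with 32, 32, 64, 64 and 64 filters (3 x 3 kernels),
    # each followed by batch normalization (momentum 0.99) and 2 x 2 max pooling.
    for filters in (32, 32, 64, 64, 64):
        model.add(layers.Conv2D(filters, (3, 3), padding="same", activation="relu"))
        model.add(layers.BatchNormalization(momentum=0.99))
        model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Flatten())
    model.add(layers.Dense(64, activation="relu"))      # assumed width
    model.add(layers.Dense(1, activation="sigmoid"))    # binary tumor / non-tumor
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# Early stopping ends training when the monitored loss stops improving,
# which is how runs stopped at epochs 13 or 20 instead of the full 30.
stopper = callbacks.EarlyStopping(monitor="val_loss", patience=3,
                                  restore_best_weights=True)
# model = build_pcnn6_like()
# model.fit(x_train, y_train, epochs=30,
#           validation_data=(x_val, y_val), callbacks=[stopper])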

4.3.3 Materials and Metrics
We used the BRATS 2013 dataset (BRATS2013 Challenge, MRI brain tumor database) and a sample clinical dataset collected from the WBA (The Whole Brain Atlas, Department of Radiology and Neurology at Women's Hospital, Harvard Medical School, Boston, USA) for training the system using Keras and TensorFlow in Python. Accuracy, False Alarm and Missed Alarm are used as evaluation metrics to check the performance of the proposed work. False Alarm (FA) is defined as the percentage of slices that are identified falsely; for example, a tumor slice classified as a non-tumor slice or vice versa. Missed Alarm (MA) is defined as the percentage of slices that are missed during classification; for example, a slice from the abnormal slices in the MRI volumes that is not detected as a tumor slice.



Missed Alarm (MA)% = (FP / Total Slices) × 100		(4.10)

False Alarm (FA)% = (FN / Total Slices) × 100		(4.11)

Accuracy = (TP + TN) / (TP + TN + FP + FN)		(4.12)

where true positive (TP) is the number of tumor slices correctly identified, true negative (TN) is the number of normal slices correctly detected, false positive (FP) is the number of tumor slices incorrectly identified, and false negative (FN) is the number of normal slices incorrectly classified. Note that, in this convention, FP counts tumor slices that were missed and FN counts normal slices flagged as tumor, which is why FP appears in the Missed Alarm formula (4.10) and FN in the False Alarm formula (4.11).
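A small helper written to match the chapter's own conventions in equations (4.10)–(4.12) might look as follows; the counts shown are hypothetical:

def volume_metrics(tp, tn, fp, fn):
    total = tp + tn + fp + fn        # total slices in the MRI volume
    ma = fp * 100.0 / total          # Missed Alarm %, Eq. (4.10)
    fa = fn * 100.0 / total          # False Alarm %, Eq. (4.11)
    accuracy = (tp + tn) / total     # Eq. (4.12)
    return ma, fa, accuracy

# Hypothetical counts for one volume of 100 slices:
print(volume_metrics(tp=20, tn=70, fp=4, fn=6))   # (4.0, 6.0, 0.9)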

4.3.4 Results and Discussions
We trained our proposed CNN models using the BraTS 2013 data and tested them with clinical MRI data containing eight volumes: two volumes from non-tumor patients and six volumes from tumor patients.

In the pCNN1 model, V01 and V02, the MRI volumes of non-tumor patients, produce false alarms of 43 and 16% respectively. This is a major limitation of pCNN1, since it is unacceptable to predict tumor slices from a non-tumor patient's MRI. We tried to overcome this limitation with the next model, pCNN2, where V01 and V02 produce false alarms of 22 and 34% respectively; no improvement was seen. In pCNN3, the false alarms for V01 and V02 are 29 and 27% respectively, again with no improvement, so we developed pCNN4 in search of better results. In pCNN4, V01 and V02 produce false alarms of 5 and 11% respectively, an improvement over the previous models. Even so, we tried to reduce the false alarm percentage further with the next model, pCNN5. In pCNN5, V01 and V02 show zero false alarms, a very good improvement over the previous models. However, this model has a limitation of its own: in V05 and V08, which are MRI volumes of tumor patients, none of the slices are detected as tumor slices, which is also a poor outcome in the medical field. To overcome this, we moved on to the next model, pCNN6. In pCNN6, V01 and V02 again show zero false alarms, and we saw very good improvement compared with the previous models.

Figure 4.7  FA% of proposed CNN models (bar chart of false alarm percentages for volumes V01 and V02 across pCNN1 to pCNN8; vertical axis 0 to 50).

pCNN6 also overcomes the limitation that occurred in pCNN5: in V05 and V08, the MRI volumes of tumor patients, some of the slices are detected as tumor slices. We obtained satisfactory results compared with all the other models [34], although this model produces a larger number of missed alarms, which we tried to address by moving to the next models. We then developed the pCNN7 and pCNN8 models, which did not match the performance of the other models. Among all eight models, pCNN6 produces the best results for brain tumor detection. The prediction accuracy is 90% for pCNN1, 98% for pCNN2, 98% for pCNN3, 90% for pCNN4, 90% for pCNN5, 90% for pCNN6, 87% for pCNN7 and 75% for pCNN8. When batch normalization is used, the accuracy is lower than in the other models, and when BN is used only in selected layers the accuracy is lower still. The comparison of the false alarm percentages of all eight models is shown in Figure 4.7.

4.4 Computer-Aided Diagnostic Tool
Computer-aided diagnostic (CAD) systems assist doctors in the rapid interpretation of medical images. CAD systems process digital images for typical appearances and highlight conspicuous sections, such as possible diseases, so as to supply input that supports a decision taken by the professional. Imaging techniques in X-ray, CT, MRI, PET, SPECT and ultrasound diagnostics yield a great deal of information that the radiologist or other medical professional must analyze and evaluate comprehensively in a very short time.


Figure 4.8  Computer aided diagnostic system.

This computer-aided diagnostic system for detecting brain tumors was developed using Python, an open-source language [35]. We used the Tkinter package to develop a graphical user interface for diagnosing tumor slices, together with widgets such as buttons and labels. The computer-aided diagnostic system is shown in Figure 4.8.
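A minimal Tkinter sketch of such a tool is shown below, assuming a trained binary model saved as pcnn6.h5 and 64 × 64 gray-scale inputs; the file name and the 0.5 decision threshold are assumptions, not the authors' implementation.

import tkinter as tk
from tkinter import filedialog, messagebox
import numpy as np
from PIL import Image
from tensorflow.keras.models import load_model

model = load_model("pcnn6.h5")  # hypothetical path to a trained pCNN6-style model

def classify_slice():
    path = filedialog.askopenfilename(title="Select an MRI slice image")
    if not path:
        return
    # Preprocess to the 64 x 64 gray-scale format the model was trained on.
    img = Image.open(path).convert("L").resize((64, 64))
    x = np.asarray(img, dtype="float32")[None, :, :, None] / 255.0
    prob = float(model.predict(x)[0][0])
    messagebox.showinfo("Diagnosis",
                        "Tumor slice" if prob >= 0.5 else "Normal slice")

root = tk.Tk()
root.title("Brain Tumor Diagnostic Tool")
tk.Label(root, text="Computer-aided diagnosis of MRI slices").pack(padx=10, pady=5)
tk.Button(root, text="Open slice and classify",
          command=classify_slice).pack(padx=10, pady=10)
root.mainloop()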

4.5 Conclusion and Future Enhancements
We have made an analytical study of CNNs for brain tumor detection. We developed eight CNN models, pCNN1 through pCNN8, and tested them for brain tumor detection, training all of them on the BRATS 2013 dataset and testing on a clinical dataset. We found pCNN6 to be the best of these models for brain tumor classification, and we achieved roughly 90–99% accuracy with these models during classification. We will use the pCNN6 model for further classification tasks in brain tumor detection. A limitation of this work is the long execution time of the models; in future work we will focus on reducing the time complexity of execution.


References
1. LeCun, Y., Yoshua, B., Geoffrey, H., Deep learning. Nature, 521, 7553, 436–444, 2015.
2. Bengio, Y., Learning Deep Architectures for AI. J. Found. Trends Mach. Learn., 2, 1, 2009.
3. Gupta, A., Wang, H., Ganapathiraju, M., Learning structure in gene expression data using deep architectures, with an application to gene clustering. 2015 IEEE Int. Conf. Bioinforma. Biomed., pp. 1328–1335, 2015.
4. Gatys, L.A., Ecker, A.S., Bethge, M., A Neural Algorithm of Artistic Style. arXiv Prepr., 1–16, 2015.
5. Wang, H., Meghawat, A., Morency, L.P., Xing, E.P., Select-Additive Learning: Improving Cross-individual Generalization in Multimodal Sentiment Analysis. 2017 IEEE International Conference on Multimedia and Expo (ICME), IEEE, pp. 949–954, 2017.
6. Affonso, C., Rossi, A.L.D., Vieira, F.H.A., Carvalho, D., Deep learning for biological image classification. Expert Syst. Appl., 85, 114–122, 2017.
7. Wang, H. and Raj, B., On the Origin of Deep Learning. arXiv preprint arXiv:1702.07800, 1–72, 2017.
8. Lee, S.J. and Kim, S.W., Localization of the slab information in factory scenes using deep convolutional neural networks. Expert Syst. Appl., 77, 34–43, 2017.
9. Dahl, G.E., Sainath, T.N., Hinton, G.E., Improving Deep Neural Networks for LVCSR Using Rectified Linear Units and Dropout. IEEE Int. Conf. Acoust. Speech Signal Process., pp. 8609–8613, 2013.
10. Nair, V. and Hinton, G.E., Rectified Linear Units Improve Restricted Boltzmann Machines. Proc. 27th Int. Conf. Mach. Learn., pp. 807–814, 2010.
11. Yu-Hsin, C., Tushar, K., Joel, E., Vivienne, S., Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks, in: IEEE International Solid-State Circuits Conference, ISSCC 2016, Digest of Technical Papers, pp. 262–263, 2016.
12. Elalami, M.E., A novel image retrieval model based on the most relevant features. Knowl.-Based Syst., 24, 1, 23–32, 2011.
13. Angermueller, C., Pärnamaa, T., Parts, L., Stegle, O., Deep learning for computational biology. Mol. Syst. Biol., 12, 7, 1–16, 2016.
14. Krizhevsky, A., Sutskever, I., Hinton, G.E., ImageNet Classification with Deep Convolutional Neural Networks. Adv. Neural Inf. Process. Syst., 1097–1105, 2012.
15. Fe-Fei, L., Fergus, R., Perona, P., A Bayesian approach to unsupervised one-shot learning of object categories. Proc. Ninth IEEE Int. Conf. Comput. Vis., vol. 2, pp. 1134–1141, 2003.
16. Gupta, V., Image Classification using Convolutional Neural Networks in Keras. Learn OpenCV, 2017.
17. Chollet, F., Keras: Theano-based deep learning library. Code: https://github.com/fchollet, Documentation: http://keras.io, 2015.

18. Zeiler, M.D., ADADELTA: An Adaptive Learning Rate Method. arXiv:1212.5701, 2012.
19. Bosch, A., Zisserman, A., Munoz, X., Image Classification Using Random Forests and Ferns. IEEE 11th Int. Conf. Comput. Vis. (ICCV), pp. 1–8, 2007.
20. Kalaiselvi, T., Sriramakrishnan, P., Somasundaram, K., Brain Abnormality Detection from MRI of Human Head Scans using the Bilateral Symmetry Property and Histogram Similarity Measures. International Computer Science and Engineering Conference, 2016.
21. Mohsen, H., El-Dahshan, E.A., El-Horbaty, E.M., Salem, A.M., Classification using deep learning neural networks for brain tumors. Future Comput. Inf. J., 3, 68–71, 2018.
22. Razzak, M.I., Naz, S., Zaib, A., Deep Learning for Medical Image Processing: Overview, Challenges and Future, in: Classification in BioApps, pp. 323–350, 2017.
23. Kamnitsas, K., Ledig, C., Newcombe, V.F.J., Simpson, J.P., Kane, A.D., Menon, D.K., Rueckert, D., Glocker, B., Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Med. Image Anal., 36, 61–78, 2017.
24. Kleesiek, J., Urban, G., Hubert, A., Schwarz, D., Maier-Hein, K., Bendszus, M., Biller, A., Deep MRI Brain Extraction: A 3D convolutional neural network for skull stripping. NeuroImage, 129, 460–469, 2016.
25. Pereira, S., Pinto, A., Alves, V., Silva, C.A., Brain Tumor Segmentation using Convolutional Neural Networks in MRI Images. IEEE Trans. Med. Imaging, 35, 1240–1251, 2016.
26. Zhao, L. and Jia, K., Multiscale CNNs for Brain Tumor Segmentation and Diagnosis. Comput. Math. Methods Med., 2016, Article ID 8356294, 2016.
27. Activation Functions of Convolution Neural Networks, https://towardsdatascience.com.
28. BRATS2013 Challenge, MRI brain tumor database, https://www.smir.ch/BRATS/Start2015.
29. The Whole Brain Atlas (WBA), Department of Radiology and Neurology at Women's Hospital, Harvard Medical School, Boston, USA, https://www.med.harvard.edu/aanlib/.
30. Coskun, M., Osal, Y., Ucar, A., Demir, Y., An Overview of Popular Deep Learning Methods. Eur. J. Tech., 7, 165–176, 2017.
31. Hwang, H., Rehman, H.Z.U., Lee, S., 3D U-Net for Skull Stripping in Brain MRI. Appl. Sci., 9, 569, 2019.
32. Kavitha, A.R., Chitra, L., Kanaga, R., Brain Tumor Segmentation using Genetic Algorithm with SVM Classifier. Int. J. Adv. Res. Electr. Electron. Eng., 1468–1471, 2016.
33. Suhag, S. and Saini, L.M., Automatic brain tumor detection and classification using SVM classifier. Proceedings of ISER 2nd International Conference, Singapore, pp. 55–59, 2015.

34. Kalaiselvi, T., Padmapriya, S.T., Sriramakrishnan, P., Somasundaram, K., Deriving tumor detection models using convolutional neural networks from MRI of human brain scans. Int. J. Inf. Technol., 1–6, 2020.
35. Kalaiselvi, T., Padmapriya, S.T., Somasundaram, K., Sriramakrishnan, P., A Deep Learning Approach for Brain Tumor Detection System using Convolutional Neural Networks. Special issue of Int. J. Dyn. Syst. Differ. Equ. (Scopus indexed); extended version of a paper presented at the International Conference on Mathematical Modelling, Dynamical Systems and Computing Techniques, Gandhigram Rural University, Gandhigram.

5
Machine Learning for Optical Character Recognition System
Gurwinder Kaur1* and Tanya Garg2
1 Department of Computer Science and Engineering, Gulzar Group of Institutes, Khanna, India
2 Department of Computer Science and Engineering, Thapar Institute of Engineering & Technology, Patiala, India

Abstract

Optical character recognition (OCR) involves the identification, classification and, in some applications, correction of optical symbols/patterns present in a digital image. Recognition can focus on online printed text, offline text and handwritten documents. Many applications, such as postal address reading, bank check processing and vehicle number plate verification, require OCR systems to speed up processing. Segmentation, feature extraction and classification techniques play a vital role in character recognition. An OCR system processes text in several phases: optical scanning, location segmentation, pre-processing, segmentation, representation, feature extraction, training and recognition, and post-processing. In the training phase, an ANN can be used to make the system efficient enough to process huge amounts of data. Recognition of handwritten text is an active area of research. Various techniques involved in OCR and their limitations are discussed, along with an overview of the precision rates of ANN-based approaches.
Keywords: OCR, feature extraction, segmentation, classification, ANN

5.1 Introduction
Traditionally, different organizations maintain their records in printed or handwritten forms.


Figure 5.1  Different areas of character recognition (character recognition divides into online and offline recognition; offline recognition covers printed and handwritten scripts, with recognition and verification of single characters).

Moreover, most of our literature is also available in offline format and may no longer be usable due to distortion, even though it can be a useful source of information and analysis. One way to preserve such material is to transform these documents into digital form using OCR systems. The major challenge for OCR systems comes with handwritten documents, where the system has to recognize different writing styles efficiently. The major phases of OCR include pre-processing, segmentation, feature extraction and classification. OCR is relevant to applications such as pattern recognition, machine vision and machine learning. Character recognition systems can be categorized into online and offline systems, as illustrated in Figure 5.1. In an online system, the 2D coordinates of consecutive pen points are captured as the text is written. In offline OCR, an image is scanned, and characters are segmented, recognized and, in application-specific OCRs, sometimes corrected as well. Machine learning can be applied in the pre-processing and training-and-recognition phases of OCR to process enormous amounts of data effectively. A neural network can analyze the features of characters written in different styles and recognize unknown characters. Various feed-forward and feed-backward networks can be used to perform recognition tasks [1].

5.2 Character Recognition Methods
Different research areas for character recognition systems were outlined in the previous section. Recognition of handwritten offline scripts is the area most researchers focus on, as it opens the way to processing literature and identifying signatures. The main tasks involved in recognition are segmentation, feature extraction and classification. The overall recognition system is based on the principle of analyzing the text written in a document and converting it into digital form. The system should be capable of identifying text boundaries to differentiate characters, a process known as segmentation. Segmentation can be performed on the whole document to separate text from other multimedia or from the background of the image. Segmentation can also be part of pre-processing at the character level, where the task is to separate each character in a sentence or word by defining its boundaries. Overlapping or discontinuous characters are generally difficult to segment accurately. Once the characters are segmented, the next module focuses on a single character, and training is performed on the patterns to recognize each character. To train the model, the system needs input with the different features that are significant for writing characters or symbols. Feature extraction is the most important task here: the training set consists of a large amount of data with different feature combinations for the same class. After all the required features are specified, unwanted features are removed by the model to improve system efficiency and minimize storage. Another important task is classification, which involves assigning characters to class labels; different feature combinations must distinguish each class. For instance, suppose there are two features for English alphabets, a vertical stroke count and a horizontal stroke count. For the characters 'L' and 'T', both features come out as 1 and 1, which would place 'L' and 'T' in the same class. As this is incorrect, a relationship between the two features (such as where the strokes join) is required to distinguish them.

5.3 Phases of Recognition System
Every character recognition system follows some general steps to recognize a character. The order of some tasks may vary with the application domain, the type of noise carried, and so on. The phases involved in recognition systems are described below.

5.3.1 Image Acquisition
The first stage is to prepare a dataset for the machine learning model. A dataset is a collection of numerous data items, in this context images in a defined file format such as JPG, JPEG or PNG, one part of which acts as a learning resource while the other part is used for

verification and validation. Image acquisition is done through a camera or scanner, whose output may be binary, gray-scale or colored. A thresholding process is performed on the images to convert them into bi-level images; this step, also called image pre-processing, saves memory and computational capacity. Further, a compression mechanism may be applied to the image to support faster processing; compression can be skipped in some scenarios, depending on the quality of the original image [2].

5.3.2 Defining ROI
For a machine learning model, the region of interest needs to be specified to avoid complexity. Segmentation is the process of distinguishing characters or letters from the other constituents of the image, such as figures or other graphics. For example, while arranging medicines by their labels or salt composition through artificial intelligence, the figures, watermarks, etc. must be separated before recognition. After image acquisition, the image is divided into a number of blocks, each consisting of a set of pixels. After intensity-based classification of each block, information blocks are isolated from background blocks; together these are called regions. Textual regions are then identified among these regions based on dimensions, aspect ratio, region area, pixel density, etc. [3].
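As a rough sketch of these two steps with OpenCV (the file name and the area and aspect-ratio limits are illustrative assumptions), Otsu's method provides the bi-level image and connected components provide candidate regions:

import cv2

img = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)        # hypothetical input
# Thresholding converts the gray image to a bi-level image (text = white).
_, binary = cv2.threshold(img, 0, 255,
                          cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
# Connected components act as blocks; crude filters on area and aspect
# ratio separate text-like regions from background and graphics.
n, labels, stats, _ = cv2.connectedComponentsWithStats(binary)
text_boxes = []
for i in range(1, n):                                      # label 0 is background
    x, y, w, h, area = stats[i]
    if area > 50 and 0.05 < w / float(h) < 20:
        text_boxes.append((x, y, w, h))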

5.3.3 Pre-Processing
This phase performs the initial processing required to make the image compatible with the feature extraction phase. The pre-processing tasks include character segmentation, skew detection and correction, binarization, noise removal and thinning. The order of these processes may vary according to application requirements [2].

5.3.4 Character Segmentation
Similar to the segmentation of sentences from other multimedia, the segmentation of one character from another can be performed. This step can be carried out separately from defining the ROI. It is required to distinguish the boundaries of one character from another, so that it becomes easier for the algorithm to identify each character. In this technique, after analyzing the inter-segment distance, segments exhibiting a smaller or equal pixel distance are discarded. A vertical histogram profile is used to segment individual words or characters, as in the sketch below. Further, in

this scenario, problems may occur if the letters overlap or consist of several parts [4, 5].
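A minimal sketch of vertical-histogram segmentation (the column-profile idea described above; the gap threshold is an assumption) follows:

import numpy as np

def segment_by_columns(binary, gap_thresh=0):
    # Vertical histogram profile: foreground pixel count per column.
    profile = (binary > 0).sum(axis=0)
    spans, start = [], None
    for x, count in enumerate(profile):
        if count > gap_thresh and start is None:
            start = x                       # a character/word begins
        elif count <= gap_thresh and start is not None:
            spans.append((start, x))        # it ends at an empty column
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans  # (x_start, x_end) per segment; overlapped glyphs will merge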

5.3.5 Skew Detection and Correction
Manually or mechanically scanned text may lie at an angle of a few degrees to the horizontal axis, termed the skew angle. Skew detection and correction may be performed prior to character segmentation. The skew angle can be identified using nearest-neighbor clustering, the Hough transform, projection profile analysis, cross correlation or morphological transforms. Due to non-parallel planes during image capture, images generally exhibit skew and perspective distortion. A skewed image has four sets of values bounded as a virtual rectangle around the skewed text region, known as profiles. The lengths of these profiles are M and N, corresponding to the length and breadth of the rectangle, as shown in Figure 5.2. The profile values are the distances in pixels from a side to the adjacent text region. The bottom profile of the text region is used to estimate the skew angle: the algorithm selects elements T1 and T2, the heights of the skewed text at two specific points on the bottom profile, and the average of the difference between these elements is used to compute the skew angle. After calculating the skew angle, the text is rotated [2].
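A sketch of projection-profile skew estimation (one of the methods listed above, not necessarily the exact profile scheme of Figure 5.2; the search range and step are assumptions) is shown below. The angle whose rotation maximizes the variance of the row-sum profile is taken as the skew:

import numpy as np
import cv2

def estimate_skew(binary, angles=np.arange(-10.0, 10.5, 0.5)):
    h, w = binary.shape
    centre = (w // 2, h // 2)
    best_angle, best_score = 0.0, -1.0
    for a in angles:
        m = cv2.getRotationMatrix2D(centre, a, 1.0)
        rotated = cv2.warpAffine(binary, m, (w, h), flags=cv2.INTER_NEAREST)
        score = np.var(rotated.sum(axis=1))   # peaky profile => deskewed text
        if score > best_score:
            best_angle, best_score = a, score
    return best_angle

# angle = estimate_skew(binary)
# m = cv2.getRotationMatrix2D((w // 2, h // 2), angle, 1.0)
# deskewed = cv2.warpAffine(binary, m, (w, h))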

5.3.6 Binarization
Depending on the image type, a number of binarization techniques are available to keep only the relevant information from the captured image. The main idea is to divide the image into two classes, the text region and the background region.

Figure 5.2  Profiles of a skewed image (a virtual rectangle with profile lengths M and N around the skewed text region; T1 and T2 are heights measured from two points on the bottom profile).

This phase is performed on gray-scale images to make character segmentation and pattern recognition simpler. Binarization techniques fall into two categories:
a) Overall threshold: a threshold value acts as the deciding factor for classification. With an overall threshold, a single value is assumed to classify the entire image into text and background regions.
b) Local threshold: separate threshold values are assumed for each pixel or region based on its characteristics.
The basic flow of the skew-corrected text region binarization technique is shown in Figure 5.3. The local threshold method is very effective at reconnecting the disconnected pixels of a segmented character.

Figure 5.3  Skew corrected text region binarization (flow chart: starting from the input skewed image, each pixel is checked against a condition and classified as text region or background region, moving pixel by pixel until the last pixel is reached).
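The two categories can be contrasted in a short OpenCV sketch (the file name, block size and offset are assumed values):

import cv2

gray = cv2.imread("text_region.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input
# a) Overall threshold: one value (chosen here by Otsu) for the whole image.
_, overall = cv2.threshold(gray, 0, 255,
                           cv2.THRESH_BINARY + cv2.THRESH_OTSU)
# b) Local threshold: a separate value per 25 x 25 neighbourhood, which is
#    better at reconnecting faint, disconnected strokes of a character.
local = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                              cv2.THRESH_BINARY, 25, 10)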


5.3.7 Noise Removal
Any spurious variation in a basic property of the image, such as intensity, that is absent in the original object is known as noise. Noise may arise during the image acquisition phase due to printer quality, the scanner, paper quality, etc. This step is performed before processing the image, to remove unwanted data while retaining the required content and enhancing image quality. It is an essential part of the pre-processing stage; if skipped, it may reduce the efficiency of the subsequent phases.

5.3.8 Thinning
Thinning is a compression technique that also extracts structural information about the character, such as its connectedness and end points. It is also known as skeletonization, since it produces a one-pixel-wide representation, or skeleton, of the object. It can be implemented using two approaches, namely pixel-wise thinning and non-pixel-wise thinning. The thinning process analyzes the components of the image and reduces them to a certain extent. Figure 5.4 illustrates the letter 'e' before and after the thinning process. Thinning is based on the principle that the same letter may have different stroke thicknesses when written by two individuals, but the information residing in the content remains the same [4, 6].
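A one-call sketch using scikit-image (one available skeletonization implementation, not necessarily the one intended by the text):

import numpy as np
from skimage.morphology import skeletonize

# binary: boolean array, True where the character's (foreground) pixels lie.
binary = np.zeros((32, 32), dtype=bool)
binary[8:24, 14:18] = True                 # a thick vertical stroke
# Skeletonization reduces each stroke to a one-pixel-wide representation
# while preserving connectedness and end points.
skeleton = skeletonize(binary)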

5.3.9 Representation
Image representation is a crucial step in image recognition systems. A well-chosen representation of the image helps the system perform better. Generally, binary images serve as the input to the system, but a system given certain characteristics of the image as the input may achieve higher accuracy [4]. Therefore, for a classification system, those characteristics of the image are identified which differentiate it from others.

Figure 5.4  Character before and after thinning.

The methods followed for representation of an image are: a) global transformation and series expansion; b) statistical representation; c) geometrical and topological representation [5].

5.3.10 Feature Extraction
Feature extraction is considered the most important and difficult phase in recognition systems. The key objective is to extract the essential characteristics of the characters. The techniques involved in feature extraction are categorized as distribution of points, transformations and series expansions, and structural analysis. Alongside feature extraction, feature classification is a crucial stage. Classification involves identifying each character with respect to its character class. Classification approaches can be divided into structural methods and decision-theoretic methods. Structural recognition systems use a syntactic description of features for classification. In decision-theoretic methods, feature vectors are numeric values, along with some pattern characteristics that are not quantified; these features decide class membership. For instance, if two input characters have the same numerical feature values but belong to different classes, then a relationship between those features is required to distinguish them. The most common approaches used in decision-theoretic recognition are classifiers (minimum distance or statistical) and neural networks [5, 7].

5.3.11 Training and Recognition
A recognition system is based on a technique that identifies, by means of supervised learning, which predefined class an unknown sample belongs to; this is known as pattern recognition. Chaudhuri et al. define four general approaches to pattern recognition: template matching, statistical techniques, structural techniques and ANNs [2]. Although these are four different approaches, they are not entirely independent of each other, as shown in Figure 5.5; each approach may require another as a sub-module to complete a character recognition task. An overall strategy, or problem-solving flow, is required by any of these approaches. The two global strategies possible are the holistic and the analytic strategy.

Figure 5.5  Relation between different approaches of recognition systems (template matching, statistical techniques, structural techniques and ANNs, with flow and dependency links between them).

a) Holistic Strategy
The holistic strategy corresponds to a top-down approach that recognizes the complete sentence or word rather than focusing on each character. The segmentation problem is therefore eliminated, since no character needs to be isolated for recognition. As the computation involved in segmentation is reduced, the result is a limited vocabulary for the OCR. Further, as the system flows from top to bottom, it must represent every character stroke, which increases the complexity of the system and decreases accuracy.
b) Analytic Strategy
The analytic strategy corresponds to a bottom-up approach in which each character stroke is recognized first, and meaningful text is produced by combining these characters into words, as illustrated in Figure 5.6. As it starts by recognizing each character, implicit segmentation is required to isolate each character; at the next stage, when words are generated from characters, explicit segmentation separates text and background regions. This process increases the complexity of the system and makes it error prone. Despite these drawbacks, the analytic strategy can recognize an unlimited vocabulary with greater efficiency [4].
ANN is the most rapidly emerging approach for the training and recognition of characters. For offline systems, ANNs are considered the finest option for classification and character recognition, making OCR a robust and low-cost system.

Figure 5.6  Holistic and analytic strategies (the holistic strategy recognizes the whole words "Machine Learning" at once, while the analytic strategy builds them up letter by letter: M-A-C-H-I-N-E and L-E-A-R-N-I-N-G).

Neural networks are primarily based on two designs, namely feed-forward and feed-backward networks. A feed-forward network with multiple layers lets the network learn and recognize input/output interactions. Neural networks are therefore helpful for pattern recognition and classification in OCR systems. As an instance, a basic neural network with one hidden layer is represented in Figure 5.7. The input images are transformed into binary values (0 and 1).

Figure 5.7  3-layered neural network (inputs x1–x4 in the input layer, a hidden layer, and outputs y1–y2 in the output layer, connected by weighted links).

Table 5.1  Accuracy rate comparison for various algorithms [8].

1. Technique of feed forward back propagation [9]: 85%
2. Recognition task based on preprocessing and neural network [10]: 82.5%
3. Segmentation and holistic approach [11]: 98.75%
4. Multi-layer feed forward neural network with diagonal based feature extraction: 97.8% for 54 features, 98.5% for 69 features
5. Error back propagation algorithm [12]: 70%
6. Modified Hough transform method [13]: 67.3%
7. Segmentation, neural network and statistical classifier [14]: 73.25%
8. Determining multiple segmentation by contour analysis: 67.8% (upper case), 75.63% (average case), 75.7% (lower case)

These binary images are the input to feature extraction, whose output is supplied to the neural network to train the classifier; the classifier is then evaluated on the basis of correctly classified and misclassified samples [8]. Many OCRs have been implemented using different machine learning algorithms. Lamba et al. reviewed several studies and compiled a comparison of the precision achieved for English characters with distinct algorithms, as shown in Table 5.1. The review covers different feed-forward and feed-backward networks learning the features of different characters; classification, segmentation and preprocessing techniques are the ones most commonly implemented by researchers. Table 5.1 shows the accuracy rates of the different algorithms under different conditions.
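As a sketch of this training pipeline in Keras (layer sizes and class count are assumptions; the input is a binarized, flattened character image and the output one of n_classes character labels):

from tensorflow.keras import layers, models

n_classes = 26                       # e.g. upper-case English letters
model = models.Sequential([
    layers.Input(shape=(28 * 28,)),  # flattened binary character image
    layers.Dense(128, activation="relu"),          # hidden layer
    layers.Dense(n_classes, activation="softmax"), # one score per class
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(features, labels, epochs=20, validation_split=0.2)
# Misclassified vs. correctly classified samples are then counted to
# evaluate the classifier, as described above.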

5.4 Post-Processing
Post-processing is the last component of OCR and includes activities like grouping, error detection and correction. After recognizing all the characters,

OCR combines them to form strings or sentences. Individually recognized symbols carry little information; associating each symbol with the others yields the required information. For printed and handwritten scripts, the grouping task is as follows:
1. For printed scripts with a fixed pitch, grouping is easy, as it is based on the position of each symbol in the document; the distance between characters helps to associate the symbols.
2. For printed scripts where the character distance is variable, grouping is slightly more complex than for fixed pitch. For such typeset characters, the distance between two words is noticeably larger than between two characters, and this difference is used to perform grouping.
3. In handwritten scripts, characters are skewed, so a fixed distance between every two characters cannot be assumed. In this scenario, the context in which the document is written must be understood in order to group the characters.
Grouping can identify meaningful words from a set of characters and symbols, but errors may still occur in word or sentence formation. In such cases, context may help in detecting errors and replacing them with an appropriate alternative word, to a certain extent [15]. The two main approaches used are:
a) Defining rules for the syntax of words can help to detect errors. These rules are based on which sequences of characters are likely to appear together. The probability of two or more characters forming part of a string can be computed, and the system can detect errors using these rules. For instance, in English the probability of 'K' appearing after 'L' is negligible: if, after the recognition phase, the system finds such a combination, it can use this rule to flag an error.
b) In the second approach, a knowledge base is used to check whether a recognized combination is present in a dictionary. If not, an error is detected, and the system looks up the knowledge base to find the most similar word to replace the detected error. In such scenarios, the word corrected by the system may not be the accurate word in the context of the document.

This actually leads to another error which is undetectable by the system. Furthermore, the searching and comparison operations performed in dictionary-based error detection and correction are more time consuming than rule-based methods.
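A minimal sketch of the dictionary approach (b), using Levenshtein edit distance to pick the nearest entry; the lexicon shown is hypothetical:

def edit_distance(a, b):
    # Dynamic-programming Levenshtein distance between two strings.
    row = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, row[0] = row[0], i
        for j, cb in enumerate(b, 1):
            prev, row[j] = row[j], min(row[j] + 1,         # deletion
                                       row[j - 1] + 1,     # insertion
                                       prev + (ca != cb))  # substitution
    return row[-1]

def correct(word, lexicon):
    if word in lexicon:
        return word
    # Replace an out-of-dictionary word with the closest dictionary entry;
    # as noted above, the closest entry may still be wrong in context.
    return min(lexicon, key=lambda entry: edit_distance(word, entry))

lexicon = {"machine", "vision", "learning", "character"}  # hypothetical
print(correct("machlne", lexicon))   # -> "machine"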

5.5 Performance Evaluation
The quality of the input image plays a vital role in deciding the accuracy of an OCR system; any error in the input image may affect the overall accuracy. The main causes of error are:
a) Shape and style variation of handwritten text [16].
b) Text deformation caused by smudged and broken characters.
c) Space variation in handwritten scripts, which may cause traditional techniques to misclassify; in printed documents there can also be deformations due to subscripts, superscripts, variable spacing, etc.
d) A document mixing text, background and other images.
e) An inappropriately scanned image.
f) Segmentation errors.
For performance evaluation there are no standard test sets for OCR systems, as accuracy is highly dependent on image quality. In general, the following three parameters are considered when evaluating a system:

5.5.1 Recognition Rate
The percentage of characters correctly classified. The recognition rate does not identify the errors that occurred.

5.5.2 Rejection Rate It is the percentage of characters which are not recognized by the system.

5.5.3 Error Rate
The percentage of characters that are incorrectly classified by the OCR. The error rate decides the cost effectiveness of the system, as it drives the error detection and correction phases; both phases are time consuming, and this affects the cost. The system is designed in a

way that keeps its error rate low. Consider, for instance, a mail-sorting application for postal services. If the receiver's address is classified incorrectly, the item is routed wrongly; whereas if the system rejects the input instead of misclassifying it, the same input is captured and classified again, with a higher chance of correct classification the second time. Therefore, the error rate is kept low, which leads to a higher rejection rate and a lower recognition rate. Thus the recognition rate alone cannot determine whether a system is efficient.
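The three rates follow directly from the counts of correct, rejected and erroneous classifications; a small helper (the counts are hypothetical):

def ocr_rates(n_correct, n_rejected, n_error):
    total = n_correct + n_rejected + n_error
    return {
        "recognition_rate": 100.0 * n_correct / total,
        "rejection_rate": 100.0 * n_rejected / total,
        "error_rate": 100.0 * n_error / total,
    }

print(ocr_rates(920, 60, 20))
# {'recognition_rate': 92.0, 'rejection_rate': 6.0, 'error_rate': 2.0}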

5.6 Applications of OCR Systems
Different users require OCR for their specific applications, mainly data entry, text entry and process automation. The detailed applications of OCR systems in various sectors are as follows:
a) Data-entry-specific OCRs are designed for confidential data, as in banking applications, government agencies, defense agencies, etc. The data in these areas is usually formatted with a limited vocabulary and symbol set, so for such specific data the system gives effective results with high accuracy, high throughput and low character error and reject rates. Text-entry reading machines are designed for office automation: any document in a traditional format can be converted into digital format at a very high rate. The main obstacles are document quality and paper format; however, single-character error and reject rates can be minimized under controlled conditions.
b) Process automation is also possible using OCR systems. A process automation task does not check the validity or correctness of the text; it identifies the text in the document and performs operations such as sorting or searching based on it. For instance, sorting mail based on the destination address can be performed using OCR; as the context to be searched is domain specific with a limited vocabulary, throughput is high [17].
c) OCR embedded with a speech synthesis system can help the blind to access printed literature.
d) Traffic controllers embedded with automatic number plate readers are also an emerging application of OCR. Though the character set and syntax for this system are limited, it

cannot be implemented using normal data-entry OCR systems; such OCRs require image processing techniques to handle the quality of images captured by high-speed cameras [18, 19].
e) Cartography automation requires special features beyond general OCR. Maps contain many components, such as skewed text, text printed at various angles, graphics and handwritten characters; an OCR capable of processing all these symbols requires a complex structure [20, 21].
f) Signature verification and identification: pattern recognition techniques are used in such OCRs to match handwritten signatures against a reference database and verify the user.
g) OCRs can be used for the evaluation of primary school students' handwritten assignments in pandemic situations [22].

5.7 Conclusion and Future Scope
OCR systems are evolving from reading differently styled printed text toward handwritten text recognition for different strokes. More sophisticated systems can be developed using hybrid technologies and context libraries. Machine learning techniques can be used for contextual analysis while performing segmentation to recognize deformed characters, and machine-learning-based classifiers with different characteristics can be integrated to provide multiple functionalities. There is great potential in using ANNs for sophisticated cursive scripts and calligraphic characters, since an effective feature extraction mechanism can improve the precision of the system. ANN-based feature extraction with noise tolerance can learn using a feed-forward network and propagate error using a feed-backward network. Character precision can be improved further by deep learning as an alternative to shallow neural networks.

References
1. Hinduja, D., Dheebhika, R., Prem Jacob, T., Enhanced character recognition using deep neural network—A survey. Proc. 2019 IEEE Int. Conf. Commun. Signal Process. ICCSP 2019, pp. 438–440, 2019.

2. Chaudhuri, A., Mandaviya, K., Badelia, P., Ghosh, S.K., Optical Character Recognition Systems for Different Languages with Soft Computing. Studies in Fuzziness and Soft Computing, Springer, 352, pp. 9–40, 2017.
3. Shen, M. and Lei, H., Improving OCR performance with background image elimination. 2015 12th Int. Conf. Fuzzy Syst. Knowl. Discov. FSKD 2015, pp. 1566–1570, 2016.
4. Arica, N. and Yarman-Vural, F.T., An overview of character recognition focused on off-line handwriting. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev., 31, 2, 216–233, 2001.
5. Dholakia, K., A Survey on Handwritten Character Recognition Techniques for Various Indian Languages. Int. J. Comput. Appl., 115, 1, 17–21, 2015.
6. Gonzalez, R.C., Woods, R.E., Masters, B.R., Digital Image Processing, Third Edition. J. Biomed. Opt., 14, 2, 029901, 2009.
7. Jain, A.K., Duin, R.P.W., Mao, J., Statistical pattern recognition: A review. IEEE Trans. Pattern Anal. Mach. Intell., 22, 1, 4–37, 2000.
8. Lamba, S., Gupta, S., Soni, N., Handwriting Recognition System—A Review. Proc. 2019 Int. Conf. Comput. Commun. Intell. Syst. ICCCIS 2019, pp. 46–50, 2019.
9. Sahu, V.L. and Kubde, B., Offline Handwritten Character Recognition Techniques using Neural Network: A Review. Int. J. Sci. Eng. Res., 1, 1–3, 87–94, 2013. Available: www.ijser.in.
10. Singh, S. and Hewitt, M., Cursive digit and character recognition on CEDAR database. Proc. Int. Conf. Pattern Recognit., 15, 2, 569–571, 2000.
11. Pradeep, J., Srinivasan, E., Himavathi, S., Diagonal Feature Extraction Based Handwritten Character System Using Neural Network. Int. J. Comput. Appl., 8, 9, 17–22, 2010.
12. Alkoffash, M., Bawaneh, M., Muaidi, H., Alqrainy, S., Alzghool, M., A Survey of Digital Image Processing Techniques in Character Recognition. Int. J. Comput. Sci. Netw. Secur., 14, 3, 65, 2014.
13. Kimura, F., Kayahara, N., Miyake, Y., Shridhar, M., Machine and human recognition of segmented characters from handwritten words, in: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, vol. 2, pp. 866–869, 1997.
14. Gupta, A., Srivastava, M., Mahanta, C., Offline handwritten character recognition using neural network. ICCAIE 2011, 2011 IEEE Conf. Comput. Appl. Ind. Electron., pp. 102–107, 2011.
15. Nguyen, T.T.H., Jatowt, A., Coustaty, M., Van Nguyen, N., Doucet, A., Deep statistical analysis of OCR errors for effective post-OCR processing. Proc. ACM/IEEE Jt. Conf. Digit. Libr., vol. 2019-June, pp. 29–38, 2019.
16. Bharath, V. and Rani, N.S., A font style classification system for English OCR. Proc. 2017 Int. Conf. Intell. Comput. Control. I2C2 2017, vol. 2018-January, pp. 1–5, 2018.
17. Abdul Robby, G., Tandra, A., Susanto, I., Harefa, J., Chowanda, A., Implementation of optical character recognition using tesseract with the Javanese script target in android application. Procedia Comput. Sci., 157, 499–505, 2019.

18. Jain, R. and Gianchandani, P.D., A hybrid approach for detection and recognition of traffic text sign using MSER and OCR. Proc. Int. Conf. I-SMAC (IoT Soc. Mobile, Anal. Cloud), I-SMAC 2018, pp. 775–778, 2019.
19. Singh, P., Patwa, B., Saluja, R., Ramakrishnan, G., Chaudhuri, P., StreetOCRCorrect: An Interactive Framework for OCR Corrections in Chaotic Indian Street Videos. 2019 Int. Conf. Doc. Anal. Recognit. Work., vol. 2, pp. 36–40, 2019.
20. Chiang, Y.Y. and Knoblock, C.A., An approach for recognizing text labels in raster maps. Proc. Int. Conf. Pattern Recognit., vol. 2010-August, pp. 3199–3202, 2010.
21. Li, H., Liu, J., Zhou, X., Intelligent Map Reader: A Framework for Topographic Map Understanding with Deep Learning and Gazetteer. IEEE Access, 6, 25363–25376, 2018.
22. Marne, M.G., Recognition (OCR) engine for proposed system. 2018 Fourth Int. Conf. Comput. Commun. Control Autom., pp. 1–4, 2018.

6
Surface Defect Detection Using SVM-Based Machine Vision System with Optimized Feature
Ashok Kumar Patel1*, Venkata Naresh Mandhala2, Dinesh Kumar Anguraj2 and Soumya Ranjan Nayak3
1 Department of Computer Science and Engineering, C.V. Raman Global University, Bhubaneswar, India
2 Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur, India
3 Amity School of Engineering and Technology, Amity University Uttar Pradesh, Noida, India

Abstract

At present, most industries have moved to automate different processes in their production, so monitoring the quality of raw material plays a crucial role in producing a quality product. For material such as wood or a metal surface, early detection of defects reduces further processing cost, since the final product may otherwise be unusable. In this study, wood has been considered for identifying defects, which are classified into seven categories; the solution is provided by a machine vision system. A set of 280 features is extracted from each sample image, and an optimized set of 36 features is obtained using sequential forward selection. A classification model is developed using a multiclass support vector machine to identify the defect present in wood. The performance is measured using the indices sensitivity, specificity, misclassification and accuracy.
Keywords: Machine vision system, surface defect detection, feature extraction, sequential forward selection, support vector machine




6.1 Introduction
Identification of surface defects plays an important part in production from materials such as wood, steel and cloth. Traditionally, the task of identifying and locating defects was performed by a material expert, but continuous monitoring is a monotonous and tedious task that leaves the expert exhausted and prone to error. Huber et al. revealed that the accuracy achieved by experts in defect identification is 68%, which left space for new technology to fill the gap [1]. In automated industries this identification should also be automated, which can be achieved with a machine vision system. A machine vision system consists of an illumination system, an image acquisition device, and a processing device that learns to identify surface defects by processing images [2]. The illumination system helps control variation in image illumination during acquisition. With the advancement of processing capability, it has become possible to process large image datasets and achieve high accuracy with trained models.

It has been observed in the cloth industry that price changes drastically with quality: defective fabric fetches only 45–65% of the price of quality product [3]. Many researchers have therefore concentrated on inspecting cloth before delivery. Chan and Pang observed that fabric defects produce frequency changes in the fabric's regular structure and suggested a Fourier-analysis-based inspection system for defect detection [4]. This work was extended, and the wavelet transform was applied to defect detection in plain fabric for many years. Later, Ngan et al. introduced the golden image subtraction (GIS) method on sub-images generated by applying direct thresholding to the wavelet-transformed components [5]. Bu et al. treated fabric defect detection as a one-class classification problem and applied a support vector data description (SVDD) model to multiple extracted fractal features [6]. The adaptive wavelet model later developed by Wen et al. showed its efficiency in detecting fabric defects [7], though its accuracy depends on the availability of accurate reference images for background modeling. Li and Zhang proposed a concatenated approach of Gabor filtering and a pulse coupled neural network (PCNN) for fabric defect detection [8]. With increasing processing capability, template-based processing came into existence: Chang et al. proposed a template-based correction method for defect detection in patterned fabric [9]. Later, textured fabric defect detection was performed by low-rank representation (LRR), modeling defects with a sparse structure [10].

Similar to the cloth industry, the steel industry also requires continuous surface inspection. This is all the more essential because a product made from defective steel may pose safety challenges such as wear and tear, breakdown or bursting, depending on the application of the final product [11]. This increased researchers' interest in developing automated inspection systems for steel surface defects. Zheng et al. developed a detection system for metallic surface defects (such as cracks and holes) using morphological features optimized with a genetic algorithm [12]. Pernkopf used laser light images for steel surface inspection, applying a Bayesian network to extracted surface reflectance features [13]. Aghdam et al. used tree-based classification for steel surface defect inspection by applying principal component analysis (PCA) and bootstrap aggregation (bagging) to the extracted features for binary classification [14]. A support vector machine has been used with a decision tree for multiclass defect detection, and a review of the different machine-vision-based inspection methods for steel surface defects has reported the accuracy of the models [15]. Sun et al. tried to overcome the limitations of segmentation algorithms by applying projection to localize defects in the steel surface through the detection of frequency changes [11]. Chu et al. used a novel quantile hyper-spheres method for multiclass defect inspection as well as for incremental learning with new data [16]. With convolutional networks, training models on larger datasets became easier and more accurate: Li et al. used a convolutional network to classify six types of steel surface defect accurately, which also helped quantify and localize the defects [17]. The limitation of convolutional networks is the need for a large amount of training data; an alternative is a semi-supervised method that includes unlabeled data using a generative adversarial network (GAN) for steel surface defect classification [18].

Defect detection is very essential in the wood industry, where wood accounts for 30–70% of the total production cost of the secondary product [19]. A large number of efforts have been made to automate inspection in the wood industry. Conner et al. developed the automated lumber processing system (ALPS), which inspects defects in timber and cuts out the defective parts, using tonal as well as pattern features with a sequential classifier [20]. Later, Conner et al. developed an automated inspection and grading system for hardwood lumber using a combination of bottom-up and top-down approaches to identify defects in the wood surface [21]. At Virginia Tech, Conner et al. then developed a model of wood product manufacturing in which the inspection of wood quality is performed using optical, X-ray and laser camera images [22]. A fuzzy

rule-based system is used for discriminating the defective area using properties extracted from sample images. Image-segmentation-based defect detection was proposed by Funck et al., who compared nine segmentation algorithms and recommended combining clustering and region growing for higher accuracy [23]. With the increase in processing capacity, grayscale images were processed for feature extraction using co-occurrence matrices [24]; a multi-objective genetic algorithm was used to reduce the dimension of the features extracted from the co-occurrence matrices before processing with a neural network and a support vector machine. Gu et al. proposed classifying defects present on the wood surface using a tree-structured support vector machine applied to extracted color and texture features [25]. Piuri and Scotti used fluorescence spectra imaging for the identification of defects on the wood surface, analyzing the spectra with principal-component-analysis-based k-nearest neighbor and Bayes classifiers [26]. At the same time, Mu et al. used X-ray imaging for detecting defects on the wood surface, as well as defects inside the wood invisible from outside, using log edge detection on binary images [27]. França et al. used ultrasound imaging for defect detection on the wood surface [28]. Hittawe et al. used local binary patterns (LBP) and speeded-up robust features (SURF) with a support vector machine for the identification of knots and cracks on the timber surface [29]. A survey of surface defect detection methods for wood has been carried out by Hashim et al., reporting the strengths and weaknesses of the various algorithms adopted [19]. Later, Hashim et al. developed gray level dependence matrix (GLDM) based feature extraction for identifying defects on the timber surface using artificial neural networks [30]. At the same time, Qayyum et al. developed a gray level co-occurrence matrix (GLCM) feature-based neural network model with feature optimization using particle swarm optimization (PSO) [31]. Kamal et al. later extended these GLCM features by including Laws texture energy features for wood surface defect classification using a feed-forward back-propagation neural network [32]. At the same time, Tong et al. generated image features using GLCM, LBP, Gabor, wavelet, color histogram and color coherence vector descriptors for classifying wood defects with support vector machine, k-nearest neighbor and random forest classifiers, using feature optimization by PSO, ranker and Tabu search methods [33]. Chang et al. developed a wood surface defect detection model using a convex optimization method in classification and regression trees (CART) by extracting geometrical characteristics [34]. Li et al. developed a Euclidean distance classifier for ice cream bar defect detection by extracting the LBP and local binary differential excitation pattern (LB-DEP) of images [35].

Surface defect detection approaches based on color and texture features have been broadly classified into four categories: statistical approaches, structural approaches, filter-based methods, and model-based approaches [36]. In the present study, statistical and filter-based methods have been combined for the classification of defects present on the wood surface. A set of 280 features was extracted from each image, and the sequential forward selection algorithm was used for optimized feature selection. A support vector machine was used for multiclass classification of the defects present in wood surface images, and classification performance indices (i.e. sensitivity, specificity, etc.) were evaluated to assess the model's effectiveness in classifying the samples.

6.2 Methodology

The work has been carried out in six stages: data collection, data pre-processing, feature extraction, feature optimization, model development, and performance evaluation.

6.2.1 Data Collection

A knot in wood is defined as the base of a branch on the trunk, and it has a severe effect on the quality of the wood [37]. The wood knot sample images were collected from the database of the laboratory of Dr. Olli Silven and Hannu Kauppinen, Department of Electrical Engineering, University of Oulu [38–40]. The database contains 438 sample images of knots on the surface, in portable pixmap format (.ppm). The dataset covers seven types of knots, and an image of each class is shown in Figure 6.1. Table 6.1 lists the available number of samples of each class. From the table it can be seen that the number of samples varies strongly between classes; therefore data pre-processing is required to reach a nearly equal number of samples in each class.

6.2.2 Data Pre-Processing

The data obtained from the dataset show a large variation in the number of samples across classes, and model development is adversely influenced by this variation. As the dataset is small, the larger classes cannot simply be cut down; instead, balance is achieved by augmenting the images [41]. Augmentation is a technique for generating additional samples,


Figure 6.1  Knot sample images of the seven types: dry knot, encased knot, horn knot, decayed knot, leaf knot, sound knot and edge knot.

Table 6.1  Details of knot image dataset.

S. No.   Number of samples   Type of knot
1         69                 dry_knot
2         29                 encased_knot
3         35                 horn_knot
4         14                 decayed_knot
5         47                 leaf_knot
6        179                 sound_knot
7         65                 edge_knot

implemented by applying transformations to an image (e.g. cropping and flipping) while leaving the underlying class unchanged [42]. In the present study, augmentation has been applied class by class to bring each class close to the size of the largest class. The requirements are shown in Table 6.2, where the number of augmentation iterations per class is the number of additional samples required divided by the number of available samples, rounded up. Random rotation in the range [−20, 20] degrees and horizontal and vertical translation in the range [−3, 3] pixels were used for generating new sample images from the existing ones. The augmented images are shown in Figure 6.2, and a minimal code sketch of this augmentation step follows the figure.

Table 6.2  Details of augmentation required for the image dataset.

S. No.   Number of samples   Number of samples required   Number of iterations required   Type of knot
1         69                 110                           2                              dry_knot
2         29                 150                           6                              encased_knot
3         35                 144                           5                              horn_knot
4         14                 165                          12                              decayed_knot
5         47                 132                           3                              leaf_knot
6        179                   0                           0                              sound_knot
7         65                 114                           2                              edge_knot


Figure 6.2  Image augmentation: (a) original image; (b–c) augmented images labeled with the same class.
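The chapter does not list its augmentation code; the following is a minimal sketch of such a step in Python, assuming SciPy and scikit-image are available (the file name is hypothetical and the border-fill mode is illustrative).

import numpy as np
from scipy import ndimage
from skimage import io

def augment(image, rng):
    # Random rotation in [-20, 20] degrees and horizontal/vertical
    # translation in [-3, 3] pixels, the ranges reported in the chapter.
    angle = rng.uniform(-20, 20)
    dy, dx = rng.uniform(-3, 3, size=2)
    rotated = ndimage.rotate(image, angle, reshape=False, mode='nearest')
    # Shift only the two spatial axes of an H x W x 3 color image.
    return ndimage.shift(rotated, (dy, dx, 0), mode='nearest')

rng = np.random.default_rng(0)
img = io.imread('dry_knot_001.ppm')             # hypothetical file name
copies = [augment(img, rng) for _ in range(2)]  # 2 iterations for dry_knot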

6.2.3 Feature Extraction

In computer vision and image processing, a feature is a piece of information that can be used as input for model development, here the classification of knot defects. In general, an image can be represented either by global features or by local features. In the global representation, the image is described by one multidimensional feature vector; in other words, the method produces a single vector whose values measure various aspects of the image, such as color, texture or shape. The local feature representation, on the other hand, describes the image based on a set of salient regions. The present study extracted the global features of the images in different color spaces. A color space is a representation of an object's visual properties using parameters and coordinates. The original images captured by the camera were in the RGB color space, and these original images were converted into five other color spaces (HSI, CMYK, Lab, xyz, and Gray) to extract as many features as possible. Additionally, in a few color spaces the feature values of an object may change with the device or illumination; to avoid this problem, all of these color spaces were considered for feature extraction. A total of 10 statistical measures (minimum, maximum, mean, variance, standard deviation, skewness, kurtosis, and moments of 3rd, 4th, and 5th order) were computed for each of the 17 color components (R, G, B, H, S, I, C, M, Y, K, L, a, b, x, y, z, and gray) of the 6 color spaces. Thus, the total number of color features was 10 × 17 = 170. The extracted color components are shown in Figure 6.3.

The intensity component of the HSI color space was used for texture feature extraction. Different image transformation techniques have been introduced in image processing for obtaining texture information [43]; a few of these are the discrete Fourier transform (DFT) (Cooley and Tukey, 1965), the Walsh–Hadamard transform (WHT) [44], the discrete cosine transform (DCT) [45], Gabor filters [46, 47], and the discrete wavelet transform (DWT) [48, 49]. In the present study, frequency and directional information were used for texture feature extraction: the DCT and DFT were used to transform the image from the spatial domain to the frequency domain, while the Gabor filter transform and DWT were used to extract the directional information of the images along with the frequency. A total of 11 frequency components (1 DCT, 2 DFT, 4 DWT, 4 Gabor filter) were derived for each image and used for texture feature extraction; the total number of texture features was therefore 10 × 11 = 110. The extracted texture components are shown in Figure 6.4. In all, each original image was transformed into 28 different components, and 10 statistical measures were extracted from each, giving a feature dimension of 280 (= 28 × 10) per image. This feature set was used for machine learning after feature optimization.
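As an illustration of how the 10 statistical measures yield the 280-dimensional feature vector, a short NumPy/SciPy sketch follows; the exact moment conventions used by the authors are not stated, so central moments are assumed here.

import numpy as np
from scipy import stats
from scipy.fft import dctn

def component_stats(component):
    # The 10 first-order statistics computed per component: min, max,
    # mean, variance, std, skewness, kurtosis, 3rd-5th central moments.
    x = np.asarray(component, dtype=np.float64).ravel()
    return [x.min(), x.max(), x.mean(), x.var(), x.std(),
            stats.skew(x), stats.kurtosis(x),
            stats.moment(x, 3), stats.moment(x, 4), stats.moment(x, 5)]

# 17 color components plus 11 frequency components (e.g. one DCT
# component from the HSI intensity channel, as below) give the
# 28 x 10 = 280 features per image.
def dct_component(intensity):
    return np.abs(dctn(intensity, norm='ortho'))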

6.2.4 Feature Optimization

Feature selection is generally performed to obtain a reduced feature set, which lowers the feature extraction cost and improves the accuracy and reliability of the performance estimates [50]. The present study derived an optimized feature subset from the set of extracted features using a selection algorithm known as the sequential forward selection (SFS) algorithm.

Figure 6.3  Color feature components extracted from a sample image.

Figure 6.4  Texture feature components extracted from a sample image.

The SFS algorithm is a well-known wrapper method which selects an optimized feature set using the classification algorithm itself as the criterion function. The most important issue with the SFS algorithm is its monotonically growing feature set [51]. Here, the performance of the SVM is used as the criterion function for examining each feature's relevance to the classification of knots present on wood surfaces; the details of the SVM-based classification model are explained in Section 6.2.5. The pseudocode of the SFS algorithm is given below.

1. Start with the empty set $Y_0 = \{\emptyset\}$
2. Select the next best feature $x^{+} = \arg\max_{x \notin Y_k} J(Y_k + x)$
3. Update $Y_{k+1} = Y_k + x^{+}$; $k = k + 1$
4. Go to step 2

Since the SFS algorithm adds new features sequentially, it works best when the size of the optimized feature set is small.
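A compact Python sketch of this wrapper loop, using cross-validated SVM accuracy as the criterion J, is given below; the chapter does not specify the exact validation protocol, so 5-fold cross-validation is assumed.

from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def sfs(X, y, max_features=36):
    # Greedy forward selection: at each step, add the feature that
    # maximizes cross-validated SVM accuracy on the current subset.
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        scores = [(cross_val_score(SVC(kernel='rbf'),
                                   X[:, selected + [j]], y, cv=5).mean(), j)
                  for j in remaining]
        best_score, best_j = max(scores)
        selected.append(best_j)
        remaining.remove(best_j)
    return selected  # e.g. stop at 36 features, as in the chapter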

6.2.5 Model Development

In the present study the samples belong to 7 different classes, so a multiclass classifier is required. A multiclass classifier may be implemented in two ways: directly, with an inherently multiclass classifier, or by combining many binary classifiers to separate the multiple classes [52]; the second approach works on the divide-and-conquer principle. A support vector machine (SVM) is used here to classify the surface defects based on the selected image features. The SVM started as a binary classifier designed to separate data into positive and negative classes [53]; with set-based schemes such as the one-versus-one (OVO) and one-versus-all (OVA) methods, the SVM was soon extended to multi-class classification [52]. The support vector machine belongs to the statistical methods of classification and provides good control of overfitting together with good convergence speed; further properties, such as the use of kernel functions and insensitivity to local minima, have made the SVM a preferred choice for classification problems [54]. The SVM has proven its effectiveness in solving classification problems in various industries [52, 55–57]. The working principle of the multiclass SVM is detailed in its application to iron ore classification [58].


Figure 6.5  Minimum objective function value obtained at each iteration during training.

The data were partitioned into training (955) and testing (409) sets for model training and performance evaluation. The partition was checked with a paired t-test on the data distributions; the null hypothesis was not rejected, indicating that the training and testing samples belong to the same distribution. The model was trained with the optimized feature set (dimension 36) and an RBF kernel, allowing 500 iterations (which can be increased to achieve higher accuracy). The training progress can be seen in Figure 6.5, which plots the minimum objective obtained at each iteration. A rough sketch of an equivalent training setup is shown below.
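The reported settings (one-vs-one coding, box constraint 573.04, RBF kernel, kernel scale 1.7984) resemble MATLAB's fitcecoc output; a rough scikit-learn equivalent, assuming X holds the 36 selected features and y the knot labels, could look like this (the gamma = 1/scale² mapping is an assumption).

from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# X: optimized 36-dimensional feature matrix, y: knot labels (assumed given).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)  # ~955 / ~409 split

# SVC is one-vs-one by default; C plays the role of the box constraint.
# A MATLAB-style kernel scale s maps to gamma = 1 / s**2 (assumed mapping).
clf = SVC(kernel='rbf', C=573.04, gamma=1.0 / 1.7984 ** 2)
clf.fit(X_train, y_train)
print('test accuracy:', clf.score(X_test, y_test))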

6.2.6 Performance Evaluation

The model was trained using 70% of the data and evaluated using the remaining 30%. The classification model takes the optimized feature subset as input and the knot class as output. The classification performance was evaluated using confusion matrix parameters [59, 60]. The confusion matrix is drawn between the observed and predicted classes, with the observed classes in rows and the predicted classes in columns. The values of the diagonal elements of the confusion matrix represent true classifications (either positive or negative) and the values of the remaining elements represent false classifications (either positive or negative). The confusion matrix parameters considered are sensitivity, specificity, misclassification, and accuracy. The value of each parameter can be determined from the true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN); this is illustrated in Figure 6.6. The true positives (TP) of a particular class are given by the diagonal element of that class, and the true negatives (TN) by the sum of the remaining elements excluding the row and column of that class. The false positives (FP) of a particular class are obtained by summing the column elements of that class except the TP, and the false negatives (FN) by summing the row elements of that class except the TP. These four quantities have been used to calculate the four confusion matrix parameters, i.e. sensitivity, specificity, misclassification, and accuracy, where TC and FC denote the total numbers of true and false classifications. The equations of these parameters are given below.



$$\mathrm{Sensitivity} = \frac{TP}{TP + FN}$$

$$\mathrm{Specificity} = \frac{TN}{FP + TN}$$

$$\mathrm{Misclassification} = \frac{FC}{TC + FC} = \frac{FP + FN}{TP + TN + FP + FN}$$

$$\mathrm{Accuracy} = \frac{TC}{TC + FC} = \frac{TP + TN}{TP + TN + FP + FN}$$

The confusion matrices for the testing sample as well as for the training sample are given in Table 6.3.

Figure 6.6  Graphical representation of confusion matrix.
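These per-class quantities and indices follow directly from the confusion matrix; a short NumPy sketch (rows = actual classes, columns = predicted classes, as in Table 6.3) is given below.

import numpy as np

def per_class_indices(cm):
    # cm: confusion matrix with actual classes in rows, predicted in columns.
    total = cm.sum()
    tp = np.diag(cm).astype(float)
    fn = cm.sum(axis=1) - tp     # row sum minus the diagonal element
    fp = cm.sum(axis=0) - tp     # column sum minus the diagonal element
    tn = total - tp - fn - fp
    return {'sensitivity': tp / (tp + fn),
            'specificity': tn / (fp + tn),
            'misclassification': (fp + fn) / total,
            'accuracy': (tp + tn) / total}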

Table 6.3  Confusion matrices for the training and testing samples (rows: actual class; columns: predicted class).

Training Samples

                 dry  encased  horn  decayed  leaf  sound  edge
dry_knot         140     2       0      1       0      0     0
encased_knot       0   142       0      0       0      0     0
horn_knot          0     0     153      0       0      0     0
decayed_knot       0     0       0    128       0      0     0
leaf_knot          0     0       0      0     123      1     0
sound_knot         0     0       0      0       0    121     0
edge_knot          0     0       0      0       0      0   144

Testing Samples

                 dry  encased  horn  decayed  leaf  sound  edge
dry_knot          49     8       0      2       2      3     0
encased_knot       4    55       0      1       1      0     0
horn_knot          0     0      53      0       1      0     3
decayed_knot       5     0       0     48       0      0     1
leaf_knot          4     0       0      0      60      0     0
sound_knot         2     3       0      1       3     48     1
edge_knot          0     0       1      0       0      0    50


Figure 6.7  Classification performance indices plot for testing samples.

The results indicate that the model is not overfitted, i.e. it performs similarly on the training and testing samples. Dry knot samples were mostly misclassified as encased knots due to the similarity of a few feature distributions, and some encased, decayed and leaf knots were classified as dry knots. The classification performance indices were obtained from the confusion matrix of the testing sample by calculating the true positives, true negatives, false positives and false negatives for each class. The results are plotted in Figure 6.7, with the classes on the horizontal axis and the index values on the vertical axis.

6.3 Conclusion

In the present study, wood sample images with seven types of knots were collected from the online database provided by the University of Oulu. The classes show a high variation in size, which was controlled using an augmentation technique. A set of 280 features (color and texture) was extracted from each image, and the sequential forward selection method was used to obtain an optimized feature set of dimension 36. A support vector machine for multiclass classification was developed with the optimized features and optimized parameters (coding: one-vs-one, box constraint: 573.04, kernel: RBF, kernel scale: 1.7984). The results obtained for the testing sample were evaluated using the performance indices (sensitivity 0.8902, specificity 0.9881 and accuracy 0.9679, all near 1, and misclassification 0.0321, near 0, as expected of an effective model), and it was found that the model can be effectively implemented in the wood industry for surface defect detection on wood.

References
1. Huber, H.A., McMillin, C.W., McKinney, J.P., Lumber Defect Detection Abilities of Furniture Rough Mill Employees. For. Prod. J., 35, 11–12, 79–82, 1985.
2. Duda, R.O., Hart, P.E., Stork, D.G., Pattern Classification, John Wiley & Sons, New York, 2012.
3. Srinivasan, K., Dastoor, P.H., Radhakrishnaiah, F., Jayaraman, S., FDAS: A knowledge-based framework for analysis of defects in woven textile structures. J. Text. Inst., 83, 3, 431–448, 1992.
4. Chan, C.-H. and Pang, G.K.H., Fabric defect detection by Fourier analysis. IEEE Trans. Ind. Appl., 36, 5, 1267–1276, 2000.
5. Ngan, H.Y.T., Pang, G.K.H., Yung, S.P., Ng, M.K., Wavelet based methods on patterned fabric defect detection. Pattern Recognit., 38, 4, 559–576, 2005.
6. Bu, H., Wang, J., Huang, X., Fabric defect detection based on multiple fractal features and support vector data description. Eng. Appl. Artif. Intell., 22, 2, 224–235, 2009.
7. Wen, Z., Cao, J., Liu, X., Ying, S., Fabric Defects Detection using Adaptive Wavelets. Int. J. Cloth. Sci. Technol., 26, 3, 202–211, 2014.
8. Li, Y. and Zhang, C., Automated vision system for fabric defect inspection using Gabor filters and PCNN. Springerplus, 5, 1, 765, 2016.
9. Chang, X., Gu, C., Liang, J., Xu, X., Fabric Defect Detection Based on Pattern Template Correction. Math. Probl. Eng., 2018, 1–17, 2018.
10. Li, P., Liang, J., Shen, X., Zhao, M., Sui, L., Textile fabric defect detection based on low-rank representation. Multimed. Tools Appl., 78, 1, 99–124, 2019.
11. Sun, Q., Cai, J., Sun, Z., Detection of Surface Defects on Steel Strips Based on Singular Value Decomposition of Digital Image. Math. Probl. Eng., 2016, 1–12, 2016.
12. Zheng, H., Kong, L., Nahavandi, S., Automatic inspection of metallic surface defects using genetic algorithms. J. Mater. Process. Technol., 125–126, 427–433, 2002.
13. Pernkopf, F., Detection of surface defects on raw steel blocks using Bayesian network classifiers. Pattern Anal. Appl., 7, 3, 333–342, 2004.
14. Aghdam, S.R., Amid, E., Imani, M.F., A fast method of steel surface defect detection using decision trees applied to LBP based features. 2012 7th IEEE Conf. Ind. Electron. Appl., pp. 1447–1452, 2012.
15. Neogi, N., Mohanta, D.K., Dutta, P.K., Review of vision-based steel surface inspection systems. EURASIP J. Image Video Process., 2014, 1, 50, 2014.
16. Chu, M., Zhao, J., Liu, X., Gong, R., Multi-class classification for steel surface defects based on machine learning with quantile hyper-spheres. Chemom. Intell. Lab. Syst., 168, 15–27, 2017.
17. Li, J., Su, Z., Geng, J., Yin, Y., Real-time Detection of Steel Strip Surface Defects Based on Improved YOLO Detection Network. IFAC-PapersOnLine, 51, 21, 76–81, 2018.
18. He, Y., Song, K., Dong, H., Yan, Y., Semi-supervised defect classification of steel surface based on multi-training and generative adversarial network. Opt. Lasers Eng., 122, 294–302, 2019.
19. Hashim, U.R., Hashim, S.Z., Muda, A.K., Automated vision inspection of timber surface defect: A review. J. Teknol., 77, 20, 127–135, 2015.
20. Conners, R.W., McMillin, C.W., Lin, K., Vasquez-Espinosa, R.E., Identifying and Locating Surface Defects in Wood: Part of an Automated Lumber Processing System. IEEE Trans. Pattern Anal. Mach. Intell., PAMI-5, 6, 573–583, 1983.
21. Conners, R.W., Cho, T.-H., Ng, C.T., Drayer, T.H., Araman, P.A., Brisbin, R.L., A machine vision system for automatically grading hardwood lumber. Ind. Metrol., 2, 3–4, 317–342, 1992.
22. Conners, R.W., Kline, D.E., Araman, P.A., Drayer, T.H., Machine vision technology for the forest products industry. Computer (Long. Beach. Calif.), 30, 7, 43–48, 1997.
23. Funck, J.W., Zhong, Y., Butler, D.A., Brunner, C.C., Forrer, J.B., Image segmentation algorithms applied to wood defect detection. Comput. Electron. Agric., 41, 1–3, 157–179, 2003.
24. Cavalin, P., Oliveira, L.S., Koerich, A.L., Britto, A.S., Wood defect detection using grayscale images and an optimized feature set. IECON Proc. (Industrial Electron. Conf.), pp. 3408–3412, 2006.
25. Gu, I.Y.H., Andersson, H., Vicen, R., Automatic classification of wood defects using support vector machines. Lect. Notes Comput. Sci., 5337 LNCS, 356–367, 2009.
26. Piuri, V. and Scotti, F., Design of an automatic wood types classification system by using fluorescence spectra. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev., 40, 3, 358–366, 2010.
27. Mu, H., Qi, D., Zhang, M., Zhang, P., Study of wood defects detection based on image processing. Proc. 2010 7th Int. Conf. Fuzzy Syst. Knowl. Discov. (FSKD 2010), vol. 2, pp. 607–611, 2010.
28. França, T.S.F.A., França, F.J.N., Ross, R.J., Wang, X., Arantes, M.D.C., Seale, R.D., African mahogany wood defects detected by ultrasound waves. 19th Int. Nondestruct. Test. Eval. Wood Symp., pp. 22–28, 2015.
29. Hittawe, M.M., Muddamsetty, S.M., Sidibe, D., Meriaudeau, F., Multiple features extraction for timber defects detection and classification using SVM. 2015 IEEE Int. Conf. Image Process., pp. 427–431, 2015.
30. Hashim, U.R., Hashim, S.Z.M., Muda, A.K., Performance evaluation of multivariate texture descriptor for classification of timber defect. Optik (Stuttg.), 127, 15, 6071–6080, 2016.
31. Qayyum, R., Kamal, K., Zafar, T., Mathavan, S., Wood defects classification using GLCM based features and PSO trained neural network. 2016 22nd Int. Conf. Autom. Comput. (ICAC 2016), pp. 273–277, 2016.
32. Kamal, K., Qayyum, R., Mathavan, S., Zafar, T., Wood defects classification using laws texture energy measures and supervised learning approach. Adv. Eng. Inform., 34, 125–135, 2017.
33. Tong, H.L., Ng, H., Yap, T.V.T., Ahmad, W.S.H.M.W., Fauzi, M.F.A., Evaluation of feature extraction and selection techniques for the classification of wood defect images. J. Eng. Appl. Sci., 12, 3, 602–608, 2017.
34. Chang, Z., Cao, J., Zhang, Y., A novel image segmentation approach for wood plate surface defect classification through convex optimization. J. For. Res., 29, 6, 1789–1795, 2018.
35. Li, S., Li, D., Yuan, W., Wood Defect Classification Based on Two-Dimensional Histogram Constituted by LBP and Local Binary Differential Excitation Pattern. IEEE Access, 7, 145829–145842, 2019.
36. Xie, X., A Review of Recent Advances in Surface Defect Detection using Texture analysis Techniques. ELCVIA Electron. Lett. Comput. Vis. Image Anal., 7, 3, 1, 2008.
37. Gupta, R., Basta, C., Kent, S.M., Effect of knots on longitudinal shear strength of Douglas-fir using shear blocks. For. Prod. J., 54, 11, 77–83, 2004.
38. Kauppinen, H. and Silvén, O., A color vision approach for grading lumber, in: Theory and Applications of Image Analysis II, pp. 367–379, World Scientific, 1995.
39. Silvén, O. and Kauppinen, H., Recent developments in wood inspection. Int. J. Pattern Recognit. Artif. Intell., 10, 01, 83–95, 1996.
40. Kauppinen, H. and Silven, O., The effect of illumination variations on color-based wood defect classification. Proc. 13th Int. Conf. Pattern Recognit., vol. 3, pp. 828–832, 1996.
41. Zhang, C., Tavanapong, W., Wong, J., de Groen, P.C., Oh, J., Real Data Augmentation for Medical Image Classification. Lecture Notes in Computer Science, 10552, 67–76, 2017.
42. Chatfield, K., Simonyan, K., Vedaldi, A., Zisserman, A., Return of the Devil in the Details: Delving Deep into Convolutional Nets. BMVC 2014, Br. Mach. Vis. Conf., 2014.
43. Yaroslavsky, L.P., Fast Transforms in Image Processing: Compression, Restoration, and Resampling. Adv. Electr. Eng., 2014, 1–23, 2014.
44. Harmuth, H.F., Transmission of Information by Orthogonal Functions, Springer, Berlin, Heidelberg, Germany, 1970.
45. Ahmed, N., Natarajan, T., Rao, K.R., Discrete Cosine Transform. IEEE Trans. Comput., C-23, 1, 90–93, 1974.
46. Daugman, J.G., Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. J. Opt. Soc. Am. A, 2, 7, 1160, 1985.
47. Daugman, J.G., Two-dimensional spectral analysis of cortical receptive field profiles. Vision Res., 20, 10, 847–856, 1980.
48. Daubechies, I., Orthonormal bases of compactly supported wavelets. Commun. Pure Appl. Math., 41, 7, 909–996, 1988.
49. Daubechies, I., Where do wavelets come from? A personal point of view. Proc. IEEE, 84, 4, 510–513, 1996.
50. Kudo, M. and Sklansky, J., Comparison of algorithms that select features for pattern classifiers. Pattern Recognit., 33, 1, 25–41, 2000.
51. Schenk, J., Kaiser, M., Rigoll, G., Selecting Features in On-Line Handwritten Whiteboard Note Recognition: SFS or SFFS? 2009 10th Int. Conf. Doc. Anal. Recognit., pp. 1251–1254, 2009.
52. Rifkin, R., Mukherjee, S., Tamayo, P., Ramaswamy, S., Yeang, C.-H., Angelo, M., Reich, M., Poggio, T., Lander, E.S., Golub, T.R., Mesirov, J.P., An Analytical Method for Multiclass Molecular Cancer Classification. SIAM Rev., 45, 4, 706–723, 2003.
53. Vapnik, V.N., The Nature of Statistical Learning Theory, Springer-Verlag, New York, 1995.
54. Shawe-Taylor, J. and Cristianini, N., Kernel Methods for Pattern Analysis, Cambridge University Press, New York, NY, USA, 2004.
55. Chatterjee, S., Vision-based rock-type classification of limestone using multi-class support vector machine. Appl. Intell., 39, 1, 14–27, 2013.
56. Perez, C.A., Saravia, J.A., Navarro, C.F., Schulz, D.A., Aravena, C.M., Galdames, F.J., Rock lithological classification using multi-scale Gabor features from sub-images, and voting with rock contour information. Int. J. Miner. Process., 144, 56–64, 2015.
57. Ivanciuc, O., Applications of support vector machines in chemistry. Rev. Comput. Chem., 23, 291–400, 2007.
58. Patel, A.K., Chatterjee, S., Gorai, A.K., Development of an expert system for iron ore classification. Arab. J. Geosci., 11, 15, 401, 2018.
59. Ranawana, R. and Palade, V., Multi-classifier systems: Review and a roadmap for developers. Int. J. Hybrid Intell. Syst., 3, 1, 35–61, 2006.
60. Sokolova, M. and Lapalme, G., A systematic analysis of performance measures for classification tasks. Inf. Process. Manag., 45, 4, 427–437, 2009.
61. Prasad, K., Gorai, A.K., Goyal, P., Development of ANFIS models for air quality forecasting and input optimization for reducing the computational cost and time. Atmos. Environ., 128, 246–262, 2016.

7 Computational Linguistics-Based Tamil Character Recognition System for Text to Speech Conversion

Suriya, S.¹*, Balaji, M.², Gowtham, T.M.¹ and Rahul, Kumar S.¹

¹Department of Computer Science and Engineering, PSG College of Technology, Coimbatore, India
²Arcesium, Bangalore, India

Abstract

Computational linguistics focuses on text recognition and synthesis, speech recognition and synthesis, and conversion between text and speech in both directions. Specially designed computers and software are available for speech synthesis. This chapter concerns a text-to-speech (TTS) system, which converts natural language text into speech, as distinguished from systems that render symbolic linguistic representations, such as phonetic transcriptions, into speech. The quality of a speech synthesizer is evaluated by its ability to resemble the human voice. An intelligible text-to-speech program allows a visually impaired person or a person with a reading disability to become familiar with a language. The TTS system converts text into an acoustic signal. Although many TTS approaches exist, the intelligibility, naturalness, comprehensibility and recallability of synthesized speech are not yet good enough to be widely accepted by users. End-to-end speech synthesis has become a popular paradigm, as a TTS system can be trained using only (text, audio) pairs. This chapter proposes a research idea focused on building a TTS system for the Tamil language.

Keywords:  Text-to-speech, orthographic text, acoustic signal, end-to-end speech synthesis


7.1 Introduction

Deep learning is a branch of artificial intelligence that mirrors the working of the human brain in processing data and creating patterns for decision making; it is capable of learning, without supervision, from unstructured or unlabeled data. Natural language processing is a field of computational linguistics concerned with the interactions between computers and natural languages; speech recognition, natural language understanding, and natural language generation are its major challenges. The scope of this system is to tackle problems that arise in day-to-day life. The major applications of a speech synthesis model are:

• communicating with deaf and mute people;
• automated marketing and customer services;
• information announcements at busy places such as airports and railway stations;
• integration with personal assistant devices such as Google Home;
• automated voice text messaging.

7.2 Literature Survey

Analysis of pronunciation learning in end-to-end speech synthesis: In paper [1], text-to-speech systems are developed for many Indian languages via text-based representations, as shown in Figure 7.1. The paper introduces a multi-language character map for training the system from characters to speech, along with a common label set approach based on phones; the ultimate goal of both approaches is to exploit the similarities existing among the languages. The results let researchers infer that good-quality text-to-speech systems can be developed using either approach.

Building Multilingual End-to-End Speech Synthesizers for Indian Languages: Paper [2] avoids the technique discussed in paper [1] and instead learns pronunciation as part of a direct mapping from input characters to speech audio. The resulting end-to-end (E2E) model sidesteps the pitfalls of letter-to-sound models and out-of-vocabulary words, and the experimental results demonstrate the ability of an E2E model to learn correct pronunciations.

Figure 7.1  Text to speech system (blocks: text analysis, linguistic analysis of text, waveform generation, speech; case study: Tamil language).

Tacotron: Towards end-to-end speech synthesis: Paper [3] focuses on speech synthesis directly from characters with the Tacotron text-to-speech model. The system is trained from scratch with random initialization, given (text, audio) pairs as input, and is comparatively fast because it generates speech at the frame level.

Text to speech synthesis system for Tamil: In paper [4], a Mel-generalized-coefficients model replaces the traditional pitch model; it uses a statistical mapping through a Gaussian mixture model to obtain pitch. Experimental results show no significant loss in the naturalness of the synthesized speech.

A complete text-to-speech synthesis system in Tamil: In paper [5], the Thirukkural is taken as a case study for text-to-speech synthesis. An automatic segmentation algorithm followed by a discrete cosine transform based pitch technique is proposed for segmenting syllables into consonants and vowels to obtain synthesized, natural Tamil speech.

Kumar et al. [6] studied the results produced by various classifiers, such as k-nearest neighbor, support vector machine, naive Bayes, decision trees, convolutional neural networks and random forest, on Gurmukhi characters. The paper concludes that the random forest classifier is the most effective, but suggests that a hybrid application of several classifiers will always have a better impact on performance.

Sabeenian et al. [7] focused on recognizing Tamil characters from ancient palm-leaf manuscripts with the help of convolutional neural networks. The proposed method has a recognition rate of 96.1 to 100%, owing to the large number of features extracted at different layers. However, ancient Tamil fonts differ from the modern script, so only a very limited dataset is available for recognizing ancient manuscripts.

Deepa and Rao [8] proposed a nearest interest point classifier for identifying handwritten Tamil characters. Compared with other classification techniques, the nearest interest point classifier has the advantage of identifying a wide range of handwritten characters that are not easy to classify using standard classifiers, and its collective class similarity voting makes it a better approach than traditional classifiers.

Kowsalya and Periasamy [9] approached Tamil character recognition differently, using an optimal neural network with elephant herd optimization. This approach achieves 93% accuracy compared to classifiers such as SVM, SOM, FNN and RBF, and it uses effective preprocessing techniques such as binarization, skew detection and noise removal with a Gaussian filter.

Babu and Soumya [10] provided a survey on optical character recognition of historical handwritten manuscripts. They observe that some approaches combine feature extraction methods and classifiers, which yields better accuracy; however, there are limitations due to the unavailability of proper datasets and the complexity of feature extraction.

Raghupathy and Chandrasekaran [11] proposed a deep learning based system to recognize handwritten Tamil characters. The system achieves an accuracy of 97.7%, better than other existing systems; the remaining issues, mainly due to ambiguities in writing, are said to improve with fine tuning of the hyperparameters.

Kar and Banerjee [12] proposed a system for recognizing Bangla characters based on the Euclidean distance between the center of gravity and the endpoints. It is a new approach that has not been attempted previously, but the method does not yield a considerable accuracy rate.

Raj and Abirami [13] worked on recognizing Tamil characters of various styles and shapes with the help of a support vector machine classifier. The proposed system utilizes three different feature representation techniques for better feature extraction of curvy letters, which improves recognition accuracy across different fonts and handwriting styles.

Shanmugam and Vanathi [14] proposed a Raspberry Pi based system using a support vector machine. The system recognizes Tamil characters and then converts them into speech, and it also recognizes English characters; however, recognition of Tamil characters is not very accurate due to the limited dataset.

Jajoo et al. [15] worked on identifying the script in multi-script texts obtained from camera-captured images, which helps in distinguishing the different scripts; however, the proposed system does not perform as expected when the captured images have a heterogeneous background.

Chittaragi et al. [16] worked on a dialect identification system for the Kannada language. With the help of support vector machines and neural networks, they achieved large-scale text classification with much higher accuracy, though with the drawback of classifying texts only at the word and phoneme level.

Xie et al. [17] provided a broad study on unconstrained scene text recognition using convolutional attention networks. The classifier is found to be a stable algorithm providing good classification over a wide range of scene texts, but it has a complex structure for categorizing them.

Srivastava et al. [18] proposed a system for handwritten character recognition of block letters and digits, focusing on pattern recognition with 2D convolutional neural networks. The system achieves a significant accuracy of 97% compared with other systems, though it is inefficient at recognizing cursive handwritten characters.

Zhang et al. [19] proposed a text image recognition system using a sequence-to-sequence domain adaptation network as the classification algorithm. The system proves more efficient as it covers a wide range of texts, including scene text, handwritten text and mathematical expressions; however, it fails when handling various sequence domain shifts.

Francis and Sreenath [20] proposed a system for validating and recognizing text in natural scene images using a manifold regularized twin support vector machine classifier. It has the advantage of an accuracy rate of up to 84.91% and a precision of 85.71%, but it is inefficient at recognizing cursive text and lacks error correction techniques.

Luo et al. [21] proposed a study on scene text recognition using a multi-object rectified attention network. It has the advantage of accepting images of any size for recognition; however, it does not deal with arbitrarily oriented text and has significant drawbacks on text with a large curve angle.


7.3 Proposed Approach

The dataset should be available as (text, audio) pairs and must be cleaned of noise and silence before further processing to obtain effective results. The system should be able to generate text–audio mappings from the dataset, ensure that the chosen neural network model reproduces the Tacotron architecture with the highest possible accuracy, and illustrate how the Tacotron model works within the text-to-speech conversion system. Training should complete within a reasonable time on a system of moderate configuration.

For a given text input, the system produces synthesized audio as output. This is done using the Tacotron model, trained with a set of encoders and decoders that add attention over the (text, audio) pairs to produce a raw spectrogram, which is then reconstructed with the Griffin-Lim algorithm into the final synthesized waveform. The system should be easy to use; it should correctly recognize the text input from the user and provide the correct audio output through speech synthesis, and training the Tacotron model for more steps makes recognition easier.

This software is developed with machine learning techniques, so it depends primarily on the quality of the dataset provided for training as well as the number of training steps or cycles performed. User-provided data is used to compare with the results and measure reliability, and it suffices for this purpose if enough data is obtained. Users can use the program at any time, so maintainability and reliability are guaranteed. Speech synthesis and playback times should be as low as possible: the conversion should not take more than 10 s, and conversion and playback durations are very low. The amount of mismatch in the synthesized speech output should be minimal. Maintenance of the system requires Python knowledge; if a problem arises in Tacotron, solving it requires knowledge of the code and a deep learning background. The system should have sufficient libraries to support end-to-end speech synthesis.

7.4 Design and Analysis

The UML usecase diagram shown in Figure 7.2 depicts the different modules implemented in the TTS system. The usecases in the diagram represent each module in the system; the actors involved are the user and the system. The usecase specification is shown in Table 7.1.


Figure 7.2  UML Usecase diagram.

Table 7.1  Usecase specification.

Usecase name: Pre-process
Usecase ID: 4
Description: This module pre-processes the audio dataset using various techniques such as noise removal, sample rate correction and silence removal.
Actors involved: System
Pre conditions: The dataset and the metadata file for the dataset must be prepared and mapped correctly. The audio dataset should be in .wav format.
Main flow: After initialization, the audio files are passed through this module, where a set of steps is performed to fine-tune the dataset by removing noise and silence.
Post conditions: Nil
Alternate flow: Nil


Figure 7.3  Activity diagram for TTS system.

Activity Diagram: The activity diagrams shown in Figure 7.3 and Figure 7.4 depict the flow in the system. The activities include downloading the language dataset, preprocessing, building a model, training, synthesis and testing.

Deployment Diagram: The deployment diagram shown in Figure 7.5 depicts how the entire system is deployed on a PC and the tools used in developing the system.

7.5 Experimental Setup and Implementation

In April 2017, Google published a paper, "Tacotron: Towards End-to-End Speech Synthesis", which presents a neural text-to-speech model that learns to synthesize speech directly from (text, audio) pairs. The detailed steps described


Figure 7.4  Sub-activity diagram for pre-process module.

Figure 7.5  Deployment diagram (a PC running Linux and Anaconda hosts the encoder–decoder model, the Tamil audio dataset, and training and testing).

below are an attempt to implement the model described in that paper for the Tamil language. The project has been implemented using Python 3 along with a few packages and modules, as shown in Figure 7.6, and developed under an Anaconda environment; since the Anaconda prompt is used, installing Anaconda is recommended, along with the latest version of TensorFlow within Anaconda. The performance of the model can be increased by installing GPU support for TensorFlow, if available. The Python packages and modules to be installed are listed in the requirements.txt file and can be installed using pip.

A self-developed experimental dataset along with the existing Microsoft Open Research Dataset has been used for training and testing; the details of each dataset are described below. An audio file for each of the 247 Tamil characters was developed by our team by recording voice files of 1–2 s each. The audio files were made consistent for training and testing purposes: the bit rate of each file was set to 384 kbps and the project rate to 24,000 Hz, matching the properties of the audio files in the Microsoft Open Research Dataset. A metadata file was developed to match each audio file name with its text; the audio files and the metadata file for the self-developed experimental dataset are shown in Figure 7.7.

The Microsoft Speech Corpus (Indian languages) release contains conversational and phrasal speech training and test data for the Telugu, Tamil and Gujarati languages; the data package includes audio and the corresponding transcripts. The Tamil data package has been used for this project. The package contains 39,131 audio files for training and 3,081 audio files

pip install -r requirements.txt

Figure 7.6  Python module requirements.


Figure 7.7  Audio files and Metadata.txt.

for testing. Each audio file represents a phrase or sentence in the Tamil language, and the transcript for each of these sentences is also provided. The audio files and the transcript file for the Microsoft Speech Corpus dataset can be seen in Figure 7.8.

Figure 7.8  Audio files and transcriptions.

The datasets are currently in raw form and need to be preprocessed so that they can be used for training and testing without errors, as shown in Figure 7.10. The preprocessing steps are discussed below. Preprocessing involves several steps, from data preparation to the creation of the linear and mel spectrograms of the dataset, as shown in Figure 7.11. The datasets have been edited and trimmed to remove noise and empty spaces at the beginning and end, as shown in Figure 7.12. The open source software Audacity has been used to prepare the data; the sample preparation can be seen in Figure 7.9. Audacity is a free and open-source digital audio editor and recording application, available for different operating systems, that can be used for post-processing all types of audio.

The pre-processor computes the linear and mel spectrograms, saves them to disk, and generates a tuple (spectrogram_filename, melspectogram_filename, n_frames, text) to write to the training file. Once pre-processing is completed, the linear and mel spectrograms are ready to use for training. Before training the model, we must specify the text symbols to be used, along with their regular expressions. The Python file symbols.py defines the set of valid symbols used in the text input to the Tacotron model; the valid symbols include Tamil characters, padding, space, and punctuation. The script for the symbols.py file is shown in Figure 7.13.

Figure 7.9  Data preparation.


Figure 7.10  Pre-process function for Tamil dataset.

Figure 7.11  Function for Linear and Mel Spectrogram creation.
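Since Figure 7.11 is reproduced only as an image, a minimal sketch of such a spectrogram routine is given below, assuming librosa; the frame parameters are illustrative, not the project's exact values.

import numpy as np
import librosa

def spectrograms(path, sr=24000, n_fft=2048, hop=300, n_mels=80):
    # Load at the project rate, trim leading/trailing silence, then
    # compute the linear-scale and 80-band mel-scale spectrograms.
    wav, _ = librosa.load(path, sr=sr)
    wav, _ = librosa.effects.trim(wav)
    linear = np.abs(librosa.stft(wav, n_fft=n_fft, hop_length=hop))
    mel = librosa.feature.melspectrogram(S=linear ** 2, sr=sr, n_mels=n_mels)
    return linear, mel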

Figure 7.12  Main function for Pre-process file.

Figure 7.13  Set of valid symbols.
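For illustration, a symbol table of this kind reduces to a list of characters and a lookup from characters to integer IDs; the entries below are a truncated, hypothetical example, not the contents of the actual symbols.py.

# Padding, end-of-sequence, space and punctuation come first, followed
# by the Tamil (and optionally English) characters.
_pad, _eos = '_', '~'
symbols = [_pad, _eos, ' ', ',', '.', '?', '!'] + list('அஆஇஈஉஊஎஏஐ')  # truncated

symbol_to_id = {s: i for i, s in enumerate(symbols)}

def text_to_sequence(text):
    # Map an input string to the integer IDs fed to the Tacotron encoder.
    ids = [symbol_to_id[ch] for ch in text if ch in symbol_to_id]
    return ids + [symbol_to_id[_eos]]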

The inclusion of English characters enables the use of a mix of Tamil and English characters in the input text; the English characters are converted to Tamil during the input text processing phase. The valid Tamil characters are shown in Figure 7.14.

The development of the Tacotron model can be split into four modules, described in detail below; the general architecture of the model, combining all four modules, is represented by the block diagram in Figure 7.15. Tacotron is an end-to-end generative text-to-speech model based on the sequence-to-sequence (seq2seq) with attention paradigm: the model takes characters as input and produces spectrogram frames, which are converted to waveforms. The four modules are the CBHG module, the encoder, the decoder, and the post-processing net with waveform synthesis.

CBHG consists of a bank of 1-D convolutional filters, followed by highway networks and a bidirectional gated recurrent unit (GRU); it is a powerful module for extracting representations from sequences. The input sequence is first convolved with K sets of 1-D convolutional filters, where the kth set contains Ck filters of width k (i.e. k = 1, 2, ..., K). To increase local invariance, the convolution outputs are stacked together and further

Figure 7.14  Valid Tamil characters.


Figure 7.15  Tacotron block diagram.

max-pooled along time. The processed sequence is then passed through a few fixed-width 1-D convolutions, whose outputs are added to the original input sequence via residual connections. To extract high-level features, the convolution outputs are fed into a multi-layer highway network; finally, a bidirectional GRU RNN is stacked on top to extract sequential features from both the forward and backward context. To develop the CBHG module in Python, we make use of TensorFlow's built-in functions, including GRUCell, bidirectional_dynamic_rnn and a highway network; the code for the CBHG module can be seen in Figure 7.16.

With the CBHG module developed, the next module is the encoder, in which the CBHG module is used to extract representations from the Tamil input sequences. The encoder is responsible for extracting robust sequential representations of text. A character sequence is given as input to the encoder, where each character is represented as a one-hot vector and embedded into a continuous vector. A set of non-linear transformations called a pre-net is then applied to each embedding; a bottleneck layer with dropout is used as the pre-net, which helps convergence and improves generalization. A CBHG module transforms the pre-net outputs into the final encoder representation used by the attention module. The code for the encoder can be seen in Figure 7.17.


Figure 7.16  CBHG module function.

Figure 7.17  Code for character embedding and encoder.
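A minimal Keras sketch of such a pre-net follows; the layer sizes 256 and 128 with dropout 0.5 match the Tacotron paper, while the function name is illustrative.

import tensorflow as tf

def prenet(inputs, units=(256, 128), dropout=0.5, training=True):
    # Bottleneck pre-net applied to each character embedding before the
    # encoder CBHG; dropout stays active to aid convergence/generalization.
    x = inputs
    for u in units:
        x = tf.keras.layers.Dense(u, activation='relu')(x)
        x = tf.keras.layers.Dropout(dropout)(x, training=training)
    return x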

The attention module makes use of Bahdanau attention. In the decoder module, a content-based tanh attention module is used, producing an attention query at each decoder time step; the code snippet for attention can be found in Figure 7.18. With the encoder developed and attention added over the Tamil character embeddings, the decoder must now be developed. The context vector and the attention RNN cell output are concatenated to form the input


Figure 7.18  Code for attention module.
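The content-based tanh attention can be summarized in a few lines of NumPy; this is a conceptual sketch of the scoring and context computation, not the project's TensorFlow code.

import numpy as np

def tanh_attention(query, keys, W1, W2, v):
    # query: (d,), keys: (T, d); W1, W2: (a, d); v: (a,).
    scores = v @ np.tanh(W1 @ query[:, None] + W2 @ keys.T)   # (T,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                                  # softmax
    return weights @ keys                                     # context: (d,)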

to the decoder RNNs. A stack of GRUs with vertical residual connections is used for the decoder. An 80-band mel-scale spectrogram is used as the seq2seq target, and a post-processing network is then used to convert this target to a waveform. The code snippet for the decoder can be seen in Figure 7.19.

The post-processing net converts the seq2seq target into a target that can be synthesized into waveforms. Since Griffin-Lim is used as the synthesizer, the post-processing net learns to predict spectral magnitudes sampled on a linear-frequency scale; a CBHG module is used for the post-processing net. The code snippet for the post-processing net can be found in Figure 7.20.

The Tacotron model has been trained on the self-made experimental Tamil characters dataset. A batch size of 32 is used, with all sequences padded to a maximum length; the code snippet can be seen in Figure 7.21. Training was executed in the Anaconda prompt, with a summary recorded every 1,000 steps; this summary also serves as a checkpoint from which training can be restarted. The execution of train.py can be seen in Figure 7.22. For testing the accuracy while training, the model is initialized after running it for 2 lakh (200,000) steps; the model initialization can be seen in Figure 7.23.

With the backend complete, the frontend needs to be developed. The frontend has been primarily developed using HTML, CSS and JavaScript: the input text is received using the GET method, processed in the backend, and the output is returned to the frontend as an audio file. The detailed steps in developing the frontend are described below.

Figure 7.19  Code for decoder.


Figure 7.20  Code for Post-processing net.
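Since the chapter uses Griffin-Lim as the synthesizer, a minimal sketch of that reconstruction step is given below. It assumes numpy and librosa are available; the n_fft and hop_length values are illustrative, not this chapter's exact settings.

import numpy as np
import librosa

def griffin_lim(magnitude, n_iter=60, n_fft=2048, hop_length=256):
    # `magnitude` is a linear-frequency magnitude spectrogram of shape
    # (1 + n_fft // 2, frames). Start from a random phase and iteratively
    # enforce the known magnitude while re-estimating the phase.
    angles = np.exp(2j * np.pi * np.random.rand(*magnitude.shape))
    complex_spec = magnitude.astype(np.complex64) * angles
    for _ in range(n_iter):
        wav = librosa.istft(complex_spec, hop_length=hop_length)
        rebuilt = librosa.stft(wav, n_fft=n_fft, hop_length=hop_length)
        angles = np.exp(1j * np.angle(rebuilt))
        complex_spec = magnitude * angles
    return librosa.istft(complex_spec, hop_length=hop_length)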

Figure 7.21  Training code.


Figure 7.22  Train.py execution.

Figure 7.23  Model initialization.

The demo server is set up to display the frontend locally, serving on port 9000. It is built using the Python modules falcon and argparse. Additionally, a Python script synthesize.py for synthesizing the audio is developed, which is invoked when the input is passed. The code snippet for setting up the demo server is shown in Figure 7.24, and the code snippet to call the synthesize function is shown in Figure 7.25; a sketch of such a server follows.
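Below is a minimal sketch of the demo server, assuming the falcon WSGI framework; the Synthesizer class and its synthesize(text) method are hypothetical stand-ins for the chapter's synthesize.py.

import argparse
from wsgiref import simple_server

import falcon

class SynthesisResource:
    def __init__(self, synthesizer):
        self.synthesizer = synthesizer

    def on_get(self, req, resp):
        # The input text arrives through the GET method,
        # e.g. /synthesize?text=...
        text = req.params.get('text', '')
        resp.data = self.synthesizer.synthesize(text)  # WAV bytes
        resp.content_type = 'audio/wav'

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--port', type=int, default=9000)
    args = parser.parse_args()

    from synthesize import Synthesizer  # hypothetical module from this chapter
    api = falcon.API()
    api.add_route('/synthesize', SynthesisResource(Synthesizer()))
    simple_server.make_server('0.0.0.0', args.port, api).serve_forever()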


Figure 7.24  Demo server code.

Figure 7.25  Function to call synthesizer.

A simple web application is developed using HTML, CSS and JavaScript, which is activated on a local server at port 9000 once the demo server file is executed. The code for the webpage and the function call are shown in Figures 7.26 and 7.27 respectively. Upon execution of the demo server script, the server is started at port 9000 and the webpage is hosted locally; the output of the execution is shown in Figure 7.28. The server is now set up and the GET method is working successfully, indicating that the webpage is hosted and ready for use. The webpage is shown in Figure 7.29. It is now time to test the model: a Tamil character is given as input, and an audio file is the expected output. The input is shown in Figure 7.30. The input has been successfully given and an audio file is received as the output. Due to the small number of training steps executed, the audio file contains additional noisy data synthesized along with the actual expected speech; as the number of training steps increases, the comprehensibility will also increase. The output can be seen in Figure 7.31.


Figure 7.26  Webpage script.

Figure 7.27  Webpage script function call.

Figure 7.28  Demo server execution.


Figure 7.29  Webpage demo.

Figure 7.30  Input to the model for audio synthesize.

Figure 7.31  Audio output.


7.6 Conclusion

A lot of research work exists in the literature on end-to-end speech synthesis for text-to-speech conversion for the English language. However, there is no standard solution to synthesize the audio of Tamil characters with reasonable accuracy. Various methods have been used in each phase of the TTS conversion process, yet challenges still prevail in the recognition and conversion of normal texts as well as texts with special sounds, auxiliary characters, similarly sounding characters, joined characters, and so on. In this project, we have presented various aspects of each phase of the Tamil text-to-speech conversion process. We have used a minimal audio dataset considering the versatility of the language; coverage is not given to different voices, pronunciation styles and sound issues, and these key challenges can be further explored in the future. The proposed system has been found to yield a decent recognition accuracy of 76%. The text-to-speech conversion system described in this project will find potential applications when combined with our previous work on handwritten Tamil character recognition. The proposed architecture has shown enhanced performance in text-to-speech conversion with the help of end-to-end speech synthesis.


8 A Comparative Study of Different Classifiers to Propose a GONN for Breast Cancer Detection

Ankita Tiwari¹, Bhawana Sahu¹, Jagalingam Pushaparaj¹* and Muthukumaran Malarvel²

¹School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India
²Chitkara University Institute of Engineering and Technology, Chitkara University, Punjab, India

Abstract

The most frequently occurring cancer among Indian females is breast cancer, with an incidence of 25.8 per 100,000 women and a mortality of 12.7 per 100,000 women according to the Government of India survey from 2010 to 2014. The survey revealed that only 66.1% of the women diagnosed with cancer survived. The detection of early cancer symptoms is important for diagnosing the ailment. To identify the tumor in breast cancer, various machine learning algorithms have been adopted in the literature. In this paper, a comparative study of existing classifiers like support vector clustering (SVC), the decision tree classification algorithm (DTC), K-nearest neighbours (KNN), random forest (RF), and the multilayer perceptron (MLP) is presented on the Wisconsin breast cancer dataset (WBCD) of the UCI machine learning repository. The results indicate that the MLP outperformed the other classifier algorithms. Further, a linear regression approach is adopted to optimize the feature selection for giving an input to the genetic algorithm.

Keywords: Breast cancer detection, machine learning, WBCD, multilayer perceptron, linear regression, feature selection

*Corresponding author: [email protected] Muthukumaran Malarvel, Soumya Ranjan Nayak, Prasant Kumar Pattnaik and Surya Narayan Panda (eds.) Machine Vision Inspection Systems, Volume 2: Machine Learning-Based Approaches, (155–170) © 2021 Scrivener Publishing LLC



8.1 Introduction

The early diagnosis of breast cancer can exponentially increase the chances of successful treatment of the ailment. A World Health Organization survey spanning two decades reveals that, out of every 100 breast cancer patients, 2, 7 and 60% were in the 20 to 30, 30 to 40 and above-50 age groups respectively; the present survey estimates that 4, 16 and 28% are in the 20 to 30, 30 to 40 and above-50 age groups respectively. This indicates that 48% of breast cancer patients were below 50 years, showing an increased risk of breast cancer among females in India. Therefore, early detection of breast cancer may help towards a better diagnosis and a decreased death rate [1]. To detect breast cancer, the tissues in the breast are examined and a biopsy is performed if there is abnormal growth of tissue.

In the past, researchers have implemented the idea of neural networks for the detection of breast tissues. Nemissi et al. [2] adopted the extreme learning machine algorithm on the WBCD dataset to train a neural network with a single hidden layer and proposed a classification system for the examination of breast cancer; compared with conventional methods, the extreme learning algorithm provides better performance when the number of hidden nodes is reduced. Sethi [3] performed a comparative analysis between evolutionary and machine learning algorithms like the genetic artificial neural network (GANN), particle swarm optimization (PSO), C4.5 and k-nearest neighbors (k-NN) on the WBCD dataset; the analysis shows that the GANN algorithm outperformed the others. Xue et al. [4] performed particle swarm optimization for feature selection on fourteen datasets from the UC Irvine machine learning repository; the initial isolation strategy affects the selection of features and the computational time. Belciug and Gorunescu [5] investigated the genetic algorithm, MLP, radial-basis function (RBF), probabilistic neural network (PNN) and probabilistic combined neural network (PCNN) for the detection of breast cancer tissue on datasets such as Wisconsin prognostic breast cancer (WPBC) and leukocyte-reduced red blood cells (LRRBC); they created a hybrid model of MLP and genetic algorithm which was very flexible in providing accurate classification with the various types of attributes present in medical studies. Alic et al. [6] applied logistic regression, the support vector machine (SVM), rotation forest, decision trees, the Bayesian network, MLP, and radial-basis-function networks (RBFN); the rotation forest model had the highest accuracy (99.48%) among them.

Al-Shargabi et al. [7] proposed a tuned MLP, based on best fitting the hyperparameters, with feature selection applied to the WBCD dataset; the tuned MLP showed an accuracy of 97.70%, a better result compared to the basic MLP. Garg et al. [8] optimized various input attributes using an artificial neural network to classify the cancer and performed real-time detection of the tumor; they showed that normal nucleoli, cell size uniformity and single epithelial cell size are the most important factors in deciding the type of tumour. S. Agrawal and J. Agrawal [9] surveyed the prediction of cancer using neural networks, demonstrating the effectiveness of various neural network methods like MLP, PNN, adaptive resonance theory (ART) and the perceptron, out of which the MLP gives 97.1% accuracy, exceeding the others.

The main objective of this work is to optimize the genetic algorithm for the detection of breast cancer tissue. To optimize the algorithm, i) feature selection is performed using linear regression, and ii) the best classifier algorithm is selected among the support vector, random forest, K-neighbor and decision tree classifiers and the multi-layer perceptron. The present paper examines only the selection of features and the performance analysis of the classification algorithms. As future research, the optimization of the genetic algorithm will be carried out for the ideal prediction of the features (tissues) which lead to a high likelihood of the occurrence of cancer.

8.2 Methodology

In order to optimize the genetic algorithm, the following steps were adopted for the selection of the maximum-likelihood features and the classifier algorithm. Figure 8.1 represents the overall methodology of the present work, and Figure 8.2 represents the methodology for the optimization of the genetic algorithm.

8.2.1 Dataset

The dataset used in this paper is obtained from the WBCD in the UCI repository (https://archive.ics.uci.edu/ml/) and is described in Table 8.1. It consists of a total of 569 instances, of which 357 and 232 correspond to benign and malignant human breast tissue respectively. The attributes listed in the table are taken from the cell nuclei; the values of the attributes range from 1 to 10, where 1 denotes the least abnormal state and 10 indicates the most abnormal state.

Figure 8.1  The overall methodology of the present work.

Figure 8.2  Methodology for optimizing the genetic algorithm.

Table 8.1  Dataset attributes of the WBCD.

Attribute                     Value
Sample code number            1-10
Clump Thickness               1-10
Uniformity of cell size       1-10
Uniformity of cell shape      1-10
Marginal Adhesion             1-10
Single Epithelial cell size   1-10
Bare Nuclei                   1-10
Bland chromatin               1-10
Normal Nucleoli               1-10
Mitoses                       1-10
Class                         2 for benign, 4 for malignant

8.2.2 Linear Regression

To model the relationship between two attributes, linear regression is performed to obtain the most strongly correlated attributes. The prominently correlated attributes lead to the maximum likelihood of causing the tumour, and therefore help to identify the nature of the tumor as benign or malignant. Linear regression is a method commonly used for prediction. The purpose is to find a line that describes the data in the best possible way [10]. The line should be positioned so that the overall prediction error over all points in the dataset is least, where the error is the distance between the data points and the regression line:

Y_{\text{predicted}} = b_0 + b_1 X    (8.1)

where b_0 and b_1 are regression coefficients; these values must be selected so that they give the minimum discrepancy. The relationship between two variables is observed by fitting the linear equation to the observed data. One variable can be understood as explanatory and the other as dependent. Let X and Y be two random variables.

If the regression of Y on X is linear, it can be denoted as

y = \mu_y + \rho_{xy} \frac{\sigma_y}{\sigma_x}(x - \mu_x)    (8.2)

If the regression of X on Y is linear, it can be denoted as

x = \mu_x + \rho_{xy} \frac{\sigma_x}{\sigma_y}(y - \mu_y)    (8.3)

where \mu is the mean, \sigma_x and \sigma_y are the standard deviations of X and Y, and \rho_{xy} is the correlation coefficient.

8.2.2.1 Correlation

The correlation coefficient \rho_{xy} of the random variables X and Y is defined as

\rho_{xy} = \frac{\mathrm{Cov}(X, Y)}{\sigma_x \sigma_y}    (8.4)

8.2.2.2 Covariance

The covariance of X and Y, denoted by Cov(X, Y), is defined as

\mathrm{Cov}(X, Y) = \frac{1}{n} \sum XY - \bar{X}\bar{Y}    (8.5)

where \bar{X} is the mean of X and \bar{Y} is the mean of Y.

y_j = b_0 + b_1 x_j + \epsilon_j    (8.6)

For the computation of confidence intervals and hypothesis tests, the assumption is that the errors are independent and normally distributed with mean zero and variance

\sigma^2 = \frac{1}{n} \sum x_i^2 - \bar{x}^2    (8.7)

For a sample of n observations on x and y, least squares determines b_0 and b_1, which gives the accuracy estimation and the linear fit of the data. The fitted line need not pass through the data exactly; where a divergence is found, a residual term is added between the actual and fitted values, so the equation is written as

y_j = b_0 + b_1 x_j + e_j    (8.8)

where j represents the observation number and e_j is the discrepancy between the actual value y_j and the fitted value obtained from the linear regression.
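As an illustration of equations (8.1)–(8.8), a minimal sketch of fitting b_0 and b_1 by least squares is given below; the x and y arrays are placeholder data standing in for any pair of WBCD attribute columns.

import numpy as np

def fit_line(x, y):
    x_mean, y_mean = x.mean(), y.mean()
    cov_xy = (x * y).mean() - x_mean * y_mean   # equation (8.5)
    var_x = (x ** 2).mean() - x_mean ** 2       # equation (8.7)
    b1 = cov_xy / var_x
    b0 = y_mean - b1 * x_mean
    return b0, b1

def correlation(x, y):
    # Equation (8.4): rho_xy = Cov(X, Y) / (sigma_x * sigma_y).
    return ((x * y).mean() - x.mean() * y.mean()) / (x.std() * y.std())

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # placeholder attribute values
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
b0, b1 = fit_line(x, y)
print(b0, b1, correlation(x, y))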

8.2.3 Classification Algorithm

In order to classify the tumor as benign (non-cancerous) or malignant (cancerous), classifiers such as SVC, DTC, KNN, RF and MLP are demonstrated, and the best classification algorithm is selected for optimizing the genetic algorithm for the prediction of tumors.

8.2.3.1 Support Vector Machine

SVM is a supervised algorithm adopted mainly for regression and classification problems. The data points are plotted in an n-dimensional space (n is the number of features), where each coordinate holds the value of a particular feature [11]. Classification is performed with respect to a hyperplane that separates the two classes. The hyperplane is selected by considering the nearest data points of each class; these nearest points are called support vectors [12]. The aim of the support vector classifier is to identify the most precise subset of input features so as to retrieve the optimal parameters:

W^T X = 0 \quad \text{(equation of the hyperplane)}    (8.9)

SVM is capable of dealing with high-dimensional data, and hence it copes with the problem of dimensionality. The training error, measured on the training data, is expected to be close to zero; however, there is no guarantee that a hyperplane fitted to the training data will perform efficiently at test time. Since there can be infinitely many separating hyperplanes, the classifier must select one that performs well both during training and at test time [13].

The equation for the 2-D plane space is denoted as

w_0 + w_1 x_1 + w_2 x_2 = 0    (8.10)

where w_0, w_1 and w_2 are constants defining the slope and intercept of the line. A point lying above the hyperplane satisfies

w_0 + w_1 x_1 + w_2 x_2 > 0    (8.11)

Similarly, a point lying below the hyperplane satisfies

w_0 + w_1 x_1 + w_2 x_2 < 0    (8.12)

The Euclidean equation of a hyperplane in R^m is

w_0 + w_1 x_1 + w_2 x_2 + \cdots + w_m x_m = b    (8.13)

where the w_i are real numbers and b is a real constant, which can be positive or negative. In matrix form the hyperplane is expressed as

W \cdot X + b = 0    (8.14)

where W = [w_1, w_2, …, w_m], X = [x_1, x_2, …, x_m] and b is a real constant.

8.2.3.2 Random Forest Classifier

Random forest (RF) was developed in 2001 by Breiman [14]. RF learns quickly even on vast datasets and is well known for its high forecasting accuracy; it also provides new information on variables, can handle thousands of attributes, and can estimate missing data. It is a supervised learning algorithm that selects data randomly from the dataset to construct decision trees, and forecasts the result through the trees so formed. For every attribute, the individual decision trees are generated using an attribute selection indicator known as the Gini index, calculated as

\mathrm{Gini} = 1 - \sum_{i=1}^{I} (P_i)^2    (8.15)

where I denotes the number of classes and P_i is the probability of occurrence of a particular class.
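A minimal sketch of equation (8.15) for a list of class labels could look as follows; the example labels use the 2/4 class coding of Table 8.1.

from collections import Counter

def gini(labels):
    # 1 minus the sum of squared class probabilities, equation (8.15).
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini([2, 2, 4, 4, 4]))  # 2 = benign, 4 = malignant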

8.2.3.3 K-Nearest Neighbor Classifier

K-nearest neighbor (K-NN) is a supervised algorithm adopted for classification and regression problems. It makes no assumption about the distribution of the underlying data. The working principle of the K-NN algorithm is the fundamental assumption that objects with related characteristics fall close to each other [15]. It uses the Euclidean or Manhattan function to calculate similarity based on the distance between objects [16]:

\text{Euclidean: } d(x, y) = \sqrt{\sum_{i=1}^{m} (x_i - y_i)^2}    (8.16)

\text{Manhattan: } d(x, y) = \sum_{i=1}^{m} |x_i - y_i|    (8.17)

8.2.3.4 Decision Tree Classifier

The main characteristic of a decision tree is that it learns inference rules from the training instances. The working approach of the decision tree classifier is to break a one-stage complex classification into several smaller, less complex classes [17]: it breaks the tests down into smaller ones while simultaneously linking the other instances. Figure 8.3 represents the working principle of a decision tree.

Figure 8.3  Working principle of a decision tree.

Here A, B and C represent the internal nodes, terminal nodes and branches of the tree respectively. Entropy and the Gini index are the two modes of building a decision tree; the attribute carrying the highest information is selected. Let S contain s_i tuples of class C_i, for i = 1, …, m. The information measure required to classify an arbitrary tuple is

I(p, n) = -\frac{p}{p+n}\log_2\frac{p}{p+n} - \frac{n}{p+n}\log_2\frac{n}{p+n}    (8.18)

The entropy for the values {a_1, a_2, …, a_v} of an attribute A is

E(A) = \sum_{i=1}^{v} \frac{p_i + n_i}{p + n}\, I(p_i, n_i)    (8.19)

The information gained from attribute A is

\mathrm{Gain}(A) = I(p, n) - E(A)    (8.20)

8.2.3.5 Multi-Layered Perceptron

The algorithm consists of several layers of interconnected nodes, where each node is a perceptron resembling a multiple linear regression. The perceptron feeds the signal produced by the multiple linear regression into an activation function. The input layer maps the input pattern and the output layer maps the output signal. Between the input and output there are several hidden layers, which optimize the input weights until the error margin of the neural network is least. The hidden layers exhibit the distinctive features of the input layer that are capable of predicting the outputs. The input layer X is multi-dimensional and can be a vector x = (I_1, I_2, …, I_n). Input nodes are connected to the nodes of the next layer, known as the hidden layer. A node in the hidden layer takes a weighted sum of all its inputs:

\text{Summed input} = \sum_i w_i I_i    (8.21)

Training a multi-layered perceptron requires an optimization process driven by a loss function; mean squared error and cross-entropy are the main loss functions for predicted probabilities. The loss function is often referred to as the objective function or cost function, although strictly the loss function applies to a single training example, whereas the cost function is the average loss over the entire training dataset. Here the loss function takes the target value y as 0 or 1 (malignant or benign) and the predicted probability p:

L = -y \log(p) - (1 - y)\log(1 - p) = \begin{cases} -\log(1 - p), & \text{if } y = 0 \\ -\log(p), & \text{if } y = 1 \end{cases}    (8.22)
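A minimal sketch of the cross-entropy cost of equation (8.22), averaged over a batch, is given below; the target and probability arrays are placeholder values.

import numpy as np

def cross_entropy(y, p, eps=1e-12):
    # Clip probabilities to avoid log(0), then apply equation (8.22)
    # per example and average to obtain the cost.
    p = np.clip(p, eps, 1.0 - eps)
    losses = -y * np.log(p) - (1.0 - y) * np.log(1.0 - p)
    return losses.mean()

print(cross_entropy(np.array([0, 1, 1]), np.array([0.1, 0.8, 0.7])))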

8.3 Results and Discussion

The present work has been divided into three segments. Firstly, the objective is to select the classifier that gives the highest performance, judged on the metric of accuracy. The classifiers used for comparison are RF, SVC, DTC, K-NN and MLP; the classifier which outperforms the others on our dataset will be used for the deployment of the genetic algorithm. Secondly, since the dataset comprises various redundancies and anomalies, the useful data must be filtered out of the dataset; the filtered data is given as input to the genetic algorithm and optimized. The approach used for feature selection is linear regression, which helps to select the attributes that are most highly correlated. Lastly, the architecture of the genetically optimized neural network is proposed, which will classify the tumor as malignant or benign.

The total number of attributes is 10, each attribute consists of three parameters, and linear regression was performed between all the parameters of the attributes. Figure 8.4 shows the most strongly correlated pairs of features; these highly correlated features will be further used for the optimization of the genetic algorithm.

Figure 8.4  Linear regression graphs plotted between pairs of features: (a) radius_mean vs. perimeter_mean; (b) texture_mean vs. smoothness_mean; (c) perimeter_mean vs. area_mean; (d) area_mean vs. compactness_mean; (e) radius_se vs. area_se; (f) texture_se vs. smoothness_se; (g) perimeter_se vs. concave point_se; (h) compactness_se vs. symmetry_se; (i) radius_worst vs. smoothness_worst; (j) area_mean vs. radius_worst; (k) concave point_se vs. concave point_worst; (l) smoothness_worst vs. symmetry_worst.

Further, classification techniques like SVC, DTC, KNN, RF and MLP were adopted on the WBCD dataset for the selection of the best classifier. The dataset consists of 569 records, of which 232 samples are malignant and 357 are benign. To model the classifier techniques, the dataset was divided in the ratio 70:30, with 70% of the data used for training and 30% for testing. To analyse the performance, each of the classification techniques was executed for ten iterations and averaged; the results are shown in Table 8.2. The outcomes reveal that the average performance of the support vector classifier is 69.29, of the decision tree classifier 92.97, of the random forest 85.43, of K-nearest neighbor 93.85 and of the multilayer perceptron 93.97. The results indicate that the multilayer perceptron outperformed the other classifiers. Since the main objective of the research is to optimize the genetic algorithm for the detection of cancer tumors, the highly correlated features and the multilayer perceptron are opted for the future optimization study.
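A minimal sketch of this comparison protocol is given below. It uses scikit-learn's bundled copy of the Wisconsin breast cancer data as a stand-in for the WBCD download, and default classifier settings rather than the exact configurations used in this study.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

X, y = load_breast_cancer(return_X_y=True)
# 70:30 train/test split, as in the protocol above.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

classifiers = {
    'SVC': SVC(),
    'DTC': DecisionTreeClassifier(),
    'RFC': RandomForestClassifier(),
    'KNC': KNeighborsClassifier(),
    'MLP': MLPClassifier(max_iter=1000),
}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    print(name, round(100 * clf.score(X_test, y_test), 2))  # accuracy, %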

Table 8.2  Performance analysis of classification techniques (accuracy, %).

#Iteration   0      1      2      3      4      5      6      7      8      9      Average
SVC          69.29  69.29  69.29  69.29  69.29  69.29  69.29  69.29  69.29  69.29  69.29
DTC          91.22  93.85  94.75  92.98  92.98  91.22  93.85  91.22  92.10  92.98  92.97
RFC          93.85  95.61  94.73  94.73  97.36  93.85  93.85  94.73  93.85  95.61  85.43
KNC          93.85  93.85  93.85  93.85  93.85  93.85  93.85  93.85  93.85  93.85  93.85
MLP          93.98  94.47  93.98  94.85  94.98  93.35  93.10  94.47  94.22  93.35  93.97


8.4 Conclusion

In this paper, linear regression was adopted on the WBCD dataset to study the most strongly correlated parameters of the attributes. Classifier algorithms such as SVC, RFC, KNN, DTC and MLP were used to obtain the accuracy on the dataset. The performance analysis of the classifier techniques indicates that the MLP provides the best results for the WBCD dataset. For future work, the highly correlated features and the MLP will be adopted for the optimization of the genetic algorithm.

References

1. Stender, J., Introduction to Genetic Algorithms, in: IEE Colloquium (Digest), IEE, London, UK, 1994, https://www.semanticscholar.org/paper/An.
2. Nemissi, M., Salah, H., Seridi, H., Breast cancer diagnosis using an enhanced extreme learning machine based neural network. 2018 Int. Conf. Signal, Image, Vis. their Appl., pp. 1–4, 2018.
3. Sethi, A., Analogizing of evolutionary and machine learning algorithms for prognosis of breast cancer. 2018 7th Int. Conf. Reliab. Infocom Technol. Optim. Trends Futur. Dir., pp. 252–255, 2018.
4. Xue, B., Zhang, M., Browne, W.N., Particle swarm optimisation for feature selection in classification: Novel initialisation and updating mechanisms. Appl. Soft Comput. J., 18, 261–276, 2014.
5. Belciug, S. and Gorunescu, F., A hybrid neural network/genetic algorithm applied to breast cancer detection and recurrence. Expert Systems, 30, 1, 1–12, 2012.
6. Aličković, E. and Subasi, A., Breast cancer diagnosis using GA feature selection and rotation forest. Neural Comput. Applic., 28, 753–763, 2017, https://dl.acm.org/doi/10.1007/s00521-015-2103-9.
7. Al-Shargabi, B., Al-Shami, F., Alkhawaldeh, R.S., Enhancing multi-layer perceptron for breast cancer prediction. Int. J. Adv. Sci. Technol., 130, 11–20, 2019.
8. Garg, B., Optimizing number of inputs to classify breast cancer using artificial neural network. J. Comput. Sci. Syst. Biol., 2, 247–254, 2009.
9. Agrawal, S. and Agrawal, J., Neural network techniques for cancer prediction: A survey. Procedia Comput. Sci., 60, 769–774, 2015.
10. Malmir, H., Farokhi, F., Sabbaghi-Nadooshan, R., Optimization of data mining with evolutionary algorithms for cloud computing application, in: Proceedings of the 3rd International Conference on Computer and Knowledge Engineering, ICCKE 2013, pp. 343–347, 2013.
11. Huang, M., Chen, C., Lin, W., Ke, S., Tsai, C., SVM and SVM ensembles in breast cancer prediction. PLoS ONE, 12, 1, 1–14, 2017.

12. Polat, K. and Güneş, S., Breast cancer diagnosis using least square support vector machine. Digit. Signal Process., 17, 694–701, 2007.
13. Zheng, B., Yoon, S.W., Lam, S.S., Breast cancer diagnosis based on feature extraction using a hybrid of K-means and support vector machine algorithms. Expert Syst. Appl., 41, 1476–1482, 2014.
14. Breiman, L., Random forests. Mach. Learn., 45, 5–32, 2001, https://doi.org/10.1023/A:1010933404324.
15. Danaee, P., Ghaeini, R., Hendrix, D.A., A deep learning approach for cancer detection and relevant gene identification. Pac. Symp. Biocomput., 22, 219–229, 2017.
16. Medjahed, S.A. et al., Breast cancer diagnosis by using k-nearest neighbor with different distances and classification rules. Int. J. Comput. Appl., 62, 1–5, 2013, https://research.ijcaonline.org/volume62/number1/pxc3884635.pdf.
17. Lavanya, D. and Rani, D.K.U., Analysis of feature selection with classification: Breast cancer datasets. Indian J. Comput. Sci. Eng., 2, 756–763, 2011.
18. Abdel-Zaher, A.M. and Eldeib, A.M., Breast cancer classification using deep belief networks. Expert Syst. Appl., 46, 139–144, 2016.

9 Mexican Sign-Language Static-Alphabet Recognition Using 3D Affine Invariants

Guadalupe Carmona-Arroyo¹, Homero V. Rios-Figueroa¹* and Martha Lorena Avendaño-Garrido²

¹Research Center in Artificial Intelligence, University of Veracruz, Xalapa, Mexico
²Mathematics Department, University of Veracruz, Xalapa, Mexico

Abstract

Communication between hearing-impaired people and the rest of society is very important. Since most members of this community use sign language, it is extremely valuable to develop automated translators between sign language and spoken languages. This work reports the recognition of the Mexican sign-language static alphabet from 3D data acquired with leap motion and MS Kinect 1 sensors. The features extracted from the data are six 3D affine moment invariants, which allow recognition despite changes in position and pose and shape differences between users' hands. The novelty of this research is the use of six 3D affine moment invariants for sign language recognition. The precision obtained in the experiments with the dataset from the leap motion sensor using linear discriminant analysis was 94%; the precision obtained using data from the MS Kinect 1 sensor was 95.6%.

Keywords: Sign language, 3D affine invariants

9.1 Introduction

According to the National Institute of Statistics and Geography of Mexico, in the last census of 2010, of the 112 million people living in the country, approximately 4.5 million have some type of disability. Of these people,

9.1 Introduction According to the National Institute of Statistics and Geography of Mexico, in the last census of 2010, of the 112 million people living in the country, approximately 4.5 million have some type of disability. From these people, *Corresponding author: [email protected] Muthukumaran Malarvel, Soumya Ranjan Nayak, Prasant Kumar Pattnaik and Surya Narayan Panda (eds.) Machine Vision Inspection Systems, Volume 2: Machine Learning-Based Approaches, (171–192) © 2021 Scrivener Publishing LLC


498,640 have a hearing limitation and 401,534 have a disability to talk or have communication problems. People having difficulty hearing or talking face communication problems when trying to take part in the economic activities of society. The language that is most natural for this group to learn and use is the Mexican sign language (MSL). Unfortunately, MSL is rarely used outside this community, making integration in society harder. The alphabet of the MSL is shown in Figure 9.1. Each word can be signed letter by letter. In addition, MSL has special signs for the different semantic categories, such as nouns, verbs, adjectives, and adverbs. In this work we investigated the use of six 3D affine moment invariants as features to recognize the 21 letters of the static alphabet of the Mexican sign language. Other works recognize subsets of the MSL as well as the sign languages of other countries (Table 9.1). However, no other work uses 3D affine moment invariants and only 6 features to achieve a precision of 94% for leap motion data and 95.6% for MS Kinect 1 data. This paper is organized as follows. Section 9.2 describes the mathematical expressions of the six 3D affine moment invariants used in this work. Section 9.3 presents the three experiments of this work. Section 9.4 summarizes the results of the experiments using five metrics. Section 9.5 discusses the results. Finally, Section 9.6 provides the conclusion of the study and future work.

Figure 9.1  Mexican sign language alphabet. The letters (J, K, Ñ, Q, X, Z) marked with a blue arrow are dynamic; the other letters are static.

Table 9.1  Related work.

Work      Language   Type                            Dimension   Input                     Classification algorithm                        Precision
[21]      American   2,576 videos                    3D          RGB-AVI images            No classifier evaluated                         —
[17]      American   Static alphabet and 2 phrases   2D          Grayscale images          Cross-correlation coefficient                   94
[4]       American   Static alphabet                 3D          Leap Motion               Random regression forest                        92.8
[30]      American   Static alphabet                 2D          Edges Kinect              Naive Bayes and K-NN                            89
[7]       American   Static alphabet                 3D          Kinect RGB-D/color        Random forest                                   90
[27]      American   20 words                        2D          Grayscale images          Wavelet NN                                      97.5
[18]      German     Static alphabet                 3D          Kinect RGB-D              Markov networks                                 97
[34]      German     10 phrases                      3D          Kinect RGB-D video        Hidden Markov model and fuzzy neural network    94.6
[2]       French     25 letters                      3D          Kinect RGB-D              Random forest                                   76
[14]      Mexican    3 letters and 1 number          3D          Kinect RGB-D/skeleton     Data time warping                               98.5
[24]      Mexican    25 words and 23 letters         3D          Leap Motion               Several proposed                                97.2
[32]      Mexican    Static alphabet                 2D          Grayscale images          Neural networks                                 93
[31]      Mexican    Static alphabet                 2D          RGB images                Neural networks                                 93
[16]      Mexican    5 letters and 5 numbers         3D          Kinect RGB-D              AdaBoost                                        95.8
[13]      Mexican    7 letters                       3D          Kinect RGB-D              Neural networks                                 76.1
Proposal  Mexican    Static alphabet                 3D          Leap Motion/MS Kinect 1   Linear discriminant analysis                    94/95.6


9.2 Pattern Recognition

To automatically recognize objects from data acquired through sensors, we need to extract relevant features; a vector of features is a pattern. To recognize patterns, we need appropriate classification techniques (Figure 9.2).

9.2.1 3D Affine Invariants

Xu and Li presented a general procedure to obtain 3D affine moment invariants based on geometric considerations [36]. In their paper, they explicitly showed the expressions of the first six 3D affine moment invariants (I_1 … I_6). Let f(x, y, z) be a function from R^3 to R. m_{pqr} and \mu_{pqr} represent, respectively, the general moments and the central moments:

m_{pqr} = \sum_{i=1}^{n_x}\sum_{j=1}^{n_y}\sum_{k=1}^{n_z} x_i^p\, y_j^q\, z_k^r\, f(x_i, y_j, z_k)    (9.1)

\mu_{pqr} = \sum_{i=1}^{n_x}\sum_{j=1}^{n_y}\sum_{k=1}^{n_z} (x_i - x_c)^p (y_j - y_c)^q (z_k - z_c)^r f(x_i, y_j, z_k)    (9.2)

Figure 9.2  In pattern recognition, data is acquired through sensors. To learn and recognize some patterns in the data, relevant features are extracted. If the patterns can be organized into groups, we need classification techniques to automatically group them [3, 8].

where x_c = m_{100}/m_{000}, y_c = m_{010}/m_{000}, z_c = m_{001}/m_{000}.

I_1 = \frac{1}{\mu_{000}^{7/3}}(\mu_{400} + \mu_{040} + \mu_{004} + 2\mu_{220} + 2\mu_{202} + 2\mu_{022})    (9.3)

I_2 = \frac{1}{\mu_{000}^{14/3}}(\mu_{400}\mu_{040} + \mu_{400}\mu_{004} + \mu_{004}\mu_{040} + 3\mu_{220}^2 + 3\mu_{202}^2 + 3\mu_{022}^2 - 4\mu_{103}\mu_{301} - 4\mu_{130}\mu_{310} - 4\mu_{013}\mu_{031} + 2\mu_{022}\mu_{202} + 2\mu_{022}\mu_{220} + 2\mu_{220}\mu_{202} + 2\mu_{022}\mu_{400} + 2\mu_{004}\mu_{220} + 2\mu_{040}\mu_{202} - 4\mu_{103}\mu_{121} - 4\mu_{130}\mu_{112} - 4\mu_{013}\mu_{211} - 4\mu_{121}\mu_{301} - 4\mu_{112}\mu_{310} - 4\mu_{211}\mu_{031} + 4\mu_{211}^2 + 4\mu_{112}^2 + 4\mu_{121}^2)    (9.4)

I_3 = \frac{1}{\mu_{000}^{14/3}}(\mu_{400}^2 + \mu_{040}^2 + \mu_{004}^2 + 4\mu_{130}^2 + 4\mu_{103}^2 + 4\mu_{013}^2 + 4\mu_{031}^2 + 4\mu_{310}^2 + 4\mu_{301}^2 + 6\mu_{220}^2 + 6\mu_{202}^2 + 6\mu_{022}^2 + 12\mu_{112}^2 + 12\mu_{121}^2 + 12\mu_{211}^2)    (9.5)

I_4 = \frac{1}{\mu_{000}^{4}}(\mu_{300}^2 + \mu_{030}^2 + \mu_{003}^2 + 3\mu_{120}^2 + 3\mu_{102}^2 + 3\mu_{012}^2 + 3\mu_{210}^2 + 3\mu_{021}^2 + 3\mu_{201}^2 + 6\mu_{111}^2)    (9.6)

I_5 = \frac{1}{\mu_{000}^{4}}(\mu_{300}^2 + \mu_{030}^2 + \mu_{003}^2 + \mu_{120}^2 + \mu_{012}^2 + \mu_{102}^2 + \mu_{210}^2 + \mu_{021}^2 + \mu_{201}^2 + 2\mu_{300}\mu_{120} + 2\mu_{300}\mu_{102} + 2\mu_{120}\mu_{102} + 2\mu_{003}\mu_{201} + 2\mu_{003}\mu_{021} + 2\mu_{021}\mu_{201} + 2\mu_{030}\mu_{012} + 2\mu_{030}\mu_{210} + 2\mu_{012}\mu_{210})    (9.7)

I_6 = \frac{1}{\mu_{000}^{4}}[\mu_{200}(\mu_{400} + \mu_{220} + \mu_{202}) + \mu_{020}(\mu_{220} + \mu_{040} + \mu_{022}) + \mu_{002}(\mu_{202} + \mu_{022} + \mu_{004}) + 2\mu_{110}(\mu_{310} + \mu_{130} + \mu_{112}) + 2\mu_{101}(\mu_{301} + \mu_{121} + \mu_{103}) + 2\mu_{011}(\mu_{211} + \mu_{031} + \mu_{013})]    (9.8)

9.3 Experiments

To test the discriminatory capability of the six 3D affine invariants in recognizing the 21 letters of the Mexican sign-language static alphabet, we followed a pattern recognition process (Figures 9.2, 9.3) consisting of three main steps:
1. Data acquisition and processing.
2. Feature extraction.
3. Classification.
These steps are described in greater detail below. The pattern recognition process was applied in three experiments using two different datasets.

Figure 9.3  For one dataset, acquisition is performed using the leap motion sensor [23]; for a second dataset, the MS Kinect sensor was used. Data pre-processing consists of data augmentation by applying 3D transformations. The features extracted from the 3D point clouds were 3D affine invariants for both datasets; for the leap motion dataset, 3D angles between joints were used as additional features. The processing was programmed in the Python language. Three classification methods were tested: discriminant analysis, support vector machine and naïve Bayes. Classification processing was done using the R language.
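As an illustration of this Python processing step, a minimal sketch of the central moments of equation (9.2) and the first invariant I_1 of equation (9.3) is given below, taking f(x, y, z) = 1 on every point of the cloud, as for the 22 leap motion landmarks; the random hand is a placeholder for a captured point cloud.

import numpy as np

def central_moment(points, p, q, r):
    # Equation (9.2) with f = 1: sum over centered coordinates.
    c = points.mean(axis=0)              # centroid (x_c, y_c, z_c)
    d = points - c
    return np.sum(d[:, 0] ** p * d[:, 1] ** q * d[:, 2] ** r)

def I1(points):
    mu000 = float(len(points))           # zeroth moment when f = 1
    return (central_moment(points, 4, 0, 0)
            + central_moment(points, 0, 4, 0)
            + central_moment(points, 0, 0, 4)
            + 2 * central_moment(points, 2, 2, 0)
            + 2 * central_moment(points, 2, 0, 2)
            + 2 * central_moment(points, 0, 2, 2)) / mu000 ** (7.0 / 3.0)

hand = np.random.rand(22, 3)             # stand-in for a captured hand cloud
print(I1(hand))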

Description of the Datasets:

• Dataset 1. This dataset was acquired for this work with 8 participants and using the leap motion sensor [37]. Each of the 21 static alphabet letters was captured once for each participant. To each captured letter, four additional transformations were applied. This gives 8 × 5 = 40 3D point clouds per alphabet letter; since there are a total of 21 letters, this gives 40 × 21 = 840 3D point clouds for the whole dataset. Each 3D point cloud is composed of 22 3D points, which represent the 3D coordinates of the intersections of the phalanges of a human hand in a configuration of the sign language. Please see Section 9.3.1 for the description of the participants; Section 9.3.2 complements the setting for data acquisition, and Section 9.3.3 describes the data augmentation procedure.

• Dataset 2. This dataset was acquired in other related work using the MS Kinect 1 sensor [33, 38]; the first author of those works shared this dataset with us. Each of the 21 static alphabet letters was captured once for each of the 9 participants in that study. To each captured letter, four additional transformations were applied. This gives 9 × 5 = 45 3D point clouds per alphabet letter; since there are a total of 21 letters, this gives 45 × 21 = 945 3D point clouds for the whole dataset. In the case of the MS Kinect, each 3D point cloud representing the surface of a hand is composed of thousands of 3D points, in contrast with the 22 3D landmark points generated by the leap motion sensor.

Description of the Experiments:

• Experiment 1. The six 3D affine invariants are used as features for classification with dataset 1. Metrics of performance are obtained.

• Experiment 2. The six 3D affine invariants and 19 angles between the hand phalanges are used as features for classification. Metrics of performance are obtained.

• Experiment 3. The six 3D affine invariants are used as features for classification with dataset 2. Metrics of performance are obtained.


9.3.1 Participants

For our study, and to acquire dataset 1, eight persons with ages between 25 and 53 participated. The gender and age of each participant can be seen in Table 9.2. In all cases, the hand used to perform the alphabet letters of the MSL was the right hand. All participants were informed in advance of the purpose of the research, all provided informed consent, and all participated voluntarily.

9.3.2 Data Acquisition

Figure 9.4 shows the setup for data acquisition using the leap motion sensor. The hand used to perform the sign language is in front of the sensor. The figure also shows the sensor connected to a laptop through a USB port and the graphical user interface on the monitor. The sensor grabs 3D data continuously; when the user is ready, we press a button to grab and save a data record, which contains a 3D point cloud with the 22 3D coordinates of the intersections of the hand phalanges.

Table 9.2  People data for the participants who acquired dataset 1 using the leap motion sensor.

Person   Age   Gender
1        36    Female
2        53    Male
3        26    Male
4        25    Male
5        38    Female
6        28    Male
7        27    Male
8        31    Male

Figure 9.4  Setup for data acquisition using the leap motion controller. The sensor was placed on top of a box facing the user to allow a more natural interface, as the user performs sign language in reality. This differs from applications in which this sensor is usually placed on top of a desk or table.

9.3.3 Data Augmentation

Data augmentation is a common practice in machine learning to increase the number and variability of a dataset [1]. In our case, from each 3D data cloud acquired through the sensor (leap motion or MS Kinect 1), four new synthetic 3D data clouds were generated through a random rigid transformation with parameters in the intervals shown in Table 9.3.

Table 9.3  Random geometric transformations applied to the 3D points representing a hand. For example, hand 5 represents the original data set acquired through the sensor, while hand 1 represents a new hand generated by applying a uniform scaling with a factor chosen randomly in the interval [2, 4.5].

Variation   Translation   Rotation               Uniform scale
Hand 1      —             —                      s ∈ [2, 4.5]
Hand 2      —             α ∈ [−7π/16, 7π/16]    —
Hand 3      t ∈ [0, 50]   —                      s ∈ [1, 2.5]
Hand 4      t ∈ [0, 50]   α ∈ [−π/4, π/4]        s ∈ [1, 2.5]
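A minimal sketch of this augmentation step is given below. The rotation is taken about the z axis for illustration, since the rotation axis is not specified here; the intervals follow the Hand 4 row of Table 9.3.

import numpy as np

def augment(points, angle_range=(-np.pi / 4, np.pi / 4),
            t_range=(0.0, 50.0), s_range=(1.0, 2.5)):
    # Random rotation about z, then uniform scaling and translation.
    a = np.random.uniform(*angle_range)
    R = np.array([[np.cos(a), -np.sin(a), 0.0],
                  [np.sin(a),  np.cos(a), 0.0],
                  [0.0,        0.0,       1.0]])
    t = np.random.uniform(*t_range, size=3)
    s = np.random.uniform(*s_range)
    return s * (points @ R.T) + t

hand = np.random.rand(22, 3)   # stand-in for one captured hand cloud
print(augment(hand).shape)     # (22, 3)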


9.3.4 Feature Extraction

The features shared by all the experiments and datasets were the six 3D affine invariants (see Section 9.2.1). For experiment 2, in addition, 19 angles between the phalanges were computed as follows. If u = (x_1, y_1, z_1) and v = (x_2, y_2, z_2) are two vectors, the angle between them is

\alpha = \arccos\left(\frac{u \cdot v}{|u|\,|v|}\right) = \arccos\left(\frac{x_1 x_2 + y_1 y_2 + z_1 z_2}{\sqrt{x_1^2 + y_1^2 + z_1^2}\,\sqrt{x_2^2 + y_2^2 + z_2^2}}\right)    (9.9)

9.3.5 Classification

For each of the three experiments, the corresponding features were tested for performance. In each experiment three classifiers were tested: linear discriminant analysis, support vector machine and naïve Bayes [5]. Each dataset was divided using a four-fold scheme, so four random partitions were generated in each case; in each random partition, ¾ of the data was used for training and ¼ for testing. If TP denotes true positives, TN true negatives, FP false positives, and FN false negatives, the following metrics were used to assess the results of the experiments [6, 12]:

\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}    (9.10)

\text{Sensibility} = \frac{TP}{TP + FN}    (9.11)

\text{Specificity} = \frac{TN}{TN + FP}    (9.12)

\text{Precision} = \frac{TP}{TP + FP}    (9.13)

\text{F1 Score} = 2\,\frac{\text{Precision} \times \text{Sensibility}}{\text{Precision} + \text{Sensibility}}    (9.14)

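A minimal sketch of equations (9.10)–(9.14) computed from raw confusion counts is given below; the example counts are placeholder values.

def metrics(tp, tn, fp, fn):
    # Per-class metrics from true/false positive and negative counts.
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensibility = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensibility / (precision + sensibility)
    return accuracy, sensibility, specificity, precision, f1

print(metrics(tp=18, tn=190, fp=2, fn=2))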

9.4 Results

9.4.1 Experiment 1

Figures 9.5–9.10 show the results of experiment 1, using the six 3D affine invariants with dataset 1, for linear discriminant analysis, support vector machine and naïve Bayes as classifiers.

Figure 9.5  Metric values for each of the 21 letters of the static alphabet of the Mexican sign language for experiment 1 with dataset 1. Linear discriminant analysis is used as classifier.

Figure 9.6  Mean value of the metrics over the 21 letters of the static alphabet for experiment 1 with dataset 1, using linear discriminant analysis as classifier.

Figure 9.7  Metric values for each of the 21 letters of the static alphabet of the Mexican sign language for experiment 1 with dataset 1. Support vector machine is used as classifier.

Figure 9.8  Mean value of the metrics over the 21 letters of the static alphabet for experiment 1 with dataset 1, using support vector machine as classifier.

Figure 9.9  Metric values for each of the 21 letters of the static alphabet of the Mexican sign language for experiment 1 with dataset 1. Naïve Bayes is used as classifier.

Figure 9.10  Mean value of the metrics over the 21 letters of the static alphabet for experiment 1 with dataset 1, using naïve Bayes as classifier.

9.4.2 Experiment 2

Figures 9.11–9.16 show the results of experiment 2, using the six 3D affine invariants and 19 angles with dataset 1, for linear discriminant analysis, support vector machine and naïve Bayes as classifiers.

9.4.3 Experiment 3

Figures 9.17–9.22 show the results of experiment 3, using the six 3D affine invariants with dataset 2, for linear discriminant analysis, support vector machine and naïve Bayes as classifiers.

Figure 9.11  Metric values for each of the 21 letters of the static alphabet of the Mexican sign language for experiment 2 with dataset 1. Linear discriminant analysis is used as classifier.

Figure 9.12  Mean value of the metrics over the 21 letters of the static alphabet for experiment 2 with dataset 1, using linear discriminant analysis as classifier.

Figure 9.13  Metric values for each of the 21 letters of the static alphabet of the Mexican sign language for experiment 2 with dataset 1. Support vector machine is used as classifier.

Figure 9.14  Mean value of the metrics over the 21 letters of the static alphabet for experiment 2 with dataset 1, using support vector machine as classifier.

Figure 9.15  Metric values for each of the 21 letters of the static alphabet of the Mexican sign language for experiment 2 with dataset 1. Naïve Bayes is used as classifier.

Figure 9.16  Mean value of the metrics over the 21 letters of the static alphabet for experiment 2 with dataset 1, using naïve Bayes as classifier.

Figure 9.17  Metric values for each of the 21 letters of the static alphabet of the Mexican sign language for experiment 3 with dataset 2. Linear discriminant analysis is used as classifier.

Figure 9.18  Mean value of the metrics over the 21 letters of the static alphabet for experiment 3 with dataset 2, using linear discriminant analysis as classifier.

Figure 9.19  Metric values for each of the 21 letters of the static alphabet of the Mexican sign language for experiment 3 with dataset 2. Support vector machine is used as classifier.

Figure 9.20  Mean value of the metrics over the 21 letters of the static alphabet for experiment 3 with dataset 2, using support vector machine as classifier.

Figure 9.21  Metric values for each of the 21 letters of the static alphabet of the Mexican sign language for experiment 3 with dataset 2. Naïve Bayes is used as classifier.

Figure 9.22  Mean value of the metrics over the 21 letters of the static alphabet for experiment 3 with dataset 2, using naïve Bayes as classifier.

9.5 Discussion According to the results of the metrics we will discuss relevant points for each experiment. Experiment 1. This experiment used six 3D affine moment invariants with dataset 1. Linear discriminant analysis classifier got metrics between 0.94 and 0.97. The worst value was for letter S with 0.73 of precision. Support vector machine got metrics between 0.95 and 0.97. The worst value was for letter S and V with 0.8 of sensitivity. Naïve Bayes got values between 0.91 and 0.96. The worst value was for letter S with 0.67 of precision. Overall,

for this experiment the best results were obtained with the support vector machine classifier.

Experiment 2. This experiment used the six affine moment invariants plus the 19 angles between the phalanges with dataset 1. The linear discriminant analysis classifier achieved metrics between 0.90 and 0.95; its worst value was a precision of 0.75 for the letter S. The support vector machine achieved metrics between 0.87 and 0.93; its worst value was a precision of 0.64 for the letter S. Naïve Bayes achieved values between 0.90 and 0.95; its worst value was a sensitivity of 0.70 for the letter P. Overall, for this experiment the best results were obtained with the linear discriminant classifier.

Experiment 3. This experiment used the six 3D affine moment invariants with dataset 2. The linear discriminant analysis classifier achieved all metric values very close to 0.96; its worst value was a sensitivity of 0.86 for the letters S and H. The support vector machine achieved metrics between 0.94 and 0.96; its worst value was a sensitivity of 0.80 for the letter V. Naïve Bayes achieved values between 0.93 and 0.95; its worst value was a precision of 0.76 for the letter S. Overall, for this experiment the best results were obtained with the linear discriminant classifier.

Comparing the results of experiments 1 and 2 on dataset 1, the six 3D affine moment invariants performed better alone than with the 19 added angles; in this case the support vector machine gave the better results. Comparing experiments 1 and 3, which used different datasets, the six 3D affine moment invariants obtained the best results with the dataset acquired with the MS Kinect 1 sensor together with linear discriminant analysis. This is possibly because dataset 2 contained many more 3D points to represent each hand.
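The per-letter values discussed above, and the means plotted in Figures 9.14 to 9.22, all follow from each classifier's confusion matrix. As a minimal illustration (our own NumPy sketch, not code from this chapter), the five metrics can be computed per class by treating each of the 21 letters one-vs-rest:

import numpy as np

def per_class_metrics(conf):
    # conf: 21 x 21 confusion matrix (rows: true letter, columns: predicted)
    conf = np.asarray(conf, dtype=float)
    total = conf.sum()
    tp = np.diag(conf)              # correctly recognized samples per letter
    fp = conf.sum(axis=0) - tp      # other letters predicted as this letter
    fn = conf.sum(axis=1) - tp      # this letter predicted as something else
    tn = total - tp - fp - fn       # everything not involving this letter
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)    # "sensibility" in the figures
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, sensitivity, specificity, precision, f1

# The mean of each returned vector gives the per-classifier summary values.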

9.6 Conclusion

According to the experiments, the results show that the six 3D affine moment invariants achieve a high precision on the 21 letters of the static alphabet of the Mexican sign language: 94% using Leap Motion data and 95.6% using MS Kinect 1 data. In addition, the data acquisition did not require a controlled environment. Future work will investigate the use of more 3D affine moment invariants to test whether that improves the metrics.


Acknowledgments

The first author acknowledges the support of the National Council of Science and Technology of Mexico (CONACYT) through a scholarship during her graduate studies. The authors thank Dr. Candy Obdulia Sosa-Jimenez for providing Dataset 2 for additional experiments [33, 38].

References

1. Ben-Hur, A. et al., Support vector clustering. J. Mach. Learn. Res., 2, 125–137, 2001.
2. Ben Jmaa, A. et al., A new approach for hand gestures recognition based on depth map captured by rgb-d camera. Comput. y Sist., 20, 4, 709–721, 2016.
3. Bishop, C.M., Pattern recognition and machine learning, New York, Springer, 2006.
4. Canavan, S. et al., Hand gesture recognition using a skeleton-based feature representation with a random regression forest. Image Processing (ICIP), 2017 IEEE International Conference on, IEEE, pp. 2364–2368, 2017.
5. Cortes, C. and Vapnik, V., Support-vector networks. Mach. Learn., 20, 3, 273–297, 1995.
6. Dodge, Y., The Oxford dictionary of statistical terms, Oxford, Oxford University Press, 2006.
7. Dong, C., Leu, M.C., Yin, Z., American sign language alphabet recognition using Microsoft Kinect. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 44–52, 2015.
8. Duda, R.O., Hart, P.E., Stork, D.G., Pattern classification, New York, John Wiley & Sons, 2012.
9. Fisher, R.A., The use of multiple measurements in taxonomic problems. Ann. Eugen., 7, 2, 179–188, 1936.
10. Flusser, J. and Suk, T., Pattern recognition by affine moment invariants. Pattern Recognit., 26, 1, 167–174, 1993.
11. Flusser, J., Zitova, B., Suk, T., Moments and moment invariants in pattern recognition, New York, John Wiley & Sons, 2009.
12. Friedman, M., The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J. Am. Stat. Assoc., 32, 200, 675–701, 1937.
13. Galicia, R. et al., Mexican sign language recognition using movement sensor. Industrial Electronics (ISIE), 2015 IEEE 24th International Symposium, pp. 573–578, 2015.
14. García-Bautista, G. et al., Mexican sign language recognition using Kinect and data time warping algorithm. Electronics, Communications and Computers (CONIELECOMP), International Conference on, IEEE, pp. 1–5, 2017.
15. Gibson, J.J., The perception of the visual world, Boston, Houghton Mifflin, 1950.
16. Jimenez, J. et al., Mexican Sign Language Alphanumerical Gestures Recognition using 3D Haar-like Features. IEEE Lat. Am. Trans., 15, 10, 2000–2005, 2017.
17. Joshi, A., Sierra, H., Arzuaga, E., American sign language translation using edge detection and cross correlation. Communications and Computing (COLCOM), 2017 IEEE Colombian Conference on, IEEE, pp. 1–6, 2017.
18. Lang, S., Block, M., Rojas, R., Sign language recognition using Kinect. International Conference on Artificial Intelligence and Soft Computing, Springer, pp. 394–402, 2012.
19. Luis-Pérez, F.E., Trujillo-Romero, F., Martínez-Velazco, W., Control of a service robot using the Mexican sign language. Mexican International Conference on Artificial Intelligence, Springer, pp. 419–430, 2011.
20. Maron, M.E., Automatic indexing: an experimental inquiry. J. ACM, 8, 3, 404–417, 1961.
21. Martínez, A.M. et al., Purdue RVL-SLLL ASL database for automatic recognition of American Sign Language. Multimodal Interfaces, Proceedings, Fourth IEEE International Conference on, IEEE, pp. 167–172, 2002.
22. McLachlan, G., Discriminant analysis and statistical pattern recognition, New York, John Wiley & Sons, 2004.
23. Leap Motion, Leap Motion, Inc., San Francisco, CA, https://www.leapmotion.com/, 2010.
24. Romero-Nájera, L.O. et al., Recognition of Mexican Sign Language through the Leap Motion Controller. Proceedings of the International Conference on Scientific Computing (CSC), The Steering Committee of The World Congress in Computer Science, Computer Engineering and Applied Computing (WorldComp), p. 147, 2016.
25. Powers, D.M.W., Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation, Technical Report SIE-07-001, School of Informatics and Engineering, Flinders University, Adelaide, Australia, 2007.
26. Priego-Perez, F., Recognition of images of Mexican sign language, Master Thesis, Instituto Politécnico Nacional, Mexico, 2012.
27. Rashed, J.R. and Hasan, H.A., New method for hand gesture recognition using wavelet neural network. J. Eng. Sustain. Dev., 21, 1, 65–73, 2017.
28. Rennie, J. et al., Tackling the poor assumptions of Naive Bayes classifiers. ICML, 2003.
29. Sadjadi, F.A. and Hall, E.L., Three-dimensional moment invariants. IEEE Trans. Pattern Anal. Mach. Intell., 2, 2, 127–136, March 1980.
30. Saha, H.N. et al., A Machine Learning Based Approach for Hand Gesture recognition using Distinctive Feature Extraction. Computing and Communication Workshop and Conference (CCWC), 2018 IEEE 8th Annual, IEEE, pp. 91–98, 2018.
31. Solís, F., Martínez, D., Espinoza, O., Automatic Mexican sign language recognition using normalized moments and artificial neural networks. Engineering, 8, 10, 733–740, 2016.
32. Solís-V., J.F. et al., Mexican sign language recognition using normalized moments and artificial neural networks. Optics and Photonics for Information Processing VIII, vol. 9216, International Society for Optics and Photonics, 92161A, 2014.
33. Sosa-Jiménez, C.O. et al., Real-time Mexican Sign Language recognition. 2017 IEEE International Autumn Meeting on Power, Electronics and Computing (ROPEC), IEEE, pp. 1–6, 2017.
34. Wu, H., Wang, J., Zhang, X., Combining hidden Markov model and fuzzy neural network for continuous recognition of complex dynamic gestures. Vis. Comput., 33, 10, 1265–1278, 2017.
35. Xu, D. and Li, H., 3-D affine moment invariants generated by geometric primitives. Pattern Recognition, ICPR 2006, 18th International Conference on, IEEE, vol. 2, pp. 544–547, 2006.
36. Xu, D. and Li, H., Geometric moment invariants. Pattern Recognit., 41, 1, 240–249, 2008.
37. Carmona-Arroyo, G., Affine invariants in third dimension for the recognition of the static alphabet of the Mexican sign language, Master Thesis in Artificial Intelligence, University of Veracruz, Mexico, 2019.
38. Sosa-Jimenez, C.O., Mexican sign language recognition in a general medical consultation, Ph.D. Thesis in Artificial Intelligence, University of Veracruz, Mexico, 2019.

10

Performance of Stepped Bar Plate-Coated Nanolayer of a Box Solar Cooker Control Based on Adaptive Tree Traversal Energy and OSELM

S. Shanmugan1*, F.A. Essa2, J. Nagaraj3 and Shilpa Itnal3

1Research Centre for Solar Energy, Department of Physics, Koneru Lakshmaiah Education Foundation, Green Fields, Guntur District, Vaddeswaram, India
2Mechanical Engineering Department, Faculty of Engineering, Kafrelsheikh University, Kafrelsheikh, Egypt
3Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Green Fields, Guntur District, Vaddeswaram, India

Abstract

Traditional methods based on human intervention are increasingly unreliable for thermal applications and cannot adapt to a variable energy source. At the same time, adaptive neural network-based control approaches suffer from problems such as local minima, slow convergence, enormous training times, and poor performance, which a solar cooker application must overcome. This chapter discusses a novel solar cooker with adaptive control based on an Online Sequential Extreme Learning Machine (OSELM). Unlike an off-line training process, which requires prior human experience in a distinct, dedicated phase, the OSELM learns quickly from randomly generated hidden-layer parameters over all parameters of the bar plate nanolayers, and the uncertainty of the design is handled by the online machine learning method. Closed-loop stability of the new approach can be established, and the feasibility of the scheme has been validated by assessment over extensive cases. With the furious SiO2/TiO2 nanoparticles, the efficiency of the Stepped solar bar plate cooker (SSBC) increased by 37.69% and 49.21% using 05% and 10%, respectively; this is higher than that of the SSBC analyzed with SiO2, TiO2, or without nanoparticles.

*Corresponding author: [email protected]


Keywords: Solar cooker, nanomaterials, OSELM, stepped bar plate, adaptive tree traversal

10.1 Introduction

In a typical solar cooker, heat is trapped inside an enclosure where the internal air temperature may reach nearly 200 °C, which is sufficient to cook or bake foodstuffs. Practical applications of online industrial process monitoring include financial data analysis, element prediction, and customer behavior estimation. Perez et al. [1] studied an online learning algorithm for adaptable topologies and implemented a neural network organization in which samples arrive continuously as a data stream. Qiao et al. [2] developed an online self-adaptive modular neural network and time-series control methods, for cases in which the underlying distribution of the data changes over time. Lughofer [3] studied an online active learning algorithm that efficiently selects the more appropriate samples to obtain novel information for evolving designs in developing data streams, as did Zhang et al. [4] and Zhou et al. [5].

In essence, solar cookers can be either of the direct type, in which sunlight reaches the cooking pot to transfer thermal energy directly, or of the indirect type, in which thermal power is provided by means of a solar collector and supplied to the cooking unit indirectly [6]. Direct types can be classified into solar panel cookers [7], box-type cookers [8], and concentrating cookers [9]. Numerous box cooker designs have been studied and developed to improve efficiency and broaden applicability. Schwarzer et al. [10] showed the fundamental characteristics, design principles, and testing standards for a simple solar cooker; they reported that several criteria concerning safety, portability, stability, endurance, robustness, and user-friendliness are important to consider for this technology. In another study, Lokeswaran et al. [11] presented a solar cooker coupled with a parabolic dish and a porous medium made from scrap material; several tests indicated that the implementation of a porous medium increases the operating temperature, water temperature, and optical efficiency compared with a cooker with a plain receiver. Atul et al. [12] improved solar cooker performance with an obtainable HCP, discussed the middle temperature as a system measure, and reported that a comparison of TPP and COR allows performance assessment of solar box cookers. Cuce [13] studied a cylindrical solar cooker with microporous absorber plates, reporting a predicted

absorber temperature of 110 °C, which was 134.1 °C for the trilateral absorber; the corresponding energy and exergy efficiencies of the systems were 34.6–21.2% and 22.6–14.6% for the trapezoidal porosity. Cuce [14] developed solar box cookers with thermal energy storage implemented using Bayburt stone; this provided stability and efficiency in constant cooking in a range of about 35.3–21.7%, against 27.6–16.9% for the conventional cooker, with overall efficiencies of 21.2–14.1% and 18.0–11.6% compared with a standard one. Guidara et al. [15] produced a solar box cooker with outer reflectors, investigated through both numerical and experimental studies; the thermal performance was increased by the high absorber plate temperature, which allowed the cooking materials to roast in fewer hours in the stagnation assessment, and the analysis of an accurate prototype gave correct outcomes compared with both theory and experiment. Negi and Purohit [16] performed an experimental analysis of a cooker employing a non-tracking concentrator; they concluded that the concentrator solar cooker runs at a stagnation temperature about 15–22 °C higher than the conventional cooker with a booster mirror, that the boiling point of water is reached more quickly at 50–55 min, and that the non-tracking reflectors increase the heat collection of the cooker. Palanikumar et al. [17] developed a solar box cooker and studied thermal images analyzed as a time series with the Fourier transform; the solar cooker characteristics were used for image reconstruction of deficient regions in an algorithmic program that forms a thermal image fusion. Isabel et al. [18] prepared TiO2 photocatalytic materials coated by gel-dipping with polysaccharides; the performance parameters for the photocatalytic activity of the coated gutters were described and discussed, and all coating properties were analyzed for the photocatalytic action of the coatings. Liu et al. [19, 20] coated magnetic nanoparticles with TiO2/WO3 absorbing visible (solar) light; the catalyst quickly mineralizes sixteen and ten dyes under focused sunlight and shows good reusability, being recyclable with a magnet. Abd-Elhady et al. [21] studied the thermal performance of a solar cooker with evacuated tubes, influenced by adding nanographene particles; a considerable quantity of wires introduced inside an evacuated tube increased the natural output of such cookers. This survey of the literature shows the limits of solar thermal applications, among them solar cookers, for which the present work first introduces the use of the furious SiO2/TiO2 nanoparticle. The current studies aim

to enhance the thermal performance of the analyzed system. The SSBC is improved by coating its bar plate, doped with the SiO2/TiO2 nanoparticle at different ratios of about 05 to 10%. The novel solar cooker discussed here is based on adaptive control through an OSELM (Online Sequential Extreme Learning Machine).

10.2 Experimental Materials and Methodology

10.2.1 Furious SiO2/TiO2 Nanoparticle Analysis of SSBC Performance Methods

The experimental method and schematic diagram of the new SSBC are shown in Figures 10.1 and 10.2, following Nahar [22]. The solar cooker has a utilization area of 100 cm × 100 cm; the absorber section measures 100 cm × 100 cm, with a front wall height of 25 cm and a back wall height of 30 cm. A total of 16 stepped absorber bar plates are fixed on the inner side of the design, 8 on the left side and 8 on the right side. The absorber bar plate is made of a copper sheet, and the inner stepped plate is made of a mild steel

Figure 10.1  Experimental process views of a SSBC.


Figure 10.2  Schematic diagrams for SSBC.

sheet, following the experimental steps of the analysis. The transmitting glass cover is 4 mm thick. The SSBC was fabricated with the same dimensions as in the different design studies following Verdugo [23], using nanoparticles of furious SiO2, of TiO2, of the furious SiO2/TiO2 mixture, and no nanoparticles (conventional solar cooker), so that the effect of the different nanoparticles on solar cooking performance could be studied and compared. The experimental work ran from 10.00 am to 2.00 pm, with variations in parameters such as the temperatures of the stepped plate, bar plate, cooker, foodstuff, moist internal air, and glass cover measured at 30 min intervals. The experimental analysis of the solar cookers was carried out at KLEF, Vijayawada, Andhra Pradesh, and the cooking duration for materials such as milk and water was reduced to a cook time of 45 min for a mass of 1 kg. The solar radiation was measured with the TENMARS TM-206 solar power meter tester, interfaced to a laptop to identify its effect on the solar box cookers. The standard solar power meter measures all the mechanisms of the different box cookers; temperatures are read on a 6-channel temperature indicator using RTD PT-100 type sensors with thermocouple wire, with a range of about 0–800 °C and an accuracy of ±0.1 °C for each data channel in the systems.


10.2.2 Introduction to OSELM for Use in the Solar Cooker

The general approach of the extreme learning machine (ELM) was proposed by Liang et al. [24−26] and has been applied in many fields. For the solar cooker, the weights and biases of the hidden layer are initialized randomly, and a least-squares process determines the weights of the output layer from the cooking data (rice, water, milk, vegetables, etc.). Training is therefore much faster than with traditional neural networks, and the method avoids convergence into local minima [27]. The training set consists of S distinct samples (X_i, T_i) ∈ R^n × R^m, where X_i is the n × 1 input vector and T_i the m × 1 target vector. The activation function is g(·), and the hidden layer has L nodes, which may use additive activation, Radial Basis Function (RBF) activation, or both. A single-hidden-layer feedforward neural network (SLFN) can approximate the S training samples of the solar cooker with zero error provided there exist ω_i, m_i, and θ_i such that

E = \sum_{i=1}^{L} \theta_i \, g(\omega_i, m_i, X_i)    (10.1)

where ω_i and m_i are the weights and biases of the hidden layer and θ is the output weight vector for the cooking materials. In matrix form this is written concisely as

T = \Phi \theta    (10.2)

so that θ is expressed in terms of T as

\theta = \Phi^{*} T    (10.3)

where Φ* is the Moore–Penrose pseudo-inverse of Φ:

\Phi^{*} = (\Phi^{\top}\Phi)^{-1}\Phi^{\top}    (10.4)

\Phi^{*} T = (\Phi^{\top}\Phi)^{-1}\Phi^{\top} T    (10.5)

When rank(Φ) = L, the output weight vector θ of Equation (10.3) for the cooking result can therefore be written as

\theta = (\Phi^{\top}\Phi)^{-1}\Phi^{\top} T    (10.6)
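As a concrete illustration of Equations (10.1)–(10.6), the following minimal sketch (our own, in NumPy, with an assumed RBF hidden layer; it is not the authors' program) trains a batch ELM by drawing random hidden-node parameters and solving the least-squares problem for the output weights θ:

import numpy as np

rng = np.random.default_rng(0)

def hidden_matrix(X, centers, widths):
    # RBF hidden-layer outputs Phi: one column per hidden node (Eq. 10.1)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * widths ** 2))

def elm_train(X, T, L=20):
    # Random hidden parameters (m_i, omega_i), fixed after initialization
    centers = rng.standard_normal((L, X.shape[1]))
    widths = rng.uniform(0.5, 2.0, size=L)
    Phi = hidden_matrix(X, centers, widths)
    # theta = (Phi^T Phi)^-1 Phi^T T (Eq. 10.6), solved stably as least squares
    theta, *_ = np.linalg.lstsq(Phi, T, rcond=None)
    return centers, widths, theta

def elm_predict(X, centers, widths, theta):
    return hidden_matrix(X, centers, widths) @ theta

Because only θ is learned, training reduces to a single linear solve, which is the source of the speed advantage over iteratively trained networks noted above.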

10.2.3 Online Sequential Extreme Learning Machine (OSELM) Approach for the Solar Cooker

In a more realistic setting, the solar cooker heat-transfer values are generated one-by-one or chunk-by-chunk, so OSELM combines ELM with a recursive least-squares process that updates the output weight vector θ sequentially. OSELM has two phases: (i) initialization and (ii) sequential learning. The cooking-pot data of the design are S = {(X_i, T_i) | X_i ∈ R^n, T_i ∈ R^m, i = 1, …}, with L hidden nodes using RBF activation functions g(·).

(i) Initialization. An initial block of cooker data S_0 = {(X_i, T_i)}_{i=1}^{S_0}, with S_0 ≥ L, gives the initial output weight vector:

\theta^{(0)} = B_0 \Phi_0^{\top} T_0, \quad B_0 = (\Phi_0^{\top}\Phi_0)^{-1}, \quad T_0 = [t_1, \ldots, t_{S_0}]^{\top}    (10.7)

(ii) Sequential learning. The cooker heat values are generated one-by-one or in chunks S_{k+1} = {(X_i, T_i)}_{i=S_k+1}^{S_k+S_{k+1}}, for which the hidden-layer output matrix Φ_{k+1} is computed. The output weights for the boiling materials are then updated as

B_{k+1} = B_k - B_k \Phi_{k+1}^{\top} (I + \Phi_{k+1} B_k \Phi_{k+1}^{\top})^{-1} \Phi_{k+1} B_k    (10.8)

\theta^{(k+1)} = \theta^{(k)} + B_{k+1} \Phi_{k+1}^{\top} (T_{k+1} - \Phi_{k+1} \theta^{(k)})    (10.9)

Finally, setting k = k + 1, the next chunk of cooking values is processed, and sequential learning continues until the solar cooker data stream concludes.
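A minimal sketch of the sequential phase in Equations (10.7)–(10.9) is given below (again our own NumPy illustration; the class and variable names are ours, with B holding the recursive least-squares term):

import numpy as np

class OSELM:
    """Online sequential ELM: initialize on a first chunk, then update per chunk."""
    def __init__(self, Phi0, T0):
        # Initialization phase (Eq. 10.7): Phi0 must have at least L rows
        self.B = np.linalg.inv(Phi0.T @ Phi0)
        self.theta = self.B @ Phi0.T @ T0

    def update(self, Phi, T):
        # Eq. (10.8): recursive least-squares update of B
        I = np.eye(Phi.shape[0])
        self.B = self.B - self.B @ Phi.T @ np.linalg.inv(
            I + Phi @ self.B @ Phi.T) @ Phi @ self.B
        # Eq. (10.9): correct theta with the new chunk's residual
        self.theta = self.theta + self.B @ Phi.T @ (T - Phi @ self.theta)

    def predict(self, Phi):
        return Phi @ self.theta

Each chunk only requires inverting a matrix whose size equals the chunk length, so the model tracks the cooker data stream without ever retraining from scratch.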

10.2.4 OSELM Neural Network Adaptive Controller on the Novel Design

The controlled system, with the DO concentration obtained from the heat-transfer modes and the manipulated variable U_k, is written as

Y_{k+1} = f(X_k) + D(X_k) U_k + C_k    (10.10)

where C_k is a perturbation term. Approximating the unknown dynamics of Equation (10.10) with two neural networks formed by OSELM gives

Y_{k+1} = f[X_k, w^{*}] + D[X_k, \upsilon^{*}] U_k + \Delta f

where Δf is the sum of the modeling errors for the DO-focused temperature values of the cooking pots, and each network is written as

f[X_k, w^{*}] = \sum_{i=1}^{L} w_i^{*} \, g(\alpha_i, m_i, X_k), \qquad D[X_k, \upsilon^{*}] = \sum_{i=L+1}^{2L} \upsilon_i^{*} \, g(\alpha_i, m_i, X_k)    (10.11)

Following the OSELM principle, the network parameters for the solar cooker are updated online in a sequential manner:

B_k = B_{k-1} - \frac{B_{k-1} \Phi_k^{\top} \Phi_k B_{k-1}}{1 + \Phi_k B_{k-1} \Phi_k^{\top}}, \qquad \theta_{k+1} = \theta_k + B_k \Phi_k^{\top} E_{k+1}^{*}    (10.12)

The solar cooker controller for the DO concentration can then be developed as

U_k = \frac{-f[X_k, w^{*}] + Y_{k+1}^{*}}{D[X_k, \upsilon^{*}]}    (10.13)

The control flow of the overall approach derived from Equation (10.13) is shown in Figure 10.3.

10.2.5 Binary Search Tree Analysis of the Solar Cooker

In a binary search tree, the elements are arranged so that, for any node, the elements on its left side are smaller than that node and the elements on its right side are greater than that node. This means that every element in the left subtree is smaller than the key element, and every element in the right subtree is greater than the key element, of the stepped solar


Figure 10.3  Flow chart of the solar cooker control based on the adaptive OSELM approach.
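The control step of Equation (10.13) is then a single expression. The sketch below is our own illustration; fhat and Dhat stand for the outputs of the two OSELM-approximated terms f[X_k, w*] and D[X_k, υ*], and the eps guard is our addition:

def control_input(fhat, Dhat, y_setpoint, eps=1e-6):
    # Eq. (10.13): U_k = (Y*_{k+1} - f[X_k, w*]) / D[X_k, v*]
    # eps guards against division by a near-zero estimated gain
    return (y_setpoint - fhat) / (Dhat if abs(Dhat) > eps else eps)

# Each control cycle: read the state X_k, evaluate both networks, apply U_k,
# observe Y_{k+1}, and feed the new sample into the OSELM update (Eq. 10.12).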

cooker. Searching for a key element is therefore straightforward, and the time taken to search for an element in the tree depends on the height of the tree:

1. The minimum height of a binary search tree is log N.
2. The maximum height of a binary search tree is N.

Rotations can be performed to convert a binary search tree of larger height into one of smaller height.

Algorithm:
1. Start from the root.
2. If the element is greater than the root element, move toward the right subtree.
3. If the element is less than the root element, move toward the left subtree.

Binary search tree insertion and deletion are illustrated next with the measured temperature values. The first element is taken as the root; an element greater than the root is added to the right subtree, and an element less than the root is added to the left subtree.

INSERTION: Tbarplate = 105 °C; Tcook = 98 °C; Tsidewall = 60 °C; Tpcm = 99 °C; Tglasscover = 42 °C; Tfood = 96 °C; Tloss = 10 °C.

Step 1: Insert Tcook = 98 °C as the root.

Step 2: Insert Tsidewall = 60 °C; it is less than the root element, so it is placed on the left of Tcook.

Step 3: Insert Tpcm = 99 °C; it is greater than the root element, so it is placed on the right of Tcook.

Step 4: Insert Tglasscover = 42 °C on the left side of the root; Tsidewall = 60 °C now acts as the root for it and, being smaller, it is placed on the left of Tsidewall.

Step 5: Insert Tfood = 96 °C on the left of Tcook (the root), as it is less than the root; since Tfood is greater than Tsidewall = 60 °C, it is placed on the right of Tsidewall.

Step 6: Insert Tloss = 10 °C on the left side of the root, as it is less than the root; Tsidewall = 60 °C becomes its root and, being smaller, it moves left; Tglasscover = 42 °C then becomes its root and, still smaller, it is placed on the left of Tglasscover.

Step 7: Insert Tbarplate = 105 °C, which is greater than the root, so it goes to the right side of the root; Tpcm = 99 °C becomes its root and, being greater, it is placed on the right of Tpcm.

The completed tree of the design is:

                     Tcook (98 °C)
                    /             \
         Tsidewall (60 °C)      Tpcm (99 °C)
          /             \                \
 Tglasscover (42 °C)  Tfood (96 °C)   Tbarplate (105 °C)
          /
   Tloss (10 °C)

10.2.6 Tree Traversal of the Solar Cooker

In order traverse (Left, Root, Right): 10 °C, 42 °C, 60 °C, 96 °C, 98 °C, 99 °C, 105 °C — Tloss, Tglasscover, Tsidewall, Tfood, Tcook, Tpcm, Tbarplate.

Pre order traverse (Root, Left, Right): 98 °C, 60 °C, 42 °C, 10 °C, 96 °C, 99 °C, 105 °C — Tcook, Tsidewall, Tglasscover, Tloss, Tfood, Tpcm, Tbarplate.

Post order traverse (Left, Right, Root): 10 °C, 42 °C, 96 °C, 60 °C, 105 °C, 99 °C, 98 °C — Tloss, Tglasscover, Tfood, Tsidewall, Tbarplate, Tpcm, Tcook.

DELETION
1. If the element to be deleted is a leaf, it can be deleted without any change to the binary tree.
2. If the element to be deleted has one child, the child is placed in the position of the element and the element is deleted.
3. If the element to be deleted has two children, the in-order traversal of the element is followed and the element is then deleted.

Step 1: Delete Tloss = 10 °C (a node with no children); it is removed directly from the extreme left of the tree.

Step 2: Delete Tpcm = 99 °C (a node with a single child); its child Tbarplate = 105 °C takes its place on the right side of the root.

Step 3: Delete Tsidewall = 60 °C (a node with two children); following the in-order traversal, Tglasscover = 42 °C takes the place of Tsidewall. (A runnable sketch of these insertions and traversals is given in Section 10.2.8.)

10.2.7 Simulation Model of Solar Cooker Results

Online learning algorithms are compared on the solar cooker heat-transfer data to verify all parameters and the inherent characteristics of the OSELM.

Table 10.1  Comparison of online learning algorithms applied to the solar cooker.

S. No.   Algorithm      OSELM     OGDLR     SVR       GOGP
1        R2             0.8541    0.9425    0.5421    0.0052
2        RMSE           0.0342    0.0754    0.0461    0.0741
3        Cooking Time   0.0246    4.1246    >6301     0.0847

The comparison methods are online SVR, GoGP, and OGD-LR. GoGP does not handle large-scale data well but adapts precisely to streaming data. These online learning methods were compared on the solar cooker parameter values; the cooking-pot and boiling-material data were produced by the BSM1 model and controlled through a default PI controller. The total data set consists of 1,450 groups of solar cooker data and 1,000 groups of cooking-material data. As Table 10.1 shows, the training time of OSELM is the fastest among these methods, especially compared with online SVR, and OSELM also has the lowest root mean square error (RMSE). In addition, the R2 of the online GoGP process is higher than that of SVR. For the solar cooker, it is concluded that OSELM gives a good performance among machine learning methods and their online learning applications.
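For reference, the two accuracy measures in Table 10.1 have the standard definitions below (shown in NumPy; this is a generic sketch, not code from the chapter):

import numpy as np

def rmse(y_true, y_pred):
    # Root mean square error between measured and predicted values
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def r2(y_true, y_pred):
    # Coefficient of determination: 1 - SS_res / SS_tot
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot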

10.2.8 Program
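The original program listing did not survive extraction from this page. The following minimal reconstruction (our own Python sketch, consistent with the insertion order, traversals, and temperature values of Sections 10.2.5 and 10.2.6) builds the binary search tree of cooker temperatures and prints the three traversals:

class Node:
    def __init__(self, label, temp):
        self.label, self.temp = label, temp
        self.left = self.right = None

def insert(root, label, temp):
    # Standard BST insertion: smaller keys go left, larger keys go right
    if root is None:
        return Node(label, temp)
    if temp < root.temp:
        root.left = insert(root.left, label, temp)
    else:
        root.right = insert(root.right, label, temp)
    return root

def inorder(n):   # Left, Root, Right
    return inorder(n.left) + [n] + inorder(n.right) if n else []

def preorder(n):  # Root, Left, Right
    return [n] + preorder(n.left) + preorder(n.right) if n else []

def postorder(n): # Left, Right, Root
    return postorder(n.left) + postorder(n.right) + [n] if n else []

# Insertion order from Section 10.2.5 (Steps 1-7)
readings = [("Tcook", 98), ("Tsidewall", 60), ("Tpcm", 99),
            ("Tglasscover", 42), ("Tfood", 96), ("Tloss", 10),
            ("Tbarplate", 105)]

root = None
for label, temp in readings:
    root = insert(root, label, temp)

for name, order in (("In order", inorder(root)),
                    ("Pre order", preorder(root)),
                    ("Post order", postorder(root))):
    print(name + ":", ", ".join(f"{n.label}={n.temp} °C" for n in order))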


Output:
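The reconstruction above prints traversals matching the orders listed in Section 10.2.6:

In order: Tloss=10 °C, Tglasscover=42 °C, Tsidewall=60 °C, Tfood=96 °C, Tcook=98 °C, Tpcm=99 °C, Tbarplate=105 °C
Pre order: Tcook=98 °C, Tsidewall=60 °C, Tglasscover=42 °C, Tloss=10 °C, Tfood=96 °C, Tpcm=99 °C, Tbarplate=105 °C
Post order: Tloss=10 °C, Tglasscover=42 °C, Tfood=96 °C, Tsidewall=60 °C, Tbarplate=105 °C, Tpcm=99 °C, Tcook=98 °C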



10.3 Results and Discussion

Under the weather conditions at KLEF, the parameters of the SSBC were measured for the various volume fractions (0.5%, 10%): the ambient, glass, moist-air, bar plate, stepped plate, cooker (input, output), and foodstuff temperatures, together with the solar radiation, were recorded experimentally every 30 min, as shown in Figure 10.4. Parallel experiments on the systems were carried out over the course of the solar radiation and ambient temperature. The solar intensity and ambient temperature reach their maximum during the peak period, remain high up to 3.00 pm, and fall as nightfall approaches. Using about 10 and 15% of the furious SiO2/TiO2 nanoparticles enhances the average temperature of the system by about 16.7 and 27.4%, respectively, as shown in Figure 10.5 for the hourly variations of the modified SSBC. In the hourly variations, the maximum temperature enhancement reached 0.73 and 0.77 ± 0.01 kg/m2 for the single-element nanoparticles and without nanoparticles, while the SSBC coated with 0.5, 10, 15, 20, and 25% furious SiO2/TiO2 nanoparticles achieved 0.74, 0.76, 0.78, 0.77, and 0.76 ± 0.01 kg/m2, respectively, at the highest solar radiation between 12:00 and 13:00. The SSBC was used to boil 1 kg of water and milk, reaching the boiling


Figure 10.4  Sample analysis of the parameter behavior of the SSBC at 0.5% volume fraction (temperatures Ta, Tg, Tma, Tbp, Tsp, Tfs and solar radiation Is versus time of day, 12.04.2020).


Figure 10.5  Sample analysis of the parameter behavior of the SSBC at 10% volume fraction (temperatures Ta, Tg, Tma, Tbp, Tsp, Tfs and solar radiation Is versus time of day, 10.03.2020).

temperature roughly 25 min earlier than the water and milk boiled with the single nanoelements (SiO2, TiO2) or without nanoparticles, as shown in Figure 10.6. The bar plate temperature achieved by the SSBC rose to virtually 167 °C during the working hours. Following Sagade et al. [27] and Bhavani et al. [28], Table 10.2 summarizes the cooking performance of the different materials with the SSBC, covering the temperature ranges reached under various weather conditions; the experiments were carried out from October 2019 to April 2020 under the weather conditions of KLEF. With 1 kg of rice, the SSBC produced well-cooked food in 105 min at the higher temperatures, compared with 145 to 155 min for the single nanoparticles and 190 min without nanoparticles in the cooking pots. The SSBC increased the heating performance by 21.5 and 24.6% when using the 10 and 15% ratios, respectively, compared with the single-element nanoparticles and without nanoparticles. Likewise, the higher temperatures of the coated bar plate at 20 and 25% usage were established relative to the plate without nanoparticles during the peak sunlight period. Figure 10.7 shows the performance of the SSBC with ratios of 0.5, 10, 15, 20, and 25% SiO2/TiO2 nanoparticles, for which the system achieved overall efficiencies of 31.77, 37.69, 49.21, 36.99, and 34.66%, improved in comparison with the single-element nanoparticle coatings and without nanoparticles on the bar plate.


Figure 10.6  Cooking of a 1 kg mass for the various volume fractions of the SSBC with the different samples (SiO2/TiO2, SiO2, TiO2, and without nanoparticles).

10.4 Conclusion

The solar cooker is a composite nonlinear design, and a neural network-based controller approach for it has been analyzed extensively with respect to heat transfer. Traditional neural networks, however, suffer from high computational cost and enormous training times. To overcome these weaknesses and achieve a good control effect, an OSELM-based adaptive control with the recursive least-squares method was introduced for the thermal application of the solar cooker design, with the weights and hidden-node biases of the network handled by the online running method over the cooking materials and solar design. The effect of coating the bar plate with different ratios of SiO2/TiO2 nanoparticles was to raise the operating temperature and to reduce the cooking times; this is mainly important for the bar plate, whose heat transfer modes are thereby influenced and enhanced. With about 10% SiO2/TiO2 nanoparticles, the average temperatures of the glass, cooker, bar plate, and stepped plate were enhanced by around 12.5, 16.4, 16.5, and 16.3%, respectively. The absorption ratio of the nanoparticle mixture increased the SSBC bar plate and stepped plate temperature by 10%, which

Table 10.2  SSBC analysis of cooking materials with furious SiO2/TiO2 performances.

S. No.   Cooking Materials   SiO2/TiO2       SiO2            TiO2            Without Nanoparticles   References
                             Taken Time/s    Taken Time/s    Taken Time/s    Taken Time/s
--       BP Temperature      167.09 °C       131.23 °C       127.13 °C       101.24 °C               Nahar [22]
1        1 kg Milk           23.24           43.27           42.37           84.41                   Verdugo [23]
2        1 kg Water          25.12           44.20           43.11           80.12                   Sagade et al. [28]
3        1 kg Rice           105.10          145.23          155.11          190.11                  Bhavani et al. [29, 30]


Figure 10.7  Overall efficiency of the SSBC for the various volume fractions with the different samples (SiO2/TiO2, SiO2, TiO2, and without nanoparticles).

is not significant for further nanoparticle development of the parameters. With the furious SiO2/TiO2 nanoparticles, the efficiency of the SSBC increased by 37.69 and 49.21% using 0.5 and 10%, respectively; this is higher than that of the SSBC analyzed with SiO2 alone, TiO2 alone, or without nanoparticles.

References

1. Perez-Sanchez, B., Fontenla-Romero, O., Guijarro-Berdinas, B., Martınez-Rego, D., An online learning algorithm for adaptable topologies of neural networks. Expert Syst. Appl., 40, 18, 7294, 2013, https://doi.org/10.1016/j.eswa.2013.06.066.
2. Qiao, J., Zhang, Z., Bo, Y., An online self-adaptive modular neural network for time-varying systems. Neurocomputing, 125, 7, 2014.
3. Lughofer, E., On-line active learning: A new paradigm to improve practical useability of data stream modeling methods. Inf. Sci., 415–416, 356, 2017, https://doi.org/10.1016/j.ins.2017.06.038.
4. Zhang, Q., Zhang, P., Long, G., Ding, W., Zhang, C., Wu, X., Online learning from trapezoidal data streams. IEEE Trans. Knowl. Data Eng., 28, 10, 2709, 2016.
5. Zhou, Z., Zheng, W.S., Hu, J.F., Xu, Y., You, J., One-pass online learning: A local approach. Pattern Recognit., 51, 346, 2016, https://doi.org/10.1016/j.patcog.2015.09.003.
6. Saxena, A., Varun, Pandey, S.P., Srivastav, G., A thermodynamic review on solar box type cookers. Renew. Sust. Energ. Rev., 15, 3301, 2011.
7. Wang, H., Huang, J., Song, M., Yan, J., Effects of receiver parameters on the optical performance of a fixed-focus Fresnel lens solar concentrator/cavity receiver system in solar cooker. Appl. Energy, 237, 70, 2019.
8. Regattieri, A., Piana, F., Bortolini, M., Gamberi, M., Ferrari, E., Innovative portable solar cooker using the packaging waste of humanitarian supplies. Renew. Sust. Energ. Rev., 57, 319, 2016.
9. Farooqui, S.Z., A review of vacuum tube based solar cookers with the experimental determination of energy and exergy efficiencies of a single vacuum tube based prototype. Renew. Sust. Energ. Rev., 31, 439, 2011.
10. Schwarzer, K. and Silva, M.E.V., Characterisation and design methods of solar cookers. Sol. Energy, 82, 157, 2008.
11. Lokeswaran, S. and Eswaramoorthy, M., Experimental studies on solar parabolic dish cooker with porous medium. Appl. Sol. Energy (English Transl. Geliotekhnika), 48, 169, 2012.
12. Sagade, A.A., Samdarshi, S.K., Lahkar, P.J., Sagade, N.A., Experimental determination of the thermal performance of a solar box cooker with a modified cooking pot. Renewable Energy, 150, 1001, 2020, https://doi.org/10.1016/j.renene.2019.11.114.
13. Cuce, E., Improving thermal power of a cylindrical solar cooker via novel micro/nano porous absorbers: A thermodynamic analysis with experimental validation. Sol. Energy, 176, 211, 2018, https://doi.org/10.1016/j.solener.2018.10.040.
14. Cuce, P.M., Box type solar cookers with sensible thermal energy storage medium: A comparative experimental investigation and thermodynamic analysis. Sol. Energy, 166, 432, 2018, https://doi.org/10.1016/j.solener.2018.03.077.
15. Guidara, Z., Souissia, M., Morgenstern, A., Maalej, A., Thermal performance of a solar box cooker with outer reflectors: Numerical study and experimental investigation. Sol. Energy, 158, 347, 2017, https://doi.org/10.1016/j.solener.2017.09.054.
16. Negi, B.S. and Purohit, I., Experimental investigation of a box type solar cooker employing a non-tracking concentrator. Energy Convers. Manage., 46, 4, 577, 2005, https://doi.org/10.1016/j.enconman.2004.04.005.
17. Palanikumar, G., Shanmugan, S., Chithambaram, V., Solar cooking thermal image processing applied to time series analysis of fuzzy stage and inconsiderable Fourier transform method. Mater. Today: Proc., in press, 2020, https://doi.org/10.1016/j.matpr.2020.02.664.
18. Santacruz, I., Cabeza, A., Ibeh, P., Losilla, E.R., De la Torre, A.G., Aranda, M.A.G., Preparation of photocatalytic TiO2 coatings by gel-dipping with polysaccharides. Ceram. Int., 38, 8, 6531, 2012, https://doi.org/10.1016/j.ceramint.2012.05.034.
19. Liu, H., Guo, W., Li, Y., He, S., He, C., Photocatalytic degradation of sixteen organic dyes by TiO2/WO3-coated magnetic nanoparticles under simulated visible light and solar light. J. Environ. Chem. Eng., 6, 1, 59, 2018, https://doi.org/10.1016/j.jece.2017.11.063.
20. Swapna, K., Mahamuda, S., Rao, A.S., Sasikala, T., Moorthy, L.R., Visible luminescence characteristics of Sm3+ doped zinc alumino bismuth borate glasses. J. Lumin., 146, 288, 2014, https://doi.org/10.1016/j.jlumin.2013.09.035.
21. Abd-Elhady, M.S., Abd-Elkerim, A.N.A., Ahmed, S.A., Halim, M.A., Abu-Oqual, A., Study the thermal performance of solar cookers by using metallic wires and nanographene. Renewable Energy, 153, 108, 2020, https://doi.org/10.1016/j.renene.2019.09.037.
22. Nahar, N.M., Performance and testing of a hot box storage solar cooker. Energy Convers. Manage., 44, 8, 1323, 2003, https://doi.org/10.1016/S0196-8904(02)00113-9.
23. Soria-Verdugo, A., Experimental analysis and simulation of the performance of a box-type solar cooker. Energy Sustain. Dev., 29, 65, 2015, https://doi.org/10.1016/j.esd.2015.09.006.
24. Liang, N.Y., Huang, G.B., Saratchandran, P., Sundarajan, N., A fast and accurate online sequential learning algorithm for feedforward networks. IEEE Trans. Neural Netw., 7, 6, 1411, 2006.
25. Soniya, V., Swetha Sri, R., Swetha Titty, K., Ramakrishnan, K., Sivakumar, S., Attendance Automation Using Face Recognition Biometric Authentication, in: SSN College of Engineering, Chennai—IEEE 2017 International Conference On Power And Embedded Drive Control (ICPEDC), 16th–18th March 2017.
26. Sivakumar, S. and Rajalakshmi, R., Comparative Evaluation of various feature weighting methods on movie reviews, in: Veer Surendra Sai University of Technology, Sambalpur, Odisha—Springer 2017, International Conference on Computational Intelligence in Data Mining (ICCIDM), 11th and 12th of November 2017.
27. Huang, G.-B., Zhu, Q.-Y., Siew, C.-K., Extreme learning machine: Theory and applications. Neurocomputing, 70, 1–3, 489, 2006, https://doi.org/10.1016/j.neucom.2005.12.126.
28. Sagade, A.A., Samdarshi, S.K., Panja, P.S., Experimental determination of effective concentration ratio for solar box cookers using thermal tests. Sol. Energy, 159, 984, 2018, https://doi.org/10.1016/j.solener.2017.11.021.
29. Bhavani, S., Shanmugan, S., Selvaraju, P., Monisha, C., Suganya, V., Fuzzy Interference Treatment applied to Energy Control with effect of Box type Affordable Solar Cooker. Mater. Today: Proc., 18, 3, 1280, 2019, https://doi.org/10.1016/j.matpr.2019.06.590.
30. Illa, M.P., Khandelwal, M., Sharma, C.S., Bacterial cellulose-derived carbon nanofibers as anode for lithium-ion batteries. Emerg. Mater., 1, 105, 2018, https://doi.org/10.1007/s42247-018-0012-2.

11

Applications to Radiography and Thermography for Inspection

Inderjeet Singh Sandhu1, Chanchal Kaushik2* and Mansi Chitkara1

1Chitkara University Institute of Engineering & Technology, Chitkara University, Punjab, India
2Chitkara School of Health Sciences, Chitkara University, Punjab, India

Abstract

Radiography is a branch of science that uses X-ray radiation to obtain a picture of the internal organs of the human body on a film or detector plate. X-rays are used to create images of internal organs, or for inspection, via different types of equipment such as CT scan modalities, X-ray radiography machines (computed or digital), mammography, fluoroscopy, dual-energy X-ray absorptiometry (DEXA) scans, dental radiography, ortho pan tomography (OPG), etc. A further understanding of the interaction of X-rays with matter and of radiographic image quality helps elucidate the role of different X-ray equipment in diagnosis and inspection. Radio-diagnosis, nuclear medicine, and radiotherapy remain strong pillars for inspection, diagnosis, and treatment delivery. Recent advances in artificial intelligence in radiography, such as in computed tomography, have brought a revolution to the field by assisting in quicker and more effective diagnosis. Its applications in industrial radiography, food irradiation, and thermography are also expounded in this chapter.

Keywords: Radiography, inspection, X-rays, thermography, computed tomography, mammography, infrared

*Corresponding author: [email protected]



11.1 Imaging Technology and Recent Advances

The medical industry has been one of the fastest-growing industries of the last decade. Advancements in technology, the integration of high-end modalities in healthcare, and continuous upgrades in software technology, such as image processing techniques, have brought a revolution not only in healthcare but in other industries as well [1]. On one hand, we have come far in advances in diagnostic techniques such as CT scans, MRI scans, PET-CT, SPECT-CT, fusion imaging, etc.; on the other hand, the picture archiving and communication system (PACS), digital imaging and communication in medicine (DICOM), and teleradiology are bridging the barriers within the hospital and between urban and rural areas [2]. The ultimate goal remains to keep improving human lives for better living, and advances in the medical industry are crucial for the early detection of diseases and for inspection in other industries such as airport safety. Radiology is the backbone of any hospital; similarly, its role in other industries is vital due to its applications. Radio-diagnosis, nuclear medicine, and radiotherapy are all strong pillars for diagnosis and treatment delivery [3].

11.2 Radiography and its Role

Radiography is a branch of science used to diagnose diseases and help deliver better treatment. It is an imaging technique that uses X-rays to obtain a picture of the internal organs of the human body on a film or detector plate [4]. An X-ray is a form of energy in the electromagnetic spectrum that is ionizing in nature; X-rays are invisible and can create the biological effects of radiation, so their judicious use is of extreme importance. X-rays are used to create images of internal organs, or for inspection, via different types of equipment such as CT scan modalities, X-ray radiography machines (computed or digital), mammography, fluoroscopy, dual-energy X-ray absorptiometry (DEXA) scans, dental radiography, ortho pan tomography (OPG), etc. [5]. These machines have an X-ray tube installed, which delivers radiation to the field of view under observation (interaction of X-rays with matter) and forms the image on a film plate or screen/detector (image formation). The images obtained are then processed and read by a radiologist or inspector for reporting and analysis, which further helps in decision making [6]. The personnel who perform these medical exposures are trained professionals with a bachelor's degree in medical imaging technology under allied health sciences. The occupational exposures from medical sources are of extreme importance and

contribute up to 20 mSv/year. The limit on public exposure from these medical exposures is 0.01 mSv/year [7]. Since these exposures create stochastic effects and can cause genetically induced cancers, optimization of these radiation doses is of utmost importance and cannot be neglected [8].

11.3 History and Discovery of X-Rays

On 8th November 1895, Sir Wilhelm Conrad Roentgen, a German physicist, discovered X-rays while working on a partially evacuated glass tube. He observed that a barium platinocyanide screen placed nearby in the room started glowing when the tube was in operation, and concluded that some unknown radiation had caused the fluorescence of the screen. Further investigations showed that these rays could pass through materials like paper, wood, aluminum, etc. These rays were also known as roentgen radiation. The first X-ray taken was of his wife Anna Bertha's hand and is called the first "Roentgenogram" [9, 10]. The use of X-rays brought a revolution in the medical industry, as one could now see a picture of the internal organs, their normal anatomy, and diseased tissue anatomy. Further advancements in technology and image processing led to CT scanners, which perform multiple-projection radiography and obtain 3D images of the body, along with various post-processing techniques [11].

The X-ray tube is the equipment used to produce X-rays (Figure 11.1: a, b). X-rays are produced as needed in a controllable manner, as per the inspection and examination protocol. An X-ray tube comprises a cathode and an anode. High-speed electrons from the cathode (thermionic emission) are targeted at the anode (tungsten), which rotates; this produces X-rays (Bremsstrahlung and Characteristic). A filter at the face of the X-ray exit from the tube absorbs low-energy X-rays, and a collimator helps collimate the area of interest, reducing the radiation dose to other areas. The whole X-ray tube is enclosed with lead protection


Figure 11.1  X-ray tube: (a) stationary X-ray tube and (b) rotatory X-ray tube.

to reduce any leakage radiation. A beryllium window is present at the exit of X-rays from the tube [12].

11.4 Interaction of X-Rays With Matter

When an X-ray beam is incident on a human body, the beam attenuates (absorption and scattering), resulting in a reduction of the X-ray photons in the beam. The scattering of radiation results in noise and does not contribute to image formation. The absorption of photons removes them from the X-ray beam. The transmission of radiation plays an important role in carrying the diagnostic information. There are five types of interaction that may happen between X-rays and the atoms of the absorbing material:

1. Photoelectric absorption
2. Compton scattering
3. Coherent scattering
4. Pair production
5. Photodisintegration

The fraction of X-rays removed from the beam per unit thickness of the attenuating medium is known as the linear attenuation coefficient. The denser the material, the more X-rays are attenuated. The linear attenuation coefficient decreases with increasing X-ray energy. Photoelectric absorption is the dominant process in the diagnostic energy range; it happens when the incident photon energy is slightly greater than the binding energy of the innermost shell (K or L) [13].
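The attenuation described here follows the standard exponential law (a textbook relation stated for completeness; the symbols below are the conventional ones rather than notation from this chapter). A narrow monoenergetic beam of incident intensity $I_0$, after traversing a thickness $x$ of material with linear attenuation coefficient $\mu$, emerges with intensity

$$I = I_0 \, e^{-\mu x}$$

so a denser material (larger $\mu$) removes a larger fraction of photons per unit thickness, and $\mu$ falls as the beam energy rises, consistent with the statements above.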

11.5 Radiographic Image Quality The X-ray or CT image quality of a radiograph is read on 04 parameters i.e., noise, contrast, unsharpness, and resolution [14]. 

Applications to Radiography and Thermography  223 Noise is the grainy appearance of a radiograph. It occurs due to fog on a radiograph, humidity, temperature changes, film life, and exposure to sunlight, etc. It deteriorates the image quality and renders the image of poor quality and repeat exposures. Noise can also result due to low X-ray photons in the beam. This is known as the quantum mottle. Overall, the appearance of noise is a bad phenomenon and must be avoided [15]. It can be reduced by increasing the mAs, the use of fresh films, keeping films in manufacturer recommended conditions, avoiding contact from sunlight. Increasing mAs may further result in increased patient radiation dose. High signal to noise ratio (SNR) results in better image quality and diagnosis (Figure 11.2: a, b). The degree of blackening is the density and the difference in density obtained is the Contrast. Density can be measured with the help of a tool densitometer. Radiographic contrast is of two types: Film contrast and subject contrast. The study of the characteristic curve will further help in understanding film and subject contrast. Film contrast depends on types of grains used in film emulsion layer, type of emulsion layer, safelight used in the darkroom as per film sensitivity, etc. The T-grain technology of film emulsion and the use of dyes to reduce lateral scattering in the film will help in controlling film contrast. Appropriate use of safelight as per film type (monochromatic, orthochromatic and panchromatic) and its sensitivity to light will help in unnecessary exposure of films to light during image processing. Amber color safelight should be used for monochromatic films, red color safelight should be used for orthochromatic films, while panchromatic films should be processed in complete darkness.  Subject Contrast depends upon parameters such as kVp used. kVp is the penetrating power of X-ray photons. Increase in kVp results in a decrease in contrast. This is the reason that in mammography, where we need high soft-tissue contrast, a low kVp is used to enhance microcalcification. Subject contrast also depends upon characteristics like patient thickness, density, an atomic number of materials under investigation. A high kVp and low mAs technique are used nowadays in digital radiography to reduce patient radiation dose. Image quality is then modified by application software. External contrast agents such as barium sulphate and iodinated ionic and non-ionic contrast agents are also used to enhance contrast in imaging. These contrast agents are used widely in fluoroscopy, radiography, and CT scans to differentiate the structures under observation [16].  Unsharpness can be understood from the sudden change in the boundaries  of different structures in an image. It is of three types: Photographic unsharpness, geometric unsharpness, motional unsharpness. Photographic unsharpness results from the poor film-screen combination.

224  Machine Vision Inspection Systems water


Figure 11.2  (a) High SNR, large FOV MRI images; (b) lower limb angiography with high contrast and high SNR.

The film–screen combination must be checked regularly with the help of test tools as per guidelines; a good film–screen combination results in a sharper image. Geometric unsharpness can result from a disturbance in the geometry of the source-to-image-receptor distance: the X-ray tube is the source of radiation, and the grid bucky, being the image receptor, should be at a distance of 100 cm for all radiographic examinations except chest PA (posteroanterior) and cervical lateral (180 cm). Geometric unsharpness

occurs due to a decrease in the source-to-image distance, an increase in the object-to-film distance, and the use of a large focal spot size. It can be controlled by using a small focal spot size, increasing the source-to-image distance, and decreasing the object-to-film distance. Motional unsharpness results from the patient's movement during exposure; it blurs the image and deteriorates the image quality. Patient movement during exposure can be controlled with immobilization devices such as sandbags, and a fast film-screen combination also helps capture images faster. Recent advances such as dual-source CT scanners allow faster scans without patient breathing artifacts in the image. For pediatric patients, natural sleep works best, along with a friendly approach to the child; in the case of trauma patients, both faster scans and immobilization devices help [17].

The resolution component of radiographic image quality is explained by the modulation transfer function (MTF). The MTF is the measure of the true resolution of a system and can be understood as the spatial frequency response; an MTF of one means the image depicts all the information of the object [18].
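The geometric factors listed here can be summarized in a standard relation (stated for completeness, using conventional symbols rather than this chapter's notation): the geometric unsharpness $U_g$ (penumbra) produced by a focal spot of size $f$ is

$$U_g = f \times \frac{\mathrm{OID}}{\mathrm{SOD}}$$

where OID is the object-to-image-receptor distance and SOD the source-to-object distance, which is why a small focal spot, a large source-to-image distance, and a small object-to-film distance all sharpen the image.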

11.6 Applications of Radiography Applications of radiography are primarily divided into medical and industrial uses. In medical use, radiography plays a vital role in the detection and diagnosis of diseases with the use of X-ray units (computed and digital radiography) (Figure 11.3: a–f), fluoroscopy (real-time imaging), CT scanning, DEXA scans, dental radiography, cath-lab radiography, operation theaters, and mammography. Other applications are in security screening, the food industry, etc. The radiographic modalities in use are:

11.6.1 Computed Radiography (CR)/Digital Radiography (DR) Computed radiography uses X-rays and photostimulable phosphor (PSP) plates to take images. DR uses flat-panel detectors (direct and indirect) to acquire images directly without the handling of cassettes (Figure 11.4). These are recent advances in radiography, where images are acquired and processed on AGFA or FUJI systems instead of the earlier film–screen systems with chemical processing. Images are now processed on a computer and laser printers are used to print them, with no need for darkroom procedures as before. CR and DR have reduced the workload of a


Figure 11.3  Medical applications of X-rays: (a) Chest PA, (b) Cervical spine Lateral, (c) Pelvis AP, (d) Shoulder AP, (e) Hand PA, (f) Knee AP.

department and also brought a digital revolution to X-ray imaging in the medical industry. Images are now available on the spot, saving time and allowing faster treatment delivery for patients. Radiographs can be acquired in different positions as per the patient's comfort, and images can easily be sent to medical professionals in other departments in a short time. Radiologists can now report images at their convenience and location and

Figure 11.4  Digital radiography machine: medical application.

can also change the contrast of images as per the patient's clinical condition. Images can also be retrieved very easily even if lost, and it is very easy to make multiple copies as required. Apart from the above-mentioned uses, CR and DR can also be used for trauma patients in emergency departments due to their fast technology. The images acquired now have very good contrast resolution as well [19]. Applications of radiography using X-rays include the detection of:
• Bone disorders (arthritis, osteoporosis)
• GIT disorders
• Foreign body localization
• Calculi/stones in kidneys
• Infections (such as pneumonia, lower lung infection)
• Cancer
• Pleural effusion
• Collapsed lungs
• Chronic lung conditions
• Tumor detection
• Dental radiography

11.6.2 Fluoroscopy Fluoroscopy is real-time imaging utilizing X-rays (3–5 mA) and contrast agents. Diagnostic procedures performed under fluoroscopic guidance include gastrointestinal studies, hepatobiliary studies, urinary tract investigations, reproductive imaging, brain imaging, etc. The contrast agents used are barium based and iodine based. Barium sulphate is the contrast medium of choice for all gastrointestinal studies, with contraindications of perforation and aspiration. Iodinated contrast media such as urograffin, conray, iohexol, and visipaque are the media of choice for urinary tract, reproductive tract, and other investigations. Pulsed fluoroscopy is used to reduce the radiation dose to patients, and digital fluoroscopy has recently come into use. Fluoroscopy uses an image intensifier television (IITV) system [20]. Applications of fluoroscopy are as follows:
• Gastrointestinal investigations: barium swallow (esophageal abnormalities, dysphagia, odynophagia, tracheo-esophageal fistula, etc.), barium meal (dyspepsia), barium meal follow-through, barium enema (Figure 11.5a), enteroclysis (small bowel enema for Crohn's disease), and instant barium enema (ulcerative colitis).


Figure 11.5  (a) Barium enema and (b) T-tube cholangiography.

• Hepatobiliary investigations: T-tube cholangiography studies (Figure 11.5b), ERCP, PTC, etc.
• Urinary tract investigations: intravenous urography, MCU, RGU, PCN, etc.
• Myelography
• Interventional radiology procedures
• Interventional neuroradiology procedures
• Biopsies, including liver biopsy under fluoroscopy
• Orthopedic surgery to guide fracture reduction
• Angiography and venography
• Placement of catheters and tubes
• Urological studies including RGU, MCU, and pyelography
• Discography
• Hysterosalpingography [21].

11.6.3 DEXA Dual-energy X-ray absorptiometry (DEXA) is a non-invasive method of measuring bone mineral density. In this technique, a low-energy X-ray source is used: X-ray beams of two different energies are passed through the patient's body, one attenuated mainly by soft tissue and the other by bone. The soft-tissue contribution is subtracted, leaving only the bone component. The measurements are then compared with normal ranges via T and Z scores. The T score compares the amount of bone with that of a young adult of the same gender; the Z score compares the amount of bone with that of other people of the same age group [22].
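As a small numerical illustration, the sketch below computes T and Z scores from a measured bone mineral density (BMD). All reference means and standard deviations here are placeholder values; real scores rely on manufacturer and population reference databases.

```matlab
% Sketch: DEXA T- and Z-score arithmetic with placeholder reference values.
bmd_patient      = 0.78;   % measured BMD, g/cm^2 (hypothetical)
young_adult_mean = 1.00;   % young-adult reference mean (placeholder)
young_adult_sd   = 0.11;   % young-adult reference SD (placeholder)
age_matched_mean = 0.85;   % age/sex-matched reference mean (placeholder)
age_matched_sd   = 0.10;   % age/sex-matched reference SD (placeholder)

t_score = (bmd_patient - young_adult_mean) / young_adult_sd;
z_score = (bmd_patient - age_matched_mean) / age_matched_sd;
fprintf('T-score = %.1f, Z-score = %.1f\n', t_score, z_score);
% By WHO convention, a T-score at or below -2.5 indicates osteoporosis.
```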

Applications of DEXA scans are:
• Detection of osteoporosis
• Occurrence of fracture
• Monitoring the prognosis of strengthening treatment
• Patients with nutritional rickets, lupus, and Turner syndrome
• Measuring skeletal maturity and body fat composition
• Assessing the outcomes of pharmaceutical therapy
• Diagnosing bone mass acquisition in childhood
• Women after menopause and with estrogen deficiency
• Abnormalities of the vertebrae, thyroid disorders
• Hip and spine disorders [23].

11.6.4 Computed Tomography CT scans are used to take 3D images of the body with the help of reconstruction techniques (Figure 11.6). Multiple X-ray projections are taken of an area of interest, the data are collected by detectors, and a digital image is formed. Iterative reconstruction and filtered back projection are used. Post-processing techniques such as multiplanar reformation (MPR), maximum intensity projection (MIP), minimum intensity projection (MinIP), volume rendering technique (VRT), and surface shaded display (SSD) are applied. Newer generations of CT scanners have longer z-axis coverage and faster scans [24].
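To illustrate filtered back projection, the hedged sketch below uses the Image Processing Toolbox's radon/iradon pair on a synthetic Shepp-Logan phantom. The phantom, the angle range, and the Ram-Lak filter choice are illustrative, not a clinical reconstruction pipeline.

```matlab
% Sketch: CT projection and filtered back projection on a phantom.
P     = phantom(256);                 % synthetic Shepp-Logan head phantom
theta = 0:1:179;                      % projection angles in degrees
R     = radon(P, theta);              % sinogram: one column per angle

recon = iradon(R, theta, 'linear', 'Ram-Lak');   % filtered back projection

subplot(1, 2, 1); imshow(P, []);     title('Phantom');
subplot(1, 2, 2); imshow(recon, []); title('FBP reconstruction');
```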

Figure 11.6  Computed tomography machine.

Applications are:

CT brain imaging (Figure 11.7)
• Injury or trauma
• Tumors
• Cancer
• Brain angiography to see blood vessels or blockages in blood vessels
• Hydrocephalus
• Aneurysms

CT imaging of the chest (Figure 11.8)
Multislice CT helps in acquiring faster scans and is therefore very useful for chest CT. It is done for the detection of:
• Pneumonia
• Pleural effusion
• Lung carcinoma
• Interstitial lung disease (ILD)
• Lung biopsies
• Chronic fibrosis
• Tuberculosis

Figure 11.7  Contrast brain angiography.


Figure 11.8  Cardiac CT angiography.

• Bronchiectasis
• Congenital abnormalities, etc.

The calcium scoring technique is done to see the deposition of plaque in arteries; it is one of the effective ways to spot atherosclerosis before symptoms develop.

CT abdominal imaging (Figure 11.9: a, b, c)
• Appendicitis, pyelonephritis, abscesses
• Inflammatory bowel disease
• Crohn's disease
• Pancreatitis
• Abdominal aortic aneurysms (AAA)
• Injuries to abdominal organs
• Liver cirrhosis, etc. [25].

11.6.5 Industrial Radiography Industrial radiography is a method for inspecting products under quality-control checks, for example for leakage, cracks, porosity, and durability of pipes. It helps in finding hidden flaws and therefore in maintaining the durability of products.


Figure 11.9  (a) CT abdomen angiography and (b) whole abdomen angiography revealing spleen tumor and (c) CT urography: renal calculi.

Radiography helps in taking internal pictures of these products without having to break and then test them. Cracks and similar defects in these products are easily seen via radiography cameras such as gamma-ray or X-ray cameras. The radiation penetrates the products and enables non-destructive testing. It is also used for the research and development of products during manufacturing and for installation procedures in service. CR and DR are used for inspection [26, 27]. Applications:
• Testing defects in metal parts such as shrinkage, porosity, cracks
• Ensuring durability of products
• Open-field radiography
• Assessing internal and external geometry
• Inspection
• Determination of weld quality
• Corrosion extent.
These devices are portable and use gamma rays. Proper inbuilt shielding is provided for radiation protection of workers. Industrial radiography is also used for inspection in other industries:
• Packaging products: to check for any leakage
• Manufacturing industry
• Defense and military
• Medical devices
• Automotive
• Aerospace.
Conventional industrial radiography provides high-contrast images due to the use of fine-grain films. Films are processed on site. X-ray tube types:
• Directional tube (one direction)
• Panoramic tube (uni- or bipolar)
Megavolt equipment in the range of 1–16 MeV is used in industrial radiography. The linear accelerator (linac) is used at energy levels of 4 and 8 MeV; linacs have very high radiation output and a small focal spot.

The betatron: This is an electron accelerator producing X-ray radiation in the range of 2–30 MeV. Betatrons can be built with very small focal spots. Portable low-energy betatrons are also available (2–6 MeV), with low radiation output. Food irradiation: To improve food safety by preservation, delaying ripening, sterilization, and prevention of pest infestation, and thereby reducing foodborne illness, food irradiation is performed. Here, food is exposed to ionizing radiation such as X-rays, gamma rays, or electron beams. The process eliminates the organisms responsible for foodborne illness and therefore extends the food's shelf life.

11.6.6 Thermography Thermography, or infrared thermography, is an imaging technique that measures the infrared radiation emitted by a source, converts it to temperature, and thereby provides an image of the temperature distribution. It belongs to the infrared imaging sciences. These cameras usually detect wavelengths in the range of 9,000 to 14,000 nanometers [28]. It is widely used in veterinary sciences, and nowadays it is also used for screening purposes in breast cancer [29]. It also came into screening practice at airports during the 2009 swine flu pandemic and in many other fields [30]. Firefighters use this technique to locate the seat of a fire and to detect failures in wiring, and it is used to improve the regulation of heating and air-cooling units [31]. Hildebrandt et al. in 2010 explained the role of infrared thermography in the field of sports. They reported that medical infrared thermography is an efficient tool in medicine, especially for injury detection, as it is inexpensive, non-invasive, and a non-radiation modality [32]. Thermography is of two types, i.e., active and passive thermography. Active thermography is used for super-resolution microscopy, while passive thermography is used for surveillance and in medical diagnosis [33]. The advantage of thermography is that it displays an optical image in which the temperatures of large regions can be compared; it can also capture a moving object in real-time imaging [34–37]. Applications: There are several applications of thermography, including condition monitoring, thermal mapping, medical imaging, detection of cracks in buildings, contact and non-contact thermography, neurology, musculoskeletal imaging, abnormalities of the thyroid gland, veterinary imaging, night vision, research, non-destructive testing, thermology, process control, security surveillance, chemical imaging, volcanology, building inspection, etc. [38–41].
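As a rough illustration of how such a camera maps measured radiation to temperature, the sketch below inverts Planck's law at a single wavelength inside the 9–14 μm band to obtain a brightness temperature. The radiance value and the single-wavelength treatment are simplifying assumptions; a real camera integrates over its spectral band and corrects for emissivity, reflected radiation, and atmospheric attenuation.

```matlab
% Sketch: brightness temperature from LWIR spectral radiance (Planck inversion).
h      = 6.626e-34;    % Planck constant, J*s
c      = 2.998e8;      % speed of light, m/s
kB     = 1.381e-23;    % Boltzmann constant, J/K
lambda = 10e-6;        % wavelength, m (inside the 9-14 um LWIR band)
L      = 8.0e6;        % measured spectral radiance, W/(m^2*m*sr) (hypothetical)

% Planck: L = (2*h*c^2/lambda^5) / (exp(h*c/(lambda*kB*T)) - 1), solved for T
T = (h * c / (lambda * kB)) / log(1 + 2 * h * c^2 / (lambda^5 * L));
fprintf('Brightness temperature: %.1f K (%.1f C)\n', T, T - 273.15);
```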


11.6.7 Veterinary Imaging Yanmaz et al. in 2007 reviewed the clinical use of thermography, mostly in the veterinary field. They reported that it is a non-invasive imaging tool with several applications including the foot, joints, long bones, muscles, ligaments, and tendons, and that it can complement radiography and ultrasonography. It is beneficial for patients, uses infrared radiation, and has seen recent advancements in the field of veterinary imaging [42]. Soroko et al. in 2016 presented the applications of thermography in veterinary sciences, such as inflammatory, neurological, and pathological conditions [39]. They agreed with Yanmaz et al. and concluded that it reveals signs of a disorder before its clinical indications. The recent trends and its use in equine medicine were also reported by Soroko et al. in 2018 [43]. Thermography plays a mainly complementary part in the primary diagnosis of a disease; its sensitivity and specificity are higher when it is combined with other imaging modalities such as radiography and ultrasonography. Further research is needed on the accuracy of this technique [44]. Turner's [45] study is similar to several other studies that reported the application of thermography in veterinary sciences. Turner regarded it as one of the most important recent imaging tools in veterinary sciences, playing an important role in the initial detection of bone, joint, muscle, tendon, and ligament injuries in horses. Eddy et al. in 2001 also highlighted its role in managing the pathologies of horses. The authors reported that it is popular due to recent advances in imaging systems and cameras. The technique is not limited to the detection of the clinical symptoms of a disease but also tracks the progress of healing [46].

11.6.8 Destructive Testing Roemer et al. in 2013 concluded that thermography can be used for damage inspection in metallic and composite objects. Pulsed and vibrothermography techniques are commonly used; these methods are highly effective in detecting defects in objects [47].

11.6.9 Night Vision Thermography is also an excellent technique for night vision. Infrared cameras detect thermal radiation without the need for a light source. They can produce images in the dark and can also see through fog, smoke, rain, etc. This technique is mostly used in aircraft.


11.6.10 Conclusion Thermography has a wide range of applications in various industries. Starman et al. in 2011 reported its use for the detection of cracks: with an infrared camera, cracks can be detected in up to four blocks within twenty seconds [46, 48]. Wu et al. in 2009 reported its application in the assessment of pain in the coccyx region of patients before and after treatment therapies; the major advantage is that the technique is painless, non-invasive, and easy to use [49]. Lewis et al. in 1979 presented its applications in patients suffering from infertility: a contact scrotal thermography camera, a new diagnostic tool that is portable and less expensive than an infrared camera, gives a permanent record of the temperature of the scrotum and has an important role in the field of urology [50]. Maierhofer et al. in 2006 reported the use of impulse thermography in civil engineering to check for gaps and voids in concrete or plaster for safety purposes [51].

References 1. Vitthal, P.C., Subhash, A.R., Sharma, B.R., Ramachandran, M., Emerging trends and future prospects of medical tourism in India. J. Pharm. Sci. Res., 7, 5, 248, 2015 May 1. 2. Dreyer, K.J., Hirschhorn, D.S., Thrall, J.H., PACS, M., A guide to the digital revolution, Springer, New York, 2006. 3. Robertson, I.D. and Saveraid, T., Hospital, radiology, and picture archiving and communication systems. Vet. Radiol. Ultrasound, 49, S19–28, 2008 Jan. 4. Maglinte, D.D., Balthazar, E.J., Kelvin, F.M., Megibow, A.J., The role of radiology in the diagnosis of small-bowel obstruction. AJR: Am. J. Roentgenol., 168, 5, 1171–80, 1997 May. 5. Rowlands, J.A., The physics of computed radiography. Phys. Med. Biol., 47, 23, R123, 2002 Nov 20. 6. Yaffe, M.J. and Rowlands, J.A., X-ray detectors for digital radiography. Phys. Med. Biol., 42, 1, 1, 1997 Jan. 7. Board, A.E., AERB Safety Guide NO. AERB/RF-RQTL/SG-1. AERB Publications, Mumbai, India, 400 094, 2018. 8. Kalra, M.K., Maher, M.M., Toth, T.L., Hamberg, L.M., Blake, M.A., Shepard, J.A., Saini, S., Strategies for CT radiation dose optimization. Radiology, 230, 3, 619–28, 2004 Mar. 9. Glasser, O., WC Roentgen and the discovery of the Roentgen rays. AJR: Am. J. Roentgenol., 165, 5, 1033–40, 1995 Nov. 10. Glasser, O., Wilhelm Conrad Röntgen and the early history of the Roentgen rays, Norman Publishing, San Francisco, 1993.

11. Goodsitt, M.M., Beam hardening errors in post-processing dual energy quantitative computed tomography. Med. Phys., 22, 7, 1039–47, 1995 Jul. 12. P. Schardt, E. Hell, D. Mattern, Inventors; Siemens AG, assignee. X-ray tube. United States patent US 6,339,635, 2002 Jan 15. 13. Hall, H., The theory of photoelectric absorption for X-rays and γ-rays. Rev. Mod. Phys., 8, 4, 358, 1936 Oct 1. 14. Rossmann, K. and Wiley, B.E., The central problem in the study of radiographic image quality. Radiology, 96, 1, 113–8, 1970 Jul. 15. W.W. Godlewski, J.D. Chapman, G.M. Diana, S.P. Hiss, J.M. Volo, R. Weil, L.H. Underwood, inventors, Eastman Kodak Co, assignee, Digital radiographic image quality control workstation operable in manual or passthrough modes. United States patent US 5,270,530, 1993 Dec 14. 16. B.S. Manian, inventor, Lumisys Inc, assignee, Radiographic image quality assessment utilizing a stepped calibration target. United States patent US 5,565,678, 1996 Oct 15. 17. Clark, J.L., Wadhwani, C.P., Abramovitch, K., Rice, D.D., Kattadiyil, M.T., Effect of image sharpening on radiographic image quality. J. Prosthet. Dent., 120, 6, 927–33, 2018 Dec 1. 18. Rossmann, K.T., The spatial frequency spectrum: A means for studying the quality of radiographic imaging systems. Radiology, 90, 1, 1–3, 1968 Jan. 19. Merritt, C.R., Tutton, R.H., Bell, K.A., Bluth, E.I., Kalmar, J.A., Matthews, C.C., Miller, K.D., Kogutt, M.S., Balter, S., Clinical application of digital radiography: computed radiographic imaging. RadioGraphics, 5, 3, 397–414, 1985 May. 20. Colbeth, R.E., Allen, M.J., Day, D.J., Gilblom, D.L., Harris, R.A., Job, I.D., Klausmeier-Brown, M.E., Pavkovich, J.M., Seppi, E.J., Shapiro, E.G., Wright, M.D., Yu, M.J., Flat-panel imaging system for fluoroscopy applications, in: Proc. SPIE 3336, Medical Imaging 1998: Physics of Medical Imaging, vol. 3336, pp. 376–387, 1998 Jul 24. 21. Strother, C.M., Sackett, J.F., Crummy, A.B., Lilleas, F.G., Zwiebel, W.J., Turnipseed, W.D., Javid, M., Mistretta, C.A., Kruger, R.A., Ergun, D.L., Shaw, C.G., Clinical applications of computerized fluoroscopy: The extracranial carotid arteries. Radiology, 136, 3, 781–3, 1980 Sep. 22. Haarbo, J., Gotfredsen, A., Hassager, C., Christiansen, C.J., Validation of body composition by dual energy X-ray absorptiometry (DEXA). Clin. Physiol., 11, 4, 331–41, 1991 Jul. 23. Salvatoni, A., Brambilla, P., Deiana, M., Nespoli, L., Application of DEXA in body composition assessment in children. Ann. Diagn. Paediatr. Pathol., 2, 1, 49–51, 1998 Mar 1. 24. Hounsfield, G.N., Picture quality of computed tomography. Am. J. Roentgenol., 127, 1, 3–9, 1976 Jul 1. 25. Nikolaou, K., Thieme, S., Sommer, W., Johnson, T., Reiser, M.F., Diagnosing pulmonary embolism: New computed tomography applications. J. Thorac. Imaging, 25, 2, 151–60, 2010 May 1.

26. https://www.iaea.org/topics/industrial-radiography. 27. https://www.aerb.gov.in/english/regulatory-facilities/radiation-facilities/application-in-industry/industrial-radiography. 28. https://en.wikipedia.org/wiki/Thermography. 29. Breast Cancer Screening: Thermogram No Substitute for Mammogram, fda.gov, US Food and Drug Administration, 27 October 2017, Archived from the original on 23 June 2018, Retrieved 23 June 2018. 30. FLIR infrared cameras help detect the spreading of swine flu and other viral diseases, applegate.co.uk, The British Institute of Non-Destructive Testing, UK, 29 April 2009, Archived from the original on 29 February 2012, Retrieved 18 June 2013. 31. Kanimozhi, P., Sathiya, S., Balasubramanian, M., Sivaraj, P., Thermal Image Processing-An Eagle Eye Analysis. International Journal of Research in Advent Technology (IJRAT), Special Issue, International Conference "INTELINC 18", 12th & 13th October 2018, 272–277. Available online at www.ijrat.org. 32. Hildebrandt, C., Raschner, C., Ammer, K., An overview of recent application of medical infrared thermography in sports medicine in Austria. Sensors, 10, 5, 4700–15, 2010 May. 33. Graciani, G. and Amblard, F., Super-resolution provided by the arbitrarily strong superlinearity of the blackbody radiation. Nat. Commun., 10, 1, 5761, December 2019, Bibcode:2019NatCo.10.5761G. PMID 31848354. 34. Costello, J.T., McInerney, C.D., Bleakley, C.M., Selfe, J., Donnelly, A.E., The use of thermal imaging in assessing skin temperature following cryotherapy: A review (PDF). J. Therm. Biol., 37, 2, 103–110, 2012-02-01. 35. Bach, A.J., Stewart, I.B., Minett, G.M., Costello, J.T., Does the technique employed for skin temperature assessment alter outcomes? A systematic review (PDF). Physiol. Meas., 36, 9, R27–51, September 2015, Bibcode:2015PhyM…36R.27B. PMID 26261099. 36. Costello, J.T., McInerney, C.D., Bleakley, C.M., Selfe, J., Donnelly, A.E., The use of thermal imaging in assessing skin temperature following cryotherapy: A review (PDF). J. Therm. Biol., 37, 2, 103–110, 2012-02-01. 37. Using Thermography to Find a Class of Latent Construction Defects, Globalspec.com. Retrieved on 2013-06-18. https://psychology.wikia.org/wiki/Thermography 38. Kylili, A., Fokaides, P.A., Christou, P., Kalogirou, S.A., Infrared thermography (IRT) applications for building diagnostics: A review. Appl. Energy, 134, 531–549, 2014. 39. Soroko, M. and Morel, M.C., Equine thermography in practice, CABI, Wallingford–Boston, 2016. 40. Sansivero, F., Vilardo, G., De Martino, P., Augusti V., Chiodini, G., Campi Flegrei volcanic surveillance by thermal IR continuous monitoring. 11th International Conference on Quantitative InfraRed Thermography. Istituto Nazionale di Geofisica e Vulcanologia - Osservatorio Vesuviano, Via Diocleziano 238 - 80124 Napoli, Italy, [email protected].

41. Infrared Building Inspections—Resources for Electrical, Mechanical, Residential and Commercial Infrared/Thermal Inspections, Infraredbuildinginspections.com (2008-09-04), Retrieved on 2013-06-18. http://daytonthermalinspection.com/contact-us/ 42. Yanmaz, L.E., Okumus, Z., Dogan, E., Instrumentation of thermography and its applications in horses. J. Anim. Vet. Adv., 6, 7, 858–62, 2007 Jul 1. 43. Soroko, M., Howell, K., Zielińska, P., Application of thermography in racehorse performance, in: 13th Quantitative Infrared Thermography Conference, pp. 765–769, 2016. 44. Soroko, M. and Howell, K., Infrared thermography: Current applications in equine medicine. J. Equine Vet. Sci., 60, 90–6, 2018 Jan 1. 45. Turner, T.A., Diagnostic thermography. Vet. Clin. North Am.: Equine Pract., 17, 1, 95–114, 2001 Apr 1. 46. Eddy, A.L., Van Hoogmoed, L.M., Snyder, J.R., The role of thermography in the management of equine lameness. Vet. J., 162, 3, 172–81, 2001 Nov 1. 47. Roemer, J., Pieczonka, L., Szwedo, M., Uhl, T., Staszewski, W.J., Thermography of Metallic and Composite Structures—Review of applications. e-J. Nondestruct. Test., International Workshop on SMART MATERIALS, STRUCTURES & SHM, NDT in Canada 2013 Conference & NDT for the Energy Industry, Calgary, Alberta, CANADA, 2013 Oct 7. 48. Starman, S. and Matz, V., Automated system for crack detection using infrared thermographic testing. 4th International CANDU In-service Inspection Workshop and NDT in Canada, Toronto, Ontario, 2012 June 18-21. 49. Wu, C.L., Yu, K.L., Chuang, H.Y., Huang, M.H., Chen, T.W., Chen, C.H., The application of infrared thermography in the assessment of patients with coccygodynia before and after manual therapy combined with diathermy. J. Manipulative Physiol. Ther., 32, 4, 287–93, 2009 May 1. 50. Lewis, R.W. and Harrison, R.M., Contact scrotal thermography: Application to problems of infertility. J. Urol., 122, 1, 40–2, 1979 Jul. 51. Maierhofer, C., Arndt, R., Röllig, M., Rieck, C., Walther, A., Scheel, H., Hillemeier, B., Application of impulse-thermography for non-destructive assessment of concrete structures. Cem. Concr. Compos., 28, 4, 393–401, 2006 Apr 1.

12 Prediction and Classification of Breast Cancer Using Discriminative Learning Models and Techniques M. Pavithra, R. Rajmohan*, T. Ananth Kumar and R. Ramya Department of CSE, IFET College of Engineering, Villupuram, Tamil Nadu, India

Abstract

Mammography is a specialized clinical imaging modality that uses low-dose X-ray equipment to examine the breast. A mammogram is the record of a mammography examination and helps in the early detection and diagnosis of breast disease in women. This work proposes to classify mammography breast scans into their respective classes and uses attention learning to localize the specific pixels of malignancy. Convolutional neural networks perform feature extraction from the mammography scans, which is thereafter fed into a recurrent neural network. Mammography images are equalized, enhanced, and augmented before features are extracted and weights are assigned to them as part of the data preprocessing procedures. This process assists in tumor localization in the case of breast cancer. In this work, breast cancer is detected using low-level preprocessing techniques and image segmentation. In image segmentation, the thresholding technique and the RCNN algorithm are compared using binarization. Keywords:  Breast cancer, attention learning, encoder–decoder, convolutional neural networks, region CNN

This chapter deals with the prediction and classification of breast cancer using discriminative deep learning algorithms.

*Corresponding author: [email protected] Muthukumaran Malarvel, Soumya Ranjan Nayak, Prasant Kumar Pattnaik and Surya Narayan Panda (eds.) Machine Vision Inspection Systems, Volume 2: Machine Learning-Based Approaches, (241–262) © 2021 Scrivener Publishing LLC



12.1 Breast Cancer Diagnosis Breast cancer is one of the significant causes of death among women. An efficient technique for the early detection and screening of breast disease is mammography, which is routinely used in diagnostic practice for symptomatic and screening purposes. Reading mammograms is a demanding task for radiologists and does not always give consistent results. Consequently, several computer-aided diagnosis (CAD) schemes [1] have been developed to improve the detection of the primary signs of this disease. The disease is treatable if detected early enough. Primary prevention seems impossible, since the causes of this disease remain unknown. The development of breast carcinoma has been associated with several well-recognized epidemiological risk factors, for example early menarche and late menopause, family history, and dietary, environmental, and genetic factors. Cells with similar function grow next to each other to form a normal tissue, for example brain tissue, muscle tissue, or bone tissue. As these normal cells proliferate, they begin to crowd and bump into one another; a phenomenon that scientists call cell recognition occurs, and a message is sent back to the individual cells in the tissue to stop proliferating. Cancer cells do not recognize this phenomenon; they continue to grow and multiply and cause the tissue to expand into a larger mass called a tumor. Small clusters of microcalcifications, appearing as a collection of white spots on mammograms, provide an early warning of breast cancer. Microcalcifications are tiny bits of calcium that may appear in clusters or patterns (like circles) and are associated with extra cell activity in breast tissue. Usually the extra cell growth is not cancerous, but sometimes tight clusters of microcalcifications can indicate early breast cancer, whereas scattered microcalcifications are usually a sign of benign breast tissue. Digital mammography images are displayed on a computer screen and can be enhanced, i.e., brightened or darkened, before they are printed on film. Image processing techniques are widely used in several clinical areas for image enhancement in early detection and treatment stages, where the time factor is critical for finding abnormalities in target images, particularly in various malignant tumors, for instance breast cancer, lung cancer, etc.

The main scope of this work is to analyze various approaches to predicting breast cancer and to improve the classification accuracy using the region convolutional neural network (RCNN). This work proposes to classify mammography breast scans into their respective classes and uses attention learning to localize the specific pixels of malignancy using a heat-map overlay. The attention learning model is a standard encoder–decoder circuit whereby convolutional neural networks perform the encoding and region convolutional neural networks perform the decoding. Convolutional neural networks allow feature extraction from the mammography scans, which is thereafter fed into a recurrent neural network that focuses on the location of malignancy based on the weights assigned to the extracted features over a series of iterations, during which the weights are continually adjusted owing to the feedback acquired from the preceding iteration or epoch [2]. Mammography images are equalized, enhanced, and augmented before extracting the features and assigning weights to them as part of the data preprocessing procedures. This process assists in tumor localization in the case of breast cancer.

12.2 Breast Cancer Feature Extraction Breast cancer is the most frequent cancer in women around the world. The disease is curable if detected early enough. Primary prevention seems impossible, since the causes of this disease remain unknown. The principal target of this work is to predict breast malignancy and improve the classification precision for breast disease using a discriminative deep learning model. The various studies on existing frameworks are discussed in this literature review. Convolutional neural networks (CNNs) have been effectively used for histology image analysis. The classification of breast cancer histology images into normal, benign, and malignant sub-classes is related to cell density, variability, and organization along with overall tissue structure and morphology. Given this, both smaller- and larger-size patches are extracted from histology images, covering cell-level and tissue-level features; large input patches were used to train stacked CNNs to learn both cell information and global tissue structure [3]. This strategy improved the accuracy of breast cancer prediction and classification but failed to work effectively on images with a low amount of contrast.

The no-sampling linear sampling method (nSLSM) [4] is applied to microwave imaging of cancerous tissues in realistic models of the female breast obtained from magnetic resonance imaging. The nSLSM has been applied to the detection of tumors in two different phantoms by using simulated noisy data, and a set of error parameters able to assess the quality of nSLSM breast cancer detection has been proposed and used. This detection methodology can be a tool to improve the specificity of other common imaging procedures, primarily for screening purposes. The main advantages of this method are its negligible invasiveness and the reduced cost of devices for microwave signal generation. However, the implementation is computationally expensive. The LBP-feature breast cancer detection methodology [5] proposes a method for breast cancer detection at an early stage using local binary pattern (LBP) features. The background and the breast region are separated using the Otsu method. The local binary pattern values for the breast image are calculated; this is valuable in early detection and treatment stages, where the time factor is critical for finding abnormalities in target images, particularly in various malignant tumors, for instance breast cancer, lung cancer, etc. A support vector machine (SVM) based on the uniform LBP histogram is used to classify whether the corresponding region is normal or malignant, evaluated on mammographic images obtained from the DDSM and MIAS databases. The advantage is that this approach achieves high discriminative power and computational simplicity. The major drawback is its limited capacity for capturing structural information; moreover, for detection, only the pixel difference is used and magnitude information is ignored. A novel approach for breast cancer detection and segmentation in mammograms [6] is proposed to recognize the cancerous area and then segment that particular zone. In the detection phase, a malignant region is identified by applying an averaging filter and a thresholding operation to the original input image. With the help of a rectangular window over the detected cancerous region, max-mean and least-variance techniques are applied to locate the malignant tissue. Finally, by applying a morphological closing operation, the tumor patch is identified and the region boundary is extracted using the image gradient technique. The advantages of this technique include that screening regularly allows the detection of malignant growths at an early stage of development. However, such approaches lag behind because of scheduling drawbacks, for example waiting periods and increased anxiety when additional examinations are required. Likewise, a decision tree, as a non-parametric machine learning method, has been used.
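As a hedged sketch of the Otsu-plus-LBP pipeline described above (not the cited authors' exact implementation), the MATLAB fragment below separates the breast from the background with Otsu's threshold, extracts a uniform LBP histogram with the Computer Vision Toolbox, and indicates where an SVM would consume it. The file name and the commented-out training data are placeholders.

```matlab
% Sketch: Otsu background separation + uniform LBP features + SVM hook.
I = imread('mammogram.png');                           % hypothetical input file
if ndims(I) == 3, I = rgb2gray(I); end
I = im2double(I);

mask = imbinarize(I, graythresh(I));                   % Otsu: breast vs background
I(~mask) = 0;                                          % suppress the background

feat = extractLBPFeatures(I, 'NumNeighbors', 8, ...
                          'Radius', 1, 'Upright', true);   % uniform LBP histogram

% An SVM would consume one such histogram per region; X and y below are
% placeholders for a labeled training set (e.g., from DDSM/MIAS).
% model = fitcsvm(X, y, 'KernelFunction', 'linear');
% label = predict(model, feat);
```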


12.3 Machine Learning in Breast Cancer Classification Machine learning is adopted in breast cancer classification to carry out the classification of breast cancer scans and the analysis of the malignancy related to every scan. Classification based on multiple association rules (CMAR): This technique proposed a neural network classifier. Nodes in the input layer each represent one attribute from each rule. The number of input nodes equals the number of attributes, the number of hidden nodes equals the number of rules, and the number of output nodes equals the number of mammography classes. Backpropagation is used for learning the network model, with 10-fold cross-validation and a sigmoid activation function. The sensitivity and specificity are calculated to plot the ROC curve and measure the overall performance of the classifier. Classifiers based on multiple association rules with a neural network yield an accuracy of 84.5%, albeit on a smaller dataset with a large bias in the training cases available under each class [7].

Two-dimensional discrete wavelet transform classifier: Feature vectors are generated by applying a gray-level co-occurrence matrix to the detail coefficients from the 2D discrete wavelet transform of the region of malignancy. Derivation of relevant features is performed via the f-test and t-test on random samples. The area under the receiver operating characteristic (ROC) curve is higher. Accuracy is abnormally high due to the absence of measures to avoid overfitting, such as dropout. Assignment of weights to the features representing the region of malignancy or interest, and adjustment of these weights from the obtained feedback, is absent, thereby denying the benefit of attention [8].

AdaBoost-based multiple support vector machines for recursive feature elimination (SVM-RFE) for mammogram classification: This is a wrapper-type feature selection procedure. Ranking of features is executed by SVM-RFE by calculating information gain during iterative backward feature elimination. In each iteration, SVM-RFE sorts the features in the working set in order of their distinction between the target classes and eliminates the feature with minimal distinction. An ensemble technique is used to combine SVM-RFE with the boosting method, carrying out replication of the original dataset by random resampling; to achieve greater improvement of this ensemble, each replicate differs from the others so as to obtain maximal classification accuracy. The ensemble technique of integrating multiple SVM-RFEs with AdaBoost performs splendidly on the classification paradigm, but an easier visualization of the region of interest on the scans in an independent mammography dataset is lacking [9].

Classification of normal and abnormal patterns for diagnosis of breast cancer in digital mammograms from the DDSM dataset performs feature extraction using a gray-level co-occurrence matrix (GLCM); a short sketch of GLCM feature extraction appears at the end of this section. This is followed by presenting the GLCM features as input to a neural network to train the classifier and test its performance on test data drawn from the DDSM dataset. It produced a classification output between one of two classes (cancer-positive and normal). The classification report had a significant number of true positives but an even larger number of false negatives, which indicated that the classifier misclassified many cancer-positive scans as normal ones. The classifier results were therefore not reliable [10].

Breast Imaging Reporting and Data System (BI-RADS) classification in mammography: This system describes aspects of the mammography scans such as mass, shape, densities, architectural distortions, and the region of lesions to report breast abnormalities such as fatty breasts, fibro-glandular breasts, and heterogeneously or homogeneously dense breasts. It additionally consists of a lexicon of descriptive representations of the anomalies as well as recommendations and annotations based on specific mammographic cases. Although the feature selection and data availability are varied as well as exhaustive, it does not provide an everyday visualization that would be more comprehensible to patients without much intervention from the radiologists [11].
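The sketch below illustrates GLCM texture features of the kind fed to the neural-network classifier described above, using the Image Processing Toolbox's graycomatrix/graycoprops. The input file name and the choice of offsets are illustrative assumptions.

```matlab
% Sketch: GLCM texture features for one region of interest.
I = imread('mammogram_roi.png');                       % hypothetical ROI image
if ndims(I) == 3, I = rgb2gray(I); end

offsets = [0 1; -1 1; -1 0; -1 -1];                    % 0, 45, 90, 135 degrees
glcm    = graycomatrix(I, 'Offset', offsets, 'Symmetric', true);
stats   = graycoprops(glcm, {'Contrast', 'Correlation', ...
                             'Energy', 'Homogeneity'});

% Average over the four directions for a direction-tolerant feature vector;
% this vector would form one row of the classifier's input matrix.
fv = [mean(stats.Contrast), mean(stats.Correlation), ...
      mean(stats.Energy), mean(stats.Homogeneity)];
disp(fv);
```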

12.4 Image Techniques in Breast Cancer Detection The most effective way to overcome the challenges of mammography is the use of artificial intelligence. AI-based digital image processing can help with the correct interpretation of the images. To increase radiologists' diagnostic performance, several computer-aided diagnosis (CAD) schemes have been developed to improve the detection of the primary signs of disease: masses and microcalcifications. Digital mammography is used for this purpose; it takes an electronic image of the breast and stores it directly on a computer. It has overcome the limitations of film mammography and has some potential advantages over it, namely: 1) wider dynamic range and lower noise, 2) improved image contrast, 3) better image quality, and 4) lower radiation dose.

An automated 3D segmentation method can be realized for trans-rectal ultrasound images, based on multi-atlas registration and a statistical texture prior. The atlas database includes registered images from previous patients and their segmented surfaces. Three orthogonal Gabor filter banks are used to extract texture features from each image in the database. Patient-specific Gabor features from the atlas database are used to train and then, using superpixel segmentation, to segment the image from a new patient. Digital mammography images are displayed on a computer screen and can be enhanced, i.e., brightened or darkened, before they are printed on film. Image processing techniques are widely used in several clinical areas for image enhancement in early detection and treatment stages, where the time factor is critical for finding deviations from the norm in target images, particularly in various malignant tumors, for instance breast disease, lung cancer, etc.
1)  Method of Detection There are several imaging methods for examination of the breast, including magnetic resonance imaging, ultrasound imaging, and X-ray imaging [12].
• Mammography/Thermography—Mammograms can detect many breast malignancies, but there is concern over false results and the dangers of radiation exposure that result from the tests. There are two new types of mammography: Computed Tomography Laser Mammography (CTLM) and Full-Field Digital Mammography (FFDM). The CTLM (Computed Tomography Laser Mammography) system uses cutting-edge laser technology, a special array of detectors, and proprietary computed algorithms. The CTLM system does not expose the patient to ionizing radiation or require breast compression. This modality is awaiting FDA approval. Digital mammography still uses low-energy X-rays that pass through the breast exactly like conventional mammograms, but they are recorded by means of an electronic digital detector rather than film. This electronic image can be displayed on a video screen like a TV or printed onto film. The radiologist can manipulate the digital mammogram electronically to magnify an area, change contrast, or adjust the brightness.

248  Machine Vision Inspection Systems • Ultrasound or sonogram can be utilized to decide if a bump is a pimple (containing liquid) or a strong mass and to decisively find the situation of a known tumor. a) ­Other Imaging Methods: Various other imaging techniques are currently accessible for distinguishing breast malignancy. At present, they are utilized predominantly to look into studies, and now and then to get more data about a tumor found by another technique. Every one of these new techniques creates an electronic picture that the specialist can break down for the nearness of an unusual breast protuberance. These include: b) ­Scintigraphy: Also called scintimammography, this test utilizes an uncommon camera to show where a tracer (a radioactive concoction) has clung to a tumor. A scanner is then used to check whether the breast bump has gotten a greater amount of the radioactive material than the remainder of the breast tissue. Dr. Fleming in Omaha has been utilizing this methodology. There are likewise clinical preliminaries for this methodology. c) ­MRI: A magnetic resonance imaging (MRI) machine utilizes an enormous magnet and radio waves to quantify the electromagnetic signs your body normally radiates. It makes exact pictures of within the body, including tissue and liquids. X-ray can likewise be utilized to check whether a silicone breast embed has spilled or burst. d) ­PET scan: Disease cells become quicker than different cells, so they go through vitality quicker, as well. To gauge how quick glucose (the body’s fuel) is being utilized, a tracer (radioactive glucose) is infused into the body and examined with a Positron Outflow Tomography (PET) machine. The PET machine distinguishes how quickly the glucose is being utilized. On the off chance that it is being spent quicker in specific spots, it might show the nearness of a harmful tumor.

12.5 DIP-Based Breast Cancer Classification To address the breast cancer classification problem, digital image processing is implemented for detection and the following steps are followed [13]. The dataset has numerous features extracted from a wide

variety of cancer images. These features were extracted using sandbox tools. Each feature represents a particular characteristic or behavior of a breast cancer growth. Figure 12.1 shows the breast cancer detection framework; a special ultrasound camera is used for taking the images. An image is made up of a finite number of elements called pixels, each of which has a particular location and value. Image cropping is the process of removing selected pixels from a digital image; it is done to make all the images of equal size. Cropping is the removal of unwanted outer regions from a photographic or illustrated picture. The procedure, for the most part, consists of removing some of the peripheral areas of an image to remove extraneous clutter, improve its framing, change the aspect ratio, or emphasize or separate the subject from its background. Segmentation partitions an image into distinct regions, each containing pixels with similar attributes.
1)  Image Enhancement Enhancement is the process of manipulating an image so that the result is more suitable than the original for a specific application. Intensity transformations are mainly used in image enhancement. Image enhancement is the improvement of digital image quality (needed, for example, for visual examination or machine analysis), without knowledge about the source of degradation [14]. Image enhancement can be done in two domains: a. the spatial domain and b. the frequency domain. The spatial domain refers to image processing techniques that are based on the direct manipulation of pixels in the image; the pixel values are manipulated to achieve the desired enhancement.

Figure 12.1  The breast cancer detection framework: input image → image cropping → segmentation → feature extraction → classification → output.

In frequency-domain methods, the image is first transformed into the frequency domain. This implies that the Fourier transform of the image is computed first; all the enhancement operations are performed on the Fourier transform of the image, and the inverse Fourier transform is then performed to get the resultant image. These enhancement operations are performed to modify the image brightness, contrast, or the distribution of the gray levels. As a result, the pixel values (intensities) of the output image are modified by the transformation function applied to the input values. Image enhancement in the spatial domain is done using two techniques:
• Histogram equalization
• Image sharpening.
Histogram equalization is a basic technique and can be done using inbuilt functions of the MATLAB Image Processing Toolbox. Image sharpening, on the other hand, requires designing a suitable mask for the application.
a) Histogram Equalization
Histogram equalization is a common strategy for improving the appearance of images. Suppose we have a predominantly dark image. Its histogram would then be skewed toward the lower end of the gray scale, with all the image detail compressed into the dark end of the histogram. If we could 'stretch out' the gray levels at the dark end to produce a more uniformly distributed histogram, the image would become much clearer. The strategy is useful for images with backgrounds and foregrounds that are both bright or both dark. In particular, the technique can produce better views of bone structure in X-ray images, and better detail in photographs that are over- or under-exposed. A key advantage of the technique is that it is a fairly straightforward method and an invertible operator: if the histogram equalization function is known, then the original histogram can be recovered. The calculation is not computationally intensive. A drawback of the method is that it is indiscriminate: it may increase the contrast of background noise while decreasing the usable signal. A short sketch follows.
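A minimal MATLAB sketch of histogram equalization with the Image Processing Toolbox's histeq, assuming a hypothetical input file name:

```matlab
% Sketch: histogram equalization and before/after histograms.
I = imread('mammogram.png');                 % hypothetical input file
if ndims(I) == 3, I = rgb2gray(I); end

J = histeq(I);                               % flatten the gray-level histogram

subplot(2, 2, 1); imshow(I); title('Original');
subplot(2, 2, 2); imhist(I); title('Original histogram');
subplot(2, 2, 3); imshow(J); title('Equalized');
subplot(2, 2, 4); imhist(J); title('Equalized histogram');
```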

b) Image Sharpening
Image sharpening falls into a class of image processing called spatial filtering. A spatial filter consists of a neighborhood and a pre-defined operation performed on the image pixels defining the neighborhood. The result of filtering is a new pixel at the coordinates of the neighborhood's center, with a value defined by the operation. If the operation is linear, the filter is said to be a linear spatial filter. The key goal of sharpening is to highlight changes in intensity. Uses of image sharpening vary and include applications ranging from electronic printing and clinical imaging to industrial inspection and autonomous guidance in military systems. Since averaging is analogous to integration, it is logical to reason that sharpening can be accomplished by spatial differentiation. Fundamentally, the strength of the response of a derivative operator is proportional to the degree of intensity discontinuity of the image at the point where the operator is applied. The procedure is:
1. Read the image using imread.
2. Convert the image to a grayscale level.
3. Convert the image to double, which enables mathematical operations on the image.
4. Apply the 3 × 3 Laplacian mask to every pixel of the image to obtain the sharpened image.
5. Display the sharpened image.
A sketch of these steps is shown below. The results are shown for the same image as used in histogram equalization and the results are compared. As shown in Figure 12.2, the original image has smoothed boundaries, and after using a sharpening mask the result has sharp boundaries. Having sharp boundaries is especially important for this application of cancer detection. Sharpening shows better results than histogram equalization: histogram equalization smoothed the edges, which is highly unsuitable for the detection of cancerous tumors. In order to detect the cancerous tissue accurately, sharpening is very much needed, and thus the sharpened image is taken to the next level of processing.
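A hedged MATLAB sketch of the five steps above, using a hypothetical file name and the Image Processing Toolbox's standard 3 × 3 Laplacian mask:

```matlab
% Sketch: Laplacian-mask sharpening following the five listed steps.
I = imread('mammogram.png');                   % step 1: read (hypothetical file)
if ndims(I) == 3, I = rgb2gray(I); end         % step 2: convert to grayscale
I = im2double(I);                              % step 3: double for arithmetic

lap   = fspecial('laplacian', 0);              % step 4: 3x3 Laplacian mask
edges = imfilter(I, lap, 'replicate');         % second-derivative response
J     = min(max(I - edges, 0), 1);             % subtract Laplacian, clamp to [0,1]

imshowpair(I, J, 'montage');                   % step 5: display original vs sharpened
```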

Figure 12.2  Image sharpening (panels: input image, double, gray, sharpened).

1)  Image Segmentation
Image segmentation is an essential technique for most image analysis tasks. Segmentation partitions an image into distinct regions containing pixels with similar attributes. To be meaningful and useful for image analysis and understanding, the regions should correspond closely to depicted objects or features of interest. Meaningful segmentation is the first step from low-level image processing, which transforms a grayscale or color image into one or more distinct images, to higher-level image description in terms of features, objects, and scenes. The success of image analysis depends on the reliability of the segmentation, but accurately partitioning an image is generally a very challenging problem [14]. Segmentation techniques are either contextual or non-contextual. The latter ignore spatial relationships between features in an image and group pixels based on some global attribute, for instance gray level or color. Contextual techniques additionally exploit these relationships, for instance grouping together pixels with similar gray levels and close spatial locations. The image that is obtained after image sharpening, as shown in Figure 12.3, is used for segmentation purposes.

Figure 12.3  Sharpened image.

Non-Contextual Thresholding: Thresholding is one of the most powerful tools for image segmentation, and it is the most direct non-contextual segmentation technique. With a single threshold, it converts a grayscale or color image into a binary image, considered as a binary region map. The binary map contains two possibly disjoint regions, one containing pixels with input values smaller than the threshold and another corresponding to the input values at or above the threshold. The former and latter regions are usually labeled with zero (0) and non-zero (1) labels, respectively. The segmentation depends on the image property being thresholded and on how the threshold is chosen. Generally, non-contextual thresholding may involve two or more thresholds, as well as produce more than two types of regions, such that ranges of input image values related to each region type are separated by thresholds.

Simple Thresholding: Thresholding is a non-linear operation that converts a grayscale image into a binary image where the two levels are assigned to pixels that are below or above a specified threshold value. The most common image property to threshold is the pixel gray level.

Global Thresholding: When the threshold is a constant applied over the entire image, the procedure is called global thresholding. Global thresholding is widely used in image processing to produce binary images, which are used by various pattern recognition systems. Typically, many features that are present in the original gray-level image are lost in the resulting binary image. The fundamental questions are whether it is possible and, if so, how to choose an adequate threshold, or multiple thresholds, to separate one or more desired objects from their background. The algorithm for image segmentation using global thresholding is as follows (a sketch appears below):
1. Read the image using the imread function.
2. Convert the image to a grayscale level.
3. Convert the image to double, which enables mathematical operations on the image.
4. Compare each pixel value with the global threshold value 0.6 (say) and assign the pixels with intensity less than 0.6 the value 0 and those greater than 0.6 the value 1.
5. The output image has two intensity values, 0 and 1, resulting in the segmented image.
6. Display the output image, that is, the segmented image.
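A minimal MATLAB sketch of the six steps above, with the fixed threshold of 0.6 from the text and a hypothetical input file name:

```matlab
% Sketch: global thresholding with a fixed threshold of 0.6.
I = imread('sharpened.png');                 % hypothetical sharpened input
if ndims(I) == 3, I = rgb2gray(I); end
I = im2double(I);                            % intensities now in [0, 1]

T   = 0.6;                                   % global threshold from the text
seg = I > T;                                 % logical map: 1 above T, 0 below

imshowpair(I, seg, 'montage');               % input vs segmented result
```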

2)  Data Analysis The dataset is from the Bioimaging 2015 breast histology classification challenge, composed of high-resolution (2048 × 1536 pixels) H&E-stained breast cancer histology images. The images were digitized with a magnification of 200× and a pixel size of 0.42 μm × 0.42 μm. Two pathologists labeled images as normal, benign, in situ carcinoma, or invasive carcinoma according to the predominant cancer type in each image, without specifying the region of interest. This dataset, composed of a training set of 249 images, an initial test set of 20 images, and an extended test set of 16 images with increased ambiguity, is publicly available at https://rdm.inesctec.pt/dataset/nis-2017-003. The input images are the sharpened images and the output image is the thresholded two-intensity-level image. Figure 12.4 shows the results after global thresholding.

Figure 12.4  Result of image global thresholding (panels: original, double, gray, segmented).


12.6 RCNNs in Breast Cancer Prediction Current learning-based detection frameworks using convolutional neural networks (CNNs) have been successfully deployed for histology image analysis [15]. The classification of breast cancer histology images into normal, benign, and malignant sub-classes is related to cell density, variability, and organization along with overall morphology. In light of this, we extract smaller- and larger-size patches from histology images, including cell-level and tissue-level features, respectively. Notwithstanding, some sampled cell-level patches do not contain enough information to match the image label. A patch-screening method based on a clustering algorithm and a CNN is used to choose more discriminative patches. The approach proposed in this chapter is applied to the 4-class classification of breast cancer histology images and achieves good precision on the initial test set and on the overall test set. The results are competitive compared with the results of other state-of-the-art methods. The diagnosis of breast cancer histology images stained with hematoxylin and eosin is non-trivial and labor-intensive, and it often leads to disagreement between pathologists. Computer-aided diagnosis systems help pathologists improve diagnostic consistency and efficiency. Despite the enormous advances in CNNs, there exist notable disadvantages that decrease the efficiency of such systems: 1) human operators are involved in the breast cancer detection process; 2) this type of system reliably suffers from the chance of human error; 3) it cannot operate on images with a low amount of contrast; and 4) only a limited amount of the literature focuses on detection performance. RCNN is a state-of-the-art visual object detection framework that combines bottom-up region proposals with rich features computed by a convolutional neural network. It consists of two stages: first, using selective search, it identifies a manageable number of bounding-box object region candidates, and then it extracts CNN features from each candidate. The benefits of RCNN over other analytical techniques include: 1) reduction of a significant amount of workload and time for radiologists; 2) RCNNs are able to generalize, since they are trained by example; and 3) early detection of cancer reduces complications. Region convolutional neural networks (RCNNs) have been effectively used for histology image analysis. The classification of breast cancer histology images into normal, benign, and malignant

sub-classes depends on the cells' density, variability, and organization, along with the overall tissue structure and morphology.
1)  RCNN Segmentation
The term alludes to a ridge that divides areas drained by different river systems; a catchment basin (Figure 12.5) is the geographical area draining into a river or reservoir. Understanding the RCNN transform requires thinking of an image as a surface; for example, consider the image below.
2)  RCNN Algorithm
The RCNN transform finds "catchment basins" and "ridge lines" in an image by treating it as a surface where light pixels are high and dark pixels are low. Segmentation using the RCNN transform works well if you can identify, or "mark," foreground objects and background locations. Marker-controlled RCNN segmentation follows this basic procedure (a code sketch follows Figure 12.5):
1. Compute a segmentation function; this segments the dark regions of an object.
2. Compute foreground markers: connected blobs of pixels within each of the objects.
3. Compute background markers: pixels that are not part of any object.
4. Modify the segmentation function so that it has minima only at the foreground and background marker locations.
5. Compute the RCNN transform of the modified segmentation function.

Figure 12.5  Synthetically generated image of two dark blobs.
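The marker-controlled procedure above can be sketched as follows. The use of scikit-image, its sample image, and the marker thresholds 30 and 150 are all illustrative assumptions; the chapter does not name an implementation.

import numpy as np
from skimage import data, filters
from skimage.segmentation import watershed

image = data.coins()                     # stand-in sample image
gradient = filters.sobel(image)          # step 1: segmentation function

markers = np.zeros_like(image, dtype=np.int32)
markers[image < 30] = 1                  # steps 2-3: background markers (dark pixels)
markers[image > 150] = 2                 # foreground markers inside bright objects

labels = watershed(gradient, markers)    # steps 4-5: transform constrained by markers
print(np.unique(labels))                 # 1 = background region, 2 = object region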

Marker-controlled RCNN segmentation uses two marker types: external markers associated with the background and internal markers associated with the objects of interest. The problem the RCNN framework attempts to solve is finding objects in an image (object detection). A naive starting point is a sliding-window approach: slide rectangles of various sizes over the whole image and classify each smaller crop in a brute-force manner. Object detection is the process of locating and classifying objects in an image. One deep learning approach, regions with convolutional neural networks (RCNN), combines rectangular region proposals with convolutional neural network features. RCNN is a two-stage detection algorithm: the first stage identifies a subset of regions in an image that may contain an object, and the second stage classifies the object in each region. The RCNN detector illustrated in Figure 12.6 first generates region proposals using an algorithm such as Edge Boxes. The proposal regions are cropped out of the image and resized, and the CNN then classifies the cropped and resized regions. Finally, the region proposal bounding boxes are refined by a support vector machine (SVM) trained using CNN features (a sketch follows Figure 12.6). The trainRCNNObjectDetector function from MATLAB's Computer Vision Toolbox can be used to train an RCNN object detector; it returns an object that detects objects in an image.
3)  Prediction Analysis
To represent the region of malignancy in the mammography scan images, we apply an overlay over the image representing the expected malignancy class. The actual mammography class is then compared with the predicted class, and the accuracy of our model is calculated in order to evaluate the effectiveness of the training and the performance of the model on the test data.

Figure 12.6  RCNN system (pipeline: region proposal by selective search → feature extraction with CNN → support vector machine → bounding box regressor).
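To make the two-stage idea concrete, the sketch below runs torchvision's pre-trained Faster R-CNN, a later member of the RCNN family; using it here is an illustrative assumption, since the chapter's own detector is the classic RCNN with an SVM stage.

import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()
image = torch.rand(3, 512, 512)          # stand-in for a mammogram patch

with torch.no_grad():
    # stage 1 proposes candidate regions; stage 2 classifies each proposal
    # and refines its bounding box
    out = model([image])[0]

print(out["boxes"].shape, out["scores"][:3])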

Figure 12.7 shows the training status of the classifier, which predicts whether a breast scan is normal or cancerous, so that users can be warned early. The data are split into training and test sets (a sketch follows below). Training consists of learning a relation between data and attributes from one portion of the dataset, and testing consists of checking the predictions of this relation on another portion. By using separate data for training and testing, you can limit the effects of data irregularities and better understand the characteristics of the model. Figures 12.8 and 12.9 show that the RCNN algorithm gives better results than the Otsu algorithm, but optimum thresholding suits this application best: with the optimum thresholding technique the threshold can be chosen per mammogram, whereas in Otsu and RCNN the threshold is calculated from the overall image intensity. There are many challenges in this detection process. IDC means the input image is affected by cancer, while the unaffected image represents the normal case. Figure 12.10 shows the detection of breast cancer, and Figure 12.11 shows the detection result for a normal image.
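A minimal sketch of this train/test protocol, assuming scikit-learn and placeholder feature vectors (neither is specified by the chapter):

import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((1000, 32))               # placeholder per-image feature vectors
y = rng.integers(0, 2, size=1000)        # placeholder labels: 0 = NORMAL, 1 = IDC

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)
print(X_train.shape, X_test.shape)       # (800, 32) (200, 32)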

Figure 12.7  Accuracy of training (train_acc vs. val_acc over 30 epochs; accuracy axis from about 0.84 to 0.98).

Figure 12.8  RCNN result (panels: gray, sharpening, watershed).

Figure 12.9  RCNN output image (watershed).

Figure 12.10  Breast cancer detection for cancer (prediction = IDC).

Figure 12.11  Breast cancer detection for normal (prediction = NORMAL).

12.7 Conclusion and Future Work
Image processing is an indispensable tool for processing digital images, especially mammograms, in detecting cancers. In image enhancement, comparing histogram equalization and sharpening, sharpening gave better results for this application. In image segmentation, one algorithm shows over-segmentation characteristics while the RCNN algorithm segments moderately. The feature extraction results indicate whether the mammogram is cancer-prone or cancer-free. Several other algorithms are more sophisticated and offer enhanced features; this chapter presents the basic processing and can be used for initial diagnosis. Technology has been developing exponentially over the last few decades, and developments in the digital image processing field are likewise increasing and are

marching towards motion detection and robots endowed with the capability of the human eye. Embedding these kinds of processes and applications can make the analysis and detection not only of breast cancer but also of other cancers possible. Malignancy localization is an essential part of tumor diagnosis. Paying particular attention to the pixels related to the tumor, and visualizing that attention easily, can help us draw inferences from the mammography scans. An encoder-decoder structure is used for this, whereby the CNN plays the role of the encoder by extracting the malignancy features from the scans, and the RCNN plays the role of the decoder by assigning and adjusting the weights for the extracted features based on the feedback acquired over a number of iterations. A heat-map overlay may additionally be used to visualize the location of malignancy. The classifier is trained using the augmented training data of mammography scans and is tested by predicting the classes of the mammography images in the test data, along with applying the attention mechanism to the test images whose classes were predicted by the classifier.

References
1. Mittal, N. and Sahni, P., Breast Cancer Detection Using Image Processing Techniques, Advances in Interdisciplinary Engineering, Lecture Notes in Mechanical Engineering, pp. 813–823, Springer, Singapore, 2019.
2. Gupta, B. and Singh, A.K., A novel approach for breast cancer detection and segmentation in a mammogram, Eleventh Int. Multi-Conference on Information Processing, Procedia Comput. Sci., 54, 676–682, 2015, 10.1016/j.procs.2015.06.079.
3. Li, Y., Classification of Breast Cancer Histology Images Using Multi-Size and Discriminative Patches Based on Deep Learning. IEEE Access, 7, 2019, 10.1109/ACCESS.2019.2898044.
4. Bozza, G., Brignone, M., Pastorino, M., Application of the No-Sampling Linear Sampling Method to Breast Cancer Detection. IEEE Trans. Bio-Med. Eng., 57, 2525–2534, 2010, 10.1109/TBME.2010.2055059.
5. Král, P. and Lenc, L., LBP features for breast cancer detection. IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, pp. 2643–2647, 2016.
6. Basha, S. and Prasad, K., Automatic detection of breast cancer mass in mammograms using morphological operators and fuzzy c-means clustering. J. Theor. Appl. Inf. Technol., 5, 704–709, 2009.

7. Zaiane, O.R., Antonie, M.-L., Coman, A., Mammography classification by an association rule-based classifier. Proceedings of the Third International Conference on Multimedia Data Mining, Springer-Verlag, 2002.
8. Beura, S., Majhi, B., Dash, R., Mammogram classification using two-dimensional discrete wavelet transform and gray-level co-occurrence matrix for detection of breast cancer. Neurocomputing, 154, 1–14, 2015.
9. Yoon, S. and Kim, S., AdaBoost-based multiple SVM-RFE for classification of mammograms in DDSM. BMC Med. Inf. Decis. Making, 9, 1, S1, 2009.
10. Nithya, R. and Santhi, B., Classification of normal and abnormal patterns in digital mammograms for diagnosis of breast cancer. Int. J. Comput. Appl., 28, 6, 21–25, 2011.
11. Balleyguier, C. et al., BIRADS™ classification in mammography. Eur. J. Radiol., 61, 2, 192–194, 2007.
12. Jochelson, M., Advanced imaging techniques for the detection of breast cancer. Am. Soc. Clin. Oncol. Educ. Book, 10, 65–69, 2012.
13. Guzman-Cabrera, R., Guzmán-Sepúlveda, J.R., Torres-Cisneros, M., May-Arrioja, D., Ruiz-Pinales, J., Ibarra-Manzano, O., Avina-Cervantes, J., Gonzalez-Parada, A., Digital Image Processing Technique for Breast Cancer Detection. Int. J. Thermophys., 34, 1519, 2013, 10.1007/s10765-012-1328-4.
14. Nixon, M. and Aguado, A., Feature Extraction and Image Processing for Computer Vision, 4th Edition, Academic Press, New Delhi, 2019.
15. Rakhlin, A., Shvets, A., Iglovikov, V., Kalinin, A., Deep Convolutional Neural Networks for Breast Cancer Histology Image Analysis. International Conference on Image Analysis and Recognition, pp. 737–744, 2018, 10.1007/978-3-319-93000-8_83.
16. Shahzad, A., Sharif, M., Raza, M., Hussain, K., Enhanced Watershed Image Processing Segmentation. J. Inf. Commun. Technol., 2, 1, 32–38, 2008.

13
Compressed Medical Image Retrieval Using Data Mining and Optimized Recurrent Neural Network Techniques
Vamsidhar Enireddy, Karthikeyan C.*, Rajesh Kumar T. and Ashok Bekkanti
Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation (KLEF), KL University, Vaddeswaram, Guntur, India

Abstract

In content-based image retrieval (CBIR), the query is an image and the system retrieves similar images as output. It uses features extracted from the images, and these features are stored with the compressed images in the database for the similarity measures. The Daubechies wavelet is used for compressing the images so that storage and transmission of the images take less space and bandwidth. Edge features are extracted using the Sobel edge detector and texture features using Gabor filters. A feature selection process is used to select the best feature subset: feature selection algorithms such as Information Gain (IG), Mutual Information (MI) and the proposed wrapper-based Cuckoo Search (CS) technique are used. A proposed recurrent neural network, optimized using the hybrid cuckoo search, is used for classification and is compared with existing techniques. The CS-PSO-based FS with the proposed activation RNN-BPTT and CS-PSO optimization improves classification accuracy by 4.43%, 2.12%, 3.27%, 1.13%, 4.87%, 3.85%, 1.13% and 0.70% when compared, respectively, with: MI-based FS, RNN-BPTT with CS-PSO optimization; MI-based FS, proposed activation RNN-BPTT with CS-PSO optimization; CS-based FS, RNN-BPTT with CS-PSO optimization; CS-based FS, proposed activation RNN-BPTT with CS-PSO optimization; CS-PSO-based FS, RNN-BPTT; CS-PSO-based FS, RNN-BPTT with CS optimization; CS-PSO-based FS, RNN-BPTT with CS-PSO optimization; and CS-PSO-based FS, proposed activation RNN-BPTT with CS optimization.
*Corresponding author: [email protected]


Keywords:  Medical image, compression, data mining, neural network, image retrieval

13.1 Introduction
Medical imaging is an emerging field with a growing range of services and applications related to telehealth, biomedicine, and varied telemedical analyses. Voluminous data are embedded in medically produced images from procedures like PET/CT, bone densitometry, MRI, ultrasound and related medical scans. These images need large storage space and are difficult to manage, and they demand high-end networks for transmission, as in telemedicine. The proposed method successfully compresses medical images and also provides techniques to classify the compressed images, which is useful in telemedicine. A data compression application is used to encode the given original image with few bits: image compression reduces image redundancy and stores/transmits data efficiently. Such systems aim at reducing storage space while still displaying a decoded image on an output device close to the original. This chapter:
• Studies retrieval of compressed medical images from huge databases. In CBIR, image classification is an important criterion, and hence a new classification algorithm is proposed.
• Compresses medical images using Daubechies and Haar wavelets, uses a Gabor filter for texture features and a Sobel edge detector for extracting edge features, and performs classification using a Partial Recurrent Neural Network (PRNN).
• Optimizes the neural network's learning rate using a hybrid of Particle Swarm Optimization (PSO) and the Cuckoo Search (CS) algorithm.
The chapter is organized as follows: Section 13.2 reviews research relevant and specific to CBIR, medical image compression, and classifiers in CBIR. Section 13.3 explains the hybrid cuckoo search framework for feature selection and classifier optimization in compressed medical image retrieval. Section 13.4 presents detailed results and discussion. The conclusion is discussed in Section 13.5 along with the future scope of this work.


13.2 Related Work
13.2.1 Approaches in Content-Based Image Retrieval (CBIR)
CBIR is based on the (automated) matching of the features of a query image with those of database images through an image-to-image similarity evaluation. Hence, images are indexed by their visual content: underlying features like color (an image's color intensity distribution), texture (the presence of homogeneous visual patterns not arising from a single color or intensity), shape (the boundaries and interiors of an image's objects), any other visual feature, or a combination of elementary visual features [1]. Figure 13.1 shows that a CBIR system has four steps:
• Image database feature extraction/indexing [36], done according to selected features like shape, color, texture or all three together.
• Query image feature extraction.
• Matching the query image to related images in the database, based on an image-to-image resemblance measure, which forms CBIR's search.
• A user interface and feedback that govern the display of outcomes, their ranking, and the type of user interaction, with the option of filtering the search by automatic and manual preference schemes.

Figure 13.1  Generic CBIR system (components: query and image sets mapped into a feature space; a search engine over the database returns the most similar images to the user through the user interface).

It is possible to share visual information globally through computer networks and the World Wide Web (WWW) due to the availability of huge multimedia databases and digital libraries and to advances in information technology. Large databases of medical, educational, industrial and scientific applications need efficient, automatic procedures to index and retrieve images [2]. In Ref. [3], the authors presented a CBIR technique extracting color features using color moments (CM) on color images and texture features using local binary patterns (LBP) on grayscale images; these features were then merged to form a single feature vector. LBP was mainly applied to face recognition [34], providing an accurate, efficient and less complex retrieval system. The authors in Ref. [4] proposed a region-based image retrieval method using a shape-adaptive DCT (SA-DCT). Simulation results showed that the new method identifies the main objects and decreases the effect of the background in the image, thereby improving image retrieval in comparison [35] with a conventional DCT-based CBIR. Li & Miao [5] suggested a novel image matching approach describing the image class, where the process is structured into a set of hypothesis tests that model positive and negative hypotheses and validate the query image against the two. A universal image model (UIM) was evaluated to model the two hypotheses, and the derived UIM was applied as a reference for calculating adapted models for each image in the database. The results on the datasets indicated that the approach had enhanced, robust and evident performance.

13.2.2 Medical Image Compression
A new lossy compression scheme uses singular value decomposition (SVD) followed by Huffman coding, decomposing the image by SVD and reducing the rank by discarding some of the singular values together with the corresponding rows and columns of the factor matrices. The scheme was assessed on several medical images with good results [6]. The general framework of digital image data compression is illustrated in Figure 13.2 [7]. Sanchez & Bartrina-Rapesta [8] presented improvements to the High Efficiency Video Coding (HEVC) intra coding process for lossless compression of grayscale medical images. In particular, the alternative angular and planar prediction modes were replaced by sample-wise DPCM with an extended range of directionalities. The authors recommended implementing the DPCM


Figure 13.2  Image compression framework (pipeline: image → lossless image transformation → lossy quantization → entropy encoding → compressed image, with table specifications feeding the quantization and encoding stages).

decoding process while maintaining the block-wise coding structure of HEVC; evaluation results on various medical images showed that the proposed DPCM modes efficiently predicted a considerable number of edges in these images, achieving bit-rate savings of up to 15%. A perceptual model was proposed in Ref. [9] in the spatial domain, based on background luminance, merged with a lossless compression framework to remove visually redundant information, applying block matching on the anatomical symmetry in medical images to remove intra- and inter-slice correlations.

13.2.3 Image Retrieval for Compressed Medical Images
The authors of Ref. [10] developed a scalable image retrieval framework with high-dimensional features for histopathology image analysis. The proposed method used a supervised kernel hashing technique, which exploits a small amount of labeled data to compress the image feature vector into only tens of binary bits with informative signatures. A clustering method [32] using dictionary learning was suggested in Ref. [11] to group big medical databases into clusters: similar images are gathered and represented by dictionaries, which are learned jointly via K-SVD. A resemblance measure is used to compare the input image and assign it to a cluster based on its dictionary. The performance of the proposed method was verified on the Image Retrieval in Medical Applications (IRMA) test image database, and in the experimental results the retrieval of medical images was observed to be efficient.


13.2.4 Feature Selection in CBIR
A content-based medical image retrieval (CBMIR) framework with an improved feature selection strategy was recommended in Ref. [12], applying a hybrid of the branch-and-bound algorithm and the artificial bee colony algorithm to breast cancer, brain tumor and thyroid images, performing classification by a fuzzy-based relevance vector machine (FRVM) to create groups of significant image features, and using a relevance feedback technique with varied density computation to increase the performance of CBMIR. A novel CBIR method uses color and wavelet features followed by ant colony optimization (ACO); this method uses wavelet transformation and RGB-HSV feature vector representations of texture and color features, respectively, for each image in the database [13]. Retrieval results are sensitive to the image features applied in CBIR. The method was compared with earlier proposed systems to evaluate the performance of the proposed CBIR schema, and the results indicated higher precision and recall than the older ones for the majority of image categories. In Ref. [14] the authors compared five feature selection routines, namely Relief-F, Information Gain, Gain Ratio, OneR and the statistical measure χ2, with a fuzzy-rough method. The primary purpose of the study was to rank image features; the results demonstrated that the retrieval framework with fuzzy-rough feature selection had better retrieval precision.

13.2.5 CBIR Using Neural Network
An artificial neural network (ANN)-based distributed processing architecture for semantic CBIR improves retrieval further with learning through feedback while browsing results; the proposed distributed processing yields fast retrieval of images, and the framework was validated over a local area network (LAN) [15]. An optimized PCNN extracts an image signature from the image as a numeric feature vector, tuning the PCNN [37] parameters to improve signature quality and classification results [16]. Further, in the search space, images are matched and classified using the K-Nearest Neighbor (K-NN) algorithm; instead of multiple categories, matching and classification were optimized into a single category. A CBIR prototype was developed to validate and demonstrate efficient classification and retrieval of images.


13.2.6 Classification of CBIR
A work in progress for CBIR and classification uses an interest point detector and descriptor named SURF combined with Bag-of-Visual-Words (BoVW). The combination yielded better retrieval than other methods; a new dictionary-building method was proposed, and testing on the highly diverse COREL1000 database indicated more discriminative classification and retrieval results [17]. An unsupervised image grouping model finds the semantic quantity of images with probabilistic LSA, which groups the semantic quantity using the K-means algorithm [18]. Three adaptive clustering algorithms, partition-based, model-based, and density-based, were used to separate local color and texture features for image classification, with testing on the openly available WANG database [19]. The outcomes demonstrated that the adaptive EM/GMM algorithm performed better than the adaptive k-means and mean shift algorithms. A new system was recommended to enhance the performance of the current bag-of-words based image classification process by introducing a pair-wise image matching scheme to choose discriminative features after feature extraction, applied as a refinement step for the current image classification and retrieval process. The outcomes demonstrated that the share of discriminative features was enhanced substantially, to 87%, by the applied selection [20]. In Ref. [21] the authors assembled the feature vector from multi-level feature extraction using two machine learning strategies: KNN and SVM learning routines were used to classify images, ImageCLEFmed 2005 was used as the database for the classification approaches, and principal component analysis was applied three times to decrease the length of the feature vector. The outcomes demonstrated substantially improved accuracy compared with similar classification approaches on the same database.

13.3 Methodology
In this work, the images are compressed using the Daubechies wavelet with Huffman coding such that visually lossless compression is achieved. The compressed images are divided into regions, and features are extracted using the Sobel edge detector and Gabor filters. Features are chosen using Mutual Information

(MI) and the recommended wrapper-based CS technique. A multilayer perceptron and the proposed RNN are used for classification. Figure 13.3 shows the proposed framework to retrieve medical images. The techniques used are detailed in the subsequent subsections, and a sketch of the MI-based selection step follows below.
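This is a hedged sketch of the MI filter step; scikit-learn and the feature counts are assumptions, since the chapter does not specify an implementation. The wrapper-based CS selector would instead score candidate feature subsets by classifier accuracy.

import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif

rng = np.random.default_rng(0)
X = rng.random((200, 40))                # placeholder Gabor + Sobel feature vectors
y = rng.integers(0, 5, size=200)         # five image classes

selector = SelectKBest(mutual_info_classif, k=10).fit(X, y)
X_best = selector.transform(X)           # keep the 10 most informative features
print(selector.get_support(indices=True))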

13.3.1 Huffman Coding
Huffman coding is a lossless entropy-encoding algorithm for data compression. The name refers to the use of a variable-length code table for encoding each source symbol, where the table is derived from the estimated probability of occurrence of each symbol; Huffman coding is thus based on a data item's frequency of occurrence [22]. Huffman algorithms come in static and adaptive variants; the static Huffman algorithm encodes data in two passes. A worked example:

Step 1: Given a string, count the frequency of each symbol, e.g. C = 33, M = 25, K = 8, Z = 3.
Step 2: Sort the data by frequency.
Step 3: Find the two smallest frequency counts (here Z = 3 and K = 8).
Step 4: Combine them under a node carrying their sum (11) and update the data.
Step 5: Repeat steps 2, 3 and 4. The final Huffman tree is as follows:

                69
               /  \
            33 C    36
                   /  \
                25 M    11
                       /  \
                    8 K    3 Z
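The two-pass static scheme above can be sketched in a few lines of Python; the heapq-based tree construction is an implementation choice for illustration, not the chapter's code.

import heapq

def huffman_codes(freqs):
    # heap entries: (frequency, tie-breaker, tree); a tree is a symbol
    # (leaf) or a (left, right) pair (internal node)
    heap = [(f, i, s) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)      # two smallest frequencies
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next_id, (t1, t2)))   # combine with their sum
        next_id += 1
    codes = {}
    def walk(tree, prefix=""):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"
    walk(heap[0][2])
    return codes

print(huffman_codes({"C": 33, "M": 25, "K": 8, "Z": 3}))
# C receives the shortest code; K and Z the longest, mirroring the tree above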

13.3.2 Haar Wavelet
Using the Haar transform, images are converted from the spatial to the frequency domain [23]. Sums and differences are calculated from adjacent elements of the given data. The mother wavelet function ψ(t) is given in Equation (13.1):

\Psi(t) = \begin{cases} 1 & t \in [0, 1/2) \\ -1 & t \in [1/2, 1) \\ 0 & t \notin [0, 1) \end{cases} \qquad (13.1)

and the scaling function φ(t) is described in Equation (13.2):

\phi(t) = \begin{cases} 1 & t \in [0, 1) \\ 0 & t \notin [0, 1) \end{cases} \qquad (13.2)

The Haar transform decomposes a signal into two parts: an average (approximation) part and a detail (fluctuation) part. For a signal of length N with 2^n sample values, the first average subsignal a^1 = (a_1, a_2, \ldots, a_{N/2}) is given by Equation (13.3):

a_n = \frac{x_{2n-1} + x_{2n}}{2}, \quad n = 1, 2, \ldots, N/2 \qquad (13.3)

and the first detail subsignal d^1 = (d_1, d_2, \ldots, d_{N/2}) is given by Equation (13.4):

d_n = \frac{x_{2n-1} - x_{2n}}{2}, \quad n = 1, 2, \ldots, N/2 \qquad (13.4)

The transform is applied to the rows of a matrix. Placing the approximation parts of each row transform in the first n columns and the corresponding detail parts in the last n columns forms a first-level matrix with four blocks, each of dimension (number of rows / 2) × (number of columns / 2) [24]:

M = \begin{bmatrix} A & H \\ V & D \end{bmatrix}, \quad
A = \begin{bmatrix} t_{11} & t_{12} \\ t_{21} & t_{22} \end{bmatrix}, \quad
H = \begin{bmatrix} t_{13} & t_{14} \\ t_{23} & t_{24} \end{bmatrix}, \quad
V = \begin{bmatrix} t_{31} & t_{32} \\ t_{41} & t_{42} \end{bmatrix}, \quad
D = \begin{bmatrix} t_{33} & t_{34} \\ t_{43} & t_{44} \end{bmatrix}

(shown here for the 4 × 4 case), where A holds the approximation block and H, V and D hold the horizontal, vertical and diagonal detail blocks.
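A one-level 2-D version of this decomposition, following the chapter's averaging convention a = (x1 + x2)/2 and d = (x1 - x2)/2, might look as follows; the block naming mirrors the matrix above.

import numpy as np

def haar_step(x):
    # pairwise averages and differences along the last axis
    avg = (x[..., 0::2] + x[..., 1::2]) / 2.0
    det = (x[..., 0::2] - x[..., 1::2]) / 2.0
    return np.concatenate([avg, det], axis=-1)

def haar2d_level1(img):
    rows = haar_step(img)                 # transform each row
    cols = haar_step(rows.T).T            # then each column
    n, m = img.shape[0] // 2, img.shape[1] // 2
    A, H = cols[:n, :m], cols[:n, m:]     # approximation, horizontal detail
    V, D = cols[n:, :m], cols[n:, m:]     # vertical and diagonal detail
    return A, H, V, D

img = np.arange(16, dtype=float).reshape(4, 4)
A, H, V, D = haar2d_level1(img)
print(A)                                  # 2x2 approximation block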

13.3.3 Sobel Edge Detector
Edges mark boundaries and are important because they reduce useless data while preserving an image's key structural properties. Classical edge detection convolves the image with a 2-D filter, generally known as an operator; such a filter is highly sensitive to large gradients in an image and returns zero in uniform areas. Care must be taken in selecting the operator's orientation, and the edge structure and noise environment are considered when designing the edge-detection operator's variables [31]. The operator's geometry determines its sensitivity to edges: operators seek horizontal, vertical, or diagonal edges. The Sobel edge detector generates a series of gradient magnitudes via a simple convolution kernel [30]. The gradient of an image f(x, y) at position (x, y) is specified by the vector in Equation (13.5):

\nabla f = \begin{bmatrix} G_x \\ G_y \end{bmatrix} = \begin{bmatrix} \partial f / \partial x \\ \partial f / \partial y \end{bmatrix} \qquad (13.5)

The magnitude of the gradient vector along an edge direction is given by Equation (13.6) [25]:

\nabla f = \mathrm{mag}(\nabla f) = \left[ G_x^2 + G_y^2 \right]^{1/2} \qquad (13.6)

The magnitude of the gradient is often approximated as \Delta f = |G_x| + |G_y|, and the gradient vector direction is given by \alpha(x, y) = \tan^{-1}(G_y / G_x), in which the angle is measured from the x-axis. The Sobel operators give the digital equivalent of the gradient: G_x = (P_7 + 2P_8 + P_9) - (P_1 + 2P_2 + P_3) and similarly G_y = (P_3 + 2P_6 + P_9) - (P_1 + 2P_4 + P_7), where P_1 to P_9 are pixels in a sub-image as in Figure 13.4(a) [26]. Figure 13.4 shows the masks that compute G_x at the center of a 3 × 3 region and, likewise, G_y. The Sobel edge detector generates a series of gradient magnitudes through a simple convolution kernel.

13.3.4 Gabor Filter
Gabor filters model texture in image-understanding tasks because of the strong relations between different filter outputs. Visually distinct image regions can have the same first-order statistics; using second-order statistics improves the situation by considering not only gray pixel levels but also the spatial relations between them [27]. Both the 2-D Gabor function g(x, y) and its Fourier transform G(u, v) are given by Equation (13.7):

Figure 13.3  Flowchart of the proposed methodology (start → MRI input image → image compression with Daubechies wavelet and Huffman coding → texture features using Gabor filter and edge features using Sobel detector → feature selection with Mutual Information and the proposed technique → classification and performance measurement).

(a) Sub-image       (b) Horizontal mask       (c) Vertical mask

P1  P2  P3          -1  -2  -1                -1   0   1
P4  P5  P6           0   0   0                -2   0   2
P7  P8  P9           1   2   1                -1   0   1

Figure 13.4  Sobel masks. (a) Sub-image (b) Sobel mask for horizontal direction (c) Sobel mask for vertical direction.
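A sketch of the Sobel gradient computation using the masks above; SciPy is an assumption, as the chapter does not name an implementation.

import numpy as np
from scipy import ndimage

img = np.random.rand(64, 64)          # stand-in for a grayscale medical image

ky = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float)   # mask (b)
kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)   # mask (c)

Gx = ndimage.convolve(img, kx)        # response to vertical edges
Gy = ndimage.convolve(img, ky)        # response to horizontal edges

magnitude = np.hypot(Gx, Gy)          # Equation (13.6)
approx = np.abs(Gx) + np.abs(Gy)      # the |Gx| + |Gy| approximation
direction = np.arctan2(Gy, Gx)        # gradient direction alpha(x, y)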

 1  x  2  y  2   1   j x y g (x , y ) =  exp − + + ω ( cos θ + sin θ )    σ   σ   πσ σ 2 2    y x y     x (13.7)

where σ_x, σ_y are the spatial spreads, ω is the frequency and θ is the orientation, and

G(u, v) = \exp\left[ -\frac{1}{2}\left( \frac{(u - W)^2}{\sigma_u^2} + \frac{v^2}{\sigma_v^2} \right) \right], \quad \text{where } \sigma_u = \frac{1}{2\pi\sigma_x} \text{ and } \sigma_v = \frac{1}{2\pi\sigma_y} \qquad (13.8)

Gabor filters were chosen as a texture model for image-understanding tasks because of the strong relations between the outputs of different filters. These relationships form new texture features and concisely describe texture information: a new feature encodes the distribution of filter responses, and feature performance is assessed by applying it to further image-region classification, with the results compared against features that do not use filter-output relationships. The image is convolved with the Gabor function for every orientation and spatial frequency. Given an image F(x, y), it is filtered with Gab(x, y, W, θ, σ_x, σ_y) [28]:

FGab(x, y, W, \theta, \sigma_x, \sigma_y) = \sum_{k} \sum_{l} F(x - k, y - l)\, Gab(k, l, W, \theta, \sigma_x, \sigma_y) \qquad (13.9)

The magnitudes of the Gabor filter responses are represented by three moments, as in Equation (13.10):

\mu(W, \theta, \sigma_x, \sigma_y) = \frac{1}{XY} \sum_{x=1}^{X} \sum_{y=1}^{Y} \left| FGab(x, y, W, \theta, \sigma_x, \sigma_y) \right|

std(W, \theta, \sigma_x, \sigma_y) = \left( \frac{1}{XY} \sum_{x=1}^{X} \sum_{y=1}^{Y} \big( \left| FGab(x, y, W, \theta, \sigma_x, \sigma_y) \right| - \mu(W, \theta, \sigma_x, \sigma_y) \big)^2 \right)^{1/2}

Skew = \frac{1}{XY} \sum_{x=1}^{X} \sum_{y=1}^{Y} \left( \frac{\left| FGab(x, y, W, \theta, \sigma_x, \sigma_y) \right| - \mu(W, \theta, \sigma_x, \sigma_y)}{std(W, \theta, \sigma_x, \sigma_y)} \right)^3 \qquad (13.10)

The attribute vector is created using μ(W, θ, σ_x, σ_y), std(W, θ, σ_x, σ_y) and Skew as its components.
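A sketch of this feature computation using scikit-image's Gabor filter; the library choice and the frequency 0.2 are illustrative assumptions, while the orientations are those quoted in Section 13.4.

import numpy as np
from scipy.stats import skew
from skimage.filters import gabor

img = np.random.rand(64, 64)              # stand-in for a compressed MRI slice

features = []
for theta_deg in (15, 45, 75):            # orientations used in Section 13.4
    real, imag = gabor(img, frequency=0.2, theta=np.deg2rad(theta_deg))
    mag = np.hypot(real, imag)            # |FGab(x, y, W, theta)|
    features += [mag.mean(), mag.std(), skew(mag.ravel())]   # Equation (13.10)

print(np.asarray(features))               # 3 moments x 3 orientations = 9 features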


13.3.5 Proposed Hybrid CS-PSO Algorithm
The parameters p_a, λ, and α in CS help the algorithm discover optimal solutions. Among them, p_a is significant for determining the fraction of worst nests and can potentially be used to adjust the convergence rate of the algorithm. The conventional CS algorithm uses a fixed value for p_a, set in the initialization stage and unchanged during the whole iterative process. The fundamental disadvantage of this technique is that it is not easy to locate the best fraction: a fraction of worse nests that is too big or too small will leave the algorithm unable to obtain the optimal solution [29]. PSO is an optimization algorithm described by Eberhart and Kennedy in 1995. It is a stochastic, swarm-intelligence-based optimization method inspired by the collective behavior of living creatures, for instance bird flocking, fish schooling, and swarm theory. As an optimization tool, PSO provides a population-based search framework in which individuals are called particles. In PSO, particles fly around a multi-dimensional search space; each particle is evaluated by the objective function and has a definite velocity which drives its movement. The inertia weight used in the velocity update is refreshed by Equation (13.11):

W^t = \frac{M_{gen} - t}{M_{gen}} \qquad (13.11)

where W^t is the inertia weight and M_gen is the maximum number of generations of the algorithm. The pseudo code of CS-PSO based image enhancement:

Begin
  Objective function F(x), x = (x1, ..., xd)^T
  Generate an initial population of n host nests xi (i = 1, 2, ..., n) and corresponding random velocities
  Calculate the fitness value Fi using the objective function
      F(Ie) = log((E(Ie) * N_T) / (M * H)) * (sum(h_Th) / Δh)
  Obtain p_best_i and g_best
  While (t < MaxGeneration) or (stop criterion)
      // Original CS algorithm
      Generate new position vectors xnew_i using the CS update
          xi(t+1) = xi(t) + α ⊕ Levy(λ)
      // Disturbance by PSO
      Update velocity and position using
          v_i^(t+1) = W^t * v_i^t + c1 * r1 * (p_best_i^t − x_i^t) + c2 * r2 * (g_best^t − x_i^t)
          x_i^(t+1) = x_i^t + v_i^(t+1)
      Calculate the fitness value F_new_i using the same objective function
      Obtain g_best
      if (F_new_i > F_i)
          F_i = F_new_i
          p_best_i = xnew_i
      end if
      Compare and obtain the optimal g_best
  End while
  Output the enhanced image based on the optimal parameters
End

CS-PSO uses the PSO algorithm as a disturbance, substituting for the procedure of updating the worse nests in the Cuckoo Search algorithm. p_best and g_best help the PSO step refine local results efficiently toward the global optimum: the disturbance replaces the worse nests while searching more broadly and converging rapidly to the optimal solution.
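The hybrid loop can be sketched on a toy objective as below; the constants c1, c2, α and λ, the sphere objective, and Mantegna's Levy-step generator are illustrative assumptions rather than the chapter's tuned settings.

import math
import numpy as np

rng = np.random.default_rng(0)
n, d, max_gen = 20, 5, 100               # nests, dimensions, generations
c1 = c2 = 2.0                            # illustrative PSO constants
alpha, lam = 0.01, 1.5                   # CS step size and Levy exponent

def fitness(x):                          # toy objective: maximize -||x||^2
    return -np.sum(x**2, axis=-1)

def levy(size):                          # Mantegna's algorithm for Levy steps
    sigma = (math.gamma(1 + lam) * math.sin(math.pi * lam / 2) /
             (math.gamma((1 + lam) / 2) * lam * 2**((lam - 1) / 2)))**(1 / lam)
    return rng.normal(0, sigma, size) / np.abs(rng.normal(0, 1, size))**(1 / lam)

x = rng.uniform(-5, 5, (n, d))           # host nests, doubling as particles
v = rng.uniform(-1, 1, (n, d))
p_best, p_f = x.copy(), fitness(x)
g_best = x[np.argmax(p_f)]

for t in range(max_gen):
    w = (max_gen - t) / max_gen          # inertia weight, Equation (13.11)
    x = x + alpha * levy((n, d))         # CS Levy-flight update
    v = (w * v + c1 * rng.random((n, 1)) * (p_best - x)
               + c2 * rng.random((n, 1)) * (g_best - x))
    x = x + v                            # PSO disturbance of the nests
    f_new = fitness(x)
    better = f_new > p_f                 # keep improved nests as p_best
    p_best[better], p_f[better] = x[better], f_new[better]
    g_best = p_best[np.argmax(p_f)]

print("best fitness:", p_f.max())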

13.4 Results and Discussion
The proposed optimized RNN is evaluated on a dataset containing 5 different classes of images with 1,200 in each class, totalling 6,000 images. The experiments are run with 10-fold cross validation. The Daubechies wavelet and Huffman coding are used for image compression, and Gabor filters are used to obtain texture features. Feature selection is achieved through MI and the proposed hybrid cuckoo-based feature selection.

The design parameters of the proposed RNN are shown below. The parameters used in this model at the different phases are: for the Gabor filters, the phase offset is taken as zero degrees and the orientations are 15, 45 and 75°. In the neural network, the number of input neurons is 50, the number of output neurons is 5, and 10 neurons are used in the hidden layer. For training the network, the BPTT algorithm is used with a sigmoid activation function at the hidden layer, and the termination criterion is the RMSE. A minimal configuration sketch follows.
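This is a hedged configuration sketch of the recurrent classifier; PyTorch is an assumption, and since nn.RNN offers only tanh/relu cell nonlinearities, the sigmoid is applied to the hidden states before the output layer as an approximation of the described design.

import torch
import torch.nn as nn

class PartialRNNClassifier(nn.Module):
    def __init__(self, n_in=50, n_hidden=10, n_classes=5):
        super().__init__()
        self.rnn = nn.RNN(n_in, n_hidden, batch_first=True)
        self.act = nn.Sigmoid()                   # sigmoid at the hidden layer
        self.out = nn.Linear(n_hidden, n_classes)

    def forward(self, x):                         # x: (batch, time, 50)
        h_seq, _ = self.rnn(x)
        return self.out(self.act(h_seq[:, -1]))   # classify from the last state

model = PartialRNNClassifier()
batch = torch.randn(8, 4, 50)                     # 8 images, 4-step feature sequences
print(model(batch).shape)                         # torch.Size([8, 5])
# Training with a loss such as nn.CrossEntropyLoss and backpropagating through
# the time steps corresponds to BPTT, with RMSE monitored for termination.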