Computer, Communication, and Signal Processing. AI, Knowledge Engineering and IoT for Smart Systems. 7th IFIP TC 12 International Conference, ICCCSP 2023 Chennai, India, January 4–6, 2023 Revised Selected Papers 9783031398100, 9783031398117


Language: English · Pages: 344 · Year: 2023


Table of contents:
Preface
Organization
Contents
Artificial Intelligence in Health Care
Early Stage Prediction Model of Autism Spectrum Disorder Traits of Toddlers
1 Introduction
2 Related Work
3 Methodology
3.1 Data Preprocessing
3.2 Feature Selection
3.3 Classification
3.4 Performance Metrics
4 Experimental Result
5 Discussion
6 Conclusion
References
Detection and Estimation of Diameter of Retinal Vessels
1 Introduction
2 Methodology
2.1 Estimation of Diameter of Retinal Vessel Based on Morphological Operation
2.2 Diameter Calculation Based on Center Line
2.3 Measurement of Retinal Vessel Thickness Based on Minimum Line Integrals
3 Result and Discussion
4 Conclusion and Future Direction
References
Automated Summarization of Gastrointestinal Endoscopy Video
1 Introduction
2 Methodology
2.1 CNN Model
2.2 Description of the Dataset
2.3 Model Training Details
3 Experimental Results
4 Conclusion
References
Interpretation of Feature Contribution Towards Diagnosis of Diabetic Retinopathy from Exudates in Retinal Images
1 Introduction
2 Related Work
3 Proposed Methodology (SHAP-DFHE-MR Model)
3.1 Preprocessing
3.2 Optic Disc Segmentation
3.3 Exudates Detection
3.4 Feature Extraction and Classification
3.5 SHAP Tree Explainer Interpretation
4 Datasets
5 Results and Discussion
5.1 Performance Evaluation of Segmentation of Exudates
5.2 Performance Evaluation of Classification
5.3 Analysis of Explainability
6 Conclusion
References
False Positive Reduction in Mammographic Mass Detection
1 Introduction
2 Literature Survey
3 Proposed Methodology
3.1 Dataset Description
3.2 Data Pre-processing
3.3 False Positive Reduction in Density-Specific Mass Detection Model
4 Results and Discussion
4.1 Pre-processing and Segmentation
4.2 False Positive Reduction in Density-Specific Mass Detection Model
5 Conclusion
References
An Efficient and Automatic Framework for Segmentation and Analysis of Tumor Structure in Brain MRI Images
1 Introduction
2 An Efficient and Robust Framework
2.1 SVM Classifier for Classification of MRI Dataset
2.2 Watershed Method Technique for Tumor Region Extraction
2.3 Analysis and Diagnosis of Tumor Structure with EM-GM Method
3 Results and Discussion
4 Conclusion
References
Machine Learning and Deep Learning
Collaborative CNN with Multiple Tuning for Automated Coral Reef Classification
1 Introduction
2 Related Works
3 CNN Architectures
3.1 M1 Model
3.2 M2 Model
4 Ensemble Method of Averaging
5 Experiment and Analysis
5.1 Dataset Used
5.2 Performance Analysis
5.3 Modified CNN Architectures
6 Conclusion
References
Analysis of Optimum 3-Dimensional Array and Fast Data Movement for Efficient Memory Computation in Convolutional Neural Network Models
1 Introduction
2 Analysis of 3-dimensional Feature Array Size and Data-Flow Movement Scheduling Strategy
2.1 Choosing the Optimum 3D Feature Array (Fa)
2.2 Data-Flow Scheduling Using 8 × 8 × 32 3D Feature Array
3 Simulation Results and Discussion
3.1 Evaluation of 3-D Feature Array (Fa) for the CNN Model
3.2 Analysis and Evaluation of Strategies for the CNN Layer
4 Conclusion
References
Generation Z’s Satisfaction with Artificial Intelligence Voice Enabled Digital Assistants
1 Introduction
2 Literature Review
3 Research Methodology
4 Data Analysis and Discussion
5 Conclusion
Appendix 1. Constructs Measurement Items
References
Deep Learning Models for Automatic De-identification of Clinical Text
1 Introduction
2 Clinical De-identification Process
2.1 Pre-processing
2.2 Word Embeddings
2.3 Label Prediction Layer
2.4 Label Sequence Optimization
3 Datasets
4 Limitations
5 Discussion
6 Conclusion
References
Forecast of Movie Sentiment Based on Multi Label Text Classification on Rotten Tomatoes Using Multiple Machine and Deep Learning Technique
1 Introduction
2 Methodologies
2.1 Dataset Used
2.2 Data Investigation
2.3 Data Pre-processing
2.4 n-grams and Tf-idf Vectorizer
2.5 Machine Learning (ML)
2.6 Deep Learning (DL)
3 Experiments and Results
4 Conclusions
References
An Ensemble Approach to Hostility Detection in Hindi Tweets
1 Introduction
2 Related Work
3 Proposed Methodology
3.1 Dataset
3.2 Preprocessing
3.3 Model
4 Results
5 Conclusion
References
An Optimized Framework for Diabetes Mellitus Diagnosis Using Grid Search Based Support Vector Machine
1 Introduction
2 Literature Works
3 Methodology
3.1 Data Preprocessing
3.2 Feature Selection
3.3 Bio-Inspired Computing for Feature Selection
3.4 Binarized Whale Optimization Algorithm
3.5 Binarized Grey Wolf Optimization
3.6 Classifier
4 Results and Discussions
4.1 Performance Metrics
4.2 Dataset Description
4.3 Features Selection
5 Conclusion
References
Signal Processing
Inquisition of Vision Transformer for Content Based Satellite Image Retrieval
1 Introduction
2 Related Work
3 Deep and Vision Transformer Based Feature Extraction
3.1 Feature Extraction
3.2 Matching and Retrieval
3.3 Dataset Details
3.4 Evaluation Metric
3.5 Implementation Details
4 Experiments Results
4.1 Results
4.2 Precision Recall Plots
4.3 Time
5 Conclusion
References
Impact of Spectral Domain Features for Small Object Detection in Remote Sensing
1 Introduction
2 Related Works
2.1 Object Detection
2.2 Multi-scale Object Detection
2.3 Small Object Detection
2.4 Spectral Context-Fourier Transform and Wavelet Transform
3 Methodology
3.1 Detailed Overview of Our Approach
4 Experiments
4.1 Dataset
4.2 Implementation Details
5 Results
6 Conclusion
References
Application of Phonocardiogram and Electrocardiogram Signal Features in Cardiovascular Abnormality Recognition
1 Introduction
2 Related Work
2.1 Heart Signal Classification and usage of the PhysioNet/CinC Dataset
2.2 Contributions
3 Signals in Medical Domain Data
3.1 PCG Signals
3.2 ECG Signals
4 Proposed Methodology
4.1 PCG Signals
4.2 ECG Signals
4.3 Hybrid Feature Approach
5 Performance of the Models
6 Result and Conclusion
7 Future Scope
References
Analyzing Cricket Biomechanical Parameters Through Keypoint Detection and Tracking
1 Introduction
2 Literature Review
3 Some Biomechanical Parameters
3.1 Backlift Angle
3.2 Downswing Angle
4 The Dataset
4.1 The Model
4.2 Architecture Details
5 Conclusion
References
Multi-feature Based Sea–land Segmentation for Multi-spectral and Panchromatic Remote-Sensing Imagery
1 Introduction
2 Related Work
3 Theory
3.1 CMYK Color Space
3.2 HSV Color Space
3.3 Gabor Feature
3.4 Otsu Segmentation
3.5 Gray Connectedness
3.6 Morphological Operations
4 Proposed Method
5 Experimental Results
6 Conclusion
References
Internet of Things for Smart Systems
Deoxyribonucleic Acid Cryptography Based Least Significant Byte Steganography
1 Introduction
2 Motivation
3 Literature Survey
4 Proposed Method
4.1 DNA Cryptography
4.2 Encryption
4.3 Algorithm Selection
4.4 LSB Steganography
4.5 DNA Decoding
5 Results and Analysis
5.1 Dataset
5.2 Performance Metrics
5.3 Analysis and Result
5.4 Comparative Analysis
6 Conclusion
References
A Simple Hybrid Local Search Algorithm for Solving Optimization Problems
1 Introduction
2 Optimization Problem Statement
3 New Hybrid Local Search Algorithm (FP-AB)
4 Benchmark Functions
5 Computational Results and Analysis with Other Algorithms
6 Statistical Analyses
7 Conclusions, Limitation and Future Work
References
Performance Study of RIS Assisted NOMA Based Wireless Network with Passive IoT Communication
1 Introduction
2 System Model
2.1 Outage Probability
2.2 Ergodic Rate
3 Results and Discussion
4 Conclusion
References
Multi-channel Man-in-the-Middle Attacks Against Protected Wi-Fi Networks and Their Attack Signatures
1 Introduction
2 Background
2.1 MC-MitM Attack Setup
2.2 MC-MitM Base Variant Attack
2.3 MC-MitM Improved Variant Attack
3 MC-MitM Attack Traffic Analysis
4 MC-MitM Attack Signature Creation
4.1 Assumptions
4.2 Reference Scenario
4.3 Stage 1 Attack Traffic Signatures
4.4 Stage 2 Attack Traffic Signatures
4.5 Summary of Attack Traffic Signatures
4.6 Discussion
5 Conclusions and Future Works
References
A Study in Analysing the Critical Determinants of Internet of Things (IoT) Based Smart Processing for Sustainable Supply Chain Management
1 Introduction
2 Review of Literature
3 Research Methodology
3.1 Research Hypothesis
4 Analysis and Discussion
4.1 Percentage Analysis
4.2 Correlation Analysis
4.3 Chi Square Test
5 Conclusion
References
IOT Enabled Rover for Remote Survey of Archeological Areas
1 Introduction
2 Literature Review
3 Methodology
3.1 Mechanical Parts
3.2 Algorithm
3.3 Powertrain Design
3.4 IOT Enabling
3.5 Prototype
4 Results and Discussions
5 Limitations and Future Scope
6 Conclusion
References
An Embedded System for Smart Farming Using Machine Learning Approaches
1 Introduction
2 Related Works
3 Methodology
3.1 Block Diagram
4 Machine Learning Algorithms
4.1 Support Vector Machine (SVM)
5 Experimentation
6 Results and Analysis
7 Conclusion
References
An Intrusion Detection System for Securing IoT Based Sensor Networks from Routing Attacks
1 Introduction
2 Literature Survey
3 Proposed Methodology
3.1 Proposed System Architecture
3.2 Intrusion Detection System
4 Results and Discussion
5 Conclusion
References
Author Index


IFIP AICT 670 Eunika Mercier-Laurent Xavier Fernando Aravindan Chandrabose (Eds.)

Computer, Communication, and Signal Processing AI, Knowledge Engineering and IoT for Smart Systems 7th IFIP TC 12 International Conference, ICCCSP 2023 Chennai, India, January 4–6, 2023 Revised Selected Papers

IFIP Advances in Information and Communication Technology

670

Editor-in-Chief Kai Rannenberg, Goethe University Frankfurt, Germany

Editorial Board Members
TC 1 – Foundations of Computer Science: Luís Soares Barbosa, University of Minho, Braga, Portugal
TC 2 – Software: Theory and Practice: Michael Goedicke, University of Duisburg-Essen, Germany
TC 3 – Education: Arthur Tatnall, Victoria University, Melbourne, Australia
TC 5 – Information Technology Applications: Erich J. Neuhold, University of Vienna, Austria
TC 6 – Communication Systems: Burkhard Stiller, University of Zurich, Zürich, Switzerland
TC 7 – System Modeling and Optimization: Lukasz Stettner, Institute of Mathematics, Polish Academy of Sciences, Warsaw, Poland
TC 8 – Information Systems: Jan Pries-Heje, Roskilde University, Denmark
TC 9 – ICT and Society: David Kreps, National University of Ireland, Galway, Ireland
TC 10 – Computer Systems Technology: Achim Rettberg, Hamm-Lippstadt University of Applied Sciences, Hamm, Germany
TC 11 – Security and Privacy Protection in Information Processing Systems: Steven Furnell, Plymouth University, UK
TC 12 – Artificial Intelligence: Eunika Mercier-Laurent, University of Reims Champagne-Ardenne, Reims, France
TC 13 – Human-Computer Interaction: Marco Winckler, University of Nice Sophia Antipolis, France
TC 14 – Entertainment Computing: Rainer Malaka, University of Bremen, Germany

IFIP Advances in Information and Communication Technology The IFIP AICT series publishes state-of-the-art results in the sciences and technologies of information and communication. The scope of the series includes: foundations of computer science; software theory and practice; education; computer applications in technology; communication systems; systems modeling and optimization; information systems; ICT and society; computer systems technology; security and protection in information processing systems; artificial intelligence; and human-computer interaction. Edited volumes and proceedings of refereed international conferences in computer science and interdisciplinary fields are featured. These results often precede journal publication and represent the most current research. The principal aim of the IFIP AICT series is to encourage education and the dissemination and exchange of information about all aspects of computing. More information about this series at https://link.springer.com/bookseries/6102

Eunika Mercier-Laurent · Xavier Fernando · Aravindan Chandrabose (Editors)



Computer, Communication, and Signal Processing AI, Knowledge Engineering and IoT for Smart Systems 7th IFIP TC 12 International Conference, ICCCSP 2023 Chennai, India, January 4–6, 2023 Revised Selected Papers


Editors Eunika Mercier-Laurent University of Reims Champagne-Ardenne Paris, France

Xavier Fernando Toronto Metropolitan University Toronto, ON, Canada

Aravindan Chandrabose Sri Sivasubramaniya Nadar College of Engineering Kalavakkam, India

ISSN 1868-4238 ISSN 1868-422X (electronic) IFIP Advances in Information and Communication Technology ISBN 978-3-031-39810-0 ISBN 978-3-031-39811-7 (eBook) https://doi.org/10.1007/978-3-031-39811-7 © IFIP International Federation for Information Processing 2023 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

The Department of Information Technology is glad to present the proceedings of the IFIP 7th International Conference on Computer, Communication and Signal Processing (ICCCSP 2023), held at Sri Sivasubramaniya Nadar (SSN) College of Engineering, Kalavakkam, Tamil Nadu, India, during January 4–6, 2023. This conference was the seventh edition since 2017, with excellent support from the SSN management. ICCCSP 2023 focused on the rapidly emerging fields of Artificial Intelligence, Knowledge Engineering, and IoT for Smart Systems and provided a forum for researchers, scientists, academicians, and industrial experts across the globe to present and discuss the most recent innovations, ideas, and practical solutions relevant to this field. The conference was technically sponsored by the International Federation for Information Processing (IFIP) under the TC 12 working group, an IFIP technical committee on artificial intelligence. We thank Eunika Mercier-Laurent, chair of TC12, for readily considering our request and technically sponsoring ICCCSP 2023 under the Artificial Intelligence track. We acknowledge the valuable association of the advisory committee members, Balaraman Ravindran, IIT Madras, India, and Prahlad Vadakkepat, National University of Singapore, and the program committee members across the globe. ICCCSP 2023 attracted 132 submissions across India from highly reputed institutions such as IITs, DRDO, and the Centre for Artificial Intelligence and Robotics, as well as authors from countries including Bangladesh, France, Sri Lanka, Spain, the United Arab Emirates, and the United Kingdom. All the articles submitted through the EasyChair conference management system were subjected to a thorough open peer-review process by a team of 175 expert members. A minimum of three reviews have been carried out on each article. Based on the review comments, the members of the Program Committee accepted the top 26 articles, representing 19% acceptance of all submissions. ICCCSP 2023 started on January 4, 2023, with three parallel workshops on ‘Sustainable AI’, ‘IoT for Smart Systems’, and ‘Text Analysis and Information Extraction & Retrieval (TIER 2023)’ handled by subject experts from industry and academia. Hands-on sessions were provided to 95+ researchers and students across the country. On January 5, 2023, the conference was inaugurated in the presence of Shashikant Albal, Director, SSN School of Advanced Career Education, and V.E. Annamalai, Principal, SSN College of Engineering along with the faculty members, participants, research scholars, and students of SSN. The inauguration was followed by a keynote address by Xiaoli Li, Head of the Machine Intelligence Department, Institute for Infocomm Research, A* Singapore, on ‘Challenges and Methods for Equipment Remaining Useful Life Prediction’. Vijay John, Research Scientist, RIKEN Institute, Japan, delivered the second keynote address, titled ‘Dealing with missing modality in sensor fusion for perception tasks’. The next keynote address was delivered by Ponnuthurai Nagaratnam Suganthan,


Research Professor, Qatar University, Doha, Qatar, on ‘Randomization-Based Deep and Shallow Neural Networks’. The third day of the conference started with a keynote address by Vikash Kumar, Solution Architect, EY GDS, India, on ‘Ontology-Based Knowledge Modelling for System Engineering’. It was followed with a keynote address by V. Subhashini, Research Scientist, Google Research, USA, on the topic ‘ML Applications for Healthcare and Accessibility’. Soundari Arunachalam, Director of R&D, Honeywell Technology Solutions, India, delivered the final keynote address, on ‘Building High-Performance Distributed Applications’. In between the keynote addresses, there were sessions for paper presentations where 26 registered authors presented their work under six tracks: ‘AI in Health Care’, ‘IoT and Social Networks’, ‘Network and IoT for Smart Systems’, ‘Signal Processing’, ‘Machine Learning’, and ‘Deep Learning’. The sessions were chaired by eminent experts from reputed universities along with subject experts from SSN. This volume is a collection of 26 papers accepted by the ICCCSP 2023 Program Committee, reflecting the revisions suggested during the presentation sessions. We believe that every research work presents an insightful approach and solution for the problem discussed in the article, integrating artificial intelligence, knowledge engineering, and IoT. We sincerely thank the International Federation of Information Processing (IFIP) for approving the conference proposal and supporting it in various avenues. We extend our gratitude to Springer for publishing the proceedings and presenting these articles to the global research community. We take this opportunity to thank and acknowledge the financial support provided by SSN. We also extend our thanks and appreciation to all the Program Committee members, reviewers, session chairs, Organizing Committee members, and participants for their contributions to the success of ICCCSP 2023. June 2023

Eunika Mercier-Laurent Xavier Fernando Aravindan Chandrabose

Organization

Advisory Committee Balaraman Ravindran Prahlad Vadakkepat

IIT Madras, India National University of Singapore, Singapore

Conference Chairs Xavier Fernando Chandrabose Aravindan

Toronto Metropolitan University, Canada Sri Sivasubramaniya Nadar College of Engineering, India

Program Committee Eunika Mercier-Laurent Kun-Chan Lan Yiu Wing Leung Ahmed Banafa Vijay John Chockalingam Aravind Vaithilingam Saswati Mukherjee Premkumar K. Anand Kumar M. Santhosh C. K. Srinivasan R. Shahina A.

University of Reims Champagne-Ardenne, France National Cheng Kung University, Taiwan Hong Kong Baptist University, China San José State University, USA RIKEN Institute, Japan Taylor’s University, Malaysia Anna University, India IIITDM, Kancheepuram, India National Institute of Technology, Karnataka, India Nokia, India Sri Sivasubramaniya Nadar College of Engineering, India Sri Sivasubramaniya Nadar College of Engineering, India

Organizing Committee
E. M. Malathy, Sri Sivasubramaniya Nadar College of Engineering, India
K. S. Gayathri, Sri Sivasubramaniya Nadar College of Engineering, India
V. Sivamurugan, Sri Sivasubramaniya Nadar College of Engineering, India
K. R. Uthayan, Sri Sivasubramaniya Nadar College of Engineering, India


Session Chairs
R. Srinivasan, Sri Sivasubramaniya Nadar College of Engineering, India
A. Shahina, Sri Sivasubramaniya Nadar College of Engineering, India
T. Shree Sharmila, Sri Sivasubramaniya Nadar College of Engineering, India
V. Sivamurugan, Sri Sivasubramaniya Nadar College of Engineering, India
S. Mohanavalli, Sri Sivasubramaniya Nadar College of Engineering, India
P. Vasuki, Sri Sivasubramaniya Nadar College of Engineering, India
V. Thanikachalam, Sri Sivasubramaniya Nadar College of Engineering, India

Additional Reviewers Abhishek Gudipalli Abirami A. M. Ajitha P. Angel Deborah P. Anitha Sarafin X. Anuja T. Anushiya Rachel G. Arasakumar M. Archana M. Arulkumar V. Arun Karthick Arun Raj L. Ashwinth Janarthanan Balaganesh N. Balaji A. Balamurugan M. Betina Antony Bharathi B. Brindha Merin J. Chamundeswari A. Chithra S. Deepika S. Dhanalakshmi B. Durga G. Felix Enigo V. S. Geetha A.

Geetha K. Gino Sophia S. G. Gopinath N. Gurusubramani S. Hanis S. Hariharan B. Hariharan S. Harriet Linda C. Heltin Genitha C. Hema Chandrabose Hemavathi S. Indira K. Indumathi C. P. Jahir Husain A. Jawahar A. Jayamala R. Jayanthi Palraj Jinil Persis D. Joe Louis Paul I. Josephine Julina J. K. Kali Doss Kanchana R. Karmel A. Karthick Gunasekaran Karthika Renuka D. Karthika S.


Kavitha D. Kavitha M. G. Kavitha Velayutham Kiruba Karan Lakshmanan B. Lakshmi Priya S. Lalitha V. Latchoumy P. Lekshmi K. Lokeswari Y. V. Madheswari K. Madhu Priya Govindarajan Mala Kaliappan Malathi D. Malathy C. Malathy N. Manimala G. Manisha S. Meera S. Mirunalini P. Mohammed Thaha Mohanavalli S. Murugesan K. Muthulakshmi V. Muthupriya Vasudevan Mythili Asaithambi Nabeena Ameen Nalini N. J. Nithya M. Nithya R. Paavai Anand Padma Bharathidasan Parimala Gandhi G. Partibane B. Pavithra L. Prabavathy B. Prathipa S.

Praveen Joe I. R. Priya B. Priyadharsini R. Radha N. Raja K. Rajalakshmi K. Rajasekaran G. Rajavel G. Rajesh S. Ramachandran A. Ramkumar M. P. Ramprabhu S. Ramya N. Rathi S. Revathi A. Ruby Dass Sadagopan S. Samundeswari Sivasubramanian Sandana Karuppan B. Sankar Ram Chellappa Sarala Kandassamy Saraswathi S. Sarath Chandran K. R. Saritha M. Sasirekha S. Sathish Kumar Selvakumar B. Senthilkumar D. Vasantha Kumar V. Vasuki P. Venkateswari R. Vidhusha S. Vidhya S. Vijayalakshmi M. Vinodh Kumar P. Yogarajan G.




Sponsor

Contents

Artificial Intelligence in Health Care Early Stage Prediction Model of Autism Spectrum Disorder Traits of Toddlers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mousumi Bala and Mohammad Hanif Ali

3

Detection and Estimation of Diameter of Retinal Vessels . . . . . . . . . . . . . . . . Abhinav Jamwal

18

Automated Summarization of Gastrointestinal Endoscopy Video . . . . . . . . . . . B. Sushma and P. Aparna

27

Interpretation of Feature Contribution Towards Diagnosis of Diabetic Retinopathy from Exudates in Retinal Images . . . . . . . . . . . . . . . . . . . . . . . . Kanupriya Mittal and V. Mary Anita Rajam False Positive Reduction in Mammographic Mass Detection. . . . . . . . . . . . . . S. Shrinithi, R. Lavanya, and Devi Vijayan An Efficient and Automatic Framework for Segmentation and Analysis of Tumor Structure in Brain MRI Images. . . . . . . . . . . . . . . . . . . . . . . . . . . . . K. Bhima, M. Neelakantappa, K. Dasaradh Ramaiah, and A. Jagan

36

51

66

Machine Learning and Deep Learning Collaborative CNN with Multiple Tuning for Automated Coral Reef Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . R. Jannathul Firdous and S. Sabena

81

Analysis of Optimum 3-Dimensional Array and Fast Data Movement for Efficient Memory Computation in Convolutional Neural Network Models . . . . Deepika Selvaraj, Arunachalam Venkatesan, and David Novo

94

Generation Z’s Satisfaction with Artificial Intelligence Voice Enabled Digital Assistants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 Thiruvenkadam Thiagarajan and Sudarsan Jayasingh Deep Learning Models for Automatic De-identification of Clinical Text. . . . . . 116 Ravichandra Sriram, Siva Sathya Sundaram, and S. LourduMarie Sophie



Forecast of Movie Sentiment Based on Multi Label Text Classification on Rotten Tomatoes Using Multiple Machine and Deep Learning Technique . . . . 128 Debarati Nath and Joseph Roy An Ensemble Approach to Hostility Detection in Hindi Tweets . . . . . . . . . . . 143 Santosh Rajak, Monseej Purkayastha, Amitabh Deb, and Ujwala Baruah An Optimized Framework for Diabetes Mellitus Diagnosis Using Grid Search Based Support Vector Machine . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153 S. Amutha and J. Raja Sekar Signal Processing Inquisition of Vision Transformer for Content Based Satellite Image Retrieval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171 S. Dhruv Shindhe and A. G. J. Faheema Impact of Spectral Domain Features for Small Object Detection in Remote Sensing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183 Urja Giridharan, Neeraj Ramprasad, Sukanta Roy, and S. N. Omkar Application of Phonocardiogram and Electrocardiogram Signal Features in Cardiovascular Abnormality Recognition . . . . . . . . . . . . . . . . . . . . . . . . . 196 R. Geetha Ramani, Abhinand Ganesh, Roshni Balasubramanian, and Aruna Srikamakshi Ramkumar Analyzing Cricket Biomechanical Parameters Through Keypoint Detection and Tracking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210 Pranay Pandey, S. Dhruv Shindhe, and S. N. Omkar Multi-feature Based Sea–land Segmentation for Multi-spectral and Panchromatic Remote-Sensing Imagery . . . . . . . . . . . . . . . . . . . . . . . . . 218 D. K. Savitha, Rajkumar Kethavath, and Arshad Jamal Internet of Things for Smart Systems Deoxyribonucleic Acid Cryptography Based Least Significant Byte Steganography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231 Aditya Anavekar and S. Rajkumar A Simple Hybrid Local Search Algorithm for Solving Optimization Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246 A. Baskar and M. Anthony Xavior



Performance Study of RIS Assisted NOMA Based Wireless Network with Passive IoT Communication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260 B. Aswini, Laxmikandan Thangavelu, and Manimekalai Thirunavukkarasu Multi-channel Man-in-the-Middle Attacks Against Protected Wi-Fi Networks and Their Attack Signatures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269 Manesh Thankappan, Helena Rifà-Pous, and Carles Garrigues A Study in Analysing the Critical Determinants of Internet of Things (IoT) Based Smart Processing for Sustainable Supply Chain Management . . . . . . . . 286 S. Meena and T. Girija IOT Enabled Rover for Remote Survey of Archeological Areas . . . . . . . . . . . 296 Abhirup Sarkar, Ayushman Khetan, Eshan Gupta, Hussain Sabunwala, R. Harikrishnan, and Priti Shahane An Embedded System for Smart Farming Using Machine Learning Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307 R. Dhaya Sree, A. Arun Raja, and S. Jayanthy An Intrusion Detection System for Securing IoT Based Sensor Networks from Routing Attacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321 Shalini Subramani, M. Selvi, S. V. N. Santhosh Kumar, K. Thangaramya, M. Anand, and A. Kannan Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 335

Artificial Intelligence in Health Care

Early Stage Prediction Model of Autism Spectrum Disorder Traits of Toddlers Mousumi Bala1(B) and Mohammad Hanif Ali2 1 Eastern University, Savar, Bangladesh

[email protected]

2 Jahangirnagar University, Savar, Bangladesh

[email protected]

Abstract. Globally, autism spectrum disorder (ASD) is a major concern. It is a complex neurological condition that affects a child’s ability to identify objects, faces, express emotions, and develop social skills. It is incurable, but earlier detection and diagnosis of ASD aid in better treatment. It begins in early childhood, although symptoms may be recognized as late as adulthood. As a result, many children are unable to get proper treatment at an early age, causing their health to become more complicated. In our study, we explored the toddler dataset. The selection of the important features in medical applications such as ASD screening therefore plays a key role in model development, since that directly affects classification accuracy. This study examines several feature selection approaches. We use Generalized Learning Vector Quantization (GLVQ) and propose a method, Modified Generalized Learning Vector Quantization (MGLVQ) for selecting important features. We apply knearest neighbor (kNN), Linear Discriminant Analysis (LDA), Support Vector Machine (SVM), Classification and Regression Trees (CART), and Multilayer Perceptron (MLP) classification machine learning algorithms. The performance of classifiers is evaluated by evaluation methods such as accuracy, Fβ and area under the curve (AUROC). The results of all classifiers have been examined, and the performance of the MLP in different evaluation methods is excellent. Moreover, we illustrate the MGLVQ technique explores important features that impact the primary diagnostic procedure of ASD traits in toddlers. As a result, the proposed machine learning model could be effective in detecting ASD at an earlier stage. The objective of our study is to build an online application that can accurately identify ASD at an early stage using machine learning. Keywords: ASD · toddlers · feature selection · classifiers · MLP

1 Introduction
Autism spectrum disorder (ASD) is a neurodevelopmental disorder that affects how a person interacts and communicates with others [1]. The actual cause of ASD is still unknown. Autistic people show different abnormal activities such as repetitive behaviors, restricted interests, and social communication deficits.


About 1 in 100 children worldwide are estimated to have ASD [3]. As in other autistic patients, it causes significant and serious developmental delays in toddlers. Hence, it is essential to detect ASD in them at a preliminary stage [4]. To assess ASD in toddlers [5], it is necessary to obtain the toddler's behavioral history from parents, relatives, and caregivers. Likewise, question-based surveys are useful, in which researchers pose a set of conditions; toddlers who meet these conditions may have ASD. Therefore, various autism screening tools, including the quantitative checklist for autism in toddlers (Q-CHAT) [6], M-CHAT, and AQ, are used to screen for such symptoms and manage this condition [7].

Fig. 1. The proposed framework of predicting ASD traits of toddlers.

In this study, we developed a machine-learning model that predicts ASD in toddlers. Important features of toddlers were identified using the Generalized Learning Vector Quantization (GLVQ) and Modified Generalized Learning Vector Quantization (MGLVQ) methods so that autism can be screened more precisely. Then, various classifiers, including k-nearest neighbor (kNN), Linear Discriminant Analysis (LDA), Classification and Regression Trees (CART), Support Vector Machine (SVM), and Multilayer Perceptron (MLP), were employed on the individual feature subsets to identify the best model for the dataset. Various evaluation metrics were used to compare the outcomes of each classifier. Due to the fear of social harassment, many individuals in developing countries (particularly in Bangladesh) do not agree to be checked for autism in person and hence refuse to go to health professionals. The implication of this study is that anyone can use this model to determine autism in toddlers. Further, this study assists us in better understanding the traits of toddlers, detecting autism early, and ensuring a treatment policy for them. The contributions of this study are as follows:
– The proposed model is specifically designed to detect ASD in toddlers.
– We identified several discriminatory factors of ASD for toddlers.


– The proposed methodology outperforms several standard classifiers and accompanying pre-trained models.
– Our proposed method, MGLVQ, performs better than the other feature selection methods.

2 Related Work
Islam et al. [9] used machine learning methods to detect optimal ASD behaviors. The toddler data set collected from the Kaggle repository [8] consisted of twenty features and one binary target feature. They applied SVM, Random Forest, Naive Bayes, and kNN classifiers and observed that kNN and Random Forest showed the highest accuracies, at 98% and 93%, respectively. Mohanty et al. [10] illustrated a deep learning model using ASD data sets (toddlers, children, adolescents, and adults). They considered principal component analysis (PCA) for feature dimension reduction and used a deep neural network (DNN), reporting accuracies of 85.24% for the toddler, 84.24% for the adolescent, 85.71% for the child, and 89.24% for the adult data sets. Saihi et al. [11] developed a machine learning model for the toddler data set. They applied three classifiers, C4.5, random forest, and a neural network, and found that the neural network showed excellent accuracy at 99%. Akyol et al. [12] developed a fuzzy rule model to identify the most prevalent features used for ASD detection in children up to the age of three years. In order to identify the most useful features from the dataset, they developed a hybrid model of logistic regression and fuzzy rule methods. The findings indicated that feature selection had a significant effect on predicting autistic traits in individuals with ASD. Alzubi et al. [13] suggested a hybrid feature selection technique that combines logical mutual information maximization and SVM-recursive feature elimination. According to the findings, the hybrid technique outscored other algorithms by up to 89%. The authors showed, from these results, that the subset of the dataset produced with their hybrid model is sufficient, at least on this dataset, for an efficient distinction between affected and healthy individuals. Rajab et al. [14] proposed a machine learning model and applied different feature selection methods. The classifiers kNN, AdaBoost, and ID3 were applied to the toddler dataset, and the maximum accuracy observed was 98.34% for AdaBoost with a fast correlation-based filter. Satu et al. [15] used the Autism Barta app for collecting data and applied different classifiers, showing that J48 performed better than the other classifiers. We develop an efficient model for ASD prediction using the toddler autism screening dataset provided by Fadi Fayez Thabtah [8].

3 Methodology
As shown in Fig. 1, the proposed model consists of several steps, which are given as follows.


Table 1. Variable mapping of toddlers dataset

Variable         Q-CHAT-10 Features (18–36 months)
Answer Code 1    Does your child look at you when you call his/her name?
Answer Code 2    How easy is it for you to get eye contact with your child?
Answer Code 3    Does your child point to indicate that s/he wants something?
Answer Code 10   Does your child stare at nothing with no apparent purpose?

Table 2. Feature description of toddlers dataset

Serial no  Feature     Data type         Description
1–10       A1–A10      Binary            The answer code of the corresponding question
11         Age Mons    Integer           Age of the toddler (months)
12         Gender      String (f, m)     Male or female
13         Score       Integer           Based on the screening test answers, score out of 10
14         Ethnicity   String            List of common ethnicities
15         Jaundice    String (f, m)     Whether the toddler was born with jaundice
16         Family ASD  String (yes, no)  If the toddler's family were diagnosed with autism
17         User        String            Who is responding to the screening test questions
18         Class ASD   String (Yes, No)  ASD traits or No ASD traits of toddler

3.1 Data Preprocessing
In this study, we collected the toddler data set from the Kaggle data repository [8]. It includes records of both autistic and non-autistic toddlers whose ages range from 12 to 36 months, and it follows a standard screening tool called Q-CHAT-10 [9] to collect cases. The toddler data set contains 1054 instances with 18 features, where 30.76 percent of the toddlers are female and 69.24 percent are male. The survey responses were obtained from each toddler's parent, caregiver, or medical personnel. Tables 1 and 2 describe the relevant features of the toddler data set. In the pre-processing step, we cleaned this data set and removed several unwanted artifacts. Then, we performed one-hot encoding to convert the categorical features into binary values, as sketched below.
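As a concrete illustration of this preprocessing step, the short sketch below loads the screening data and one-hot encodes the string-valued features. The file name and the exact column headers are assumptions for illustration and may differ from the actual Kaggle CSV; this is not the authors' code.

```python
import pandas as pd

# Hypothetical file name; the Kaggle toddler screening data ships as a single CSV
# whose column headers may differ slightly from the names used in Table 2.
df = pd.read_csv("toddler_autism.csv")

# Basic cleaning: drop duplicate rows and rows with missing values.
df = df.drop_duplicates().dropna()

# Binary target ("Class ASD" in Table 2) and the remaining input features.
y = (df["Class ASD"].str.strip().str.lower() == "yes").astype(int)
X = df.drop(columns=["Class ASD"])

# One-hot encode the string-valued features (Gender, Ethnicity, Jaundice,
# Family ASD, User); A1..A10 and the age in months are already numeric.
categorical = X.select_dtypes(include="object").columns
X = pd.get_dummies(X, columns=list(categorical))

print(X.shape, int(y.sum()), "positive cases")
```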


3.2 Feature Selection
In this work, we used two methods, Generalized Learning Vector Quantization (GLVQ) and Modified Generalized Learning Vector Quantization (MGLVQ), to identify the significant features of ASD [17].

Generalized Learning Vector Quantization. As a supervised learning approach, GLVQ is best suited for classification tasks [16]. It is a method for enhancing vector quantization-based techniques, particularly learning vector quantization (LVQ) 2.1 [17]. The flow of the algorithm is:
1. The input space has two reference vectors, r1 and r2.
2. An input vector y is considered such that r1 is the closest reference vector belonging to the same class as y, and r2 is the closest reference vector belonging to a class different from that of y.
3. The Euclidean distances from y to r1 and r2 are computed, and the relative distance difference μ(y) is defined as

μ(y) = (l1 − l2) / (l1 + l2)    (1)

where l1 and l2 are the distances of y from r1 and r2. μ(y) ranges between −1 and +1; if μ(y) is negative, y is classified correctly, so the value of μ(y) should be decreased.
4. The cost function K to be minimized is

K = Σ_{j=1}^{n} f(μ(y_j))    (2)

where n is the number of training input vectors and f is a monotonically increasing function.
5. The reference vectors are updated as

r1 ← r1 + α · (∂f/∂μ) · l2 / (l1 + l2)² · (y − r1)    (3)

r2 ← r2 − α · (∂f/∂μ) · l1 / (l1 + l2)² · (y − r2)    (4)

where α is a learning rate.

Proposed Method. We propose a new variant of the GLVQ classifier, named Modified Generalized Learning Vector Quantization (MGLVQ). The flow of the algorithm is as follows:
1. The input space has two reference vectors, r1 and r2.
2. An input vector y is considered such that r1 is the closest reference vector belonging to the same class as y, and r2 is the closest reference vector belonging to a class different from that of y.
3. The Euclidean distances from y to r1 and r2 are computed, and the relative distance difference μ(y) is defined as

μ(y) = (l1 − l2) / (ξ·l1 + l2)    (5)

where l1 and l2 are the distances of y from r1 and r2, and ξ = 4 is a key factor. μ(y) ranges between −1 and +1; if μ(y) is negative, y is classified correctly, so the value of μ(y) should be decreased.
4. The cost function K to be minimized is

K = Σ_{j=1}^{n} f(μ(y_j))    (6)

where n is the number of training input vectors and f is a monotonically increasing function.
5. The reference vectors are updated as

r1 ← r1 + α · (∂f/∂μ) · 5·l2 / (4·l1 + l2)⁴ · (y − r1)    (7)

r2 ← r2 − α · (∂f/∂μ) · 5·l1 / (4·l1 + l2)⁴ · (y − r2)    (8)

where α is a learning rate.

3.3 Classification
In this work, we used several popular classification methods, including k-Nearest Neighbour (KNN), Linear Discriminant Analysis (LDA), Classification and Regression Trees (CART), Support Vector Machine (SVM), and Multilayer Perceptron (MLP).

Fig. 2. Comparison between GLVQ and MGLVQ.
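To make the two relative-distance measures and prototype updates of Sect. 3.2 concrete, the following is a small, self-contained NumPy sketch of a single GLVQ/MGLVQ-style update step, written directly from a reading of Eqs. (1)–(8) with f taken as the identity for simplicity. It is an illustrative interpretation, not the authors' implementation, and the learning rate and example points are arbitrary.

```python
import numpy as np

def relative_distance(y, r1, r2, xi=1.0):
    """mu(y) from Eq. (1) (xi = 1, GLVQ) or Eq. (5) (xi = 4, MGLVQ)."""
    l1 = np.linalg.norm(y - r1)  # distance to the closest same-class prototype
    l2 = np.linalg.norm(y - r2)  # distance to the closest other-class prototype
    return (l1 - l2) / (xi * l1 + l2)

def update_prototypes(y, r1, r2, alpha=0.05, xi=1.0):
    """One update step in the spirit of Eqs. (3)-(4) / (7)-(8), taking f(mu) = mu
    so that df/dmu = 1. Returns the updated prototype pair."""
    l1 = np.linalg.norm(y - r1)
    l2 = np.linalg.norm(y - r2)
    if xi == 1.0:   # GLVQ-style scaling factors
        g1 = l2 / (l1 + l2) ** 2
        g2 = l1 / (l1 + l2) ** 2
    else:           # MGLVQ-style scaling factors from Eqs. (7)-(8)
        g1 = 5.0 * l2 / (xi * l1 + l2) ** 4
        g2 = 5.0 * l1 / (xi * l1 + l2) ** 4
    r1_new = r1 + alpha * g1 * (y - r1)   # pull the same-class prototype closer
    r2_new = r2 - alpha * g2 * (y - r2)   # push the other-class prototype away
    return r1_new, r2_new

# Tiny usage example with made-up 2-D points.
y  = np.array([1.0, 0.5])
r1 = np.array([0.8, 0.4])   # same-class prototype
r2 = np.array([0.1, 0.9])   # other-class prototype
print(relative_distance(y, r1, r2, xi=4.0))
print(update_prototypes(y, r1, r2, xi=4.0))
```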

3.4 Performance Metrics
The performance metrics accuracy [18], Fβ [20], and the area under the ROC curve (AUROC) [14] were used to investigate the performance of the individual classifiers. They are defined in terms of the following quantities. True Positive (TrPo): a toddler with ASD is correctly identified as having ASD. True Negative (TrNe): an individual who does not have ASD is correctly identified as such. False Positive (FaPo): an individual who does not have ASD is wrongly labeled as having ASD. False Negative (FaNe): an individual with ASD is wrongly labeled as not having ASD. The true positive rate (TPR) and the false positive rate (FPR) are used for measuring the average area under the ROC curve over all possible operating points.

Accuracy = (TrPo + TrNe) / (TrPo + FaPo + FaNe + TrNe)    (9)

Fβ = (1 + β²) · (precision · recall) / (β² · precision + recall)    (10)

TPR = TrPo / (TrPo + FaNe)    (11)

FPR = FaPo / (FaPo + TrNe)    (12)

AUROC = (TPR − FPR + 1) / 2    (13)

where TrPo, TrNe, FaPo, and FaNe are the true positive, true negative, false positive, and false negative counts, respectively.

Table 3. The observations with various values of β

Observations              Training (%) : Testing (%)
Observation 1 (β = 50)    50 : 50
Observation 2 (β = 60)    60 : 40
Observation 3 (β = 70)    70 : 30
Observation 4 (β = 80)    80 : 20
Observation 5 (β = 90)    90 : 10

4 Experimental Result
We used the scikit-learn machine learning libraries for each of the classifiers separately for these methods. We applied the Generalized Learning Vector Quantization (GLVQ) and Modified Generalized Learning Vector Quantization (MGLVQ) methods to the toddler data set for selecting the important features. The comparison of the two methods for feature selection is shown in Fig. 2. We considered only the positive values of the coefficients for both methods. The subset (A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, Age, Gender, Jaundice) was selected by the Generalized Learning Vector Quantization (GLVQ) method and is denoted as FS GLVQ. The subset (A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, Age, Gender, Jaundice, Ethnicity) was selected by the Modified Generalized Learning Vector Quantization (MGLVQ) method and is denoted as FS MGLVQ. With all 18 features, the data set of toddlers is denoted as FS full. On these subsets, KNN, LDA, CART, NB, SVM, and MLP were applied, and their accuracy, Fβ, and AUROC were determined, following the protocol sketched below.
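As a rough illustration of this evaluation protocol (repeated random splits at the five training ratios of Table 3, the six classifiers, and the accuracy/Fβ/AUROC metrics of Sect. 3.4), the sketch below shows one way it could be wired up with scikit-learn. The synthetic stand-in data, the β value used for Fβ, and the reduced number of repetitions are illustrative assumptions; the paper itself uses the preprocessed toddler data and 100 random repetitions per split.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, fbeta_score, roc_auc_score

# Stand-in data; in practice X, y come from the preprocessing in Sect. 3.1,
# optionally restricted to the FS GLVQ / FS MGLVQ column subsets.
X, y = make_classification(n_samples=1054, n_features=20, random_state=0)

classifiers = {
    "KNN": KNeighborsClassifier(),
    "LDA": LinearDiscriminantAnalysis(),
    "CART": DecisionTreeClassifier(random_state=0),
    "NB": GaussianNB(),
    "SVM": SVC(probability=True, random_state=0),
    "MLP": MLPClassifier(max_iter=1000, random_state=0),
}

def evaluate(X, y, train_fraction, repeats=10, beta=1.0):
    """Average accuracy, F_beta and AUROC over repeated random splits."""
    scores = {name: [] for name in classifiers}
    for seed in range(repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=train_fraction, random_state=seed, stratify=y)
        for name, clf in classifiers.items():
            clf.fit(X_tr, y_tr)
            pred = clf.predict(X_te)
            prob = clf.predict_proba(X_te)[:, 1]
            scores[name].append((accuracy_score(y_te, pred),
                                 fbeta_score(y_te, pred, beta=beta),
                                 roc_auc_score(y_te, prob)))
    return {name: np.round(np.mean(vals, axis=0), 4) for name, vals in scores.items()}

# The five training/testing ratios listed in Table 3.
for frac in (0.5, 0.6, 0.7, 0.8, 0.9):
    print(frac, evaluate(X, y, frac))
```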


Therefore, the performance of these classifiers is needed to investigate. In this case, we considered five observations with different β values (i.e., such as 50, 60, 70, 80, 90) and split this data set where 50%, 60%, 70%, 80% and 90% of the data set were used for training and 50%, 40%, 30%, 20% and 10% were used for testing purpose. Table 3 shows the observations for the training data and the testing data with different values of β. The training and testing data was chosen at random 100 times for each case of β. The accuracy and AUROC of the individual classifiers are shown in Table 4 and Table 6 for all subsets. Also, Table 5 shows the score of Fβ of all classifiers with all subsets. Table 4. Accuracy for various classifiers. Observations

Classifier

FS full

Observation 1

KNN

92.42

93.74

94.31

LDA

93.84

94.69

94.12

CART

88.24

90.51

90.70

Observation 2

Observation 3

Observation 4

FS GLVQ

FS MGLVQ

NB

90.51

92.60

93.93

SVM

96.21

96.58

96.39

MLP

98.86

99.05

100.00

KNN

92.18

94.08

94.79

LDA

96.92

94.79

96.45

CART

91.47

92.18

91.47

NB

93.84

93.13

95.97

SVM

97.16

98.10

96.45

MLP

99.76

100.00

100.00

KNN

92.43

94.34

93.06

LDA

95.90

95.58

94.95

CART

90.85

91.17

90.22

NB

95.27

97.16

95.90

SVM

97.79

98.11

98.42

MLP

99.68

100.00

100.00

KNN

91.94

94.31

95.73

LDA

92.89

93.84

96.21

CART

92.88

93.84

94.42

NB

95.26

94.79

95.73 (continued)

Early Stage Prediction Model of Autism Spectrum Disorder Traits

11

Table 4. (continued) Observations

Observation 5

Classifier

FS full

FS GLVQ

FS MGLVQ

SVM

97.16

98.58

99.53

MLP

99.53

100.00

100.00

KNN

91.94

97.17

91.51

LDA

92.89

96.23

93.40

CART

91.00

92.45

90.57

NB

95.26

94.34

92.45

SVM

97.16

99.00

90.57

MLP

99.53

100.00

100.00

Result Analysis of KNN: We considered all observations where KNN showed the high accuracy 95.73% when β = 80 from other observations for subset FS MGLVQ. The maximum score of Fβ was 98.26% in observation-1 for subset FS GLVQ. Then, KNN gave the high score of AUROC was 99.64% for subset FS GLVQ in observation-5. Result Analysis of LDA: In observation-4, LDA showed the maximum accuracy 98.45% from other observations for subset FS MGLVQ. It obtained the high score 96.21% for Fβ for subset FS MGLVQ in observation-4. It showed high score of AUROC was 99.54% for observation-3 with subset FS full. Result Analysis of CART: In observation-4, CART obtained the high accuracy 93.84% from other observations for subset FS GLVQ. The high score of Fβ was 96.75% for observation-4 with subset FS MGLVQ. It required maximum score of AUROC was 92.50% for observation-5 with subset FS MGLVQ. Result Analysis of Naive Bayes: For observation-3, the Naive Bayes showed a high accuracy of 97.16% from other observations for subset FS GLVQ. The maximum score of Fβ was 97.81% for observation-3 with subset FS GLVQ. It required a high score of AUROC 99.26% for observation-2 with subset FS GLVQ. Result Analysis of SVM: The SVM achieved a maximum accuracy of 99.53% in the observations-4 for the FS MGLVQ subset. When using the subset FS MGLVQ, the maximum Fbeta score was 99.45%. With subset FS GLVQ, AUROC for observation-4 was 99.92%. Result Analysis of MLP: The MLP achieved maximum accuracy of all observations from other classifiers. It showed the accuracy range from 98.86% to 99.76% with the values of β for FS full. The accuracy ranged from 99.05% to 100% for subset FS GLVQ and 100% for FS MGLVQ with all observations of MLP. It acquired the score 100% when β ≥ 60% for FS GLVQ. It obtained the maximum accuracy of 100% with subset FS MGLVQ for all test observations. The MLP acquired the score of Fβ range from 97.45% to 99.44% for FS full, from 97.81% to 99.90% for FS GLVQ and from 92.65% to 99.46% for FS MGLVQ. The MLP achieved a maximum score Fβ of 99.90% for

12

M. Bala and M. Hanif Ali

FS GLVQ. It also obtained the score of AUROC range from 97.98% to 99.97% for FS full, from 99.52% to 100% for FS GLVQ and from 99.59% to 100% for FS MGLVQ. It achieved the high score AUROC of 100% from other classifiers for FS MGLVQ. Figure 3 showed graphs of the AUROC curves for toddlers dataset for subsets of FS MGLVQ of the case of β = 80%. We observed that the MLP achieved the best AUROC result from others. Table 5. Fβ for various classifiers. Observations

Classifier

FS full

FS GLVQ

FS MGLVQ

Observation 1

KNN

97.13

96.54

96.67

Observation 2

Observation 3

Observation 4

LDA

97.22

97.67

96.76

CART

90.64

93.58

93.77

NB

92.47

94.49

95.23

SVM

97.98

97.80

96.46

MLP

97.78

97.81

96.47

KNN

97.37

96.15

96.32

LDA

98.46

97.16

97.26

CART

93.06

94.55

94.01

NB

95.54

94.57

95.96

SVM

97.37

98.23

96.12

MLP

97.77

98.26

96.13

KNN

95.12

97.00

96.74

LDA

97.31

97.31

97.37

CART

93.45

94.90

93.75

NB

96.36

97.81

97.43

SVM

97.44

98.63

98.98

MLP

97.45

98.64

98.99

KNN

97.08

95.62

97.51

LDA

96.61

96.24

98.45

CART

94.14

94.91

96.34

NB

96.28

95.79

96.75

SVM

97.98

98.69

99.45

MLP

97.99

98.99

99.46 (continued)

Early Stage Prediction Model of Autism Spectrum Disorder Traits

13

Table 5. (continued) Observations

Classifier

FS full

FS GLVQ

FS MGLVQ

Observation 5

KNN

97.08

98.26

94.51

LDA

96.61

98.80

96.03

CART

94.79

94.67

90.12

NB

96.28

94.97

93.39

SVM

97.98

99.89

92.64

MLP

99.44

99.90

92.65

Fig. 3. AUCROC curve of the MGLVQ for the observation of β = 80%. Table 6. AUROC for various classifiers Observations

Classifier

FS full

Observation 1

KNN

96.71

Observation 2

FS GLVQ 98.82

FS MGLVQ 99.18

LDA

98.46

99.35

99.14

CART

83.58

89.68

90.61

NB

98.67

98.94

99.08

SVM

98.23

99.83

99.92

MLP

99.00

100.00

100.00

KNN

98.35

99.10

98.54

LDA

99.18

99.20

99.24 (continued)

14

M. Bala and M. Hanif Ali Table 6. (continued)

Observations

Observation 3

Observation 4

Observation 5

Classifier

FS full

FS GLVQ

FS MGLVQ

CART

87.25

89.64

89.76

NB

94.55

99.24

98.83

SVM

99.68

99.84

99.70

MLP

99.97

100.00

99.92

KNN

98.22

98.55

97.78

LDA

99.54

99.05

99.30

CART

81.55

87.92

90.07

NB

98.34

99.22

99.12

SVM

99.89

99.61

99.21

MLP

99.90

100.00

99.59

KNN

95.64

97.51

97.21

LDA

97.49

97.92

98.79

CART

80.11

85.54

86.00

NB

95.11

98.18

98.02

SVM

98.01

98.63

99.23

MLP

99.44

99.89

100.00

KNN

95.54

99.64

97.44

LDA

97.49

98.69

97.02

CART

85.66

92.50

87.14

NB

95.11

99.28

99.64

SVM

98.05

98.80

99.23

MLP

97.98

99.52

100.00

5 Discussion
In this study, we used the toddler dataset for ASD prediction. To obtain a satisfactory result with a machine learning framework, some essential pre-processing of the available dataset was needed: a learning model must be well trained to perform properly, and the acceptability of a model and its performance depend to a large extent on this pre-processing. During pre-processing, we retained all of the feature values in binary form except for three features; among these, "Ethnicity" was of string type. For the string-valued data we used the "one-hot encoding" technique, which converts string-type data into unique binary values. We applied GLVQ and MGLVQ for feature selection and implemented the KNN, LDA, CART, NB, SVM, and MLP classifiers on the dataset.


Table 7. Comparison of our model to the recent studies.

Dataset  Feature reduction  Paper           Accuracy (%)  Fβ (%)  AUROC (%)
Toddler  Yes                [14]            98.34         –       99.62
Toddler  No                 [19]            98.77         –       99.98
Toddler  No                 [18]            100           –       –
Toddler  Yes                Proposed model  100           99.90   100

The outcomes of all these algorithms were analyzed, and it was observed that MLP performed better than the others. The accuracy of MLP for the toddler dataset was 100% with the subset selected by the proposed MGLVQ method. Using the MLP classification approach, we can accurately identify ASD traits in toddlers. We evaluated our model by comparing it to the recent literature, as indicated in Table 7. Rajab et al. [14] employed the toddler dataset for the prediction of ASD and implemented kNN, AdaBoost, and ID3 classifiers; in their observations, the highest results were 98.34% accuracy and 99.62% AUROC, obtained by AdaBoost. Akter et al. [19] found 98.77% accuracy and 99.98% AUROC for the toddler dataset. Hossain et al. [18] reported an accuracy of 100% for the toddler dataset.

6 Conclusion
We found that the proposed MGLVQ method picked out the important features and performed better than the other feature selection methods. We tested the performance of several classifiers on the different subsets and observed that MLP classification did significantly better than the others. Diagnostic procedures for ASD are rarely inexpensive, and we found that detecting ASD traits in toddlers at an early stage can help with diagnosis. Using our proposed model, ASD traits in toddlers can be easily identified. ASD diagnosis is a time-consuming procedure; if it can be identified early on, it can save money and time. Our model has one main limitation: we do not have enough data to train it, and the absence of a larger dataset is the most likely explanation for the restrictions on the design and technique of this study, even after pre-processing the data. Autism spectrum disorders (ASDs) have traditionally been detected at various ages, but we have focused our efforts on detecting ASD early, and to acquire the most accurate findings possible we restricted our model to a certain age range. With more accurate findings and more data, the model can be developed much further. Most countries struggle to recognize autism as early as possible, but with our model and the set of questions we gathered, this issue may be addressed more easily. In the future, we will need to examine a larger amount of data and try to replace the traditional classifiers with deep learning. We are also considering developing a user-friendly smartphone app for end users based on the proposed model, so that anyone may use the application to detect early ASD signs easily and seek medical help if necessary.


References 1. Landa, R.: Autism spectrum disorders in the first 3 years of life (2008). https://psycnet.apa. org/record/2008-09656-007 2. Zwaigenbaum, L., Brian, J.A., Ip, A.: Early detection for autism spectrum disorder in young children. Paediatrics Child Health 24(7), 424–432 (2019) 3. Lord, C., Risi, S., Di Lavore, P.S., Shulman, C, Thurm, A., Pickles, A.: Autism from 2 to 9 years of age. Arch. Gen. Psychiatry 63(6), 694–701 (2006) 4. McCarty, P., Frye, R.E.: Early detection and diagnosis of autism spectrum disorder: why is it so difficult?.In: Seminars in Pediatric Neurology, vol. 35, p. 100831. WB Saunders, October 2020 5. Baron-Cohen, S., Allen, J., Gillberg, C.: Can autism be detected at 18 months? The needle, the haystack, and the CHAT. Br. J. Psychiatry 161, 839–843 (1992). https://doi.org/10.1192/ bjp.161.6.839.PMID1483172 6. Miller, L.E., Burke, J.D., Robins, D.L., Fein, D.A.: Diagnosing autism spectrum disorder in children with low mental age. J. Autism Dev. Disord. 49(3), 1080–1095 (2019) 7. Allison, C., Auyeung, B., Baron-Cohen, S.: Toward brief “Red Flags” for autism screening: the short autism spectrum quotient and the short quantitative checklist for autism in toddlers in 1,000 cases and 3,000 controls. J. Am. Acad. Child Adolesc. Psychiatry (2012). https:// doi.org/10.1016/j.jaac.2011.11.003 8. Thabtah F. Autism Screening Data for Toddlers. https://www.kaggle.com/fabdelja/autism-scr eening-for-toddlers. Accessed 10 Sept 2018 9. Islam, S., Akter, T., Zakir, S., Sabreen, S., Hossain, M.I.: Autismspectrum disorder detection in toddlers for early diagnosis using machine learning. In: 2020 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE) (pp. 1–6). IEEE 10. Mohanty, A.S., Parida, P., Patra, K.C.: Identification of autismspectrum disorder using deep neural network. J. Phys. Conf. Ser. 1921(1), 012006 (2021). IOP Publishing 11. Saihi, A., Alshraideh, H.: Development of an autism screening classification model for toddlers. arXiv preprint arXiv:2110.01410 12. Akyol, K., Gultepe, Y., Karaci, A.: A study on autistic spectrum disorder for children based on attribute selection and fuzzy rule. In: International Congress on Engineering and Life Science (ICELIS), pp. 804–806 (2018) 13. Alzubi, R., Ramzan, N., Alzoubi, H.: Hybrid attribute selection method for autism spectrum disorder SNPs. In: 2017 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), pp. 1–7. IEEE Manchester (2017). https://doi.org/10. 1109/cibcb.2017.8058526 14. Rajab, K.D., Padmavathy, A., Thabtah, F.: Machine learning application for predicting autistic traits in toddlers. Arab. J. Sci. Eng. 46(4), 3793–3805 (2021) 15. Satu, M.S., Sathi, F.F., Arifen, M.S., Ali, M.H., Moni, M.A.: Early detection of autism by extracting features: a case study in Bangladesh. In: 2019 International Conference on Robotics, Electrical and Signal Processing Techniques (ICREST), pp. 400–405. IEEE, January 2019 16. González, A.I., Grana, M., D’Anjou, A.: An analysis of the GLVQ algorithm. IEEE Trans. Neural Netw. 6(4), 1012–1016 (1995) 17. Melin, P., Amezcua, J., Valdez, F., Castillo, O.: A new neural network model based on the LVQ algorithm for multi-class classification of arrhythmias. Inf. Sci. 279, 483–497 (2014)



Detection and Estimation of Diameter of Retinal Vessels

Abhinav Jamwal
Department of Computer Science and Engineering, Indian Institute of Technology, Roorkee, India
[email protected]

Abstract. The change in retinal vessel diameter is a crucial indicator of disorders such as hypertension and diabetes. Abnormalities in the retinal vasculature can signify a spectrum of diseases. Automatically estimated blood vessel diameters in retinal fundus images may aid clinicians in diagnosing diseases such as hypertensive retinopathy, which can even lead to blindness. Automated diagnosis of these diseases therefore requires accurate measurement of vascular diameters. In this work, three methods for estimating the diameter of retinal blood vessels are investigated. Two publicly available datasets, the Central Light Reflex Image Set (CLRIS) and the Vascular Disease Image Set (VDIS) of the Retinal Vessel Image Set for Estimation of Width (REVIEW) database, are used to compare the estimated vessel diameters. The performance of the three algorithms is compared against the available ground-truth average diameter for both CLRIS and VDIS. The diameters estimated by the morphological-based and center line-based methods are found to be close to the ground truth, whereas the diameter estimated by the line integral-based method differs significantly from the ground truth for both datasets.

Keywords: Retinal Vessel · Diameter Estimation · Image Segmentation · Fundus image

1 Introduction

Retinal vessels are the only component of the central circulation that can be closely examined and directly visualized. Various disorders cause retinal vessels to undergo morphological changes, one of which is a change in vessel diameter. Changes in retinal vascular diameter within the fundus are believed to be a reasonable indicator of the risk of diabetic retinopathy, so accurate measurement of the retinal vessel diameter can help with diagnosis. Vascular diameter measurement and vessel segmentation are both crucial and challenging technical tasks that must be accomplished by any system attempting to automate the diagnosis of vascular disorders.


In the literature, several methods have been suggested for segmenting the vascular network. In this work, Otsu's method, a well-known image segmentation technique, is used. Morphological (MO) [1], Center Line (CL) [2] and Line Integral (LI) [3] methods have been proposed in the literature for estimating the diameter of retinal vessels. This paper presents a comparison of these three techniques for measuring the vascular diameter in 2-D retinal images. Kawasaki et al. [4] revealed that retinal artery narrowing and retinopathy are associated with an increased risk of stroke in people without diabetes. After long-term follow-up in the Rotterdam cohort study, any stroke was shown to be substantially correlated with the retinal vascular diameter [5]. Salwan et al. [6] developed a concept for measuring vessel diameter using fundus images: two lengths of plastic tubing with known diameters were infused with blood and photographed, presumptively giving blood column widths of 8–35% and 7–48%. With more precise measurement of the diameter on all profiles and of the kick points in vessel width, they concluded that the vessel width can be measured. Wong et al. [7] introduced a technique for measuring vessel width that identifies fundus abnormalities of the retinal microvessels; the presence and severity of arteriolar narrowing and other microvascular abnormalities are identified using computer-assisted imaging techniques. Chapman et al. [8] introduced a technique for measuring blood vessel width based on arteriolar diameters obtained from branch points, which maintains a constant shear stress in arterial networks. Other methods for measuring retinal vessel diameter suggested in the literature are based on different image processing ideas, such as the graph-theoretic method [9], adaptive Higuchi's dimension [10], mask creation [11] and the multi-step regression method [12].

2 Methodology

2.1 Estimation of Diameter of Retinal Vessel Based on Morphological Operation

This is a computer-aided technique for measuring the retinal blood vessel diameter. The method is evaluated on the VDIS and CLRIS subsets of the REVIEW [13] dataset. In preprocessing, only the green channel of the retinal image is considered, as the blood vessels have better contrast in the green channel than in the red and blue channels (see Fig. 1). A part of the retinal image is cropped to obtain the region of interest, as mentioned in the description of the dataset.

Segmentation and Extraction of Region of Interest. The ROI detection begins with selecting the vessel whose diameter needs to be estimated. Once the ROI is captured, as shown in Fig. 1, it is segmented by Otsu's approach, which determines the threshold value so as to minimize the intra-class variance. After the ROI is segmented, a morphological opening operation is applied to reduce the effect of noise.

Calculating the Euclidean Distance Transform. The distance transform determines, for each pixel in a binary image, the distance to the nearest white pixel. Let X and Y be two distinct pixels in the image space, with X having coordinates


Fig. 1. Cropping the ROI from the original image.

(X_c, X_d) and Y having coordinates (Y_c, Y_d). The Euclidean distance (ED) between X and Y is calculated as:

D(X, Y) = \sqrt{(X_c - Y_c)^2 + (X_d - Y_d)^2}    (1)

The ED technique is applied to the obtained binary image. Figure 2(a) shows the extracted ROI of the binary image and Fig. 2(b) its Euclidean distance transform, which represents the ED intensity from each location to the boundary. In the ED technique, each pixel of the binary image is assigned a number that indicates the separation between that pixel and the closest white pixel in the image.

Fig. 2. (a) ROI of binary image, (b) EDT

Skeletonise the ROI. In order to identify the blood vessel centerline, this method uses the morphological thinning operation [14] to skeletonize the extracted ROI. The thinning process iteratively removes object pixels in the boundary region of the ROI while preserving the important pixel information. When the ROI can no longer be thinned, the iterative thinning procedure is terminated. Once thinning has halted, the ROI skeleton is estimated by counting the remaining pixels. The skeletonization of the ROI is shown in Fig. 3(b). The thinning method offers a few essential features, such as preserving the object characteristics and the original object topology, which is necessary for classifying data, extracting features, identifying objects, and creating a one-pixel-wide skeleton.


Fig. 3. (a) ROI of binary image (b) ROI skeletonization

For exact quantification of the vessel diameter, it is essential to skeletonize the ROI such that its centerline is one pixel wide.

Vessel Diameter Quantification. Finally, the radius of the ROI is determined using its center line and the mean ED between corresponding white pixels located along the border. Using the average value of the radius, the mean diameter of the ROI is calculated. Let Q1, Q2, Q3, ..., Qn be the white pixels on the extracted ROI edge and P1, P2, P3, ..., Pn be the white pixels on the center line of the ROI, where 'n' stands for the total number of pixel pairs. The coordinates of the pixels Q1, Q2, ..., Qn are (X_q1, Y_q1), (X_q2, Y_q2), ..., (X_qn, Y_qn), and the coordinates of the pixels P1, P2, ..., Pn are (X_p1, Y_p1), (X_p2, Y_p2), ..., (X_pn, Y_pn). The mean ED, which serves as the mean radius of the ROI, is then calculated as:

MeanED = \frac{1}{n} \sum_{i=1}^{n} \sqrt{(X_{qi} - X_{pi})^2 + (Y_{qi} - Y_{pi})^2}    (2)

Here, 'n' stands for the total number of points used for computing the radius. Finally, the mean diameter value is obtained by multiplying the mean radius by 2:

MeanDiameter = 2 × MeanRadius    (3)
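The steps above can be summarised in a short sketch. The following is a minimal illustration only, not the author's implementation, assuming scikit-image and SciPy; the pairing of centre-line and edge pixels in Eq. (2) is approximated here by sampling the Euclidean distance transform along the skeleton.

```python
# Minimal sketch of the morphology-based diameter estimate: Otsu segmentation
# of the ROI, opening to suppress noise, Euclidean distance transform,
# skeletonisation, and mean diameter = 2 * mean distance sampled on the
# one-pixel-wide centre line (approximating Eqs. (2)-(3)).
import numpy as np
from scipy.ndimage import distance_transform_edt
from skimage.filters import threshold_otsu
from skimage.morphology import binary_opening, disk, skeletonize

def mean_vessel_diameter(roi_green):
    """roi_green: 2-D array, green-channel ROI containing one vessel segment."""
    binary = roi_green > threshold_otsu(roi_green)      # Otsu segmentation of the ROI
    binary = binary_opening(binary, disk(1))            # opening to reduce noise
    edt = distance_transform_edt(binary)                # distance of vessel pixels to background
    centre_line = skeletonize(binary)                   # one-pixel centre line (thinning)
    radii = edt[centre_line]                            # approximate radius at each centre pixel
    return 2.0 * radii.mean() if radii.size else 0.0    # mean diameter
```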

2.2 Diameter Calculation Based on Center Line

Diameter Calculation of Vessels. The distribution of grey levels across the vascular diameter in a typical retinal image can be seen as the inverted Gaussian model shown in Fig. 4. This method takes 'k' as the vertical distance between the vessel center line and a pixel, σ1 as the vessel diameter span, and g(k) as the pixel grey value. The grey value distribution function is given by

g(k) = -\exp\left(\frac{-k^2}{2\sigma_1^2}\right), \quad g(k) = g(-k)    (4)

Extraction of Initial Center Points. This method selects an area of the retinal image that contains the interesting vascular region, as shown in Fig. 1, and computes the center points. In order to determine the center points of the vessel, this method uses the


Fig. 4. Upside-down vessel Gaussian model

morphological thinning operation to skeletonize the extracted ROI. After extracting the center pixels of the vessel, this method finds the x and y coordinates of each center pixel and plots them onto an image, as shown in Fig. 5(a).

Vascular Center Points with Polynomial Fitting. In this research work [2], a third-order polynomial fit is applied to the center points to compute the vascular center line and its perpendicular, in order to account for different vascular morphologies. The third-order polynomial fitting equation is given as:

y = a_0 x^3 + a_1 x^2 + a_2 x + a_3    (5)

By solving this equation, one obtains the values of the polynomial coefficients [a0, a1, a2, a3]; the derivative is then computed to obtain the tangential equation of the center line. The resulting vertical (perpendicular) lines are shown in Fig. 5(c).

By solving the equation, one may get the value of the coefficients of the polynomials [a0 , a1 , a2 , a3 ] and then compute the derivative to obtain the tangential equation of the center line. The vertical line as shown in Fig. 5(c). Measuring the Diameter Along a Vertical Line. It is challenging to locate the exact intersection point of the vertical line, and vascular wall as no precise threshold can separate the vessels from the background in a gray-level image. The research work [2] employed a segmented image to overcome this problem. Firstly, ROI is captured in a monochrome image of the same size as the grey level vessel, then the center line and vertical line are created for diameter calculation. Figure 6 shows the segmented vessel and vertical lines.

Fig. 5. a) Center points b) Polynomial fitting c) Vertical lines


Fig. 6. a) Vessel Segmentation. b) Vertical lines (Color figure online)

Now, to extract the diameter of the vessel, this method takes the (X, Y) coordinates of all the yellow vertical lines and calculates the pixel values along each yellow line. Considering the coordinate of the first pixel of a yellow line with pixel value one as (a1, b1) and that of the last such pixel as (a2, b2), the shortest distance between these two coordinates on every yellow line is the vessel diameter and is calculated as:

d = \sqrt{(a_2 - a_1)^2 + (b_2 - b_1)^2}    (6)
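As an illustration only (not the implementation of [2]), the sketch below fits the third-order polynomial of Eq. (5) to the centre points with NumPy and marches along the perpendicular at each centre point inside the segmented vessel. It assumes the vessel segment can be parametrised as y = f(x); the step size and maximum half-width are assumed parameters.

```python
# Minimal sketch of the centre-line method: fit a third-order polynomial to the
# centre points (Eq. 5), walk along the perpendicular at each centre point, and
# take the chord length inside the binary vessel mask as the diameter (Eq. 6).
import numpy as np

def diameters_along_centre_line(binary, centre_xy, max_half_width=30.0):
    """binary: 2-D bool vessel mask; centre_xy: (N, 2) array of (x, y) centre points."""
    x, y = centre_xy[:, 0], centre_xy[:, 1]
    coeffs = np.polyfit(x, y, 3)                           # [a0, a1, a2, a3] of Eq. (5)
    slope = np.polyval(np.polyder(coeffs), x)              # tangent dy/dx of the centre line
    normal = np.stack([-slope, np.ones_like(slope)], axis=1)
    normal /= np.linalg.norm(normal, axis=1, keepdims=True)  # unit perpendicular direction

    diameters = []
    for (cx, cy), n in zip(centre_xy, normal):
        ends = []
        for sign in (+1.0, -1.0):                          # march both ways off the centre line
            t = 0.0
            while t < max_half_width:
                px, py = cx + sign * t * n[0], cy + sign * t * n[1]
                r, c = int(round(py)), int(round(px))
                inside = 0 <= r < binary.shape[0] and 0 <= c < binary.shape[1]
                if not inside or not binary[r, c]:         # left the vessel (or the image)
                    break
                t += 0.5
            ends.append((cx + sign * t * n[0], cy + sign * t * n[1]))
        (a1x, a1y), (a2x, a2y) = ends
        diameters.append(np.hypot(a2x - a1x, a2y - a1y))   # Eq. (6)
    return np.array(diameters)
```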

2.3 Measurement of Retinal Vessel Thickness Based on Minimum Line Integrals

This method is based on an intuitive technique for determining an object's thickness. A simple way to assess the local thickness of an object is to place two fingers on either side of it and locally adjust the angle of the segment connecting the fingertips until the Euclidean distance between them is minimal; this distance is then taken as the object's local thickness. The method can therefore be posed as a restricted optimization problem: minimizing the distance in a particular region. In this approach, the region is identified exactly by the point where one would like to define the thickness, and the constraint is that this point must lie on the line segment joining the two fingertips. Accordingly, only those line segments that pass through the point of interest are considered. Several lines are drawn, using line integrals, passing through the selected point, and the shortest of them is taken as the thickness of the vessel at that point.

Measurement of Vessel Diameter. Each line integral is computed centered at the chosen pixel, starting from the point of interest and continuing in each of the two opposite directions separately. Once all the line integrals have been calculated (i.e., in all the possible directions, Fig. 7(a)), the minimum amongst them is taken as the diameter at that point (Fig. 7(b)). To reduce the impact of noise, one choice would be to take the average of some of the smallest integrals. For this experiment, the point of interest is taken and the line integrals are computed in all directions at a step angle of 15°. A scan is then performed in all directions until each line integral intersects the edge of the vessel (Fig. 7(b)), giving various intersection points. The length of all the line integrals is measured and out


of all, the line integral with the minimum length is considered as the thickness of the vessel (Fig. 7(c)).

Fig. 7. (a) Line integrals, (b) Vessel intersecting the edge, (c) Thickness of vessel
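A minimal sketch of this procedure (assumed parameters; not the paper's code) is given below: chords through the point of interest are cast at 15° steps, the in-vessel length of each chord is measured on the binary segmentation, and the shortest chord is reported as the local thickness.

```python
# Minimal sketch of the minimum-line-integral idea: cast chords through the
# point of interest at a 15-degree angular step and keep the shortest in-vessel
# chord length as the vessel thickness at that point.
import numpy as np

def min_line_integral_thickness(binary, point, step_deg=15, step_len=0.5, max_len=50.0):
    """binary: 2-D bool vessel mask; point: (row, col) inside the vessel."""
    r0, c0 = point
    best = np.inf
    for angle in np.deg2rad(np.arange(0, 180, step_deg)):
        dr, dc = np.sin(angle), np.cos(angle)
        length = 0.0
        for sign in (+1.0, -1.0):                 # extend the chord in both directions
            t = 0.0
            while t < max_len:
                r = int(round(r0 + sign * t * dr))
                c = int(round(c0 + sign * t * dc))
                inside = 0 <= r < binary.shape[0] and 0 <= c < binary.shape[1]
                if not inside or not binary[r, c]:  # reached the vessel edge
                    break
                t += step_len
            length += t
        best = min(best, length)                  # shortest chord = local thickness
    return best
```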

3 Result and Discussion

The diameter of the ROI is determined using 30 different segments from CLRIS [13] images and 75 different segments from VDIS [13] images. Manually estimated diameters from three independent observers are available for these two datasets. The mean diameter of each vessel in the two datasets is computed and compared with the mean of the diameters measured by the three independent observers. Table 1 provides the mean vessel diameters determined for the VDIS and CLRIS [13] datasets. The output of the existing algorithms, along with the manually gathered REVIEW results, is illustrated and compared in Table 1. The names of the algorithms are listed in column 2 of Table 1, and the relative percentage difference with respect to the average of the observers is shown in column 5. The mean diameter and standard deviation (SD) in pixels used for the performance evaluation are provided in columns 3 and 4 of Table 1, respectively. It can be seen that, in the case of CLRIS images, the center line method performs better than the morphological and line integral methods: the center line method gives the minimum absolute relative percentage difference with respect to the average of the observers (5.65%) and the line integral method the maximum (11.08%). In the case of VDIS images, the morphological method performs slightly better than the center line method, and both clearly outperform the line integral method: the morphological method gives the minimum absolute relative percentage difference (5.87%) and the line integral method the maximum (20.79%). The results obtained from the morphological operation-based method and the center line method are close to the manually calculated mean diameter. Therefore, these methods can be seen as potential components of a diagnostic system in modern ophthalmology that can predict cardiovascular and retinal diseases.


Table 1. Comparison of the diameter obtained with the existing methods and manual measurement

Dataset | Method                   | Mean Diameter | SD   | Absolute relative percentage difference w.r.t. average of the observers
CLRIS   | Average of the observers | 13.8          | 4.12 | –
CLRIS   | Center line              | 13.02         | 3.91 | 5.65%
CLRIS   | Morphological            | 12.79         | 3.92 | 7.31%
CLRIS   | Line integral            | 15.33         | 2.05 | 11.08%
VDIS    | Average of the observers | 8.85          | 2.57 | –
VDIS    | Center line              | 8.25          | 1.65 | 6.77%
VDIS    | Morphological            | 8.33          | 1.3  | 5.87%
VDIS    | Line integral            | 10.69         | 1.66 | 20.79%

4 Conclusion and Future Direction

The measurement of the diameter of retinal vessels plays an important role in the structured analysis of the retina and is potentially valuable for the automated diagnosis of eye diseases such as arteriosclerosis and diabetic retinopathy. This paper has investigated three existing methods, (i) morphological-based, (ii) center line-based and (iii) line integral-based, to estimate the diameter of retinal vessels. The performance of these three methods is compared with the available ground-truth diameters on two publicly available datasets, CLRIS [13] and VDIS [13]. The results obtained from the morphological and center line-based methods show that these approaches are reliable and consistent in estimating the blood vessel diameter from retinal images. The performance of the line integral-based method was found to be the poorest among the three methods investigated. As future work, this study seeks to identify vascular bifurcation, branching and crossover points in retinal images, which helps to predict many heart diseases and can also be used for image registration and biometric features.

References
1. Kipli, K., et al.: A review on the extraction of quantitative retinal microvascular image feature. Comput. Math. Methods Med. 2018 (2018)
2. Liu, L., Yang, T., Fu, D., Li, M.: Retinal vessel extraction and diameter calculation based on tensor analysis. In: 2016 55th Annual Conference of the Society of Instrument and Control Engineers of Japan (SICE), pp. 1478–1483. IEEE (2016)
3. Aganj, I., Sapiro, G., Parikshak, N., Madsen, S.K., Thompson, P.M.: Measurement of cortical thickness from MRI by minimum line integrals on soft-classified tissue. Hum. Brain Mapp. 30(10), 3188–3199 (2009)


4. Aliahmad, B., Kumar, D.K., Janghorban, S., Azemin, M.Z.C., Hao, H., Kawasaki, R.: Retinal vessel diameter measurement using multi-step regression method. In: 2012 ISSNIP Biosignals and Biorobotics Conference: Biosignals and Robotics for Better and Safer Living (BRC), pp. 1–4. IEEE (2012)
5. Moss, H.E.: Retinal vascular changes are a marker for cerebral vascular diseases. Curr. Neurol. Neurosci. Rep. 15(7), 1–9 (2015)
6. Chen, H., Patel, V., Wiek, J., Rassam, S.M., Kohner, E.M.: Vessel diameter changes during the cardiac cycle. Eye 8, 97–103 (1994)
7. Li, L.-J., Ikram, M.K., Wong, T.Y.: Retinal vascular imaging in early life: insights into processes and risk of cardiovascular disease. J. Physiol. 594(8), 2175–2203 (2016)
8. Chapman, N., et al.: Computer algorithms for the automated measurement of retinal arteriolar diameters. Br. J. Ophthalmol. 85(1), 74–79 (2001)
9. Xu, X., et al.: AV-CasNet: fully automatic arteriole-venule segmentation and differentiation in OCT angiography. IEEE Trans. Med. Imaging (2022)
10. Omori, J., et al.: Prophylactic clip closure for mucosal defects is associated with reduced adverse events after colorectal endoscopic submucosal dissection: a propensity-score matching analysis. BMC Gastroenterol. 22(1), 1–9 (2022)
11. Mahapatra, S., Agrawal, S., Mishro, P.K., Pachori, R.B.: A novel framework for retinal vessel segmentation using optimal improved Frangi filter and adaptive weighted spatial FCM. Comput. Biol. Med. 147, 105770 (2022)
12. Engelmann, J., Villaplana-Velasco, A., Storkey, A., Bernabeu, M.O.: Robust and efficient computation of retinal fractal dimension through deep approximation. In: Antony, B., Fu, H., Lee, C.S., MacGillivray, T., Xu, Y., Zheng, Y. (eds.) OMIA 2022. LNCS, vol. 13576, pp. 84–93. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-16525-2_9
13. Al-Diri, B., Hunter, A., Steel, D., Habib, M., Hudaib, T., Berry, S.: A reference data set for retinal vessel profiles. In: 2008 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 2262–2265. IEEE (2008)
14. Şişik, F., Eser, S.E.: Brain tumor segmentation approach based on the extreme learning machine and significantly fast and robust fuzzy c-means clustering algorithms running on Raspberry Pi hardware. Med. Hypotheses 136, 109507 (2020)

Automated Summarization of Gastrointestinal Endoscopy Video

B. Sushma1 and P. Aparna2

1 Department of Electronics and Communication Engineering, CMR Institute of Technology, Bengaluru 560037, India
[email protected]
2 Department of Electronics and Communication Engineering, National Institute of Technology Karnataka, Surathkal, Mangalore 575025, Karnataka, India
[email protected]

Abstract. Gastrointestinal (GI) endoscopy enables many minimally invasive procedures for diagnosing diseases such as esophagitis, ulcers, polyps and cancers. Guided by the endoscope's video sequence, a physician can diagnose the disease and administer treatment. Unfortunately, due to the huge amount of data generated, physicians currently discard the procedural video and rely on a small number of carefully chosen images to record a procedure. In addition, when a patient seeks a second opinion, the assessment of lesions in a huge video stream necessitates a thorough examination, which is a time-consuming process that demands much attention. To reduce the length of the video stream, an automated method is proposed that generates a summary of endoscopy video recordings consisting only of abnormal frames, using deep convolutional neural networks trained to classify normal, abnormal and uninformative frames. Results show that the method can efficiently detect abnormal frames and is robust to variations in the frames. The proposed CNN architecture outperforms the other classification models with an accuracy of 0.9698 while using fewer parameters.

Keywords: Convolutional Neural Networks · Deep Neural Network · Endoscopy · Video Summarization

1 Introduction

In recent days, GI tract endoscopy is considered an active research area for detecting many GI tract abnormalities. A gastroenterologist can explore the GI tract using the GI endoscopic video sequence as a guide to diagnose GI disorders or collect tissue samples for biopsy. The GI tract is illuminated by the endoscope's light source, and video frames of the illuminated GI tract surfaces are collected by an embedded camera. The video stream provides detailed structural and morphological information about the surfaces of the GI tract, as well as a record


of the anatomical regions examined during endoscopy. The video is captured for 20 to 40 min at 30 frames per second with a frame resolution of 1920 × 1080. Due to the huge amount of video data generated, the current practice discards the procedural video and only a few frames are selected for recording, to avoid high memory consumption. Another drawback of the current procedure is that it is difficult to diagnose the condition from only a few frames when the patient seeks a second opinion. If the complete video is taken into account for the evaluation, the physician has to devote a significant amount of time to reviewing the entire video, which is also a tedious process.

Fig. 1. Endoscopy video summary generation method consisting of abnormal frames

Fig. 2. Proposed CNN architecture for GI endoscopy frame classification


Fig. 3. Main building blocks of proposed CNN architecture

In the literature, many works have been proposed to detect frames with one or two kinds of GI anomalies such as polyps, ulcers, Crohn's disease and gastrointestinal cancers [4,9,10,15]. A deep learning method to detect and segment GI anomalies is presented in [1]. A new tool to detect celiac disease using principal component analysis is implemented in [12]. Segmentation of anomaly regions in endoscopy frames is proposed in [11], which can detect one anomaly in a frame; this model cannot detect multiple anomalies in a single frame. The gastroenterologist must still carefully review the majority of the frames with other GI anomalies. In the works proposed in [3,14], to reduce the number of redundant frames for review, the video is segmented into shots and the most representative frames are extracted from each shot. This approach extracts normal, abnormal and uninformative frames; therefore, a method to identify and remove the normal and uninformative frames is required. To address all of the aforementioned issues, a method that generates a summary comprising solely abnormal frames is necessary. This work proposes an automated method for generating a summary of GI endoscopy procedural recordings comprising solely anomalous frames, to overcome the problems of storage and review time. The normal frames without anomalies and the uninformative frames captured during the movement of the endoscopic instrument are eliminated. This method reduces the number of frames required for storage as well as the number of frames required for review. The proposed method extracts a frame from the procedural video recording and determines whether it is normal, abnormal, or uninformative using a deep convolutional neural network (deep CNN). If an abnormal frame is detected, it is saved in a frame buffer; otherwise it is ignored, as represented in Fig. 1. The deep CNN is trained to detect whether the input frame is abnormal or not. The workload


of gastroenterologists during second examinations could be reduced using the proposed approach. It could also assist inexperienced gastroenterologists in making better decisions. Furthermore, the automatic summary generation of the suggested method can reduce the physician's workload, allowing them to focus on significant frames and improve diagnostic performance. In GI endoscopy, a limited number of image features are sometimes sufficient to provide the diagnosis. The usage of a complex CNN may require more resources for implementation and training. Furthermore, a complicated CNN takes longer to predict and may not give real-time performance when anomalous frames must be saved during an endoscopic procedure. In this work, a CNN with fewer layers is proposed that gives performance similar to deep standard CNN classification architectures while requiring fewer resources and less prediction time. The remaining part of the paper consists of three sections. The methodology in Sect. 2 describes the proposed CNN architecture, dataset, training and implementation details. Section 3 presents the results obtained by experimental evaluation of the proposed model and a comparison with existing standard CNN architectures. The last section contains the discussion and conclusive summary of the work.

2 Methodology

2.1 CNN Model

The proposed CNN architecture aims at the detection of abnormal frames from GI endoscopy video. It is implemented using five main blocks, as displayed in Fig. 2. The model accepts an input image of size 256 × 256 with three channels. The features of the input images are extracted by a stack of convolution layers. The first CBRC block, shown in Fig. 3a, consists of a 3 × 3 convolution layer followed by a batch normalization (BN) layer, a ReLU activation and another 3 × 3 convolution layer. The next main component is the BRCBRC block shown in Fig. 3b, which contains two convolution layers. The extracted features are given to a classification head consisting of fully-connected layers and a softmax activation layer. Each convolution layer is described in Table 1 with regard to the number of filters, the filter size, and the output dimensions.
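For illustration, a minimal Keras sketch of this architecture is given below. The layer sizes follow Table 1, while the exact ordering of the operations inside the CBRC and BRCBRC blocks, the use of same-padding, and the activation of the intermediate fully-connected layer are assumptions inferred from the block names rather than details taken from the paper.

```python
# Hedged Keras sketch of the CBRC / BRCBRC stack (layer sizes from Table 1).
from tensorflow.keras import layers, models

def cbrc(x, filters):
    # Conv -> BN -> ReLU -> Conv (assumed ordering for the CBRC block)
    x = layers.Conv2D(filters, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    return layers.Conv2D(filters, 3, padding="same")(x)

def brcbrc(x, filters):
    # MaxPool, then (BN -> ReLU -> Conv) twice (assumed ordering for BRCBRC)
    x = layers.MaxPooling2D(2)(x)
    for _ in range(2):
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
        x = layers.Conv2D(filters, 3, padding="same")(x)
    return x

def build_model(input_shape=(256, 256, 3), num_classes=3):
    inputs = layers.Input(shape=input_shape)
    x = cbrc(inputs, 16)
    for f in (32, 64, 128, 256):          # BRCBRC-1 ... BRCBRC-4
        x = brcbrc(x, f)
    x = layers.Flatten()(x)               # 16 x 16 x 256 -> 1 x 65536
    x = layers.Dense(256, activation="relu")(x)   # FC Layer-2 (activation assumed)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return models.Model(inputs, outputs)
```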

2.2 Description of the Dataset

The dataset for classifying the input frames into normal, abnormal and uninformative is created from the HyperKvasir dataset, which consists of images and videos collected during real GI endoscopy and colonoscopy procedures. The dataset contains 374 videos of normal and various pathological findings, labelled by GI endoscopists [2]. These videos are captured at 30 fps. From each video sequence, frames are extracted at a rate of 1 fps. The extracted frames are cropped to remove the non-image part containing text. Finally, images are resized to


Table 1. Parameter details of the proposed CNN architecture

Block    | Layer type | No. of Filters | Filter Size | Output Size
–        | Input      | –              | –           | 256 × 256 × 4
CBRC     | Conv-1     | 16             | 3 × 3       | 256 × 256 × 16
CBRC     | Conv-2     | 16             | 3 × 3       | 256 × 256 × 16
BRCBRC-1 | Maxpooling | –              | –           | 128 × 128 × 16
BRCBRC-1 | Conv-3     | 32             | 3 × 3       | 128 × 128 × 32
BRCBRC-1 | Conv-4     | 32             | 3 × 3       | 128 × 128 × 32
BRCBRC-2 | Maxpooling | –              | –           | 64 × 64 × 32
BRCBRC-2 | Conv-5     | 64             | 3 × 3       | 64 × 64 × 64
BRCBRC-2 | Conv-6     | 64             | 3 × 3       | 64 × 64 × 64
BRCBRC-3 | Maxpooling | –              | –           | 32 × 32 × 64
BRCBRC-3 | Conv-7     | 128            | 3 × 3       | 32 × 32 × 128
BRCBRC-3 | Conv-8     | 128            | 3 × 3       | 32 × 32 × 128
BRCBRC-4 | Maxpooling | –              | –           | 16 × 16 × 128
BRCBRC-4 | Conv-9     | 256            | 3 × 3       | 16 × 16 × 256
BRCBRC-4 | Conv-10    | 256            | 3 × 3       | 16 × 16 × 256
–        | FC Layer-1 | –              | –           | 1 × 65536
–        | FC Layer-2 | –              | –           | 1 × 256
–        | FC Layer-3 | –              | –           | 1 × 3

256 × 256 pixel resolution. The abnormal and normal class images are frames extracted from videos with abnormal and normal mucosa findings, respectively. Uninformative frames, which result from the movement of the endoscopy instrument, are selected from both normal and abnormal videos. Example images belonging to the normal, abnormal and uninformative classes are shown in Fig. 4, Fig. 5 and Fig. 6, respectively. The details of the dataset used for training the CNN models are given in Table 2 and Table 3.

Fig. 4. Typical frames with normal findings


Fig. 5. Frames with abnormalities

2.3 Model Training Details

The developed model is trained using Keras, a deep learning API with a TensorFlow back-end, on an NVIDIA Tesla P100 GPU with 16 GB RAM. The initial learning rate (lr) is set to 0.0001 with decay = lr/epochs. Network parameters are initialized using the Xavier initialization method. Optimum model parameters are found using the Adam optimizer during back-propagation. The network is trained for 50 epochs with a batch size of 8, using the sparse categorical cross-entropy loss function. The learning rate curves are shown in Fig. 7.
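A hedged sketch of this training setup is shown below; x_train, y_train, x_val, y_val are assumed NumPy arrays of images and integer class labels, and the decay argument follows the older Keras optimizer API (newer versions would use a learning-rate schedule instead).

```python
# Sketch of the stated training configuration: Adam with lr = 1e-4 and
# lr/epochs decay, sparse categorical cross-entropy, 50 epochs, batch size 8.
import tensorflow as tf

EPOCHS, LR = 50, 1e-4
model = build_model()                                    # from the architecture sketch above
optimizer = tf.keras.optimizers.Adam(learning_rate=LR, decay=LR / EPOCHS)  # legacy decay arg
model.compile(optimizer=optimizer,
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# Keras' default glorot_uniform initialiser corresponds to Xavier initialisation.
history = model.fit(x_train, y_train,
                    validation_data=(x_val, y_val),
                    epochs=EPOCHS, batch_size=8)
model.save("gi_frame_classifier.h5")                     # saved in HDF5 format, as in the paper
```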

Fig. 6. Uninformative Frames

Table 2. Dataset partition

Class         | Train | Validation | Test
Abnormal      | 3600  | 720        | 480
Normal        | 3606  | 730        | 489
Uninformative | 3600  | 720        | 480


Table 3. Frames of various abnormalities in the abnormal class

Abnormality       | Train | Validation | Test
Esophageal Cancer | 570   | 115        | 75
Stomach Cancer    | 555   | 115        | 75
Esophagitis       | 765   | 150        | 80
Ulcer             | 420   | 95         | 95
Colon Polyps      | 675   | 95         | 95
Colorectal Cancer | 699   | 150        | 60

Fig. 7. Train and validation learning curve with decaying learning rate

Table 4. Proposed model performance comparison with standard CNN architectures

Model        | Precision   | Recall      | Accuracy        | F1-Score        | Model size | TPE
AlexNet      | 0.91 ± 0.02 | 0.93 ± 0.04 | 0.9224 ± 0.0135 | 0.9198 ± 0.0266 | 60         | 11.55 s
DenseNet-121 | 0.77 ± 0.01 | 0.73 ± 0.04 | 0.7523 ± 0.0198 | 0.7494 ± 0.0162 | 0.4        | 66.44 s
Inception-V4 | 0.97 ± 0.01 | 0.95 ± 0.01 | 0.9621 ± 0.01   | 0.9598 ± 0.01   | 25         | 74.28 s
Resnet-50    | 0.96 ± 0.01 | 0.97 ± 0.01 | 0.9686 ± 0.01   | 0.9649 ± 0.01   | 25         | 32.82 s
SqueezeNet   | 0.83 ± 0.02 | 0.85 ± 0.02 | 0.8426 ± 0.015  | 0.8398 ± 0.021  | 1.24       | 11.45 s
VGG-19       | 0.97 ± 0.01 | 0.95 ± 0.03 | 0.9631 ± 0.02   | 0.9598 ± 0.015  | 144        | 69.23 s
Proposed     | 0.96 ± 0.01 | 0.97 ± 0.03 | 0.9698 ± 0.015  | 0.9649 ± 0.02   | 12         | 13 s

Size: Model size in million parameters; TPE: Training time per epoch in seconds

3 Experimental Results

This section provides a comprehensive analysis of the proposed model and of the other deep learning classification methods. The quantitative evaluation of the models is done using precision (P), recall (R), F1-score (F1) and accuracy (ACC), computed using (1) to (4).


R = TP / (TP + FN)    (1)

P = TP / (TP + FP)    (2)

ACC = (TP + TN) / (TP + TN + FP + FN)    (3)

F1 = 2 (P × R) / (P + R)    (4)

In the above equations, TP, TN, FP and FN represent true positives, true negatives, false positives and false negatives, respectively. The model's performance is systematically evaluated by comparing it with the standard CNN architectures ResNet [6], DenseNet [7], VGG [13], SqueezeNet [8], Inception-v4 [16] and AlexNet [5], each trained for 50 epochs. Table 4 provides the detailed comparative results. For medical data, recall should be high compared to precision, which is achieved by the proposed model along with high accuracy. The proposed CNN architecture outperforms the other models with an accuracy of 0.9698 and better recall, using 12 million parameters. The model is saved as an h5 file, a form of hierarchical data format; the size of the model after training is 90 MB. The performance of the model is close to that of Inception-v4, ResNet-50 and VGG-19, but is achieved with a reduced number of training parameters and less training time. In the process of training these deeper models, the over-fitting problem was also observed. The performance of DenseNet and SqueezeNet is poor compared to the proposed model. The learning rate curves of the proposed model in Fig. 7 demonstrate how a learning rate decay that is appropriate for the problem and the selected model architecture can lead to final weights that are skilled and stable at convergence, which is a desirable quality in a final model at the end of a training run.
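For reference, Eqs. (1)–(4) can be computed directly from the confusion-matrix counts; the following is a small illustrative helper, not code from the paper.

```python
# Sketch of Eqs. (1)-(4): precision, recall, accuracy and F1 from TP/TN/FP/FN.
def classification_metrics(tp, tn, fp, fn):
    recall = tp / (tp + fn)                              # Eq. (1)
    precision = tp / (tp + fp)                           # Eq. (2)
    accuracy = (tp + tn) / (tp + tn + fp + fn)           # Eq. (3)
    f1 = 2 * precision * recall / (precision + recall)   # Eq. (4)
    return precision, recall, accuracy, f1
```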

4 Conclusion

The generation of a GI endoscopy video summary consisting of only abnormal frames is presented in this work. The proposed method reduces the video data and allows hospitals to save the endoscopy procedural videos. It also allows physicians to have a quick review of the video when the patient seeks a second opinion. In this work, a deep CNN is trained to classify the input frames into normal, abnormal and uninformative frames. Abnormal frames are saved and considered for review when the patient seeks a second opinion. It is also demonstrated that great complexity and depth of a deep CNN are not necessary to achieve better performance: the proposed model provides better results than Inception, ResNet-50 and VGG-19, which are very deep CNNs, at a reduced complexity. In the future, a model can be integrated to detect and segment the region of anomalies in each frame.


References
1. Ali, S., et al.: Deep learning for detection and segmentation of artefact and disease instances in gastrointestinal endoscopy. Med. Image Anal. 70, 102002 (2021)
2. Borgli, H., et al.: HyperKvasir, a comprehensive multi-class image and video dataset for gastrointestinal endoscopy. Sci. Data 7(1), 1–14 (2020)
3. Chen, J., Zou, Y., Wang, Y.: Wireless capsule endoscopy video summarization: a learning approach based on Siamese neural network and support vector machine. In: 2016 23rd International Conference on Pattern Recognition (ICPR), pp. 1303–1308. IEEE (2016)
4. El-Nakeep, S., El-Nakeep, M.: Artificial intelligence for cancer detection in upper gastrointestinal endoscopy, current status, and future aspirations. Artif. Intell. Gastroenterol. 2(5), 124–132 (2021)
5. ul Hassan, M.: AlexNet: ImageNet classification with deep convolutional neural networks (2018)
6. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
7. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708 (2017)
8. Iandola, F.N., Han, S., Moskewicz, M.W., Ashraf, K., Dally, W.J., Keutzer, K.: SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv preprint arXiv:1602.07360 (2016)
9. Jia, X., et al.: Automatic polyp recognition in colonoscopy images using deep learning and two-stage pyramidal feature prediction. IEEE Trans. Autom. Sci. Eng. 17(3), 1570–1584 (2020)
10. Klang, E., et al.: Deep learning algorithms for automated detection of Crohn's disease ulcers by video capsule endoscopy. Gastrointest. Endosc. 91(3), 606–613 (2020)
11. Kundu, A.K., Fattah, S.A., Wahid, K.A.: Multiple linear discriminant models for extracting salient characteristic patterns in capsule endoscopy images for multi-disease detection. IEEE J. Transl. Eng. Health Med. 8, 1–11 (2020)
12. Li, B.N., et al.: Celiac disease detection from videocapsule endoscopy images using strip principal component analysis. IEEE/ACM Trans. Comput. Biol. Bioinf. 18(4), 1396–1404 (2019)
13. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
14. Sushma, B., Aparna, P.: Summarization of wireless capsule endoscopy video using deep feature matching and motion analysis. IEEE Access 9, 13691–13703 (2020)
15. Sutton, R.T., Zaiane, O.R., Goebel, R., Baumgart, D.C.: Artificial intelligence enabled automated diagnosis and grading of ulcerative colitis endoscopy images. Sci. Rep. 12(1), 1–10 (2022)
16. Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.A.: Inception-v4, Inception-ResNet and the impact of residual connections on learning. In: Thirty-First AAAI Conference on Artificial Intelligence (2017)

Interpretation of Feature Contribution Towards Diagnosis of Diabetic Retinopathy from Exudates in Retinal Images

Kanupriya Mittal and V. Mary Anita Rajam

Department of CSE, CEG, Anna University, Chennai, India
[email protected]

Abstract. Diabetic retinopathy is a complication related to diabetes and is one of the leading causes of blindness worldwide. The presence of exudates is one of the earliest clinical signs of diabetic retinopathy. Machine learning algorithms have been used to classify the presence or absence of exudates in retinal images. The black box nature of the image classification based on machine learning models limits the clinical applicability. This paper proposes an explainable machine learning model to detect and classify exudates. Dynamic fuzzy histogram equalization and morphological operations are used for the segmentation of exudates. The XGBoost classifier with a feature vector of eighteen features is used to classify the presence or absence of exudates in images. A SHAP tree explainer is used for the interpretation of the classifier's output. The SHAP tree explainer helps in the identification and ranking of the features extracted from the images towards output probabilities.

Keywords: Model interpretability · Shapley values · Retinal image · Fuzzy histogram equalization · Exudates · Classification

1 Introduction

Machine learning (ML) models have been widely used in computer aided medical diagnosis systems. Despite that, doctors are reluctant to adhere to computer aided medical diagnosis systems, as the process behind the learning made by computer models is not transparent. Computer aided medical diagnosis systems should be interpretable, understandable and transparent to overcome the black box nature of such systems. Explainability of the machine learning models will help to enhance the trust of clinical experts in the machine learning system. Interpretation of ML models can be done in a model-specific manner or a model-agnostic manner. Model-specific methods are restricted to specific models, and the explanations are derived from the internal model parameters [5]. Model-agnostic methods are applicable to any machine learning model [22]. Some of the model-agnostic models are partial dependence plots (PDP), local interpretable


model-agnostic explanations (LIME) [26], model understanding through subspace explanations (MUSE) [16] and Shapley additive explanations (SHAP) [17]. In this work, we have proposed an interpretable machine learning model for exudates detection from retinal images. Exudates are white or yellowish spots with high contrast, and have varying shapes and sizes. These are the lipid deposits formed due to the damaged blood vessels in the retina. The presence of exudates is one of the indicators of diabetic retinopathy (DR), which is one of the leading causes of blindness in the world. The size of the exudates increases with the disease severity level. The automated process of exudates detection would help in early treatment of DR. The proposed method uses a dynamic fuzzy histogram equalization technique during the pre-processing step, mathematical morphological operations for exudate candidate detection and then exudate segmentation, extreme gradient boosting (XGBoost) classifier for classification and then a SHAP tree explainer for model interpretation. The method is tested on images from publicly available datasets. The technique is fast and requires very low computing power. The paper is organised as follows: The related work is briefed in Sect. 2. Section 3 gives details about the proposed methodology. Section 4 provides the details on the datasets used and evaluation metrics. Section 5 provides the details on the results and discussion. Section 6 discusses the conclusion and future scope.

2 Related Work

Medical image classification of retinal exudates has long been a research area, but only limited work is available in the literature on interpretable machine learning models for retinal exudates detection and other medical applications. Jiang [13] proposed a class activation mapping based interpretable deep learning model for diabetic retinopathy classification. Sayres [27] used an integrated gradient based interpretable model for understanding diabetic retinopathy grading with a deep learning approach. Several methods have been proposed by various researchers for screening exudates in colour retinal images. A number of methods (thresholding, region growing, morphology, machine learning) have been proposed for exudates detection. Mathematical morphology-based approaches have been applied by a large number of researchers for exudates segmentation [21]. Welfer [30] proposed a three-step process for the segmentation of exudates in the LUV colour space using a mathematical morphology technique; the shortcoming of the method was its high number of false positive cases. Wisaeng [32] used a morphology mean shift algorithm for exudates detection: coarse segmentation was first achieved using the mean shift algorithm and then mathematical morphology was applied to achieve fine segmentation. The limitation of the method was that it did not work well in the case of small exudates. Sopharak [29] proposed fuzzy c-means (FCM) and support vector machine (SVM) based techniques: the image was enhanced using the local contrast method, FCM was then used, and SVM was applied on the segmented exudates for better results.


The limitation of this method was that its success depended on the accurate removal of the optic disc (OD) and the blood vessels. Wisaeng [31] used fuzzy c-means clustering with mathematical morphology for exudate detection. Sidibe [28] performed bright lesion detection from retinal images, using sparse coding techniques for classification. The approach of Kusakunniran [15] used a multi-layer perceptron (MLP) to identify initial seeds, followed by an iterative graph cut approach for segmentation; the technique used both supervised and unsupervised learning for exudates segmentation. Javidi [11] proposed an approach based on morphological component analysis (MCA) with adaptive representations obtained from dictionary learning. Marin [20] used feature extraction with a supervised classification technique for exudates detection. Deep learning methods have recently been applied for exudates detection. Prentasic [25] used deep convolutional neural networks incorporating high-level anatomical knowledge for exudate detection. Feng's [6] approach was based on a fully convolutional neural network. The state-of-the-art methods discussed above have some limitations and challenges. The requirement of good quality images is one of the limitations. Some of the methods work with bright lesions only, and false positive cases are detected due to dull boundaries and colour. A drawback of machine learning methods is their black box nature; they also require high computing power and time for processing. Furthermore, the supervised learning approaches are highly dependent on training data for classification. Although the deep learning methods show promising results, the state-of-the-art conventional image analysis methods have reported higher sensitivity and accuracy than the deep learning methods. The main challenges for deep learning methods are the limited availability of labelled training data, the imbalanced representation of different classes and their inability to explain the results to the end-user. Keeping the above challenges and limitations in mind, we propose an interpretable exudates detection technique that is self-explainable for the decisions it makes, which will smoothen the retinal fundus examination process and assist ophthalmologists.

3 Proposed Methodology (SHAP-DFHE-MR Model)

The proposed system is based on a dynamic fuzzy histogram equalization technique (DFHE), mathematical morphology and a Shapley value based tree explainer, and is hence named the SHAP-DFHE-MR model. The schema of the SHAP-DFHE-MR model is presented in Fig. 1. The proposed methodology consists of: preprocessing, optic disc segmentation, exudate candidate detection, feature extraction, classification and SHAP tree explainer interpretation. The SHAP-DFHE-MR model computes the contribution of each feature extracted from the images towards the classification of the presence or the absence of exudates.


Fig. 1. Proposed Methodology (SHAP-DFHE-MR)

3.1 Preprocessing

The input images belong to different datasets and are taken with different camera types. The intensity of the images can vary. Hence, preprocessing of the acquired retinal images is important. The original image (Iorig ) is first transformed to the hue, saturation, intensity (HSI) colour space and then the intensity band is selected. HSI colour space is selected as the exudates are bright lesions with high contrast from the background. Median filter removes the noise from the I-band image. Then the dynamic fuzzy histogram equalization (DFHE) technique is applied to enhance the image to get the preprocessed image (Fig. 2(b)). The preprocessed image (Iprepro ) is then used as input for the next stage.

Dynamic Fuzzy Histogram Equalization (DFHE) Technique. The dynamic fuzzy histogram equalization (DFHE) is a methodology for enhancing the image contrast while preserving the image brightness [10,19]. It is a three-step process. In the first step, a fuzzy histogram computation is performed with a triangular fuzzy membership function to handle the inexactness of grey level values. In the second phase, the fuzzy histogram is divided into sub-histograms and dynamic histogram equalization is done for each sub-histogram. Finally, normalization of image brightness is performed. Assume an image I with grey levels in the range [0, L−1]. The concept of fuzzy sets is applied to deal with the imprecise grey values [12]. The classical concept of a histogram is extended to a fuzzy setting by considering grey values as fuzzy numbers. So let a grey value I(i, j) be considered as a fuzzy number Ĩ(i, j). A


fuzzy histogram is a sequence of real numbers k(i), where i ∈ {0, 1, ..., L−1} and k(i) is the frequency of occurrence of grey levels that are around i. Let μ_Ĩ(i,j) be a triangular fuzzy membership function; then the fuzzy histogram is computed as in Algorithm 1. The first and second derivatives of the fuzzy histogram are first calculated. Then the sub-histograms are obtained using the local maxima of the fuzzy histogram. The equalization of the sub-histograms thus obtained is performed by the dynamic histogram equalization technique given by Ibrahim et al. [10]. Normalization is then applied on the equalized image so that the mean brightness is the same as that of the original input image. This method gives better results because only the intensity band I is equalized, whereas in other methods the R, G and B planes are equalized separately.
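As an illustrative sketch only (not the authors' code), the fuzzy histogram with the triangular membership of Algorithm 1 can be computed as follows; 8-bit grey levels are assumed.

```python
# Sketch of the fuzzy histogram: each pixel contributes to nearby bins through
# the triangular membership max(0, 1 - |I(i, j) - i| / 4).
import numpy as np

def fuzzy_histogram(img, levels=256):
    """img: 2-D uint8 intensity band; returns fuzzy counts F(i), i = 0..levels-1."""
    values = img.astype(np.float64).ravel()
    hist = np.zeros(levels)
    for i in range(levels):
        membership = np.maximum(0.0, 1.0 - np.abs(values - i) / 4.0)
        hist[i] = membership.sum()          # fuzzy frequency of grey levels around i
    return hist
```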

3.2 Optic Disc Segmentation

The segmentation and elimination of the optic disc (OD) is an important step in exudates detection, as the OD has features (brightness and contrast) similar to those of exudates and can hence lead to false positive cases. The OD is the brightest and largest structure in the retinal image. A mathematical morphological closing operation is applied on the pre-processed image (Iprepro) with a flat disk structuring element of size 4. Thresholding then gives the optic disc region, and a masked image is formed by complementing the thresholded image. Next, a mathematical morphological dilation operation with a flat disk structuring element of size 4 is applied to expand the optic disc region, and dilation is done again with a flat disk structuring element of size 8 to achieve better results, giving the reconstructed image. The difference is then taken between the original pre-processed image (Iprepro) and the reconstructed image, and the Otsu threshold is applied; the Otsu algorithm automatically selects the image threshold value [23]. The candidate optic disc regions thus obtained are shown in Fig. 2(c). In this work, area is used as a shape feature to find the optic disc, the region with the largest area. All the connected components of the image in Fig. 2(c) are found using 8-pixel connectivity. A connected component is a cluster of pixels connected to each other through 8-pixel connectivity, which groups pixels if they are connected along their edges or corners. For each of the connected components, the area is calculated, and the region with the largest area is selected as the OD. The image obtained is then inverted and binarized to show the localized optic disc. The image (Iod) with the optic disc segmented is shown in Fig. 2(d).
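A minimal sketch of the largest-area selection step, assuming scikit-image (not the authors' implementation), is shown below.

```python
# Sketch of OD localisation: Otsu-threshold the candidate-region image and keep
# the 8-connected component with the largest area as the optic disc.
import numpy as np
from skimage.filters import threshold_otsu
from skimage.measure import label, regionprops

def largest_bright_region(candidate_img):
    binary = candidate_img > threshold_otsu(candidate_img)
    labels = label(binary, connectivity=2)          # connectivity=2 means 8-connectivity in 2-D
    regions = regionprops(labels)
    if not regions:
        return np.zeros_like(binary)
    biggest = max(regions, key=lambda r: r.area)    # OD = region with the largest area
    return labels == biggest.label
```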

3.3 Exudates Detection

The pre-processed image (Iprepro ) along with the OD segmented image (Iod ) is processed for the extraction of exudates. The high-contrast retinal blood vessels are first removed from the pre-processed image with a morphological closing operator and a disk-shaped structuring element of size 8, as in Fig. 3(a).


Algorithm 1: Pre-processing Algorithm
Input: Original Image (I_orig)
Output: Pre-processed Image (I_prepro)

foreach I_orig do
    /* convert RGB to HSI colour space */
    HSI_Img = RGB2HSI(I_orig)
    F_Img = MedianFilter(HSI_Img(:,:,3))
    /* DFHE technique */
    N ← total number of pixels in F_Img
    [0, L−1] ← grey level range in F_Img
    foreach i ∈ {0, 1, ..., L−1} do
        k(i) ← frequency of occurrence of grey levels
        F(i) ← fuzzy histogram
        [I_min, I_max] ← fuzzy histogram intensity range
        μ_Ĩ(i,j) ← triangular fuzzy membership function
        /* Fuzzy histogram creation */
        μ_Ĩ(i,j) = max(0, 1 − |I(i, j) − i| / 4)
        F(i) ← k(i) + Σ_i Σ_j μ_Ĩ(i,j)
        /* Histogram partitioning */
        F'(i) = dF(i)/di
        F''(i) = d²F(i)/di²
        local maxima points [m_0, ..., m_n] ∀ F'(i) = 0, F''(i) < 0
        sub-histograms = [I_min, m_0], ..., [m_{n+1}, I_max]
        Map sub-histograms
        I_equalized ← equalized image
    end
    /* Normalization */
    m1 ← mean brightness of I_orig
    m2 ← mean brightness of I_equalized
    I_prepro = (m1 / m2) · I_equalized
end

In this work, the local variance is used to obtain the mean standard deviation, which captures the local image characteristics of the exudate clusters, as in Fig. 3(b). A small sliding window is used to compute the local variance of the image; the window is moved over the entire image, one pixel at a time, and the mean of all the local variances computed is taken as the local variance of the image, as in Algorithm 2 [1,33]. The local variance image is then thresholded using the Otsu algorithm, and a flood fill operation is applied to obtain the candidate exudate regions. Thresholding removes all the regions with low local variation. The optic disc detected in the previous step (Iod) is dilated with a disk-shaped structuring element of size 15. This image is then subtracted from the image (Iflfill) obtained above to remove the optic disc, as per Eq. 1. The image (Icandex) thus obtained contains all the candidate exudate regions and is shown in Fig. 3(c). The resultant image is then used for final-level exudates detection.
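The sliding-window local variance can be computed efficiently with box filters; the sketch below is an illustration only, and the window size is an assumed parameter, not a value from the paper.

```python
# Sketch of the sliding-window local variance: var = E[x^2] - E[x]^2,
# computed with a uniform (box) filter over a small window.
import numpy as np
from scipy.ndimage import uniform_filter

def local_variance(img, size=7):
    img = img.astype(np.float64)
    mean = uniform_filter(img, size)              # local mean in each window
    mean_sq = uniform_filter(img * img, size)     # local mean of squares
    return np.clip(mean_sq - mean * mean, 0, None)
```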


Fig. 2. Optic disc extraction stage: (a) original image (Iorig), (b) DFHE enhanced (pre-processed) image, (c) candidate OD regions, (d) OD segmented image (Iod)

I_candex = I_flfill − dilate(I_od)    (1)

Morphological reconstruction is then applied on the image obtained. Morphological reconstruction is based on the dilation process and has some unique features: it uses the connectivity concept with two images, a marker and a mask. Essentially, repeated dilation of the marker image is done until its contour fits the mask image, and the final dilation gives the reconstructed image. The image Icandex is used to create a marker image, Imarker, as per Eq. 2:

I_marker = I_prepro · (1 − I_candex)    (2)

where · denotes the element-wise product.

This image, Imarker, is now morphologically reconstructed upon the pre-processed image, Iprepro. The resultant image, Ireconstrct, is shown in Fig. 3(d). The difference is taken between the images Iprepro and Ireconstrct, and a threshold is then applied on this difference image to get the final exudate detected image, Iex, as in Fig. 3(e). Finally, all the exudate pixels in Iex are surfaced on the original image (Iorig), as in Fig. 3(f).
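A minimal sketch of this marker/mask reconstruction with scikit-image is given below; the Otsu threshold on the difference image is an assumption, since the paper does not state how the final threshold is chosen.

```python
# Sketch of Eqs. (1)-(2) and the final difference step: suppress exudate
# candidates in the marker, reconstruct by dilation under the pre-processed
# mask, and threshold the difference to obtain the exudate map.
import numpy as np
from skimage.filters import threshold_otsu
from skimage.morphology import reconstruction

def exudate_map(i_prepro, i_candex):
    """i_prepro: float image in [0, 1]; i_candex: binary candidate-exudate mask."""
    marker = i_prepro * (1.0 - i_candex)                         # Eq. (2): marker <= mask
    reconstructed = reconstruction(marker, i_prepro, method="dilation")
    diff = i_prepro - reconstructed                              # exudates remain in the difference
    return diff > threshold_otsu(diff)                           # assumed final threshold (Otsu)
```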

3.4 Feature Extraction and Classification

Features are unique attributes of any image and the feature extraction process aims at pixel depiction using a feature vector. In this work, the automated exudate detection system requires features of healthy and diseased images to


Algorithm 2: Exudate Detection
Input: Pre-processed image (I_prepro)
Input: OD segmented image (I_od)
Output: Exudate detected image (I_ex)

foreach I_prepro, I_od do
    se2 ← structuring element of size 8
    i_nobv = (I_prepro ⊕ se2) ⊖ se2
    /* Local variance */
    foreach J ∈ i_nobv do
        J ← block of (M, N) in i_nobv
        μ ← mean of pixel intensities of J
        J_{m,n} ← pixel intensity at (m, n)
        μ = (1 / MN) Σ_{m=0}^{M−1} Σ_{n=0}^{N−1} J_{m,n}
        σ² = (1 / (MN − 1)) Σ_{m=0}^{M−1} Σ_{n=0}^{N−1} (J_{m,n} − μ)²
        i_localvar ← mean of all σ²
    end
    i_otsuth ← Otsu threshold on i_localvar
    I_flfill ← flood-fill on i_otsuth
    se3 ← structuring element of size 15
    i_dilate = (I_od ⊕ se3)
    I_candex = (I_flfill − i_dilate)   /* Remove optic disc */
    I_marker ← marker image of I_candex
    I_reconstrct ← morphological reconstruction on I_marker and I_prepro
    I_ex ← exudate detected image
    I_ex = threshold(I_prepro − I_reconstrct)
end

distinguish between them. We have extracted both morphological and texture features. The morphological features give the shape and size of the extracted exudates region. The texture features give the information about the variation in the pixel intensities of the extracted exudates region. The features extracted are area and perimeter of the candidate exudate region, number of exudate candidates, GLCM features, 1st derivative and 2nd derivative of Gaussian filter with 2D Gaussian kernel, average pixel intensity of the input image and average contrast of the image. The Grey Level Co-occurrence Matrix (GLCM) method is used for capturing second-order statistical texture features of an image. Haralick has used GLCM to define 14 measures for extracting texture information [8]. In this work, 5 features, namely, contrast, entropy, homogeneity, energy (angular second moment) and correlation are considered. A feature vector is created with eighteen features to distinguish between exudate (class 1) and non-exudate/healthy images (class 0). The XGBoost classifier is used for the classification. The extreme gradient boosting (XGBoost) classifier is a decision tree based ensemble classifier and uses gradient boosting framework [2]. The extracted feature vector with eighteen features is given as input to the


XGBoost classifier. The image with the presence of exudates is classified as class 1 and the healthy image with no exudates is classified as class 0 by the XGBoost classifier.
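The following sketch illustrates the GLCM texture-feature computation and the XGBoost classification step; the exact feature set, GLCM parameters and classifier hyper-parameters used in the paper are not reproduced here, so all parameter values are illustrative assumptions.

```python
# GLCM features via scikit-image plus an XGBoost classifier (illustrative sketch).
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from xgboost import XGBClassifier

def glcm_features(gray_u8):
    """Contrast, homogeneity, energy, correlation and entropy of the GLCM."""
    glcm = graycomatrix(gray_u8, distances=[1], angles=[0], levels=256,
                        symmetric=True, normed=True)
    feats = [graycoprops(glcm, p)[0, 0]
             for p in ('contrast', 'homogeneity', 'energy', 'correlation')]
    p = glcm[:, :, 0, 0]
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    return feats + [entropy]

# X: (n_images, 18) feature matrix, y: 0 = healthy, 1 = exudate present (assumed).
def train_classifier(X, y):
    clf = XGBClassifier(n_estimators=200, max_depth=4, eval_metric='logloss')
    clf.fit(X, y)
    return clf
```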

3.5 SHAP Tree Explainer Interpretation

The Shapley Additive explanations (SHAP) is a game theory based approach which can explain the output of any machine learning model. Each feature is assumed as a player in a game and the prediction is the pay-out. The SHAP approach then explains the prediction by how fairly the pay-out is distributed among the features. Different features have different contributions (magnitude and sign) to the model’s output. A fair reward to each feature can be assigned by computing Shapley values for the features [17]. The Shapley values show the importance of each feature in making the prediction.

Fig. 3. Exudate segmentation: (a) image without high-contrast blood vessels; (b) local variance image; (c) candidate region image (Icandex); (d) morphologically reconstructed image (Ireconstrct); (e) exudate segmented image (Iex); (f) result superimposed on the original image

The SHAP tree explainer calculates Shapley values for tree based machine learning models. It is a very fast, model-specific approach. The fast speed of the model helps in the interpretation of feature importance, feature dependence, and feature interactions. The Shapley values also help to plot SHAP summary and


dependence plots, which support in the understanding of tree based machine learning models [18]. In this work, the SHAP tree explainer is used to explain the classification outcome of the XGBoost classifier. For each input image, the XGBoost classifier and the features extracted are given as input to the SHAP tree explainer. The SHAP tree explainer calculates a Shapley value for each feature of each input image. For a particular feature, the average of the Shapley values found for all the images is computed. Thus, 18 average Shapley values corresponding to the 18 features are computed. These average values are used to know how each feature contributes to the classification. The higher the Shapley value, the more significant is the contribution of the feature.
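A minimal sketch of this SHAP tree-explainer step is given below; here 'clf' is the trained XGBoost model and 'X' the (n_images, 18) feature matrix from the earlier sketch, both of which are assumed names, and the average feature importance is computed as the mean magnitude of the Shapley values.

```python
# SHAP TreeExplainer applied to the trained tree-ensemble classifier.
import numpy as np
import shap

explainer = shap.TreeExplainer(clf)
shap_values = explainer.shap_values(X)            # one Shapley value per feature per image
mean_abs_shap = np.abs(shap_values).mean(axis=0)  # average contribution of each feature
ranking = np.argsort(mean_abs_shap)[::-1]         # most influential features first

shap.summary_plot(shap_values, X)                 # summary (beeswarm) plot of all images
```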

4 Datasets

The proposed work is tested on four datasets. The online datasets used are DIARETDB1, e-optha EX, MESSIDOR and IDRID databases. All these available datasets have ground truth annotations of exudates. The DIARETDB1 database [14] consists of 89 colour fundus images, with 84 images having mild non-proliferative signs of diabetic retinopathy, and 5 normal images [14]. The IDRID database contains 516 eye fundus colour images with 4288 × 2848 pixels resolution [24]. The e-optha EX database has 82 retinal fundus images, with 47 pathological and remaining healthy retinal images [3]. The MESSIDOR database contains 1200 eye fundus colour images with a 45-degree field of view and 1440 × 960, 2240 × 1488 or 2304 × 1536 pixels resolution [4].

5 Results and Discussion

The performance of the exudate segmentation and the classification model is evaluated using the parameters sensitivity, specificity and accuracy. These performance parameters are obtained by pixel-to-pixel comparison between the automatically segmented image and the reference ground truth [14].

5.1 Performance Evaluation of Segmentation of Exudates

A lesion is considered a cluster of exudates. At the lesion level, the proposed method shows a sensitivity, specificity and accuracy of 94.76%, 99.42% and 98.61% for DiaretDB1 database, 95.79%, 99.39% and 98.99% for IDRID database and 88.28%, 99.32%, 98.66% for e-optha EX database respectively. Table 1 shows that our method has achieved high performance when compared to other methods. The proposed method is evaluated on a large set of databases as compared to the existing work. The tiny exudates have high variability in shape and contrast and hence are difficult to detect. This has caused some false negative cases in our work.

5.2 Performance Evaluation of Classification

The classification of the presence or absence of exudates in an image is very useful for diabetic retinopathy detection. The feature vector with eighteen features is classified using the XGBoost classifier. A sensitivity, specificity and accuracy of 96.76%, 94.02% and 94.14%, respectively, are achieved. The area under the receiver operating characteristic curve (AUC-ROC) is also plotted; the ROC is a probability curve and the AUC measures the classifier's ability to separate the classes. Figure 4 shows the AUC-ROC curve for the XGBoost classifier. In the literature, very limited work is available for evaluation at the image level. The results clearly indicate the high performance of the proposed method at the image level and show its ability to differentiate images based on the presence or absence of exudates.
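A short sketch of how such an AUC-ROC curve can be produced with scikit-learn follows; 'clf', 'X_test' and 'y_test' are assumed names for the trained classifier and a held-out evaluation split.

```python
# ROC curve and AUC for the binary exudate/no-exudate classifier (illustrative).
from sklearn.metrics import roc_curve, roc_auc_score
import matplotlib.pyplot as plt

scores = clf.predict_proba(X_test)[:, 1]          # probability of class 1 (exudates)
fpr, tpr, _ = roc_curve(y_test, scores)
auc = roc_auc_score(y_test, scores)

plt.plot(fpr, tpr, label=f'XGBoost (AUC = {auc:.2f})')
plt.plot([0, 1], [0, 1], '--', color='grey')      # chance level
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.legend()
plt.show()
```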

5.3 Analysis of Explainability

The SHAP tree explainer is used to interpret the output of the XGBoost classifier. Figure 5(a) shows the SHAP feature importance plot for the XGBoost classifier trained for exudates classification. This plot shows the average Shapley value for each feature (for all the 18 features) calculated across the dataset. Features which have larger Shapley values are important. It can be seen that the

Table 1. Comparative performance of the proposed method at lesion level

Authors | Method used | Database | Sensitivity% | Specificity% | Accuracy%
Welfer [30] | Mathematical morphology | DiaretDB1 | 70.40 | 98.84 | -
Harangi et al. [9] | Morphology and active contour based technique | DiaretDB1 | 75 | - | -
Wisaeng [32] | Morphology mean shift algorithm | DiaretDB1 | 97.05 | 97.18 | 97.14
Fraz [7] | Contextual cues and ensemble classification | DiaretDB1 / e-ophtha EX | 92.42 / 81.20 | 81.25 / 94.60 | 87.72 / 89.25
Kusakunniran [15] | MLP and iterative graph cut method | DiaretDB1 / e-ophtha EX | 89.10 / 56.4 | - | -
Javidi [11] | Morphological component analysis | e-ophtha EX | 80.51 | 99.84 | -
Feng [6] | Fully CNN | Private dataset | 81.35 | 98.76 | -
Proposed Method | SHAP-DFHE-MR | DiaretDB1 / e-ophtha EX / IDRID | 94.76 / 88.28 / 95.79 | 99.42 / 99.32 / 99.39 | 98.61 / 98.66 / 98.99


Fig. 4. AUC-ROC Curve

feature ’number of exudate candidates’ has the largest Shapley value and hence is the most important feature that contributes to the classification. Features with lower Shapley values can hence be removed. As can be seen in Fig. 5(a), the two features (second-order derivative Gaussian features for σ = 2.0) have low Shapley values and hence, do not contribute to the decision making. Hence, we have removed these two features and trained the classifier again with a feature vector of 16 features. After the removal of these two features, the classification accuracy of the model increased from 94.14% to 96.83%. The summary plot in Fig. 5(b) shows the feature importance after the removal of the two unimportant features. In the summary plot, the Shapley values of all the images are plotted for each feature. The X-axis represents the Shapley value and the Y-axis represents the feature. The overlapping points show the density (more number of images having about the same Shapley value) and the colour shows the feature value. As can be seen in the Fig. 5(b), the features are arranged based on their importance. The SHAP explanation force plots for two images (one with exudates and the other without exudates) are shown in Figs. 5(c, d). The base value is the average model output for the images in the training dataset. The model output is taken as 1 for images with exudates and 0 for images without exudates. From the figures, it can be seen that the features shown in red force the prediction higher and the features that are shown in blue force the prediction lower than the base value. The size of the arrow gives the magnitude of that particular feature’s effect. The force plot yields a constructive summary for the prediction. In Fig. 5(e) the XGBoost classifier has incorrectly classified the image as class 0 (no exudate present), the SHAP explanation force plot drawn for this image shows features that are responsible for shifting the output towards incorrect classification. Thus, the SHAP analysis helps in understanding the model errors with reasons for the incorrect results. It analyses the features which make strong positive and negative contributions.


Fig. 5. SHAP plots (a) SHAP feature importance (18 features) (b) SHAP summary (16 features) (c) SHAP explanation of image with exudates (class1) (d) SHAP explanation of image without exudates (class 0) (e) SHAP explanation of image with error in classification

6 Conclusion

This research paper proposes a Shapley value based interpretable machine learning model for segmentation of exudates from retinal images. The proposed model is able to segment exudates from the retinal images using dynamic fuzzy histogram equalization technique and mathematical morphology operations with high accuracy. A feature vector with eighteen features is extracted and is given as input to the XGBoost classifier to classify the presence or absence of exudates in images. Then the SHAP tree explainer is used to interpret the classification made by XGBoost classifier. The SHAP tree explainer clarifies the black box nature of the classifier used. We found that our model provided meaningful explanations of the classification made for the two classes and the features which are most prominent for making correct classification. In medical imaging applications, such insights are of vital importance. Thus, the SHAP approach can be used for the analysis of machine learning models for medical applications and can assist medical experts by making the model interpretable, understandable and trustable.


References 1. Bocher, P.K., McCloy, K.R.: The fundamentals of average local variance-part i: detecting regular patterns. IEEE Trans. Image Process. 15(2), 300–310 (2006) 2. Chen, T., Guestrin, C.: Xgboost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794 (2016) 3. Decenci`ere, E., et al.: Teleophta: machine learning and image processing methods for teleophthalmology. Irbm 34(2), 196–203 (2013) 4. Decenci`ere, E., Zhang, X., Cazuguel, G., Lay, B., Cochener, B., Trone, C., Gain, P., Ordonez, R., Massin, P., Erginay, A., et al.: Feedback on a publicly distributed image database: the messidor database. Image Anal. Stereol. 33(3), 231–234 (2014) 5. Du, M., Liu, N., Hu, X.: Techniques for interpretable machine learning. Commun. ACM 63(1), 68–77 (2019) 6. Feng, Z., Yang, J., Yao, L., Qiao, Y., Yu, Q., Xu, X.: Deep retinal image segmentation: a fcn-based architecture with short and long skip connections for retinal image segmentation. In: International Conference on Neural Information Processing, pp. 713–722. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-70093-9 76 7. Fraz, M.M., Jahangir, W., Zahid, S., Hamayun, M.M., Barman, S.A.: Multiscale segmentation of exudates in retinal images using contextual cues and ensemble classification. Biomed. Signal Process. Control 35, 50–62 (2017) 8. Haralick, R.M., Shanmugam, K., et al.: Textural features for image classification. IEEE Trans. Syst. Man Cybern. 6, 610–621 (1973) 9. Harangi, B., Hajdu, A.: Automatic exudate detection by fusing multiple active contours and regionwise classification. Comput. Biol. Med. 54, 156–171 (2014) 10. Ibrahim, H., Kong, N.S.P.: Brightness preserving dynamic histogram equalization for image contrast enhancement. IEEE Trans. Consum. Electron. 53(4), 1752–1758 (2007) 11. Javidi, M., Harati, A., Pourreza, H.: Retinal image assessment using bi-level adaptive morphological component analysis. Artif. Intell. Med. 99, 101702 (2019) 12. Jawahar, C., Ray, A.: Incorporation of gray-level imprecision in representation and processing of digital images. Pattern Recogn. Lett. 17(5), 541–546 (1996) 13. Jiang, H., Yang, K., Gao, M., Zhang, D., Ma, H., Qian, W.: An interpretable ensemble deep learning model for diabetic retinopathy disease classification. In: 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 2045–2048. IEEE (2019) 14. Kauppi, T., Kalesnykiene, V., Kamarainen, J.K., Lensu, L., Sorri, I., Raninen, A., Voutilainen, R., Uusitalo, H., K¨ alvi¨ ainen, H., Pietil¨ a, J.: The diaretdb1 diabetic retinopathy database and evaluation protocol. BMVC 1, 1–10 (2007) 15. Kusakunniran, W., Wu, Q., Ritthipravat, P., Zhang, J.: Hard exudates segmentation based on learned initial seeds and iterative graph cut. Comput. Methods Programs Biomed. 158, 173–183 (2018) 16. Lakkaraju, H., Kamar, E., Caruana, R., Leskovec, J.: Faithful and customizable explanations of black box models. In: Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, pp. 131–138 (2019) 17. Lundberg, S.M., Lee, S.I.: A unified approach to interpreting model predictions. In: Advances in Neural Information Processing Systems, pp. 4765–4774 (2017) 18. Lundberg, S.M., Erion, G.G., Lee, S.I.: Consistent individualized feature attribution for tree ensembles. arXiv preprint arXiv:1802.03888 (2018)


19. Magudeeswaran, V., Ravichandran, C.: Fuzzy logic-based histogram equalization for image contrast enhancement. Mathematical Problems in Engineering 2013 (2013) 20. Marin, D., Gegundez-Arias, M.E., Ponte, B., Alvarez, F., Garrido, J., Ortega, C., Vasallo, M.J., Bravo, J.M.: An exudate detection method for diagnosis risk of diabetic macular edema in retinal images using feature-based and supervised classification. Med. Biological Eng. Comput. 56(8), 1379–1390 (2018). https:// doi.org/10.1007/s11517-017-1771-2 21. Mittal, K., Mary Anita Rajam, V.: Computerized retinal image analysis-a survey. Multimedia tools and Applications (2020) 22. Molnar, C.: Interpretable machine learning. Lulu com (2019) 23. Otsu, N.: A threshold selection method from gray-level histograms [j]. Automatica 11(285–296), 23–27 (1975) 24. Porwal, P., Pachade, S., Kamble, R., Kokare, M., Deshmukh, G., Sahasrabuddhe, V., Meriaudeau, F.: Indian diabetic retinopathy image dataset (IDRiD): a database for diabetic retinopathy screening research. Data 3(3), 25 (2018) 25. Prentaˇsi´c, P., Lonˇcari´c, S.: Detection of exudates in fundus photographs using deep neural networks and anatomical landmark detection fusion. Comput. Methods Programs Biomed. 137, 281–292 (2016) 26. Ribeiro, M.T., Singh, S., Guestrin, C.: Model-agnostic interpretability of machine learning. arXiv preprint arXiv:1606.05386 (2016) 27. Sayres, R., Taly, A., Rahimy, E., Blumer, K., Coz, D., Hammel, N., Krause, J., Narayanaswamy, A., Rastegar, Z., Wu, D., et al.: Using a deep learning algorithm and integrated gradients explanation to assist grading for diabetic retinopathy. Ophthalmology 126(4), 552–564 (2019) 28. Sidib´e, D., Sadek, I., M´eriaudeau, F.: Discrimination of retinal images containing bright lesions using sparse coded features and SVM. Comput. Biol. Med. 62, 175– 184 (2015) 29. Sopharak, A., Uyyanonvara, B., Barman, S.: Automatic exudate detection from non-dilated diabetic retinopathy retinal images using fuzzy c-means clustering. Sensors 9(3), 2148–2161 (2009) 30. Welfer, D., Scharcanski, J., Marinho, D.R.: A coarse-to-fine strategy for automatically detecting exudates in color eye fundus images. Computerized Med. Imaging Graph. 34(3), 228–235 (2010) 31. Wisaeng, K., Sa-Ngiamvibool, W.: Improved fuzzy c-means clustering in the process of exudates detection using mathematical morphology. Soft. Comput. 22(8), 2753–2764 (2018) 32. Wisaeng, K., Sa-Ngiamvibool, W.: Exudates detection using morphology mean shift algorithm in retinal images. IEEE Access 7, 11946–11958 (2019) 33. Yang, J., Zhu, G., Shi, Y.Q.: Analyzing the effect of jpeg compression on local variance of image intensity. IEEE Trans. Image Process. 25(6), 2647–2656 (2016)

False Positive Reduction in Mammographic Mass Detection

S. Shrinithi, R. Lavanya(B), and Devi Vijayan

Amrita School of Engineering, Amrita Vishwa Vidyapeetham University, Coimbatore, India
[email protected]

Abstract. Breast cancer is a deadly disease affecting women globally. The survival rate rises when the presence of a mass is identified early through a mammogram. However, masses are obscured in dense breast tissue, which limits the sensitivity of mammography. A computer aided diagnosis (CAD) system helps in overcoming this sensitivity issue, but in turn the system is prone to many false positives. The proposed work develops automated density-specific models for false positive reduction, and a feature-classifier combination that performs the false positive reduction efficiently in each expert model has been identified. Normal and abnormal mammograms of all four density types from the Image Retrieval in Medical Applications (IRMA) version of the Digital Database for Screening Mammography (DDSM) database have been employed in this work, resulting in classification accuracies of 96%, 80%, 76% and 88% and false positive rates (FPR) of 0, 0.26, 0.25 and 0.12 for the four density-specific mass detection models respectively. Keywords: Breast cancer · Breast density · False positive reduction · Mammogram · Mass · Mass detection

1 Introduction

Breast cancer is a deadly disease that affects women universally. It is estimated that, annually, about 508,000 women die from breast cancer [1]. Mammography is utilized for detecting breast cancer at an early stage [2]. The Breast Imaging Reporting and Data System (BI-RADS) classifies breast density into four categories: almost entirely fatty, scattered areas of dense tissue, heterogeneously dense and extremely dense. The sensitivity of mass detection decreases as the density increases [3, 4]. The dense areas and masses present in the breast tissue tend to appear bright in a mammogram, resulting in an undifferentiable appearance, whereas the non-dense areas appear dark as the X-rays traverse through them [5, 6]. Computer Aided Diagnosis (CAD) systems are built for automating mass detection in breast tissue. A CAD system incorporates the steps of preprocessing followed by segmentation, extraction of features, feature selection and classification. Generally, a CAD system helps in overcoming the lower sensitivity issue


occurring in the process of mammography. The segmentation part of the CAD system tends to identify all the suspicious regions as masses increasing the sensitivity as well as the number of false positives. This condition generally occurs in the case of dense breasts where the mass as well as the dense breast tissue appears bright making it difficult to conclude the breast tissue as abnormal or normal. So, false positive reduction is performed following the mass detection stage to precisely classify the breast tissue as normal or abnormal [7]. The proposed work mainly focuses on the development of automated density-specific mass detection models that incorporates the false positive reduction stage. Four separate models have been developed for addressing each breast density category to avoid misinterpretations as it is tedious for a single model to adapt according to each breast density type and provide the result. Features extracted using various detector and descriptor combinations have been employed and a comparative analysis has been done to identify the significant feature detector and descriptor for each model. A comparative analysis of the density-specific mass detection models and a non-density specific mass detection model has also been performed to highlight the significance of density-specific models. The paper is arranged in this manner; Sect. 2 comprising the Literature Survey, Sect. 3 dealing with the Proposed Methodology, Sect. 4 including the Results and Discussion and Sect. 5 dealing with the Conclusion.

2 Literature Survey The various research works that dealt with false positive reduction in mammographic mass detection have been discussed. Several image processing, deep learning and machine learning techniques had been incorporated in the works. Sami Dhahbi et al. [8], dealt with the reduction of false positives using mammographic texture analysis in computer-aided mass detection using the region of interests (ROIs) from the DDSM database. Kolmogorov–Smirnov distance, maximum subregion descriptors, Hilbert’s image representation, fractal texture analysis and gray level cooccurrence matrix (GLCM) had been employed for extracting the features and random forest classifier had been employed for classification. Khaoula Belhaj Soulami et al. [9], dealt with the classification of mammogram masses as abnormal or normal and normal, malignant or benign using the deep learning capsule network. Density-specific mass detection results had also been analysed. Romesh Laishram et al. [10], dealt with the diagnosis and detection of masses in mammograms from the DDSM and miniMammographic Image Analysis Society (MIAS) database. False positive reduction had been done by extracting the multi-gradient local quinary pattern (M-GQP) features and then classifying it as normal or abnormal and malignant or benign. Jayasree Chakraborty et al. [11], dealt with the automatic diagnosis and detection of a mammographic mass from DDSM. GLCM features had been extracted from the suspicious ROIs and were classified as normal or abnormal. Multi-resolution features had also been extracted from the ROIs and were classified as benign or malignant. João Otávio Bandeira Diniz et al. [12] dealt with the implementation of density-adapted mass detection models employing the DDSM database. The density classifier model and the density-adapted mass detection models employed the convolutional neural network (CNN) classifier. Yanfeng


Li et al. [13] dealt with the detection of masses in mammograms by employing the CNN classifier and bilateral analysis. Mammograms from the InBreast database had been employed and mass detection had been performed using the Siamese-Faster-RCNN network. From the literature survey it can be inferred that the development of density-specific models for false positive reduction in mammograms has not been explored much. The incorporation of all four breast density categories together with machine learning techniques to develop density-specific models has also not been carried out.

3 Proposed Methodology

The proposed work's block diagram is shown in Fig. 1. Based on breast density type in accordance with the BI-RADS density assessment categories, four density-specific mass detection models have been developed. Each model incorporates the following phases; pre-processing, segmentation, extraction of features, selection of features and classification. The abnormal and normal mammograms undergo pre-processing and the suspicious regions are segmented as region of interests (ROIs). Various features using different detector and descriptor combinations are then extracted from the ROIs. Feature selection has been done to select the significant features that help in an efficient classification. Each density-specific mass detection model performs a binary classification and various machine learning classifiers are used to classify the significant feature set of each ROI as abnormal or normal resulting in false positive reduction. A comparison of the various features and classifiers used is done and in accordance with the performance metrics obtained, the classifier and the feature set that performs the false positive reduction in an efficient manner has been identified for each density-specific mass detection model.

3.1 Dataset Description

The mammogram dataset utilized in this work is acquired from the Image Retrieval in Medical Applications (IRMA) version of the Digital Database for Screening Mammography (DDSM) database. The mammograms are in the PNG format and are grayscale images. The dataset consists of abnormal and normal mammograms of both MLO and CC views. This work has employed 624 normal mammograms and 1098 abnormal mammograms from 197 patients and 307 patients, respectively. The number of segmented ROIs employed by the four density-specific mass detection models is listed in Table 1.

3.2 Data Pre-processing

Data pre-processing enhances the quality of the mammograms for better findings. The connected component labelling method has been employed to perform the label removal as this method aids in the removal of the desired connected component. Here, the desired component to be removed is the label. Following the previous step, pectoral muscle visible in mediolateral oblique (MLO) view mammograms has been removed using the


Fig. 1. Block diagram of the proposed methodology.

Table 1. Normal and abnormal ROIs of each breast density category.

Density category | Normal ROIs | Abnormal ROIs | Total
1 | 194 | 224 | 418
2 | 652 | 682 | 1334
3 | 620 | 631 | 1251
4 | 293 | 263 | 556


convex hull method [14]. This muscle is removed precisely from the whole breast area in the mammogram. Due to low quantum counts during the X-ray imaging process, mammograms are prone to quantum noise [15]. A Wiener filter with a window size of 5 × 5 has been employed to eliminate the noise in both the MLO and craniocaudal (CC) view mammograms. Contrast limited adaptive histogram equalization (CLAHE) is employed for performing contrast enhancement on the denoised images, as this method enhances even minute details to a great extent, which aids further analysis.

Segmentation of suspicious regions has been performed using the k-means clustering method. Suspicious regions refer to the dense breast areas that tend to appear like masses in the breast tissue. The k-means method clusters the image on the basis of the intensity level; the 'k' value decides the number of clusters, and each cluster contains a group of pixels with a specific intensity level. From the obtained clusters, connected components containing the brightest pixels are extracted as the suspicious regions, as these regions are dense with high intensity levels. The segmentation algorithm is designed with high sensitivity, using a fixed, larger value of 'k' to segment all the suspicious regions irrespective of whether they are normal dense breast tissue, which results in a large number of false positives.

3.3 False Positive Reduction in Density-Specific Mass Detection Model

The false positive reduction phase includes the extraction of features from the ROIs that were suspected as masses, followed by the classification of each extracted ROI as abnormal or normal. Various features have been extracted from the ROIs employing several detector and descriptor combinations that include KAZE, scale invariant feature transform (SIFT), speeded-up robust features (SURF), binary robust independent elementary features (BRIEF), oriented fast and rotated brief (ORB) and binary robust invariant scalable keypoints (BRISK). A detector detects the most prominent feature points in an abnormal or normal image and a descriptor describes the detected feature points and forms the feature vector. The dimension of such a feature vector is 1 × 500. Local binary pattern (LBP), gray level co-occurrence matrix (GLCM), histogram statistical and Gabor features are also extracted, with vectors of dimension 1 × 59, 1 × 5, 1 × 5 and 1 × 2 respectively. A total of 14 features have been utilized and a comparative analysis is done on the basis of their performance for each density-specific mass detection model. The various detector and descriptor combinations [16] employed for extracting the features, apart from the LBP, GLCM, Gabor and histogram statistical features, are listed in Table 2.

The feature selection process involves the selection of the prominent features from the entire feature set using analysis of variance (ANOVA). The method works on the basis of the F-scores assigned to each feature [17]. A comparison of all the features is done and the feature performing the classification of abnormal and normal breast tissue most efficiently has been identified. Classifiers such as support vector machine (SVM), K-nearest neighbours (KNN), random forest, decision tree, naïve Bayes, logistic regression, extra trees classifier, adaptive boosting, gradient boosting and multi-layer perceptron (MLP) have been utilized for performing the classification task in all four mass detection models, resulting in false positive reduction.
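As an illustration of how a fixed-length ROI descriptor can be built from a detector/descriptor pair, a minimal OpenCV sketch is given below; aggregating the descriptors of the strongest keypoints and padding to a fixed length is an assumption made for illustration, not necessarily the authors' exact 1 × 500 construction.

```python
# Fixed-length keypoint feature vector from a KAZE detector + KAZE descriptor.
import cv2
import numpy as np

def kaze_feature_vector(roi_gray, length=500):
    kaze = cv2.KAZE_create()
    keypoints, descriptors = kaze.detectAndCompute(roi_gray, None)
    if descriptors is None:                       # no keypoints found in the ROI
        return np.zeros(length, dtype=np.float32)
    # Keep descriptors of the strongest keypoints first.
    order = np.argsort([-kp.response for kp in keypoints])
    flat = descriptors[order].ravel()
    vec = np.zeros(length, dtype=np.float32)      # pad or truncate to fixed length
    vec[:min(length, flat.size)] = flat[:length]
    return vec
```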
The selected features undergo min-max normalization prior to classification. K-fold cross validation is done and the

Table 2. Various detector and descriptor combinations.

Detector | Descriptor | Detector | Descriptor
KAZE | KAZE | SIFT | SIFT
KAZE | SURF | SIFT | KAZE
KAZE | SIFT | BRIEF | SURF
SURF | SURF | BRISK | SURF
SURF | KAZE | ORB | ORB

hyperparameter tuning is also done to acquire the efficient parameters to perform classification. A comparison between the classifiers used is done. The classifier performing an efficient classification is identified and this identification is in accordance with the metrics of each classifier’s performance. The performance metric here denotes the classification accuracy of each classifier.
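The classification stage of each expert model can be sketched as follows using scikit-learn; here X and y denote the ROI feature vectors and their normal/abnormal labels, and the number of selected features, the SVM hyper-parameter grid and the number of folds are assumptions rather than the exact values used in this work.

```python
# ANOVA feature selection, min-max normalisation, SVM and k-fold cross validation.
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV, StratifiedKFold

pipeline = Pipeline([
    ('select', SelectKBest(score_func=f_classif, k=100)),  # ANOVA F-score selection
    ('scale', MinMaxScaler()),                              # min-max normalisation
    ('svm', SVC(probability=True)),
])

param_grid = {'svm__C': [0.1, 1, 10], 'svm__kernel': ['rbf', 'linear']}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, cv=cv, scoring='accuracy')
search.fit(X, y)                                            # X: ROI features, y: 0/1 labels
print(search.best_params_, search.best_score_)
```

Placing the selector and scaler inside the pipeline keeps the feature selection and normalisation inside each cross-validation fold, which avoids leaking information from the validation folds.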

4 Results and Discussion

4.1 Pre-processing and Segmentation

The pre-processing steps performed on the abnormal and normal mammograms of both MLO and CC views are removal of the label, removal of the pectoral muscle, removal of noise and contrast enhancement. In the segmentation phase of each model, suspicious regions from the abnormal and normal pre-processed images have been segmented. The extracted suspicious regions are the ROIs that are utilized by the subsequent phases of the mass detection models. The results of the pre-processing and segmentation phase for CC and MLO view mammograms are displayed in Fig. 2 (a–h) and Fig. 3 (a–i) respectively.

4.2 False Positive Reduction in Density-Specific Mass Detection Model

A comparison of the features used in the density-specific mass detection models 1, 2, 3 and 4 is depicted in Table 3. A comparison of the classifiers used has also been done for all four models to analyze the performance of each model; this has been done by feeding the classifiers with the feature that produced the highest classification accuracy. The comparative analysis results of the classifiers for models 1, 2, 3 and 4 are depicted in Table 4. From the results obtained it can be inferred that the KAZE, KAZE_SIFT, SIFT_KAZE and KAZE features extracted from the normal and abnormal ROIs of density categories 1, 2, 3 and 4 respectively have been classified most efficiently by the SVM classifier, resulting in classification accuracies of 96%, 80%, 76% and 88% respectively. For each classifier, the decision making is based on the class with the highest probabilistic value, given by Eq. (1).

$\max_i \{P(W_i \mid X_j, C_i)\}$   (1)


Fig. 2. (a) Abnormal original image (b) Label removed image (c) Noise removed image (d) Contrast enhanced image (e) Segmented image (f) Suspicious regions image (g) A suspicious region marked as an ROI image (h) Extracted ROI image.

where $P(W_i \mid X_j, C_i)$ is the probability obtained by the classifier $C_i$ for instance $X_j$ and $W_i$ is the predicted class. Among the classifiers, the one with the highest classification accuracy is chosen, as given by Eq. (2).

$\max_i \{\mathrm{Accuracy}(C_i)\}$   (2)

where, C i represents the classifier. Various performance metrics like confusion matrix, F1-score, precision, recall and receiver operating characteristic curve (ROC) have also been obtained to analyze the performance of the density-specific mass detection models. Tables 5, 6, 7 and 8 depicts the confusion matrix of the density category 1, 2, 3 and 4 mass detection models respectively. At the end of the classification, 39 out of 39 normal images and 42 out of 45 abnormal images belonging to the density category 1, 96 out of 130 normal images and 117 out of 136 abnormal images belonging to the density category 2, 93 out of 124 normal images and 97 out of 126 abnormal images belonging to the density category 3 and 52 out of 59 normal images and 46 out of 53 abnormal


Fig. 3. (a) Normal original image (b) Label removed image (c) Pectoral muscle removed image (d) Noise removed image (e) Contrast enhanced image (f) Segmented image (g) Suspicious regions image (h) A suspicious region marked as an ROI image (i) Extracted ROI image.

images belonging to the density category 4 have been classified correctly, resulting in an overall classification accuracy of 96%, 80%, 76% and 88% respectively. Various other performance metrics obtained for the four mass detection models are depicted in the Table 9. Figure 4, 5, 6 and 7 depicts the ROC curve and an area under curve (AUC) value of 1.00, 0.88, 0.80 and 0.96 has been obtained for each density-specific mass detection model respectively. In this work, four density-specific mass detection models have been developed as it is difficult for a single classifier model to learn the tissue characteristics of all the density types and this also affects the mass detection performance. A comparative analysis between the mass detection results obtained for the four density-specific mass detection models and the mass detection model that is not density-specific has been performed. A single non-density specific mass detection model has been developed and the classification has been performed on the same set of normal and abnormal ROIs that had been used for the development of the four density-specific mass detection models. About 1759 normal ROIs and 1800 abnormal ROIs irrespective of their density types have been used by the non-density specific mass detection model. At the end of the classification phase, KAZE features extracted from the normal and abnormal ROIs have been efficiently classified using the SVM classifier and this results in an overall classification accuracy of

Table 3. Comparative analysis of the features.

Feature (Detector_Descriptor) | Density_1 Model Accuracy | Density_2 Model Accuracy | Density_3 Model Accuracy | Density_4 Model Accuracy
KAZE | 96% | 77% | 72% | 88%
KAZE_SURF | 95% | 68% | 64% | 78%
SIFT | 94% | 75% | 70% | 73%
SIFT_KAZE | 93% | 77% | 76% | 85%
KAZE_SIFT | 92% | 80% | 75% | 76%
SURF_KAZE | 90% | 79% | 75% | 87%
SURF | 89% | 68% | 62% | 77%
BRISK_SURF | 85% | 63% | 62% | 73%
BRIEF_SURF | 82% | 64% | 60% | 73%
LBP | 80% | 72% | 67% | 70%
ORB | 79% | 67% | 66% | 75%
GLCM | 71% | 63% | 65% | 71%
Gabor | 63% | 54% | 52% | 57%
Histogram Statistical | 55% | 55% | 64% | 68%

Table 4. Comparative analysis of the classifiers.

Classifier | Density_1 Model Accuracy | Density_2 Model Accuracy | Density_3 Model Accuracy | Density_4 Model Accuracy
SVM | 96% | 80% | 76% | 88%
Gradient boosting | 96% | 75% | 73% | 83%
Multilayer perceptron | 95% | 77% | 68% | 87%
Logistic regression | 94% | 72% | 70% | 86%
Random forest | 93% | 69% | 69% | 73%
Naïve Bayes | 93% | 63% | 61% | 66%
Adaboost | 90% | 74% | 68% | 81%
Extra trees classifier | 88% | 62% | 68% | 71%
Decision tree | 86% | 61% | 66% | 63%
KNN | 81% | 62% | 56% | 66%


Table 5. Confusion matrix – model 1.

Predicted class | Actual class 1 | Actual class 0
1 | 42 | 0
0 | 3 | 39

Table 6. Confusion matrix – model 2.

Predicted class | Actual class 1 | Actual class 0
1 | 117 | 34
0 | 19 | 96

Table 7. Confusion matrix – model 3.

Predicted class | Actual class 1 | Actual class 0
1 | 97 | 31
0 | 29 | 93

Table 8. Confusion matrix – model 4.

Predicted class | Actual class 1 | Actual class 0
1 | 46 | 7
0 | 7 | 52

79%. Table 10 depicts the confusion matrix of the non-density specific mass detection model. After the classification, 304 out of 360 abnormal ROIs and 259 out of 352 normal ROIs have been correctly classified, resulting in an overall classification accuracy of 79%. Various other performance metrics obtained for the non-density specific mass detection model are depicted in the Table 11 and Fig. 8 depicts the ROC curve obtained. AUC value of 0.86 has been obtained for this model. A comparative analysis between the number of wrongly classified abnormal and normal ROIs obtained for the four density-specific mass detection models and the nondensity specific mass detection model has been depicted in Table 12. In a CAD system, segmentation will have high sensitivity and hence a normal breast image without the


Table 9. Performance metrics of density-specific mass detection models (paired values are given as class 0, class 1).

Performance metric | Density_1 Model | Density_2 Model | Density_3 Model | Density_4 Model
Accuracy | 96% | 80% | 76% | 88%
Precision | 0.93, 1.00 | 0.83, 0.77 | 0.76, 0.76 | 0.88, 0.87
Recall | 1.00, 0.93 | 0.74, 0.86 | 0.75, 0.77 | 0.88, 0.87
F1-score | 0.96, 0.97 | 0.78, 0.82 | 0.76, 0.76 | 0.88, 0.87

Fig. 4. ROC curve – model 1.

Fig. 5. ROC curve – model 2.

presence of mass has a chance of being classified as a breast image with mass and this leads to a rise in false positive rate and so the CAD model must incorporate the false positive reduction stage to reduce the false positive count. A comparative analysis between the false positive rates (FPR) obtained for the four density-specific mass detection models and the non-density specific mass detection model for each density category has also been depicted in Table 12. In the table, four breast density categories, BI-RADS I, II, III and IV are denoted by the terms BD 1, BD 2, BD 3 and BD 4 respectively. From Table 12 it is inferred that, a total of 130 normal and abnormal images have been misclassified by the four-density specific mass detection models and a total of 149 normal and abnormal images have been misclassified by the non-density specific mass detection model. Density category_1 mass detection model deals with fatty breast tissue mammograms. After the segmentation phase itself, the model will produce a smaller number of false positives resulting in an approximate identification of mass. Further, when these mammograms undergo false positive reduction, they result in zero false positives. This ‘0’ FPR has been achieved in this phase by using the KAZE features


Fig. 6. ROC curve – model 3.

Fig. 7. ROC curve – model 4.

Table 10. Confusion matrix – non-density specific mass detection model.

Predicted class | Actual class 1 | Actual class 0
1 | 304 | 93
0 | 56 | 259

Table 11. Performance metrics – non-density specific mass detection model.

Performance metric | Results obtained
Accuracy | 79%
Precision | 0.82, 0.77
Recall | 0.74, 0.84
F1-score | 0.78, 0.80

extracted from the images. The FPR value of each density category in the density-specific expert models is lower than the corresponding FPR value in the non-density specific expert model, indicating that the density-specific expert models outperform the non-density specific expert model for mass detection.
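For reference, the false positive rate used here is FPR = FP / (FP + TN); for example, from the counts in Table 6 for the density-2 model, FPR = 34 / (34 + 96) ≈ 0.26, which matches the value reported in Table 12.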


Fig. 8. ROC curve – non-density specific mass detection model.

Table 12. Comparative analysis between the four density-specific mass detection models and a non-density specific mass detection model.

Number of misclassified images in each density category
Model | BD 1 | BD 2 | BD 3 | BD 4
Four density-specific models | 3 | 53 | 60 | 14
Single non-density specific model | 7 | 58 | 70 | 14

False positive rate (FPR) of each mass detection model
Model | BD 1 | BD 2 | BD 3 | BD 4
Four density-specific models | 0 | 0.26 | 0.25 | 0.12
Single non-density specific model | 0.12 | 0.34 | 0.27 | 0.14

5 Conclusion

The development of density-specific expert models for false positive reduction in mammographic mass detection has been proposed and implemented. The abnormal and normal mammograms from the Image Retrieval in Medical Applications (IRMA) version of the Digital Database for Screening Mammography (DDSM) database have been preprocessed, the regions suspected to be masses have been segmented and extracted as regions of interest (ROIs), features have been extracted from them using various detector and descriptor combinations, and the ROIs have been classified as abnormal or normal, resulting in false positive reduction. Each density-specific mass detection model incorporates these phases, resulting in classification accuracies of 96%, 80%, 76% and 88% and false positive rates (FPR) of 0, 0.26, 0.25 and 0.12 using


the KAZE, KAZE_SIFT, SIFT_KAZE and KAZE features for the four mass detection models respectively.

References 1. Balali, G.J., Yar, D.D., Dela, V.G.A., Yeboah, E.E., Asumang, P., Akoto, D.J.: Breast cancer: a review of mammography and clinical breast examination for early detection of cancer. OAlib. 7, 1–19 (2020) 2. Anupama, M.A., Sowmya, V., Soman, K.P.: Breast cancer classification using capsule network with preprocessed histology images. In: International Conference on Communication and Signal Processing, pp. 143–147. IEEE, India (2019) 3. Mohan, M., Priya, T.L., Nair, L.S.: Fuzzy c-means segmentation on enhanced mammograms using CLAHE and fourth order complex diffusion. In: Proceedings of the Fourth International Conference on Computing Methodologies and Communication, pp. 647–651. IEEE, India (2020) 4. Kiruthika, K., Vijayan, D., Lavanya, R.: Retrieval driven classification for mammographic masses. In: International Conference on Communication and Signal Processing, pp. 725–729. IEEE, India (2019) 5. Kriti, Virmani, J., Dey, N., Kumar, V.: PCA-PNN and PCA-SVM based CAD systems for breast density classification. Appl. Intell. Optim. Biol. Med. 96, 159–180 (2020) 6. Gopakumar, S., Sruthi, K., Krishnamoorthy: Modified level-set for segmenting breast tumor from thermal images. In: International Conference for Convergence in Technology, pp. 1–5. IEEE, India (2018) 7. Crow, E.D., Astley, S.M., Hulleman, J.: Is there a safety-net effect with computer-aided detection. J. Med. Imaging (Bellingham) 7 (2020) 8. Dhahbi, S., Barhoumi, W., Kurek, J., Swiderski, B.: False-positive re duction in computeraided mass detection using mammographic texture analysis and classification. Comput. Methods Programs Biomed. Update. 160, 75–83 (2018) 9. Soulami, K.B., Kaabouch, N., Saidi, M.N.: Breast cancer: classification of suspicious regions in digital mammograms based on capsule network. Biomed. Sig. Process. Control 76 (2022) 10. Laishram, R., Rabidas: WDO optimized detection for mammographic masses and its diagnosis: a unified CAD system. Appl. Soft. Comput. 110 (2021) 11. Chakraborty, J., Midya, A., Rabidas, R.: Computer-aided detection and diagnosis of mammographic masses using multi-resolution analysis of oriented tissue patterns. Expert Syst. Appl. 99, 168–179 (2018) 12. Diniz, J.O.B., Diniz, P.H.B., Valente, T.L.A., Silva, A.C., Paiva, A.C., Gattass, M.: Detection of mass regions in mammograms by bilateral analysis adapted to breast density using similarity indexes and convolutional neural networks. Comput. Methods Programs Biomed. Update. 156, 191–207 (2018) 13. Li, Y., Zhang, L., Chen, H., Cheng, L.: Mass detection in mammograms by bilateral analysis using convolution neural network. Comput. Methods Programs Biomed. Update. 195 (2020) 14. Mughal, B., Muhammad, N., Sharif, M., Rehman, A., Saba, T.: Removal of pectoral muscle based on topographic map and shape-shifting silhouette. BMC Cancer 18 (2018)


15. Fredenberg, E., Svensson, B., Danielsson, M., Lazzari, B., Cederstrom, B.: Optimization of mammography with respect to anatomical noise. Med. Phys. 7961 (2021) 16. Bartol, K., Bojani´c, D., Pribani´c, T., Petkovi´c, T., Donoso, Y.D., Mas, J.S.: On the comparison of classic and deep keypoint detector and descriptor methods. In: 11th International Symposium on Image and Signal Processing and Analysis, pp. 64–69. ResearchGate, India (2020) 17. Dabass, J., Hanmandlu, M., Vig, R.: Formulation of probability-based pervasive information set features and Hanman transform classifier for the categorization of mammograms. SN Appl. Sci. 3(6), 1–17 (2021). https://doi.org/10.1007/s42452-021-04616-2

An Efficient and Automatic Framework for Segmentation and Analysis of Tumor Structure in Brain MRI Images

K. Bhima1(B), M. Neelakantappa2, K. Dasaradh Ramaiah1, and A. Jagan1

1 B V Raju Institute of Technology, Narsapur, Telangana, India
{bhima.k,dasaradh.k,jagan.amgoth}@bvrit.ac.in
2 Vasavi College of Engineering, Hyderabad, India
[email protected]

Abstract. Medical image segmentation techniques are frequently used for diagnosis, tumor detection and the determination of anatomical structures in brain MRI images, and for classifying pathological regions for treatment planning in clinical analysis. Manual analysis of multi-spectral MRI images is an error-prone process because of the wide range of features and tissue types involved. Among the several segmentation techniques proposed in the literature, an automatic unified segmentation framework proves to be an effective method for multi-spectral MRI image segmentation. The proposed efficient and robust framework combines several popular techniques to accomplish segmentation of multi-spectral MRI images; it is intended to conglomerate the benefits of the widely used SVM, Watershed and EM-GM techniques for automatic and accurate segmentation and diagnosis of tumors in multi-spectral MRI images. The SVM method converts the input space to a high-dimensional space, in which multi-spectral MRI image data that are typically not linearly separable can be classified by computing the best linear discriminant surface. The proposed framework has a comprehensive tumor analysis structure comprising three stages for identification, segmentation and extraction of the disorder region in multi-spectral MRI images: disordered MRI images are identified automatically and the tumor region is then segmented. Stage 1 focuses on classifying input MRI images into normal or disordered images with the SVM method. Stage 2 segments the tumor in the disordered MRI images detected in Stage 1 with a Watershed based technique. Stage 3 uses EM-GM for tumor region extraction and approximation of the tumor region in the actual image. To demonstrate the effectiveness of the framework, it is evaluated on simulated multi-spectral MRI images from the standard open BraTS MRI dataset, which provides ground truth segmentations. Keywords: SVM Method · Watershed Method · EM-GM Method · An efficient and robust framework · Bilateral Filter · Multi-spectral MRI images



1 Introduction

Automatic analysis and segmentation of MRI images [1–3] is a crucial step in medical image processing for diagnosis and advance treatment planning in tumor analysis. Automatic MRI image analysis and segmentation techniques are used in numerous health care systems and clinical applications. MRI offers pre-eminent soft-tissue contrast compared with other medical imaging modalities for imaging the brain [10, 11]. This paper presents a three-stage, automatic and rapid method for tumor analysis and segmentation using multimodal MRI images. The proposed efficient and robust framework entails three stages: tumorous slice detection, tumor region extraction and tumor substructure segmentation, i.e. SVM (Stage-1), WM (Stage-2) and EM-GM (Stage-3). The three-stage tumor analysis process detects tumorous areas in MRI human head volumes [1–3]: the framework first classifies the MRI automatically and then segments the tumor with the Watershed technique. In Stage-1, the SVM classifier is used for classification, with a training and testing process [6–9] based on 8 × 8 patches; optimal features are selected with the IFS method, and the MRI dataset is classified into normal or tumorous MRI images. In Stage-2, the Watershed method [4–6] is used for segmentation of the tumor from overlapping regions in the MRI. Lastly, Stage-3 performs a post-metric assessment, namely the analysis and diagnosis of the tumor region using the EM-GM method [12–14]. Current research on MRI has recognised the growing necessity of automatic segmentation, but existing techniques are still striving towards a framework that automatically extracts the features in MRI; furthermore, existing methods barely offer uncertainty details associated with the segmentation process. The input MRI is preprocessed with the popular bilateral filter [15–17] to smooth the image and enhance its quality and edges. In this work, an efficient and robust multi-spectral MRI image segmentation framework is devoted to classifying diverse brain regions for further investigation. The framework's three stages, SVM (Stage-1), WM (Stage-2) and EM-GM (Stage-3), accomplish better results and also present detailed data on the tumor regions in the segmentation results.

2 An Efficient and Robust Framework

The efficient and robust framework is developed to overcome the limitations of tumor segmentation by fusing the popular SVM, Watershed and EM-GM techniques for automatic tumor segmentation and diagnosis in MRI. The framework first classifies MRI images automatically into tumorous and non-tumorous images with the SVM method. This work proposes a fully automatic three-stage pipeline, SVM (Stage-1), WM (Stage-2) and EM-GM (Stage-3), as shown in Fig. 1, which is a rapid technique for accurate tumor detection and MRI segmentation. Preprocessing [15, 16] is the preliminary stage, used to remove the noise present in the multi-spectral MRI images with a bilateral filter. Figure 1 demonstrates the proposed framework, in which a series of stages is accomplished: pre-processing of multi-spectral MRI images, tumor segmentation and feature extraction in multi-spectral MRI images, and analysis of


Fig. 1. Proposed Segmentation Framework for analysis and diagnosis of multi-spectral MRI images

the tumor region. The proposed framework, shown in Fig. 1, comprises three stages, SVM (Stage-1), WM (Stage-2) and EM-GM (Stage-3), which correspond to identification of the tumor, tumor segmentation and tumor extraction respectively. Initially, Stage-1 focuses on SVM classification [6–9] and identification of disordered MRI images. Stage-2 segments the tumor in the MRI with the Watershed method. Finally, the third stage encompasses tumor area extraction and assessment of the tumor. Figure 1 thus describes the framework for extensive analysis and diagnosis of the brain tumor present in the input MRI image.


2.1 SVM Classifier for Classification of MRI Dataset

The efficient and robust framework is a comprehensive segmentation technique developed for multi-spectral MRI images in which the prominent SVM method is used for efficient classification of the tumorous MRI images. The foremost stages in this component comprise MRI preprocessing, feature extraction and classification of tumorous multi-spectral MRI images. SVM is a supervised machine learning technique used to address binary classification problems [9]. It is a popular model, with distinctive characteristics, for extracting tumor features from multi-spectral MRI images with good efficiency and speed, supporting accurate segmentation. In brain MRI segmentation, the essential task of the SVM classification step is to accurately separate the disordered (pathological) regions of the brain from the normal regions. The binary SVM [6–9] constructs a hyperplane and can be used in both its linear and nonlinear forms. The framework therefore classifies multi-spectral MRI images with a supervised SVM trained on a dataset of the form

$\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}, \quad y_i \in \{-1, 1\}, \; x_i \in \mathbb{R}^d$

where $x_i$ is a data point, $y_i$ is its corresponding label and $n$ is the number of training samples. The linear SVM finds the optimal solution using:

$\min_{w,\,b,\,\varepsilon} \; \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \varepsilon_i, \quad \varepsilon_i \ge 0$

subject to $y_i (w^T x_i + b) \ge 1 - \varepsilon_i, \quad i = 1, 2, \ldots, n$   (1)

Here C is the penalty value, w is the normal vector, b is a scalar offset and $\varepsilon_i$ are the slack variables. Points $x_i$ with $\alpha_i > 0$ are termed support vectors, where $\alpha_i$ denotes the Lagrangian multiplier used to solve the minimisation problem. The linear discriminant function can be expressed with the optimal hyperplane parameters w and b as:

$f(x) = \mathrm{sgn}\!\left(\sum_{i=1}^{n} \alpha_i y_i x_i^T x + b\right)$   (2)
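A minimal sketch of such a Stage-1 SVM classification with scikit-learn is given below; the variable names 'patches' (8 × 8 intensity patches) and 'y' (0 = normal, 1 = tumorous), the linear kernel and the value of C are illustrative assumptions rather than the authors' exact configuration.

```python
# Linear SVM classification of flattened 8x8 MRI patches (illustrative sketch).
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X = patches.reshape(len(patches), -1).astype(np.float32)   # one row per 8x8 patch
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

clf = SVC(kernel='linear', C=1.0)        # linear decision surface of Eqs. (1)-(2)
clf.fit(X_train, y_train)
print('classification accuracy:', accuracy_score(y_test, clf.predict(X_test)))
```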

2.2 Watershed Method Technique for Tumor Region Extraction

Stage-2 uses an extension of the Watershed algorithm [4–6] for better boundary adherence and extraction of the tumor. The framework adopts the popular watershed method to overcome limitations of MRI segmentation such as over-segmentation and the difficulty of finding brain tumors along low-contrast boundaries. The Watershed method [3–5] is


robust, using a similarity measure based on content and border information for tumor region aggregation. Tumor extraction is performed by region aggregation with an adaptive merge threshold criterion (Fig. 2).

Fig. 2. Watershed Method for tumor region extraction in multi-spectral MRI images
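A minimal marker-controlled watershed sketch for this stage, using scikit-image, is given below; the Otsu-based marker construction and the gradient relief are assumptions made for illustration, not the authors' exact procedure.

```python
# Marker-controlled watershed extraction of a candidate tumour mask from a slice.
import numpy as np
from scipy import ndimage as ndi
from skimage.filters import threshold_otsu, sobel
from skimage.segmentation import watershed

def watershed_tumor_mask(slice_2d):
    elevation = sobel(slice_2d)                      # gradient image used as relief
    markers = np.zeros_like(slice_2d, dtype=np.int32)
    t = threshold_otsu(slice_2d)
    markers[slice_2d < 0.5 * t] = 1                  # background marker
    markers[slice_2d > t] = 2                        # bright (candidate tumour) marker
    labels = watershed(elevation, markers)           # flood the relief from the markers
    return ndi.binary_fill_holes(labels == 2)        # binary candidate tumour mask
```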

2.3 Analysis and Diagnosis of Tumor Structure with EM-GM Method

In the proposed framework, a multi-structure Expectation-Maximization technique is used for the subdivision and analysis of the tumor. The framework introduces a number of developments to the conventional EM techniques that are well established for MRI segmentation [12–14]. The EM technique [12–14] is an iterative procedure that computes maximum likelihood estimates for a given statistical model through alternating Expectation and Maximization steps (Fig. 3). Figure 4 shows the analysis of the sub-tumor region and the identification of the different tumor types with the EM-GM method in multi-spectral MRI images, and Fig. 5 presents the analysis of the sub-tumor region with the EM-GM method on multi-spectral brain MRI images. The EM technique models the distribution of intensities as a mixture of weighted Gaussians, whose features are used for classification of tumors.
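The following sketch illustrates this kind of EM-fitted Gaussian mixture analysis of the extracted tumor region, using scikit-learn's GaussianMixture (which is fitted by the EM algorithm); the choice of three components (e.g. active tumor / necrosis / edema) and the variable names are assumptions made for illustration.

```python
# Gaussian-mixture (EM) analysis of intensities inside the extracted tumour mask.
import numpy as np
from sklearn.mixture import GaussianMixture

def em_gm_substructures(slice_2d, tumor_mask, n_components=3):
    intensities = slice_2d[tumor_mask].reshape(-1, 1)
    gmm = GaussianMixture(n_components=n_components, covariance_type='full',
                          max_iter=200, random_state=0)
    labels = gmm.fit_predict(intensities)            # EM fit + per-voxel component label
    label_map = np.zeros(slice_2d.shape, dtype=np.int32)
    label_map[tumor_mask] = labels + 1                # 0 = outside the tumour region
    return label_map, gmm.means_.ravel()              # sub-structure map and class means
```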


Fig. 3. EM-GM framework for analysis and diagnosis of tumor structure in multi-spectral MRI images

Fig. 4. Analysis of the sub-tumor region and identification of tumor type with the EM-GM method

3 Results and Discussion

The performance of the efficient and robust framework, i.e. the SVM method (Stage-1), Watershed method (Stage-2) and EM-GM method (Stage-3), is assessed on input images obtained from an open dataset for experiments and quantification of results; the dataset has four multi-spectral bands, i.e. T1, T1C, T2 and FLAIR MRI images. The proposed framework is experimentally evaluated on the challenge MRI dataset


Fig. 5. Analysis of sub tumor region with EM-GM Method

in terms of segmentation accuracy. The challenge datasets, gathered from standard open MRI dataset sources, are used for assessment and quantification of the proposed framework. Table 1 and Fig. 6 compare the measurable parameters of the proposed technique and existing techniques for identification of tumorous multi-spectral MRI images in the open datasets with the SVM method. Table 2 demonstrates the popular Watershed method for tumor extraction in the MRI images of 10 HGG (high-grade glioma) patients and 10 LGG (low-grade glioma) patients; the obtained segmentation results show improved tumor region extraction on the multi-sequence MRI images of these 20 patients. Table 2 and Fig. 7 present the segmentation results obtained on HGG and LGG MRI from the multi-spectral MRI image dataset with the watershed method. Table 3 and Fig. 8 illustrate the measurable parameters of the EM-GM method for the analysis of the tumor region. The experimental results show better outcomes compared with the available methods, and the proposed framework is expected to produce better segmentation results than the contemporary techniques mentioned. The proposed framework extracts better tumor detail to distinguish


Table 1. Comprehensive SVM classification accuracy achieved on the input datasets for identification of tumors in MRI

MRI Dataset   Data size   SVM Classification Accuracy (%)
                          Existing   Efficient and Robust Framework
MS_DS001      974         98.35      99.51
MS_DS002      885         98.21      99.18
MS_DS003      955         97.69      98.68
MS_DS004      930         98.17      99.08
MS_DS005      966         98.72      99.18
MS_DS006      873         97.82      98.20
MS_DS007      927         98.07      99.68
MS_DS008      971         98.27      99.08
MS_DS009      978         97.51      98.58
MS_DS010      903         98.65      99.08

Fig. 6. Performance analysis of SVM for tumors and non-tumors classification

the tumor properly. The multi-spectral MRI image features are successfully extracted by the proposed framework for tumor analysis and diagnosis. The proposed efficient and robust framework is tested on BRATS datasets and may be extended to further open databases.


Table 2. Quantitative segmentation results of tumor extraction from multi-spectral MRI images

SNO   MRI Image   MRI Sequence   Extraction of tumor region from MRI
                                 Sensitivity (%)   Specificity (%)   Accuracy (%)
1     HG0001      T1             99.17             98.31             98.05
2     HG0002      T1             98.14             98.90             98.17
3     HG0003      T1             99.01             98.25             98.97
4     HG0004      T1             98.11             98.01             98.72
5     HG0005      T1             97.23             98.89             97.89
6     HG0006      T1C            96.37             97.29             98.31
7     HG0007      T1C            97.62             97.83             98.64
8     HG0008      T1C            98.28             97.21             99.02
9     HG0009      T1C            97.54             98.01             98.17
10    HG0010      T1C            97.27             96.81             98.32
11    LG0001      T2             98.81             98.88             99.08
12    LG0002      T2             96.29             97.41             98.19
13    LG0003      T2             98.27             97.01             98.96
14    LG0004      T2             99.01             99.82             98.72
15    LG0005      T2             97.26             98.19             98.86
16    LG0006      FLAIR          98.11             97.83             98.19
17    LG0007      FLAIR          97.09             97.18             98.21
18    LG0008      FLAIR          98.19             97.18             98.57
19    LG0009      FLAIR          98.32             97.81             98.29
20    LG0010      FLAIR          98.21             98.15             97.49


Fig. 7. Performance analysis of Watershed method on multi-spectral MRI images

Table 3. Analysis and diagnosis of tumor structure: quantitative results for tumor segmentation

MRI Image   Segmentation accuracy (%)
            Active Tumor   Necrosis   Edema
HG0001      96.35          97.25      98.13
HG0002      97.62          98.81      95.62
HG0003      98.27          98.20      98.71
HG0004      98.79          98.87      98.05
HG0005      97.21          97.81      97.87
LG0001      96.18          95.83      93.43
LG0002      97.01          95.92      91.56
LG0003      96.13          96.15      93.92
LG0004      92.87          92.89      97.32
LG0005      95.91          96.17      97.41


Fig. 8. Performance analysis of tumor region analysis with the EM-GM Method

4 Conclusion

MRI segmentation is an essential but complicated step in the automatic segmentation of tumors in multi-spectral MRI images for clinical and health-care applications. An efficient and robust framework for tumor detection, segmentation, and diagnosis from multi-spectral MRI images has been developed, implemented, and tested on standard multi-spectral MRI datasets. In the first stage, tumorous MRI images are identified from the datasets with the SVM method and the tumor parts are localized and classified. The second stage automatically extracts the tumor region from the classified tumorous multi-spectral MRI images using the watershed method. The third stage covers the post-metric validation for the analysis and diagnosis of the tumor region. Based on the quantitative and qualitative validation of the segmentation results, the proposed efficient and robust framework can accurately detect tumorous MRI images and analyze and segment tumors in multi-spectral MRI images. Future work may combine multiple segmentation techniques to obtain more precise results for tumor detection in multi-spectral MRI images.


References

1. Woldeyohannes, G.T., Pati, S.P.: Brain MRI classification for detection of brain tumors using hybrid feature extraction and SVM. In: Mishra, D., Buyya, R., Mohapatra, P., Patnaik, S. (eds.) Intelligent and Cloud Computing. Smart Innovation, Systems and Technologies, vol. 286, pp. 571–579. Springer, Singapore (2022). https://doi.org/10.1007/978-981-16-9873-6_52
2. Varuna Shree, N., Kumar, T.N.R.: Identification and classification of brain tumor MRI images with feature extraction using DWT and probabilistic neural network. Brain Inform. 5(1), 23–30 (2018). https://doi.org/10.1007/s40708-017-0075-5
3. Kumar, A.: Study and analysis of different segmentation methods for brain tumor MRI application. Multimed. Tools Appl. (2022). https://doi.org/10.1007/s11042-022-13636-y
4. Kaleem, M., Sanaullah, M., Hussain, M.A., Jaffar, M.A., Choi, T.-S.: Segmentation of brain tumor tissue using marker controlled watershed transform method. In: Chowdhry, B.S., Shaikh, F.K., Hussain, D.M.A., Uqaili, M.A. (eds.) IMTIC 2012. CCIS, vol. 281, pp. 222–227. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-28962-0_22
5. Kamrul Hasan, S.M., Ahmad, M.: Two-step verification of brain tumor segmentation using watershed-matching algorithm. Brain Inform. 5, 8 (2018). https://doi.org/10.1186/s40708-018-0086-x
6. Abdullah, N., Ngah, U.K., Aziz, S.A.: Image classification of brain MRI using support vector machine. In: 2011 IEEE International Conference on Imaging Systems and Techniques, pp. 242–247 (2011). https://doi.org/10.1109/IST.2011.5962185
7. Wasule, V., Sonar, P.: Classification of brain MRI using SVM and KNN classifier. In: 2017 Third International Conference on Sensing, Signal Processing and Security (ICSSS), pp. 218–223 (2017). https://doi.org/10.1109/SSPS.2017.8071594
8. Srinivasa Reddy, A., Chenna Reddy, P.: MRI brain tumor segmentation and prediction using modified region growing and adaptive SVM. Soft Comput. 25(5), 4135–4148 (2021). https://doi.org/10.1007/s00500-020-05493-4
9. Moyano-Cuevas, J.L., et al.: 3D segmentation of MRI of the liver using support vector machine. In: Roa Romero, L. (ed.) XIII Mediterranean Conference on Medical and Biological Engineering and Computing 2013. IFMBE Proceedings, vol. 41, pp. 368–371. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-00846-2_91
10. Bhima, K., Jagan, A.: Development of robust framework for automatic segmentation of brain MRI images. In: Satapathy, S.C., Bhateja, V., Favorskaya, M.N., Adilakshmi, T. (eds.) Smart Computing Techniques and Applications. SIST, vol. 225, pp. 517–524. Springer, Singapore (2021). https://doi.org/10.1007/978-981-16-0878-0_51
11. Bhima, K., Neelakantappa, M., Dasaradh Ramaiah, K., Jagan, A.: Contemporary technique for detection of brain tumor in fluid-attenuated inversion recovery magnetic resonance imaging (MRI) images. In: Satapathy, S.C., Bhateja, V., Favorskaya, M.N., Adilakshmi, T. (eds.) Smart Intelligent Computing and Applications, Volume 2. Smart Innovation, Systems and Technologies, vol. 283, pp. 117–125. Springer, Singapore (2022). https://doi.org/10.1007/978-981-16-9705-0_12
12. Binti Kasim, F.A., Pheng, H.S., Binti Nordin, S.Z., Haur, O.K.: Gaussian mixture model expectation maximization algorithm for brain images. In: 2021 2nd International Conference on Artificial Intelligence and Data Sciences (AiDAS), pp. 1–5 (2021). https://doi.org/10.1109/AiDAS53897.2021.9574309
13. Balafar, M.A.: Gaussian mixture model based segmentation methods for brain MRI images. Artif. Intell. Rev. 41(3), 429–439 (2012). https://doi.org/10.1007/s10462-012-9317-3
14. Meena Prakash, R., Kumari, R.S.S.: Gaussian mixture model with the inclusion of spatial factor and pixel re-labelling: application to MR brain image segmentation. Arab. J. Sci. Eng. 42, 595–605 (2017). https://doi.org/10.1007/s13369-016-2278-0
15. Mustafa, Z.A., Kadah, Y.M.: Multi resolution bilateral filter for MR image denoising. In: 2011 1st Middle East Conference on Biomedical Engineering, pp. 180–184 (2011). https://doi.org/10.1109/MECBME.2011.5752095
16. Jesline Jeme, V., Albert Jerome, S.: A hybrid filter for denoising of MRI brain images using fast independent component analysis. In: 2021 Fourth International Conference on Microelectronics, Signals and Systems (ICMSS), pp. 1–5 (2021). https://doi.org/10.1109/ICMSS53060.2021.9673615
17. Kala, R., Deepa, P.: Adaptive fuzzy hexagonal bilateral filter for brain MRI denoising. Multimed. Tools Appl. 79, 15513–15530 (2020). https://doi.org/10.1007/s11042-019-7459-x

Machine Learning and Deep Learning

Collaborative CNN with Multiple Tuning for Automated Coral Reef Classification

R. Jannathul Firdous1 and S. Sabena2(B)

1 A. R College of Engineering and Technology, Tirunelveli, Tamil Nadu, India
2 Anna University Regional Campus, Tirunelveli, Tamil Nadu, India
[email protected]

Abstract. Coral reefs are among the earliest and oldest ecosystems on our planet, and the first step in protecting marine life is understanding where they are located. In recent years there has been a significant proliferation of digital imagery for monitoring marine species, and the exponential growth in available image data makes automated detection and classification methods increasingly necessary: only around two percent of the collected images can be examined personally by a marine biology expert. Two problems arise when identifying coral: (i) the resemblance between various coral species, and (ii) the difficulty of determining geographical borders between classes, since many corals occur in groups. This article reports on a fully automated system for the accurate identification and categorization of coral reefs in the ocean. The proposed solution is an ensemble of two Convolutional Neural Network (CNN) models with adjustable fully connected layers, a variable number of neurons, and varied training data. Our primary emphasis is a multiple-tuning technique for fine-tuning a pre-trained CNN. We show that the modified pre-trained networks can be applied to two datasets, RSMAS and EILAT, and deliver accuracy on par with the state-of-the-art techniques in this field.

Keywords: Convolutional Neural Network (CNN) · Ensemble · Automatic Detection · Fully connected layers · Pre-trained network

1 Introduction

Coral reefs, which are found in the warm, shallow waters of the tropics, are one of the most complex marine ecosystems today. As one species of hard coral dies, its calcium carbonate skeleton is left behind for another to colonize and build the reef. Coral reefs, due to their high levels of biodiversity, are often regarded as the most important ecosystems on Earth. They aid in water purification by absorbing excess nitrogen and carbon, provide a resource for medical research, and generate income via fishing and tourism;


certain coral species even act as a natural barrier to protect coastlines from hurricanes and other severe weather [1]. A quarter of all marine life on Earth and as many as two million species depend on them for survival, and they are also crucial from a human perspective [2]. Because of this, the identification and categorization of coral reefs have been receiving more and more attention from both private companies and government organizations. The difficulty of this task stems from a number of features that together make it hard to categorize corals:

• Certain coral species vary in size, form, and color, while other coral species may appear visually similar to the human eye.
• Images vary in terms of illumination, visibility, depth, and the presence of soft and hard corals in the foreground and background.
• As a result, a qualified biologist has always been essential for accurate coral categorization, and automating the categorization of the vast quantities of coral photos now being gathered would be highly valuable.

The technique discussed in this research may be used to further refine the classification of coral species. Our key contribution is a new method of tuning a network that allows us to generate CNN ensembles for classification much more quickly. The goal is to build a classification network, often known as the "body" of a network, from a CNN that is only coarsely optimized but serves as a solid foundation for further refinement. Afterward, the obtained networks are tweaked and fine-tuned several times using new data. A major factor in ensemble performance is the variety of the classifiers, so we take care to preserve it. With this method, training time was reduced by around half while the ensemble's classification performance was maintained.

2 Related Works

Only a small number of publications have explored the automatic classification of coral photos; this section reviews them in detail. The authors of [6] retrieved color and texture features using the L*a*b* color space and a Maximum Response (MR) filter bank, then employed a support vector machine (SVM) classifier. This method can automatically annotate an appropriate picture patch for species identification with little effort and can classify images based on color data; however, this information is often inconsistent or nonexistent in underwater datasets. In [8], quantile functions (QF) and the Scale-Invariant Feature Transform (SIFT) were used for colors and textures, and a linear support vector machine (SVM) analyzed the results. Its automated segmentation of features allows it to succeed where other machine-learning techniques have failed; although SIFT is a fast feature descriptor, the approach is still time-consuming. The study described in [7] used the local binary pattern (CLBP), the grey level co-occurrence matrix (GLCM), Gabor filter responses, opponent angle and hue channel color histograms, together with k-nearest neighbor (KNN), neural network (NN), support vector machine (SVM), and probability density weighted mean distance (PDWMD) classifiers. Very good results are achieved when the same features and classifiers are applied to other texture datasets, but classifying a new picture requires trying a large number of method combinations until the optimal one is found, which takes a lot of time. The studies


[9, 10] utilized LDP, the Tilted Z Local Binary Pattern, and KNN, all of which produced high accuracy, but none of them used deep learning for feature extraction and coral categorization. The use of CNNs presents difficulties in coral classification because of differences in picture quality even within the same class, lighting fluctuations caused by the water column, and the tendency of some coral species to appear in groups. Moreover, high-quality performance from a CNN requires a large training dataset; transfer learning and data augmentation are two common methods used in practice to get around this restriction. Some publications make use of CNNs for coral categorization. Researchers in [1] first applied CNNs to the task of classifying coral: images with texture and form details were given a color boost and smoothed using filters, after which a LeNet-5 model was trained and achieved 55% accuracy. Using VGGnet, [4] achieved 90% accuracy on the BENTHOZ-2015 dataset. In [5], the MLC dataset was combined with manually constructed features; feature extraction was performed with a VGGnet network pretrained on ImageNet, reaching an accuracy of 84.5% on the MLC. An Octa-angled Pattern for Triangular sub-regions (OPT) combined with a Pulse Coupled Convolutional Neural Network (PCCNN) was used in [11]; by using the PCCNN to filter away unnecessary characteristics from the CNN output, remarkable improvements were achieved. While these studies analyze widely used CNNs like VGGnet and LeNet, they test their models on a single dataset in total [3]. In addition, the photos do not provide information on the complete body of the corals; the EILAT and RSMAS datasets, in particular, are especially interesting because of their small sizes, high imbalance, and inclusion of small areas of the corals' textures. To circumvent the shortcomings of currently used deep learning models, we suggest using more robust CNNs. To address the unique issues involved in coral classification, we want to use several datasets to train a model that is as accurate as a human expert. Specifically, we consider a combination of two of the most promising CNNs, ResNet [12] and DenseNet [13]. The significance of the work is summarized below:

1) The first stage is to optimize a CNN that has already been trained for the categorization of coral reefs.
2) We use a compound tuning method, in which we first fine-tune on a dataset analogous to the target problem's training set, and then fine-tune again using data from the actual problem's training set.
3) Both candidate CNN networks employ fully connected layers for making the final decision.
4) We then use a streamlined method of generating ensembles of CNNs by combining DenseNet-121 and ResNet-50 variants for the classification of small datasets of underwater coral texture photos.
5) We contrast our findings with those obtained by the most advanced but labor-intensive classical techniques.


3 CNN Architectures

The ResNet (ResNet-50) [12] and DenseNet (DenseNet-121) [13] CNN architectures employed here were selected because they provide state-of-the-art results in distinct classification challenges while also being complementary to one another. A ResNet unit is defined as in [12]:

y_g = R_g(y_{g-1}) + y_{g-1}    (1)

where y_0 is the input, y_{g-1} and y_g are the input and output of layer g, and R_g(·) is a non-linear transformation consisting of a combination of operations such as convolution, ReLU, pooling, and batch normalization applied to the layer. By stacking many of these units, ResNets create gradient bypasses that improve the efficiency of back-propagation optimization in very deep networks; as a side effect, ResNets may produce redundant layers. DenseNet units were suggested in [13] as an alternative way of improving the gradient flow in deep networks by connecting each layer to all layers above it:

y_g = R_g([y_0; y_1; ...; y_{g-1}])    (2)

where [y_0; y_1; ...; y_{g-1}] concatenates the outputs of the preceding layers. DenseNets similarly stack several units but employ transition layers to regulate the number of connections between them. Finally, DenseNets make up for their smaller number of parameters by reusing features, which makes them competitive with ResNets. The fact that the two architectures behave differently may be a contributing factor to their complementary relationship. We seed both the DenseNet and ResNet models with ImageNet [27] weights. We remove the last layer of each network, which assigns images to the ImageNet categories, and then add fully connected layers: a ReLU activation layer, a layer with a distinct number of neurons, and a final layer with as many neurons as there are classes in the dataset. Since we are working with limited datasets, we only train the fully connected layers and freeze the rest. As optimizer we deployed Stochastic Gradient Descent with a learning rate of 0.001 and a 10-percent decay rate. Multiple tuning is conducted on the two selected networks with different numbers of fully connected layers and different numbers of neurons, and the networks are then ensembled to provide very good results. Initially, the datasets EILAT and RSMAS are too small for the deep architectures to train on, and data augmentation is needed to overcome this limitation. Here we take a novel approach of creating a new, wider dataset by combining images with a large set of classes from EILAT, RSMAS, and EILAT2, and another, deeper dataset consisting of a large number of images with fewer classes by combining EILAT, RSMAS, and EILAT2. The two CNN architectures, ResNet-50 and DenseNet-121, which differ in depth, are trained on the above-mentioned datasets. Of the two, ResNet-50 is shallower than DenseNet-121, since ResNet-50 consists of 50 layers and DenseNet-121 of 121 layers; another criterion is therefore that the wider dataset is supplied to the thinner network (ResNet-50) and the larger dataset to the thicker network (DenseNet-121), which has 121 layers.
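To make Eqs. (1) and (2) concrete, the following is a minimal Keras sketch (our illustration, not code from the paper) of one residual unit and one dense unit; the layer widths, kernel sizes, and helper names are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_unit(x, filters=64):
    """Eq. (1): y_g = R_g(y_{g-1}) + y_{g-1} (identity shortcut; widths are illustrative)."""
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    return layers.ReLU()(layers.Add()([x, y]))      # transformed output plus the input

def dense_unit(previous_outputs, growth=32):
    """Eq. (2): y_g = R_g([y_0; y_1; ...; y_{g-1}]) (concatenation of all earlier outputs)."""
    x = (layers.Concatenate()(previous_outputs)
         if len(previous_outputs) > 1 else previous_outputs[0])
    y = layers.Conv2D(growth, 3, padding="same", activation="relu")(x)
    return previous_outputs + [y]                   # later units see every earlier feature map
```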


Specifically, we implement a fully connected layer as the very last layer in both CNN topologies, just before the softmax layer: a fully connected layer with the same number of neurons as there are classes in the dataset, followed by a softmax activation layer. For brevity, we refer to the ResNet-50 as the M1 model and the DenseNet-121 as the M2 model.

Fig. 1. Pictorial representation of the suggested methodology
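The following is a minimal Keras sketch (our illustration, not the authors' code) of how the two base models could be prepared: ImageNet weights, top removed, backbone frozen, new fully connected layers, and a softmax head trained with SGD at learning rate 0.001. The build_model helper, the global-average-pooling step, and any hyperparameter not stated in the text are assumptions; the example layer widths follow the best configurations reported later in Tables 4 and 6.

```python
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers

def build_model(base_name, fc_sizes, num_classes, input_shape=(180, 180, 3)):
    """Frozen ImageNet backbone + new fully connected layers + softmax head."""
    if base_name == "resnet50":                       # M1
        base = tf.keras.applications.ResNet50(include_top=False,
                                              weights="imagenet",
                                              input_shape=input_shape)
    else:                                             # M2: densenet121
        base = tf.keras.applications.DenseNet121(include_top=False,
                                                 weights="imagenet",
                                                 input_shape=input_shape)
    base.trainable = False                            # only the new FC layers are trained

    x = layers.GlobalAveragePooling2D()(base.output)  # assumed pooling before the FC stack
    for units in fc_sizes:
        x = layers.Dense(units, activation="relu")(x)
    out = layers.Dense(num_classes, activation="softmax")(x)

    model = models.Model(base.input, out)
    model.compile(optimizer=optimizers.SGD(learning_rate=0.001),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model

# Example configurations (assumed from the reported best results):
# m1 = build_model("resnet50",    (512, 256, 128), num_classes=27)
# m2 = build_model("densenet121", (512, 64),       num_classes=27)
```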


The overall representation of the suggested architecture is illustrated in Fig. 1.

3.1 M1 Model

Table 1. Layer description of ResNet-50

Layers                 Size        ResNet-50
Conv 1                 112 × 112   7 × 7, 64, stride 2
Conv 2                 56 × 56     3 × 3 max pool, stride 2; [1 × 1, 64; 3 × 3, 64; 1 × 1, 256] × 3
Conv 3                 28 × 28     [1 × 1, 128; 3 × 3, 128; 1 × 1, 512] × 4
Conv 4                 14 × 14     [1 × 1, 256; 3 × 3, 256; 1 × 1, 1024] × 6
Conv 5                 7 × 7       [1 × 1, 512; 3 × 3, 512; 1 × 1, 2048] × 3
Classification Layer   1 × 1       Global avg. pool, FC (no. of classes), softmax

The M1 CNN uses a 224 × 224 pixel input picture as its standard. Table 1 displays the architecture of the M1 model, including the filter sizes and layer outputs. The residual units are repeated three, four, six, and three times, respectively. The fully connected 1000-neuron layer is designated FC 1000, and the value 2048 denotes the deep residual features (DRF). In our coral reef classification problem, the DRF and the number of neurons in the final FC layer change as a function of the number of classes in the coral dataset. By its nature, the M1 network is relatively thin. As described in [14], broader datasets need a larger number of neurons in the FC layers and a larger total number of FC layers. In accordance with [14], we first add two FC layers, and the number of neurons for each freshly added FC layer is selected as a power of two greater than the number of classes, up to 4096. After that, we improve the model by adding a further FC layer, as described below.

3.2 M2 Model

Our model M2 is the DenseNet-121 architecture, which consists of 121 layers.


Table 2. Layer description of DenseNet-121

Layers                 Size               DenseNet-121
Convolution            112 × 112
Pooling                56 × 56
Dense Block (1)        56 × 56            [1 × 1 conv; 3 × 3 conv] × 6
Transition Layer (1)   56 × 56 → 28 × 28
Dense Block (2)        28 × 28            [1 × 1 conv; 3 × 3 conv] × 12
Transition Layer (2)   28 × 28 → 14 × 14
Dense Block (3)        14 × 14            [1 × 1 conv; 3 × 3 conv] × 24
Transition Layer (3)   14 × 14 → 7 × 7
Dense Block (4)        7 × 7              [1 × 1 conv; 3 × 3 conv] × 16
Classification Layer   No. of classes

Table 2 shows that DenseNet's foundation is a convolution and pooling layer, followed by alternating dense blocks and transition layers and, finally, a classification layer. The table lists the layers, filter sizes, and repetition units. Thicker networks, such as the one used here, need fewer neurons in the FC layers [14]. In line with the findings in [14], the performance of deeper CNNs is superior to that of shallower models when applied to larger datasets. In this model, we therefore employ a smaller number of FC layers to accommodate the deeper datasets: in the end, we add only one or two fully connected layers, which results in far fewer neurons, yet still more than the number of classes, in our model M2.

4 Ensemble Method of Averaging

To boost the efficiency of the models, ensemble learning combines general, classical, and other fusion approaches. Given the computational and data demands of deep learning models, ensemble deep learning deserves close attention, since it integrates the mutually beneficial insights of several algorithms into a unified whole. In our case, we combine the results of the two best models


by utilizing the unweighted average ensemble technique [15]. According to [15], because of the large variance and low bias of deep learning architectures, the generalization performance may be improved by simply averaging the ensemble models. Either the base learners' outputs themselves are averaged, or the classes' predicted probabilities obtained with the softmax function are averaged:

G_k^i = exp(P_k^i) / Σ_{l=1}^{m} exp(P_l^i)    (3)

where m is the number of classes, P_k^i is the output of the k-th unit of the i-th base learner, and G_k^i is the probability that learner assigns to the k-th class. When the performance of the base learners is similar, unweighted averaging is a good choice.
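A minimal numpy sketch of this unweighted averaging (our illustration; the helper names are assumptions): the softmax of Eq. (3) converts each base learner's outputs into class probabilities, the probabilities are averaged over the learners, and the arg-max class is taken.

```python
import numpy as np

def softmax(logits):
    """Eq. (3): convert one learner's outputs P_k into class probabilities G_k."""
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))  # numerically stable
    return e / e.sum(axis=-1, keepdims=True)

def average_ensemble(logits_per_learner):
    """Unweighted average of the base learners' class probabilities."""
    probs = np.stack([softmax(l) for l in logits_per_learner])  # (learners, samples, classes)
    return probs.mean(axis=0).argmax(axis=-1)

# Usage (hypothetical arrays): predictions = average_ensemble([m1_logits, m2_logits])
```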

5 Experiment and Analysis

5.1 Dataset Used

EILAT is a collection of corals consisting of 973 RGB picture patches of 64 × 64 pixels. The patches were extracted from larger images taken at coral reefs in the vicinity of Eilat in the Red Sea. The dataset is divided into eight categories with an uneven distribution. The RSMAS dataset is a compact one, with just 766 RGB picture patches (256 × 256 pixels) of coral. The patches come from the Rosenstiel School of Marine and Atmospheric Sciences of the University of Miami and are derived from larger pictures taken with various cameras at various locations; the dataset is split into 14 unequal groups. The EILAT2 dataset, with 128 × 128 pixel patches, is a subset of the whole EILAT dataset and contains 304 picture patches of five distinct kinds.

Table 3. Dataset description

Dataset                   No. of images            No. of classes   Type
EILAT + RSMAS + EILAT2    973 + 322 + 103 = 1398   13               Deeper (Set 1)
RSMAS + EILAT + EILAT2    766 + 303 + 50 = 1119    26               Wider (Set 2)
EILAT2 + RSMAS + EILAT    304 + 677 + 798 = 1779   27               Deeper & wider (Set 3)

The datasets in Table 3 are obtained by selecting a particular number of images from each source dataset to form new datasets in the deeper and wider categories, as well as one dataset of both types. A deeper dataset is one with a large number of images, and a wider dataset is one with a large number of classes. We proceed with our experiments on these datasets using our modified CNN architectures. The performance analysis and other metrics are explained below.


5.2 Performance Analysis

Through our experiments, we were able to enhance the precision and efficiency of underwater picture categorization. On the one hand, there are numerous groups of underwater pictures whose examples differ drastically from one another in geometry, color, texture, size, orientation, brightness, depth of field, and so on; on the other hand, there are overlapping groups that look almost identical when seen from certain perspectives and distances. The questions of which patch size is best and how much to augment the patches remain open. Further difficulties such as motion blur, color attenuation, refracted sunlight patterns, water temperature volatility, sky color variation, and scattering effects must be mitigated to preserve picture quality and trustworthy information content. These concerns are highlighted, and a solution is given in the form of an ensemble technique. Finding patterns in images and then exploiting those patterns for categorization is a particular strength of deep learning algorithms, and in particular of convolutional neural networks (CNNs). In typical CNN fashion, the algorithm is applied to a patch of an image and the patch is then assigned a class weight; the resulting picture tiles are sent to ResNet-50. We train only the fully connected layers, since we use transfer learning to initialize the considered CNNs with weights pre-trained on ImageNet and freeze all layers other than the newly added fully connected ones. ResNet overfits when trained on insufficient data, so to prevent this we merge the EILAT and RSMAS datasets, paying special attention to class imbalance as we do so, and divide the total number of photos into three parts for our multiple training. Accuracy is used as a measure of our method's effectiveness. The exact same process is followed for DenseNet-121.

Overall Accuracy (OA). The Overall Accuracy (OA) of a collection of categorized data is calculated as the percentage of test pictures that were properly labeled:

OA = (Number of images classified correctly) / (Total images in dataset)    (4)

5.3 Modified CNN Architectures

5.3.1 ResNet-50

The input shape of the M1 model is originally 224 × 224 × 3, but we reduce it to 180 × 180 × 3 for a more manageable input. Grid search is used to determine the optimal values of the hyperparameters, and the results are analysed in terms of performance. The settings used for this analysis are: learning rate = 0.001, batch size = 64, and 50 epochs. We have compiled comprehensive tables detailing the results across all available datasets.
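As a rough illustration of this grid search (assuming the hypothetical build_model helper sketched in Sect. 3 and pre-split arrays x_train, y_train, x_val, y_val, none of which are given in the paper), the fully connected configurations of Tables 4 and 5 could be swept as follows:

```python
# Sweep the FC-layer configurations reported in Tables 4 and 5 (illustrative subset).
fc_grid = [(32,), (64,), (128,), (256,), (512,), (1024,),
           (64, 32), (128, 64), (256, 64), (512, 64), (1024, 64),
           (128, 64, 32), (256, 128, 64), (512, 256, 128), (1024, 512, 256)]

results = {}
for fc in fc_grid:
    model = build_model("resnet50", fc, num_classes=27)
    model.fit(x_train, y_train, batch_size=64, epochs=50, verbose=0)
    _, acc = model.evaluate(x_val, y_val, verbose=0)
    results[fc] = acc          # accuracy per configuration, as tabulated below
```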


Table 4. Accuracy (%) for the wider dataset (Set 2) at 50 epochs, for increasing numbers of neurons

One Fully Connected   Two Fully Connected      Three Fully Connected
32 × 27 (88.45)       (64,32) × 27 (91.23)     (128,64,32) × 27 (95.60)
64 × 27 (88.93)       (128,64) × 27 (91.72)    (256,128,64) × 27 (96.34)
128 × 27 (88.98)      (256,64) × 27 (91.26)    (512,256,128) × 27 (96.78)
256 × 27 (89.11)      (512,64) × 27 (92.49)    (1024,512,256) × 27 (96.76)
512 × 27 (89.39)      (1024,64) × 27 (93.72)
1024 × 27 (89.99)

Table 5. Accuracy (%) for the deeper dataset (Set 1) at 50 epochs, for increasing numbers of neurons

One Fully Connected   Two Fully Connected      Three Fully Connected
32 × 26 (87.88)       (64,32) × 26 (91.78)     (128,64,32) × 26 (94.89)
64 × 26 (87.72)       (128,64) × 26 (91.99)    (256,128,64) × 26 (94.89)
128 × 26 (87.71)      (256,64) × 26 (92.37)    (512,256,128) × 26 (95.00)
256 × 26 (87.98)      (512,64) × 26 (93.12)    (1024,512,256) × 26 (95.73)
512 × 26 (87.23)      (1024,64) × 26 (93.98)
1024 × 26 (89.00)

M1 Model. The data in Tables 4 and 5 demonstrate how adding fully connected layers to ResNet-50 improves performance. The results show that using three fully connected layers provides the maximum accuracy compared to using one or two. Accuracies of 96.78% and 95.73% were achieved using a network with 1024 neurons in the first layer, 512 neurons in the second, and 256 in the third fully connected layer. The ensemble takes this network topology as one of its inputs.

5.3.2 DenseNet-121

M2 Model. In the M2 model the default input size is 224 × 224, but for convenience we resize the input dimensions to 180 × 180 × 3. The same parameters are used, and the results are tabulated below. From Tables 6 and 7 it can be seen that two fully connected layers work better for DenseNet-121 than one. The results obtained for every configuration are tabulated; the configuration with 512 neurons in the first layer and 64 in the second achieved 96.89%, and we fix this in our DenseNet-121 architecture. This becomes the other input of the average ensemble architecture.


Table 6. Accuracy (%) for the deeper dataset (Set 1) at 50 epochs, for increasing numbers of neurons

One Fully Connected   Two Fully Connected
32 × 13 (85.66)       (64,32) × 13 (94.76)
64 × 13 (85.72)       (128,64) × 13 (94.72)
128 × 13 (85.45)      (256,64) × 13 (95.20)
256 × 13 (86.92)      (512,64) × 13 (96.89)
512 × 13 (87.98)      (1024,64) × 13 (96.12)
1024 × 13 (89.98)

Table 7. Accuracy (%) for the Set 3 dataset at 50 epochs, for increasing numbers of neurons

One Fully Connected   Two Fully Connected
32 × 27 (89.98)       (64,32) × 27 (96.88)
64 × 27 (90.34)       (128,64) × 27 (96.72)
128 × 27 (91.34)      (256,64) × 27 (97.46)
256 × 27 (91.93)      (512,64) × 27 (96.99)
512 × 27 (93.14)      (1024,64) × 27 (97.88)
1024 × 27 (93.34)

Table 8. Comparison of the proposed method with existing results (accuracy, %)

Method            Deeper   Wider   Combined   Both
Proposed Method   95.10    96.23   97.20      96.80
DenseNet          96.89    94.87   96.89      97.88
ResNet            92.55    96.78   96.78      95.73

The effectiveness of the suggested strategy compared with the existing methodology is shown in Table 8. Our proposed method reaches a maximum accuracy of 97.2%, whereas the existing methods are quite time-consuming. The suggested method is therefore applicable to a wide variety of underwater images for precise classification.

6 Conclusion

In this research we show that deep learning methods can be applied to the task of classifying images of coral reefs. The goal of this research is to standardize the process used to map the distribution of coral reef species and substrates. In this article, we


provide a comprehensive overview of the two most popular classes of existing deep learning-based classification models and compare their features. Using point-by-point ground-truth annotations, we first developed a fine-tuned CNN model and compared it to state-of-the-art CNN-based architectures for image classification. Our best-performing model for this task attained an accuracy of 97.2%; the underlying DenseNet and ResNet models attained accuracies of 96.7% and 96.8%, respectively. The gathered data shows that the ensemble-tuned CNN model outperforms the state-of-the-art systems presently in use. In the not-too-distant future it will be possible to create semantic maps of the whole ecology of coral reefs. Once all of the images from a particular location have been examined and registered, the next step is to construct a two-dimensional semantic map of the coral reef areas from the registration results, after which the mosaicked image can be reconstructed using superpixels. The suggested CNN architecture allows each superpixel to be subdivided into smaller patches, which are then categorized by the system, and the presented CNN algorithms can likewise be used to classify face meshes. All of the models discussed in this article are built on top of DenseNet and ResNet, which act as the basis for further improvements and alterations, and the study may be extended in the future so that it works with other network topologies. Last but not least, since stereo cameras were used for image collection, there is scope for investigating the inclusion of disparity information as an additional channel of the input picture; the use of several perspectives to improve classification is a potential further benefit of developing such deep learning systems.

References

1. Elawady, M.: Sparse coral classification using deep convolutional neural networks. arXiv preprint arXiv:1511.09067 (2015)
2. Ferrario, F., Beck, M.W., Storlazzi, C.D., Micheli, F., Shepard, C.C., Airoldi, L.: The effectiveness of coral reefs for coastal hazard risk reduction and adaptation. Nat. Commun. 5, 3794 (2014). https://doi.org/10.1038/ncomms4794
3. Gomez-Villa, A., Salazar, A., Vargas, F.: Towards automatic wild animal monitoring: identification of animal species in camera-trap images using very deep convolutional neural networks. Ecol. Inform. 41, 24–32 (2017)
4. Mahmood, A., et al.: Automatic annotation of coral reefs using deep learning. In: OCEANS 2016 MTS/IEEE Monterey, pp. 1–5 (2016)
5. Mahmood, A., et al.: Coral reef classification with hybrid feature representations. In: 2016 IEEE International Conference on Image Processing (ICIP), pp. 519–523. IEEE (2016). https://doi.org/10.1109/ICIP.2016.7532411
6. Beijbom, O., Edmunds, P.J., Kline, D.I., Mitchell, B.G., Kriegman, D.: Automated annotation of coral reef survey images. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1170–1177 (2012)
7. Shihavuddin, A.S.M., Gracias, N., Garcia, R., Gleason, A.C., Gintert, B.: Image-based coral reef classification and thematic mapping. Remote Sensing 5(4), 1809–1841 (2013)
8. Stough, J.: Texture and color distribution-based classification for live coral detection. In: Proceedings of the 12th International Coral Reef Symposium, pp. 9–13 (2012)
9. Mary, N.A.B., Dharma, D.: Coral reef image classification employing improved LDP for feature extraction. J. Vis. Commun. Image Represent. 49, 225–242 (2017). https://doi.org/10.1016/j.jvcir.2017.09.008
10. Mary, N.A.B., Dharma, D.: Classification of coral reef submarine images and videos using a novel Z with tilted Z local binary pattern (Z⊕TZLBP). Wireless Pers. Commun. 98(3), 2427–2459 (2018). https://doi.org/10.1007/s11277-017-4981-x
11. Mary, N.A.B., Dharma, D.: Coral reef image/video classification employing novel octa-angled pattern for triangular sub region and pulse coupled convolutional neural network (PCCNN). Multimedia Tools Appl. 77(24), 31545–31579 (2018). https://doi.org/10.1007/s11042-018-6148-5
12. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: IEEE CVPR (2016)
13. Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: IEEE CVPR, pp. 4700–4708 (2017)
14. Shabbeer Basha, S.H., Dubey, S.R., Pulabaigari, V., Mukherjee, S.: Impact of fully connected layers on performance of convolutional neural networks for image classification. Neurocomputing 378, 112–119 (2020). https://doi.org/10.1016/j.neucom.2019.10.008
15. Ganaie, M.A., Hu, M., Malik, A.K., Tanveer, M., Suganthan, P.N.: Ensemble deep learning: a review. Eng. Appl. Artif. Intell. 115, 105151 (2022). https://doi.org/10.1016/j.engappai.2022.105151

Analysis of Optimum 3-Dimensional Array and Fast Data Movement for Efficient Memory Computation in Convolutional Neural Network Models

Deepika Selvaraj1, Arunachalam Venkatesan1(B), and David Novo2

1 Department of Micro and Nanoelectronics, Vellore Institute of Technology, Vellore, India
[email protected], [email protected]
2 French National Centre for Scientific Research (CNRS), University of Montpellier, LIRMM, Montpellier, France
[email protected]

Abstract. The performance and efficiency of a CNN-based inference engine always depend on the computational and dataflow-control complexity. Instead of considering a 2-dimensional (2D) feature array for processing, a 3D array of features/weights improves the dataflow movement and memory computation. The optimum 8 × 8 × 32 3D feature array size was chosen based on the on-chip memory requirement, data reuse, and PE utilization. Using the optimum 8 × 8 × 32 feature array, seven different combinations of data-flow scheduling strategies were analyzed by varying row, column, and depth-wise parameters on the workload model in a MATLAB environment. From the analysis, strategy-V (depth-wise parallel and row/column-wise sequential) is found to be the best with a 4 × 8 processor array. Compared to the state-of-the-art processor strategy, strategy-V achieves a 3.3 times higher data transfer rate (off-chip to on-chip) and a 16 times lower on-chip memory requirement, with a small overhead in processor cost.

Keywords: Convolutional Neural Network · Data flow Movement · Efficient Memory · Feature Array · Processing Element · Scheduling

1 Introduction

Automation in the agricultural, automobile, medical, and industrial fields universally uses sensing devices such as sensors and digital cameras. These devices feed sequences of data (image/video data) into inference engines, which therefore have to adopt a convolutional neural network (CNN) algorithm to automate and process the huge amount of data in a limited time [1, 2]. Computer vision has adopted standard deep learning CNN models such as AlexNet, VGGNet, ResNet, and GoogleNet for image classification and detection [3]. Since 2010, inference-engine (IE) hardware has been used prominently for CNN models. ASIC-based CNN inference engines are challenging to implement for a specific


image application. Such implementations concentrate on the major factors that improve a CNN accelerator, such as reduced arithmetic precision, energy efficiency, silicon area, and performance [4]. A generic CNN structure consists of a stack of convolutional layers with activation, pooling layers, and fully connected layers. Input feature maps (IFMs)/output feature maps (OFMs) from the previous layer and weights/filters (FL) are the inputs of the convolutional and fully connected layers, which are computed using multiply-accumulate (MAC) units followed by the Rectified Linear Unit (ReLU) activation function. Weights/filters (FL) are extracted from pre-trained CNN models. Convolutional and fully connected layers are repetitive and used n times depending on the CNN model. Finally, the vector-based fully connected layer classifies the image based on the probability of the output values. The general structure of the CNN convolutional layer is shown in Fig. 1.

Fig. 1. Generic structure of the convolutional layer in the CNN model.

In Fig. 1, the input feature map (IFM_h × IFM_w × IFM_d) is convolved with each set of weights/filters (FL = FL_h × FL_w × FL_d × K) to produce the output feature map (OFM = OFM_h × OFM_w × OFM_d). The depth of each filter equals the IFM depth (IFM_d = FL_d = N), and the number of filters equals the output feature map depth (K = OFM_d). For example, an IFM of 55 × 55 × 64 convolved with FL = 3 × 3 × 64 × 512 at a stride (S) of one gives an OFM of 53 × 53 × 512 (a short sketch of this dimension arithmetic follows the list below). The convolutional layer therefore needs many MAC computations to process the IFMs and weights, and it is challenging to organize the data flow to the processing-element (MAC unit) array during the inference stage. Each convolutional layer requires on the order of gigabytes of MAC computations, which can impact the performance and memory storage of the CNN accelerator when the data are not stored in an orderly fashion for further processing [5]; this in turn affects the final accuracy of CNN models deployed on resource-limited and IoT-based devices. From the above considerations, improvement can be accomplished by fine-tuning the following factors: 1. an optimum data format to represent the IFMs and weights while maintaining the accuracy of the CNN models, 2. data-flow movement between memory and processing elements (PEs) to improve data reuse, 3. pipelining and parallel scheduling for near-100% PE utilization, and 4. designing a sparse accelerator that skips trivial MAC computations.
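As a minimal sketch of the dimension arithmetic above (our illustration, with hypothetical helper names), the following reproduces the worked example and counts the MAC operations of one layer:

```python
def conv_layer_shape(ifm_h, ifm_w, ifm_d, fl_h, fl_w, k, stride=1, pad=0):
    """Output feature map size and MAC count for one convolutional layer."""
    ofm_h = (ifm_h + 2 * pad - fl_h) // stride + 1
    ofm_w = (ifm_w + 2 * pad - fl_w) // stride + 1
    ofm_d = k                                    # one output channel per filter set
    macs = ofm_h * ofm_w * ofm_d * fl_h * fl_w * ifm_d
    return (ofm_h, ofm_w, ofm_d), macs

# Example from the text: 55x55x64 IFM, 3x3x64x512 filters, stride 1 -> 53x53x512 OFM
shape, macs = conv_layer_shape(55, 55, 64, 3, 3, 512)
print(shape, f"{macs / 1e9:.2f} GMAC")           # (53, 53, 512), ~0.83 GMAC
```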


To address the above-mentioned concerns, several hardware accelerators have been proposed in previous works, as follows. A short floating-point representation (4-bit exponent and 4-bit mantissa) was proposed in [11], which reduces the computation complexity of the processor and the memory storage; however, it achieves only 79% accuracy for the AlexNet model and 88% for the VGG-16 model. A 16-bit Fix/Float scheme [12] represents the IFMs as 16-bit fixed point values and the weights as half-precision floating point; it achieves an accuracy near 97%, with a hardware performance and energy efficiency of 750 MOP/s and 24 TOPS/W at 250 MHz. The Eyeriss processor [6] uses an efficient dataflow called row-stationary, which exploits data reuse and a high level of parallelism. The Eyeriss accelerator uses 168 PEs and supports kernel sizes from 3 × 3 to 11 × 11; it achieves 0.0029 DRAM accesses per MAC and 35 frames/s for the convolutional layers of AlexNet at 250 MHz with a power consumption of 278 mW. The KOP3 processor [13] uses an n-tile parallel structure to speed up the convolutional and fully connected layers; it adopts a 3 × 3 kernel size and a circular-buffer strategy to reduce power consumption and memory access in the CNN accelerator, and the hardware implementation achieves an average efficiency of 3.77 TOPS/W. The above methods reduce off-chip or on-chip memory access and address data flow and data reuse [7–10]. Our contributions are framed around the challenges of reducing the data-flow computation complexity between on-chip memory and the PEs and of achieving maximum data reuse of IFMs and FLs. The key contributions of this article are as follows:

• We analyze and choose the optimum 3-dimensional (3D) feature array (Fa) of 8 × 8 × 32 considering four key factors: data reuse, the number of Fa required per CNN layer, hardware utilization, and memory requirements. The 3D feature array improves the efficiency of the computation between on-chip memory and the processor for CNN inference.
• Combinations of data-flow scheduling strategies (I to VII) are examined using the optimum 3D feature array (8 × 8 × 32) with a 4 × 8 PE processor. Strategy-V (depth-wise parallel and row/column-wise sequential) is adopted to maximize data reuse of both IFMs and FLs.
• Scheduling strategy-V is analyzed for the worst-case convolutional layer of the standard CNN models (input workload IFM: 55 × 55 × 512 and FL: 3 × 3 × 512 × 512). It balances both data transfer rate and processor cost.
• The proposed strategy-V is validated against standard and existing data-flow models [6, 14, 15] adopted from previous implementations. All the analyses were done using a MATLAB-based design-modeling algorithm.

Section 2 discusses the analysis of the optimum 3-dimensional feature array (Fa) of size 8 × 8 × 32, examines the different data-flow movement strategies I to VII in a MATLAB-based environment, and describes the mathematical model. MATLAB-based simulation results for the CNN-based inference engine are illustrated and discussed in terms of processing elements, data reuse, and memory requirement in Sect. 3. Finally, Sect. 4 concludes the findings.


2 Analysis of 3-Dimensional Feature Array Size and Data-Flow Movement Scheduling Strategy

To improve the efficiency of the processor array and the memory requirement, the optimum 3D feature array for computation is chosen from various combinations of feature array sizes. Based on the optimum 3D feature array size, different data-flow movement and scheduling strategies I to VII have been modeled (parallel data flow, sequential data flow, and combinations of parallel and sequential data flow). The appropriate 3D feature array (Fa) and suitable scheduling strategy are chosen based on the following prime factors:

(1) maximize the data reuse capability of both IFMs and weights,
(2) maximize the processing element utilization,
(3) reduce the intermediate memory storage, and
(4) minimize the number of data transactions.

Fig. 2. MATLAB analysis for selection of 3- dimensional feature array (Fa ) and data-flow movement strategies for the convolutional layer

For the Fa and scheduling-strategy analysis, standard CNN models such as VGG-16 & 19, SqueezeNet, and ResNet-18 & 50 are adopted. We propose a MATLAB-based model for choosing the suitable 3D feature array (Fa) and data-flow scheduling strategy between on-chip memory and the PE processor array. Three-channel images of 226 × 226 pixels


for VGG-16 & 19, SqueezeNet, and ResNet-18 & 50 are chosen as inputs. For the analysis, the worst-case convolutional layer, with the maximum number of filter sets, filter depth, and IFM depth, is taken from the above-mentioned standard CNN models, and the corresponding set of filters/weights is extracted from the pre-trained CNN models. The worst-case input load of size 55 × 55 × 512 for the IFM and 3 × 3 × 512 × 512 for the FL is considered for the selection of the 3D feature array. The optimum 3D feature array of size Fa: 8 × 8 × 32 was selected based on the above four major factors, which mainly relate to the on-chip memory inside the processor. The optimum feature array size is required to transfer groups of IFMs and weights from off-chip to on-chip memory and from on-chip memory to the PE array. Using the 3D Fa: 8 × 8 × 32, the different data scheduling strategies I to VII are analyzed on the worst-case input load. From this analysis, strategy-V (depth-wise 3 times in parallel and row/column-wise 4 times in sequence) gives a promising outcome compared to the other strategy combinations (I to IV and VI to VII). The proposed MATLAB-based analysis for the selection of a 3D feature array and data-flow scheduling strategy for an efficient CNN-based inference engine is depicted in Fig. 2. The optimum 3D feature array (Fa) selection and the data-flow scheduling strategy V are explained in the following subsections.

2.1 Choosing the Optimum 3D Feature Array (Fa)

Based on the factors of an efficient CNN accelerator, we present the selection of feature arrays for the on-chip memory techniques that optimize the off-chip memory bandwidth. Two standard feature array computation strategies, (1) IFM-major computation and (2) weight/filter-major computation, which exploit data reuse of either the weights or the IFMs for the convolutional layers, are described in this section. In the IFM-major computation approach [16], each IFM window of size FLw × FLh × FLd performs element-wise multiplication with each of the K weights/filters, producing an OFM (1 × 1) of length K. This operation is replicated for each OFM position, with the IFM window shifted by the stride (S = 1). In this approach, the weight values are reused for every iteration of the output (OFMh, OFMw), as shown in Fig. 3(a). In the filter-major computation approach [16], one set of filters/weights is convolved across the entire IFM frames (IFMw × IFMh × IFMd), producing a single OFM frame of size OFMw × OFMh; this operation is repeated for each of the K sets of filters. In this approach, the IFM values are reused multiple times for each set of filters/weights, as shown in Fig. 3(b). Since the IFMs and weights are processed frame by frame, larger memory is needed to store the IFMs and weights, as well as an intermediate buffer to store the partial OFMs. Considering these approaches, the proposed optimum 3D-structure computation gives a more advantageous data-reuse approach for both the IFMs and the weights, as shown in Fig. 3(c). The 3D structure depends on the depth p of the IFMs and weights. The filters/weights are sized so that the working set of filters (FLa = FLh × FLw × p) fits in the on-chip memory (288 KB). In the same way, the IFM is sized so that only the 3D feature array (Fa) portion of it, namely Fa = Fah × Faw × p, needs to be held. For example, Fah × Faw × p = 8 × 8 × 32 with 16-bit fixed point requires 4 KB. This ensures that each input feature map is only read from off-chip once. We can process all


frames in the Fa with all on-chip resident filters while loading the next Fa and its corresponding filters/weights in parallel, so that the next set of IFM data and filters is on-chip by the time processing finishes. This method improves the reuse of the IFM values (K times) and of the weight values (OFaw times), reducing off-chip bandwidth usage.

Fig. 3. Different combinations of IFM & FL reuse computation and its computation order.

For both the IFM and the weight reuse approaches, 3D feature arrays (Fa) of different sizes are formed based on the convolutional layer input load. The size of Fa = Fah × Faw × p varies from 6 × 6 × 8 to 20 × 20 × 32, with Fah and Faw varying in steps of 2 and p varying in multiples of 2; p denotes the depth of the 3D feature array (Fa), and Fah and Faw denote its height and width. The worst-case input load (IFM: 55 × 55 × 512 and FL: 3 × 3 × 512 × 512) has been processed with each combination of Fa. Each 3D Fa combination has undergone a


trade-off analysis in terms of data reuse (of both IFMs and weights), memory requirement, and the number of Fa required per input load, using a MATLAB-based analysis as plotted in Fig. 4. The MATLAB-based analysis helps to understand the impact of each 3D feature array (Fa) size. From this analysis, 8 × 8 × 32 balances the required key factors optimally; 10 × 10 × 8 is the next-best feature array, but its data reuse and number of 3D-Fa are worse than for the chosen feature array of 8 × 8 × 32. Equations (1) to (3) define the number of times Fa is processed depth-wise (x), row-wise (y), and column-wise (z) for the chosen convolutional layer in the CNN model. N refers to the depth of the IFM (IFMd), p represents the depth of the 3D feature array, and Pd represents padding (0 or 1). For example, for an input load of IFM = 16 × 16 × 64 and FL = 3 × 3 × 9 × 64, Fa is required 2 times for the depth-wise (x) computation and 3 times each for the row-wise (y) and column-wise (z) computations; in total, Fa is required 3 × 3 × 2 times for the 16 × 16 × 64 input, and each Fa output is of size OFa = 6 × 6 × 9. In this approach, the optimum 3D feature array improves the data reuse of both IFMs and weights, minimizes the off-chip bandwidth accesses, and maximizes the hardware utilization of the PE array.

x = ⌈N / p⌉, where p = 32    (1)
y = ⌈IFMh / (FaH − 2)⌉ + Pd, row-wise, Pd = 0 or 1    (2)
z = ⌈IFMw / (FaW − 2)⌉ + Pd, column-wise, Pd = 0 or 1    (3)

Furthermore, the 8 × 8 × 32 feature array size is used for the different combinations of data-flow scheduling strategies to reduce intermediate memory utilization and enable efficient PE computation, as explained in the next section.
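A minimal sketch of Eqs. (1)-(3) (our illustration; it assumes the brackets are ceilings and Pd = 0, which reproduces the worked examples in the text):

```python
import math

def tiling_counts(ifm_h, ifm_w, ifm_d, fa_h=8, fa_w=8, p=32):
    """Eqs. (1)-(3): how many 3D feature arrays (Fa) cover one convolutional layer."""
    x = math.ceil(ifm_d / p)             # depth-wise tiles
    y = math.ceil(ifm_h / (fa_h - 2))    # row-wise tiles (3x3 filter needs a 2-pixel halo)
    z = math.ceil(ifm_w / (fa_w - 2))    # column-wise tiles
    return x, y, z

# Worked example from the text: 16x16x64 IFM -> x = 2, y = z = 3
print(tiling_counts(16, 16, 64))         # (2, 3, 3)

# Worst-case layer (55x55x512): x = 16, y = z = 10, so under strategy V the IFM
# is moved x*y*z = 1600 times and each filter set x = 16 times (see Sect. 2.2).
x, y, z = tiling_counts(55, 55, 512)
print(x, y, z, x * y * z)                # 16 10 10 1600
```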

Fig. 4. Optimum feature array, Fa plot for the worst-case input load from the CNN model.


2.2 Data-Flow Scheduling Using the 8 × 8 × 32 3D Feature Array

Once the optimum 3D feature array (Fa) is chosen, the hardware CNN-based inference engine can be improved further by efficient data movement and scheduling strategies between on-chip memory and the processing element array and between on-chip and off-chip memory. Therefore, different combinations of data movement and scheduling have been examined based on the parallel/sequential order in which the PE/MAC computations are processed for the chosen convolutional layer. Strategies I to VII have been validated using the x, y, and z values from the optimum 3D feature array formation. The outcome of each strategy is assessed through the improvement of the following key factors: (1) processor hardware cost, (2) data transfer rate, and (3) the number of data transactions between on-chip memory and the processor array. A MATLAB-based mathematical analysis of the processor array cost and data transfer rate was carried out based on these factors. Equation (4) relates the on-chip storage and processor array cost, in terms of unit cost, to the 3D feature array. In the 3D-Fa approach, the number of data transfers for the IFM and the filters is estimated as (x·y·z) and x, respectively. M_on-chip-to-PE refers to the size of the memory access required on-chip, Dt_per-layer refers to the number of times data is transferred from off-chip to on-chip memory/buffer, OFa = (Fa − 2) × (Fa − 2) refers to the output feature map and intermediate registers, PE_area refers to the number of PEs present in the processor, and FLa = 3 × 3 × p refers to the size of the weights/kernels required for processing the Fa.

M_on-chip-to-PE = (Fa + OFa + FLa) memory access + PE_area    (4)

Fig. 5. Strategy-V (depth-wise ‘x’ parallel and row/column-wise ‘y, z’ sequence): data-flow movement and scheduling method with k sets of filters using Fa.

Considering the different strategies, strategy V achieves a better data transfer rate and on-chip memory usage than strategies I to IV and VI to VII, as shown in Table 1. The analysis of the different combinations has been carried out in a MATLAB environment, taking different CNN parameters into consideration. Mon−chiptoPE and Dt perlayer are calculated in terms of unit cost and unit data-transfer times, so the evaluation is common to all data formats and word lengths.


Fig. 6. Visual depiction of the frame-wise computation based on the 3D- Fa with strategy V approach.

Fig. 7. Overall CNN inference engine for strategy V with 4 × 8 processor array.

The strategies are adapted to different feature array sizes in standard CNN models such as AlexNet, VGG-16 and VGG-19, ResNet, SqueezeNet, and GoogLeNet. Scheduling strategy V is illustrated in Fig. 5: the 3D-Fa is convolved depth-wise in parallel with a k-set of filters based on x, and Fa then proceeds row-wise and column-wise according to the y and z values. For example, an input load (IFM: 55 × 55 × 512 and FL: 3 × 3 × 512) is processed depth-wise in parallel with x = 16 and row-/column-wise in sequence with y = z = 10, which means the IFM and the weights are transferred 1600 times and 16 times, respectively. These 3D feature arrays are processed with a 4 × 8 array of processing elements; the processor contains 32 PEs, each consisting of a 3 × 3 MAC unit and adopting a 16-bit fixed-point format. Under these conditions, the optimum 8 × 8 × 32 3D feature array, together with the set of filters, calculates the OFM using the PE array, performing the MAC computation over the depth (p) of Fa as listed in Algorithm I.
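A quick arithmetic check of these counts (an illustrative calculation using the ceiling-division reading of Eqs. (1)–(3), not code from the paper):

import math

# Critical layer from the text: IFM 55 x 55 x 512 with an 8 x 8 x 32 Fa
x = math.ceil(512 / 32)        # 16 depth-wise passes
y = math.ceil(55 / (8 - 2))    # 10 row-wise passes
z = math.ceil(55 / (8 - 2))    # 10 column-wise passes
print(x * y * z, x)            # 1600 IFM tile transfers, 16 weight transfers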


Algorithm I for subset-1 using the optimum 8 × 8 × 32 3D-Fa structure

(a) Calculate 4 × 8 processor output
    for X in 1 to x do
        for (L, K) in (0, 0) to (FaW − 2, FaH − 2) do
            p = 32, FLw = FLh = 3
            PEp−31 = Frame1(LX+1 : FLh+LX, KX+1 : FLw+KX) * FLX
            PEp−30 = Frame2(LX+1 : FLh+LX, KX+1 : FLw+KX) * FLX
            PEp−29 = Frame3(LX+1 : FLh+LX, KX+1 : FLw+KX) * FLX
            ...
            PEp    = Frame32(LX+1 : FLh+LX, KX+1 : FLw+KX) * FLX
            In = add/sub(PEp−31, PEp−30, PEp−29, …, PEp)
        end
    end
    return In

(b) Calculate OFM
    for (Y, Z) in (1, 1) to (y, z) do
        for X in 1 to x do
            calculate the 4 × 8 processor output In
            OFM += In
        end
    end
    return OFM

With each PE processing a 2-D frame, multiple PEs can be aggregated to complete the 3D input, as shown in Fig. 6. The 3D feature array of size 8 × 8 × 32 has p frames. Each frame is processed with its corresponding set of weights using a stride of 1 and produces partial output feature maps. The partial sums from each frame are then accumulated vertically across the PEs, and the partial OFa of all frames are added together to obtain the OFM. The next set of 3D-Fa tiles is processed in the same way. For example, PE1,1 convolves frame 1 of the 8 × 8 3D-Fa with the 3 × 3 weights (FL1,1 to FLk,1), PE2,1 convolves frame 2 with its corresponding 3 × 3 weights, and so on up to frame 32. All frames in the 3D-Fa are processed in parallel by the 4 × 8 processing-element array, attaining reuse of both weights and IFMs. Initially, pre-trained IFMs and weights are stored in off-chip memory (DRAM). Based on the 4 × 8 PE array size, on-chip memory (SRAM/buffer) sizes for the IFMs and weights are chosen as shown in Fig. 7. Finally, the OFM is stored off-chip and serves as the IFM for the next CNN layer.
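For readers who prefer an executable form, the following NumPy sketch mirrors the frame-wise accumulation described above for a single 8 × 8 × 32 tile and one 3 × 3 kernel set (the array names and the explicit loops are illustrative; a real PE array would perform the 32 frame products in parallel):

import numpy as np

def conv_tile_framewise(fa_tile, kernels):
    """Convolve one 8x8x32 Fa tile with one 3x3x32 kernel set, frame by frame.

    Each depth slice ("frame") is convolved with its own 3x3 kernel (the work
    one PE would do), and the 32 partial 6x6 outputs are summed across depth,
    as in Algorithm I (a).
    """
    h, w, p = fa_tile.shape                     # 8, 8, 32
    out = np.zeros((h - 2, w - 2))              # 6x6 partial OFM for this tile
    for d in range(p):                          # one iteration per PE/frame
        frame, k = fa_tile[:, :, d], kernels[:, :, d]
        for i in range(h - 2):
            for j in range(w - 2):
                out[i, j] += np.sum(frame[i:i+3, j:j+3] * k)
    return out

tile = np.random.rand(8, 8, 32)
weights = np.random.rand(3, 3, 32)
partial_ofm = conv_tile_framewise(tile, weights)   # accumulate over tiles -> OFM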

3 Simulation Results and Discussion

In this section, the optimum 3D feature array of size 8 × 8 × 32 and the data-flow scheduling approach of depth-wise (x) parallel and row/column-wise (y, z) sequential processing under strategy V are evaluated for efficient CNN hardware implementation.


The proposed 3D-Fa and data-flow scheduling strategy V have been adopted and tested with the critical layer (IFM: 55 × 55 × 512, filter/weight: 3 × 3 × 512) present in state-of-the-art CNN models such as VGG-16 and VGG-19, ResNet-18 and ResNet-50, and SqueezeNet. The results on the critical layer are evidence that scheduling strategy V is also suitable for the other layers present in the CNN models. The results further demonstrate that our data-flow scheduling strategy V with the optimum 3D feature array can efficiently reduce the off-chip to on-chip data movement and processor cost while achieving higher utilization of the PEs.

3.1 Evaluation of 3-D Feature Array (Fa) for the CNN Model

For the chosen 3D feature array (8 × 8 × 32), the major key factors are followed as explained in Sect. 2. The software-based model of different pre-trained CNN models has been implemented in the MATLAB environment as depicted in Fig. 2. The software environment has the flexibility to adapt different data formats and word lengths to maintain the accuracy of the model before implementation in the hardware module. In this subsection, the optimum feature array size of 8 × 8 × 32 is selected based on the memory requirement per Fa, the number of Fa instances required per layer, and data reuse. By varying 11 different combinations of 3-dimensional feature arrays, the optimum size has been chosen from the trade-off plot shown in Fig. 4. Considering all the factors, 8 × 8 × 32 reuses the k set of filters/weights k × 6 times and reuses the IFM k times per Fa, and the storage required for IFMs and weights is 0.25 KB and 18 KB, respectively. Likewise, 6 × 6 × 32 reuses the k set of filters/weights k × 4 times with a small area overhead. From the plot in Fig. 4, 8 × 8 × 32 achieves the better trade-off among the three above-mentioned factors compared to the other 3D feature arrays. In the existing methods [6, 14, 16], IFM-major and filter/weight-major reuse computations are followed in the CNN models as shown in Fig. 2. IFM-major computation requires a larger memory of size 3 × 3 × k × N for the set of filters, whereas our method requires a memory of size 3 × 3 × p × k. Compared to [16], our optimum method requires 16 times less on-chip memory at the cost of a 2 times higher data transfer rate. Similarly, filter-major computation keeps all the IFM values on-chip, which is expensive compared to our optimum 3D feature array. Likewise, [14] adopts a depth-wise data-flow strategy in which the IFM tile size is the same as the filter/weight size, so weight reuse is not adopted, although the depth-wise strategy does optimize memory with maximum utilization of the PEs. References [6, 15] use the row-stationary (RS) and weight-stationary (WS) data-flow strategies to reduce expensive data movement by reusing data maximally; these strategies are suitable for efficient DRAM access. Considering all the methods, our optimum 3D feature array provides the best trade-off among the above-mentioned prime factors, which are the major parameters in efficient CNN hardware models.
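Such a candidate sweep can be sketched as follows (only three of the 11 candidates discussed in the text are listed, and the simple cost terms are illustrative rather than the paper's exact MATLAB cost model):

import math

candidates = [(6, 6, 32), (8, 8, 32), (10, 10, 8)]   # three of the 11 sizes mentioned
ifm_h, ifm_w, ifm_d = 55, 55, 512                    # worst-case input load from the text

for fh, fw, fp in candidates:
    x = math.ceil(ifm_d / fp)
    y = math.ceil(ifm_h / (fh - 2))
    z = math.ceil(ifm_w / (fw - 2))
    n_fa = x * y * z                          # number of Fa loads per layer
    mem_per_fa = fh * fw * fp + 3 * 3 * fp    # IFM tile plus one kernel set
    print((fh, fw, fp), n_fa, mem_per_fa)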


3.2 Analysis and Evaluation of Strategies for the CNN Layer

To improve the data-flow scheduling between the memories, different combinations of strategies are framed based on the data transfer rate (off-chip to on-chip memory) and the processor cost (on-chip memory and processor area). From the analysis of strategies I to VII, strategy V is chosen by considering the trade-off between the key factors, as discussed in Sect. 2.2. The specification of each supported data-flow scheduling strategy is listed in Table 1. In general, data-flow scheduling for CNN models proceeds stage-wise by row/column and depth, in a parallel or sequential manner, from off-chip memory to on-chip memory. The IFM and weights of each CNN layer bring a deluge of data, so different combinations of stage-wise data scheduling are arranged in parallel or sequential mode using the 3D feature array (Fa), as illustrated in Table 1. The processor cost (Mon−chiptoPE) and the data transfer rate (Dt perlayer) are evaluated for each scheduling combination. Strategy V gives the better trade-off between these two parameters, as shown in Fig. 8. Since the analysis is done in a software-based implementation, Mon−chiptoPE and Dt perlayer are expressed in unit cost and unit times.

Fig. 8. Trade-off plot for cost and data transfer rate in the different scheduling strategies (I to VII) using the (8 × 8 × 32) 3D- Fa .

In Table 1, strategy I [14] adopts row-wise, then depth-wise, and then column-wise processing in a sequential manner. Strategies I and II give a lower processor memory, but their data transfer rate is higher (3.3 times) than strategy V because of the off-chip to on-chip data-flow scheduling. Similarly, strategy III accepts both parallel and sequential transfer and achieves the same data transfer rate as strategy V, but at a moderately higher processor cost. Strategy VI [6, 15] uses weight-stationary (kernel reuse) and row-stationary (input reuse) data flow, which utilizes more processor cost (1.05 times) than strategy V.


However, its data transfer rate (1.44 times higher) is in the same moderate range as strategy V. Compared with all the combinations of data-flow scheduling, strategy V with the optimum 3D feature array achieves the better trade-off between data transfer and processor cost.

Table 1. Different combinations of scheduling strategies with a 3D feature array of 8 × 8 × 32.

Strategy | Scheduling method using the 8 × 8 × 32 3D feature array (1st / 2nd / 3rd) | Mon−chiptoPE (unit cost) | Dt perlayer (unit times)
I [14] | Row-wise, y (sequence) / Depth-wise, x (sequence) / Column-wise, z (sequence) | Low | High
II | Column-wise, z (sequence) / Depth-wise, x (sequence) / Row-wise, y (sequence) | Low | High
III | Depth-wise, x (sequence) / Row-wise, y (sequence) / Column-wise, z (parallel) | Moderate | Moderate
IV | Row-wise, y (parallel) / Column-wise, z (sequence) / Depth-wise, x (parallel) | High | Low
V | Depth-wise, x (parallel) / Row-wise, y (sequence) / Column-wise, z (sequence) | Low | Moderate
VI [6, 15] | Row & column-wise, y × z (sequence) / Depth-wise, x (parallel) / – | High | Moderate
VII | Row-wise, y (parallel) / Depth-wise, x (parallel) / Column-wise, z (parallel) | High | Low

Note: 1. A feature block of size 55 × 55 × 512 with a kernel size of 3 × 3 × 512 is considered for the analysis. 2. Keeping everything on-chip corresponds to 100% (high) memory with a 0% (low) transfer rate from off-chip to on-chip. 3. Taking this as the baseline, the memory and data transfer rate for strategies I to VII are categorized as: High = 70 to 100, Moderate = 30 to 70, and Low = 0 to 30.


4 Conclusion

In this paper, the optimization of data-flow approaches for CNN models has been discussed. Data-flow complexity depends on the size of the on-chip memory and its controller in the processing elements (PEs), and data reuse and data-flow strategies can influence the size of the on-chip memory. First, to improve data reuse on both IFMs and weights, a 3D feature array (Fa) of size 8 × 8 × 32 is proposed. Secondly, the different data-flow strategies (I to VII) using the proposed 3D feature array are analyzed with respect to their processor cost and data transfer rate (off-chip to on-chip) requirements. From the analysis of the data scheduling strategies, strategy V uses the optimum number of PEs and on-chip memory size. It works on the input features and weights of the CNN workload layer in a depth-wise parallel and row/column-wise sequential manner. The proposed strategy V utilizes the 3-dimensional feature array size while considering the prime factors, including the number of Fa instances required per CNN workload layer, data reuse, and PE utilization. The strategy-V data-flow model with the 8 × 8 × 32 array also provides a better trade-off between processor cost (in unit cost) and data transfer rate (in unit times). The MATLAB-based software analysis shows that strategy V can achieve a 3.3 times faster data transfer rate than the Eyeriss processor data-flow strategy with a small area overhead, and that its on-chip memory requirement is 16 times smaller than that of the MEM-OPT processor. In this paper, we present heuristic design principles that aim to optimize particular data-flow scenarios. The scope of future work is therefore to implement the optimum 3D feature array (Fa) and strategy-V-based data scheduling in a hardware inference engine for pre-trained CNN models.

Acknowledgment. The work was supported in part by the Council of Scientific and Industrial Research (CSIR), New Delhi, under CSIR-SRF Grant 09/844(0104)/2020-EMR.I.

References

1. Hatcher, W.G., Yu, W.: A survey of deep learning: platforms, applications and emerging research trends. IEEE Access 6, 24411–24432 (2018)
2. Moolchandani, D., Kumar, A., Sarangi, S.R.: Accelerating CNN inference on ASICs: a survey. J. Syst. Architect., preprint, Sept. (2020)
3. Chen, Y., Xie, Y., Song, L., Chen, F., Tang, T.: A survey of accelerator architectures for deep neural networks. Engineering 6(3), 264–274 (2020)
4. Sze, V., Chen, Y.H., Yang, T.J., Emer, J.S.: Efficient processing of deep neural networks: a tutorial and survey. Proc. IEEE 105(12), 2295–2329 (2017)
5. Peemen, M., Setio, A.A., Mesman, B., Corporaal, H.: Memory-centric accelerator design for convolutional neural networks. In: 2013 IEEE 31st International Conference on Computer Design (ICCD), pp. 13–19. IEEE (2013)
6. Chen, Y.H., et al.: Eyeriss: an energy-efficient reconfigurable accelerator for deep convolutional neural networks. IEEE J. Solid-State Circuits 52, 127–138 (2017)
7. Moons, B., Uytterhoeven, R., Dehaene, W., Verhelst, M.: 14.5 Envision: a 0.26-to-10 TOPS/W subword-parallel dynamic-voltage-accuracy-frequency-scalable convolutional neural network processor in 28 nm FDSOI. In: 2017 IEEE International Solid-State Circuits Conference (ISSCC), pp. 246–247. IEEE (2017)


8. Han, S., et al.: EIE: efficient inference engine on compressed deep neural network. In: Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture (ISCA), Seoul, South Korea, Jun., pp. 243–254 (2016)
9. Parashar, A., et al.: SCNN: an accelerator for compressed-sparse convolutional neural networks. In: Proceedings of the ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), Jun. 2017, pp. 27–40
10. Yuan, Z., et al.: STICKER: an energy-efficient multi-sparsity compatible accelerator for convolutional neural networks in 65-nm CMOS. IEEE J. Solid-State Circuits 55(2), 465–477 (2020)
11. Kang, H.J.: Short floating-point representation for convolutional neural network inference. IEICE Electronics Express 15, 20180909 (2018)
12. Deepika, S., Arunachalam, V.: Analysis & design of convolution operator for high speed and high accuracy convolutional neural network-based inference engines. IEEE Trans. Comput. 71(2), 390–396 (2021)
13. Yue, J., et al.: A 3.77 TOPS/W convolutional neural network processor with priority-driven kernel optimization. IEEE Trans. Circuits Syst. II, Exp. Briefs 66, 277–281 (2018)
14. Dinelli, G., Meoni, G., Rapuano, E., Pacini, T., Fanucci, L.: MEM-OPT: a scheduling and data re-use system to optimize on-chip memory usage for CNNs on-board FPGAs. IEEE J. Emerg. Selected Top. Circ. Syst. 10(3), 335–347 (2020)
15. Chong, Y.S., Goh, W.L., Ong, Y.S., Nambiar, V.P., Do, A.T.: An energy-efficient convolution unit for depthwise separable convolutional neural networks. In: 2021 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1–5 (2021)
16. Siu, K., Stuart, D.M., Mahmoud, M., Moshovos, A.: Memory requirements for convolutional neural network hardware accelerators. In: 2018 IEEE International Symposium on Workload Characterization (IISWC), pp. 111–121. IEEE (2018)

Generation Z’s Satisfaction with Artificial Intelligence Voice Enabled Digital Assistants

Thiruvenkadam Thiagarajan and Sudarsan Jayasingh

SSN School of Management, Sri Sivasubramaniya Nadar College of Engineering, Kalavakkam 603110, India [email protected]

Abstract. Artificial Intelligence voice-enabled digital assistants, such as Alexa, Siri, and Google Assistant, are increasingly used in Generation Z’s day-to-day life. This study examines Gen Z’s satisfaction with these new-age voice-enabled digital assistants and its determinants. A total of 235 valid responses were collected using an online survey. The results show that usefulness is the most important determinant of satisfaction with digital assistants, followed by user competency. The research also shows that privacy perception is negatively related to satisfaction.

Keywords: Artificial Intelligence · Gen Z · Voice Enabled Digital Assistant · Satisfaction

1 Introduction

Our living spaces today are occupied by artificial intelligence (AI)-backed voice-enabled digital assistants such as Alexa, Siri, and Google Assistant. A digital assistant or virtual assistant is software that uses advanced artificial intelligence (AI), natural language processing, and machine learning to interact with consumers [1]. Voice-activated digital assistants (e.g., Apple Siri, Amazon Alexa, and Google Assistant) show increasing usage as these technologies make daily routines much easier [2]. It is estimated that 4.2 billion devices around the world are fitted with some form of digital voice assistant [3]. Digital assistants enable users to use voice commands for controlling home appliances, receiving news recommendations, playing preferred music, and setting reminders and alarms [2]. AI-based digital assistants offer a good number of benefits, but privacy and security are among the major concerns [4]. Technology adoption varies largely from generation to generation. Baby Boomers were the first to experience the personal computer, Generation X was the first generation to experience the internet, and Millennials grew up with mobile phones. Generation Z, however, the children born from 1997 to 2012, were born with smartphones in their hands. The Metafacts survey (2018) indicates that around 50 percent of internet users are below the age of 24 and that they spend more than 70 h per week on their smart devices [5]. Given the relatively limited usage of voice-enabled digital assistants, very little empirical work has been conducted to measure customer satisfaction towards them [6].


This study is an attempt to find out the satisfaction of Generation Z with AI-enabled devices such as Siri, Alexa, and other digital assistants widely used by Gen Z in everyday life.

2 Literature Review

There is a remarkable amount of literature available on artificial intelligence, smart devices, IoT, and related topics, but most of the studies are technical or application oriented. Only limited studies are available that measure customer satisfaction towards these devices. Brill et al. [6] studied the satisfaction of customers using Siri, Alexa, and other digital assistants, focusing on their expectations and whether those expectations were met in terms of usefulness, competence, benevolence, integrity, and perceived privacy protection. Another study, by Vitezić and Perić [7], examined the willingness of Generation Z to accept AI devices and found that the frequency of smartphone usage plays an important role between the perceived effort of AI usage and the emotions of Generation Z. The perceived usefulness of technology devices affects the attitude and satisfaction of users [8]. Davis and Davis [9] developed and tested scales for perceived usefulness and user acceptance of information technology; the results convey that there is a very high correlation between these two variables. Previous studies show that privacy is one of the major concerns of some users of digital assistants. The study by Ebbers et al. [10] examines user preferences regarding the design of privacy features in digital assistants and shows that privacy features in digital assistants are strongly preferred by respondents. Competence is the users’ faith in the capability of the product or service provider to meet their expectations [11]. Integrity is one person’s faith that another person will be honest and keep their promises [12]. Koon et al. [2] applied the Unified Theory of Acceptance and Use of Technology (UTAUT2) model to study the barriers and facilitators of digital assistant use among users older than 55 years; their research shows that difficulties with voice activation features hindered continued use. Davis and Davis [9] also found that perceived ease of use has a high correlation with the current and future usage of technology. The dimensions of competence, benevolence, and integrity are usually applied to trust in humans, but some researchers have adopted these dimensions for non-humans as well [11]. The following hypotheses were developed based on the literature review (see Fig. 1):

Hypothesis 1 – Usefulness positively impacts satisfaction towards voice-enabled digital assistants.
Hypothesis 2 – Competence positively impacts satisfaction towards voice-enabled digital assistants.
Hypothesis 3 – Benevolence positively impacts satisfaction towards voice-enabled digital assistants.
Hypothesis 4 – Integrity positively impacts satisfaction towards voice-enabled digital assistants.
Hypothesis 5 – Privacy protection positively impacts satisfaction towards voice-enabled digital assistants.


Fig. 1. Satisfaction towards Voice Enabled Digital Assistants

3 Research Methodology

The objective of this research is to identify the determinants of satisfaction with AI voice-enabled digital assistants. A total of 253 survey responses were obtained using online survey forms, of which 235 valid responses were used for further analysis after eliminating incomplete responses. The online survey was conducted during October 2022. The characteristics of the respondents are presented in Table 1. The instrument used for this study is based on a previous researcher’s survey instrument [6]. Data analysis was conducted using IBM SPSS 28.0. The details of the questionnaire are attached in Appendix 1.

Table 1. Characteristics of respondents

Variable | Characteristics | Frequency | Percentage
Gender | Male | 110 | 46.8
Gender | Female | 125 | 53.2
Using Voice Enabled Digital Assistant | Yes | 130 | 55.3
Using Voice Enabled Digital Assistant | Maybe | 22 | 9.4
Using Voice Enabled Digital Assistant | No | 83 | 35.3

4 Data Analysis and Discussion

This study made a systematic attempt to understand the determinants of Generation Z’s satisfaction with using voice-enabled digital assistants. The survey data were collected using an online questionnaire. Of the 235 responses, 46.8% were from male respondents.


Of the respondents, 64.7% mentioned that they have used some form of digital assistant for e-commerce transactions (see Table 1). The summary statistics of the predicted variable are presented in Fig. 2.

Fig. 2. Summary statistics of the predicted variable “Satisfaction towards Voice Enabled Digital Assistants”

A multiple linear regression model with five independent variables was built and validated using SPSS software. The standardized coefficients of the five independent variables (usefulness, benevolence, privacy, competency, and integrity) on satisfaction with using voice-enabled digital assistants are presented in Table 2. The results show that usefulness is the most important determinant of satisfaction with digital assistants, followed by user competency (see Table 2). The research also shows that privacy is negatively related to satisfaction.

Y = 2.168 + 0.336 X1 + 0.028 X2 − 0.131 X3 + 0.203 X4 − 0.012 X5

The R² value is the squared correlation between the observed and predicted values of satisfaction towards voice-enabled digital assistants. The results indicate an R² of 0.248, which can be interpreted as a moderate-fit model. Further, the data analysis shows that usefulness, privacy, and competency are the three variables with significance values lower than 0.05. Figure 3 shows the final accepted model of the study based on the regression model output.
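As an illustration of how such a model can be estimated outside SPSS, the following Python sketch fits a multiple linear regression of satisfaction on the five predictors (the column names and the input file are hypothetical placeholders; the coefficients reported above come from the authors’ SPSS analysis, not from this code):

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical survey data with one column per construct (e.g., the mean of
# the Likert items for each construct) and the satisfaction score as outcome.
df = pd.read_csv("genz_assistant_survey.csv")  # placeholder file name

model = smf.ols(
    "satisfaction ~ usefulness + benevolence + privacy + competency + integrity",
    data=df,
).fit()

print(model.params)    # intercept and unstandardized coefficients
print(model.rsquared)  # R^2, reported as 0.248 in the paper's SPSS output
print(model.pvalues)   # significance of each predictor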


Table 2. Coefficients of the conceptual model

Characteristics | Unstandardized Coefficient (B) | Std. Error | Standardized Coefficient (Beta) | t | Sig.
(Constant) | 2.168 | .294 | – | 7.371 |